[Novalug] Federal employees - download earnings and leave statement

Omari Norman omari at smileystation.com
Fri May 15 17:00:49 EDT 2009


On Fri, May 15, 2009 at 07:53:12AM -0500, Beartooth wrote:

>> to download them automatically. It uses zsh and curl to download the
>> earnings and leave statements in HTML form.
> 	[...] How widely are you willing to spread your script? I used
> 	to know a whole bunch of computer jocks at the Library of
> 	Congress -- most of whom are probably retired by now, and not
> 	all of whom ever ran linux; but if the script goes beyond
> 	Novalug, there's no telling where it'll stop ....

Hey, spread it 'round...anybody who finds it useful is welcome to it!

This experience has taught me how easy HTTP scripting is, which
explains all the spam that gets sent to web forms and all the captchas
meant to stop it.

One person pointed out that I didn't send the script with the first
message...that's just because I needed to change it a bit to remove my
login info from the script, so I wanted to make sure someone was
interested first :) So I'll attach it here.

-- 
Due to some violent content, viewer discretion is advised.
-------------- next part --------------
#!/usr/bin/zsh

# getel
#
# Downloads earnings and leave statements.
#
# This script downloads my earnings and leave statements from the
# USDA National Finance Center's Employee Personal Page. It
# downloads ALL currently available statements into a single
# directory. Then on subsequent runs it will only download
# statements that have not already been downloaded. As each file is
# downloaded, the script prints a line on stderr; if nothing is
# downloaded, there is no output.
#
# If you are also paid through NFC, you might find this useful.
#
# This program is by Omari Norman, omari at smileystation.com. There is
# NO WARRANTY, use at your own risk, etc etc. I hereby place this
# script in the public domain. Do whatever you want with it. If you
# improve it and you want to let me know, great!
#
# Thanks to "The Art of HTTP Scripting," which comes with curl; it
# explains how to pull this sort of thing off. Script away all your
# mundane needs!
#
# PROGRAMS YOU WILL NEED
# zsh. I use version 4.3.6
# curl. I use version 7.18.2
# mktemp. I use version 1.5
#
# VARIABLES TO CONFIGURE
# You can enter values in this file or, as I now do, keep them in a
# separate file which will be sourced. These are stored unencrypted;
# if this bothers you, encrypt the file or modify this script so
# that you provide this info on the command line.

# LOGIN
login=

# PASSWORD
password=

# DESTINATION DIRECTORY
# The script downloads every available
# statement and puts it in the dest_dir, named like this:
# $year-$mm-$dd-pp-$pp.html
# where year-mm-dd is the first day of the pay period, and $pp is
# the pay period number.
dest_dir=

# CONFIG FILE
# If you want to keep the login and password in a
# separate file, name it here. getel will source this file if it
# exists. Values found in the file will supersede values given
# above.
config_file=$HOME/Budget/ledger/earnings_and_leave/credentials

# END CONFIGURATION SECTION
####################################################################

function error
{
    echo "getel: error: $1" 1>&2
    exit 1
}

TRAPZERR() {
    error "an unexpected error occurred! exiting."
}

if [[ -n $config_file && -e $config_file ]]; then
    . $config_file
fi

cookieFile=$(mktemp)
trap "rm $cookieFile" EXIT

[[ ! -d $dest_dir ]] && error "Destination directory does not exist."
[[ -z $login ]] && error "No login name given."
[[ -z $password ]] && error "No password given."
cd $dest_dir

setopt no_unset

opts=(--cookie $cookieFile --cookie-jar $cookieFile
      --location --silent --show-error --fail)
root='https://www.nfc.usda.gov/personal'

# curl by default sends the server response to stdout. Usually the
# script doesn't need it, so send stdout to /dev/null. The original
# stdout is first saved on file descriptor 3 in case it is needed.
exec 3>&1
exec 1>/dev/null

# The login sequence is rather strange, but it does work. For
# further documentation consult sample_pages/getel-test (which Omari
# has on his hard drive) or just scrutinize the NFC web pages. The
# sequence may be this bizarre because I go straight to the warning
# page; starting from index2.asp (which is linked from the NFC home
# page) might be less strange. Since this currently works, I will
# leave it as is.

# Get the first warning page. Necessary because this also sets
# cookie(s).
curl $opts "$root/ep_warning.asp"

# Submit the warning page
curl $opts "$root/ep_warning.asp?Accept=I+Agree"

# In a web browser you would not see this page. The page returned by
# the submit would use javascript to load this page, which (IIRC)
# runs more javascript.
curl $opts "$root/index2.asp"

# In a web browser you would not see this page. The page loaded by
# the previous bunch of javascript would open this page.
curl $opts "$root/index1.asp"

# In a web browser (at least, in Firefox) you will see this warning
# page, again, even after you submitted the first one. Yes, you will
# get two warning pages in a row. Submit it again.
curl $opts "$root/ep_warning.asp?Accept=I+Agree"

# Okay, after submitting the last page it will use some javascript
# to load this page. It is the actual login page.
curl $opts "$root/index2.asp"

# Submit login information
agent=$(curl --version | head -n 1)
curl $opts --data-urlencode "form_entry_browser=$agent" \
    --data-urlencode "form_entry_id=$login" \
    --data-urlencode "form_entry_pin=$password" \
    --data-urlencode 'submit_flag=N' \
    --data-urlencode "user_agent=$agent" \
    "$root/errck00.asp?from=index2"

# Go to the E&L page. Note the trailing pipe: curl's output is fed to
# the grep command that follows the comment block below.
curl $opts "$root/epp.asp?ep=v2" |

# get possible pay period numbers
# This pulls the "option" tags from the HTML that is used to build
# the pay period drop-down box on the E&L page (see
# sample_pages/08_epp.asp). It is fortuitous that this is formatted
# in such a way that is accessible to grep (e.g. with line breaks);
# otherwise I would need to run the HTML through a pretty printer
# first.
#
# The form sets the "value" to a code. The form itself is never
# submitted; instead the page uses Javascript to parse the code and
# then construct a URL. Instead of using Javascript, I do comparable
# parsing below.
#
# Formatting of the code:
#
# Digits Meaning
# 1-4    Year of start of pay period
# 5-6    Month of start of pay period (with leading 0 if needed)
# 7-8    Day of start of pay period (with leading 0 if needed)
# 9-10   Agency code (apparently GAO is 97)
# 11-12  Pay period number, with leading 0 if needed
# 13-    Sequence number. Simply starts at 1 for the first code in
#        the list, then increments for each code. There are NO
#        leading zeroes, so as you get more codes they get more
#        digits. This is not needed for the purposes of getel,
#        though apparently the NFC pages use it for other purposes
#
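grep -E 'option value="[0-9]{13,15}"' | while read -r code; do
    # eliminate everything but the numeric code: drop everything up to
    # the first double quote, then everything from the next one onward
    code=${code#*\"}
    code=${code%%\"*}

    date=${code[1,8]}
    year=${date[1,4]}
    month=${date[5,6]}
    day=${date[7,8]}
    agency=${code[9,10]}
    pp=${code[11,12]}

    filename="${year}-${month}-${day}-pp-${pp}.html"
    if [[ ! -e $filename ]]; then
        url="$root/epel_xls.asp?type=htm&ppdate=$date&agcy=$agency&pp=$pp"
        # Download to a temporary name first, so a failed transfer does
        # not leave a partial file that later runs would skip.
        curl $opts $url > $filename.part
        mv $filename.part $filename
        echo "Downloaded $filename" 1>&2
    fi
done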

# be nice, logout
curl $opts "$root/ep_logout.asp"

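The pay-period code layout documented in the script's comments can be
tried out on its own. The following is only a sketch: parse_pp_code is
a hypothetical helper that is not part of getel, the sample code
2009042697091 is made up for illustration, and cut is used instead of
zsh subscripts so that the snippet also runs in other Bourne-style
shells.

```shell
# parse_pp_code is a hypothetical helper, not part of getel.
# It prints "YYYY-MM-DD agency=AA pp=PP seq=N" for a valid code and
# returns 1 (printing nothing) for anything else.
parse_pp_code() {
    code=$1

    # Accept only 13 to 15 digits, the same range that the script's
    # grep pattern matches.
    case $code in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "${#code}" -ge 13 ] && [ "${#code}" -le 15 ] || return 1

    # cut -c uses 1-based character positions, matching the "Digits"
    # column of the table in the script's comments.
    year=$(printf '%s' "$code" | cut -c1-4)
    month=$(printf '%s' "$code" | cut -c5-6)
    day=$(printf '%s' "$code" | cut -c7-8)
    agency=$(printf '%s' "$code" | cut -c9-10)
    pp=$(printf '%s' "$code" | cut -c11-12)
    seqno=$(printf '%s' "$code" | cut -c13-)

    printf '%s-%s-%s agency=%s pp=%s seq=%s\n' \
        "$year" "$month" "$day" "$agency" "$pp" "$seqno"
}

# Made-up sample code, for illustration only.
parse_pp_code 2009042697091
# prints: 2009-04-26 agency=97 pp=09 seq=1
```

Only the first twelve digits matter to getel; the sequence number is
split out here just to show where the variable-length tail begins.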

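For the separate credentials file that getel sources, one possible
setup is sketched below. It rests on assumptions: every value is a
placeholder, the temporary directory stands in for wherever the file
would really live, and setting dest_dir in the same file is just an
illustration that relies on getel sourcing the file as ordinary shell
code.

```shell
# Stand-in location; in real use this would be the path assigned to
# config_file in getel.
conf_dir=$(mktemp -d)
config_file=$conf_dir/credentials

# Create the file empty with owner-only permissions (0600) before
# any secret is written to it.
( umask 077 && : > "$config_file" )

# Placeholder values only. The quoted heredoc delimiter keeps the
# shell from expanding anything inside the file body.
cat > "$config_file" <<'EOF'
login=EXAMPLE_LOGIN
password='EXAMPLE PASSWORD'
dest_dir=/tmp
EOF

# Source it the same way getel does and show what arrived.
. "$config_file"
printf 'login=%s dest_dir=%s\n' "$login" "$dest_dir"
ls -l "$config_file"
```

Because the file is plain shell, a password containing spaces or
shell metacharacters needs single quotes, as in the placeholder above.
The temporary directory should be removed after trying the snippet.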