Log Auditing for fun and profit

Again I find myself in a position where I need full-time work. I was able to sustain myself as a full-time freelancer for 8 months (not too shabby!), but now the market seems to be drying up. It isn’t for lack of effort on my part: I tried to find salespeople and promoted myself by basically bribing people with a 10% commission, but I haven’t been able to bring in enough business to sustain myself any longer. I’ll not go into any of the nasty business of clients who decided they didn’t feel like paying me, or clients that had me draw up proposals only to vanish into the ether, because this post is about fun stuff!

All that being said, I like to be clever. I like to use ingenuity to do basically what everyone else does but put a fancy little twist on it. Typically, when someone is looking for a job, they hit job search sites like Monster and Dice and then send their resume to people, never knowing whether human eyes ever see it or whether it gets any attention at all. Who knows? Does your resume even get read? If it does, how soon? Wouldn’t it be nice to see the time correlation between when you sent your resume to someone and when they actually looked at it, or whether they looked at it at all?

I put my resumes in a public place. They aren’t publicly linked, but I send the URL to people directly, so when someone goes to look at them I have records in my Apache logs. For example, one quick grep command gives me these results (notice I’m only grepping for December 8th and 9th):

grep resumes atenlabs.com.access.log | grep -E '08/Dec|09/Dec' | grep -Ei 'pdf|doc'

75.212.202.71 - - [08/Dec/2008:15:32:51 -0800] "GET /resumes/dan-resume-2008.pdf HTTP/1.1" 200 112865 "http://www.atenlabs.com/resumes/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312461; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; MS-RTC LM 8)"
75.212.202.71 - - [08/Dec/2008:15:33:42 -0800] "GET /resumes/dan-resume-2008b.pdf HTTP/1.1" 200 118460 "http://www.atenlabs.com/resumes/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312461; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; MS-RTC LM 8)"
75.212.202.71 - - [08/Dec/2008:15:34:23 -0800] "GET /resumes/dan-resume-2008.pdf HTTP/1.1" 304 - "http://www.atenlabs.com/resumes/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312461; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; MS-RTC LM 8)"
75.212.202.71 - - [08/Dec/2008:15:35:16 -0800] "GET /resumes/dan-resume-2008-msword.doc HTTP/1.1" 200 43008 "http://www.atenlabs.com/resumes/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312461; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; MS-RTC LM 8)"
75.212.202.71 - - [08/Dec/2008:15:35:23 -0800] "HEAD /resumes/dan-resume-2008-msword.doc HTTP/1.1" 200 - "-" "Microsoft Office Existence Discovery"
75.212.202.71 - - [08/Dec/2008:15:36:54 -0800] "GET /resumes/dan-resume-2008b.doc HTTP/1.1" 200 31232 "http://www.atenlabs.com/resumes/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312461; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; MS-RTC LM 8)"
75.212.202.71 - - [08/Dec/2008:15:36:58 -0800] "HEAD /resumes/dan-resume-2008b.doc HTTP/1.1" 200 - "-" "Microsoft Office Existence Discovery"
64.128.15.194 - - [08/Dec/2008:18:50:52 -0800] "GET /resumes/dan-resume-2008-msword.doc HTTP/1.1" 200 43008 "http://www.atenlabs.com/resumes/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13"
64.128.15.194 - - [08/Dec/2008:19:15:04 -0800] "GET /resumes/dan-resume-2008.pdf HTTP/1.1" 200 112865 "http://www.atenlabs.com/resumes/" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; InfoPath.1; .NET CLR 3.0.04506.648)"
70.179.4.41 - - [08/Dec/2008:23:24:37 -0800] "GET /resumes/dan-resume-2008.pdf HTTP/1.1" 200 112865 "http://www.atenlabs.com/resumes/" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727)"
70.179.4.41 - - [09/Dec/2008:00:15:28 -0800] "GET /resumes/dan-resume-2008.pdf HTTP/1.1" 200 112865 "http://www.atenlabs.com/resumes/" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"
67.202.54.191 - - [09/Dec/2008:04:42:00 -0800] "GET /resumes/dan-resume-2008-business.pdf HTTP/1.0" 200 2330 "-" "ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)"
67.202.54.191 - - [09/Dec/2008:04:42:24 -0800] "GET /resumes/dan-resume-2008.pdf HTTP/1.0" 200 112865 "-" "ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)"

Interesting – I can see the dates and times of when people clicked on things in the /resumes directory. I can see that my resumes are getting crawled – which may or may not be a good thing – and I can see that the same users are viewing both my business resume and my technical resume.
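
To get at the "when did they first look" question directly, here is a rough sketch that prints each client's first request for a resume file. It assumes the log is in Apache's combined format with ordinary ASCII quotes, so that field 1 is the client IP, field 4 is the timestamp and field 7 is the requested path. The function name first_views is made up for this example.

```shell
# first_views < access.log
# For each client IP that fetched a .pdf or .doc under /resumes/,
# print the IP and the timestamp of its first such request.
# Assumes combined log format: $1 = IP, $4 = "[timestamp", $7 = path.
first_views() {
    awk '
        tolower($7) ~ /^\/resumes\/.*\.(pdf|doc)$/ && !($1 in seen) {
            seen[$1] = 1               # remember that this IP was already reported
            print $1, substr($4, 2)    # drop the leading "[" from the timestamp
        }
    '
}

# Possible usage:
#   first_views < atenlabs.com.access.log
```

Comparing those timestamps against the time each email went out gives the send-to-view delay by hand.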

Let’s take this a step further.

grep resumes atenlabs.com.access.log | grep -E '08/Dec|09/Dec' | cut -d' ' -f1 | sort -u

204.14.152.106
64.128.15.194
67.202.54.191
70.179.4.41
75.212.202.71
97.113.157.234

Awesome, I can see the unique IPs that viewed my resume in the last two days. But who are they? We can find this out too:

for i in $(grep resumes atenlabs.com.access.log | grep -E '08/Dec|09/Dec' | cut -d' ' -f1 | sort -u); do host "$i"; done

Host 106.152.14.204.in-addr.arpa. not found: 3(NXDOMAIN)
194.15.128.64.in-addr.arpa domain name pointer corp1.referentia.com.
191.54.202.67.in-addr.arpa domain name pointer ec2-67-202-54-191.compute-1.amazonaws.com.
41.4.179.70.in-addr.arpa domain name pointer ip70-179-4-41.sd.sd.cox.net.
71.202.212.75.in-addr.arpa domain name pointer 71.sub-75-212-202.myvzw.com.
234.157.113.97.in-addr.arpa domain name pointer 97-113-157-234.tukw.qwest.net.

Even better! I can see that Referentia, a company that had a very attractive posting, has viewed my resume. Good! I sent them my resume TODAY (the 9th) and they viewed it today; perhaps this is a clue that my cover page is doing its job nicely! I can also see that some ‘home’ IP addresses have clicked on my resumes: a qwest.net address, which I don’t think exists in San Diego, and a myvzw.com address, which is a Verizon Wireless connection (someone on a laptop, perhaps? Or tethered to a phone). The Amazon EC2 connection sort of worries me. Why is an EC2 instance touching my resume? Let’s find out some more info:

grep -F 67.202.54.191 atenlabs.com.access.log

67.202.54.191 - - [08/Dec/2008:04:18:24 -0800] "GET /robots.txt HTTP/1.0" 200 36 "-" "ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)"
67.202.54.191 - - [08/Dec/2008:04:18:24 -0800] "GET /resumes/ HTTP/1.0" 200 1281 "-" "ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)"
67.202.54.191 - - [08/Dec/2008:20:56:13 -0800] "GET /robots.txt HTTP/1.0" 200 36 "-" "ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)"
67.202.54.191 - - [08/Dec/2008:20:56:14 -0800] "GET /resumes/?C=D;O=A HTTP/1.0" 200 1691 "-" "ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)"
67.202.54.191 - - [08/Dec/2008:20:56:20 -0800] "GET /resumes/?C=M;O=A HTTP/1.0" 200 1691 "-" "ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)"
67.202.54.191 - - [08/Dec/2008:20:56:26 -0800] "GET /resumes/?C=N;O=D HTTP/1.0" 200 1691 "-" "ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)"
67.202.54.191 - - [08/Dec/2008:20:57:14 -0800] "GET /resumes/?C=S;O=A HTTP/1.0" 200 1691 "-" "ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)"
67.202.54.191 - - [09/Dec/2008:04:42:00 -0800] "GET /resumes/dan-resume-2008-business.pdf HTTP/1.0" 200 2330 "-" "ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)"
67.202.54.191 - - [09/Dec/2008:04:42:24 -0800] "GET /resumes/dan-resume-2008.pdf HTTP/1.0" 200 112865 "-" "ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)"

Well, that’s worrisome. I have personal information in those resumes and I don’t want them to be spidered and put into some search engine, so I’ve gone ahead and added ‘ia_archiver’ to my robots.txt to disallow Alexa from touching my resumes. The crawl itself means that someone I’ve given my link to has put it into some system. I’ll have to refine my practices more.
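
For reference, a robots.txt rule of that kind might look like the sketch below. The exact rule I used isn't reproduced here, so treat the Disallow path as an assumption based on the /resumes/ directory seen in the logs. The helper name block_alexa is made up for this example.

```shell
# block_alexa ROBOTS_FILE
# Appends a rule asking the ia_archiver crawler to stay out of /resumes/.
# The Disallow path is an assumption based on the directory in the logs.
block_alexa() {
    cat >> "$1" <<'EOF'

User-agent: ia_archiver
Disallow: /resumes/
EOF
}

# Possible usage, run from the web root:
#   block_alexa robots.txt
```

Keep in mind that robots.txt is only a request that well-behaved crawlers honor, and that the file itself is publicly readable, so it also advertises the directory it names.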

Using this methodology you can do things like create reports to see how many of the people you’ve sent your link out to have actually viewed your resume, how many people ignore it and other bits of information that you otherwise would never be able to see.
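
As a rough sketch of such a report, the function below checks a list of companies against the saved output of the host loop. It assumes two hypothetical files: sent.txt, with one domain per line for each company the link went to, and hosts.txt, holding the reverse lookup output. The function name viewed_report is also made up.

```shell
# viewed_report SENT_FILE HOSTS_FILE
# SENT_FILE:  one domain per line (companies the link was sent to).
# HOSTS_FILE: saved output of the `host` lookups for the visiting IPs.
# Prints "viewed" or "unseen" next to each domain.
viewed_report() {
    while IFS= read -r domain; do
        [ -n "$domain" ] || continue            # skip blank lines
        if grep -qiF -- "$domain" "$2"; then    # fixed-string, case-insensitive match
            echo "viewed  $domain"
        else
            echo "unseen  $domain"
        fi
    done < "$1"
}

# Possible usage:
#   viewed_report sent.txt hosts.txt
```

An "unseen" line only means no visiting IP resolved to that domain. Someone reading from home or from a phone shows up under their ISP's name instead, like the cox.net and myvzw.com entries in the lookup output.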

I plan on writing a little script that will report how many unique IPs have viewed my resume in “the last 5 minutes”, and how many total views there were in the last five minutes, then use that script to create a Cacti graph. My current quandary is how to grep a log for “the last five minutes’ worth of hits”. Rest assured, when I get my head wrapped around it, that graph will be added to http://home.thaumatocracy.com/work
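
One possible approach, offered as a sketch and not a finished script: turn each log timestamp into a sortable YYYYMMDDHHMMSS string and keep only the lines at or after a cutoff. It assumes the combined log format, and that the machine computing the cutoff uses the same timezone as the log, because the -0800 offset is ignored. The function name recent_hits and its default pattern are made up for this example.

```shell
# recent_hits CUTOFF [PATTERN] < access.log
# Prints log lines containing PATTERN (default: /resumes/) whose timestamp
# is at or after CUTOFF, a YYYYMMDDHHMMSS string in the log's timezone.
recent_hits() {
    awk -v cutoff="$1" -v pat="${2:-/resumes/}" '
        BEGIN {
            # Map month names to two-digit numbers.
            n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
            for (i = 1; i <= n; i++) month[names[i]] = sprintf("%02d", i)
        }
        index($0, pat) > 0 {
            # Field 4 looks like [08/Dec/2008:15:32:51
            split(substr($4, 2), t, "[/:]")
            stamp = t[3] month[t[2]] t[1] t[4] t[5] t[6]
            # Force a string comparison; both sides are 14 digits long.
            if ((stamp "") >= (cutoff "")) print
        }
    '
}

# Possible usage with GNU date (BSD date would need: date -v-5M +%Y%m%d%H%M%S):
#   cutoff=$(date -d "5 minutes ago" +%Y%m%d%H%M%S)
#   recent_hits "$cutoff" < atenlabs.com.access.log | wc -l                            # total views
#   recent_hits "$cutoff" < atenlabs.com.access.log | cut -d" " -f1 | sort -u | wc -l  # unique IPs
```

This reads the whole log on every run, which should be fine for a small site polled every five minutes. A large log would be better served by reading only the tail.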
