Post Server Log Processing
I decided to write my own log-processing script. It runs from the WinXP command prompt, but I realized I needed some of the powerful tools found on *NIX, namely gzip and grep. Thankfully there is Cygwin, which is perhaps best described as a port of *NIX commands to the Windows platform.
So, starting with three raw logs from the servers (gzipped), I concocted logscleaner.bat, which decompresses the gzips, copies the decompressed files to a working folder, renames the logs to suitable names, and then chunks the three big logs into month-sized pieces. I also split out specifics such as 404s (aka file not found) and LiveJournal-related views into their own logs. Along the way the logs lose self-browsing (aka records of your own browsing of your websites), browser favicon requests, requests for java code, search engine indexing, FrontPage extension related requests, webfonts, etc.
The final product is a set of folders, January through December, each containing the following logs:
quepid.log (cleaned and dated, with no 404s, LiveJournal image requests, self-browsing, java, favicons, webfonts, robots, or search engine indexing)
linmu.log (cleaned and dated, with no 404s, LiveJournal image requests, self-browsing, java, favicons, webfonts, robots, or search engine indexing)
mesmeraj.log (cleaned and dated, with no 404s, LiveJournal image requests, self-browsing, java, favicons, webfonts, robots, or search engine indexing)
lj.quepid.log (all the LiveJournal requests for images)
lj.mesmeraj.log (all the LiveJournal requests for images)
404.quepid.log (a filtered list of ‘file not found’ occurrences)
404.linmu.log (a filtered list of ‘file not found’ occurrences)
404.mesmeraj.log (a filtered list of ‘file not found’ occurrences)
I like keeping an eye on 404s so I can quickly find bad links, missing or moved files, and possible attempts to hack my website.
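To make a 404 log quicker to scan, the missing paths can be tallied. The sketch below is not part of logscleaner.bat. It assumes you run it from Cygwin's bash (with awk, sort and uniq installed) rather than cmd, and that the logs are in the Apache common/combined format, where the requested path is the seventh whitespace-separated field. The function name top_404s and the sample log lines are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: list the most frequently requested missing paths.
# Assumes Apache common/combined log format (requested path is field 7).
top_404s() {
    awk '{ print $7 }' "$1" | sort | uniq -c | sort -rn | head -n 10
}

# Made-up sample data standing in for a real 404.quepid.log.
workdir=$(mktemp -d)
cat > "$workdir/404.quepid.log" <<'EOF'
10.0.0.1 - - [03/Jan/2005:10:00:00 -0500] "GET /old/page.html HTTP/1.1" 404 210
10.0.0.2 - - [04/Jan/2005:11:00:00 -0500] "GET /old/page.html HTTP/1.1" 404 210
10.0.0.3 - - [05/Jan/2005:12:00:00 -0500] "GET /missing.gif HTTP/1.1" 404 208
EOF

# Each output line is a hit count followed by the path, busiest first.
report=$(top_404s "$workdir/404.quepid.log")
echo "$report"
```

A path that shows up many times is usually a stale link worth fixing; a burst of odd one-off paths is more likely someone probing the site.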
Cleanup.bat basically deletes all the processed logs so you can run logscleaner.bat again.
–logscleaner.bat–
@echo Logs Cleaner by Robert Pace rob@robert-pace.com
@cd C:\wwwlogs\archive
@gzip -d *.gz
@copy *.* c:\wwwlogs
@rem Switch to the working folder before renaming, so the copies grep reads get the short names.
@cd C:\wwwlogs
@ren accesslog_mesmeraj.robert-pace.com*. accesslog_mesmeraj.log
@ren accesslog_linmu.robert-pace.com*. accesslog_linmu.log
@ren accesslog_robert-pace.com*. accesslog_quepid.log
@echo Removing self junk from main logs...
@grep -E -v -f c:\utility\mesmeraj.pat accesslog_mesmeraj.log > mesmeraj.log
@grep -E -v -f c:\utility\linmu.pat accesslog_linmu.log > linmu.log
@grep -E -v -f c:\utility\quepid.pat accesslog_quepid.log > quepid.log
@echo Removing ljpics and lj pics...
@grep -E -e'/images/ljpics/' accesslog_quepid.log > lj.quepid.log
@grep -E -e'/images/lj' accesslog_mesmeraj.log > lj.mesmeraj.log
@echo Removing 404s...
@grep -E -e' 404' accesslog_quepid.log > pre404.quepid.log
@grep -E -e' 404' accesslog_linmu.log > pre404.linmu.log
@grep -E -e' 404' accesslog_mesmeraj.log > pre404.mesmeraj.log
@grep -E -v -e'favicon' -e'robots' -e'/mesmeraj/' -e'/linmu/' pre404.quepid.log > 404.quepid.log
@grep -E -v -e'favicon' -e'robots' pre404.linmu.log > 404.linmu.log
@grep -E -v -e'favicon' -e'robots' -e'wwwmesmerajcom.png' pre404.mesmeraj.log > 404.mesmeraj.log
@echo Dating linmu logs...
@grep -E -e'Jan/2005' linmu.log > c:\wwwlogs\January\linmu.log
@grep -E -e'Feb/2005' linmu.log > c:\wwwlogs\February\linmu.log
@grep -E -e'Mar/2005' linmu.log > c:\wwwlogs\March\linmu.log
@grep -E -e'Apr/2005' linmu.log > c:\wwwlogs\April\linmu.log
@grep -E -e'May/2005' linmu.log > c:\wwwlogs\May\linmu.log
@grep -E -e'Jun/2005' linmu.log > c:\wwwlogs\June\linmu.log
@grep -E -e'Jul/2005' linmu.log > c:\wwwlogs\July\linmu.log
@grep -E -e'Aug/2005' linmu.log > c:\wwwlogs\August\linmu.log
@grep -E -e'Sep/2005' linmu.log > c:\wwwlogs\September\linmu.log
@grep -E -e'Oct/2005' linmu.log > c:\wwwlogs\October\linmu.log
@grep -E -e'Nov/2005' linmu.log > c:\wwwlogs\November\linmu.log
@grep -E -e'Dec/2005' linmu.log > c:\wwwlogs\December\linmu.log
@echo Dating mesmeraj logs...
@grep -E -e'Jan/2005' mesmeraj.log > c:\wwwlogs\January\mesmeraj.log
@grep -E -e'Feb/2005' mesmeraj.log > c:\wwwlogs\February\mesmeraj.log
@grep -E -e'Mar/2005' mesmeraj.log > c:\wwwlogs\March\mesmeraj.log
@grep -E -e'Apr/2005' mesmeraj.log > c:\wwwlogs\April\mesmeraj.log
@grep -E -e'May/2005' mesmeraj.log > c:\wwwlogs\May\mesmeraj.log
@grep -E -e'Jun/2005' mesmeraj.log > c:\wwwlogs\June\mesmeraj.log
@grep -E -e'Jul/2005' mesmeraj.log > c:\wwwlogs\July\mesmeraj.log
@grep -E -e'Aug/2005' mesmeraj.log > c:\wwwlogs\August\mesmeraj.log
@grep -E -e'Sep/2005' mesmeraj.log > c:\wwwlogs\September\mesmeraj.log
@grep -E -e'Oct/2005' mesmeraj.log > c:\wwwlogs\October\mesmeraj.log
@grep -E -e'Nov/2005' mesmeraj.log > c:\wwwlogs\November\mesmeraj.log
@grep -E -e'Dec/2005' mesmeraj.log > c:\wwwlogs\December\mesmeraj.log
@echo Dating quepid logs...
@grep -E -e'Jan/2005' quepid.log > c:\wwwlogs\January\quepid.log
@grep -E -e'Feb/2005' quepid.log > c:\wwwlogs\February\quepid.log
@grep -E -e'Mar/2005' quepid.log > c:\wwwlogs\March\quepid.log
@grep -E -e'Apr/2005' quepid.log > c:\wwwlogs\April\quepid.log
@grep -E -e'May/2005' quepid.log > c:\wwwlogs\May\quepid.log
@grep -E -e'Jun/2005' quepid.log > c:\wwwlogs\June\quepid.log
@grep -E -e'Jul/2005' quepid.log > c:\wwwlogs\July\quepid.log
@grep -E -e'Aug/2005' quepid.log > c:\wwwlogs\August\quepid.log
@grep -E -e'Sep/2005' quepid.log > c:\wwwlogs\September\quepid.log
@grep -E -e'Oct/2005' quepid.log > c:\wwwlogs\October\quepid.log
@grep -E -e'Nov/2005' quepid.log > c:\wwwlogs\November\quepid.log
@grep -E -e'Dec/2005' quepid.log > c:\wwwlogs\December\quepid.log
@echo Dating ljpics logs...
@grep -E -e'Jan/2005' lj.quepid.log > c:\wwwlogs\January\lj.quepid.log
@grep -E -e'Feb/2005' lj.quepid.log > c:\wwwlogs\February\lj.quepid.log
@grep -E -e'Mar/2005' lj.quepid.log > c:\wwwlogs\March\lj.quepid.log
@grep -E -e'Apr/2005' lj.quepid.log > c:\wwwlogs\April\lj.quepid.log
@grep -E -e'May/2005' lj.quepid.log > c:\wwwlogs\May\lj.quepid.log
@grep -E -e'Jun/2005' lj.quepid.log > c:\wwwlogs\June\lj.quepid.log
@grep -E -e'Jul/2005' lj.quepid.log > c:\wwwlogs\July\lj.quepid.log
@grep -E -e'Aug/2005' lj.quepid.log > c:\wwwlogs\August\lj.quepid.log
@grep -E -e'Sep/2005' lj.quepid.log > c:\wwwlogs\September\lj.quepid.log
@grep -E -e'Oct/2005' lj.quepid.log > c:\wwwlogs\October\lj.quepid.log
@grep -E -e'Nov/2005' lj.quepid.log > c:\wwwlogs\November\lj.quepid.log
@grep -E -e'Dec/2005' lj.quepid.log > c:\wwwlogs\December\lj.quepid.log
@echo Dating lj pics logs...
@grep -E -e'Jan/2005' lj.mesmeraj.log > c:\wwwlogs\January\lj.mesmeraj.log
@grep -E -e'Feb/2005' lj.mesmeraj.log > c:\wwwlogs\February\lj.mesmeraj.log
@grep -E -e'Mar/2005' lj.mesmeraj.log > c:\wwwlogs\March\lj.mesmeraj.log
@grep -E -e'Apr/2005' lj.mesmeraj.log > c:\wwwlogs\April\lj.mesmeraj.log
@grep -E -e'May/2005' lj.mesmeraj.log > c:\wwwlogs\May\lj.mesmeraj.log
@grep -E -e'Jun/2005' lj.mesmeraj.log > c:\wwwlogs\June\lj.mesmeraj.log
@grep -E -e'Jul/2005' lj.mesmeraj.log > c:\wwwlogs\July\lj.mesmeraj.log
@grep -E -e'Aug/2005' lj.mesmeraj.log > c:\wwwlogs\August\lj.mesmeraj.log
@grep -E -e'Sep/2005' lj.mesmeraj.log > c:\wwwlogs\September\lj.mesmeraj.log
@grep -E -e'Oct/2005' lj.mesmeraj.log > c:\wwwlogs\October\lj.mesmeraj.log
@grep -E -e'Nov/2005' lj.mesmeraj.log > c:\wwwlogs\November\lj.mesmeraj.log
@grep -E -e'Dec/2005' lj.mesmeraj.log > c:\wwwlogs\December\lj.mesmeraj.log
@echo Dating quepid 404s
@grep -E -e'Jan/2005' 404.quepid.log > c:\wwwlogs\January\404.quepid.log
@grep -E -e'Feb/2005' 404.quepid.log > c:\wwwlogs\February\404.quepid.log
@grep -E -e'Mar/2005' 404.quepid.log > c:\wwwlogs\March\404.quepid.log
@grep -E -e'Apr/2005' 404.quepid.log > c:\wwwlogs\April\404.quepid.log
@grep -E -e'May/2005' 404.quepid.log > c:\wwwlogs\May\404.quepid.log
@grep -E -e'Jun/2005' 404.quepid.log > c:\wwwlogs\June\404.quepid.log
@grep -E -e'Jul/2005' 404.quepid.log > c:\wwwlogs\July\404.quepid.log
@grep -E -e'Aug/2005' 404.quepid.log > c:\wwwlogs\August\404.quepid.log
@grep -E -e'Sep/2005' 404.quepid.log > c:\wwwlogs\September\404.quepid.log
@grep -E -e'Oct/2005' 404.quepid.log > c:\wwwlogs\October\404.quepid.log
@grep -E -e'Nov/2005' 404.quepid.log > c:\wwwlogs\November\404.quepid.log
@grep -E -e'Dec/2005' 404.quepid.log > c:\wwwlogs\December\404.quepid.log
@echo Dating linmu 404s
@grep -E -e'Jan/2005' 404.linmu.log > c:\wwwlogs\January\404.linmu.log
@grep -E -e'Feb/2005' 404.linmu.log > c:\wwwlogs\February\404.linmu.log
@grep -E -e'Mar/2005' 404.linmu.log > c:\wwwlogs\March\404.linmu.log
@grep -E -e'Apr/2005' 404.linmu.log > c:\wwwlogs\April\404.linmu.log
@grep -E -e'May/2005' 404.linmu.log > c:\wwwlogs\May\404.linmu.log
@grep -E -e'Jun/2005' 404.linmu.log > c:\wwwlogs\June\404.linmu.log
@grep -E -e'Jul/2005' 404.linmu.log > c:\wwwlogs\July\404.linmu.log
@grep -E -e'Aug/2005' 404.linmu.log > c:\wwwlogs\August\404.linmu.log
@grep -E -e'Sep/2005' 404.linmu.log > c:\wwwlogs\September\404.linmu.log
@grep -E -e'Oct/2005' 404.linmu.log > c:\wwwlogs\October\404.linmu.log
@grep -E -e'Nov/2005' 404.linmu.log > c:\wwwlogs\November\404.linmu.log
@grep -E -e'Dec/2005' 404.linmu.log > c:\wwwlogs\December\404.linmu.log
@echo Dating mesmeraj 404s
@grep -E -e'Jan/2005' 404.mesmeraj.log > c:\wwwlogs\January\404.mesmeraj.log
@grep -E -e'Feb/2005' 404.mesmeraj.log > c:\wwwlogs\February\404.mesmeraj.log
@grep -E -e'Mar/2005' 404.mesmeraj.log > c:\wwwlogs\March\404.mesmeraj.log
@grep -E -e'Apr/2005' 404.mesmeraj.log > c:\wwwlogs\April\404.mesmeraj.log
@grep -E -e'May/2005' 404.mesmeraj.log > c:\wwwlogs\May\404.mesmeraj.log
@grep -E -e'Jun/2005' 404.mesmeraj.log > c:\wwwlogs\June\404.mesmeraj.log
@grep -E -e'Jul/2005' 404.mesmeraj.log > c:\wwwlogs\July\404.mesmeraj.log
@grep -E -e'Aug/2005' 404.mesmeraj.log > c:\wwwlogs\August\404.mesmeraj.log
@grep -E -e'Sep/2005' 404.mesmeraj.log > c:\wwwlogs\September\404.mesmeraj.log
@grep -E -e'Oct/2005' 404.mesmeraj.log > c:\wwwlogs\October\404.mesmeraj.log
@grep -E -e'Nov/2005' 404.mesmeraj.log > c:\wwwlogs\November\404.mesmeraj.log
@grep -E -e'Dec/2005' 404.mesmeraj.log > c:\wwwlogs\December\404.mesmeraj.log
@echo Cleaning Root...
@del 404.linmu.log
@del 404.mesmeraj.log
@del 404.quepid.log
@del lj.quepid.log
@del lj.mesmeraj.log
@del linmu.log
@del mesmeraj.log
@del quepid.log
@del pre404.linmu.log
@del pre404.mesmeraj.log
@del pre404.quepid.log
@echo Finished!
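The twelve near-identical grep lines per log can also be written as a loop. What follows is only a sketch of that idea, not a replacement for the script above. It assumes Cygwin's bash instead of cmd, and it works in a temporary directory with made-up log lines in place of c:\wwwlogs. The function name split_by_month is hypothetical, and unlike the batch file it creates the month folders if they are missing.

```shell
#!/bin/bash
# Sketch: split one log into per-month files, mirroring the grep lines above.
# Each entry pairs the abbreviation used in the log with the folder name.
months="Jan:January Feb:February Mar:March Apr:April May:May Jun:June
Jul:July Aug:August Sep:September Oct:October Nov:November Dec:December"

# split_by_month LOGFILE YEAR OUTROOT  (hypothetical helper)
split_by_month() {
    local log=$1 year=$2 outroot=$3 pair abbr folder
    for pair in $months; do
        abbr=${pair%%:*}      # text before the colon, e.g. Jan
        folder=${pair##*:}    # text after the colon, e.g. January
        mkdir -p "$outroot/$folder"
        # grep exits non-zero when a month has no hits; that is not an error here.
        grep -E -e "$abbr/$year" "$log" > "$outroot/$folder/$(basename "$log")" || true
    done
}

# Made-up sample data standing in for c:\wwwlogs\linmu.log.
root=$(mktemp -d)
cat > "$root/linmu.log" <<'EOF'
10.0.0.1 - - [03/Jan/2005:10:00:00 -0500] "GET /index.html HTTP/1.1" 200 512
10.0.0.2 - - [14/Feb/2005:09:30:00 -0500] "GET /about.html HTTP/1.1" 200 734
10.0.0.3 - - [15/Feb/2005:09:31:00 -0500] "GET /index.html HTTP/1.1" 200 512
EOF

split_by_month "$root/linmu.log" 2005 "$root"
wc -l "$root"/*/linmu.log
```

Passing the year as an argument means the same helper can be reused next year without editing ninety-odd lines.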
–cleanup.bat–
@cd C:\wwwlogs
@echo Cleaning Dated Folders...
@cd C:\wwwlogs\January
@del *.log
@cd ..
@cd C:\wwwlogs\February
@del *.log
@cd ..
@cd C:\wwwlogs\March
@del *.log
@cd ..
@cd C:\wwwlogs\April
@del *.log
@cd ..
@cd C:\wwwlogs\May
@del *.log
@cd ..
@cd C:\wwwlogs\June
@del *.log
@cd ..
@cd C:\wwwlogs\July
@del *.log
@cd ..
@cd C:\wwwlogs\August
@del *.log
@cd ..
@cd C:\wwwlogs\September
@del *.log
@cd ..
@cd C:\wwwlogs\October
@del *.log
@cd ..
@cd C:\wwwlogs\November
@del *.log
@cd ..
@cd C:\wwwlogs\December
@del *.log
@cd ..
@Echo Finished Cleaning Up!
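The cleanup can be looped in the same way. This is a hedged sketch that assumes Cygwin's bash rather than cmd; the function name clean_month_folders is hypothetical, and the demonstration runs against a throwaway directory rather than C:\wwwlogs.

```shell
#!/bin/bash
# Sketch: delete the *.log files in every month folder under a root directory.
# clean_month_folders ROOT  (hypothetical helper)
clean_month_folders() {
    local root=$1 folder
    for folder in January February March April May June \
                  July August September October November December; do
        # -f keeps rm quiet when a folder is missing or already empty.
        rm -f "$root/$folder"/*.log
    done
}

# Demonstration on a throwaway directory instead of C:\wwwlogs.
demo=$(mktemp -d)
mkdir -p "$demo/January" "$demo/February"
touch "$demo/January/quepid.log" "$demo/February/404.linmu.log" "$demo/January/notes.txt"

clean_month_folders "$demo"
ls -R "$demo"
```

Deleting by full path instead of changing directory first has one practical benefit: if a month folder does not exist, nothing gets deleted from whatever directory the script happens to be in.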
*Note: the files mesmeraj.pat, linmu.pat, and quepid.pat contain the patterns you want excluded (stripped) from your logs, one pattern per line. These files must be *NIX-formatted text, i.e. each line ends with a linefeed only, not a carriage return plus linefeed.
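Since the .pat files themselves are not shown, here is a hedged sketch of what one might contain and how to strip the carriage returns from a file saved by a Windows editor. The three patterns, the IP address, the file names and the sample log lines are all invented for illustration. It assumes Cygwin's bash and tr.

```shell
#!/bin/bash
work=$(mktemp -d)

# A made-up pattern file saved with Windows (CRLF) line endings.
printf 'favicon\\.ico\r\nrobots\\.txt\r\n^192\\.0\\.2\\.10 \r\n' > "$work/example_crlf.pat"

# Strip the carriage returns so each line ends with a linefeed only.
tr -d '\r' < "$work/example_crlf.pat" > "$work/example.pat"

# Made-up log lines: the first three should be filtered out.
cat > "$work/accesslog_example.log" <<'EOF'
10.0.0.1 - - [03/Jan/2005:10:00:00 -0500] "GET /favicon.ico HTTP/1.1" 200 318
10.0.0.2 - - [03/Jan/2005:10:01:00 -0500] "GET /robots.txt HTTP/1.1" 200 26
192.0.2.10 - - [03/Jan/2005:10:02:00 -0500] "GET /index.html HTTP/1.1" 200 512
10.0.0.3 - - [03/Jan/2005:10:03:00 -0500] "GET /index.html HTTP/1.1" 200 512
EOF

# Same form as the script's filter step: drop every line matching any pattern.
grep -E -v -f "$work/example.pat" "$work/accesslog_example.log" > "$work/example.log"
cat "$work/example.log"
```

The third pattern shows one way to drop self-browsing: anchor your own IP address to the start of the line, with a trailing space so a longer address sharing the same prefix is not caught as well.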