Parsing Access Logs for Bandwidth

The server access logs record a wealth of information, including the size of each asset requested. You can use this to build reports that analyse traffic by domain, by URL, by country and so on.

Some common examples are listed below. They use $(date +%F), which expands to today's date in YYYY-MM-DD format, so each command reads today's logs. Adjust the date to suit.
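To report on a different day, one option is to compute the date into a variable first. This is a minimal sketch assuming GNU date (the -d option is not available in BSD date); the variable names day and pattern are only illustrative.

```shell
# Assumes GNU date; "day" and "pattern" are illustrative names.
day=$(date -d 'yesterday' +%F)    # yesterday's date as YYYY-MM-DD
pattern="haproxy-${day}*.log.gz"  # same file naming as the commands in this article
echo "$pattern"
# Then use it in place of $(date +%F), for example:
#   zcat haproxy-${day}*.log.gz | ...
```

Any date string that GNU date accepts (such as '2 days ago') should work in place of 'yesterday'.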

Traffic By Domain

cd /microcloud/logs_ro/lb1
zcat haproxy-$(date +%F)*.log.gz | ack-grep '/[0-9]+ [0-9]{3} ([0-9]+).+\{(.+)?\|([^\}]+)\}' --output='$3 $1' | tr '[A-Z]' '[a-z]' | sort | awk '{sums[$1] += $2} END { for (i in sums) printf("%s\t%s MB\n", i, sums[i]/1024**2)}' | sort -k2,2 -n | column -ts $'\t'
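The summing stage of the pipeline above can be tried in isolation. The sketch below feeds it made-up "domain bytes" pairs (the shape produced by --output='$3 $1'); the domains and sizes are hypothetical and do not come from real logs.

```shell
# Hypothetical "domain bytes" pairs standing in for the ack-grep output.
sample='example.com 1048576
EXAMPLE.com 2097152
static.example.com 524288'

result=$(printf '%s\n' "$sample" \
  | tr 'A-Z' 'a-z' \
  | awk '{ sums[$1] += $2 }   # total bytes per domain
         END { for (d in sums) printf("%s\t%s MB\n", d, sums[d] / (1024 * 1024)) }' \
  | sort -k2,2 -n)            # ascending by size, largest last
printf '%s\n' "$result"
```

Lowercasing before the awk step is what lets example.com and EXAMPLE.com be counted as one domain.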

Traffic By URL

This gives only the 50 largest URLs by total size (hence the | tail -n50).

cd /microcloud/logs_ro/lb1
zcat haproxy-$(date +%F)*.log.gz | ack-grep '/[0-9]+ [0-9]{3} ([0-9]+).+\{(.+)?\|([^\}]+)\} "[a-zA-Z]{1,5} ([^ ]+)' --output='$3$4 $1' | tr '[A-Z]' '[a-z]' | sort | awk '{sums[$1] += $2} END { for (i in sums) printf("%s\t%s MB\n", i, sums[i]/1024**2)}' | sort -k2,2 -n | tail -n50 | column -ts $'\t'
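Because the sort is ascending, the largest totals end up at the bottom of the output, which is why tail rather than head selects them. A small sketch with made-up values:

```shell
# Made-up "name size" pairs, sorted numerically on the second field.
top=$(printf '%s\n' 'a 5' 'b 50' 'c 20' 'd 1' | sort -k2,2 -n | tail -n2)
printf '%s\n' "$top"   # keeps the two largest entries, biggest last
```

Raise or lower the number passed to tail -n to change how many rows are kept.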

Traffic By File Extension

This gives only the 50 largest file extensions by total size (hence the | tail -n50).

cd /microcloud/logs_ro/lb1
zcat haproxy-$(date +%F)*.log.gz | ack-grep '/[0-9]+ [0-9]{3} ([0-9]+).+\{(.+)?\|([^\}]+)\} "[a-zA-Z]{1,5} /.+?\.([a-zA-Z0-9.]{2,})' --output='$4 $1' | tr '[A-Z]' '[a-z]' | sort | awk '{sums[$1] += $2} END { for (i in sums) printf("%s\t%s MB\n", i, sums[i]/1024**2)}' | sort -k2,2 -n | tail -n50 | column -ts $'\t'

Traffic By Country Code

This gives only the 50 largest country codes by total size (hence the | tail -n50).

cd /microcloud/logs_ro/web1
zcat nginx-access-$(date +%F)*.log.gz | ack-grep '"[A-Z]+ /[^ ]+ HTTP[^"]+" [0-9]+ ([0-9]+).+- ([a-zA-Z0-9]+) - "[^"]+"$' --output='$2 $1' | tr '[a-z]' '[A-Z]' | sort | awk '{sums[$1] += $2} END { for (i in sums) printf("%s\t%s MB\n", i, sums[i]/1024**2)}' | sort -k2,2 -n | tail -n50 | column -ts $'\t'