Identifying and blocking "Bad" Magento traffic

There are a number of common attacks on Magento stores, ranging from aggressive crawlers/bots, to XSS attacks, to severe SQL injection/compromise attacks. There are two approaches to dealing with attacks of this nature:

  1. Automated - Through the use of a WAF
  2. Manual - By traversing log files for patterns and blocking those patterns/sources as necessary (a quick example follows below)
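
A useful first step with the manual approach is to summarise a day's logs by the busiest clients before grepping for specific patterns. A minimal sketch, assuming gzipped nginx access logs in the format shown in the samples below (client IP in the 8th whitespace-separated field - adjust the field number and filenames to suit your environment):

zcat nginx-access-2015-04-09-*.log.gz | awk '{print $8}' | sort | uniq -c | sort -rn | head -n 20

This prints the 20 most frequent client IPs with their request counts, which quickly surfaces aggressive crawlers and other sources worth investigating further.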

Below is a list of common attacks and how to identify them by reviewing your log files. This list is by no means definitive, but it is a good starting point for investigating and understanding who is trying to access your Magento store.

In almost all cases, a proper WAF (web application firewall) is a suitable first line of defence, whereas the suggestions made below are geared towards those without a WAF (or the expertise to implement one). MageStack comes as standard with a 3-tier firewall, comprising a stateless edge firewall, an IPS/IDS L3 firewall and an intelligent learning L7 WAF.

Magento Connect Config Flush Attack

Issue: Accessing the downloader clears the configuration cache
Releases: Older releases
Identification pattern: grep -i "/downloader/index.php?A=" LOG_FILE

nginx-access-2015-04-09-1426086002.log.gz:2015-04-09 00:03:21 UTC example.com - www.example.com - 93.115.92.169 - - [09/Apr/2015:01:03:21 +0100]  "POST /downloader/index.php?A=loggedin HTTP/1.1" 302 5 "https://www.example.com/downloader/index.php?A=loggedin" "\x22Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0\x22" - Dynamic - Admin - 0.291 - A1 - "-|-|-"

Resolution: Implement web server rules to provide protection for the downloader

See Protecting Magento admin and downloader
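
As a rough illustration of such a rule (the linked article covers this in full), an nginx location block restricting the downloader to trusted addresses might look like the following sketch; the allowed IP is a placeholder, and any existing PHP handling for this path would still need to apply within the block:

# Restrict the Magento Connect downloader to trusted IPs
location ^~ /downloader {
  allow 203.0.113.10;  # placeholder - replace with your own office/VPN IP
  deny all;
}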

SQL Injection

Issue: SQL injection can lead to total site compromise by changing the admin password, which can then lead to file/module upload and remote code execution
Releases: Older releases, vulnerable Magento extensions, other vulnerable applications
Identification pattern: grep -Ei "\?.+((SELECT|UPDATE|DELETE).*(FROM|JOIN)|DECLARE.+@|UNION.+SELECT|GROUP_CONCAT|INFORMATION_SCHEMA)" LOG_FILE

nginx-access-2015-03-29.log.gz:2015-03-29 09:31:20 UTC example.com - www.example.com - 194.6.233.33 - - [29/Mar/2015:10:31:20 +0100]  "GET /index.php?option=com_aist&view=vacancylist&contact_id=1+union+select+1,2,3,4,group_concat(username,0x3a,password,0x3a,usertype),6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36+from+jos_users-- HTTP/1.1" 200 6218 "-" "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/23.0.1271.6 Safari/537.11" - Dynamic - Frontend - 0.273 - UA - "-|-|-"

Resolution: Implement a web server rule to block/discard bad traffic.

# Discard any request whose query string matches common SQL injection signatures
if ($args ~* "((SELECT|UPDATE|DELETE).*(FROM|JOIN)|DECLARE.+@|UNION.+SELECT|GROUP_CONCAT|INFORMATION_SCHEMA)") {
  return 403;
}
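
Once deployed, the rule can be sanity checked with a request mimicking the pattern above (a hypothetical URL, assuming the rule is live on www.example.com); a 403 response confirms the block:

curl -s -o /dev/null -w "%{http_code}\n" "https://www.example.com/index.php?id=1+UNION+SELECT+1,2,3"

It is worth spot-checking genuine URLs too - a pattern this broad can occasionally catch legitimate query strings (e.g. a search term containing "union select").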

Bad Bots/Crawlers/Scrapers

Issue: Bots/crawlers can consume valuable server resources, leading to reduced performance and possible instability. Genuine crawl bots (such as Google/Bing/Yahoo) should be allowed unrestricted access, but other information-gathering bots provide less value. There are a myriad of these, identifiable by IP address, user agent or behaviour (i.e. request rate/type).
Releases: All releases
Identification pattern: grep -Ei "TwengaBot|Mail.RU_Bot|Sogou web spider|360Spider|RavenCrawler|Baiduspider|NostoCrawlerBot|MJ12bot|Majestic|Yandex|AhrefsBot|PaperLiBot" LOG_FILE

web1/nginx-access-2015-01-15.log.gz:2015-01-15 00:11:57 UTC example.com - www.example.com - 220.181.125.197 - - [15/Jan/2015:00:11:57 +0000]  "GET /index.php HTTP/1.1" 200 10018 "-" "Sogou web spider/4.0(+https://www.sogou.com/docs/help/webmasters.htm#07)" - Dynamic - Frontend - 1.592 - CN - "-|-|-"

Resolution: Implement a web server rule to block/discard or rate limit bad traffic.

See Rate limiting custom requests
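
For outright blocking rather than rate limiting, a minimal nginx sketch along the following lines is one approach; the bot list is illustrative (taken from the identification pattern above), the map block belongs in the http context and the if in the relevant server/location block:

# Flag known bad bots by user agent (http context)
map $http_user_agent $bad_bot {
  default 0;
  "~*(TwengaBot|MJ12bot|AhrefsBot|Baiduspider|Sogou web spider)" 1;
}

# Discard flagged requests (server/location context)
if ($bad_bot) {
  return 403;
}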