Security

Block by User Agent

4th September 2014 | June 9th, 2017

Sometimes your site may fall victim to an overly aggressive or problematic crawl bot. Blocking such bots from your server is straightforward: it takes a simple edit to your domain's ___general/example.com.conf file.

E.g. to block the Yandex crawl bot:

# ~* is a case-insensitive regex match on the User-Agent header
if ($http_user_agent ~* "YandexBot") {
  return 403;
}
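If more than one bot is causing trouble, the same check can cover several user agents with a regex alternation. This is a minimal sketch; the extra bot names (AhrefsBot and MJ12bot) are only placeholders for whichever agents show up in your own access logs:

```nginx
# Case-insensitive match against any of the listed user agent substrings.
# AhrefsBot and MJ12bot are illustrative; substitute the bots you want to block.
if ($http_user_agent ~* "(YandexBot|AhrefsBot|MJ12bot)") {
  return 403;
}
```

Keeping everything in one pattern means the rule stays in the same file and context as the single-bot version.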

Alternatively, if you want to show a discreet message rather than an outright block (perhaps so that humans can contact you if they were blocked in error), a rewrite would be more suitable.

E.g. to serve a static HTML page for all requests (an internal rewrite, not an HTTP redirect):

# Internally rewrite every request from this agent to the static page
if ($http_user_agent ~* "YandexBot") {
  rewrite .* /no-crawl-bots.html last;
}

Then just create a normal HTML file at /no-crawl-bots.html (relative to your site's document root) with whatever message you would like to pass to the affected user agents.
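One caveat worth checking: the rewrite rule behaves as described when the if block sits directly at server level. If your setup places it inside a location block instead (an assumption, since this depends on how the domain file is included), the last flag starts a new location search and the same rule can match again, which nginx eventually aborts with a 500 error. One way to avoid that is an exact-match location for the page, so the rewritten request no longer passes through the block containing the if. The layout below is a hypothetical sketch:

```nginx
# Hypothetical layout: merge the if into your existing "location /" block
# rather than declaring a second one.
location / {
  if ($http_user_agent ~* "YandexBot") {
    rewrite .* /no-crawl-bots.html last;
  }
}

# An exact match wins the new location search after the rewrite,
# so the rule above is not evaluated a second time.
location = /no-crawl-bots.html {
}
```

The empty exact-match block simply serves the file using the settings inherited from the enclosing server block.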