### **********************************
### THE ULTIMATE NGINX BAD BOT BLOCKER
### **********************************

###################################################
### VERSION INFORMATION #
### Version: V3.2017.06.625
### VERSION INFORMATION ##
###################################################

### This file implements a checklist / blacklist for good user agents, bad user agents and
### bad referrers. It also provides whitelisting for your own IPs and known good IP ranges,
### as well as rate-limiting functionality for bad bots that you only want to rate limit
### and not block out entirely. It is powerful and also flexible.

### Created By: https://github.com/mitchellkrogza/
### Repo Url: https://github.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker
### Copyright: Mitchell Krog
### Tested on: nginx/1.10.3 (Ubuntu 16.04)

### This list was developed and is in use on a live Nginx server running some very busy web sites.
### It was built from the ground up using real data from daily logs and is updated almost daily.
### It has been extensively tested for false positives, and all additions to the lists of bad user agents,
### spam referrers, rogue IP addresses, scanners, scrapers and domain hijacking sites are checked thoroughly
### before they are added. It is monitored extensively for any false positives.

### *********
### Features:
### *********
### Clear formatting for Ease of Maintenance.
### Alphabetically ordered lists for Ease of Maintenance.
### Extensive Commenting for Ease of Reference.
### Extensive bad_bot list.
### Extensive bad_referer list (please excuse the nasty words and domains).
### Simple regex patterns instead of complicated, messy regex patterns.
### Matches http urls, https urls, and referrers sent with no protocol at all.
### IP range blocking / whitelisting.
### Rate Limiting Functions.

### *** PLEASE READ ALL INLINE NOTES ON TESTING !!!! ***

### I have this set up as an include in nginx.conf as
### include /etc/nginx/conf.d/globalblacklist.conf;
### This is loaded and available for any vhost to use in its config.
### Each vhost then just needs the include files mentioned below for it to take effect.
### In most cases your nginx.conf should already have an include statement as follows
### include /etc/nginx/conf.d/*.conf;
### If that is the case then you can ignore the above include statement, as Nginx will
### load anything in the conf.d folder and make it available to all sites.
### All you then need to do is use the include statements below in the server {} block
### of a vhost file for it to take effect, as shown in the sketch after these notes.

# server {
#	#Config stuff here
#	include /etc/nginx/bots.d/blockbots.conf;
#	include /etc/nginx/bots.d/ddos.conf;
#	#Other config stuff here
# }

### Need I say, please don't just copy and paste this without reviewing which bots and
### referrers are being blocked; you may want to exclude certain ones.
### Also make SURE to whitelist your own domain names in the $bad_referer section and
### your own IPs in the geo $validate_client section.
### Know why you are using this, or why you want to use it, before you do - the
### implications are quite severe.

### *** PLEASE READ INLINE NOTES ON TESTING !!!! ***

### Note that:
### 0 = allowed - no limits
### 1 = allowed or rate limited less restrictive
### 2 = rate limited more
### 3 = block completely

### Need I say, do a "sudo nginx -t" to test that the config is okay after adding these
### and, if so, then "sudo service nginx reload" for it to take effect.
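# A slightly fuller illustrative vhost sketch showing where the two includes sit.
# The server_name, root path and location block are placeholders, not part of this
# blocker - adapt them to your own site:
#
# server {
#	listen 80;
#	server_name yourdomain.com www.yourdomain.com;
#	root /var/www/yourdomain.com;
#
#	# Pull in the bot blocker and its DDOS / rate limiting rules
#	include /etc/nginx/bots.d/blockbots.conf;
#	include /etc/nginx/bots.d/ddos.conf;
#
#	location / {
#		try_files $uri $uri/ =404;
#	}
# }
#
# Then test the config and reload:
# sudo nginx -t && sudo service nginx reload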
### *** MAKE SURE TO ADD to your nginx.conf ***
### server_names_hash_bucket_size 64;
### server_names_hash_max_size 4096;
### limit_req_zone $binary_remote_addr zone=flood:50m rate=90r/s;
### limit_conn_zone $binary_remote_addr zone=addr:50m;
### to allow it to load this large set of domains into memory and to set the
### rate limiting zones for the DDOS filter.

### ADDING YOUR OWN BAD REFERERS
### Fork your own local copy and then
### send a Pull Request by following the instructions in the Pull_Requests_Here_Please folder.

# *********************************
# FIRST BLOCK BY USER-AGENT STRINGS
# *********************************

# ***************
# PLEASE TEST !!!
# ***************
# ALWAYS test any User-Agent Strings you add here to make sure you have them right.
# Use a Chrome Extension called "User-Agent Switcher for Chrome" where you can create your
# own custom lists of User-Agents and test them easily against your rules below.
# You can also use curl to test user-agents as per the examples below.
# curl -I http://www.yourdomain.com -A "GoogleBot" <<< 200 OK
# curl -I http://www.yourdomain.com -A "80legs" <<< 444 Dropped Connection
# Here we also allow specific User-Agents through that we want to allow.

# PLEASE NOTE: In all lists below I use Nginx case-insensitive matching ~*
# This means that however you type the word - upper, lower or mixed case - it will
# be detected by the Nginx regex. Some names are capitalised simply for ease of reading.
# It is especially important that both Googlebot and googlebot are allowed through, no?

# Now we map all good and bad user agents to a variable called $bad_bot

map $http_user_agent $bad_bot {
default 0;

# ***********************************************
# Include your Own Custom List of Bad User Agents
# ***********************************************
# Use the include file below to further customize your own list of additional
# user-agents you wish to permanently block (see the illustrative entries after
# the LIMITED BOTS section below).

# START BLACKLISTED USER AGENTS ### DO NOT EDIT THIS LINE AT ALL ###
include /etc/nginx/bots.d/blacklist-user-agents.conf;
# END BLACKLISTED USER AGENTS ### DO NOT EDIT THIS LINE AT ALL ###

# ***********************************************
# Allow Good User-Agent Strings We Know and Trust
# ***********************************************
# START GOOD BOTS ### DO NOT EDIT THIS LINE AT ALL ###
# END GOOD BOTS ### DO NOT EDIT THIS LINE AT ALL ###

# ***************************************************
# User-Agent Strings Allowed Through but Rate Limited
# ***************************************************
# Some people block libwww-perl, but it is widely used in many valid (non rogue) agents.
# I allow libwww-perl, as I use it for monitoring systems with Munin, but it is rate limited.
# START ALLOWED BOTS ### DO NOT EDIT THIS LINE AT ALL ###
# END ALLOWED BOTS ### DO NOT EDIT THIS LINE AT ALL ###

# **************************************************************
# Rate Limited User-Agents who get a bit aggressive on bandwidth
# **************************************************************
# START LIMITED BOTS ### DO NOT EDIT THIS LINE AT ALL ###
# END LIMITED BOTS ### DO NOT EDIT THIS LINE AT ALL ###
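# An illustrative sketch of entries for /etc/nginx/bots.d/blacklist-user-agents.conf;
# the user-agent names here are hypothetical examples, not part of the shipped lists.
# Each entry maps a case-insensitive match to one of the values explained above
# (0 allowed, 1 allowed but limited, 2 rate limited, 3 blocked outright):
#
# "~*SomeAggressiveScraper"	3;	# block completely (served a 444)
# "~*SomeChattyMonitorBot"	2;	# rate limited via the bot2 zones below
# "~*MyOwnInternalTool"		0;	# always allowed, no limits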
# *********************************************
# Bad User-Agent Strings That We Block Outright
# *********************************************
# This includes:
# Known Vulnerability Scanners (now merged into one section)

# START BAD BOTS ### DO NOT EDIT THIS LINE AT ALL ###
# END BAD BOTS ### DO NOT EDIT THIS LINE AT ALL ###

}

# ****************************************
# SECOND BLOCK BY REFERER STRINGS AND URLS
# ****************************************
# Add here all referrer words and URLs that are to be blocked.

# *****************
# PLEASE TEST !!!!
# *****************
# ALWAYS test referers that you add. This is done manually as follows:
# curl -I http://www.yourdomain.com -e http://anything.adcash.com
# curl -I http://www.yourdomain.com -e http://www.goodwebsite.com/not-adcash
# curl -I http://www.yourdomain.com -e http://www.betterwebsite.com/not/adcash
# This uses curl to send the referer string to your site, and you should see an immediate
# 403 Forbidden Error, or no response at all if you use the 444 error like I do.
# Because of case-insensitive matching, any combination of capitalization in the names
# will produce a positive hit - make sure you always test thoroughly and monitor logs.
# This also does NOT check for a preceding www., nor does it check for the name ending
# in .com .net .org or any long string attached at the end. It also does not care if the
# referer was sent with http, https or even ftp.

# REAL WORLD EXAMPLE
# *******************
# If you were a photographer like me and, say, took a photo of a "girl", and you then posted
# a blog showing everyone your new photo, and your blog slug / permalink was
# http://www.mysite.com/blog/photo-of-girl/
# you could go and monitor your logs and you would see lots of 444s from other pages on your
# site that have been clicked on, sending that page as a referer. So the example below
# will generate a 403 error (or a 444 dropped connection):
# curl --referer http://www.mysite.com/blog/photo-of-girl/ http://www.mysite.com/
# So please be careful with these and think carefully before you add new words.
# Remember, we are trying to keep out the general riff-raff, not kill your web sites.

# **********************************************************************
# Now we map all bad referer words below to a variable called $bad_words
# **********************************************************************

map $http_referer $bad_words {
default 0;

# *************************
# Bad Referer Word Scanning
# *************************
# These are words and terms often found tagged onto domains or within URL query strings.
# Create and customize your own bad referrer words here using the new include file method.
# The new method uses the include file below so that, when pulling future updates, your
# customized list of bad referrer words is automatically included for you.
# Read the comments inside bad-referrer-words.conf for customization tips.
# Updating the main globalblacklist.conf file will not touch your custom include files.

# START CUSTOM BAD REFERRER WORDS ### DO NOT EDIT THIS LINE AT ALL ###
include /etc/nginx/bots.d/bad-referrer-words.conf;
# END CUSTOM BAD REFERRER WORDS ### DO NOT EDIT THIS LINE AT ALL ###

}
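# An illustrative sketch of entries for /etc/nginx/bots.d/bad-referrer-words.conf;
# the words are hypothetical examples only. Each maps any referer containing the
# word to 1 (caught by $bad_words above), so keep the words narrow - a broad word
# like "girl" will also match your own legitimate permalinks, as described above:
#
# "~*porn"		1;
# "~*viagra"		1;
# "~*casino-spam"	1;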
# ************************
# Bad Referer Domain Names
# ************************
# Now a list of bad referer URLs. These domains, or any combination of them (ie .com .net),
# will be blocked out. It doesn't matter if the protocol is http, https or even ftp.

# This section includes:
# **********************
# Blocking of SEO company Semalt.com (now merged into this one section)
# MIRAI Botnet Domains Used for Mass Attacks
# Other known bad SEO companies and Ad Hijacking Sites
# Sites linked to malware, adware, clickjacking and ransomware

# *****************
# PLEASE TEST !!!!
# *****************
# ALWAYS test referers that you add. This is done manually as follows:
# curl -I http://www.yourdomain.com -e http://8gold.com
# This uses curl to send the referer string to your site, and you should see an immediate
# 403 Forbidden Error, or no response at all if you use the 444 error like I do.
# Because of case-insensitive matching, any combination of capitalization
# will produce a positive hit - make sure you always test.
# curl -I http://www.yourdomain.com -e http://NOT-8gold.com
# curl -I http://www.yourdomain.com -e http://this.is.not8gOlD.net
# curl -I http://www.yourdomain.com -e ftp://8gold.com
# curl -I http://www.yourdomain.com -e ftp://www.weare8gold.NET
# curl -I http://www.yourdomain.com -e https://subdomain.8gold.com
# curl -I http://www.yourdomain.com -e https://NOT8GolD.org

# This works exactly like the bad referer word lists above and is very strict !!!
# I have gone for the simple, stricter approach, which blocks all variants for those
# who just hop out and buy another domain name.
# So if you see a bad referer from wearegoogle.com and you want to block them, just add
# them as "~*wearegoogle.com". Don't ever go and do something like "~*google(-|.)" - you
# will kill all your SEO in a week. Rather also send a Pull Request by following the
# instructions in the Pull_Requests_Here_Please folder.
# I also include any sites that hotlink images from my sites into the list below.
# There are hundreds of image stealing sites out there, so this list WILL grow, no doubt.

# ************************************************************************
# Now we map all good & bad referer urls to a variable called $bad_referer
# ************************************************************************

map $http_referer $bad_referer {
hostnames;
default 0;

# ************************************
# GOOD REFERERS - Spared from Checking
# ************************************
# Add all your own web site domain names and server names in this section.
# WHITELIST your own domain names here using the include file method.
# The new method uses the include file below so that, when pulling future updates, your
# whitelisted domain names are automatically included for you.
# Read the comments inside whitelist-domains.conf for customization tips
# (illustrative entries follow).
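# An illustrative sketch of entries for /etc/nginx/bots.d/whitelist-domains.conf;
# "myowndomain.com" is a placeholder for your own site. Mapping a hostname to 0
# spares it from the bad referer checks below:
#
# "~*myowndomain.com"	0;
# "~*myotherdomain.net"	0;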
# Updating the main globalblacklist.conf file will not touch your custom include files.

# START WHITELISTED DOMAINS ### DO NOT EDIT THIS LINE AT ALL ###
include /etc/nginx/bots.d/whitelist-domains.conf;
# END WHITELISTED DOMAINS ### DO NOT EDIT THIS LINE AT ALL ###

# *******************************************
# CUSTOM BAD REFERERS - Add your Own
# *******************************************
# Add any extra bad referers in the following include file to have them
# permanently included and blocked - avoid duplicates in your custom file.

# START CUSTOM BAD REFERRERS ### DO NOT EDIT THIS LINE AT ALL ###
include /etc/nginx/bots.d/custom-bad-referrers.conf;
# END CUSTOM BAD REFERRERS ### DO NOT EDIT THIS LINE AT ALL ###

# START BAD REFERERS ### DO NOT EDIT THIS LINE AT ALL ###
# END BAD REFERERS ### DO NOT EDIT THIS LINE AT ALL ###

}

# ***********************************************
# WHITELISTING AND BLACKLISTING IP ADDRESS RANGES
# ***********************************************

# Geo directive to deny and also whitelist certain IP addresses

geo $validate_client {

# ********************
# First Our Safety Net
# ********************
# Anything not matching our rules is allowed through with default 0;
default 0;

# ***********************************
# Whitelist all your OWN IP addresses
# ***********************************
# WHITELIST all your own IP addresses using the include file below.
# The new method uses the include file below so that, when pulling future updates, your
# whitelisted IP addresses are automatically included for you.
# Read the comments inside whitelist-ips.conf for customization tips.
# Updating the main globalblacklist.conf file will not touch your custom include files.

# START WHITELISTED IP RANGES ### DO NOT EDIT THIS LINE AT ALL ###
include /etc/nginx/bots.d/whitelist-ips.conf;
# END WHITELISTED IP RANGES ### DO NOT EDIT THIS LINE AT ALL ###
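# An illustrative sketch of entries for /etc/nginx/bots.d/whitelist-ips.conf;
# the addresses are placeholders from the reserved documentation ranges - use your
# own server and office IPs. Inside a geo block, single IPs and CIDR ranges both
# work, and 0 means "never blocked":
#
# 127.0.0.1		0;	# localhost
# 203.0.113.7		0;	# e.g. your office IP (placeholder, TEST-NET-3 range)
# 198.51.100.0/24	0;	# e.g. your server subnet (placeholder, TEST-NET-2 range)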
# UA "AdsBot-Google (+http://www.google.com/adsbot.html)" # UA "DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.3; +http://www.google.com/bot.html)" # UA "Googlebot-Image/1.0" # UA "Googlebot/2.1 (+http://www.google.com/bot.html)" # UA "Googlebot/2.1 (+http://www.googlebot.com/bot.html)" # UA "Googlebot/Test (+http://www.googlebot.com/bot.html)" # UA "Googlebot/Test" # UA "Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)" # UA "Mediapartners-Google/2.1" # UA "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" # UA "SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)" # UA "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)" # START GOOGLE IP RANGES ### DO NOT EDIT THIS LINE AT ALL ### # END GOOGLE IP RANGES ### DO NOT EDIT THIS LINE AT ALL ### # END GOOGLE IP RANGES ### DO NOT EDIT THIS LINE AT ALL ### # END GOOGLE IP RANGES ### DO NOT EDIT THIS LINE AT ALL ### # ********* # Bing Bots # ********* # START BING IP RANGES ### DO NOT EDIT THIS LINE AT ALL ### # END BING IP RANGES ### DO NOT EDIT THIS LINE AT ALL ### # END BING IP RANGES ### DO NOT EDIT THIS LINE AT ALL ### # END BING IP RANGES ### DO NOT EDIT THIS LINE AT ALL ### # START CLOUDFLARE IP RANGES ### DO NOT EDIT THIS LINE AT ALL ### # END CLOUDFLARE IP RANGES ### DO NOT EDIT THIS LINE AT ALL ### # END CLOUDFLARE IP RANGES ### DO NOT EDIT THIS LINE AT ALL ### # END CLOUDFLARE IP RANGES ### DO NOT EDIT THIS LINE AT ALL ### # ************************* # Wordpress Theme Detectors # ************************* # START WP THEME DETECTORS ### DO NOT EDIT THIS LINE AT ALL ### # END WP THEME DETECTORS ### DO NOT EDIT THIS LINE AT ALL ### # END WP THEME DETECTORS ### DO NOT EDIT THIS LINE AT ALL ### # END WP THEME DETECTORS ### DO NOT EDIT THIS LINE AT ALL ### # **************************************** # NIBBLER - SEO testing and reporting tool # **************************************** # See - http://nibbler.silktide.com/ # START NIBBLER ### DO NOT EDIT THIS LINE AT ALL ### # END NIBBLER ### DO NOT EDIT THIS LINE AT ALL ### # END NIBBLER ### DO NOT EDIT THIS LINE AT ALL ### # END NIBBLER ### DO NOT EDIT THIS LINE AT ALL ### # **************************** # Known Bad IP's and IP Ranges # ************************************************* # Blacklist IP addresses and IP Ranges Customizable # ************************************************* # BLACKLIST all your IP addresses and Ranges using the new include file below. # New Method Uses the include file below so that when pulling future updates your # Custom Blacklisted IP addresses are automatically now included for you. # Read Comments inside blacklist-ips.conf for customization tips. 
# Updating the main globalblacklist.conf file will not touch your custom include files.

# START BLACKLISTED IPS ### DO NOT EDIT THIS LINE AT ALL ###
include /etc/nginx/bots.d/blacklist-ips.conf;
# END BLACKLISTED IPS ### DO NOT EDIT THIS LINE AT ALL ###

}

# Keep own IPs out of DDOS Filter
# Add your own IP addresses and ranges below to spare them from the rate
# limiting DDOS filter (one per line).
# This now automatically includes the whitelist-ips.conf file, so you only
# need to edit that one include file and it is included here for you too.

geo $ratelimited {
default 1;

# START WHITELISTED IP RANGES2 ### DO NOT EDIT THIS LINE AT ALL ###
include /etc/nginx/bots.d/whitelist-ips.conf;
# END WHITELISTED IP RANGES2 ### DO NOT EDIT THIS LINE AT ALL ###

}

# *****************************************
# MAP BAD BOTS TO OUR RATE LIMITER FUNCTION
# *****************************************

map $bad_bot $bot_iplimit {
0 "";
1 "";
2 $binary_remote_addr;
}

# ***********************
# SET RATE LIMITING ZONES
# ***********************

# BAD BOT RATE LIMITING ZONE
# Limits for Zone $bad_bot = 1
# Nothing set - you can set a different zone limiter here if you like.
# We issue a 444 response instead to all bad bots.
# Limits for Zone $bad_bot = 2
# This rate limiting will only take effect if you change any of the bots and change
# their block value from 1 to 2.
limit_conn_zone $bot_iplimit zone=bot2_connlimit:16m;
limit_req_zone $bot_iplimit zone=bot2_reqlimitip:16m rate=2r/s;
# (See the illustrative sketch at the very end of this file for how these zones
# might be applied inside a vhost.)

### *** MAKE SURE TO ADD to your nginx.conf ***
### server_names_hash_bucket_size 64;
### server_names_hash_max_size 4096;
### limit_req_zone $binary_remote_addr zone=flood:50m rate=90r/s;
### limit_conn_zone $binary_remote_addr zone=addr:50m;
### to allow it to load this large set of domains into memory and to set the
### rate limiting zones for the DDOS filter.

### THE END of the Long and Winding Road
### Also check out my Ultimate Apache Bad Bot Blocker on Github
### https://github.com/mitchellkrogza/apache-ultimate-bad-bot-blocker
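# ****************************************************************************
# APPENDIX - an illustrative sketch (NOT part of the shipped include files) of
# how the variables and zones defined above might be applied inside a vhost's
# server {} block. The limit values shown are placeholders; the shipped
# blockbots.conf and ddos.conf includes are the canonical way to wire this up:
# ****************************************************************************
#
# server {
#	#Config stuff here
#
#	# Drop outright-blocked user agents, referrers and blacklisted IPs
#	if ($bad_bot = 3)	{ return 444; }
#	if ($bad_words)		{ return 444; }
#	if ($bad_referer)	{ return 444; }
#	if ($validate_client)	{ return 444; }
#
#	# Apply the bot2 zones to user agents mapped to value 2
#	limit_conn bot2_connlimit 10;
#	limit_req zone=bot2_reqlimitip burst=5;
#
#	#Other config stuff here
# }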