### **********************************
### THE ULTIMATE NGINX BAD BOT BLOCKER
### **********************************
### VERSION INFORMATION #
###################################################
### Version: V3.2017.06.625
### VERSION INFORMATION ##
### This file implements a checklist / blacklist for good user agents, bad user agents and
### bad referrers. It also has whitelisting for your own IPs and known good IP ranges,
### and rate-limiting functionality for bad bots that you only want to rate limit
### rather than block entirely. It is powerful and also flexible.
### Created By: https://github.com/mitchellkrogza/
### Repo Url: https://github.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker
### Copyright Mitchell Krog - <mitchellkrog@gmail.com>
### Tested on: nginx/1.10.3 (Ubuntu 16.04)
### This list was developed and is in use on a live Nginx server running some very busy web sites.
### It was built from the ground up using real data from daily logs and is updated almost daily.
### It has been extensively tested for false positives, and all additions to the lists of bad user agents,
### spam referers, rogue IP addresses, scanners, scrapers and domain hijacking sites are thoroughly checked
### before they are added. It is monitored continuously for any false positives.
### *********
### Features:
### *********
### Clear formatting for Ease of Maintenance.
### Alphabetically ordered lists for Ease of Maintenance.
### Extensive Commenting for Ease of Reference.
### Extensive bad_bot list
### Extensive bad_referer list (please excuse the nasty words and domains)
### Simple regex patterns versus complicated messy regex patterns.
### Checks referers regardless of http / https or even a missing protocol.
### IP range blocking / whitelisting.
### Rate Limiting Functions.
### *** PLEASE READ ALL INLINE NOTES ON TESTING !!!!
### I have this set up as an include in nginx.conf as
### include /etc/nginx/conf.d/globalblacklist.conf;
### This is loaded and available for any vhost to use in its config.
### Each vhost then just needs the include files mentioned below for it to take effect.
### In most cases your nginx.conf should already have an include statement as follows
### include /etc/nginx/conf.d/*;
### If that is the case then you can ignore the above include statement, as Nginx will
### load anything in the conf.d folder and make it available to all sites.
### All you then need to do is use the include statements below in the server {} block of a vhost file for it to take effect.
# server {
#   # Config stuff here
#   include /etc/nginx/bots.d/blockbots.conf;
#   include /etc/nginx/bots.d/ddos.conf;
#   # Other config stuff here
# }
### Needless to say, please don't just copy and paste this without reviewing which bots and
### referers are being blocked; you may want to exclude some of them.
### Also make SURE to whitelist your own IPs in the geo $validate_client section.
### Know why you are using this, or why you want to use it, before you do; the implications
### are quite severe.
### *** PLEASE READ INLINE NOTES ON TESTING !!!!
### Note that:
### 0 = allowed - no limits
### 1 = allowed or rate limited less restrictive
### 2 = rate limited more
### 3 = block completely
### Remember to run "sudo nginx -t" to test that the config is okay after adding these,
### and if it is, run "sudo service nginx reload" for the changes to take effect.
### *** MAKE SURE TO ADD to your nginx.conf ***
### server_names_hash_bucket_size 64;
### server_names_hash_max_size 4096;
### limit_req_zone $binary_remote_addr zone=flood:50m rate=90r/s;
### limit_conn_zone $binary_remote_addr zone=addr:50m;
### to allow it to load this large set of domains into memory and to set the rate limiting zones for the DDOS filter.
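### For illustration only, a minimal sketch of where those directives would sit in a
### typical nginx.conf (using the values suggested above; adjust to your own needs):
# http {
#   server_names_hash_bucket_size 64;
#   server_names_hash_max_size 4096;
#   limit_req_zone $binary_remote_addr zone=flood:50m rate=90r/s;
#   limit_conn_zone $binary_remote_addr zone=addr:50m;
#   include /etc/nginx/conf.d/*;
# }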
### ADDING YOUR OWN BAD REFERERS
### Fork your own local copy and then
### Send a Pull Request by following the instructions in the Pull_Requests_Here_Please folder.
# *********************************
# FIRST BLOCK BY USER-AGENT STRINGS
# *********************************
# ***************
# PLEASE TEST !!!
# ***************
# ALWAYS test any User-Agent Strings you add here to make sure you have it right
# Use a Chrome Extension called "User-Agent Switcher for Chrome" where you can create your
# own custom lists of User-Agents and test them easily against your rules below.
# You can also use Curl to test user-agents as per example below
# curl -I http://www.yourdomain.com -A "GoogleBot"   >>> 200 OK
# curl -I http://www.yourdomain.com -A "80legs"      >>> 444 Dropped Connection
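# A small shell loop (yourdomain.com is a placeholder) can exercise several
# user-agents in one go; curl prints 000 when the 444 drops the connection:
# for UA in "GoogleBot" "80legs"; do
#   curl -s -o /dev/null -w "%{http_code} $UA\n" -A "$UA" http://www.yourdomain.com
# done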
# Here we also allow specific User Agents to come through that we want to allow
# PLEASE NOTE: In all lists below I use Nginx case-insensitive matching ~*
# This means that however you type the word - upper, lower or mixed case - it will
# be detected by the Nginx regex. Some names are capitalised simply for ease of reading.
# This is especially important so that both Googlebot and googlebot are allowed through.
# Now we map all good and bad user agents to a variable called $bad_bot
map $http_user_agent $bad_bot {
default 0;
# ***********************************************
# Include your Own Custom List of Bad User Agents
# ***********************************************
# use the include file below to further customize your own list of additional
# user-agents you wish to permanently block
# START BLACKLISTED USER AGENTS ### DO NOT EDIT THIS LINE AT ALL ###
include /etc/nginx/bots.d/blacklist-user-agents.conf;
# END BLACKLISTED USER AGENTS ### DO NOT EDIT THIS LINE AT ALL ###
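# For reference, entries in blacklist-user-agents.conf follow the same map syntax
# used throughout this file - a quoted case-insensitive regex and a block value.
# The bot names below are made-up placeholders, not real additions:
# "~*\bSomeBadBot\b" 3;
# "~*\bAnotherScraper\b" 3;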
# ***********************************************
# Allow Good User-Agent Strings We Know and Trust
# ***********************************************
# START GOOD BOTS ### DO NOT EDIT THIS LINE AT ALL ###
# END GOOD BOTS ### DO NOT EDIT THIS LINE AT ALL ###
# **************************************************
# User-Agent Strings Allowed Through but Rate Limited
# **************************************************
# Some people block libwww-perl, but it is widely used by many valid (non-rogue) agents.
# I allow libwww-perl, as I use it for monitoring systems with Munin, but it is rate limited.
# START ALLOWED BOTS ### DO NOT EDIT THIS LINE AT ALL ###
# END ALLOWED BOTS ### DO NOT EDIT THIS LINE AT ALL ###
# **************************************************************
# Rate Limited User-Agents who get a bit aggressive on bandwidth
# **************************************************************
# START LIMITED BOTS ### DO NOT EDIT THIS LINE AT ALL ###
# END LIMITED BOTS ### DO NOT EDIT THIS LINE AT ALL ###
# *********************************************
# Bad User-Agent Strings That We Block Outright
# *********************************************
# This includes:
# Known Vulnerability Scanners (now merged into one section)
# START BAD BOTS ### DO NOT EDIT THIS LINE AT ALL ###
# END BAD BOTS ### DO NOT EDIT THIS LINE AT ALL ###
}
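# For reference only - the blockbots.conf and ddos.conf includes shipped with this
# repo do the actual work. A simplified, hypothetical vhost snippet acting on the
# variables mapped in this file might look like this (values as per the legend above):
# server {
#   # rate limit bots mapped to 2, using the zones defined at the end of this file
#   limit_conn bot2_connlimit 10;
#   limit_req zone=bot2_reqlimitip burst=10;
#   # drop completely blocked bots with Nginx's connection-close response
#   if ($bad_bot = 3) {
#     return 444;
#   }
#   # drop blacklisted referers too ($bad_referer is mapped further down)
#   if ($bad_referer) {
#     return 444;
#   }
# }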
# ****************************************
# SECOND BLOCK BY REFERER STRINGS AND URLS
# ****************************************
# Add here all referrer words and URLs that are to be blocked.
# *****************
# PLEASE TEST !!!!
# *****************
# ALWAYS test referers that you add. This is done manually as follows
# curl -I http://www.yourdomain.com -e http://anything.adcash.com
# curl -I http://www.yourdomain.com -e http://www.goodwebsite.com/not-adcash
# curl -I http://www.yourdomain.com -e http://www.betterwebsite.com/not/adcash
# This uses curl to send the referer string to your site and you should see an immediate
# 403 Forbidden error, or no response at all if you use the 444 response like I do.
# Because of case-insensitive matching, any combination of capitalization in the names
# will produce a positive hit - make sure you always test thoroughly and monitor logs.
# This also does NOT check for a preceding www. nor for the referer ending in .com
# .net .org or any long string attached at the end. It also does not care if the referer
# was sent with http, https or even ftp.
# REAL WORLD EXAMPLE
# *******************
# Say you were a photographer like me, took a photo of a "girl", and posted
# a blog entry showing everyone your new photo, with the blog slug / permalink
# http://www.mysite.com/blog/photo-of-girl/
# If "girl" were in the bad referer words list, then whenever someone clicked from that
# page to another page on your site, that URL would be sent as the referer, and your logs
# would fill up with 444s from your own visitors. The example below would likewise be blocked:
# curl --referer http://www.mysite.com/blog/photo-of-girl/ http://www.mysite.com/
# So please be careful with these and think carefully before you add new words.
# Remember we are trying to keep out the general riff-raff, not kill your web sites.
# **********************************************************************
# Now we map all bad referer words below to a variable called $bad_words
# **********************************************************************
map $http_referer $bad_words {
default 0;
# *************************
# Bad Referer Word Scanning
# *************************
# These are words and terms often found tagged onto domains or within URL query strings.
# Create and Customize Your Own Bad Referrer Words Here using the new Include File Method
# The new method uses the include file below so that, when pulling future updates, your
# customized list of bad referrer words is automatically included for you.
# Read Comments inside bad-referrer-words.conf for customization tips.
# Updating the main globalblacklist.conf file will not touch your custom include files
# START CUSTOM BAD REFERRER WORDS ### DO NOT EDIT THIS LINE AT ALL ###
include /etc/nginx/bots.d/bad-referrer-words.conf;
# END CUSTOM BAD REFERRER WORDS ### DO NOT EDIT THIS LINE AT ALL ###
}
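# For reference, entries in bad-referrer-words.conf are ordinary map entries - a
# quoted case-insensitive regex mapping to 1. The words below are placeholders only:
# "~*somebadword" 1;
# "~*anotherbadword" 1;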
# ************************
# Bad Referer Domain Names
# ************************
# Now a list of bad referer URLs. These domains, or any combination of them (.com, .net, etc.),
# will be blocked out. It doesn't matter if the protocol is http, https or even ftp.
# This section includes:
# **********************
# Blocking of SEO company Semalt.com (now merged into this one section)
# MIRAI Botnet Domains Used for Mass Attacks
# Other known bad SEO companies and Ad Hijacking Sites
# Sites linked to malware, adware, clickjacking and ransomware
# *****************
# PLEASE TEST !!!!
# *****************
# ALWAYS test referers that you add. This is done manually as follows
# curl -I http://www.yourdomain.com -e http://8gold.com
# This uses curl to send the referer string to your site and you should see an immediate
# 403 Forbidden error, or no response at all if you use the 444 response like I do.
# Because of case-insensitive matching, any combination of capitalization
# will produce a positive hit - make sure you always test.
# curl -I http://www.yourdomain.com -e http://NOT-8gold.com
# curl -I http://www.yourdomain.com -e http://this.is.not8gOlD.net
# curl -I http://www.yourdomain.com -e ftp://8gold.com
# curl -I http://www.yourdomain.com -e ftp://www.weare8gold.NET
# curl -I http://www.yourdomain.com -e https://subdomain.8gold.com
# curl -I http://www.yourdomain.com -e https://NOT8GolD.org
# This works exactly like the bad referer word lists above and is very strict !!!
# I have gone for the simple, stricter approach which blocks all variants for those
# who just hop out and buy another domain name.
# So if you see a bad referer from wearegoogle.com and you want to block them, just add
# them as "~*wearegoogle.com". Don't ever go and do something like "~*google(-|.)" - you will
# kill all your SEO in a week. Rather also send a Pull Request by following the instructions
# in the Pull_Requests_Here_Please folder.
# I also include any sites that hotlink images from my sites in the list below.
# There are hundreds of image-stealing sites out there, so this list WILL grow, no doubt.
# ***********************************************************************
# Now we map all good & bad referer URLs to a variable called $bad_referer
# ***********************************************************************
map $http_referer $bad_referer {
hostnames;
default 0;
# ************************************
# GOOD REFERERS - Spared from Checking
# ************************************
# Add all your own web site domain names and server names in this section
# WHITELIST Your Own Domain Names Here using the Include File Method
# The new method uses the include file below so that, when pulling future updates, your
# whitelisted domain names are automatically included for you.
# Read Comments inside whitelist-domains.conf for customization tips.
# Updating the main globalblacklist.conf file will not touch your custom include files
# START WHITELISTED DOMAINS ### DO NOT EDIT THIS LINE AT ALL ###
include /etc/nginx/bots.d/whitelist-domains.conf;
# END WHITELISTED DOMAINS ### DO NOT EDIT THIS LINE AT ALL ###
# *******************************************
# CUSTOM BAD REFERERS - Add your Own
# *******************************************
# Add any extra bad referers in the following include file to have them
# permanently included and blocked - avoid duplicates in your custom file
# START CUSTOM BAD REFERRERS ### DO NOT EDIT THIS LINE AT ALL ###
include /etc/nginx/bots.d/custom-bad-referrers.conf;
# END CUSTOM BAD REFERRERS ### DO NOT EDIT THIS LINE AT ALL ###
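# For reference, both include files above take plain map entries - your own domains
# map to 0 (spared from checking) and custom bad referers map to 1. Hypothetical examples:
# "~*mygoodsite.com" 0;        (goes in whitelist-domains.conf)
# "~*somebadreferer.com" 1;    (goes in custom-bad-referrers.conf)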
# START BAD REFERERS ### DO NOT EDIT THIS LINE AT ALL ###
# END BAD REFERERS ### DO NOT EDIT THIS LINE AT ALL ###
}
# ***********************************************
# WHITELISTING AND BLACKLISTING IP ADDRESS RANGES
# ***********************************************
# Geo directive to deny and also whitelist certain ip addresses
geo $validate_client {
# ********************
# First Our Safety Net
# ********************
# Anything not matching our rules is allowed through with default 0;
default 0;
# ***********************************
# Whitelist all your OWN IP addresses
# ***********************************
# WHITELIST all your own IP addresses using the include file below.
# The new method uses the include file below so that, when pulling future updates, your
# whitelisted IP addresses are automatically included for you.
# Read Comments inside whitelist-ips.conf for customization tips.
# Updating the main globalblacklist.conf file will not touch your custom include files
# START WHITELISTED IP RANGES ### DO NOT EDIT THIS LINE AT ALL ###
include /etc/nginx/bots.d/whitelist-ips.conf;
# END WHITELISTED IP RANGES ### DO NOT EDIT THIS LINE AT ALL ###
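# For reference, entries in whitelist-ips.conf are ordinary geo entries - an IP
# address or CIDR range followed by 0 (allowed). Hypothetical examples:
# 127.0.0.1 0;
# 192.168.0.0/24 0;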
# ***********
# Google Bots
# ***********
# For safety's sake, Google's known bot IP ranges are all whitelisted in case you add
# anything lower down that you mistakenly picked up as a bad bot.
# UA "AdsBot-Google (+http://www.google.com/adsbot.html)"
# UA "DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.3; +http://www.google.com/bot.html)"
# UA "Googlebot-Image/1.0"
# UA "Googlebot/2.1 (+http://www.google.com/bot.html)"
# UA "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
# UA "Googlebot/Test (+http://www.googlebot.com/bot.html)"
# UA "Googlebot/Test"
# UA "Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)"
# UA "Mediapartners-Google/2.1"
# UA "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# UA "SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"
# UA "gsa-crawler (Enterprise; S4-E9LJ2B82FJJAA; me@mycompany.com)"
# START GOOGLE IP RANGES ### DO NOT EDIT THIS LINE AT ALL ###
# END GOOGLE IP RANGES ### DO NOT EDIT THIS LINE AT ALL ###
# *********
# Bing Bots
# *********
# START BING IP RANGES ### DO NOT EDIT THIS LINE AT ALL ###
# END BING IP RANGES ### DO NOT EDIT THIS LINE AT ALL ###
# START CLOUDFLARE IP RANGES ### DO NOT EDIT THIS LINE AT ALL ###
# END CLOUDFLARE IP RANGES ### DO NOT EDIT THIS LINE AT ALL ###
# *************************
# Wordpress Theme Detectors
# *************************
# START WP THEME DETECTORS ### DO NOT EDIT THIS LINE AT ALL ###
# END WP THEME DETECTORS ### DO NOT EDIT THIS LINE AT ALL ###
# ****************************************
# NIBBLER - SEO testing and reporting tool
# ****************************************
# See - http://nibbler.silktide.com/
# START NIBBLER ### DO NOT EDIT THIS LINE AT ALL ###
# END NIBBLER ### DO NOT EDIT THIS LINE AT ALL ###
# ****************************
# Known Bad IPs and IP Ranges
# *************************************************
# Blacklist IP addresses and IP Ranges Customizable
# *************************************************
# BLACKLIST your own list of bad IP addresses and ranges using the new include file below.
# The new method uses the include file below so that, when pulling future updates, your
# custom blacklisted IP addresses are automatically included for you.
# Read Comments inside blacklist-ips.conf for customization tips.
# Updating the main globalblacklist.conf file will not touch your custom include files
# START BLACKLISTED IPS ### DO NOT EDIT THIS LINE AT ALL ###
include /etc/nginx/bots.d/blacklist-ips.conf;
# END BLACKLISTED IPS ### DO NOT EDIT THIS LINE AT ALL ###
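# For reference, entries in blacklist-ips.conf use the same geo syntax but with a
# non-zero value to flag the address as bad. Hypothetical examples (TEST-NET ranges):
# 203.0.113.15 1;
# 198.51.100.0/24 1;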
}
# Keep own IPs out of DDOS Filter
# Add your own IP addresses and ranges below to spare them from the rate-limiting
# DDOS filter (one per line).
# This now automatically includes the whitelist-ips.conf file, so you only need to
# edit that one include file and it takes effect here too.
geo $ratelimited {
default 1;
# START WHITELISTED IP RANGES2 ### DO NOT EDIT THIS LINE AT ALL ###
include /etc/nginx/bots.d/whitelist-ips.conf;
# END WHITELISTED IP RANGES2 ### DO NOT EDIT THIS LINE AT ALL ###
}
# *****************************************
# MAP BAD BOTS TO OUR RATE LIMITER FUNCTION
# *****************************************
map $bad_bot $bot_iplimit {
0 "";
1 "";
2 $binary_remote_addr;
}
# ***********************
# SET RATE LIMITING ZONES
# ***********************
# BAD BOT RATE LIMITING ZONE
# limits for Zone $bad_bot = 1
# Nothing Set - you can set a different zone limiter here if you like
# We issue a 444 response instead to all bad bots.
# limits for Zone $bad_bot = 2
# this rate limiting will only take effect if you edit any of the bots and change
# their block value from 1 to 2.
limit_conn_zone $bot_iplimit zone=bot2_connlimit:16m;
limit_req_zone $bot_iplimit zone=bot2_reqlimitip:16m rate=2r/s;
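# Note that $bot_iplimit is an empty string for anything not mapped to 2, and Nginx
# does not account requests whose zone key is empty - so these zones only ever bite
# rate-limited bots. Applying them in a vhost looks like the hypothetical sketch
# given after the user-agent map above:
# limit_conn bot2_connlimit 10;
# limit_req zone=bot2_reqlimitip burst=5;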
### *** MAKE SURE TO ADD to your nginx.conf ***
### server_names_hash_bucket_size 64;
### server_names_hash_max_size 4096;
### limit_req_zone $binary_remote_addr zone=flood:50m rate=90r/s;
### limit_conn_zone $binary_remote_addr zone=addr:50m;
### to allow it to load this large set of domains into memory and to set the rate limiting zones for the DDOS filter.
### THE END of the Long and Winding Road
### Also check out my Ultimate Apache Bad Bot Blocker on Github
### https://github.com/mitchellkrogza/apache-ultimate-bad-bot-blocker