How-To: Prevent Spam with Apache’s ModSecurity
WordPress is a great piece of software for running a blog: it is flexible, has tons of plugins developed for it, and updates are really easy to do. To fight spam comments, there is already the Akismet plugin, which does a really good job.
While Akismet catches the spam comments and puts them in a separate location, making them easy to delete, WordPress can take a long time to purge them as the amount of spam grows, and the best option becomes a manual SQL query to flush them.
In this article, we will see how to use an RBL to prevent spammers from posting to WordPress’s comment page and, at the same time, lift a bit of load from the server.
While the rules are written for WordPress, with a few modifications it should be easy to get this setup working for any kind of blog or website.
This setup was tested on Debian Wheezy. It is assumed that you already have a working Apache server and a working instance of WordPress running.
In a nutshell, we are going to use ModSecurity to inspect web requests and check against some RBLs whether the IP that attempts to post a comment is a known spammer. If it is, we will deny access to the page, avoiding any call to the Akismet service or the database.
An RBL works by doing a DNS lookup for a special hostname; depending on the returned address, we know whether the IP is blacklisted or not. It is lightweight, and subsequent calls will be faster as the DNS entry will be cached.
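To make the "special hostname" concrete, here is a minimal sketch of how a DNSBL query name is conventionally built for an IPv4 address: the four octets are reversed and the list's zone is appended. The helper name rbl_query_name is hypothetical and only for illustration; ModSecurity does this internally when you use @rbl.

```shell
# Hypothetical helper (not part of ModSecurity): build the DNSBL query name
# for an IPv4 address by reversing its octets and appending the list's zone.
rbl_query_name() {
    ip=$1
    zone=$2
    # Split the dotted quad on "." into the positional parameters $1..$4.
    old_ifs=$IFS
    IFS=.
    set -- $ip
    IFS=$old_ifs
    printf '%s.%s.%s.%s.%s\n' "$4" "$3" "$2" "$1" "$zone"
}

rbl_query_name 192.0.2.1 zen.spamhaus.org
# prints 1.2.0.192.zen.spamhaus.org
```

Feeding that name to a resolver (for example with dig +short) should return an address in 127.0.0.0/8 when the IP is listed and no answer when it is not. By convention, 127.0.0.2 is a test address that DNSBLs keep listed, which is handy for checking that lookups work from your server.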
Installation
To use the RBL functionality in ModSecurity, we need at least version 2.7. Debian Wheezy ships version 2.6.6 but, luckily, the Debian backports repository has version 2.8.0. So the first step is to enable Debian backports:
echo "deb http://http.debian.net/debian wheezy-backports main" > /etc/apt/sources.list.d/wheezy-backports.list
apt-get update
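And then install libapache2-mod-security2 from backports and enable the module. The -t option is needed because apt does not pick backports packages by default:
apt-get -t wheezy-backports install libapache2-mod-security2
a2enmod security2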
Configuration
Now, edit your virtual host and add the following inside the VirtualHost block:
<IfModule security2_module>
SecRuleEngine On
SecRequestBodyAccess Off
SecRule REQUEST_METHOD "POST" "id:'400010',chain,drop,log,msg:'Spam host detected by zen.spamhaus.org'"
SecRule REQUEST_URI "\/wp-comments-post\.php" chain
SecRule REMOTE_ADDR "@rbl zen.spamhaus.org"
SecRule REQUEST_METHOD "POST" "id:'400011',chain,drop,log,msg:'Spam host detected by netblockbl.spamgrouper.com'"
SecRule REQUEST_URI "\/wp-comments-post\.php" chain
SecRule REMOTE_ADDR "@rbl netblockbl.spamgrouper.com"
</IfModule>
And finally, check that the config is correct and, if it is, reload Apache:
apache2ctl -t
/etc/init.d/apache2 reload
What this does is enable the ModSecurity engine (SecRuleEngine On), tell it to look only at the HTTP headers (SecRequestBodyAccess Off), and finally set up the rules. We have two similar rules checking two different DNSBLs. If the first one does not match the IP, the second one may. I have found zen.spamhaus.org to catch most spammers, and netblockbl.spamgrouper.com to catch some of the ones that got through.
Each check is a chain of three rules, and if all three match, the actions from the first rule are taken. As a side note, when chaining rules, the disruptive action (drop here) must appear in the first rule.
So, our first rule says that we want to match the HTTP method POST. We chain this rule, and if all chained rules match, we drop the request and log it with a message along the lines of: Spam host detected by _dnsbl_provider_. The next rule checks whether the URI contains /wp-comments-post.php, the page used to post comments, and is chained as well. Finally, the third rule checks whether the remote IP is known to the DNSBL.
If all 3 conditions are met, the request is dropped and does not even reach WordPress.
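The chain logic can be modelled in a few lines of shell. This is only an illustrative sketch, not how ModSecurity is implemented: chain_verdict and is_listed are hypothetical names, and is_listed fakes the DNSBL answer with one hard-coded documentation address. The point it shows is that the DNS lookup sits last in the chain, so it is only reached for POSTs to the comment page.

```shell
# Hypothetical stand-in for the @rbl lookup: pretend one IP is blacklisted.
is_listed() {
    [ "$1" = "203.0.113.7" ]
}

# Hypothetical model of one three-rule chain: prints "drop" or "pass".
chain_verdict() {
    method=$1
    uri=$2
    ip=$3
    # Rule 1: only POST requests are of interest.
    if [ "$method" != "POST" ]; then
        echo pass
        return 0
    fi
    # Rule 2: unanchored match on the comment page, like the regex in the rule.
    case $uri in
        */wp-comments-post.php*) ;;
        *) echo pass; return 0 ;;
    esac
    # Rule 3: the DNSBL check, reached only when rules 1 and 2 matched.
    if is_listed "$ip"; then
        echo drop
    else
        echo pass
    fi
}

chain_verdict POST /wp-comments-post.php 203.0.113.7   # prints drop
chain_verdict GET /wp-comments-post.php 203.0.113.7    # prints pass
```

Ordinary page views never trigger a lookup in this model, which matches the intent of putting the cheap method and URI checks before the RBL check.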
Another good DNSBL, which caught IPs that made it through the first two, is Project Honeypot. You need to sign up and get your BL API key; once you have it, set it with the SecHttpBlKey directive and add the rule to the list as follows:
SecHttpBlKey my_key_here
SecRule REQUEST_METHOD "POST" "id:'400012',chain,drop,log,msg:'Spam host detected by dnsbl.httpbl.org'"
SecRule REQUEST_URI "\/wp-comments-post\.php" chain
SecRule REMOTE_ADDR "@rbl dnsbl.httpbl.org"
Avoiding too many lookups
We can improve the logic above by blacklisting, from within ModSecurity and for some time, any IP that matched against a DNSBL. This way, when we see an IP that we already know is a spammer, we avoid querying the DNS for no reason.
What we will do here is blacklist an IP for 7 days once it has matched against a DNSBL. We will only do that for POSTs to the comment page (but we could be more aggressive and blacklist it for all pages).
Change the rules to look like:
<IfModule security2_module>
SecRuleEngine On
SecRequestBodyAccess Off
SecAction "id:400000,phase:1,initcol:IP=%{REMOTE_ADDR},pass,nolog"
SecRule IP:spam "@gt 0" "id:400001,phase:1,chain,drop,msg:'Spam host %{REMOTE_ADDR} already blacklisted'"
SecRule REQUEST_METHOD "POST" chain
SecRule REQUEST_URI "\/wp-comments-post\.php"
SecRule REQUEST_METHOD "POST" "id:'400010',chain,drop,log,msg:'Spam host detected by zen.spamhaus.org'"
SecRule REQUEST_URI "\/wp-comments-post\.php" chain
SecRule REMOTE_ADDR "@rbl zen.spamhaus.org" "setvar:IP.spam=1,expirevar:IP.spam=604800"
SecRule REQUEST_METHOD "POST" "id:'400011',chain,drop,log,msg:'Spam host detected by netblockbl.spamgrouper.com'"
SecRule REQUEST_URI "\/wp-comments-post\.php" chain
SecRule REMOTE_ADDR "@rbl netblockbl.spamgrouper.com" "setvar:IP.spam=1,expirevar:IP.spam=604800"
SecHttpBlKey my_key_here
SecRule REQUEST_METHOD "POST" "id:'400012',chain,drop,log,msg:'Spam host detected by dnsbl.httpbl.org'"
SecRule REQUEST_URI "\/wp-comments-post\.php" chain
SecRule REMOTE_ADDR "@rbl dnsbl.httpbl.org" "setvar:IP.spam=1,expirevar:IP.spam=604800"
</IfModule>
What is new? First, we initialise a collection called IP, keyed on the remote address. Then, if the value of IP.spam is greater than 0 and the chained rules also match, we drop the request and log the message ‘Spam host %{REMOTE_ADDR} already blacklisted’. The chained rules check that the request is a POST to the comment page.
Now, when a DNSBL flags an IP as a spammer, we set the variable IP.spam to 1 and make that variable expire after 7 days (604800 seconds).
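If you want a different blacklist duration, the value given to expirevar is simply the number of days converted to seconds. A tiny sketch, where days_to_seconds is a hypothetical helper name:

```shell
# Hypothetical helper: convert a number of days into seconds for expirevar.
days_to_seconds() {
    echo $(( $1 * 24 * 60 * 60 ))
}

days_to_seconds 7
# prints 604800
```

For example, a one-day blacklist would use 86400 in place of 604800 in each expirevar action.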
Conclusion
Using this technique, I have brought the number of comments marked as spam by Akismet down by a factor of 50. I can now let spam pile up in the spam comment list, and WordPress can still delete it quickly when I click Empty Spam.
Not only does it make the experience more enjoyable, but by tackling spam earlier, you save processing resources and avoid useless writes to the database and queries to Akismet. You also save some bandwidth, as a dropped connection is far lighter than a rendered blog page.
By keeping track of known spamming IPs, we can also skip repeated DNS queries.