Spam And The St. John’s Server

23 December 2010

I am the IT Department for St. John’s Episcopal School and Church, and as such I have to keep the network, servers, and workstations running. One of the biggest problems I have had in the past couple of years is spam. Not the canned variety, but email spam.

Spam has been a running problem for a number of years, but it was generally manageable. Starting around the beginning of 2009, the volume and virulence of the spam got much worse. I was getting a lot of complaints from my users, especially the science teacher in the building, who was very persistent in her complaints (I usually heard about it at night, in bed. About the SPAM, that is!).

I started looking for solutions, and after a while settled on SpamAssassin. It had good reviews and, being open source, cost a not-for-profit nothing. I installed it toward the end of the school year (to minimize the impact on users if something went wrong), and for a couple of weeks I spent some time each evening running the spam my users found back through SpamAssassin so it could learn from it. After a while, the spam load started trending down (see the chart below).
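The post doesn’t say which command was used to feed missed spam back in. SpamAssassin’s Bayesian training tool is `sa-learn`, so here is a minimal Python sketch, assuming (hypothetically) that each batch of missed spam was saved as an mbox file. The script name and paths are illustrative only.

```python
import shutil
import subprocess
import sys


def sa_learn_command(mbox_path: str) -> list[str]:
    """Build an sa-learn command that trains the Bayes filter on an mbox of missed spam."""
    # --spam marks every message in the file as spam; --mbox says the file is in mbox format.
    return ["sa-learn", "--spam", "--mbox", mbox_path]


def train_on_missed_spam(mbox_path: str) -> int:
    """Run sa-learn on one file; return its exit code, or 127 if sa-learn is not installed."""
    if shutil.which("sa-learn") is None:
        print("sa-learn not found on PATH", file=sys.stderr)
        return 127
    result = subprocess.run(sa_learn_command(mbox_path))
    return result.returncode


if __name__ == "__main__":
    # Hypothetical usage: python train_spam.py /var/spool/spamfolders/teacher.mbox
    for path in sys.argv[1:]:
        train_on_missed_spam(path)
```

Running it from cron each evening would automate the manual step, at the cost of trusting that the mbox really contains only spam.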

I don’t know exactly why this happened. I had no way to collect metrics on incoming spam automagically, but my gut told me it was in the 150-200 per day range, which is about what the chart shows. My guess is that once I started rejecting spam automatically at the server, the spam could no longer call home: a lot of it embedded single-pixel images loaded from a remote server, intended as a return receipt from our IP address.

Whatever the reason, spam dropped a lot at the user level. I rarely got complaints about it, and users deleted by hand whatever did get through. For a couple of months, I put in a couple of hours every couple of days running the spam through manual analysis and tightening the rejection criteria.

This happy situation persisted for more than a year. Daily spam stayed at 100 messages or fewer, and the users rarely saw any.

Then, over the summer of 2010, my resident science teacher started getting a LOT of spam. You can see that spike in the chart, up in the 800-1000 per day range. It dropped off a little, but she was still getting a significant amount of spam that SpamAssassin was not catching.

My analysis showed that most of it was arriving as if it had been sent from our own domain. I had whitelisted every school- and church-related email address I could collect, including the teachers’, of course, so SpamAssassin treated these forged messages as legitimate. I did some more tuning of the filter criteria, but we were clearly in a world of hurt.

I was reading one of my daily admin/security emails when the spam-blocking service Spamhaus was mentioned, so I went and checked it out. Spamhaus maintains lists of known spammers’ domains and IP addresses. They feed these lists to you, and institutions like schools and churches could use them free of charge if they stayed below a certain volume of traffic.
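A DNS blocklist like Spamhaus Zen is queried over ordinary DNS: the mail server reverses the octets of the connecting IPv4 address, appends the list’s zone, and looks the name up. An answer in 127.0.0.x means the address is listed; a “no such name” answer means it isn’t. sendmail’s `dnsbl` feature does this for each connecting client. The Python sketch below shows the same lookup; the helper names are hypothetical, and live queries are subject to Spamhaus’s usage terms.

```python
import ipaddress
import socket
import sys


def dnsbl_query_name(ip: str, zone: str = "zen.spamhaus.org") -> str:
    """Build the DNSBL lookup name: 192.0.2.1 -> 1.2.0.192.zen.spamhaus.org."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for anything that is not an IPv4 address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = "zen.spamhaus.org") -> bool:
    """Return True if the blocklist answers with a 127.0.0.x address for this IP."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # the name does not exist: not listed
    # Only 127.0.0.x answers are treated as listings; any other answer is ignored here.
    return answer.startswith("127.0.0.")


if __name__ == "__main__":
    # 127.0.0.2 is the conventional always-listed test entry for DNS blocklists.
    for candidate in sys.argv[1:] or ["127.0.0.2"]:
        print(candidate, "listed" if is_listed(candidate) else "not listed")
```

Because the check happens when the client connects, a listed sender is turned away before it can hand over a message.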

I went back and did some grepping in my mail logs, and found that my connection volume was on the order of 1% of Spamhaus’s free-use limit. The reviews of the service were good, too.

So one evening in early November, I made the changes to my sendmail .mc file, rebuilt the configuration, and sent myself a batch of emails to make sure I had not broken the server. While doing this, I checked the log file to make sure the messages had come in OK, and there was… a log entry from the Spamhaus service. It seems it had already zorched an attempt to connect and deliver some spam.

Well, that was pretty cool, I thought. I headed home and checked again when I got there. The number of blocks was already up to about 50! I asked people in the school and church to tell me if they got spam or, more importantly, if they did not receive an email that someone had sent them.

As the weeks went by, the number of blocks steadily increased, so I decided to do a bit of analysis. First, I wrote a script that traversed a set of files I had set up for each user to dump spam into. I had to tweak the grep commands a couple of times, but ended up with a file that had one line for each spam, including its date.
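The original scripts used grep. As a hedged Python equivalent, this sketch assumes each user’s spam file is a standard mbox file (the `spam/<user>.mbox` layout is hypothetical). It prints one tab-separated line per message, with the date taken from the `Date:` header.

```python
import mailbox
import sys
from email.utils import parsedate_to_datetime
from pathlib import Path


def spam_dates(mbox_path: Path) -> list[str]:
    """Return one ISO date (YYYY-MM-DD) per message in an mbox file, skipping unparseable dates."""
    dates = []
    for message in mailbox.mbox(str(mbox_path)):
        raw = message.get("Date")
        if raw is None:
            continue
        try:
            dates.append(parsedate_to_datetime(raw).date().isoformat())
        except (TypeError, ValueError):
            continue  # spam often carries malformed Date headers
    return dates


def main(folder: Path) -> None:
    # Hypothetical layout: one mbox file per user, e.g. spam/teacher.mbox
    for mbox_path in sorted(folder.glob("*.mbox")):
        for day in spam_dates(mbox_path):
            print(f"{day}\t{mbox_path.stem}")


if __name__ == "__main__":
    main(Path(sys.argv[1]))
```

Using the `mailbox` module instead of grep avoids miscounting when a message body happens to contain a line that looks like a header.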

I then wrote another script that pulled the same data out of my maillog file; that was easy. So now I had about 10MB of data.
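For the maillog side, two details matter: classic syslog timestamps (`Dec 23 10:15:01`) omit the year, and a blocklist rejection shows up as a sendmail line containing `reject=`. This sketch assumes, hypothetically, that each Spamhaus rejection line contains both `reject=` and the zone name `zen.spamhaus.org`, so check the exact wording in your own logs before relying on the filter.

```python
import re
import sys
from collections import Counter
from datetime import datetime

# Assumption: every Spamhaus rejection line contains both of these strings.
ZONE = "zen.spamhaus.org"
REJECT_MARKER = "reject="
# Classic syslog timestamp at the start of each line, e.g. "Dec 23 10:15:01" or "Dec  3 09:00:00".
SYSLOG_PREFIX = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s")


def spamhaus_reject_dates(lines, year: int) -> list[str]:
    """Return the ISO date (YYYY-MM-DD) of every log line that records a Spamhaus rejection."""
    dates = []
    for line in lines:
        if ZONE not in line or REJECT_MARKER not in line:
            continue
        match = SYSLOG_PREFIX.match(line)
        if match is None:
            continue
        month, day = match.groups()
        # syslog leaves out the year, so the caller has to supply it.
        stamp = datetime.strptime(f"{year} {month} {day}", "%Y %b %d")
        dates.append(stamp.date().isoformat())
    return dates


if __name__ == "__main__":
    # Hypothetical usage: python zen_blocks.py /var/log/maillog 2010
    log_path, year = sys.argv[1], int(sys.argv[2])
    with open(log_path, errors="replace") as log:
        per_day = Counter(spamhaus_reject_dates(log, year))
    for day in sorted(per_day):
        print(day, per_day[day])
```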

Next, I wrote a short (~100-line) Visual Basic for Applications (VBA) app to turn that data into graphs. I used VBA because I wanted Excel to plot the data, VBA is built into Excel, and it handles flat data files well. I had some trouble getting Excel out of compatibility mode, which (like the older .xls format) limits a worksheet to 256 columns, and I needed a spreadsheet about 1000 cells across. The VBA app found the earliest and latest dates so it could size the spreadsheet’s width correctly, then read each line and, for the date extracted from each record, incremented that day’s cell. There were two rows of data: one for SpamAssassin actions and one for Spamhaus Zen actions.
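The VBA logic (find the earliest and latest dates, allocate one slot per day, then increment the slot for each record’s date) translates directly into Python. This is a sketch of that same counting approach, not the original VBA. It takes the per-record dates as ISO `YYYY-MM-DD` strings, and the function names are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def daily_counts(iso_dates: list[str], start: date, end: date) -> list[int]:
    """One count per calendar day from start to end inclusive; days with no records stay at 0."""
    tally = Counter(date.fromisoformat(d) for d in iso_dates)
    span = (end - start).days + 1
    return [tally[start + timedelta(days=i)] for i in range(span)]


def build_table(spamassassin: list[str], spamhaus: list[str]):
    """Return (days, sa_counts, zen_counts) covering the earliest to latest date in either list.

    The date range is found first so both rows have the same width.
    Raises ValueError if both lists are empty.
    """
    every_date = [date.fromisoformat(d) for d in spamassassin + spamhaus]
    start, end = min(every_date), max(every_date)
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return days, daily_counts(spamassassin, start, end), daily_counts(spamhaus, start, end)


if __name__ == "__main__":
    sa = ["2010-11-01", "2010-11-01", "2010-11-03"]
    zen = ["2010-11-02"]
    days, sa_counts, zen_counts = build_table(sa, zen)
    for day, a, z in zip(days, sa_counts, zen_counts):
        print(day.isoformat(), a, z)
```

Keeping the counts in lists also sidesteps the worksheet column limit entirely; only the final plot needs one point per day.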

The data were remarkable:

The data showed that the Spamhaus Zen blocker eliminated most of the spike in spam that started last summer. It brought the level back down to where it had been before, and once again my users are not complaining about spam. Spamhaus Zen blocks a spammer at the moment it tries to connect to sendmail on the St. John’s server, before any message is accepted. This saves CPU cycles, since sendmail has fewer messages to pass through SpamAssassin.

One thing I need to add to my data collection is how many messages actually reach the users. That is a project for the next couple of weeks.
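For counting what actually reaches users, sendmail writes a delivery line for each recipient that names the mailer and the delivery status. A minimal sketch, assuming local deliveries are logged with `mailer=local` and `stat=Sent` (typical for sendmail, but worth confirming in your own maillog), counts them per syslog day. It counts recipients, so one message to three users counts three times.

```python
import re
import sys
from collections import Counter

# Classic syslog prefix, e.g. "Dec 23" or "Dec  3" at the start of the line.
SYSLOG_DAY = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s")


def deliveries_per_day(lines) -> Counter:
    """Count successful local deliveries per syslog day ("Mon D").

    Assumption: sendmail logs each local delivery with "mailer=local" and "stat=Sent".
    Each recipient is counted separately.
    """
    counts = Counter()
    for line in lines:
        if "mailer=local" not in line or "stat=Sent" not in line:
            continue
        match = SYSLOG_DAY.match(line)
        if match:
            month, day = match.groups()
            counts[f"{month} {int(day)}"] += 1  # normalise "Dec  3" to "Dec 3"
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python delivered.py /var/log/maillog
    with open(sys.argv[1], errors="replace") as log:
        for day, n in deliveries_per_day(log).items():
            print(day, n)
```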

This was an interesting project in several ways. I had never done any VBA programming before, and it was EASY. I have a lot of Visual Basic code under my belt, so I already knew the syntax. I had to do a bit of research to find the specific Excel commands, but they were not too bad. The resulting spreadsheet was about 500 cells wide and had three rows of data, and it made my rather beefy laptop really, really slow.

I used VBA because I knew Excel made nice graphs, and Excel is what I had on my laptop. I don’t know whether OpenOffice Calc can do the same or something similar, and I want to find out what kind of scripting (if any) OpenOffice has. I also want to look at tools like gnuplot (I know it exists; I just don’t know whether it will work for this) to see if I can have the server generate these charts automagically every week or so.
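If gnuplot turns out to fit, the server only needs to write the daily counts as whitespace-separated columns. This sketch (the function and file names are hypothetical) writes one row per day with the date, the SpamAssassin count, and the Spamhaus Zen count. Lines starting with `#` are treated as comments in gnuplot data files.

```python
from datetime import date
from pathlib import Path


def write_gnuplot_data(path: Path, days: list[date], sa_counts: list[int], zen_counts: list[int]) -> None:
    """Write whitespace-separated columns: date, SpamAssassin count, Spamhaus Zen count."""
    if not (len(days) == len(sa_counts) == len(zen_counts)):
        raise ValueError("all three columns must have the same length")
    with path.open("w") as out:
        out.write("# date spamassassin spamhaus_zen\n")
        for day, sa, zen in zip(days, sa_counts, zen_counts):
            out.write(f"{day.isoformat()} {sa} {zen}\n")


if __name__ == "__main__":
    write_gnuplot_data(Path("spam.dat"), [date(2010, 11, 1), date(2010, 11, 2)], [2, 0], [0, 1])
```

gnuplot could then read the file with `set xdata time`, `set timefmt "%Y-%m-%d"`, and `plot "spam.dat" using 1:2 with lines title "SpamAssassin", "" using 1:3 with lines title "Spamhaus Zen"`, and a weekly cron job could run both steps.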

Always a new skill!