Improving Spam Detection

July 29, 2009

Technology Review, published by MIT, has an article today reporting on research done at Georgia Tech on improving methods of identifying spam.  Spam is an enormous problem; as I’ve mentioned before, it’s estimated that more than 90% of E-mail messages sent on the Internet are spam.  ISPs and other providers of E-mail services spend a huge amount to detect and remove spam, lest it completely overwhelm legitimate E-mail.

The researchers looked at some contextual characteristics of E-mail that are not normally examined by spam filters, which usually focus on the message content:

The system, known as SNARE (Spatio-temporal Network-level Automatic Reputation Engine), scores each incoming e-mail based on a variety of new criteria that can be gleaned from a single packet of data.

For example, the standard mail transfer protocol [SMTP] used to transfer mail across the Internet uses port 25.  A normal mail server will have several other ports open for communication, in addition to port 25.  (For example, it might have port 22 open for Secure Shell connections.)  Machines dedicated to sending spam typically open only port 25.  The researchers also found that, by using the approximate mapping of IP addresses to geographic locations, they could identify regions that were particularly likely to harbor spammers.  In addition, spam tends to travel a longer distance, geographically, than legitimate E-mail.
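To make two of these signals concrete, here is a rough Python sketch of how they might be computed for a single sender.  This is not SNARE's actual code: the function names, the dictionary keys, and the idea of passing in a list of open ports and a pair of latitude/longitude coordinates are all assumptions made for illustration.  The distance is the great-circle (haversine) distance, which is only as good as the IP-to-location mapping behind it.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point rounding before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def sender_features(open_ports, sender_latlon, recipient_latlon):
    """Collect two illustrative network-level features for one sender.

    open_ports: ports observed open on the sending host (hypothetical input)
    sender_latlon, recipient_latlon: (lat, lon) tuples from an IP geolocation
    lookup, which is assumed to have been done elsewhere.
    """
    return {
        # True when SMTP (port 25) is the only open port.
        "only_smtp_open": set(open_ports) == {25},
        # Approximate distance the message travelled.
        "distance_km": haversine_km(*sender_latlon, *recipient_latlon),
    }
```

A real filter would feed features like these, along with many others, into a classifier rather than acting on any one of them alone.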

The idea of using additional characteristics to identify spam is in principle a good one, and looking at these characteristics is relatively cheap in terms of resource consumption.  Still, almost since the transmission of the first E-mail, there has been an arms race going on between the spammers and those trying to deter or defeat them.  The results of this research are potentially quite useful, but spam is probably going to be with us for a long time.
