Spam – you know what it is, you expect it, you don’t like it, and it really annoys you when you get any. It is almost an accepted if irritating inconvenience. Reminded of the Monty Python “Spam with Spam, spam with eggs…” sketch – there is junk spam, advertising spam, well meaning mail that just happens to be spam today, malware, phishing email, spam phishing, viruses, compromises, all manner of badness – delivered – day by day, hour by hour, minute by minute into your inbox. This is a short story about spam arriving with us – forming the second part of two articles on spam and email.
So what can you do – and what do we do to check over the email before it makes its way anywhere near your inbox?
Tools to fight the good fight
Old school, and very much still the go to for great results.
These are often referred to as DNSBL or RBL or DNSRBL or just plain old blacklisting (BL), and the mechanism that sits behind them is DNS.
While this is not to do with resolving where the server lives, it has everything to do with DNS being a lightweight, fast protocol, and thus a means to figure out whether a server has a reputation issue – or not.
The value returned by the query tells you whether the server has a currently documented issue with sending spam email.
It is lightweight, low-ish impact to the person offering the service too – and tools exist for maintaining DNS. Happy days.
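As a rough sketch of the lookup described above: the common convention for these lists is to reverse the octets of the sending server's IPv4 address, append the list's DNS zone, and ask for an A record. An answer (conventionally somewhere in 127.0.0.0/8) means "listed"; no such name means "not listed". The zone `dnsbl.example.org` below is a placeholder, not a real list, and the function names are made up for illustration.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up: reversed IPv4 octets plus the list's zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the list answers with an address in 127.0.0.0/8."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # No answer (or a failed lookup) is treated here as "not listed".
        return False
    return answer.startswith("127.")


if __name__ == "__main__":
    # Only builds the name; no network traffic is generated here.
    print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

Each list documents what its individual return values mean, so a real deployment would check the specific value rather than just the 127 prefix.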
There are many to pick from out there. Too many if anything. To the point that some people try to turn it into a money spinning operation, or others do not delist. People get annoyed when they are unable to maintain a legitimate server service, or when there is a listing for apparently no reason.
In reality there are hundreds to choose from, but the list of those that really matter is short – two, maybe three hosts.
…and if you really want to have three, SORBS.
Spamhaus and Barracuda represent the top of the tree for the two sides of the operation – the non-commercial and the commercial. SORBS – well, they are an also-ran – however they seem to have something to offer in terms of detection. That said, getting delisted has been a bit of a dark art in the past.
These companies tend to work through honeypots, and other such mechanisms – along with salting lists with known email addresses to allow their use and circulation to be tracked.
For these two, the work goes beyond mundane service delivery. We have seen Spamhaus at conferences talking about their research into spam, the drivers behind it, and the involvement of organised crime. It got dark very quickly – however it is something to keep in mind. This is about revenue generation for the spammers, and someone somewhere must be having some success through it, or phishing, or malware, for it to continue. It goes without saying that these companies are ‘unpopular’ with the powers that be in that particular field, and are commonly under attack and in court. For all their pain – we appreciate their service.
A little out of favour these days – however still has its place in the world. Bending the rules of the RFC but within the spirit of the standards comes Grey Listing.
What Greylisting does is as the name would suggest – a middle ground between Black and White listing.
The theory is that dumb machines or webservers that are trying to stay below the radar while churning out vast quantities of spam will want to avoid using a local or remote email server. They will want to avoid that as the mail will show in the queue. Mail that shows can be monitored, logging checked, tied back to an account, if not a file and eradicated. This is good for admins – bad for spammers. So direct delivery is a winner in this respect.
Up until the mail server says to the sender “451: I am a little busy right now – can you call me again in five?”
The 4xx error code is within the rules and a soft failure. The English part won’t say that but you get the message. They can try as many times as they like – but their email will not be accepted for a few minutes.
Now this is a real bind to Joe Spammer, as he has 900,000 mails to send before he gets rumbled. So on he goes with the next email.
At this point the Greylist is everyone’s friend. Great success.
When a mail server with a queue sees the 4xx – it sticks the mail back in the queue with a timer and retries it, again and again, until eventually it gets accepted. The gaps between the retries typically grow with each attempt: very quick at first, then longer and longer until the mail is either accepted or fails.
Once a mail is accepted from a given sender, to a given recipient from a certain server it is put in a database either permanently or for a period of time. Mail from this combination is not delayed again. Everyone is happy.
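The accept-or-defer decision described above can be sketched in a few lines. This is a toy in-memory version, not how any particular greylisting package does it: the class name, the 300 second delay and the reply strings are all assumptions, and a real implementation would persist the data and expire old entries.

```python
import time
from typing import Dict, Optional, Set, Tuple

# (sending server IP, envelope sender, envelope recipient)
Triplet = Tuple[str, str, str]


class Greylist:
    """Defer the first attempt for each triplet; accept retries after a delay."""

    def __init__(self, delay_seconds: float = 300.0) -> None:
        self.delay = delay_seconds
        self.first_seen: Dict[Triplet, float] = {}
        self.passed: Set[Triplet] = set()

    def check(self, client_ip: str, sender: str, recipient: str,
              now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        key = (client_ip, sender, recipient)
        if key in self.passed:
            return "250 OK"  # known combination, no further delay
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.delay:
            self.passed.add(key)  # the sender came back, so remember it
            return "250 OK"
        return "451 Greylisted, please try again later"


if __name__ == "__main__":
    g = Greylist()
    print(g.check("192.0.2.1", "a@example.com", "b@example.org", now=0))
    print(g.check("192.0.2.1", "a@example.com", "b@example.org", now=600))
```

The `now` argument exists only so the behaviour can be exercised without waiting five minutes.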
So does this work.
Here are some stats from a mail server that deals in the nasty nasty world of forwarding. Have a look at the figures here. These are over a prolonged period; however, I would expect them to add up to the same percentages be that over an hour, day, month or year.
Yes that’s 98% of all email being spam.
No alarms. No surprises. This is the norm.
16240 items, matching 901284 requests, are currently whitelisted
0 items, matching 0 requests, are currently blacklisted
1552 items, matching 1563 requests, are currently greylisted

Of 1586328 items that were initially greylisted:
 - 19573 ( 1.2%) became whitelisted
 - 1566755 (98.8%) expired from the greylist
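For anyone wanting to check the arithmetic, the percentages follow directly from the three counts in the figures above. The variable names are just labels for this sketch.

```python
# Counts taken from the greylist statistics above.
initially_greylisted = 1_586_328
became_whitelisted = 19_573
expired = 1_566_755


def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    # The two outcomes account for every initially greylisted item.
    assert became_whitelisted + expired == initially_greylisted
    print(percent(became_whitelisted, initially_greylisted), "% came back and were accepted")
    print(percent(expired, initially_greylisted), "% never retried")
```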
Sender Policy Framework (SPF) is a neat little solution again based on DNS – this time proper DNS from the name servers (NS) for your domain.
You set up a text (TXT) record within your zone file. It is a pretty simple kind of thing and human readable. The format looks a little like this
v=spf1 a mx include:hostinguk.net -all
It is not Shakespeare – I will give it that – but it does a job, and is easy to read.
What it is saying there is that this is SPF version 1 – read it as such. Allow any A records I have to send email, allow any MX (mail servers) I have to send email, and allow anything that hostinguk.net has in their SPF record to send email as me.
The most vital part there – the part that gives it teeth – is the ‘minus all’ at the end. What that says is that anything that doesn’t come from any of these locations should be dropped: bounced back to the sender as a permanent failure (550 or similar), no doubt with a message saying “Failed SPF…” or similar.
Setting one of these within your zone file means that servers that are SPF aware will not accept email from your domain unless it came from somewhere you have said – in this statement – is allowed to send as you. Otherwise – anyone can… shocking that bit isn’t it? Not really, but not realising that probably is 😉
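To make the reading of that record concrete, here is a small sketch that splits an SPF string into its terms and the result each one asks for. It is deliberately simplified: it only understands the version tag, the four standard qualifiers (`+`, `-`, `~`, `?`) and `mechanism:value` terms, and it does not perform any DNS lookups or handle modifiers such as `redirect=`. The function name is made up for illustration.

```python
from typing import List, Optional, Tuple

# Standard SPF qualifiers; a term with no qualifier defaults to "+" (pass).
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def parse_spf(record: str) -> List[Tuple[str, str, Optional[str]]]:
    """Return (result, mechanism, value) for each term after the version tag."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF version 1 record")
    parsed = []
    for term in terms[1:]:
        qualifier = "+"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        mechanism, _, value = term.partition(":")
        parsed.append((QUALIFIERS[qualifier], mechanism, value or None))
    return parsed


if __name__ == "__main__":
    for result, mechanism, value in parse_spf("v=spf1 a mx include:hostinguk.net -all"):
        print(result, mechanism, value)
```

Swapping `-all` for `~all` in the example changes the last result from `fail` to `softfail`, which is the difference between a record with teeth and one without.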
Another mechanism that works through the wonders of DNS from the zone file for your domain.
DKIM stands for DomainKeys Identified Mail. Again this is a mechanism to help with spoofing and tampering.
It can be used to ensure that the email, and possibly its attachments, have not been modified since the email was sent from the server.
This does however mean that any email that is altered along the way, as forwarding services and mailing lists often do, is going to fail on this.
There are a lot of people out there using forwarding. This is an education as opposed to technical matter. However it means this can cause as many issues as it resolves.
These signatures are not visible to the end user, so they do not offer an end to end digital signature in the same way as S/MIME or PGP or GPG.
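The tamper check mentioned above rests partly on a hash of the message body, which the sending server places in the `bh=` tag of the DKIM-Signature header. The sketch below covers only that body hash, assuming SHA-256 and the "simple" body canonicalization (trailing empty lines ignored, body ends with one CRLF). The real thing also signs selected headers with a private key whose public half lives in DNS, which is not shown here; the function name is an invention for this example.

```python
import base64
import hashlib


def simple_body_hash(body: bytes) -> str:
    """Base64 SHA-256 of the body after 'simple' canonicalization."""
    # Ignore empty lines at the end of the body...
    while body.endswith(b"\r\n\r\n"):
        body = body[:-2]
    # ...and make sure the body ends with exactly one CRLF.
    if not body.endswith(b"\r\n"):
        body += b"\r\n"
    return base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")


if __name__ == "__main__":
    original = b"Hello, this is the message.\r\n"
    with_footer = original + b"-- \r\nForwarded by a helpful list server\r\n"
    print(simple_body_hash(original))
    print(simple_body_hash(with_footer))  # differs, so verification would fail
```

A footer added by a forwarder or mailing list is enough to change the hash, which is one way a perfectly innocent message ends up failing the check.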
… or rather – the avoidance of it.
Forwarding is a tool that is often made available in control panels, configurations, settings.
Unless this is very much limited – then this is an open door for spam.
For example, a domain with a “catchall” (the bane of many a postmaster/sysadmin) will be as happy passing on email for email@example.com as it will be for firstname.lastname@example.org.
This means the forwarding server will likely quickly become blacklisted as being seen as a source of spam – despite merely doing as it was asked. It is as happy forwarding a real address as it is a completely made up one – making it a huge target, an open door to take the heat off of the original sender.
If you can avoid forwarding – do so. If you are forwarding elsewhere, and don’t need to send as that user, change the MX to point to the other server, and configure it as an alias.
More collective intelligence at play. This time genuine mail servers that are sending out stats from the emails they receive. Those that are bounced, those that are accepted, numbers, all that good stuff. “If only we could use that super power for good?!” – Ahh – but we can!
Reputation based platforms give the mail server from which an email originates a history and a reputation. They monitor trends in volume, and how much of that volume is rejected as spam by hosts in the field.
So what use is this? Well – it is plenty. You can use this as a metric to score email based on which server it has come from. A server that is sending out a lot of confirmed spam and that is sending out more email than usual, or for that matter has found itself in a generally bad neighbourhood is more likely to be spewing mail in your direction.
Used alongside other methods this is a good way to help weight a given email as spam or ham.
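The idea of weighting can be sketched as a simple sum: each check that fires adds its weight, and the total is compared against a threshold. The check names, weights and threshold below are invented for illustration and are not anyone's production values.

```python
from typing import Dict

# Hypothetical weights: how much each positive check adds to the score.
WEIGHTS: Dict[str, float] = {
    "poor_reputation": 3.0,
    "listed_on_dnsbl": 2.5,
    "suspicious_words": 1.5,
    "failed_spf": 2.0,
}
THRESHOLD = 5.0  # hypothetical cut-off between ham and spam


def spam_score(results: Dict[str, bool], weights: Dict[str, float]) -> float:
    """Add up the weights of every check that fired."""
    return sum(weights[name] for name, fired in results.items() if fired)


def is_spam(results: Dict[str, bool]) -> bool:
    return spam_score(results, WEIGHTS) >= THRESHOLD


if __name__ == "__main__":
    checks = {"poor_reputation": True, "listed_on_dnsbl": True,
              "suspicious_words": False, "failed_spf": False}
    print(spam_score(checks, WEIGHTS), is_spam(checks))
```

No single check decides the outcome on its own, which is the point of using reputation alongside the other methods.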
Who would have thought it right? There is still a space in this world for plain old filtering by words.
These lists can entertain the admin while perusing configuration files – however can equally be surprisingly effective at some levels.
Entertaining as such a list may turn out to be, it has a lot going for it – sometimes the simple ones are the best. No sniggering at the back please, as I know you are running through what you might put in a list like that. Spare a thought for afflicted places such as Scunthorpe… use with caution. As such this is best used to help paint a wider picture – as a weighted value that may nudge a score in the right direction.
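A minimal sketch of word filtering that tries to sidestep the Scunthorpe problem by matching whole words only, rather than any run of letters inside a longer word. The word list is a tame stand-in, and the function names are assumptions for this example.

```python
import re
from typing import Iterable, Pattern


def build_matcher(words: Iterable[str]) -> Pattern[str]:
    """Compile a case-insensitive pattern that matches the words as whole words."""
    alternatives = "|".join(re.escape(word) for word in words)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def count_hits(text: str, matcher: Pattern[str]) -> int:
    """Number of listed words found in the text."""
    return len(matcher.findall(text))


if __name__ == "__main__":
    matcher = build_matcher(["sex", "lottery"])
    print(count_hits("Greetings from Middlesex", matcher))    # whole words only
    print(count_hits("You have won the LOTTERY!", matcher))
```

The hit count is best fed into an overall score rather than used to reject mail outright, since whole-word matching still cannot tell context apart.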
This is computing after all – you cannot go too far in any direction before the maths gets complicated. Enter the wonder of Bayesian probability. In short – pattern finding, learning, and working out probability from that.
Without getting into all the funny symbols and graphs: in a nutshell, you present some maths with a reasonably equal collection of spam and ham… which has the fancy name of a corpus. It then looks at patterns of words for one and the other, and figures out (with a bit of luck) what a spam looks like.
Again – this is a weighting. It is going to look through the mails and arrive at a number. The number can then be used to weight the email in terms of whether it is likely to be spam or not.
This was surprisingly (possibly not to a mathematician) effective. To the point you will see messages containing junk quite often. Random phrases – that have been pulled from pages and dictionaries, and legitimate sources. These fulfil two tasks – bucking the trend of what a spam email would usually look like – and also ‘poisoning’ the maths you are working with with false data.
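For the curious, a toy version of the idea fits in a few lines. This is a naive Bayes sketch with add-one smoothing and equal prior odds for spam and ham, trained on a made-up four message corpus; real filters use far larger corpora, better tokenising and more careful maths, so treat every name and number here as an assumption.

```python
import math
from collections import Counter
from typing import Iterable


def train(messages: Iterable[str]) -> Counter:
    """Count how often each word appears across a set of messages."""
    counts: Counter = Counter()
    for message in messages:
        counts.update(message.lower().split())
    return counts


def spam_probability(text: str, spam_counts: Counter, ham_counts: Counter) -> float:
    """Estimate P(spam | words), assuming spam and ham are equally likely up front."""
    vocabulary = len(set(spam_counts) | set(ham_counts))
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())
    log_odds = 0.0
    for word in text.lower().split():
        # Add-one smoothing so unseen words do not zero out the result.
        p_word_spam = (spam_counts[word] + 1) / (spam_total + vocabulary)
        p_word_ham = (ham_counts[word] + 1) / (ham_total + vocabulary)
        log_odds += math.log(p_word_spam / p_word_ham)
    return 1 / (1 + math.exp(-log_odds))


if __name__ == "__main__":
    spam = train(["win cash now", "cheap pills win"])
    ham = train(["meeting notes attached", "lunch at noon"])
    print(spam_probability("win cash", spam, ham))
    print(spam_probability("meeting at noon", spam, ham))
```

It also shows why padding a message with random legitimate-looking words works against the filter: each "hammy" word drags the total back towards ham.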
Virus scanning will pick out those with generally unpleasant payloads hidden in their MIME content / attachments.
This is harder than it sounds, as it assumes that the payload is a known, common piece of code. The only way to get around this is to use ‘heuristics’ – and also weight (yes – again) an email’s score based on how much it looks like it could be a virus based on specific code.
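To see why exact matching against known code is brittle, consider the simplest possible signature: a hash of the whole attachment. The sketch below is an illustration of that limitation only, not how any real scanner works; the function name and the sample payload are made up.

```python
import hashlib
from typing import Set


def matches_known_signature(attachment: bytes, known_sha256: Set[str]) -> bool:
    """True only if the attachment is byte-for-byte identical to a known sample."""
    return hashlib.sha256(attachment).hexdigest() in known_sha256


if __name__ == "__main__":
    sample = b"pretend this is a known nasty payload"
    known = {hashlib.sha256(sample).hexdigest()}
    print(matches_known_signature(sample, known))          # identical copy
    print(matches_known_signature(sample + b" ", known))   # one byte added
```

A single changed byte is enough to slip past an exact match, which is where heuristics and weighting come in.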
No one likes emails with shady links in. Step in the likes of Google and their SafeBrowsing list.
These are the same kinds of lists that make your browser say “WHOOAH! STOP” in red when you are about to visit a page that it thinks may be something that you are not expecting. Usually in terms of malware or phishing.
This relies on the site having been detected in the first place, but will knock links to listed sites on the head straight away. It is of course easily countered through the use of URL shortening services, in the hope that you are using a browser that does not do this check once the URL is expanded back to its usual length.
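A rough sketch of the link check: pull the URLs out of a message body, take the host from each, and compare against a list of known bad hosts. Here a local set stands in for a real service such as the SafeBrowsing list, and a second set flags known shorteners so they can be scored or expanded separately. All host names are placeholders and the function names are assumptions.

```python
import re
from typing import Dict, Set
from urllib.parse import urlsplit

# Deliberately simple URL pattern; good enough for a sketch.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def link_hosts(body: str) -> Set[str]:
    """Lower-cased host names of every http(s) link found in the body."""
    hosts = {urlsplit(url).hostname for url in URL_RE.findall(body)}
    return {host for host in hosts if host}


def flag_links(body: str, bad_hosts: Set[str], shorteners: Set[str]) -> Dict[str, Set[str]]:
    """Split the hosts found into those on the bad list and those that are shorteners."""
    hosts = link_hosts(body)
    return {"listed": hosts & bad_hosts, "shortened": hosts & shorteners}


if __name__ == "__main__":
    body = "Click http://bad.example/login or https://short.example/abc123 today"
    print(flag_links(body, {"bad.example"}, {"short.example"}))
```

A shortened link says nothing either way about where it ends up, so it is better treated as a reason to look closer than as a verdict.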
No one solution has the answer.
Speaking from running services with either greylisting or RBLs front of house as the first bar to clear – both will return around the same stats in terms of returning servers. It is how you want to approach the issue.
Here at Hosting UK we use a blend of technologies, as well as vendor supplied bespoke solutions in our newer platforms. These layers allow us to weed out the majority of the bad email before it reaches you.
Statistically speaking for every email you receive – ten have been rejected.
However – that doesn’t stop the ones that do get through from being REALLY irritating!
If you are having an issue with spam on your email service – we offer a number of solutions to meet different requirements. Do not hesitate to get in touch with technical or sales to discuss the matter further.
Please note: a sudden large influx of spam, specifically messages that appear to be non-deliveries or bounces, is indicative of a compromised email account or someone spoofing your address.