May 30, 2008 at 5:27 pm
A small technical exercise testing how effectively Gmail combats spam. Their site promises “Fast, searchable email with less spam”.

This is mostly true. However, there is a downside to Gmail’s spam filters with respect to false positives, i.e. the filter mistakenly flagging a good mail as spam.
In my experience, these numbers are particularly high in the case of Gmail, compared to, say, using SpamAssassin or Akismet as your spam filter. I’ve had a fair number of good mails, not so useful but not spam either, flagged as spam.
This is unacceptable. In my case, I have the patience, and also a weird interest, to go through my spam messages every day. But busy people would consider this a “costly nuisance”.
The nuisance is obvious. It’s also costly, as the flagged message might contain critical business information.
Most people, however, do not worry much about this, because they trust their spam filters to weed out only the bad guys. As Matt Cutts of Google says so elegantly about Google Search,

“Sure, I could stop all the spam in the world if I didn’t have to return any search results.” 🙂 [source]

What people don’t like, however, is the appearance of { stimulating / enlarging / Nigerian king bequeathing his 100 million $ / cheap Adobe software or Rolexes / mortgage loans at unbelievable rates } mails in their inbox. This is totally irritating, and it does more damage to the lay user than the previously mentioned, comparatively rare case.
Spam filters involve many techniques which are constantly under development. These chiefly include heuristics [rules learnt from previous experience], Bayesian filters, simple database matching, matching IPs against known spammers, and lots more involving complex probability/statistical models beyond my wildest dreams.
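To give a feel for the Bayesian bit, here is a toy sketch in Python. It is only an illustration under my own assumptions: the four training messages are made up, the word splitting is naive, and real filters (Gmail's included, whose internals I know nothing about) are far more elaborate.

```python
import math
from collections import Counter

def train(messages):
    """Count word occurrences per class from (text, is_spam) pairs."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in messages:
        counts[is_spam].update(text.lower().split())
        totals[is_spam] += 1
    return counts, totals

def spam_probability(text, counts, totals):
    """Estimate P(spam | words) with naive Bayes and add-one smoothing."""
    vocab_size = len(set(counts[True]) | set(counts[False]))
    spam_tokens = sum(counts[True].values())
    ham_tokens = sum(counts[False].values())
    # Start from the prior odds of spam vs. good mail in the training set.
    log_odds = math.log(totals[True] / totals[False])
    for word in text.lower().split():
        p_word_spam = (counts[True][word] + 1) / (spam_tokens + vocab_size)
        p_word_ham = (counts[False][word] + 1) / (ham_tokens + vocab_size)
        log_odds += math.log(p_word_spam / p_word_ham)
    # Convert log-odds back to a probability.
    return 1 / (1 + math.exp(-log_odds))

# Made-up training data, purely for illustration.
TRAINING = [
    ("cheap rolex watches buy now", True),
    ("unbelievable mortgage rates buy now", True),
    ("meeting notes for monday", False),
    ("lunch on monday with the team", False),
]
counts, totals = train(TRAINING)

print(spam_probability("buy cheap rolex", counts, totals))
print(spam_probability("monday meeting", counts, totals))
```

The first message scores well above 0.5 and the second well below it. A false positive is simply a good mail whose words happen to push the score over whatever threshold the filter uses.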
So, being the evil chap that I am, I did this nonsensical thing. I sent a spam mail to myself.

[Screenshot: Self Spam]
Granted, this is stupid. Gmail should place its trust in the sender [me] and the sending server, and hence classify this as legitimate mail.
So I repeated this exercise with a well-known email spoofer, Pranketh. What this nifty project [written by two brilliant chaps from the University of Waterloo] does is pretty simple.
As the site says, “It allows you to send an email, that looks like its sent from someone else.” Or, in simple terms,
I can send an email from ids like mukeshambani[at]reliance[dot]in, or soniagandhirocks[at]congressrocks[dot]gov[dot]in.
[Note the delicate usage of “rocks”. Blogger belongs to Google. I don’t want no risks.] A screenshot of Pranketh’s page.
[Screenshot: Pranketh Page]
Yes, I know your doubt. If it’s so easy, why do nefarious miscreants use stupid Yahoo/Gmail ids to threaten people, or send smokescreen bomb footage to the extremely retarded TV channel Aaj Tak [self-proclaimed to be “Sarv-Shresht”]?
The point is the email server, folks. Your id is spoofed. But the SMTP server name? No, no. I couldn’t dream, in my most distant dreams, of spoofing that [since I don’t happen to be a Chinese (govt-sponsored) hacker 😦 ]. As a test case, try sending a Pranketh mail to your own id, and check the “original mail” option in Gmail.
Ah, the post digresses from its core issue. Let’s come back to the main point, shall we?
An important reason for Gmail not blocking my self-spam was that it trusts my id and its own servers.
I tried sending the same text through Pranketh. Guess what? It thrashes the mail left and right before it even leaves their servers. Why? They use Akismet.
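If you would rather not squint at raw headers, here is a small Python sketch of the same check. The message below is invented (all the .example hosts are placeholders), and the helper name sender_vs_relay is my own. It only compares the domain in From with the host named in the top-most Received header, which the receiving server adds. Note that this host name is what the sending machine claimed about itself, so a real check would also look at the IP address recorded next to it.

```python
import re
from email import message_from_string

# An invented spoofed message; every host below is a placeholder.
RAW = """\
Received: from mail.spoofer.example (mail.spoofer.example [203.0.113.7])
\tby mx.recipient.example with ESMTP id abc123
From: someone.famous@bigcompany.example
To: me@recipient.example
Subject: Hello

Body text.
"""

def sender_vs_relay(raw_message):
    """Return (from_domain, relay_host, looks_consistent) for a raw mail."""
    msg = message_from_string(raw_message)
    # Domain part of the From address, which the sender can write freely.
    from_domain = msg["From"].rsplit("@", 1)[-1].strip(" >").lower()
    # The top-most Received header is added by the server that accepted
    # the mail, and names the machine that handed it over.
    received = msg.get_all("Received") or []
    relay = None
    if received:
        match = re.search(r"from\s+(\S+)", received[0])
        if match:
            relay = match.group(1).lower()
    consistent = relay is not None and (
        relay == from_domain or relay.endswith("." + from_domain)
    )
    return from_domain, relay, consistent

print(sender_vs_relay(RAW))
```

For the invented message the From domain and the relay host do not match, which is exactly the giveaway you see by eye in the original mail.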
[Screenshot: Pranketh Spam]

My advice to Gmail [on the remotest probability that Matt Cutts is reading this] and other mail providers is this.
Google’s servers are checking all our mail contents to generate their automated ads and stuff anyway. So there is no illusion of privacy. So the next time I’m sending a mail, check the contents beforehand. Warn me if it’s spammish. Set the thresholds appropriately, so that I’m not regularly annoyed by these warnings.
If all SMTP servers start this routine, we can see at least some major changes.

1. All the world’s emails would take longer to reach their destination. [There’s got to be some catch. This is it.]

2. Say I send one mail to a big bunch of people. It’d be scanned for spam behaviour only once. Then some certification can be piggy-backed along, saying it’s reliable and not spam. The experts can handle that bit. Not too difficult.

3. Botnet prevention. Say some dumbo privately runs an SMTP server, and it has been subjected to a backdoor/trojan attack. It is currently acting as a zombie, sending out a bunch of viagrish mails to innocent people who’ve left their email ids lying out in the open. Scanning at the sending end could catch that.
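The scan-once-and-certify idea in point 2 could look roughly like this in Python. Big caveat: this is my own simplification, not how any mail provider does it. It uses an HMAC with a shared secret, so the receiver would need the same key. A real scheme would more likely use public-key signatures. SERVER_KEY, certify, verify and send_to_many are all hypothetical names.

```python
import hashlib
import hmac

# Hypothetical secret known to the sending server (and, in this
# simplified sketch, to whoever verifies the certificate).
SERVER_KEY = b"hypothetical-secret-key"

def certify(body):
    """Produce a tag saying 'this exact body passed our spam scan'."""
    return hmac.new(SERVER_KEY, body.encode("utf-8"), hashlib.sha256).hexdigest()

def verify(body, tag):
    """Check that the tag matches the body, in constant time."""
    return hmac.compare_digest(certify(body), tag)

def send_to_many(body, recipients, looks_spammy):
    """Scan the body once, then attach the same tag to every copy."""
    if looks_spammy(body):
        return []  # nothing leaves the server
    tag = certify(body)  # computed once, not once per recipient
    return [(rcpt, body, tag) for rcpt in recipients]

outgoing = send_to_many(
    "Lunch on Monday?",
    ["a@recipient.example", "b@recipient.example"],
    looks_spammy=lambda text: "rolex" in text.lower(),
)
for rcpt, body, tag in outgoing:
    print(rcpt, verify(body, tag))
```

The tag only vouches for the exact body that was scanned, so a copy tampered with on the way no longer verifies.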

I’m not saying you give up your earlier approach. That’d be foolish. But if it’s absolutely obvious that a mail is spammy [self-spam, for e.g.], block it before it leaves your grounds.
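A rough Python sketch of that outbound check, with two thresholds: one to warn the sender, and a higher one to block outright. The phrases, weights and threshold values are numbers I made up for illustration. A real server would plug in a proper filter score instead of this keyword count.

```python
# Made-up phrases and weights, purely for illustration.
SPAMMY_PHRASES = {
    "rolex": 2.0,
    "mortgage": 1.5,
    "viagra": 3.0,
    "100 million": 2.5,
    "cheap": 1.0,
}
WARN_THRESHOLD = 2.0   # above this, ask the sender "are you sure?"
BLOCK_THRESHOLD = 5.0  # above this, the mail never leaves the server

def spam_score(body):
    """Add up the weights of every spammy phrase found in the body."""
    text = body.lower()
    return sum(weight for phrase, weight in SPAMMY_PHRASES.items() if phrase in text)

def outbound_decision(body):
    """Return 'send', 'warn' or 'block' for an outgoing mail."""
    score = spam_score(body)
    if score >= BLOCK_THRESHOLD:
        return "block"
    if score >= WARN_THRESHOLD:
        return "warn"
    return "send"

print(outbound_decision("Lunch on Monday?"))
print(outbound_decision("Cheap Rolex for you"))
print(outbound_decision("Cheap Rolex and Viagra"))
```

Keeping the warn threshold well above what ordinary mail scores is what stops the warnings from becoming a nuisance of their own.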

Now, a general warning to all those who think spam is obnoxious. Yours might be the prettiest email id around, but don’t leave it on some arbit website for all the world to see. One syntax-based text crawler, and spammers get thousands of them.
Believe me, some of these spammers are millionaires [not the Nigerian kind]. They run their business professionally, and have awesome technical expertise too.
If anything, don’t make their job easier. Let them just fight it out with the big guys [Yahoo, Google, MSFT et al.].
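To show how little effort the syntax-based crawling takes, here is a Python sketch of just the matching part. The pattern is a common rough approximation of an email address, not a full validator, and the sample page text is invented.

```python
import re

# A rough approximation of what an email address looks like.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def harvest(page_text):
    """Return the distinct plain-text email addresses found on a page."""
    return sorted(set(EMAIL_PATTERN.findall(page_text)))

# Invented page content for illustration.
PAGE = (
    "Contact alice@example.com or bob.smith@mail.example.org. "
    "Obfuscated: carol[at]example[dot]com"
)

print(harvest(PAGE))
```

The two plain addresses are picked up and the [at]/[dot] one slips through this particular pattern, though a crawler could easily add a second pattern for that trick as well.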

I reiterate. If you desperately want to put your email id on the net, use images like these.

[Image: email id]
And a word of caution: even this is not safe. Within 1-2 years, Google image search is going to search the contents of images. And character recognition? Piece of cake.
So, what do you do next? Do not fear, I have the ideas.
1. When you must, put your email ids behind reCAPTCHA. I’d written a post about this some time back. My email id through this scheme would be . Go to their website and register for their free service. The only reason this idea is safe, however, is that spammers, like all of us, are average-Ramesh hard-working people. They do not have time to fill in your captchas. Whereas your friends, and people who so desperately want to see your email id, do.
2. Use images, but this time write them with 3-D blocks. Even by extreme image-processing hacking standards, this is nearly safe for 5-6 years.
3. Do not put email ids on the internet.

This is probably the first in a series of spam-related posts to come.

P.S: I kinda remembered the first word of my blog’s title, and how I hadn’t paid any attention to it, for the past few months.

And, let me clarify. I love gmail.




  1. Good post, this…
    Yeah, I’ve noticed, gmail spam filter is not reliable, you need to go thru the subject at least before emptying the folder…
    One more thing I have observed, one ID gets spamish mail, and the other one that i use everywhere never gets any…

  2. “””note the delicate usage of “rocks”, Blogger belongs to google. I don’t want no risks””” — LOL

    Well, legitimate mail going to spam box is very irritating. My friend in PESIT failed to apply for two companies because the mails landed in SPAM. When he came to know about such mails from his friends, it was too late.

    And since you have this habit of reading spam, tell me, have u made some analysis on the kind of spam messages ?

  3. @tuna – thnx. About u not getting any spam @ ur popular id. You must be really, really lucky.

    @sagar- too bad for your friend. Spam is yet another email folder, so you’ve got to look into it with interest.

    Analysis. Ya. On many aspects. Geographical location of frequent spam-senders. A drastic contrast in literary skills[ some so refined, it’d drive a poet mad. Some so retarded, they look like youtube comments]. Typical word/letter shifting techniques to avoid simple word filters. Image, pdf techniques. Viral marketing. and much more… I think that’ll be the subjects of some future posts.

  4. so i send you some such mail from my proper ID… should that be classified spam? for all google knows, i might be merely forwarding it to you coz you like spam, or coz i thought it was funny. google supposes mail originating from people wouldn’t exactly read like automated spam and so skips the entire process… if i wanted to advertize rolex watches and im sending out those mails personally, i’d make sure grammar’s correct etc, and it’s not just a collection of keywords and links. this set of rules doesn’t apply to human users.

    why don’t you report that mail as spam?
    send the same one to a bunch of us, and we’ll all mark it spam again, and let’s see where that gets us. or rather, your mail ID.

    since you research spam so much, do tell me what people gain out of sending spam to gmail/yahoo IDs? they get filtered, obviously, and are read only by spamaniacs like you… why do they bother? or is in the hope of a false negative?

    btw, nitk mail server was used for routing spam… any mail from would automatically be classified spam. that was what the ‘blacklisted’ meant, not coz people were downloading loads of adult content, as it was popularly thought.
    wait… were you the one who told me this?

  5. @priya:
    Corrections :
    Gmail doesn’t skip the entire process because a friend of mine is sending the spammy mail. The thresholds are a bit relaxed, that’s all.
    Yes, you’re right that if many people report a particular mail of mine as spam, it’d be classified as spam henceforth, and the sender [me] would be put under higher scrutiny for further mails.
    There is a mistake in your assumption that all gmail/yahoo users are humans. Once the account has been created, sending mails can be automated with bots. And popular mail provider accounts can be compromised. There are countless Outlook and Thunderbird users out there. One remote trojan or backdoor exploit, and the user’s client becomes a bot.

    Spammers do not see the actual nature of email id’s they send mails to. They’ve a bunch of id’s in a database, the server keeps mailing to them. With luck, some might pass through filters, and a small portion of them might go to retarded users, who want to purchase such stuff. Since a lot of mail sent in the world is spam, the revenue share is pretty high.
    I think it’s similar to the long-tail effect in economics. I’m not so sure though.

    Ya, about nitk server. That was one of the reasons. Adult content was more serious than thought. Apparently there were some objectionable up-loads as well.
    And, Yup. I think I told you that.

  6. hey… an experiment on these lines would be nice.
    and uploads? whoa……

  7. Experiment is all fine. But I’ve a pretty good email id to begin with. I’ve sent and received plenty of ‘humorous’ and off-beat spam messages from friends. No, not those irritating forwards. The real deal.

    I propose a quirky solution. Let’s have two spam folders. One “friendly-spam”, and the other “spam-spam”. Thankfully gmail gives the “labels” feature.
    I currently maintain labels like interesting-spam, and crappy-mails [most annoying forwards go here] etc.
    Apparently there was this chap, who had not deleted even one of his spam mails since 1997. He put forth a detailed analysis of that. It appeared on Slashdot. And naturally he got slashdotted, if you know what I mean.
    Surprisingly, even the slashdot geeks found the idea repulsive, and jobless in particular.
    I wonder why.

