Should have guessed…
November 17, 2008 at 12:20 am | In Technical, arbit, chappar, humour, nitk, sarcasm | 10 Comments | Tags: chappar, creepy tech, gender analyzer, google translate, hinglish, machine learning, text classification, uclassify
While riding on the internets, and surfing the tubes, I came across this nifty site called Gender Analyzer. Using free text classifier algorithms from a site called Uclassify, this site aims to judge whether a blog/website is written by a woman or a man. A very active research topic.
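For the curious, here is roughly what a text classifier of this sort does. This is a minimal multinomial naive Bayes sketch in Python. Naive Bayes is an assumption on my part, a common textbook choice, and not necessarily what uClassify or Gender Analyzer actually run. The class and method names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into simple word tokens.
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()           # every word seen during training

    def train(self, text, label):
        words = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            n = sum(counts.values())
            # Log prior: share of training documents carrying this label.
            score = math.log(self.doc_counts[label] / total_docs)
            # Add log likelihood of each word, smoothed so unseen words are not zero.
            for w in words:
                score += math.log((counts[w] + 1) / (n + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Usage: call `nb.train("football match goal", "sports")` and `nb.train("recipe oven bake", "cooking")`, then `nb.classify("goal in the match")` picks "sports". The labels are made up. The catch, as the comments below point out, is that the answers are only as good as the training data.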
I tried it out on some well-known standard cases, and here’s the goldmine.
Gosh, I didn’t know that Machine Learning had become so accurate these days. Be paranoid, very.
Incidentally, Chappar, when you were on wordpress, your manliness rating was 83%. Did anything special happen during the transition phase?
A thousand apologies, plus one extra, just in case.
And to those who might be thinking of an oh-so-brilliant “Look who’s talking!!!” line: I’m at 71%. Muha ha ha.
P.S.: Incidentally again, this is the 2nd post in the chappar series, the first one having been written nearly 2 years ago.
Update: Google Hindi translation of this post is too funny.
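excerpt: जबकि internets पर, घुड़सवारी और ट्यूबों सर्फिंग, मैं इस गंधा साइट भर में आया जेंडर विश्लेषक कहते हैं (roughly: “While on the internets, horse-riding and tubes surfing, I came all over this smelly site they call Gender Analyzer”)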
lol
10 Comments
Blog at WordPress.com. | Theme: Pool by Borja Fernandez.

My blog happens to be at 61%.
And I happened to give the Gender Analyzer site’s own URL as the input: 85%. Not bad.
God only knows how that algorithm works!!
Comment by Kitta — November 17, 2008 #
The training set wasn’t right, I think.
Comment by wanderlust — November 17, 2008 #
@Kitta: first, congrats on 61%. The algorithm is available for download. My hunch is that it uses some variant of boosting. A neural net would be a bit tedious for this: it would need a semantic language-processing toolset, which might be harder to build than the classifier itself. @priya: the problem with machine learning usually lies in an insufficient or incorrect dataset. Here they use 2,000 blogs, most of which could be very localised with respect to country, age group, topic, etc. With more classifications, the training data will grow and, hopefully, the results will become more accurate. Having said that, I’m quite satisfied with its accuracy ;-), a few outliers being funny exceptions.
Comment by Logik — November 17, 2008 #
I found one paper that distinguishes between male and female writers by the kinds of words they use. Apparently some words are ‘male’ and some are ‘female’, and it uses a weighting function to decide which set dominates.
Comment by wanderlust — November 17, 2008 #
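The word-weighting idea in the comment above can be sketched roughly as follows. The paper isn’t named, so the word lists and weights below are entirely made up for illustration and are not the paper’s. Each text gets a ‘male’ score and a ‘female’ score (weight × word count, summed), and the larger one wins.

```python
import re
from collections import Counter

# Hypothetical word weights, invented for illustration only.
MALE_WEIGHTS = {"the": 2, "a": 1, "number": 3}
FEMALE_WEIGHTS = {"with": 2, "if": 2, "not": 1}


def tokenize(text):
    # Lowercase and split into simple word tokens.
    return re.findall(r"[a-z']+", text.lower())


def gender_scores(text):
    """Return (male_score, female_score) as weighted word-count sums."""
    counts = Counter(tokenize(text))
    male = sum(w * counts[word] for word, w in MALE_WEIGHTS.items())
    female = sum(w * counts[word] for word, w in FEMALE_WEIGHTS.items())
    return male, female


def guess(text):
    """Return the dominant label and its share of the total score."""
    male, female = gender_scores(text)
    total = male + female
    if total == 0:
        return "unknown", 0.0  # no weighted words found at all
    label = "male" if male >= female else "female"
    return label, max(male, female) / total
```

With weights this crude, it’s easy to see how a few common words can tip the verdict, which may go some way towards explaining the Jane Austen result below.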
And when I put some Jane Austen text into that implementation, it gave the wrong answer. Ditto for Agatha Christie and Joanne Rowling.
Comment by wanderlust — November 17, 2008 #
Probably it is yet to account for women’s lib… Results might improve in future versions.
Comment by Logik — November 17, 2008 #
Wow. A sexist program.
Comment by tarantinofan — November 18, 2008 #
[...] Issues and Personality Disorders Thanks to Logik for this one. I needed some [...]
Pingback by Gender Issues and Personality Disorders « Tumble-dried Pillsbury — November 18, 2008 #
It translates ‘nifty’ to गंधा (‘smelly’)?
Comment by wanderlust — November 21, 2008 #
That one was totally off the mark, but translating “ride” to घुड़सवारी (‘horse-riding’) was pretty ‘horsey’.
Comment by Logik — November 21, 2008 #