
503 spam challenge
This challenge was inspired by an SEO contest I read about in the lecture notes accompanying Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2007.
More interesting than the challenge itself (the original is reproduced below) is how it was won:
(No longer active.) This spam challenge was part of the third assignment for SI 503 W07.
There are two parts to this challenge. First, you are trying to get the highest cosine similarity score for the following query:
why did the discombobulated shepherd take a circumbendibus
You have a limit of 400 characters, and you may only paste in complete sentences that you found on the web. That is, if we Google your sentences, we expect to find them out there, and not because you just put them on your blog :). In other words, the sentences must not have been planted online by you. On the backend, the "ladamic" search engine will compute simple TF-IDF weights (with case folding and punctuation removal, but no stemming) using document frequencies sampled from the web, and then take the cosine similarity between the above query and your text.
You get only one shot at submitting the text, and if yours is the highest-scoring document, you will get an extra 10 points (out of 100) on the assignment.
Your second task is to try to get the highest PageRank, since in principle this would ensure a high position among the search results (though for the purposes of this assignment, we are keeping your textual matching score and your PageRank separate). You can select up to two people to link to.
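To make the scoring concrete, here is a minimal sketch of TF-IDF with cosine similarity under the stated rules (case folding, punctuation removal, no stemming). It is not the actual "ladamic" backend: the weighting (raw term frequency times log(N/df)), the treatment of unseen terms (df = 1), and the document-frequency table `HYPOTHETICAL_DF` with corpus size `NUM_DOCS` are all invented assumptions for illustration, not the web-sampled frequencies the course used.

```python
import math
import re
from collections import Counter

# Invented document frequencies for illustration only (NOT real web statistics).
NUM_DOCS = 1_000_000
HYPOTHETICAL_DF = {
    "why": 120_000, "did": 150_000, "the": 950_000, "discombobulated": 40,
    "shepherd": 2_000, "take": 200_000, "a": 960_000, "circumbendibus": 5,
}
QUERY = "why did the discombobulated shepherd take a circumbendibus"


def tokenize(text):
    """Case-fold and replace punctuation with spaces; no stemming is applied."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return text.split()


def tfidf_vector(text, doc_freq, num_docs):
    """Raw term frequency times log(N / df); unseen terms assumed to have df = 1."""
    counts = Counter(tokenize(text))
    return {term: tf * math.log(num_docs / doc_freq.get(term, 1))
            for term, tf in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def score(query, submission, doc_freq, num_docs, limit=400):
    """Score a submission against the query, enforcing the 400-character limit."""
    if len(submission) > limit:
        raise ValueError(f"submission exceeds {limit} characters")
    return cosine(tfidf_vector(query, doc_freq, num_docs),
                  tfidf_vector(submission, doc_freq, num_docs))


if __name__ == "__main__":
    print(score(QUERY, "The SHEPHERD!", HYPOTHETICAL_DF, NUM_DOCS))
    print(score(QUERY, "Shepherds wander.", HYPOTHETICAL_DF, NUM_DOCS))
```

Because of case folding, "SHEPHERD!" matches "shepherd", but without stemming "Shepherds" earns nothing. Under this weighting, rare query words such as "circumbendibus" carry most of the query vector's weight, and since cosine normalizes by length, unrelated words in a submission dilute its score.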
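The PageRank part can be explored with a small power-iteration sketch, assuming each participant links to at most two others. The damping factor of 0.85, the even redistribution of rank from participants who link to nobody, and the names `alice`, `bob` and `carol` are assumptions for illustration; the course's actual PageRank settings are not stated here.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict of source -> list of link targets.

    Assumed settings: damping 0.85, and pages with no out-links ("dangling")
    spread their rank evenly over all pages.
    """
    for source, targets in links.items():
        if len(set(targets)) > 2:
            raise ValueError(f"{source} links to more than two people")
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Teleportation share every node receives.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        # Rank held by dangling nodes is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        for node in nodes:
            new_rank[node] += damping * dangling / n
        # Each node splits its rank equally among its distinct link targets.
        for source, targets in links.items():
            unique_targets = set(targets)
            if not unique_targets:
                continue
            share = damping * rank[source] / len(unique_targets)
            for target in unique_targets:
                new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical participants: alice and bob both link to carol.
    ranks = pagerank({"alice": ["carol"], "bob": ["carol"], "carol": ["alice"]})
    for name, value in sorted(ranks.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {value:.4f}")
```

In this toy graph carol collects links from both others and ends up with the highest rank, while bob, who receives no links, keeps only the teleportation share. This is why, with only two outgoing links each, coordinating who links to whom matters more than the links you give out.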
If you have the highest PageRank, you will get an extra 20 points on this assignment, which you may share with other people.