SI 503 Winter 2007: Search and Retrieval



Winter 2007:




USB 1230

Office hour:
Tues. 3-4pm
3082 West Hall

503 spam challenge


This challenge was inspired by an SEO contest that I read about in the lecture notes accompanying Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2007.

More interesting than the challenge itself (you can see the original below) is how it was won:

Cosine similarity entries
PageRank results


(No longer active.) This spam challenge was part of the third assignment for SI 503 W07.

There are two parts to this challenge. First, you are trying to get the best cosine similarity score for the following query:

why did the discombobulated shepherd take a circumbendibus

You have a limit of 400 characters, and you may only paste in complete sentences that you found on the web. That is, if we Google your sentences, we expect to find them out there, and not because you just put them in your blog :). In other words, the sentences must not have been planted online by you. On the backend, the "ladamic" search engine will do simple TF-IDF weighting (with case folding and punctuation removal, but no stemming) using document frequencies sampled from the web, and then take the cosine between the above query and your text.

You only have one shot at submitting the text, and if you are the highest scoring document, you will get an extra 10 points (out of 100) on the assignment.
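To get a feel for how such a score behaves, here is a minimal Python sketch of TF-IDF cosine scoring with case folding and punctuation removal but no stemming. It is not the actual "ladamic" backend: the document frequencies in DOC_FREQ, the collection size N_DOCS, the raw-count tf times log(N/df) weighting, and the treatment of unseen words as having document frequency 1 are all assumptions made up for illustration.

```python
import math
import string
from collections import Counter

# Hypothetical document frequencies (NOT the real web sample used by the engine).
N_DOCS = 1000
DOC_FREQ = {
    "why": 400, "did": 500, "the": 990, "a": 980, "take": 300,
    "shepherd": 20, "discombobulated": 2, "circumbendibus": 1,
}

QUERY = "why did the discombobulated shepherd take a circumbendibus"


def tokenize(text):
    """Fold case and strip ASCII punctuation; no stemming."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def tfidf_vector(text):
    """Map each term to tf * log(N / df); unseen terms get df = 1 (assumption)."""
    counts = Counter(tokenize(text))
    return {
        term: tf * math.log(N_DOCS / DOC_FREQ.get(term, 1))
        for term, tf in counts.items()
    }


def cosine(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score(document, query=QUERY):
    """Cosine similarity between a candidate document and the query."""
    return cosine(tfidf_vector(document), tfidf_vector(query))
```

Under these assumed weights, a sentence that contains the rare query words (such as "circumbendibus") scores far higher than one that only shares common words like "the" or "a", and every extra non-query word lengthens the document vector and lowers the cosine.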

Your second task is to try to get the highest PageRank, because this will in principle assure you a high position among the search results (though for the purposes of this assignment, we are keeping your textual matching score and your PageRank separate). You can select up to two people to link to.

If you have the highest PageRank, you will get an extra 20 points on this assignment, which you may share with other people.
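For intuition about the second part, here is a minimal Python sketch of PageRank by power iteration on a small link graph in which each person links to at most two others. The challenge page does not say how the scores were actually computed, so the damping factor of 0.85, the even spreading of rank from people who link to nobody, and the example uniqnames are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank.

    links maps each node to the list of nodes it links to.
    damping=0.85 is a conventional choice, assumed here.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node starts each round with the random-jump share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly among the nodes it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes (assumption).
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical uniqnames; each person links to at most two others.
example_links = {
    "alice": ["carol"],
    "bob": ["carol"],
    "carol": ["alice"],
    "dave": ["carol", "alice"],
}
example_ranks = pagerank(example_links)
```

In this toy graph "carol" collects the most incoming links and ends up with the highest score, and "alice" comes a close second because the highly ranked "carol" links only to her. That is why the choice of whom to link to matters as much as who links to you.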

your uniqname:
full name:

Textual match challenge
careful: you may be able to enter more than 400 characters - it's up to you to make sure you do not exceed the limit

Your document:

PageRank challenge
first uniqname to link to
second uniqname to link to