Parallelizing ListNet Training using Spark
As training datasets for machine learning algorithms grow, the scalability of learning has become increasingly important for continued improvements in ranking accuracy. As part of a class project in my Concepts of Information Retrieval course, I wrote a parallel implementation of ListNet, a learning-to-rank algorithm. I demonstrated how the training time of machine learning algorithms can be reduced by exploiting the parallelism of distributed cluster computing systems such as Spark. I will be presenting a poster based on this work at SIGIR 2012.
[report | code]
Twitter Sentiment Analysis
With the advent of Web 2.0, there has been an upsurge in the amount of user-generated content. Sentiment analysis attempts to identify the viewpoint expressed in a text span. As part of a group project in my Data Intensive Computing for Text Analysis course, we used machine learning techniques to perform sentiment analysis on a Twitter dataset and assess people's opinion of President Obama's job performance. We saw that semi-supervised learning approaches performed best. We also learned that tweets may be a good indicator of immediate reactions to events such as policy decisions, but they are a poor reflection of the President's long-term job approval.
[report | code]
Interaction Design for an eNotice Board
Physical cork boards have traditionally been a good way to advertise. However, they come with certain problems, such as managing postings, limited space, and the lack of a stable store for information that a user wants to collect. Despite these issues, cork boards offer some unique attributes, such as locality-based advertising and a social point of interaction. As part of a group project in my Human Computer Interaction course, we modeled the cork board metaphor in a digital interface that captures the physical aspects of a cork board and combines them with the capabilities of an electronic display board. The project involved designing interactions through which users could pull content from and push content to the board.
[report | code]