Catherine Finegan-Dollak

About Me

I am a third-year PhD student at the University of Michigan, where I study natural language processing (NLP) as part of the CLAIR group, supervised by Professor Dragomir Radev.

I received my bachelor's degree from Boston College and my Juris Doctor from the University of Virginia School of Law. I practiced law for several years (have a look at my resume for more details). My favorite part of my job was trying to understand the technology involved in the patent cases I litigated. My least favorite part was document review: reading thousands of documents to figure out which were relevant to a case and to pick out the few needles of useful information in a massive haystack of emails and corporate documents. When I saw IBM's Watson on Jeopardy! in 2011, I could not help but be fascinated, both by its potential to take over the tedious parts of my job and by the apparent ability of a machine to understand language.

In 2013, I left my job as a litigator to study NLP full time. I love my new job. My working life is a mixture of (1) hearing really smart people explain wonderfully clever technology, (2) working on my own research to try to push the boundaries of what technology can do, and (3) teaching undergraduates the fundamental concepts of computer science.

Research Interests

I am interested in semantics: What information is in this document, how can we represent it, and what can we do with that representation? I have been reading about semantic parsing and a variety of meaning representations, as well as machine reading and machine understanding of text. Currently, I am exploring how semantics can be used for automatic summarization.

Like the rest of the NLP community, I am also quite interested in how deep learning can be applied to the problems I work on.

Recent Projects

Strategies for Summarizing Sophisticated Documents

Summarizing sophisticated documents, such as legal cases and scientific journal articles, could help highly trained professionals keep abreast of information relevant to their work. However, documents of this type create special challenges for automatic summarizers. One such challenge is the presence of exceptionally long sentences. These may contain information that belongs in an extractive summary, but including an entire long sentence may crowd out other important information. We therefore evaluated several methods of shortening sentences for summarization.

ISBN Recommender

This summer, I worked as part of an interdisciplinary team building an educational tool for students of comparative literature. The overarching goal of the project was to help students find and understand connections among translated works. The NLP portion of the team collected a corpus of reviews of relevant books and built a recommendation engine based on text similarity between reviews.

Using NLP to Identify Exonerations

In 2013, at least 87 people were exonerated after being imprisoned for crimes they did not commit. Courts and prosecutors do not report exonerations to any central authority, so the National Registry of Exonerations tries to identify stories of exonerations on the web, in order to assemble a database of information about these cases. We are working on a system to help automate the process. Work so far has focused on information retrieval strategies to help find relevant news stories and named entity recognition to help extract the names of potential exonerees from such stories; future work should combine these pieces with a classifier to help find the names most likely to belong to exonerees.

Class Projects

Is There an Elephant in the Room? Identifying Objects in Cartoon Images from Captions (undertaken as part of a course on natural language processing): We explored methods of determining what objects appear in a New Yorker cartoon image based on words included in proposed captions for the image.

Improving Brain-Computer Interface Accuracy Through Latency Compensation (undertaken as part of a course on machine learning): We examined machine-learning methods to interpret delayed or noisy EEG signals to improve brain-computer interfaces for people with severe disabilities.


For the Fall 2015 semester, I am a Graduate Student Instructor (GSI) for EECS 595 / SI 561 / LING 541 - Natural Language Processing. In the past, I have been a GSI for EECS 484, Database Management Systems, and EECS 280, Programming and Introductory Data Structures.


EECS 595: Natural Language Processing
EECS 543: Knowledge-Based Systems
EECS 445: Intro to Machine Learning
EECS 599: Directed Study
EECS 484: Database Management Systems
EECS 492: Intro to Artificial Intelligence
EECS 482: Intro to Operating Systems


Office: 3861 BBB
2260 Hayward St.
Ann Arbor, MI 48109