Catherine Finegan-Dollak

About Me

I am a PhD candidate at the University of Michigan, where I study natural language processing (NLP) as part of the CLAIR group, supervised by Dragomir Radev.

I received my bachelor's degree from Boston College and my Juris Doctor from the University of Virginia School of Law. I practiced law for several years. My favorite part of the job was trying to understand the technology involved in the patent cases I litigated. My least favorite part was document review: reading thousands of documents to figure out which were relevant to a case and to pick out the few needles of useful information in a massive haystack of emails and corporate documents. When I saw IBM's Watson on Jeopardy! in 2011, I could not help but be fascinated, both by its potential to take over the tedious parts of my job and by the apparent ability of a machine to understand language.

In 2013, I left my job as a litigator to study NLP full time. I love my job. My working life is a mixture of (1) hearing really smart people explain wonderfully clever technology, (2) working on my own research to try to push the boundaries of what technology can do, and (3) teaching undergraduates the fundamental concepts of computer science.

Research Interests

I am interested in semantics: What information is in this text, how can we represent it, and what can we do with that representation? Like the rest of the NLP community, I am also quite interested in how deep learning can be applied to the problems I work on.

Recent Projects

Neural Models for Text to SQL

How can we represent the meaning of a question as a query to a SQL database? State-of-the-art systems have applied a sequence-to-sequence machine translation model to this problem. Can we improve upon this using knowledge about the database the user is querying? What about by looking at the context of the question? [Ongoing work]

Effects of Text Corpus Properties on Short Text Clustering Performance

Clustering similar short texts is a two-step process: first, measure the similarity between each pair of texts; second, use that information to group like texts together. We found that the type of corpus influences the best method to use for each step, and that the choice of method for the first step can have an enormous impact on the second step. For example, linguistically creative datasets may benefit from semantic similarity metrics using word embeddings and deep learning, but these metrics make the use of some sophisticated clustering algorithms impracticable.
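As a rough illustration of the two-step process described above, here is a minimal Python sketch. It is not the method evaluated in the paper: the token-overlap (Jaccard) similarity, the threshold-based grouping, the function names, and the example texts are all assumptions chosen to keep the example small and dependency-free.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two short texts (0.0 to 1.0)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def pairwise_similarity(texts):
    """Step 1: score every unordered pair of texts."""
    return {(i, j): jaccard(texts[i], texts[j])
            for i, j in combinations(range(len(texts)), 2)}


def cluster(texts, threshold=0.5):
    """Step 2: link pairs at or above the threshold, return connected groups."""
    parent = list(range(len(texts)))

    def find(x):
        # Follow parent pointers to the group representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), sim in pairwise_similarity(texts).items():
        if sim >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for idx in range(len(texts)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())


# Hypothetical example texts, not drawn from any dataset in the paper.
texts = [
    "how do I reset my password",
    "reset my password please",
    "best pizza in ann arbor",
    "ann arbor pizza places",
]
clusters = cluster(texts, threshold=0.4)
print(clusters)
```

Swapping `jaccard` for an embedding-based metric would change only step 1, but as noted above, that choice can constrain which clustering algorithms remain practical in step 2.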

Strategies for Summarizing Sophisticated Documents

Summarizing sophisticated documents, such as legal cases and scientific journal articles, has the potential to help highly trained professionals keep abreast of information that can help them be better at their jobs. However, this type of document creates special challenges for automatic document summarizers. One such challenge is the presence of exceptionally long sentences. These may contain information that belongs in an extractive summary, but including the entire sentence may crowd out important information. We therefore evaluated several methods of shortening sentences for summarization.


Computer science is a fantastic subject to study and a useful skill for many careers, yet it's one that very few women pursue. Interventions at other colleges and universities have shown that this does not have to be the case. Helping young women discover how much fun this field can be is a cause that means a lot to me. I helped found the UMich CS KickStart program, which provides a week-long computer camp for incoming freshman women, introducing them to programming, the CS department, and a cohort of other women with similar interests. I also volunteer to teach for CS KickStart, Girls Encoded, and similar programs.


I have been a Graduate Student Instructor (GSI) for the following courses:


EECS 595: Natural Language Processing
EECS 543: Knowledge-Based Systems
EECS 445: Intro to Machine Learning
EECS 592: Advanced Artificial Intelligence
EECS 599: Directed Study
EECS 586: Design and Analysis of Algorithms
EECS 484: Database Management Systems
EECS 492: Intro to Artificial Intelligence
EECS 482: Intro to Operating Systems


Office: 3861 BBB
2260 Hayward St.
Ann Arbor, MI 48109