Current Projects

Morphological Inference

Drawing on current work on unsupervised morphological inference from monolingual text, I am investigating methods for performing morphological inference from bitext. Assuming that one half of the bitext is in English, it is possible to parse the English and transfer the analysis to the foreign text via statistical word alignment. This approach allows both for improved morpheme segmentation and for glossing of the foreign morphemes, something which is not possible in monolingual approaches.

This work is reported in my dissertation.
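As a rough illustration of the projection idea only (not the method reported in the dissertation), the toy sketch below carries English morpheme labels across a word alignment onto the foreign tokens. Everything in it is an assumption made for the example: the function name `project_glosses`, the representation of the English analysis as per-word morpheme lists, the alignment as (English index, foreign index) pairs, and the Spanish toy sentence.

```python
from collections import defaultdict


def project_glosses(english_analysis, alignment):
    """Collect, for each foreign token index, the English morphemes aligned to it.

    english_analysis: list of morpheme lists, one per English word (hypothetical format).
    alignment: iterable of (english_index, foreign_index) pairs (hypothetical format).
    """
    projected = defaultdict(list)
    for en_idx, fr_idx in alignment:
        projected[fr_idx].extend(english_analysis[en_idx])
    return dict(projected)


# Hypothetical toy data; a real system would obtain these from an English
# morphological analyzer and a statistical word aligner.
english_analysis = [["the"], ["dog", "PL"], ["walk", "PAST"]]
foreign_tokens = ["perros", "caminaron"]
alignment = [(0, 0), (1, 0), (2, 1)]

glosses = project_glosses(english_analysis, alignment)
for idx, token in enumerate(foreign_tokens):
    # Unaligned foreign tokens simply receive an empty gloss list.
    print(token, glosses.get(idx, []))
```

The projected labels give a candidate gloss for each foreign word; deciding which substring of the foreign word realizes which label is the segmentation problem itself and is not attempted here.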

Bitext Extraction from Linguistic Documents

Traditional print sources, including grammars, lexicons, and texts, are a significant yet underutilized source of digital linguistic data. In this line of research, I am exploring methods for automatically extracting spans of foreign-language text and their corresponding glosses. Challenges include dealing with noisy OCR and performing language identification at the word level.

This work is reported in my dissertation.

Past Projects

Probabilistic Sound Change Reconstruction

Given a probability distribution over possible sound changes, it is possible to evaluate the likelihoods of competing reconstruction hypotheses. I investigated ways of defining this probability distribution, as well as ways of using these probabilities to drive a search over possible reconstructions and phylogenies.

This project was the basis for my QRP, the paper that is required for advancement to candidacy in the Linguistics department at U-M.
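To make the likelihood comparison concrete, here is a minimal sketch under strong simplifying assumptions that are not taken from the project itself: forms have equal length and align segment by segment, each segment changes independently, and the probability table `CHANGE_PROB` is invented for illustration. The function name `log_likelihood` is likewise hypothetical.

```python
import math

# Hypothetical probabilities of a proto-segment surfacing as a given reflex.
CHANGE_PROB = {
    ("p", "p"): 0.7, ("p", "f"): 0.3,
    ("f", "f"): 0.99, ("f", "p"): 0.01,
    ("a", "a"): 1.0,
}


def log_likelihood(proto, reflexes):
    """Log-probability of the attested reflexes given one proto-form hypothesis.

    Assumes segment-by-segment alignment and independent changes (toy model).
    """
    total = 0.0
    for reflex in reflexes:
        for src, dst in zip(proto, reflex):
            prob = CHANGE_PROB.get((src, dst), 0.0)
            if prob == 0.0:
                # A change outside the table rules the hypothesis out.
                return float("-inf")
            total += math.log(prob)
    return total


# Attested forms in three hypothetical daughter languages.
reflexes = ["pa", "fa", "fa"]

# Score two competing reconstructions and keep the more likely one.
scores = {hyp: log_likelihood(hyp, reflexes) for hyp in ["pa", "fa"]}
best = max(scores, key=scores.get)
print(scores, best)
```

With these invented numbers the reconstruction with *p is preferred even though most daughters show f, because the table makes p > f far more probable than f > p. A search over reconstructions and phylogenies would need a much richer model than this flat comparison.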

A Semantic Corpus of Baseball

With Ezra Keshet and Stephen Tyndall, I worked on the development of a corpus of semantically annotated transcripts of radio broadcasts of baseball games. The resulting corpus pairs naturally occurring English sentences, as spoken by the commentators, with formal representations of in-game events (balls, strikes, hits, outs, stolen bases, etc.).

This work was presented as a poster and short paper at the 9th International Workshop on Computational Semantics:

  • Ezra Keshet, Terry Szymanski, and Stephen Tyndall. 2011. BALLGAME: A Corpus for Computational Semantics. Proceedings of IWCS 9. [PDF]