Current Projects
Morphological Inference
Drawing on current work on unsupervised morphological inference from monolingual text, I am investigating methods for performing morphological inference from bitext. Assuming that one half of the bitext is in English, it is possible to parse the English and transfer the analysis to the foreign text via statistical word alignment. This approach allows both for improved morpheme segmentation and for glossing of the foreign morphemes, which monolingual approaches cannot provide.
This work is reported in my dissertation.
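As a rough illustration of the transfer step only, and not the method from the dissertation, the sketch below projects English-side labels onto foreign words through alignment links. It assumes the English analysis and the word alignment have already been produced by other tools; the function name project_glosses, the "dog+PL" label format, and the toy Spanish sentence are all made up for the example.

```python
from collections import defaultdict


def project_glosses(english_labels, foreign_tokens, alignment):
    """Attach English-side labels to foreign tokens via alignment links.

    english_labels: one label per English token (hypothetical format,
                    e.g. "dog+PL" for lemma plus morphological tag).
    foreign_tokens: the tokens of the foreign sentence.
    alignment:      iterable of (english_index, foreign_index) pairs,
                    assumed to come from a statistical word aligner.
    """
    projected = defaultdict(list)
    # Sort so labels reach each foreign token in English word order.
    for e_idx, f_idx in sorted(alignment):
        projected[f_idx].append(english_labels[e_idx])
    # Unaligned foreign tokens keep an empty label list.
    return [(tok, projected.get(i, [])) for i, tok in enumerate(foreign_tokens)]


if __name__ == "__main__":
    # Toy data, invented for illustration: "the dogs run" / "los perros corren".
    labels = ["the", "dog+PL", "run"]
    foreign = ["los", "perros", "corren"]
    links = [(0, 0), (1, 1), (2, 2)]
    for token, glosses in project_glosses(labels, foreign, links):
        print(token, glosses)
```

A real system would also have to handle many-to-many links and noisy alignments, and would need a further step to map the projected labels onto individual morphemes within each foreign word; the sketch stops at the word level.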
Bitext Extraction from Linguistic Documents
One significant, yet underutilized, source of digital linguistic data is traditional print sources, including grammars, lexicons, and texts. In this line of research, I am exploring methods for automatically extracting spans of foreign-language text and their corresponding glosses. Challenges include dealing with noisy OCR and performing language identification at the word level.
This work is reported in my dissertation.