I received my PhD in Computer Science from the University of Michigan, Ann Arbor, under Prof. Dragomir Radev, and have been working at Microsoft as an Applied Scientist since June 2015. You can find my complete resume here.
I specialize in Natural Language Processing (NLP); my thesis research focused on analyzing and summarizing the text of scientific papers. You can find more details about my work in my thesis and the other publications linked below.
I also maintain a blog here where I post bits of creative writing.
NLP Driven Models for Automatically Generating Survey Articles for Scientific Topics.
This thesis presents new methods that use natural language processing (NLP) driven models to summarize research in scientific fields. Given a topic query in the form of a text string, we present methods for finding research articles relevant to the topic, as well as summarization algorithms that use the lexical and discourse information in the text of these articles to generate coherent, readable extractive summaries of past research on the topic. In addition to summarizing prior research, good survey articles should also forecast future trends. With this motivation, we present work on forecasting the future impact of scientific publications using NLP driven features.
Single-spaced version
- Rahul Jha, Amjad Abu-Jbara, Vahed Qazvinian and Dragomir Radev. NLP Driven Citation Analysis for Scientometrics. Journal of Natural Language Engineering, 2016. (PDF)
- Rahul Jha, Catherine Finegan-Dollak, Ben King, Reed Coke and Dragomir Radev. Content Models for Survey Generation: A Factoid-Based Evaluation. Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2015. (PDF)
- Kokil Jaidka, Muthu Kumar Chandrasekaran, Rahul Jha, Christopher Jones, Min-Yen Kan, Ankur Khanna, Diego Molla-Aliod, Dragomir R. Radev, Francesco Ronzano, and Horacio Saggion. The computational linguistics summarization pilot task. In Proceedings of TAC, 2014. (PDF)
- Rahul Jha, Reed Coke and Dragomir Radev. Surveyor: A system for generating coherent survey articles for scientific topics. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015. (PDF)
- Rahul Jha, Amjad Abu-Jbara and Dragomir Radev. A system for summarizing scientific topics starting from keywords. In Proceedings of The Association for Computational Linguistics (short paper), 2013. (PDF, Data)
- Benjamin King, Rahul Jha, and Dragomir Radev. Heterogeneous Networks and Their Applications: Scientometrics, Name Disambiguation, and Topic Modeling. In Transactions of the Association for Computational Linguistics, 2014. (PDF)
- Benjamin King, Rahul Jha, and Dragomir Radev. Random walk factoid annotation for collective discourse. In Proceedings of The Association for Computational Linguistics (short paper), 2013.
- Amjad Abu-Jbara, Rahul Jha, Eric Morley, and Dragomir Radev. Experimental results on the native language identification shared task. In Proceedings of The NAACL 2013 Workshop on Native Language Identification, 2013.
- Rahul Jha and Dragomir Radev. An unsupervised method for learning probabilistic first order logic models from unstructured clinical text. In ICML Workshop on Learning from Unstructured Clinical Text, Bellevue, Washington, USA, July 2nd, 2011. (PDF)
- Ahmed Hassan, Amjad Abu-Jbara, Rahul Jha and Dragomir Radev. Identifying the semantic orientation of foreign words. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, Oregon, USA, June 19-24, 2011. (PDF)
- Carroll, a logic form generator (GitHub project)
- I re-implemented the part-of-speech (POS) tagging and named entity recognition (NER) models from the "NLP from Scratch" paper by Collobert et al. (2011). You can find the two scripts I used here. They are built on top of Collobert's Torch library, which is available at http://torch.ch. You will need to obtain the data yourself, but hopefully these scripts can help you get started.