Datasets

The ACL Anthology Network (AAN) [Download]
- The AAN corpus includes three networks (paper citation, author citation, and author collaboration) constructed from the ACL Anthology data. It also includes abstracts, full texts, and citation sentences of the ACL Anthology papers.

Diversity in Collective Discourse
- 25 sets of citations and 25 sets of news headlines.
- Each dataset has a "*.txt" file with one summary per line and a "*.ann" file whose lines have the following format: < factoid id > < tab > < nugget >
- To detect which nuggets/facts a citation contains, one should perform basic string matching.
- For an extensive analysis, see (Qazvinian and Radev 2011).
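
The "*.ann" format described above can be read with a few lines of Python. This is only a minimal sketch: it assumes that "< tab >" stands for a literal tab character, that several lines may share the same factoid id (one nugget per line), and that blank or tab-less lines can be skipped. The function name parse_ann is hypothetical and is not part of the dataset release.

```python
from collections import defaultdict


def parse_ann(lines):
    """Group nuggets by factoid id from lines shaped like "<id>\\t<nugget>".

    Assumption: the separator is a single literal tab character.
    """
    nuggets_by_id = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        factoid_id, sep, nugget = line.partition("\t")
        if not sep:
            continue  # no tab found: not in the assumed format
        nuggets_by_id[factoid_id.strip()].append(nugget.strip())
    return dict(nuggets_by_id)
```

To read a file from disk, the open file handle can be passed directly, for example parse_ann(open(path, encoding="utf-8")), where path points to one of the "*.ann" files.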

Single Paper Summarization (Release 2010)
- Citations to 25 highly cited papers from 5 different domains: Text Summarization, Question Answering, Machine Translation, Textual Entailment, and Dependency Parsing.
- Each dataset has a "*.txt" file with one citation per line and a "*.ann" file whose lines have the following format: < fact id > < tab > < nugget >
- To detect which nuggets/facts a citation contains, one should perform basic string matching.
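
The string-matching step could look roughly like the sketch below. It assumes that "basic string matching" means substring search and, as an additional assumption not stated in the release, that matching is case-insensitive. The function name facts_in_citation and the example data are hypothetical and serve only as illustration.

```python
def facts_in_citation(citation, nuggets_by_fact):
    """Return the ids of all facts with at least one nugget found in the citation.

    nuggets_by_fact maps a fact id to a list of nugget strings.
    Assumption: case-insensitive substring matching.
    """
    text = citation.lower()
    return {
        fact_id
        for fact_id, nuggets in nuggets_by_fact.items()
        # Empty nuggets are ignored because they would match every citation.
        if any(nugget and nugget.lower() in text for nugget in nuggets)
    }


# Hypothetical data, not taken from the dataset.
example_nuggets = {"f1": ["graph-based"], "f2": ["maximum spanning tree"]}
example_citation = "They propose a Graph-Based method for dependency parsing."
matched = facts_in_citation(example_citation, example_nuggets)
```

Applying the function to every line of a "*.txt" file gives one set of fact ids per citation.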