Vahed Qazvinian


  • Introduction: C-LexRank is a summarization system that leverages the diversity of perspectives in a set of documents written about the same subject (e.g., the set of citation sentences that refer to a specific paper).

  • Description: C-LexRank first builds a similarity network in which documents are represented as nodes and undirected edges are weighted by the cosine similarity between the corresponding node pairs. It then applies Newman's network community detection method, a hierarchical agglomerative algorithm that greedily optimizes network modularity, to this network. Finally, it computes LexRank within each cluster to find the most salient sentences of each community, and picks sentences from different clusters in order of their salience (see Qazvinian & Radev 2008 for more details).
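
    The three steps above can be sketched in a few lines of Python. This is only an illustrative approximation, not the distributed Perl code and not the CLAIRLIB API: all function names are hypothetical, plain term-frequency vectors stand in for whatever weighting the real system uses, the modularity step is a naive version of greedy agglomeration, and the damping value and the largest-cluster-first ordering are assumptions.

```python
import math
import re
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_network(sentences):
    """Symmetric edge-weight matrix (assumption: raw term frequency, no IDF)."""
    vecs = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]
    n = len(vecs)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = cosine(vecs[i], vecs[j])
    return w


def greedy_modularity(w):
    """Start from singletons and repeatedly merge the pair of communities
    with the largest positive modularity gain (naive, unoptimized)."""
    comms = [[i] for i in range(len(w))]
    total = sum(map(sum, w))  # twice the total edge weight
    if total == 0:
        return comms
    while len(comms) > 1:
        best_gain, best_pair = 0.0, None
        for x in range(len(comms)):
            for y in range(x + 1, len(comms)):
                e_xy = sum(w[i][j] for i in comms[x] for j in comms[y]) / total
                a_x = sum(sum(w[i]) for i in comms[x]) / total
                a_y = sum(sum(w[i]) for i in comms[y]) / total
                gain = 2 * (e_xy - a_x * a_y)
                if gain > best_gain:
                    best_gain, best_pair = gain, (x, y)
        if best_pair is None:  # no merge improves modularity
            break
        x, y = best_pair
        comms[x] += comms.pop(y)
    return comms


def lexrank(w, nodes, damping=0.85, iters=50):
    """Power iteration over the row-normalized sub-network induced by `nodes`."""
    k = len(nodes)
    out = {j: sum(w[j][l] for l in nodes) for j in nodes}
    score = {i: 1.0 / k for i in nodes}
    for _ in range(iters):
        score = {
            i: (1 - damping) / k
            + damping * sum(w[j][i] / out[j] * score[j] for j in nodes if out[j])
            for i in nodes
        }
    return score


def c_lexrank(sentences, limit):
    """Pick up to `limit` sentences, cycling over clusters by salience rank."""
    w = similarity_network(sentences)
    clusters = sorted(greedy_modularity(w), key=len, reverse=True)
    ranked = []
    for c in clusters:
        scores = lexrank(w, c)
        ranked.append(sorted(c, key=lambda i: -scores[i]))
    picked, rank = [], 0
    while len(picked) < min(limit, len(sentences)):
        for r in ranked:
            if rank < len(r) and len(picked) < limit:
                picked.append(r[rank])
        rank += 1
    return [sentences[i] for i in picked]
```

    Calling c_lexrank(sentences, 2) on a list of citation sentences returns at most two sentences, taken from different communities when more than one community is found.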

  • Download: For academic use only: C-LexRank.v1.0.tar.gz

  • Dependencies: This code depends on the vector space models and community detection methods implemented in CLAIRLIB (http://clairlib.org).

  • Usage: To run the code, call the following Perl command:
     %perl C-LexRank.pl limit input
    Here, limit is the summary length in sentences, and input is a list file with one document per line.

  • Please cite this paper when you use this code:
      author    = {Qazvinian, Vahed  and  Radev, Dragomir R.},
      title     = {Scientific Paper Summarization Using Citation Summary Networks},
      booktitle = {Proceedings of the 22nd International Conference 
                           on Computational Linguistics (Coling 2008)},
      month     = {August},
      year      = {2008},
      address   = {Manchester, UK},
      publisher = {Coling 2008 Organizing Committee},
      pages     = {689--696},