This page contains links to some network data sets I've compiled over the
years. All of these are free for scientific use to the best of my
knowledge, meaning that the original authors have already made the data
freely available, or that I have consulted the authors and received
permission to post the data here, or that the data are mine. If you
make use of any of these data, please cite the original sources.
The data sets are in GML format. For a description of GML see here.
GML can be read by many network analysis packages, including Gephi and Cytoscape. I've written a simple
parser in C that will read the files into a data structure. It's available
here. There are many features of GML not
supported by this parser, but it will read the files in this repository
just fine. There is a Python parser for GML available as part of the
NetworkX package here and
another in the igraph package,
which can be used from C, Python, or R. If you know of or develop other
software (Java, C++, Perl, R, Matlab, etc.) that reads GML, let me know.
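To give a feel for what a minimal reader involves, here is a rough Python sketch of a parser for the small subset of GML used by simple files: nested key/value lists, quoted strings, numbers, and `node`/`edge` blocks inside a `graph` block. It is not the C parser mentioned above, and the function names, the returned dictionary layout, and the `SAMPLE` text are all made up for illustration. For real work, the NetworkX or igraph readers are the better choice.

```python
import re

# A token is a quoted string, a bracket, or a run of other non-space characters.
_TOKEN = re.compile(r'"[^"]*"|\[|\]|[^\s\[\]]+')


def _parse_value(tok):
    """Convert a scalar token to str, int, or float."""
    if tok.startswith('"'):
        return tok[1:-1]
    try:
        return int(tok)
    except ValueError:
        try:
            return float(tok)
        except ValueError:
            return tok


def _parse_list(tokens, pos):
    """Parse key/value pairs until a closing bracket or the end of input."""
    items = []
    while pos < len(tokens) and tokens[pos] != "]":
        key = tokens[pos]
        pos += 1
        if tokens[pos] == "[":
            value, pos = _parse_list(tokens, pos + 1)
            pos += 1  # step over the closing "]"
        else:
            value = _parse_value(tokens[pos])
            pos += 1
        items.append((key, value))
    return items, pos


def read_gml(text):
    """Return {'directed': bool, 'nodes': {id: attrs}, 'edges': [(s, t, attrs)]}."""
    # Drop comment lines, which start with '#'.
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith("#")]
    tokens = _TOKEN.findall("\n".join(lines))
    top, _ = _parse_list(tokens, 0)
    graph_items = next(value for key, value in top if key == "graph")
    nodes, edges, directed = {}, [], False
    for key, value in graph_items:
        if key == "directed":
            directed = bool(value)
        elif key == "node":
            attrs = dict(value)
            nodes[attrs.pop("id")] = attrs
        elif key == "edge":
            attrs = dict(value)
            edges.append((attrs.pop("source"), attrs.pop("target"), attrs))
    return {"directed": directed, "nodes": nodes, "edges": edges}


# Hypothetical three-node example, not one of the data sets on this page.
SAMPLE = '''
Creator "hypothetical example"
graph [
  directed 0
  node [ id 1 label "a" ]
  node [ id 2 label "b" ]
  node [ id 3 label "c" ]
  edge [ source 1 target 2 ]
  edge [ source 2 target 3 value 2.5 ]
]
'''

g = read_gml(SAMPLE)
print(len(g["nodes"]), "nodes,", len(g["edges"]), "edges")
```

Like any minimal parser, this sketch ignores many features of GML (for instance escaped characters in strings), so treat it as a starting point only.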
Zachary's karate club: social network of
friendships between 34 members of a karate club at a US university in the
1970s. Please cite W. W. Zachary, An information flow model for conflict
and fission in small groups, Journal of Anthropological Research 33, 452-473 (1977).
Les Miserables: coappearance network of
characters in the novel Les Miserables. Please cite D. E. Knuth,
The Stanford GraphBase: A Platform for Combinatorial Computing,
Addison-Wesley, Reading, MA (1993).
Word adjacencies: adjacency network of common
adjectives and nouns in the novel David Copperfield by Charles
Dickens. Please cite M. E. J. Newman, Phys. Rev. E 74,
American College football: network of
American football games between Division IA colleges during regular season
Fall 2000. Please cite M. Girvan and M. E. J. Newman,
Proc. Natl. Acad. Sci. USA 99, 7821-7826 (2002).
Dolphin social network: an undirected social
network of frequent associations between 62 dolphins in a community living
off Doubtful Sound, New Zealand. Please cite D. Lusseau, K. Schneider,
O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, Behavioral
Ecology and Sociobiology 54, 396-405 (2003). Thanks to David
Lusseau for permission to post these data on this web site.
Political blogs: A directed network of
hyperlinks between weblogs on US politics, recorded in 2005 by Adamic and
Glance. Please cite L. A. Adamic and N. Glance, "The political blogosphere
and the 2004 US Election", in Proceedings of the WWW-2005 Workshop on the
Weblogging Ecosystem (2005). Thanks to Lada Adamic for permission to post
these data on this web site.
Books about US politics: A network of books
about US politics published around the time of the 2004 presidential
election and sold by the online bookseller Amazon.com. Edges between books
represent frequent copurchasing of books by the same buyers. The network
was compiled by V. Krebs and is unpublished, but can be found on Krebs' web site. Thanks to Valdis Krebs for
permission to post these data on this web site.
Neural network: A directed, weighted
network representing the neural network of C. elegans. Data compiled by
D. Watts and S. Strogatz and made available on the web here. Please cite
D. J. Watts and S. H. Strogatz, Nature 393, 440-442 (1998).
Original experimental data taken from J. G. White, E. Southgate,
J. N. Thompson, and S. Brenner, Phil. Trans. R. Soc. London 314, 1-340 (1986).
Power grid: An undirected, unweighted network
representing the topology of the Western States Power Grid of the United
States. Data compiled by D. Watts and S. Strogatz and made available on
the web here. Please
cite D. J. Watts and S. H. Strogatz, Nature 393, 440-442
Condensed matter collaborations 2003:
updated network of coauthorships between scientists posting preprints on
the Condensed Matter E-Print
Archive. This version includes all preprints posted between Jan 1,
1995 and June 30, 2003. The largest component of this network, which
contains 27519 scientists, has been used by several authors as a test-bed
for community-finding algorithms for large networks; see for example
J. Duch and A. Arenas, Phys. Rev. E 72, 027104 (2005). These
data can be cited as M. E. J. Newman, Proc. Natl. Acad. Sci. USA 98, 404-409 (2001).
Coauthorships in network science:
coauthorship network of scientists working on network theory and
experiment, as compiled by M. Newman in May 2006. A figure depicting the
largest component of this network can be found here. These
data can be cited as M. E. J. Newman, Phys. Rev. E 74, 036104
Internet: a symmetrized snapshot of the
structure of the Internet at the level of autonomous systems, reconstructed
from BGP tables posted by the University
of Oregon Route Views Project. This snapshot was created by Mark
Newman from data for July 22, 2006 and is not previously published.
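Several of the descriptions above refer to the largest component of a network or to a symmetrized version of a directed one. As a rough illustration of both operations, the following Python sketch builds an undirected adjacency map from a list of edges and then finds the largest connected component by breadth-first search. The function names and the toy edge list are hypothetical, and this is not the code used to prepare any of the data sets.

```python
from collections import defaultdict, deque


def symmetrize(edges):
    """Build an undirected adjacency map from (source, target) pairs.

    Reciprocal pairs such as (1, 2) and (2, 1) collapse into one edge.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def largest_component(adj):
    """Return the set of nodes in the largest connected component."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node.
        comp, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    queue.append(v)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best


# Toy directed edge list: a triangle on nodes 1-3 plus a separate pair 4-5.
toy_edges = [(1, 2), (2, 1), (2, 3), (3, 1), (4, 5)]
toy_adj = symmetrize(toy_edges)
toy_largest = largest_component(toy_adj)
print(sorted(toy_largest))
```

Note that nodes with no edges never appear in an edge list, so a sketch like this silently drops them; that does not affect the largest component but does affect node counts.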
Other sources of network data
There are a number of other pages on the web from which you can download
network data. Here are a few that I am aware of:
data sets: Social network data sets released with the UCINet software
by Steve Borgatti et al.
data sets: Example data sets released with the Pajek software by
Vladimir Batagelj and Andrej Mrvar.
data sets: A set of very large data sets, including some non-network
data sets, compiled by the School of Library and Information Science at
Indiana University. Network data sets include the NBER data set of US
patent citations and a data set of links between articles in the on-line
Duncan Watts' data
sets: Data compiled by Prof. Duncan Watts and collaborators at Columbia
University, including data on the structure of the Western States Power
Grid and the neural network of the worm C. elegans.
data sets: Data compiled by Prof. Albert-Laszlo Barabasi and
collaborators at the University of Notre Dame, including web data and
Arenas's data sets: Data compiled by Prof. Alexandre Arenas and
collaborators at Universidad Rovira i Virgili, including metabolic network
data and the network from their study of the collaboration patterns of jazz