UP504 • Gathering Data, Constructing Tables and Graphs

last updated: Wednesday, February 21, 2007

Assignment Two

due Wednesday, March 7 [new date]
results to be posted here

see also notes on downloading data

return to UP504 main page

Elements of this page:
Part One:  gathering data and constructing a data table
suggestions on finding data sets
Part Two:  graphing the data (including data set to download)
products to be handed in
criteria for grading
assignment suggestions/hints

 

Assignment

This assignment consists of two parts. The first part is to gather data and construct a data table. The second part is to generate graphic figures, both from the data that you generate and from a data set made available to the class.  Please follow the instructions carefully.

You are to work in teams of two students. (Note: in group projects, you are to turn in a single, integrated write-up. All group members receive the same grade.)
 

Skills to be learned:


Part 1: Gathering Data and Constructing a Data Table

You are to gather data on an urban planning or policy related theme. You may obtain the data from a source of your choice: e.g., United Nations, World Bank, Bureau of Labor Statistics, etc. You might also look at non-governmental sources. (Please do not use data from the Decennial U.S. Census.) The goals of the assignment are to gain experience with locating data sets, selecting variables and cases, loading them into spreadsheets, and properly documenting, labeling and citing your data. The variables, unit of analysis, time and space frame are your decision. Convert this data into an Excel spreadsheet form (or some other appropriate spreadsheet format). (You are to turn in only the printed table; do not turn in the raw data or an electronic version of the data.)
 
 

For some ideas on accessing web-based data sets, see the:

Common mistakes made in data tables:
an example of a flawed data table (an MS Word document)
the same table with annotations of problems
(Note: you might view the first table, identify as many problems as you can, and then check the second version to see the answers.)

 

 

* the source of your data should not already be in spreadsheet format

* the data should NOT be from the Decennial US Census

* each student should generate a unique data set

* at least 25 cases and at least 5 variables.

* properly cite the source of the data (and its original format)

* clearly label all variables (units, date, etc.); avoid awkward abbreviations

* provide a complete title to the table

* explain all unclear variables

* properly handle all missing cases

* if useful, provide an extra column(s) with converted variables (e.g., percents, dollar amounts adjusted for inflation, ranks, etc.). Be consistent with number formats; do not include excessive/insignificant digits (e.g., 7.4235%).

* sort the cases based on some logical criterion (e.g., alphabetical, geographic, or ranked by a key variable -- such as population size).

* properly justify the column alignment (e.g., right-justify numbers, line up decimal points)

* if possible, have the table on one page (this may require some reduction in font size)

NOTE: The second data does NOT need to be related in any way to the first data set we provide you (on public transportation). Simply choose a data set of interest to you.


 


Part 2: Graphing the Data
Generate two graphs: one from your individual data set in Part 1, and one from the data set provided (see below). (You might experiment with alternative formats for each, and then select the best.) Be careful in your selection of the type of graph and the variables. (See Tufte for guidelines.) Clearly label the graph elements, with a title and sources.

For the first data set (from Part 1), create a chart that illustrates what you think are the most interesting patterns in the data. (Exploratory Data Analysis)
 

For the second data set (from the 2000 US Census), create a graph that reveals an interesting relationship between the percent of people who commute by public transportation and one or several other variables. (Structured data analysis). Note: there are MANY more variables here than you can display, so choose a few carefully. There is certainly a wide variety of ways to answer this question, and I look forward to seeing the range of results.

DOWNLOAD the data set (up504w07assign2data.xls)


Products (follow instructions carefully)

to be handed in (in paper format) on March 7 (during class)

one data table (Part 1)

two graphs (Part 2)

to be uploaded onto your ifs space before class on March 7 (instructions below)

one data table (Part 1)

two graphs (Part 2)

 

Class Presentations

You are to make your two graphs available for presentation during class on Wednesday March 7

See guidelines on developing web pages.

Be sure to use this standardized url format:

www-personal.umich.edu/~youruniquename/up504/assign2/index.html

no spaces, all lowercase (no caps)


 
results:  student graphs

Once you create your file and upload it into your html directory on your ifs space, check the link on the above page to make sure that the link works. (Best: check from another computer WITHOUT being logged onto to your own IFS space.) Be sure that all your files are uploaded into your ifs space -- and not just sitting in your local harddrive.

 

Graphs printed from Excel versus graphs published on the web

Note:  sometimes a graph that works well in Excel does not not look as good on the web, or vice versa.   (There are many reasons for this:  the web is in color, but printed graphs usually in black and white;  screen size vs. paper size;  different resolutions of pixel display vs. printing;  etc.)   Therefore you may need to slightly alter your graph to make it look good on the web.  Be sure that:


 


Criteria for Grading
 

Table
properly labeled
properly cited (source of data)
complete title (year, variables, units, geography if relevant)
missing cases explained
units explained (e.g., dollars, in millions)
at least 5 variables and 25 cases (more are fine but not needed)
Graphs
properly labeled and cited
complete title
makes a point quickly
at least bivariate
no distortions
minimal "chart junk" 
accurate
encourages the eye to compare data
reveals the data at different levels: both broad overview and details 

 
 



Suggestions/Hints for Assignment 2

Creating Tables in Excel

A spreadsheet in Excel serves at least three different functions:
1.  to be a repository of your data set.
2.  to create a data table for presentation and/or publication
3.  to be the base for generating Excel charts.

It is often hard to have one single sheet serve all three functions well.  One strategy is to therefore have three separate sheets (all on the same file) to serve these different functions.  You might call them:  "Original Data," "presentation table," "data for charts".   (For example, for the "data for charts" you may want to rearrange the columns, calculate new variables, change the labels, etc., but not have these manipulations mess up either your original data set or your presentation table.
 

Graphing using Excel


1.  It is often easier to work with a chart as a separate sheet than one that is embedded (sitting on top of your data sheet).   Charts as separate sheets are automatically full screen and it is easier to move around from one chart to another by simply clicking on the tabs at the bottom.  You can always convert a graph as "object" to a new sheet through the "location" feature.

2.  Scatterplots are often a much more effective and direct way to show bivariate relationships that other chart types.   You might even try a "bubble chart" to add a third variable.  (You can also try multiple series to show more than two variables.)

3.  Excel's default background color for charts is usually gray.  You might switch to white for better contrast.

4.  There is an inevitable trade-off between trying to show lots of data and variables on one chart and yet maintaining a simple, clean graphic where the variable relationships are clear.  You may want to show multiple dimensions of the data (think of each variable as a data dimension), but a computer screen or piece of paper has only two dimensions.  One strategy, if you can't fit everything on one chart, is to generate a single graphic that consists of several charts side-by-side.  The reader's eye can better compare the several graphics if they are the same size, format, scale, etc.  Tufte calls this useful design "small multiples."

5.  One test of whether your chart is clear is to have a fellow student look at it.  If they can quickly see the patterns in the data (rather than patterns in the graphic design) without your verbal explanation, then that is evidence that you are communicating well.  But if the chart is so complex or unintuitive that the viewer needs a lengthy explanation, then the graph is not working.   Try to simplify, change the data ordering, change the chart type, the scale, try a combination chart, convert variables (e.g., use percents rather than absolutes, such as percent Hispanic population), use different variables, etc.

6.  Excel does make it easy to add a "trend line" to your chart.  (With two variables, a trend line is essentially a regression line, and the R is simply the square of the correlation coefficient "r". R values closer to 1.0 indicate a stronger linear relationship.)   However, this tool can be abused.  Only add a trend line if it helps the reader understand the data patterns.  Otherwise, leave it out.  (And if you do include a trend line, you might make the line subtle, rather than a think black line, and perhaps make the R2 value small and discrete.)

7.  Overall, show the data; have the view think the patterns in the data, not the graphic design; avoid distortion; encourage the eye to compare data; clearly label the graph.  Give the viewer the greatest number of ideas, in the shortest time, with the least amount of ink, in the smallest space.  (For Tufte, this means avoiding "chart junk".)  A chart is not a replacement for a table;  sometimes a table can communicate information in ways that a chart can't.  So do recognize the limitations of charts (e.g., that it is often hard to show more than two variables), but creatively explore ways to reveal the data.

 

Alternative Software

You might find that other software (e.g., SPSS) has better graphing capabilities. As an option, you may choose to create some or all of your assignment with another application. (We stress Excel on this assignment because it is the most widely available software, and most planners will end up in offices with Excel on their machines.)