This page includes:
suggestions on finding data sets
products to be handed in
how to upload your graphs to your ifs space
assignment suggestions/hints
UP504 (Campbell)
Winter 2000
University of Michigan
last updated:  2/2/00

Assignment 2: Gathering Data, Constructing Tables and Graphs (due Wednesday, February 2)

This assignment consists of two parts. The first part is to gather data and construct a data table. The second part is to generate graphic figures, both from the data that you generate and from a data set made available to the class (to be downloaded).
 

Skills to be learned:

Part 1: Gathering Data and Constructing a Data Table
You are to gather data on an urban planning or policy related theme. You may obtain the data from a source of your choice: e.g., U.S. Census, United Nations, World Bank, Bureau of Labor Statistics, etc. You might also look at non-governmental sources. The goals of the assignment are to gain experience with locating data sets, selecting variables and cases, loading them into spreadsheets, and properly documenting, labeling and citing your data. The variables, unit of analysis, time and space frame are your decision. Convert this data into an Excel spreadsheet form (or some other appropriate spreadsheet format). (You are to turn in only the printed table; do not turn in the raw data or an electronic version of the data.)
 
 
For some ideas on accessing web-based data sets, see the:
* the source of your data should not already be in spreadsheet format

* each student should generate a unique data set

* at least 25 cases and at least 5 variables.

* properly cite the source of the data (and its original format)

* clearly label all variables (units, date, etc.); avoid awkward abbreviations

* provide a complete title to the table

* explain all unclear variables

* properly handle all missing cases

* if useful, provide an extra column(s) with converted variables (e.g., percents, dollar amounts adjusted for inflation, ranks, etc.). Be consistent with number formats; do not include excessive/insignificant digits (e.g., 7.4235%).

* sort the cases based on some logical criterion (e.g., alphabetical, geographic, or ranked by a key variable -- such as population size).

* properly justify the column alignment (e.g., right-justify numbers, line up decimal points)

* if possible, have the table on one page (this may require some reduction in font size)
 

 
Part 2: Graphing the Data
Generate two graphs: one from your individual data set in Part 1, and one from the data set to be made available to the class as a whole. (You might experiment with alternative formats for each, and then select the best.) Be careful in your selection of the type of graph and the variables. (See Tufte for guidelines.) Clearly label the graph elements, with a title and sources.

For the first data set (from Part 1), create a chart that illustrates what you think are the most interesting patterns in the data. (Exploratory Data Analysis)

For the second data set (U.S. Census data on the 40 largest U.S. cities in 1990), create a chart that shows the relationship between public transit ridership and at least one other variable (such as race, population density, income, age of city, or other variables). (Structured data analysis)

The data set (data on the 40 largest US cities from the US Census, City and County Data Book) can be downloaded in one of two ways:
1. Download from here (select "Save File" after you click on this link)

2. Using IFS
a) For PC users, go to: user/m/g/mgrewal/public/up504/up504assign2data.xls
b) For Mac users, go to: user/s/d/sdcamp/public/up504winter2000/up504assign2data
 

Products
to be handed in (in paper format) on Wednesday, Feb. 2

one data table (Part 1)

two graphs (Part 2)

to be uploaded onto your ifs space by Tuesday, Feb. 1 (instructions below)

two graphs
 

Class Presentations
You are to make your two graphs available for presentation during class on February 2. (You do not need to make your data table available electronically.)  To do this, upload (using FTP) your excel graphs (in html format) to your ifs space and send the url (web address) for your files to mgrewal@umich.edu no later than Tuesday, Feb. 1 (the earlier in the day the better).


Here are the instructions on how to upload your file to your IFS space.
1. Make sure the graphs are the final version.

2. Select the graph you have done from the class data set and copy it.

3. Open Microsoft Word

4. Go to "Edit | Paste Special"

5. Click "Ok" for "Microsoft Excel Chart Object"

6. Repeat steps 2 - 5 for the graph you did from the data you collected on
your own.

You should now have both graphs in one Microsoft Word document.

7. Within Microsoft Word, go to "File | Save As HTML"

8. Make sure you are in your IFS space (Homes on Samba) under the "html" directory within your "Public"
directory.   (Note:  directories are sometimes also called "folders".)
If you do NOT have an html directory, you will need to create it.  To do this, click the  "make directory" button [MkDir] within your "Public" directory and name it "html".

9. Name the file 'assign2.html' and click "Save".

10. Click "yes" to the dialogue box that appears about losing formatting.
This is not an issue since you only have graphs.

11. Send Mandy an email which tells her:
- Your full name
- What your unique-name is. This is the name that appears on your email
account. For example Mandy's unique-name is mgrewal since her email is
mgrewal@umich.edu



 
 

Criteria for Grading
Table: properly labeled, properly cited (source of data), complete title (year, variables, units, geography if relevant), missing cases explained, units explained (e.g., dollars, in millions), at least 5 variables and 25 cases (more are fine but not needed).

Graphs: properly labeled and cited; complete title; makes a point quickly; at least bivariate; no distortions; minimal "chart junk;" accurate; encourages the eye to compare data; reveals the data at different levels: both broad overview and details.
 



Suggestions/Hints for Assignment 2

Downloading Data
Do see the links listed above, especially on downloading "US Census Lookup" data to Excel above.

Creating Tables in Excel
A spreadsheet in Excel serves at least three different functions:
1.  to be a repository of your data set.
2.  to create a data table for presentation and/or publication
3.  to be the base for generating Excel charts.

It is often hard to have one single sheet serve all three functions well.  One strategy is to therefore have three separate sheets (all on the same file) to serve these different functions.  You might call them:  "Original Data," "presentation table," "data for charts".   (For example, for the "data for charts" you may want to rearrange the columns, calculate new variables, change the labels, etc., but not have these manipulations mess up either your original data set or your presentation table.
 

Graphing using Excel  see also class notes
1.  It is often easier to work with a chart as a separate sheet than one that is embedded (sitting on top of your data sheet).   Charts as separate sheets are automatically full screen and it is easier to move around from one chart to another by simply clicking on the tabs at the bottom.

2.  Scatterplots are often a much more effective and direct way to show bivariate relationships that other chart types.   You might even try a "bubble chart" to add a third variable.  (You can also try multiple series to show more than two variables.)

3.  Excel's default background color for charts is usually gray.  You might switch to white for better contrast.

4.  There is an inevitable trade-off between trying to show lots of data and variables on one chart and yet maintaining a simple, clean graphic where the variable relationships are clear.  You may want to show multiple dimensions of the data (think of each variable as a data dimension), but a computer screen or piece of paper has only two dimensions.  One strategy, if you can't fit everything on one chart, is to generate a single graphic that consists of several charts side-by-side.  The reader's eye can better compare the several graphics if they are the same size, format, scale, etc.  Tufte calls this useful design "small multiples."

5.  One test of whether your chart is clear is to have a fellow student look at it.  If they can quickly see the patterns in the data (rather than patterns in the graphic design) without your verbal explanation, then that is evidence that you are communicating well.  But if the chart is so complex or unintuitive that the viewer needs a lengthy explanation, then the graph is not working.   Try to simplify, change the data ordering, change the chart type, the scale, try a combination chart, convert variables (e.g., use percents rather than absolutes, such as percent Hispanic population), use different variables, etc.

6.  Excel does make it easy to add a "trendline" to your chart.  (With two variables, a trendline is essentially a regression line, and the R is simply the square of the correlation coefficient "r". R values closer to 1.0 indicate a stronger linear relationship.)   However, this tool can be abused.  Only add a trendline if it helps the reader understand the data patterns.  Otherwise, leave it out.  (And if you do include a trendline, you might make the line subtle, rather than a think black line, and perhaps make the R2 value small and discrete.)

7.  Overall, show the data; have the view think the patterns in the data, not the graphic design; avoid distortion; encourage the eye to compare data; clearly label the graph.  Give the viewer the greatest number of ideas, in the shortest time, with the least amount of ink, in the smallest space.  (For Tufte, this means avoiding "chart junk".)  A chart is not a replacement for a table;  sometimes a table can communicate information in ways that a chart can't.  So do recognize the limitations of charts (e.g., that it is often hard to show more than two variables), but creatively explore ways to reveal the data.