This page includes:
suggestions on finding data sets products to be handed in how to upload your graphs to your ifs space assignment suggestions/hints |
UP504
(Campbell)
Winter 2000 University of Michigan last updated: 2/2/00 |
Assignment 2: Gathering Data, Constructing Tables and Graphs (due Wednesday, February 2)
This assignment consists of two parts.
The first part is to gather data and construct a data table. The second
part is to generate graphic figures, both from the data that you generate
and from a data set made available to the class (to be downloaded).
Skills to be learned:
For
some ideas on accessing web-based data sets, see the:
|
* each student should generate a unique data set
* at least 25 cases and at least 5 variables.
* properly cite the source of the data (and its original format)
* clearly label all variables (units, date, etc.); avoid awkward abbreviations
* provide a complete title to the table
* explain all unclear variables
* properly handle all missing cases
* if useful, provide an extra column(s) with converted variables (e.g., percents, dollar amounts adjusted for inflation, ranks, etc.). Be consistent with number formats; do not include excessive/insignificant digits (e.g., 7.4235%).
* sort the cases based on some logical criterion (e.g., alphabetical, geographic, or ranked by a key variable -- such as population size).
* properly justify the column alignment (e.g., right-justify numbers, line up decimal points)
* if possible, have the table on one
page (this may require some reduction in font size)
For the first data set (from Part 1), create a chart that illustrates what you think are the most interesting patterns in the data. (Exploratory Data Analysis)
For the second data set (U.S. Census data on the 40 largest U.S. cities in 1990), create a chart that shows the relationship between public transit ridership and at least one other variable (such as race, population density, income, age of city, or other variables). (Structured data analysis)
The data set (data on the 40 largest
US cities from the US Census, City and County Data Book) can be
downloaded in one of two ways:
1. Download
from here
(select "Save File" after you click on this link)
2. Using IFS
a) For PC users, go to: user/m/g/mgrewal/public/up504/up504assign2data.xls
b) For Mac users, go to: user/s/d/sdcamp/public/up504winter2000/up504assign2data
to be handed
in (in paper format) on Wednesday, Feb. 2
one data table (Part 1) two graphs (Part 2) |
to be uploaded
onto your ifs space by Tuesday, Feb. 1 (instructions below)
two graphs
|
Class Presentations
You are to
make your two graphs available for presentation during class on February
2. (You do not need to make
your data table available electronically.) To do this, upload (using
FTP)
your excel graphs (in html format) to your ifs space and send the url (web
address) for your files to mgrewal@umich.edu
no later than Tuesday, Feb. 1 (the earlier in the day the better).
2. Select the graph you have done from the class data set and copy it.
3. Open Microsoft Word
4. Go to "Edit | Paste Special"
5. Click "Ok" for "Microsoft Excel Chart Object"
6. Repeat steps 2 - 5 for the graph
you did from the data you collected on
your own.
You should now have both graphs in one Microsoft Word document.
7. Within Microsoft Word, go to "File | Save As HTML"
8. Make sure you are in your IFS space
(Homes on Samba) under the "html" directory within your "Public"
directory. (Note:
directories are sometimes also called "folders".)
If you do NOT have an html directory,
you will need to create it. To do this, click the "make directory"
button [MkDir] within your "Public" directory and name it "html".
9. Name the file 'assign2.html' and click "Save".
10. Click "yes" to the dialogue box
that appears about losing formatting.
This is not an issue since you only
have graphs.
11. Send Mandy an email which tells
her:
- Your full name
- What your unique-name is. This is
the name that appears on your email
account. For example Mandy's unique-name
is mgrewal since her email is
mgrewal@umich.edu
Criteria for
Grading
Table: properly labeled, properly
cited (source of data), complete title (year, variables, units, geography
if relevant), missing cases explained, units explained (e.g., dollars,
in millions), at least 5 variables and 25 cases (more are fine but not
needed).
Graphs: properly labeled and
cited; complete title; makes a point quickly; at least bivariate; no distortions;
minimal "chart junk;" accurate; encourages the eye to compare data; reveals
the data at different levels: both broad overview and details.
Downloading Data
Do see the links listed above, especially on downloading "US Census
Lookup" data to Excel above.
Creating Tables in Excel
A spreadsheet in Excel serves at least three different functions:
1. to be a repository of your data set.
2. to create a data table for presentation and/or publication
3. to be the base for generating Excel charts.
It is often hard to have one single sheet serve all three functions
well. One strategy is to therefore have three separate sheets
(all on the same file) to serve these different functions. You might
call them: "Original Data," "presentation table," "data for charts".
(For example, for the "data for charts" you may want to rearrange the columns,
calculate new variables, change the labels, etc., but not have these manipulations
mess up either your original data set or your presentation table.
Graphing using Excel see also class
notes
1. It is often easier to work with a chart as a separate sheet
than one that is embedded (sitting on top of your data sheet).
Charts as separate sheets are automatically full screen and it is easier
to move around from one chart to another by simply clicking on the tabs
at the bottom.
2. Scatterplots are often a much more effective and direct way to show bivariate relationships that other chart types. You might even try a "bubble chart" to add a third variable. (You can also try multiple series to show more than two variables.)
3. Excel's default background color for charts is usually gray. You might switch to white for better contrast.
4. There is an inevitable trade-off between trying to show lots of data and variables on one chart and yet maintaining a simple, clean graphic where the variable relationships are clear. You may want to show multiple dimensions of the data (think of each variable as a data dimension), but a computer screen or piece of paper has only two dimensions. One strategy, if you can't fit everything on one chart, is to generate a single graphic that consists of several charts side-by-side. The reader's eye can better compare the several graphics if they are the same size, format, scale, etc. Tufte calls this useful design "small multiples."
5. One test of whether your chart is clear is to have a fellow student look at it. If they can quickly see the patterns in the data (rather than patterns in the graphic design) without your verbal explanation, then that is evidence that you are communicating well. But if the chart is so complex or unintuitive that the viewer needs a lengthy explanation, then the graph is not working. Try to simplify, change the data ordering, change the chart type, the scale, try a combination chart, convert variables (e.g., use percents rather than absolutes, such as percent Hispanic population), use different variables, etc.
6. Excel does make it easy to add a "trendline" to your chart. (With two variables, a trendline is essentially a regression line, and the R2 is simply the square of the correlation coefficient "r". R2 values closer to 1.0 indicate a stronger linear relationship.) However, this tool can be abused. Only add a trendline if it helps the reader understand the data patterns. Otherwise, leave it out. (And if you do include a trendline, you might make the line subtle, rather than a think black line, and perhaps make the R2 value small and discrete.)
7. Overall, show the data; have the view think the patterns in
the data, not the graphic design; avoid distortion; encourage the eye to
compare data; clearly label the graph. Give the viewer the greatest
number of ideas, in the shortest time, with the least amount of ink, in
the smallest space. (For Tufte, this means avoiding "chart junk".)
A chart is not a replacement for a table; sometimes a table can communicate
information in ways that a chart can't. So do recognize the limitations
of charts (e.g., that it is often hard to show more than two variables),
but creatively explore ways to reveal the data.