SI 544 Introduction to Statistics and Data Analysis

Resources

cTools
Readings, assignments, etc. will be posted to the course cTools website.

Instructor: Lada Adamic

Schedule

Winter 2008:

Lectures will be Tuesdays and Thursdays from 9:00 to 10:30 am. On Thursdays we will usually meet in 409 West Hall; on Tuesdays we will be at the DIAD lab.

Office hours:
Mon 4-5pm
Tues/Thurs 10:30-11:00am

PS1 Descriptive statistics with R

1. Handedness
Download the file handedness2008.txt and load it into R. It contains the tabulated data from the in-class survey.

  • Calculate the handedness ratio for each student (create a new vector) and draw a histogram. The ratio is given by (right-left)/(right+left).
  • What is the median?
  • What is the mean?
  • What is the standard deviation?
  • Is the distribution left- or right-skewed (judging by its shape)?
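
The steps above can be sketched roughly as follows. This is only an illustration: the column names right and left are assumptions (check the header of handedness2008.txt), and the counts are made up rather than taken from the survey.

```r
# Toy stand-in for the table you would get from read.table();
# the column names and all counts are invented for illustration.
hand <- data.frame(right = c(18, 15, 20, 3, 12),
                   left  = c(2, 5, 0, 17, 8))

# Handedness ratio: +1 means all "right", -1 means all "left"
ratio <- (hand$right - hand$left) / (hand$right + hand$left)

# Histogram of the new vector
hist(ratio, xlab = "handedness ratio", main = "Handedness")

# Summary statistics
median(ratio)
mean(ratio)
sd(ratio)
```

Comparing the mean with the median is one informal check of skew to set alongside the shape of the histogram.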

2. CO2 concentrations
Download the files co2last400000years.txt (Petit et al., Nature v. 399 (6735), pp. 429-436 (1999); data originally downloaded from here) and co2_mm_mlo.dat (http://www.cmdl.noaa.gov/projects/src/web/trends/co2_mm_mlo.dat, covering 1958-2006). Note that I have edited the files to remove the leading comments, which R would choke on unless you warn it. If you download the files from the original source, make sure to use the skip argument of the read.table function.
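
As a small sketch of how the skip argument behaves, the example below parses a made-up string instead of a real file. The two comment lines, the column names and the values are all invented for illustration; with a downloaded file you would pass the file name and count its leading comment lines yourself.

```r
# Made-up file contents: two leading comment lines, then
# year, month and CO2 concentration (values are not real data).
raw <- "Monthly CO2 record (toy example)
Columns: year month co2
1980 4 340.1
1980 5 341.2"

# text = lets read.table parse a string; for a real file, give the
# file name as the first argument instead. skip = 2 drops the two
# leading comment lines before parsing starts.
co2 <- read.table(text = raw, skip = 2, header = FALSE,
                  col.names = c("year", "month", "co2"))
co2
```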

  • Load them both into R using read.table (careful, one of them has column headers and the other doesn't).
  • Start with the data since 1958. Create a new vector that combines the year and month into a decimal year; for example, 1980 04 becomes 1980+(4-1)/12 = 1980.25. This is approximate: it assumes the measurement was taken on the first of the month rather than in the middle.
  • Plot the CO2 concentration vs. year. What trends do you observe?
  • Now turn to the much longer scale data. The first column is the age of the ice core sample in years, the second column is the CO2 concentration.
  • Calculate a year column vector by subtracting the age from the current year (e.g. 2006-11719).
  • Plot the CO2 concentration vs. year, extending the x-axis with the xlim parameter to include the year 2006 and setting the ylim parameter to go up to 400 ppmv.
  • Now use the points() function to add the data points from 1958-2006 in a different color. How do the CO2 levels in recent decades compare to the fluctuations over the last several hundred thousand years?
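
The mechanics of these steps might look something like the sketch below. The data frames, their column names and every value in them are invented stand-ins for the two loaded files, and the lower ylim bound of 150 is an arbitrary choice.

```r
# Toy stand-ins for the two loaded data sets (all values invented).
recent <- data.frame(year  = c(1980, 1980, 2006),
                     month = c(4, 10, 1),
                     co2   = c(340, 337, 381))
ice <- data.frame(age = c(400000, 130000, 11719),
                  co2 = c(280, 285, 260))

# Decimal year, treating each measurement as taken on the first of the month
recent$dyear <- recent$year + (recent$month - 1) / 12

# Year of each ice core sample (negative values lie before year 0)
ice$year <- 2006 - ice$age

# Long record first, leaving room on both axes for the recent data
plot(ice$year, ice$co2, type = "l",
     xlim = c(min(ice$year), 2006), ylim = c(150, 400),
     xlab = "year", ylab = "CO2 (ppmv)")

# Overlay the 1958-2006 measurements in a different color
points(recent$dyear, recent$co2, col = "red", pch = 20)
```

Drawing the long record first matters here: plot() sets up the axes, and points() only adds to an existing plot, so anything outside the xlim and ylim chosen in the first call would not be shown.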

For both problems, submit a single PDF file containing the R commands you typed in, their output, your interpretation, and the figures. The figures can be exported to a file or copied to the clipboard via the drop-down menus.