SI 544 Introduction to Statistics and Data Analysis



Readings, assignments, and other materials will be posted to the course cTools website.


Lada Adamic


Winter 2008:

Lectures will be held Tuesdays and Thursdays from 9:00 to 10:30 am. On Thursdays we will usually meet in 409 West Hall; on Tuesdays we will be in the DIAD lab.

Office hours: Mon 4-5 pm; Tues/Thurs 10:30-11:00 am




This course teaches the fundamentals of statistics, that is, the ability to describe data samples and draw inferences about the populations from which they were drawn. It should also sharpen students' intuition about how to read data, interpret data, and judge others' claims about data.

Specifically, at the end of this course students should be able to:
  • characterize population data intuitively for themselves and others;
  • draw conclusions and inferences from population data;
  • check assumptions of others' claims and debug their putative "facts";
  • look for correlations while controlling for confounding effects.

Prerequisites: none

Readings: We will use two textbooks:

  • Introductory Statistics for the Behavioral Sciences (5th or 6th Edition) by Welkowitz, Ewen, and Cohen.
  • Using R for Introductory Statistics by John Verzani

Both books are required and will be available at Ulrich's.

Assignments and grading (students will complete a small group project)

Instructor: Lada Adamic

Course Syllabus

#  | Date     | Subject                                        | Reading                                              | Assignment due
---------------------------------------------------------------------------------------------------------------------------------------
1  | Thu 1/3  | intro                                          | S.ch1: Introduction                                  |
2  | Tue 1/8  | lab: descriptive statistics                    | S.ch2-5 (descriptive statistics); R.ch2: univariate data |
3  | Thu 1/10 | probability intro                              | McClave & Sincich Ch 3 (available on cTools)         |
4  | Tue 1/15 | lab: discrete distributions                    | R.ch5: describing populations                        | PS 1
5  | Thu 1/17 | continuous distributions                       | S.ch9                                                |
6  | Tue 1/22 | lab: scatter plots and transformed scores      | McClave & Sincich Ch 4 & 5 (available on cTools)     | PS 2
7  | Thu 1/24 | sampling                                       | A1, A2, A3*                                          |
8  | Tue 1/29 | multivariate data                              | R.ch4: multivariate data                             | PS 3
9  | Thu 1/31 | concepts of statistical inference              | S.ch8                                                |
10 | Tue 2/5  | lab: outliers, confidence intervals            | R 7.1-7.4: confidence intervals                      | PS 4
11 | Thu 2/7  | significance testing                           | S.ch10, S.ch11                                       |
12 | Tue 2/12 | lab: one- and two-sample tests                 | R.ch8                                                | PS 5
13 | Thu 2/14 | simple linear regression                       | S.ch12, S.ch13                                       |
14 | Tue 2/19 | review for midterm                             | catch up on reading                                  | PS 6
15 | Thu 2/21 | midterm                                        |                                                      |
   | Tue 2/26 | spring break                                   |                                                      |
   | Thu 2/28 | spring break                                   |                                                      |
16 | Tue 3/4  | lab: simple linear regression and correlation  | R 3.3-3.4, R 10.1-10.2                               | project progress report
17 | Thu 3/6  | analysis of variance                           | S.ch15, S.ch16                                       |
18 | Tue 3/11 | lab: analysis of variance                      | R.ch11                                               | PS 7
19 | Thu 3/13 | statistical communication (I)                  | A4*, A5*                                             |
20 | Tue 3/18 |                                                |                                                      | article review
21 | Thu 3/20 | tabular data                                   | S.ch17                                               |
22 | Tue 3/25 | lab: tabular data                              | R 8, 9.1-9.2                                         | PS 8
23 | Thu 3/27 | power, multiple regression                     | S.ch14                                               |
24 | Tue 4/1  | guest lecture                                  |                                                      | PS 9
25 | Thu 4/3  | logistic regression                            |                                                      |
26 | Tue 4/8  | lab: multiple & logistic regression            | R 10.3, R 12.1                                       | project report
27 | Thu 4/10 | student project presentations                  |                                                      |
28 | Tue 4/15 | review (leftovers in R: )                      |                                                      | take-home final handed out, due 4/18

*The following readings are available on cTools:

  • A1: Freakonomics, Introduction: “The Hidden Side of Everything”
  • A2: Freakonomics, Chapter 1: “What Do Schoolteachers and Sumo Wrestlers Have in Common?”
  • A3: Freakonomics, Chapter 5: “What Makes a Perfect Parent?”
  • A4: Daniel Kahneman, Jack L. Knetsch, and Richard H. Thaler. 1986. “Fairness and the Assumptions of Economics.” The Journal of Business, Vol. 59, No. 4, Part 2.
  • A5: Joel Best. 2004. “Chapter 1: Missing Numbers.” In More Damned Lies and Statistics. Berkeley and Los Angeles: University of California Press.

Here are some practice exams:

2006: midterm (solution), final (solution) (data: tennis data set; email me for the Pew Survey data)
2008: midterm (solution), final (solution) (MovieGenresInAsia.txt, MoviesCountryGenre.txt, BoxBudgetRating.txt)