SI 544 Introduction to Statistics and Data Analysis
Resources
• SI 544 home
• cTools: readings, assignments, etc. will be posted to the course cTools website
• problem sets
• software tools for the class

Instructor: Lada Adamic
GSI: Tracy Liu

Schedule, Winter 2010
Lectures: Tuesdays and Thursdays, 8:30 to 10:00 am, WH 311
Discussion section: Thursdays 6:30-7:30, WH 409
Office hours:
• Lada: Mon 11am-12pm and Fri 1:30-2:30pm in WH 3082
• Tracy: Tues 10-11am in WH 417A; Thurs 7:30-8:30 in WH 409

Syllabus

This course teaches the fundamentals of statistics, that is, the ability to describe data samples and draw inferences about the populations from which they were drawn. It should also sharpen individual intuition about how to read data, interpret data, and judge others' claims about data.

Learning objectives. At the end of this course students should be able to:
• construct a data sample appropriate for a given question/hypothesis and understand biases that can be introduced through sampling
• select appropriate methods to analyze such samples to determine whether the hypothesized effects are statistically significant
• critically analyze the sampling methods and analyses of others (e.g. don't accept at face value what the popular press reports about the latest health-related finding; be able to read the source study yourself)
• stop worrying and love the data

Prerequisites: none

Reading: There are two required textbooks (to be found at local bookstores):

• Se5 (5th edition) or Se6 (6th edition): Introductory Statistics for the Behavioral Sciences by Welkowitz, Ewen, and Cohen
• Re1 (1st edition) or Re2 (2nd edition): Introductory Statistics with R by Dalgaard

We will be using R, an open-source statistical programming language, in class. You should bring a laptop to every class for hands-on in-class exercises. If you don't have one, please contact the instructor to arrange for a loaner laptop during class time.

Assignments and grading (students will complete a small group project)

# | date | subject | reading | assignment due
1 | Thu 1/7 | intro | S.ch1: Introduction |
2 | Tue 1/12 | descriptive statistics | S.ch2-5 (descriptive statistics); Re1.ch1: Basics, or Re2.ch1: Basics and Re2.ch2: the R environment |
3 | Thu 1/14 | probability intro | McClave & Sincich Ch 3 (available on cTools) | PS 1 due 1/18
4 | Tue 1/19 | discrete distributions: the binomial and hypergeometric | Re1.ch2/Re2.ch3: probability and distributions; McClave & Sincich Ch 4.1-4.5 (available on cTools) |
5 | Thu 1/21 | practice with discrete distributions | Se5.ch9/Se6.ch8: Normal distribution; Se5.ch6/Se6.ch7: Z and T scores | PS 2 due 1/25
6 | Tue 1/26 | Poisson distribution, transformed scores and the normal distribution | McClave & Sincich Ch 4.6: the Poisson |
7 | Thu 1/28 | graphical descriptions of data | Se5.ch9/Se6.ch8: Additional techniques for describing batches of data; Re1.ch3/Re2.ch4: descriptive statistics and graphics | PS 3 due 2/1
8 | Tue 2/2 | sampling | A1, A2, A3* |
9 | Thu 2/4 | concepts of statistical inference | Se5.ch8 & ch9/Se6.ch9 | PS 4 due 2/8
10 | Tue 2/9 | outliers, confidence intervals, significance testing | get started early on Thursday's reading |
11 | Thu 2/11 | one sample tests | Se5/Se6.ch10; Re1.ch4, Re2.ch5 | PS 5 due 2/15
12 | Tue 2/16 | two sample tests | Se5/Se6.ch11 |
13 | Thu 2/18 | simple linear regression | Se5/Se6.ch12 & ch13 | PS 6 due 2/22; form group & select topic
14 | Tue 2/23 | review for midterm | catch-up on reading |
15 | Thu 2/25 | midterm | |
   | Tue 3/2 | -- winter break -- | |
   | Thu 3/4 | -- winter break -- | |
16 | Tue 3/9 | more regression and correlation | Re1.ch5, Re2.ch6 |
17 | Thu 3/11 | analysis of variance | Se6.ch15 & ch17; Se5.ch15 & ch16 | PS 7 due 3/15
18 | Tue 3/16 | more analysis of variance | Re1.ch6, Re2.ch7 | project progress report due 3/17
19 | Thu 3/18 | statistical communication (I) | A4*, A5* | article review due 3/22
20 | Tue 3/23 | discussion of article reviews | |
21 | Thu 3/25 | tabular data, chi-squared | Se5.ch17, Se6.ch20 | PS 8 due 3/29
22 | Tue 3/30 | more tabular data | Re1.ch7, Re2.ch8 |
23 | Thu 4/1 | power, multiple regression | Se5 & Se6: ch14 | PS 9 due 4/5
24 | Tue 4/6 | logistic regression | Re1.ch9 & ch11, Re2.ch10 & ch12 |
25 | Thu 4/8 | more multiple & logistic regression | | PS 10 due 4/12
26 | Tue 4/13 | student project presentations | |
27 | Thu 4/15 | student project presentations | | project report due 4/19
28 | Tue 4/20 | review (leftovers in R:) | | take-home final given out; due 4/23

*The following can be obtained from cTools:

• A1: Freakonomics Introduction: the hidden side of everything
• A2: Freakonomics 1. What do schoolteachers and sumo wrestlers have in common?
• A3: Freakonomics 5. What makes a perfect parent?
• A4: Daniel Kahneman, Jack L. Knetsch, and Richard H. Thaler. 1986. “Fairness and the Assumptions of Economics.” The Journal of Business, Vol. 59, No. 4, Part 2.
• A5: Joel Best. 2004. “Chapter 1: Missing Numbers.” in More Damned Lies and Statistics. Berkeley and Los Angeles: University of California Press.

Here are some practice exams:

2006: midterm (solution), final (solution) (tennisdata.txt, tennisballweights.txt; email me for the Pew Survey)
2008: midterm (solution), final (solution) (MovieGenresInAsia.txt, MoviesCountryGenre.txt, BoxBudgetRating.txt)