SI 544 Introduction to Statistics and Data Analysis

Resources

SI 544 home

cTools
Readings, assignments, etc. will be posted to the course cTools website.

problem sets

software tools for the class

other resources

instructor:
Lada Adamic


Schedule

Winter 2008:

Lectures will be on Tuesdays and Thursdays from 9:00 to 10:30 am.
On Thursdays we will usually meet in 409 West Hall; on Tuesdays we will be at the DIAD lab.

Office hours:
Mon 4-5pm

Tues/Thurs 10:30-11:00am




PS6 t-tests

 

 

Skim through the paper by Bhavnani et al., "Strategy Hubs: Domain Portals to Help Find Comprehensive Information". Read pages 12-16 carefully, because they describe the dataset bhavnani.txt that you'll be working with. The paper, the dataset, and the sample surveys used in the study are all available on cTools.

1. Plotting summary statistics (40 pts)
A. (20 pts) Reproduce Figure 6 on page 16 of the Bhavnani et al. paper. You may find it useful to use barplot() with the beside=T and legend=T arguments, and the tapply() function to obtain the mean scores by tool used, separately for the two kinds of questions (you can combine them using rbind() or cbind() before passing the means to barplot()). Other methods are OK too, as long as you submit your R code and the figure you obtain (including a legend and axis labels).
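
The tapply()/rbind()/barplot() approach hinted at above might look roughly like the sketch below. It runs on synthetic data, not on bhavnani.txt: the column names tool, qtype and score are hypothetical stand-ins, so check the real column names after reading the file in.

```r
# Synthetic stand-in data. The column names (tool, qtype, score) are
# hypothetical and may not match those in bhavnani.txt.
set.seed(1)
d <- data.frame(
  tool  = rep(c("anytool", "medlineplus", "strategyhub"), each = 20),
  qtype = rep(c("diagnosis", "treatment"), times = 30),
  score = runif(60, min = 0, max = 100)
)

# Mean score by tool, computed separately for each kind of question.
diag_rows   <- d[d$qtype == "diagnosis", ]
treat_rows  <- d[d$qtype == "treatment", ]
diag_means  <- tapply(diag_rows$score,  diag_rows$tool,  mean)
treat_means <- tapply(treat_rows$score, treat_rows$tool, mean)

# Stack the two vectors into a 2 x 3 matrix: one row per question type,
# one column per tool.
means <- rbind(diagnosis = diag_means, treatment = treat_means)

# beside = TRUE draws grouped rather than stacked bars;
# legend.text = TRUE builds the legend from the row names of the matrix.
barplot(means, beside = TRUE, legend.text = TRUE,
        xlab = "Search tool", ylab = "Mean score")
```

Using rbind() puts the tools on the x axis with one bar per question type in each group; cbind() would swap the roles of the two variables.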

B. (20 pts) Construct a plot with boxplot() that shows two boxplots of scores, one for each kind of question: "treatment" and "diagnosis". From the boxplots, does the difficulty of one set of questions look substantially different from that of the other?
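
One way to get side-by-side boxplots is the formula interface of boxplot(). The sketch below again uses synthetic data with hypothetical column names (qtype, score), so it only illustrates the call, not the actual result.

```r
# Synthetic stand-in data; qtype and score are hypothetical column names.
set.seed(2)
d <- data.frame(
  qtype = rep(c("diagnosis", "treatment"), each = 30),
  score = runif(60, min = 0, max = 100)
)

# score ~ qtype draws one box per distinct value of qtype.
bp <- boxplot(score ~ qtype, data = d,
              xlab = "Question type", ylab = "Score")

# boxplot() invisibly returns the statistics it drew;
# row 3 of bp$stats holds the median of each group.
bp$stats[3, ]
```

Comparing the medians and the overlap of the boxes gives a first visual impression of whether one question type was harder than the other.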

2. t-tests (40 pts)

A. (10 pts) Use a t-test to compare the average score of subjects using any search tool they choose ("anytool") with that of subjects using the Strategy Hub ("strategyhub"). Report the p-value and the 95% confidence interval for the difference in means. Use the p-value to state whether the difference is statistically significant at each of the following levels: 0.1, 0.05, 0.01.
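
As a rough illustration of where these numbers come from in R, the sketch below runs t.test() on two synthetic score vectors (made up with rnorm(), not taken from the study) and pulls out the p-value and confidence interval. Note that t.test() performs Welch's unequal-variance test by default; whether that or var.equal=TRUE is more appropriate for the real data is left to you.

```r
# Synthetic scores for two groups; the values are simulated, not real data.
set.seed(3)
anytool     <- rnorm(20, mean = 50, sd = 10)
strategyhub <- rnorm(20, mean = 60, sd = 10)

# Two-sample t-test (Welch's version is the default in R).
res <- t.test(strategyhub, anytool)

res$p.value    # p-value for the null hypothesis of equal means
res$conf.int   # 95% confidence interval for the difference in means

# Compare the p-value against each significance level in turn.
alphas <- c(0.1, 0.05, 0.01)
significant <- res$p.value < alphas
names(significant) <- alphas
significant
```

The confidence level can be changed with the conf.level argument; 0.95 is the default.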

B. (10 pts) Answer the same question, this time comparing subjects using the Strategy Hub with subjects using MedlinePLUS.

C. (20 pts) Finally, compare "Strategy Hub" vs. "MedlinePLUS" and "Strategy Hub" vs. "any tool" just for questions having to do with diagnosis. For both t-tests, report your p-values. Do they agree with the p-value upper bounds (e.g. "p < 0.05") given in the paper?
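
Restricting a t-test to one kind of question amounts to subsetting the data frame first. The sketch below shows one possible pattern on synthetic data; the column names (tool, qtype, score) and the helper compare_with_hub() are hypothetical and introduced only for illustration.

```r
# Synthetic stand-in data; column names are hypothetical.
set.seed(4)
d <- data.frame(
  tool  = rep(c("anytool", "medlineplus", "strategyhub"), each = 20),
  qtype = rep(c("diagnosis", "treatment"), times = 30),
  score = rnorm(60, mean = 60, sd = 15)
)

# Keep only the diagnosis questions.
diag_only <- d[d$qtype == "diagnosis", ]

# Hypothetical helper: t-test of Strategy Hub against one other tool,
# returning just the p-value.
compare_with_hub <- function(data, other) {
  pair <- data[data$tool %in% c("strategyhub", other), ]
  t.test(score ~ tool, data = pair)$p.value
}

p_values <- c(
  medlineplus = compare_with_hub(diag_only, "medlineplus"),
  anytool     = compare_with_hub(diag_only, "anytool")
)

p_values          # the two p-values
p_values < 0.05   # check against an upper bound such as "p < 0.05"
```

The formula form score ~ tool requires exactly two groups, which is why each comparison first narrows the data to a pair of tools.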

3. Uncle Chuck's co-op (20 pts)

Use the data on sales at the co-op in 1979 to show that the Friday cashier is a thief. Give a justification and your confidence in your conclusion.

Extra credit: 5 pts
Give an estimate of the total amount of money the Friday cashier made off with.