SI 544 Introduction to Statistics and Data Analysis

Resources

cTools
Readings, assignments, etc. will be posted to the course ctools website


Schedule

Fall 2006:

Lectures will be Tuesdays and Thursdays from 9:00 to 10:30 am (argh, I know). On Tuesdays we will usually meet in 409 West Hall; on Thursdays we will be at the DIAD lab.

Office hours: Tuesday 10:30-11:30 am in 3082 West Hall




PS9 Tabular data

The 2005 Pew internet survey on spyware

We will be using the Pew internet survey data, which is available to you on cTools along with an MS Word document describing the content of the data and the survey methodology. You can load the SPSS-formatted data (Spyware.sav) using the read.spss() function from the foreign package (as we did in class). The Pew Internet Survey report (which you can choose to read or not) is at http://www.pewinternet.org/PPF/r/160/report_display.asp.
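As a reminder of the loading step, here is a minimal sketch. It assumes Spyware.sav has been downloaded from cTools into your R working directory; the object name spyware is an arbitrary choice, not something the assignment requires.

```r
# Sketch only: assumes Spyware.sav sits in the current working directory.
library(foreign)

path <- "Spyware.sav"   # adjust if you saved the file elsewhere

if (file.exists(path)) {
  # to.data.frame = TRUE returns a data frame rather than a list;
  # use.value.labels = TRUE (the default) turns labelled SPSS codes into factors.
  spyware <- read.spss(path, to.data.frame = TRUE)
  dim(spyware)          # number of respondents and number of variables
  head(names(spyware))  # first few variable names
} else {
  message("Spyware.sav not found in ", getwd())
}
```

The variable names are documented in the MS Word file, so it may help to compare names(spyware) against that description before starting.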

1. Sampling and sex

A. (5 pts) Use either the table() or the summary() function to find the number of male and female respondents to this survey.

B. (10 pts) Use the binomial test to see whether the sample could have originated in an unbiased way from a population that is 50% female. Report the p-value and state whether you can reject the null hypothesis that the sampling is unbiased.

C. (10 pts) Read the description of how the survey was conducted. Give possible sources of bias in this survey in recruiting respondents relative to the gender composition of the entire US population. From the same attached methodology description, explain how a balanced sample can be obtained from the full sample.

D. (10 pts) Repeat the test using the prop.test() function. Are the results significantly different? Why or why not?
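The two tests in 1B and 1D are called in much the same way. The sketch below uses made-up counts (530 females out of 1000 respondents), which are not the Pew survey numbers; the names n_female, n_total, exact and approx are likewise just for illustration.

```r
# Made-up counts for illustration only (NOT the Pew survey numbers).
n_female <- 530
n_total  <- 1000

# Exact binomial test of H0: P(female) = 0.5 (two-sided by default).
exact <- binom.test(n_female, n_total, p = 0.5)

# Large-sample (chi-square) test of the same hypothesis. By default
# prop.test() applies Yates' continuity correction (correct = TRUE).
approx <- prop.test(n_female, n_total, p = 0.5)

exact$p.value    # p-value from the exact test
approx$p.value   # p-value from the approximate test
exact$conf.int   # confidence interval for the proportion of females
```

Both functions return an object of class "htest", so the p-value, estimate and confidence interval are extracted the same way from either one.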

2. The pessimists

A. (15 pts) Cross-tabulate the responses to questions Q1 "Overall, are you satisfied or dissatisfied with the way things are going in this country today?" and Q2 "Generally speaking, would you say that most people can be trusted or that you can’t be too careful in dealing with people?". Then keep only the "satisfied"/"dissatisfied" responses for Q1, and the "Most people can be trusted"/"You can't be too careful" responses for Q2. Copy your table here.

B. (10 pts) Draw a barplot for the table from 2A, making sure to include a legend.

C. (10 pts) Do a chi-square test on the table from 2A, commenting on whether you can reject the null hypothesis that trust in others and dissatisfaction with the state of things in general are independent of one another.

D. (30 pts) Repeat the above analysis (tabulating, omitting some rows/columns, barplotting, and finally chi-squaring) but with questions Q2 and Q20 (which asks how often people read user agreements, privacy statements, etc. before downloading and installing software). Can you reject the null hypothesis that the vigilance of individuals when it comes to the fine print on websites is independent of their trust in fellow man?
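The workflow in this section (tabulate, drop unwanted rows and columns, barplot, chi-square) can be sketched on a toy data set. Everything below is invented for illustration: the data frame toy, its variables mood and trust, and their level names do not match the Pew survey coding, so you will need to substitute the real variable names and response labels.

```r
# Toy data for illustration only; names and levels are invented.
toy <- data.frame(
  mood  = factor(c("satisfied", "dissatisfied", "satisfied", "don't know",
                   "dissatisfied", "satisfied", "dissatisfied", "dissatisfied",
                   "satisfied", "refused", "satisfied", "dissatisfied")),
  trust = factor(c("trusting", "careful", "trusting", "careful",
                   "careful", "trusting", "careful", "trusting",
                   "careful", "depends", "trusting", "careful"))
)

# Cross-tabulate the two factors (rows = mood, columns = trust).
full <- table(toy$mood, toy$trust)

# Keep only the rows and columns of interest by indexing with level names.
kept <- full[c("satisfied", "dissatisfied"), c("trusting", "careful")]

# One group of bars per column of `kept`; legend.text = TRUE labels the rows.
barplot(kept, beside = TRUE, legend.text = TRUE)

# Pearson chi-square test of independence (Yates-corrected for a 2x2 table).
# With counts this small R warns that the approximation may be poor.
result <- chisq.test(kept)
result$p.value
```

Indexing the table by level name rather than by position is a deliberate choice here: it keeps working even if the order of the response categories differs from what you expect.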


Extra credit (5 points): Find another pair of survey questions whose responses are significantly associated, that is, a pair for which you can reject the null hypothesis of independence.