1. Sample size
In class, we've mentioned that the probability of a boy birth is 51.3%. In Canada's Aamjiwnaang native reserve, the proportion of boy births from 1993 to 2003 has been reported to be 41.2%. The reserve is located in "Chemical Valley", the area with the largest concentration of petrochemical manufacturing plants in Canada.
- How large would the sample (number of observed births) have to be for the probability of observing that proportion of boy births or fewer, assuming the true probability of a boy birth is 51.3%, to be less than 5%? Less than 1%?
- Assuming a constant (over time) community membership of 850, and an average Canadian birth rate of 12 per 1,000 population per year, for how many years would you need to collect data in order to obtain a sufficiently large sample?
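One way to approach both parts is to scan over candidate sample sizes and compute the lower-tail binomial probability with pbinom(). The sketch below is only an illustration under stated assumptions: the names tail_prob, min_n, and n_grid are made up here, the observed count is taken to be floor(0.412 * n), and the search grid of 1 to 1000 births is an arbitrary choice. Because the count is rounded down, the tail probability is not strictly decreasing in n, so the code looks for the point after which it stays below the threshold.

```r
p_null <- 0.513   # boy-birth probability quoted in class
p_obs  <- 0.412   # proportion reported for the reserve

# Probability of floor(p_obs * n) or fewer boys in n births under p_null.
# The small epsilon guards against floating-point error in p_obs * n.
tail_prob <- function(n) {
  pbinom(floor(p_obs * n + 1e-9), size = n, prob = p_null)
}

n_grid <- 1:1000                      # assumed search range
probs  <- sapply(n_grid, tail_prob)

# Smallest n in the grid after which the tail probability stays below alpha
min_n <- function(alpha) {
  max(which(probs >= alpha)) + 1
}

n_05 <- min_n(0.05)
n_01 <- min_n(0.01)

# Expected births per year: 850 members at 12 births per 1,000 per year
births_per_year <- 850 * 12 / 1000
years_05 <- ceiling(n_05 / births_per_year)
years_01 <- ceiling(n_01 / births_per_year)

print(c(n_05 = n_05, n_01 = n_01, years_05 = years_05, years_01 = years_01))
```

Plotting probs against n_grid with a horizontal line at 0.05 is a quick way to see the sawtooth pattern caused by the rounding.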
2. All those distributions
You have an urn with 150 black and 250 white balls. You draw 10 of them without replacement and count the number of black balls. (Note: you can plot all the distributions simultaneously using matplot(), or add them one by one to the same plot using plot() and points() or lines().)
- Plot the corresponding hypergeometric distribution for all the possible outcomes. What are the mean and variance?
- Assume that you are drawing with replacement instead. Overlay the corresponding binomial distribution on the hypergeometric. What are the mean and variance? Explain whether the conditions under which the binomial is a good approximation to the hypergeometric are satisfied.
- Assume that the number of black balls follows a Poisson distribution with the same mean as the binomial. Overlay the Poisson distribution on the same plot. What are the mean and variance? Explain whether the conditions under which the Poisson is a good approximation to the binomial are satisfied.
- Finally, overlay a normal distribution with the same mean and variance as the binomial on the plot. How large an error would you make in estimating the probability of drawing 6 or more black balls using the normal instead of the hypergeometric distribution?
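The overlays described above could be set up roughly as follows with matplot(). This is a sketch, not a model answer: the variable names are made up, the argument names m, n, k follow dhyper()'s own convention (black balls, white balls, draws), and the continuity correction at 5.5 in the normal tail is an assumption that you may or may not want to use.

```r
m <- 150; n <- 250; k <- 10      # black balls, white balls, number drawn
x <- 0:k                         # possible counts of black balls
p <- m / (m + n)                 # chance of black on a single draw

d_hyper  <- dhyper(x, m, n, k)
d_binom  <- dbinom(x, size = k, prob = p)
d_pois   <- dpois(x, lambda = k * p)
d_normal <- dnorm(x, mean = k * p, sd = sqrt(k * p * (1 - p)))

# All four curves on one plot
matplot(x, cbind(d_hyper, d_binom, d_pois, d_normal),
        type = "b", pch = 1:4, lty = 1, col = 1:4,
        xlab = "number of black balls", ylab = "probability (density for normal)")
legend("topright", legend = c("hypergeometric", "binomial", "Poisson", "normal"),
       pch = 1:4, lty = 1, col = 1:4)

# Mean and variance computed directly from the hypergeometric probabilities
hyper_mean <- sum(x * d_hyper)
hyper_var  <- sum((x - hyper_mean)^2 * d_hyper)

# P(6 or more black balls): exact versus normal approximation
p_hyper  <- phyper(5, m, n, k, lower.tail = FALSE)
p_normal <- pnorm(5.5, mean = k * p, sd = sqrt(k * p * (1 - p)),
                  lower.tail = FALSE)   # 5.5 = continuity correction (assumed)
approx_error <- p_normal - p_hyper
```

The same sum-over-outcomes calculation gives the mean and variance of the binomial, which can then be compared with the hypergeometric values.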
3. Libraries
Download the file http://www-personal.umich.edu/~ladamic/courses/si544w08/data/libraries.dat . This data is a subset of the Public Libraries Survey, Fiscal Year 1996 data http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2003391 . An explanation of the different columns in the data is given in http://nces.ed.gov/pubs2003/2003391.pdf .
Use the R commands from lecture 6 as your guide (a guide only; you'll need to change the commands to suit the task) in doing the following.
- Calculate the circulation per population served by the library and plot a histogram. Does it look normally distributed? Verify your answer with qqnorm().
- Use the script to calculate the average circulation per population served for the 50 states. Plot a histogram and a qqnorm plot. Is the distribution closer to or further from normal? Give a reason why.
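The general shape of the two steps above might look like the sketch below. It does not use the lecture 6 script and does not read the real file: the data frame is synthetic placeholder data generated on the spot, and the column names STABR (state), POPU_LSA (population served), and TOTCIR (total circulation) are assumptions that should be checked against the column documentation linked above.

```r
# Synthetic placeholder data, NOT the survey values; with the real file
# you would read libraries.dat into a data frame instead.
set.seed(1)
libs <- data.frame(
  STABR    = sample(state.abb, 500, replace = TRUE),       # assumed column name
  POPU_LSA = round(rlnorm(500, meanlog = 9,  sdlog = 1)),  # assumed column name
  TOTCIR   = round(rlnorm(500, meanlog = 11, sdlog = 1))   # assumed column name
)

# Circulation per person served, one value per library
libs$circ_pc <- libs$TOTCIR / libs$POPU_LSA

hist(libs$circ_pc, breaks = 30, main = "Circulation per population served")
qqnorm(libs$circ_pc)
qqline(libs$circ_pc)     # reference line for judging departures from normality

# Average circulation per person served, one value per state
state_means <- tapply(libs$circ_pc, libs$STABR, mean)

hist(state_means, main = "State averages")
qqnorm(state_means)
qqline(state_means)
```

Comparing the two qqnorm plots side by side, for example with par(mfrow = c(1, 2)), makes the change in shape easier to judge.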
Extra credit (up to 5 points): find another interesting distribution to plot from this data.