1. Campaign contributions and state population (100 pts)
For this question, please download the following data from cTools. The data was obtained from http://www.opensecrets.org/races/index.php and contains the 2009-10 campaign contributions, which I aggregated to the candidate level. You will further aggregate it to the state level and merge it with state population data as follows:
# Read the candidate-level contributions and the state populations
contribbycand = read.table("fedcampaigncontrib2010.dat",header=TRUE)
population = read.table("population.txt",header=TRUE)
# Total the contributions within each (type, state) pair
bystateandtype = aggregate(contribbycand$amount,by=list(contribbycand$type,contribbycand$state),sum)
colnames(bystateandtype) = c("type","state","amount")
# Attach state population to the senate and house totals
senatebypopulation = merge(population,subset(bystateandtype,type=="senate"),by="state")
housebypopulation = merge(population,subset(bystateandtype,type=="house"),by="state")
rm(population)
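To see what the aggregate and merge steps above do, here is a minimal sketch on a toy data set. All values, the state codes "AA" and "BB", and the column name `population` are invented for illustration; the real population file may use a different column name.

```r
# Toy data, invented for illustration (not the course data)
toy_contrib <- data.frame(
  state  = c("AA", "AA", "AA", "BB", "BB"),
  type   = c("house", "house", "senate", "house", "senate"),
  amount = c(100, 250, 400, 50, 75)
)
# The column name `population` is an assumption
toy_pop <- data.frame(state = c("AA", "BB"), population = c(1000, 2000))

# Sum the amounts within each (type, state) group
toy_totals <- aggregate(toy_contrib$amount,
                        by = list(toy_contrib$type, toy_contrib$state),
                        FUN = sum)
colnames(toy_totals) <- c("type", "state", "amount")

# Keep the house rows and attach each state's population
toy_house <- merge(toy_pop, subset(toy_totals, type == "house"), by = "state")
print(toy_house)
```

The two house rows for "AA" collapse into a single total, so the merged result has one row per state.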
A. (20 pts) Run a simple linear regression modeling the total amount donated by state for the House candidates as a function of the state's population (using summary(lm())). From the resulting output, report and interpret the slope of the regression line and the coefficient of determination. Are you able to reject the null hypothesis that there is no correlation between population and the amount of money the candidates collectively receive?
B. (10 pts) Overlay the fitted regression line and the prediction and confidence intervals on a scatter plot of the House campaign data.
C. (10 pts) For a state having the same population as Michigan, what is the predicted average amount donated to its candidates running for House seats in Congress?
D. (20 pts) Run the same regression for the Senate campaign data (no need to submit a plot, but it may be instructive to look at). Compare the R^2 (coefficient of determination) for your model of Senate campaign contributions with that for House campaign contributions. Use your knowledge of American government (or acquire it as appropriate) to explain the difference.
E. (10 pts) For the model of House campaign contributions, plot the residuals of your linear model as a function of state population.
F. (10 pts) In addition, use a qqnorm() plot to evaluate how close to normally distributed the residuals are. Interpret.
G. (20 pts) Test the following hypothesis using the contribbycand data frame. H0: On average, the amount of money obtained by Senate and House candidates is equal. Show your analysis, a boxplot, and interpret your result.
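The kinds of analysis named in the parts above (a fitted line with its summary, interval bands, residual checks, and a two-group comparison) map onto a handful of base R calls. The sketch below runs them on synthetic data only: the variables `x`, `y`, and `grp` and all their values are invented, and the Welch t-test is just one reasonable choice for comparing two group means, not necessarily the intended one.

```r
set.seed(42)

# Synthetic regression data (invented; not the campaign data)
x <- runif(50, 1, 30)                  # hypothetical predictor
y <- 2 + 0.5 * x + rnorm(50, sd = 2)   # hypothetical response
fit <- lm(y ~ x)
s <- summary(fit)

slope   <- coef(s)["x", "Estimate"]    # estimated change in y per unit of x
p_slope <- coef(s)["x", "Pr(>|t|)"]    # p-value for H0: slope = 0
r2      <- s$r.squared                 # coefficient of determination

# Confidence band (mean response) and prediction band (new observation)
grid <- data.frame(x = seq(min(x), max(x), length.out = 100))
conf_band <- predict(fit, newdata = grid, interval = "confidence")
pred_band <- predict(fit, newdata = grid, interval = "prediction")

plot(x, y)
abline(fit)
matlines(grid$x, conf_band[, c("lwr", "upr")], lty = 2, col = "blue")
matlines(grid$x, pred_band[, c("lwr", "upr")], lty = 3, col = "red")

# Predicted mean response at a single predictor value
at_15 <- predict(fit, newdata = data.frame(x = 15), interval = "confidence")

# Residual diagnostics: residuals against the predictor, then a normal Q-Q plot
plot(x, resid(fit))
abline(h = 0, lty = 2)
qqnorm(resid(fit))
qqline(resid(fit))

# Synthetic two-group data (invented; right-skewed like many money amounts)
grp <- data.frame(
  type   = rep(c("house", "senate"), times = c(60, 20)),
  amount = c(rlnorm(60, meanlog = 10), rlnorm(20, meanlog = 11))
)
boxplot(amount ~ type, data = grp, log = "y")
welch <- t.test(amount ~ type, data = grp)  # unequal variances by default
print(welch)
```

In a simple linear regression, the t-test on the slope is equivalent to the test of zero correlation between predictor and response, so `p_slope` matches the p-value from `cor.test(x, y)`. For strongly skewed amounts, a log transform or a rank-based test such as `wilcox.test()` may be worth considering alongside the t-test.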
