UP504 • Multiple Regression

MORE COMMENTS

last updated: Wednesday, January 16, 2008

Assignment One

 


QUESTION: How do I start the assignment -- just jump into running regression in SPSS?
ANSWER:
I strongly recommend that you start by exploring the range and variation of housing values across the county (see the thematic maps linked from the assignment page). Then sit down and think conceptually about which variables would most likely help explain this variation in housing value. Some (but certainly not all) of these variables will be in the data set. (And remember: the unit of analysis is the census tract, not the individual household, so be wary of committing an ecological fallacy, that is, of drawing conclusions about individual households from tract-level patterns.)
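As a rough illustration of what "exploring the range/variation" can mean numerically, here is a minimal Python sketch (the course itself uses SPSS, where the descriptive statistics procedures serve the same purpose). The housing values below are made up for illustration and are not from the assignment data set.

```python
from statistics import mean, median, stdev

# Hypothetical median housing values (in dollars) for a handful of tracts.
# These numbers are invented for illustration only.
values = [98000, 125000, 150000, 162500, 210000, 340000]

# Basic measures of spread and central tendency.
summary = {
    "min": min(values),
    "max": max(values),
    "range": max(values) - min(values),
    "median": median(values),
    "mean": mean(values),
    "stdev": stdev(values),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name:>6}: {value:,.0f}")
```

A mean well above the median, as in these invented numbers, suggests a few high-value tracts are pulling the distribution upward, which is worth knowing before you run a regression.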

 

QUESTION: How many variables should be in my model?
ANSWER:
You can use as many variables as you want -- or at least as many as make up a good model. Too few variables may lead to an under-specified model. Remember: there is only one dependent variable (median housing value), but two or more independent variables.

 

QUESTION: What is more important: coefficients that are significant at the .000 level or a higher F score?
ANSWER:
That is ostensibly a trade-off, but the key is to make sure that all the variables are significant at the .05 level. (A reported significance of .000 is even more statistically significant, but .05 is certainly sufficient.) As long as the F is significant, I would not hesitate to add a variable to your model (even if it reduces F somewhat), provided that all the independent variables are significant and that you note an increase in your R2. (Because R2 never decreases when a variable is added, the adjusted R2 is the more informative check.) There are trade-offs between a parsimonious model and one with lots of independent variables.
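To make the trade-off concrete, here is a small Python sketch using the standard formulas for adjusted R2 and the overall F statistic, both computed from R2, the number of cases n, and the number of independent variables k. The two models and their R2 values are hypothetical, chosen only to show F falling while adjusted R2 rises slightly; SPSS reports both figures in its regression output.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n cases and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """Overall F statistic of the regression, computed from R-squared."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Hypothetical models (not results from the assignment data).
n = 100
models = {
    "3 variables": {"k": 3, "r2": 0.60},
    "4 variables": {"k": 4, "r2": 0.61},
}

for name, m in models.items():
    adj = adjusted_r2(m["r2"], n, m["k"])
    f = f_statistic(m["r2"], n, m["k"])
    print(f"{name}: R2={m['r2']:.2f}  adjusted R2={adj:.4f}  F={f:.2f}")
```

In this invented case the fourth variable lowers F but still raises the adjusted R2 a little, so it would be worth keeping if its coefficient is also significant.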

 

QUESTION: How do I remove an outlier case (such as "Census Tract 4229" in Washtenaw County) -- that is, how do I "Run if Census tract unequal to 4229"? We could not figure out how to do this, so we simply deleted the "4229" case from the data sheet.
ANSWER:
You can either simply delete the case (the easiest approach, but essentially a permanent one), or you can -- within the regression command box -- click on "IF" and instruct SPSS to include only those cases where census tract is not equal to 4229 (or use some other criterion).
To check which cases are used in the analysis, select (also in the regression command box), under residuals, a diagnostic of ALL CASES (you can include census tract as a case label). You can then see which cases are included and what the residual value is for each case. (Here the residual means the difference between the actual value and the predicted value for each case.)
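The logic of excluding a case without deleting it, and of listing residuals by case, can be sketched in Python as follows. This is only an illustration of the idea, not the SPSS procedure. Apart from the 4229 outlier mentioned in the question, the tract numbers and dollar values are hypothetical. In practice the predicted values would come from re-running the regression on the cases that remain.

```python
# Hypothetical cases. Only tract 4229 comes from the question above;
# the other tract IDs and all dollar values are made up.
cases = [
    {"tract": 4001, "actual": 150000, "predicted": 142000},
    {"tract": 4229, "actual": 900000, "predicted": 210000},
    {"tract": 4105, "actual": 98000, "predicted": 104500},
]

EXCLUDED_TRACTS = {4229}


def select_cases(rows, excluded):
    """Return only the rows whose tract is not excluded (original list untouched)."""
    return [row for row in rows if row["tract"] not in excluded]


def residuals(rows):
    """Residual = actual value minus predicted value, keyed by tract."""
    return {row["tract"]: row["actual"] - row["predicted"] for row in rows}


kept = select_cases(cases, EXCLUDED_TRACTS)
for tract, resid in residuals(kept).items():
    print(tract, resid)
```

Because the filter builds a new list, the full data set stays intact, which is the advantage of a selection rule over deleting the row from the data sheet.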

 

QUESTION: How do I create a dummy variable to identify a specific geographic area (e.g., a set of census tracts)? For example, could we create a dummy variable identifying the census tracts that lie in the Ypsilanti area?
ANSWER:
For Ypsi, see the census tract map to identify the tracts in that city. (I think they are 4102-4103 and 4106-4112, but confirm.) You can then create a new variable that has the value "1" for these cases and "0" for the rest. (You can do this either through RECODE into a new variable, or simply do it manually by creating a new variable and typing in the zeros and ones.)
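The recode logic is simple enough to sketch in a few lines of Python. The tract list uses the tentative numbers given in the answer above, which still need to be confirmed against the census tract map. The sample tracts and the assumption that tract IDs are stored as whole numbers are mine.

```python
# Tentative Ypsilanti tracts from the answer above (4102-4103 and 4106-4112).
# Confirm against the census tract map before relying on this list.
YPSI_TRACTS = set(range(4102, 4104)) | set(range(4106, 4113))


def ypsi_dummy(tract):
    """Return 1 if the tract is in the Ypsilanti set, otherwise 0."""
    return 1 if tract in YPSI_TRACTS else 0


# Hypothetical sample of tract IDs, assumed to be stored as integers.
sample_tracts = [4101, 4102, 4103, 4105, 4106, 4112, 4113]
print([ypsi_dummy(t) for t in sample_tracts])  # prints [0, 1, 1, 0, 1, 1, 0]
```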

 

QUESTION: How do I define the threshold for converting an interval scale variable into a dummy variable (e.g., "high percentage of seniors")?
ANSWER:
If you are converting an interval scale variable to a dummy variable, you can use whatever threshold value you wish to separate the "0" and "1" values. You might look for a natural break in the data, use the median value, or pick some other value that makes sense. You can use trial and error here: try 10% or 15% or whatever seems to work and/or makes theoretical sense. (It is NOT necessary to have the 0 and 1 values evenly distributed.) Sometimes a scatterplot helps you see a logical break point.
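Trying out several thresholds is easy to sketch in Python. The percentages below are hypothetical, and treating a value exactly at the threshold as "1" is an arbitrary choice on my part; the same comparison could be done in SPSS with RECODE.

```python
from statistics import median

# Hypothetical percentage of seniors in each tract (made-up values).
pct_seniors = [4.2, 7.5, 9.1, 10.8, 12.0, 15.3, 22.6]


def to_dummy(values, threshold):
    """Code each value as 1 if it is at or above the threshold, else 0."""
    return [1 if v >= threshold else 0 for v in values]


# Compare two trial thresholds with the median as a third candidate.
for threshold in (10.0, 15.0, median(pct_seniors)):
    dummy = to_dummy(pct_seniors, threshold)
    print(f"threshold {threshold}: {dummy} -> {sum(dummy)} tracts coded 1")
```

Counting how many tracts fall on each side of a candidate threshold is a quick way to see whether a dummy splits the cases in a useful way, even though an even split is not required.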