Designating a control area for the Detroit Empowerment Zone (DEZ) using GIS and factor analysis

 

Classification is the systematic grouping of objects or events into classes on the basis of properties or relationships they have in common (Abler, Adams, and Gould, 1971).  Groups are commonly understood as clusters of events or objects defined in terms of similarity.  A primary question, however, is what we mean when we say that A is similar to B.  If we call the events or objects to be classified “operational taxonomic units” (OTU’s), the degree of similarity is measured by the distance of each OTU from every other OTU in a taxonomic space, where every OTU is a point with a position value.

 

For instance, if points P and Q on the X-Y plane have the position values $P(x_1, y_1)$ and $Q(x_2, y_2)$, then the distance between P and Q is $d(P, Q) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$.  Using this concept we can measure the distances between all pairs of OTU’s, and the same concept applies to a taxonomic space of any higher dimension.  More importantly, the position of an OTU in a taxonomic space can be measured not only by a geographical value, represented by x, y coordinates on a map, but also by the value of any socioeconomic variable.  Thus, it is important to note that the distances between OTU’s in a taxonomic space can be relative distances as well as absolute ones.  That is, x and y in the above equation can be the population and the unemployment rate of a census tract, respectively.  In summary, in a k-dimensional taxonomic space with the variables $X_1, X_2, \ldots, X_k$, each OTU occupies a position, and the position value is usually converted to a standardized z-score that represents the relative distance from the population mean.  We then compute the distances between each OTU and every other OTU and group the OTU’s in order of nearest neighbors, which results in a classification.
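The distance idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four tracts and their values are hypothetical, the helper names (`z_scores`, `distance`, `nearest_neighbor`) are invented for this sketch, and the z-scores use the population standard deviation.

```python
import math
from statistics import mean, pstdev

def z_scores(values):
    """Standardize one variable: (value - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

def distance(p, q):
    """Euclidean distance between two OTUs in a k-dimensional taxonomic space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical census tracts: (population, unemployment rate in percent).
tracts = {
    "A": (3200, 18.0),
    "B": (3100, 17.5),
    "C": (5400, 6.0),
    "D": (5000, 7.5),
}

names = list(tracts)
columns = list(zip(*tracts.values()))           # one tuple per variable
z_columns = [z_scores(col) for col in columns]  # standardize each variable
# Position of each tract in the standardized taxonomic space.
positions = {name: tuple(col[i] for col in z_columns)
             for i, name in enumerate(names)}

def nearest_neighbor(name):
    """Return the other tract with the smallest distance to `name`."""
    others = [n for n in names if n != name]
    return min(others, key=lambda n: distance(positions[name], positions[n]))

for name in names:
    print(name, "->", nearest_neighbor(name))
```

Standardizing first keeps a variable measured in thousands (population) from dominating one measured in percentage points (unemployment rate) when the distances are computed.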

 

I need to designate a control area to compare against the DEZ, which serves as the experimental area.  The control area must have characteristics similar to those of the DEZ, so this is a typical classification problem.  Combining the basic classification concept and method described above with GIS helps me search for a control area for my project.  It is pertinent to look for an area with socioeconomic conditions similar to those of the DEZ, because the criteria for designating an EZ are largely based on the socioeconomic condition of the area.  The OTU’s in my case are census tracts, because the EZ consists of census tracts.

 

A simple way is to consider only the one variable most likely to be critical in representing the socioeconomic character of each census tract.  After computing the z-score of each census tract for the selected variable, we draw a thematic map of the variable, which shows clusters of values classified by the z-score distances between every pair of OTU’s.[i]  We then arbitrarily designate, as a control area, a sub-cluster within the larger area that falls into the same class as the DEZ.
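The single-variable approach can be sketched as follows. This is a minimal illustration, not ArcView's implementation: it finds classes by exhaustively minimizing the total within-class sum of squared deviations (the natural-breaks idea), which is only practical for a handful of values. The tract identifiers, poverty rates, and function names are hypothetical. Because standardizing one variable is a linear rescaling, the grouping is the same whether raw values or z-scores are classed.

```python
from itertools import combinations

def sse(values):
    """Sum of squared deviations from the class mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def natural_breaks(values, n_classes):
    """Split the sorted values into n_classes contiguous groups with the
    smallest total within-class SSE, by trying every possible set of cuts."""
    data = sorted(values)
    best_cost, best_groups = None, None
    for cuts in combinations(range(1, len(data)), n_classes - 1):
        bounds = (0, *cuts, len(data))
        groups = [data[a:b] for a, b in zip(bounds, bounds[1:])]
        cost = sum(sse(g) for g in groups)
        if best_cost is None or cost < best_cost:
            best_cost, best_groups = cost, groups
    return best_groups

def class_of(value, groups):
    """Index of the class whose value range contains `value`
    (assumes the classes do not share boundary values)."""
    for i, g in enumerate(groups):
        if g[0] <= value <= g[-1]:
            return i
    raise ValueError("value is outside every class")

# Hypothetical poverty rates (percent) by census tract; T1 stands in for a DEZ tract.
rates = {"T1": 42.0, "T2": 39.5, "T3": 41.0, "T4": 12.0,
         "T5": 10.5, "T6": 25.0, "T7": 27.0}
dez_tract = "T1"

groups = natural_breaks(list(rates.values()), 3)
dez_class = class_of(rates[dez_tract], groups)
# Tracts in the same class as the DEZ tract are candidates for the control area.
candidates = sorted(t for t, r in rates.items()
                    if t != dez_tract and class_of(r, groups) == dez_class)
print(groups)
print(candidates)
```

On a map, the candidate tracts would still need to be inspected for contiguity before a sub-cluster is chosen as the control area.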

 

However, no single variable can thoroughly represent the characteristics of an area, although the unemployment rate or the poverty rate can be a good indicator of the socioeconomic condition of a census tract.  In addition, when the variables are inter-correlated, it is difficult to estimate how strongly each variable by itself is associated with the socioeconomic condition.  Factor analysis (FA) can be a good tool for classifying areas when we want to consider all of the highly inter-correlated variables simultaneously in a multi-dimensional taxonomic space.
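To make the idea concrete, here is a rough sketch of how several inter-correlated variables can be collapsed into one score per tract. It is not necessarily the procedure used in this project: it extracts a single factor by the principal-component method (the dominant eigenvector of the correlation matrix, found by power iteration), which is simpler than common-factor methods that estimate communalities. The data and all function names are hypothetical.

```python
import math
from statistics import mean, pstdev

def standardize(column):
    """Convert one variable to z-scores (population standard deviation)."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]

def correlation_matrix(z_cols):
    """Correlations between already-standardized variables."""
    n = len(z_cols[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in z_cols]
            for ci in z_cols]

def first_factor(R, iterations=500):
    """Dominant eigenvalue and unit eigenvector of R by power iteration.
    The equal-weight start assumes the variables are positively correlated."""
    k = len(R)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(R[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(R[i][j] * v[j] for j in range(k))
                     for i in range(k))
    return eigenvalue, v

# Hypothetical indicators (percent) for five census tracts; tract 0 stands in
# for a DEZ tract.
unemployment = [18.0, 17.0, 6.0, 7.5, 12.0]
poverty = [40.0, 38.0, 12.0, 15.0, 25.0]
vacancy = [22.0, 20.0, 5.0, 8.0, 12.0]

z_cols = [standardize(c) for c in (unemployment, poverty, vacancy)]
R = correlation_matrix(z_cols)
eigenvalue, vector = first_factor(R)

# Loadings: correlation of each variable with the extracted factor.
loadings = [x * math.sqrt(eigenvalue) for x in vector]
# Factor scores: one standardized value per tract.
n_tracts = len(unemployment)
scores = [sum(z_cols[j][i] * vector[j] for j in range(len(z_cols)))
          / math.sqrt(eigenvalue) for i in range(n_tracts)]

# The tract whose score is closest to tract 0 is the most similar on this factor.
closest = min((i for i in range(n_tracts) if i != 0),
              key=lambda i: abs(scores[i] - scores[0]))
print(loadings)
print(scores)
print(closest)
```

The resulting factor scores play the same role as the single-variable z-scores: they can be mapped thematically, and tracts whose scores fall in the same class as the DEZ become candidates for the control area.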

 

Let me describe how I designated a control area for DEZ using the FA method. 

 

Stage I: Factor Analysis

(Notation in the factor analysis and the thematic map)

 

Stage II: Mapping

 

 

Stage III: Statistical hypothesis test

 



[i] ArcView offers a default classification method, termed Natural Breaks.  This method identifies breakpoints between classes using a statistical formula (Jenks optimization).  The method is rather complex, but basically the Jenks method minimizes the sum of the variance within each of the classes.  Natural Breaks finds groupings and patterns inherent in the data.

 

 

References

 

 

Abler, Ronald, J. S. Adams, and P. Gould. 1971. Spatial Organization: The Geographer’s View of the World: Ch. 6. Englewood Cliffs: Prentice-Hall.

 

Agresti, Alan and Barbara Finlay. 1986. Statistical Methods for the Social Sciences: 514-517. San Francisco: Dellen Publishing Company.

 

Arlinghaus, Sandra. 1999. Course Homepage, NRE 530, Geography: Spatial Analysis, Theory and Practice. http://www.csfnet.org/530

 

Chung, Chae Gun and Yalin Chao. 2000. Employment for Detroit Empowerment Zone, 1994~1996. Course Project for UP 507 Geographic Information Systems, April.

 

Clarke, Keith C. 1999. Getting Started With Geographic Information Systems. Upper Saddle River: Prentice-Hall.

 

Kim, Jae-on and Charles W. Mueller. 1978. Factor Analysis: Statistical Methods and Practical Issues. Beverly Hills: Sage Publications, Inc.