Classification is the systematic grouping of objects or events into classes on the basis of properties or relationships they have in common (Abler, Adams, and Gould, 1971). Groups are commonly understood as clusters of events or objects defined in terms of similarity. A primary question, however, is what we mean when we say that A is similar to B. If we call the events or objects to be classified “operational taxonomic units” (OTU’s), the degree of similarity is measured by the distance of each OTU from every other OTU in a taxonomic space, where every OTU is a point with a position value.
For instance, if points P and Q on the X, Y plane have the position values $P(x_1, y_1)$ and $Q(x_2, y_2)$, then the distance between P and Q is $\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$. Using this concept we can measure the distances between all pairs of OTU’s, and the same concept applies to a taxonomic space of any higher dimension. More importantly, the position value of an OTU in a taxonomic space can be measured not only by a geographical value represented by x, y coordinates on a map but also by the value of any socioeconomic variable. Thus, it is important to note that in a taxonomic space the distances between OTU’s can be relative distances as well as absolute ones. That is, x and y in the above equation can be the population and the unemployment rate of a census tract, respectively. In summary, in a k-dimensional taxonomic space with the variables $X_1, X_2, \ldots, X_k$, each OTU occupies a position, and the position value is usually converted to a standardized z-score that represents the relative distance from the population mean. Then we compute the distances between each OTU and every other OTU, and group the OTU’s in order of nearest neighbors, which results in a classification.
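The procedure described above (standardize each variable, measure the Euclidean distance between every pair of OTU's, then look for nearest neighbors) can be sketched in a few lines of Python. This is only an illustrative sketch: the four tracts and their values are hypothetical, the function names are my own, and the population standard deviation is assumed for the z-scores.

```python
import math
from statistics import mean, pstdev

def z_scores(values):
    """Standardize one variable: (value - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return [(v - mu) / sigma for v in values]

def standardize(table):
    """table: one row per OTU, one column per variable."""
    z_columns = [z_scores(column) for column in zip(*table)]
    return [list(row) for row in zip(*z_columns)]

def distance(p, q):
    """Euclidean distance between two OTU's in k-dimensional space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest_neighbor(z_table, i):
    """Index of the OTU closest to OTU i."""
    others = [j for j in range(len(z_table)) if j != i]
    return min(others, key=lambda j: distance(z_table[i], z_table[j]))

# Hypothetical census tracts: [population, unemployment rate in %]
tracts = [
    [3000.0, 12.0],
    [3200.0, 11.0],
    [8000.0, 4.0],
    [7800.0, 5.0],
]
z = standardize(tracts)
for i in range(len(z)):
    print(i, "is nearest to", nearest_neighbor(z, i))
```

Standardizing first matters here: without it, the population column, measured in thousands, would dominate the distance and the unemployment rate would hardly count.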
I need to designate a control area to set against DEZ, which serves as the experimental area. The control area must have characteristics similar to those of DEZ. This is a typical classification problem. Combining this basic concept and method of classification with GIS helps me search for a control area for my project. For the control area, it is pertinent to search for an area with socioeconomic conditions similar to those of DEZ, because the criteria for designating the EZ are largely based on the socioeconomic condition of the area. The OTU’s in my case can be census tracts because the EZ consists of census tracts.
A simple way is to consider only the one variable most likely to be critical in representing the socioeconomic characteristics of each census tract. After computing the z-score of each census tract for the selected variable, we draw a thematic map of the variable, which shows clusters of values classified by the distances, measured in z-scores, between every pair of OTU’s.[i] We then arbitrarily designate as a control area a sub-cluster within a larger area that falls into the same class as DEZ.
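As a rough illustration of this single-variable approach, the sketch below standardizes one variable, assigns each tract to a class, and lists the tracts outside DEZ that share a class with a DEZ tract. The tract IDs, poverty rates, and DEZ membership are hypothetical, and the fixed z-score breakpoints are an assumption made for simplicity; they are not the Natural Breaks method that ArcView applies by default.

```python
from bisect import bisect_right
from statistics import mean, pstdev

# Assumed class boundaries in z-score units: low, middle, high.
BREAKS = [-0.5, 0.5]

def classify(values, breaks=BREAKS):
    """Return a class index for each value, based on its z-score."""
    mu, sigma = mean(values), pstdev(values)
    return [bisect_right(breaks, (v - mu) / sigma) for v in values]

# Hypothetical poverty rates (%) keyed by hypothetical tract IDs.
poverty = {"T1": 42.0, "T2": 40.0, "T3": 38.0,
           "T4": 12.0, "T5": 10.0, "T6": 25.0}
dez_tracts = {"T1"}  # hypothetical DEZ membership

ids = list(poverty)
classes = dict(zip(ids, classify([poverty[t] for t in ids])))

# Tracts outside DEZ that fall into the same class as a DEZ tract.
dez_classes = {classes[t] for t in dez_tracts}
candidates = [t for t in ids
              if t not in dez_tracts and classes[t] in dez_classes]
print(candidates)
```

The candidate list only narrows the search. Choosing a contiguous sub-cluster from it is still a judgment made on the map.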
However, no single variable can thoroughly represent the characteristics of an area, although the percentage of unemployment or the poverty rate can be a good variable for representing the socioeconomic condition of a census tract. In addition, when the variables are inter-correlated, it is difficult to estimate the exact amount of correlation of each variable with the socioeconomic condition. Factor analysis (FA) can be a good tool for classifying an area when we want to consider all highly inter-correlated variables simultaneously in a multi-dimensional taxonomic space.
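To make the idea concrete, the following sketch reduces three inter-correlated variables to a single factor score per tract. It is not necessarily the procedure used in this project: it assumes principal-component extraction of one unrotated factor, found by power iteration on the correlation matrix, and the tracts, variables, and values are all hypothetical.

```python
import math
from statistics import mean, pstdev

def standardize(table):
    """Convert each column (variable) of the table to z-scores."""
    z_columns = []
    for column in zip(*table):
        mu, sigma = mean(column), pstdev(column)
        z_columns.append([(v - mu) / sigma for v in column])
    return [list(row) for row in zip(*z_columns)]

def correlation_matrix(z):
    """Correlation matrix of already standardized data."""
    n, k = len(z), len(z[0])
    return [[sum(row[a] * row[b] for row in z) / n for b in range(k)]
            for a in range(k)]

def multiply(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def first_factor(corr, iterations=200):
    """Largest eigenvalue and its eigenvector, by power iteration."""
    k = len(corr)
    vector = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        product = multiply(corr, vector)
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    eigenvalue = sum(v * p for v, p in zip(vector, multiply(corr, vector)))
    return eigenvalue, vector

# Hypothetical tracts: [unemployment %, poverty %, % without diploma]
tract_ids = ["T1", "T2", "T3", "T4", "T5"]
data = [
    [20.0, 40.0, 35.0],
    [18.0, 38.0, 33.0],
    [6.0, 12.0, 10.0],
    [5.0, 10.0, 12.0],
    [12.0, 25.0, 20.0],
]
dez_tract = "T1"  # hypothetical DEZ tract

z = standardize(data)
eigenvalue, vector = first_factor(correlation_matrix(z))

# Loadings: correlation of each variable with the factor.
loadings = [v * math.sqrt(eigenvalue) for v in vector]

# Factor scores, scaled to unit variance.
scores = {t: sum(v * x for v, x in zip(vector, row)) / math.sqrt(eigenvalue)
          for t, row in zip(tract_ids, z)}

# Candidate control tract: closest factor score to the DEZ tract.
control = min((t for t in tract_ids if t != dez_tract),
              key=lambda t: abs(scores[t] - scores[dez_tract]))
print(loadings, control)
```

Because the three variables move together, one factor captures most of their shared variance, and distances between tracts can then be measured on the factor score instead of on each variable separately.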
Let me describe how I designated a control area for DEZ using the FA method.
(Notation in the factor analysis and the thematic map)
[i] ArcView offers a default classification method, termed Natural Breaks. This method identifies breakpoints between classes using a statistical formula (Jenks optimization). The method is rather complex, but basically the Jenks method minimizes the sum of the variance within each of the classes. Natural Breaks finds groupings and patterns inherent in your data.
Abler, Ronald, J. S. Adams, and P. Gould. 1971. Spatial Organization: The Geographer’s View of the World: Ch. 6. Englewood Cliffs: Prentice-Hall.
Agresti, Alan and Barbara Finlay. 1986. Statistical Methods for the Social Sciences: 514-517. San Francisco: Dellen Publishing Company.
Arlinghaus, Sandra. 1999. Course Homepage, NRE 530, Geography: Spatial Analysis, Theory and Practice. http://www.csfnet.org/530
Chung, Chae Gun and Yalin Chao. 2000. Employment for Detroit Empowerment Zone, 1994~1996. Course Project for UP 507, Geographic Information Systems, April 2000.
Clarke, Keith C. 1999. Getting Started With Geographic Information Systems. Upper Saddle River: Prentice-Hall.
Kim, Jae-on and Charles W. Mueller. 1978. Factor Analysis: Statistical Methods and Practical Issues. Beverly Hills: Sage Publications, Inc.