UM Computational and Cognitive Neuroscience Lab


The neural development and organization of abstract word recognition

Neural network model. We used the same basic architecture as was used for the letter segregation model (section 1.c) except that the number of units was increased and we used a more sophisticated version of Hebbian learning (so-called zero-sum Hebbian learning, O'Reilly and McClelland, 1992; see Miller, 1990 and Rolls, 1989 for similar rules).
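To make the learning rule concrete, the following is a minimal sketch of one plausible form of a zero-sum Hebbian update (the exact rule used in the model may differ); the learning rate and the per-row normalization are illustrative assumptions:

```python
import numpy as np

def zero_sum_hebbian_update(W, x, y, lr=0.01):
    """Sketch of a zero-sum Hebbian weight update.

    W : (n_out, n_in) weight matrix from input to output layer
    x : (n_in,)  input activations
    y : (n_out,) output activations
    """
    # Raw Hebbian term: co-activity of input unit i and output unit j.
    hebb = np.outer(y, x)
    # Subtract each output unit's mean change so that increases to some
    # connections are balanced by decreases to others (the "zero-sum" part):
    # inputs that are inactive while a cluster is active lose strength.
    delta = lr * (hebb - hebb.mean(axis=1, keepdims=True))
    return W + delta
```

Because the net change onto each output unit sums to zero, connections from inactive inputs to an active cluster are weakened, which is the mechanism appealed to below.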

Operation of the network. As in the previously described model, presenting a stimulus (say the word "cap") leads to a cluster of activation because of the cooperative/competitive pattern of connectivity in the output layer (neighbors excite, others inhibit). Now suppose the same word is presented, but this time in uppercase ("CAP"). Because "C" and "P" are visually similar to "c" and "p", their input representations will also be similar (in this simple localist model, that means they excite the same units; in a more realistic distributed model, the representations would share many units rather than being identical). As a result, "C" and "P" will be biased toward exciting some of the same output units that "cap" excited. The input "A", however, has no such bias. Indeed, its connections to the "cap" cluster would have been weakened when "cap" was presented (because it was inactive when the cluster was previously active), and it excites units outside this cluster. The cluster inhibits these units via the long-range inhibitory connections and eventually wins out. Hebbian learning again strengthens the connections from the active inputs (including "A") to the cluster. The result is that "a" and "A" are biased toward exciting nearby units despite the fact that they are visually dissimilar and initially excited quite different units. An abstract letter identity (ALI) has emerged.
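A rough sketch of how a single presentation might be simulated is shown below; the neighbor-excitation and global-inhibition parameters, the settling schedule, and the squashing function are all illustrative assumptions rather than the model's actual values:

```python
import numpy as np

def settle(W, x, neighbor_excite=0.4, inhibit=0.2, steps=30):
    """Drive the output layer with input x through weights W and let it settle.
    Adjacent output units excite one another; all other pairs inhibit, so a
    single cluster of activity tends to win out. Parameter values are
    illustrative, not taken from the original model."""
    n_out = W.shape[0]
    y = np.zeros(n_out)
    for _ in range(steps):
        net = W @ x
        for j in range(n_out):
            left = y[j - 1] if j > 0 else 0.0
            right = y[j + 1] if j < n_out - 1 else 0.0
            others = y.sum() - y[j] - left - right
            net[j] += neighbor_excite * (left + right) - inhibit * others
        y = 1.0 / (1.0 + np.exp(-4.0 * (net - net.mean())))  # squashing nonlinearity
    return y

# After settling, learning (e.g. the zero-sum update sketched above) strengthens
# connections from all currently active inputs to the winning cluster:
#   W = zero_sum_hebbian_update(W, x, settle(W, x))
```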

If this were the whole story then one might expect all letters to converge on the same output representation because many different letters occasionally occur in common contexts (e.g., "a" and "o" in "c-t"). The reason this does not happen is that the distributions of contexts in which different letters appear are very different. For most contexts in which "a" occurs, there is a visually similar context in which "A" occurs (namely, in the same word written in uppercase). Furthermore, these contexts will occur with comparable relative frequencies because words that are frequent in lowercase will also tend to be frequent in uppercase (that is, frequent relative to other uppercase words). So if "a" frequently occurs in a given context (e.g., "says"), then the corresponding visually similar context will also be frequent for "A" (e.g., "SAYS"). And infrequent contexts for "a" will be infrequent for "A" ("zap" and "ZAP"). As a result, the same forces that shape the representation of "a" (that it looks more like "s" than like "z") will shape the representation of "A" and the two inputs will end up having similar output representations.

The same is not true for different letters. Although "a" and "o" occur in some of the same contexts ("cat" and "cot"), there are many contexts in which only one of the two letters can appear. Furthermore, even those contexts that admit either letter will not occur with comparable relative frequencies ("cat" is roughly 50 times more frequent than "cot" in beginning reading books). Thus, very different forces will shape the representation of these letters and they will end up with dissimilar output representations.
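The distributional argument can be illustrated with a toy calculation. The context frames and counts below are invented for illustration (only the roughly 50:1 ratio of "cat" to "cot" comes from the text above):

```python
import numpy as np

# Toy frequency counts over a handful of word frames. The counts are invented,
# but they follow the pattern described above: the contexts of "a" and "A"
# track each other closely, while "a" and "o" share only some frames, and
# with very different frequencies.
frames = ["s_ys", "z_p", "c_t", "c_p", "d_g"]
freq_a = np.array([120.0, 2.0, 100.0, 30.0, 0.0])   # "says", "zap", "cat", "cap"
freq_A = np.array([110.0, 3.0,  95.0, 28.0, 0.0])   # "SAYS", "ZAP", "CAT", "CAP"
freq_o = np.array([  0.0, 0.0,   2.0,  5.0, 80.0])  # "cot", "cop", "dog"

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print("a vs A:", round(cosine(freq_a, freq_A), 3))  # close to 1.0
print("a vs o:", round(cosine(freq_a, freq_o), 3))  # much smaller
```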

Results and discussion. We trained this network on a corpus of the 50 most frequent words from beginning reading books (Baker & Freebody, 1989), presented in their appropriate relative frequencies. Words appeared equally often in lowercase, uppercase, and initial-capital forms (we also tried presenting the lowercase forms more frequently, with no change in the results).
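A training loop along these lines might look like the sketch below, reusing the settle and zero_sum_hebbian_update sketches above. The word list, relative frequencies, network sizes, and the one-unit-per-character input coding are all stand-ins: the actual model used the Baker & Freebody corpus and gave visually similar upper- and lowercase letters overlapping input representations.

```python
import string
import numpy as np

ALPHABET = string.ascii_lowercase + string.ascii_uppercase
MAX_LEN = 4                      # longest word we bother to encode
N_IN = len(ALPHABET) * MAX_LEN   # one input unit per (character, position)
N_OUT = 60                       # size of the self-organizing output layer

def encode(word):
    """Simplified localist coding: one unit per (character, position). The
    actual model gave visually similar letters (e.g. "c"/"C") overlapping
    input representations; here every character gets its own unit."""
    x = np.zeros(N_IN)
    for pos, ch in enumerate(word[:MAX_LEN]):
        x[pos * len(ALPHABET) + ALPHABET.index(ch)] = 1.0
    return x

rng = np.random.default_rng(0)
words = ["the", "a", "and", "to", "said"]                # stand-in word list
p = np.array([10.0, 8.0, 6.0, 5.0, 4.0]); p /= p.sum()   # stand-in relative frequencies
case_forms = [str.lower, str.upper, str.capitalize]      # lowercase, UPPERCASE, Initial Capital

W = rng.normal(0.0, 0.01, size=(N_OUT, N_IN))
for step in range(5000):
    word = str(rng.choice(words, p=p))
    form = case_forms[rng.integers(3)](word)  # each case form equally often
    x = encode(form)
    y = settle(W, x)                          # settle to a winning output cluster
    W = zero_sum_hebbian_update(W, x, y)      # strengthen input-to-cluster weights
```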

The upper- and lowercase forms of five of the six letters of interest converged on similar representations. That is, the network self-organized to produce representations of letter identities that abstracted away from their visual appearance (i.e., ALIs). The representations of R and r were different, but only because the weights from r were smaller than those from R, so that only the most active output passed the threshold of the all-or-none-style sigmoid activation function that was used. The underlying patterns of weights were similarly distributed; the weights from R were simply stronger.


