Motivation, Conventionalization, and Arbitrariness

in the Origin of Language

Robbins Burling

University of Michigan


The purpose of this paper is to explore the role of what I will call "motivated" signs in the origin of language. By motivated signs I mean signs that are not arbitrary and that are, therefore, very different from the typical arbitrary signs of language. At least since the time of Ferdinand de Saussure, early in this century (Saussure 1959: 69), linguists have generally insisted upon the arbitrary relationship between the form and the meaning of linguistic signs. The substance that we call bread is called pain by the French, and roti by speakers of Hindi, and any other name would do as well. All that is required is that everyone in the speech community agrees on some consistent convention. To be sure, a few onomatopoetic words echo the bleats of animals or the twitterings of birds, but these words have generally been looked upon as exceptions to the more general pattern of language.

Nevertheless, I will argue that motivated signs are likely to have been important in the earlier stages of language. I will start by proposing a typology of the various forms of human and primate communication, and by exploring the role of motivated signs within the context of other forms of communication. Next, I survey the various kinds of motivated signs that are found in contemporary human languages. I then move on to diachronic change and consider several examples in which signs that began as motivated were gradually conventionalized until, having lost their motivation, they became totally arbitrary. The background having been given, I turn in the final sections of the paper to phylogeny. Here I consider the part that motivated signs could have played in the early stages of human language. I conclude with some suggestions, admittedly speculative, about the stages through which the early antecedents of language could have passed.

Arbitrary and Motivated Signs.

To sort out the varieties of signs that human beings can use, we can do no better than return to the American philosopher Charles Sanders Peirce and to his famous three-way division of signs among "icons", "indices", and "symbols" (Peirce 1994). Figure 1 shows Peirce’s classification of signs, except that I have added the level of "motivated" and "arbitrary", which Peirce did not make explicit.

Peirce used the word "symbol" for signs like bread, whose form and meaning are related only by convention, i.e. arbitrarily. Neither Peirce’s indices nor his icons are arbitrary. An index is associated with the object or idea to which it refers–its referent–by proximity or causality. The pointing gesture by which I indicate a cat is an index. A footprint that tells me that someone has passed and the smoke that indicates a fire are indices that are related to their referents by causality.

Peirce recognized three kinds of icons: images, metaphors, and diagrams. Images have a physical resemblance to their referents. Both pictures and onomatopoetic words resemble their referents so they are obvious images. By moving my hand I can represent the direction or the manner in which an object moves, and these gestures, too, are images. Metaphors are more abstract. They need not literally resemble the physical form of their referent but they relate to it in a more abstract way. When talking about a plan of action, I might hold my hands with the palms facing each other, in the same way that I might show the size of a fish. I would not suggest that a plan has an absolute size, but I do reveal a feeling that it is similar to a bounded object, something that has a beginning and an end. My hands do not form a picture of a plan so they do not shape an image, but they relate to the idea of a plan (or at least to this particular plan) in a metaphorical way. Diagrams show the relationships among the parts of the object, but they need not resemble either the whole object or its parts. A wiring diagram shows how the parts of a circuit are connected, but neither the overall shape of the diagram nor the representations of the individual parts need to resemble the physical circuit. In the following pages, I will often refer simply to "icons", but it will occasionally be helpful to distinguish among Peirce’s three types of icons.

In their enthusiasm for arbitrariness, linguists have tended to minimize the role of iconicity and indexicality in language, but as soon as we widen our perspective to include the full range of human communication, motivated signs can no longer be so easily dismissed. Our gesticulations–the waving of our arms and the molding of our hands with which we accompany speech–are pervasively iconic. The sign languages of the deaf have more obvious iconicity than do the spoken languages of the hearing. All the earliest writing systems of which we have knowledge, such as Sumerian, early Egyptian, and ancient Chinese, were iconic in their extensive use of pictographs. Even spoken language shows more iconicity than Saussure and many later linguists have led us to expect. Not only do languages have the phonological iconicity of onomatopoeia and sound symbolism, but they have considerable syntactic iconicity as well.

I will return to all of these varieties of iconicity later in the paper, and I will argue that the ability to produce and to interpret motivated signs, both icons and indices, could have been an important part of the cognitive foundation upon which the early language of our evolving human ancestors was built.

The Varieties of Human Communication.

Human beings have several different forms of communication and, if we are to understand the evolutionary role of motivated signs, it is important to keep them distinct. Our two most distinctive forms of communication are language itself, and what I have called our system of "gesture-calls" (Burling 1993). Our gesture-calls include our laughs and sobs, our smiles and frowns, our looks of puzzlement, annoyance, anger, and joy. Some of these are audible and some are visible, and several, such as laughs and sobs, are both. Some of our gesture-calls, including many facial expressions, are silent, but most of our calls are associated with characteristic gestures, so I find it artificial to separate the visible and audible components of our gesture-call system. What can be seen and what can be heard work so closely together that they have to be considered as constituting a single, unified communication system, and this is the reason for my hyphenated term. Our gesture-call system constitutes a large part, though by no means all, of what we often call our "nonverbal communication."

Our gesture-calls differ radically from language, not only in their form but also in the messages that they characteristically convey. With our gesture-calls, including our facial expressions, bodily postures, and laughs and cries, we subtly convey the details of our emotions, and for the most part, these signals are easily understood even when used by people of widely differing cultures. Languages are more variable and they require more learning.

It would be much too simple, however, to suppose that gesture-calls are determined entirely by our genetic inheritance while language has to be learned. We have to learn some particular language but it is our biological inheritance that endows our minds with the potential for learning a language. The same thing is true of our gesture-calls. Much about our gesture-calls is set by our inherited human nature, but they are by no means immune to experience. Thus, like everything else that we are and do, both language and our gesture-calls are formed by the way our experiences act upon our inherited biological potential. Nevertheless, the learning that is required for language is certainly greater than the learning required for gesture-calls. With no training at all, we can understand a large part of the gesture-call system of the most culturally remote people on earth. We need years to achieve an equivalent control over a language that is not our own.

A more fundamental difference between our gesture-calls and language may be that gesture-calls form an analog system with graded signals, while language is fundamentally digital. Graded signals, such as those that lie along the continuum from a giggle through a laugh and on to a guffaw, can no more be counted than the positions of a slide rule, or of a continuously variable meter. Language, on the other hand, is constructed from units such as phonemes, words, and sentences that stand in contrast with one another. We cannot compromise between two different words, or between their meanings, in the way that we can compromise between gesture-calls or slide rule positions. The units of language can be counted because they fall into contrastive sets.

Our gesture-call system is excellent for conveying our emotional state and for indicating our intentions. With our gesture-calls we can show that we are friendly and cooperative, or conversely, that we are angry or bored. We modulate all of our social relationships with our gesture-calls, offering polite smiles of deference and reassurance, or suggesting subtly that it is time to break off a meeting. Most of us can probably convey the subtleties of our emotions, intentions, and degree of cooperative inclinations more successfully with our gesture-calls than with language. Our gesture-calls, on the other hand, give us less help when we want to convey factual information about the world, and it is here that language comes into its own. Only with language can we describe things distant in time and space. We can feign with our gesture-calls but we cannot really lie. A proper lie requires language. We can do our best to laugh at a joke that we do not find funny, but most of us are not very skillful about conveying a misleading impression without words. When someone conveys one message with words but a different message with his facial expression, it is his face, not his words, that we believe. We have a strong sense of voluntary control over language. Voluntary control over our laughs, cries, and facial expressions is much more difficult.

The words of a language are organized by means of a complex syntax that makes it easy to form sentences that no one has ever said before. Lacking in syntax, gesture-calls never allow us to say anything really new. Quite apart from the productivity allowed by syntax, we can also add entirely new words to our language. New gesture-calls are impossible. Speakers of all languages control tens of thousands of distinct words. In order to keep all these words distinct from one another we need a phonological system that gives language its characteristic dual level of patterning. Lacking a huge contrastive vocabulary, our gesture-call system has no need of a separate phonological level. Even the neurology of our two communication systems is different. Language is under cortical control; gesture-calls are under the control of the limbic system. In sum, language and gesture-calls use different machinery and they convey different kinds of messages.

It should hardly have to be emphasized that the human gesture-call system closely resembles the gesture-call systems of other primates and even more distantly related mammals. Indeed, by using the label "gesture-calls," I have anticipated the conclusion that this part of human communication has a quite typical primate, and even mammalian, character. Of course the particular calls and gestures that we use are not the same as those of any other species, although ours are quite similar to those of chimpanzees, and they are so close to those of bonobos that both they and we find it easy to understand one another’s facial expressions and body language. Savage-Rumbaugh and Lewin (1994: 106-7) give stunning examples of the ease with which humans and bonobos are able to communicate by means of what I am calling their gesture-call systems, a confirmation of the close relationship of our two species. As would be expected, our gesture-calls are less like those of more distantly related primates, but since every species has its own characteristic signals, our special signals give no grounds for surprise.

I emphasize the distinction between our gesture-calls and our language because I want to be clear about the questions I will raise. Language is a relatively recent evolutionary development, and I want to ask how an animal without language could have evolved into an animal with language. I also want to ask what earlier aptitudes and behavior might have formed the background out of which language grew. Of course gesture-calls also evolved, but their evolutionary roots go back very much further than language, and only confusion results from a failure to distinguish them.

The dichotomy between gesture-calls and language is basic to an understanding of human communication, but of course, the story is more complex. To see the part that motivated signs play in communication, more detail is needed, and Figure 2 adds several types of communication to those that I have already considered.

The double horizontal line of Figure 2 divides the analog part of our communication at the top from the digital part at the bottom. The vertical line divides visible communication from audible. Only the cell for gesture-calls at the very top lacks the vertical division, for here it would be artificial to divide the visible from the audible. Elsewhere the table is arranged in a way that suggests the close parallelism between our audible and visible communication. Wilcox (this volume) makes a strong case for the primacy of visible signals in language evolution. I remain agnostic on this issue although, as I will point out later, it is certainly easier to construct a hypothetical continuum that yields visible signals than to construct a corresponding continuum for audible signals. I do think it is worth emphasizing that every kind of communication that we find in one modality has a close parallel in the other, and selection for skills in one modality is likely to have carried with it increased skills in the other.

At the bottom right of Figure 2 is spoken language, balanced on its left by sign language. Since signing can exploit the three dimensions of visible space as well as the dimension of time, its organization can differ in significant ways from spoken language, but except for being visible rather than audible, it shares most of the characteristics of spoken language. It is utterly different from a gesture-call system.

Just above the row for language are spaces for what Kendon (1988, 1992) has aptly called "quotable gestures," and what might, in symmetry, be called "quotable vocalizations." Quotable gestures include such things as nods, head shakes, the V-for-victory sign and the head screw that suggests that someone is crazy. These are learned, conventional and variable from culture to culture. They also contrast with one another, and it is this that makes them quotable. They share most of the characteristics of language, but because they are visible rather than audible, they cannot be incorporated into the phonology or syntax of a spoken language. Nor are quotable vocalizations incorporated into phonology or syntax, and it is this that keeps them on a separate row from language in Figure 2. Not conforming to standard phonology, the quotable vocalizations are difficult to spell, but they include the expressions that we sometimes write as oh-oh, tsk-tsk, m-hm meaning ‘yes’ and uh-uh meaning ‘no.’ Like the proper words of a language and like quotable gestures, these vocalizations need to be learned and they are culturally variable. They also contrast, both with each other and with proper words. In all these respects they are very much like quotable gestures.

In the remainder of this paper I will have little to say about quotable gestures or quotable vocalizations. I mention them in order to make clear that quotable vocalizations are not the same as calls, and that quotable gestures are very different both from the gestural component of our gesture-call system and from the gesticulation that accompanies language. The word "gesture" is dangerously ambiguous. Quotable gestures are much more language-like than are our other facial or manual movements, and they are properly placed on the digital side of our communication system along with language.

Only one row of Figure 2 remains to be considered. This row includes gesticulation in the visible column and intonation in the audible. The purpose of this typology has been to isolate these two types of communicative signals, and these will be the topic of the next three sections.

First, however, it is important to realize that human beings make one other kind of gesture that is not included in Figure 2. These can be called "instrumental gestures" and they are omitted from Figure 2 because their basic function is not communication at all. They are the movements that all animals, including humans, need to make in the ordinary business of life. We move ourselves from place to place, we manipulate objects, we eat, we kick, we scratch. Wilcox (this volume) emphasizes the importance of instrumental gestures as grounding for human cognitive abilities and language capacity, and they may have played an important part in the evolution of communication. Although instrumental gestures are designed to deal with the environment rather than to communicate, animals easily interpret many of the instrumental gestures made by conspecifics and even by members of other species. We understand what others are up to by the way they behave, so their instrumental gestures do communicate even if that is not their primary purpose. Since they must be adapted to the environment, instrumental gestures are inherently motivated. The shape of our hand iconically reflects the shape of the object we are about to grasp. The direction of our gaze indexes the object at which we look. Instrumental gestures, moreover, are capable of being conventionalized for communicative purposes, and in this way they serve as the foundation upon which a good deal of communicative gesturing can be built. I will return to instrumental gesturing in later sections of this paper, and suggest its importance for the launching of language.

Paralinguistic Signs.

Like gesture-calls, gesticulation and intonation are analog signals, but they are both used more intimately with language than are gesture-calls. All of us sometimes mold our hands and move our arms as we speak, and these gesticulations so closely reflect the meaning of our talk that they form a sort of counterpoint to it (McNeill 1992). "Gesticulation" refers most specifically to hand and arm gestures, but we often bob our heads and even hunch our shoulders at the same time, and the bobbing and hunching belong with gesticulation in a broader sense. Intonation refers to the melody of language, to the rises and falls of pitch that accompany our words and sentences. Intonation works closely with rhythm, pauses, and intensity (loudness), and together these form the prosodic system of a language. I will limit my observations primarily to intonation and will consider the other aspects of prosody only tangentially.

Intonation must not be confused with the contrastive system of tones in languages like Chinese. Syllables in all languages are characterized by particular consonants and particular vowels. Chinese syllables are also characterized by particular tones, and these tones are as much a part of the contrastive system as are the consonants and vowels. Tones do not prevent their languages from having intonational patterns like other languages. Tones and intonation readily coexist, but tones, like consonants and vowels, are used to distinguish words from one another, while the primary function of intonation is to express our emotions and our attitudes toward the things we are talking about.

Word stress, in languages like English, works in ways that resemble tone. Roughly speaking, one syllable of each English word can be said to have stress. Dífficult is stressed on its first syllable, while understánd is stressed on its third. While there are complications and a few exceptions, the location of stress is generally fixed, and its location, along with the sequence of vowels and consonants, is a part of the contrastive system of the language. When a word is spoken in isolation, the stressed syllable is almost always given an accent. This means that it is pronounced a bit longer or louder than surrounding syllables or with a higher, or occasionally lower, pitch. In one way or another, it is set off from the neighboring syllables. In the flow of speech, when a word is accented, it is the stressed syllable that receives the accent, but not every stressed syllable receives an accent every time the word is used. Some stressed syllables are said no higher, longer, or louder than the surrounding syllables, although we know the location of stress so well that it may be hard for us to realize that a stress has not received an accent. It is the ability of a stressed syllable not to receive an accent that forces us to make a distinction between stress (a fixed location in a word) and accent (the setting off of a stressed syllable by pitch, intensity, or length). Since stress, like tone, is an inherent part of the word, its location is a part of the contrastive system of the language. The accent given to a stressed syllable can vary continuously from nothing at all to a scream. The degree of accent reflects the emotions and attitudes of speakers toward what they are talking about. In spite of its close association with language, accent is used in ways that recall the gesture-call system.

Traditionally, linguists have regarded intonation, but not gesticulation, as belonging to language. The only reason to treat them differently is that intonation is produced by the voice so its association with the spoken language that most linguists study seems more intimate. If we can ignore their differing visible and audible modalities, however, it is clear that gesticulation and intonation have such similar relationships with (the rest of) language, that it is difficult to find any solid basis for considering one, but not the other, to be a part of language. I prefer to define language in a way that excludes both. If we were to define language broadly enough to embrace gesticulation and intonation, we would still have to acknowledge their differences from the rest of language. It seems easiest to recognize the distinction from the beginning by defining language more narrowly. We can even look upon both intonation and gesticulation as a sort of invasion of language by the kind of signaling system that forms our gesture-calls. We hear the degree of excitement or fatigue in other people’s voices and we see the same emotions in their gesticulations.

The movements and noises that we use in gesticulation and intonation sometimes escape their usual association with language. We use our voices iconically without language whenever we imitate a noise. Occasionally we even hum an intonation in the absence of words. In the right context it is possible to answer a question simply by humming the tune of I don’t know. Even without vowels and consonants, the intonational contour sounds enough like the sentence to convey its meaning (Bolinger 1986: 211). We can also use our hands without language to form iconic movements and shapes. We lean our bodies or thrust our shoulders indexically in the direction we are concerned with. Watching something being pushed, we move our hands or our bodies in sympathy. Our gesticulations often reflect the manipulation of real objects in the world so they reflect our instrumental gestures. For present purposes, the important thing about paralinguistic signals is that they are the most consistently motivated component of human communication.

I have now located motivated signs within the broad range of human communication. The next step is to survey the several types of motivated signs that human beings use.

Gesticulation.

In this and in the following two sections, I survey the main varieties of motivated signs that are used in human communication. Motivation is probably easiest to recognize in gesticulation, in no small part due to the careful work of David McNeill reported in his book Hand and Mind (1992). McNeill uses the word "gesture" for what I, following Kendon and others, have called "gesticulation." Although I will follow McNeill’s analysis, I will substitute "gesticulation" for his "gesture" because I want to be careful to distinguish it both from quotable gestures and from the hand and facial movements of the gesture-call system.

McNeill recognizes several kinds of gesticulation. First, what Peirce would have called "images," McNeill calls iconics. These imitate the shape or movement of something that is being talked about. We may move our hand upward when talking about someone who is climbing. We may wiggle our fingers to represent moving legs. When offering to get someone a cup we may form our hand into the shape that would hold a cup. We hold our hands wide when talking about something big. We may pinch our faces together for something small or trivial. We outline the shape of almost anything with our hands.

McNeill’s metaphorics correspond to Peirce’s metaphors. They are similar to iconics, but they represent more abstract ideas. We may direct our palm first one way and then the other to show that there are two sides to some issue. We move our hand back and forth several times when talking about something repetitious. Metaphorics are not pictures of physical objects, but they are representations of ideas.

McNeill’s deictics are pointing gestures, and they would be included among Peirce’s indices. We may point to physical objects, including people, when we talk about them. We may also point more abstractly, as when we point first in one direction and then in another to refer to two absent people, to two different events, or even to two opposing viewpoints.

McNeill’s final two types of gesticulation are not so easily assigned to one of Peirce’s types of signs but, in their own ways, they also point, so both have an indexical component. We may mark the places in our sentences that we find important with beats. These can be slight movements of the finger or hand, or they can be bobs of the head. Of all gesticulations, beats are generally the least conspicuous, but they are also the most tightly tied to language. Try saying I ábsolútely will nót dó it very forcefully, by placing a firm accent on áb-, lút-, nót, and dó, but lift your head slightly on each beat instead of bobbing it downward. Then try bobbing your head downward on unaccented syllables instead of on accented ones. You will quickly discover how tightly you are constrained in the way you form and place your beats. Beats generally coincide with intonational accents and they point to the important parts of our discourse.

Finally, cohesives are made by repeating any of the other four types of gesticulations. A point in one direction may be followed by a point in another direction. The repetition itself carries a meaning, for it ties different parts of the discourse together and announces that the two places that are marked by the same gesticulation have something in common. Cohesives can be regarded as pointing to each other. In Peircean terms they diagrammatically reflect the relationships among the things we are talking about. All these gesticulations are well motivated. Some are iconic and some are indexical, but none is arbitrary. Like gesture-calls, gesticulations are analog signals, but they are much less stereotyped.

We have little systematic knowledge of gesticulation in other cultures. A good deal of cross-cultural variation in the amount of gesticulation is sometimes presumed to exist, and possibly there is variation in the types. Southern Europeans are supposed to wave their arms more freely than Northerners, although the apparent variation may be due more to differing uses of quotable gestures than to differing patterns of gesticulation. In any case, all people do, sometimes, gesticulate, and so far as we now know, everyone uses the same basic types of gesticulation that McNeill has identified among American speakers of English.

Intonation.

Because intonation is produced with the voice rather than with the hands, it is more difficult to extract from the rush of spoken language than is gesticulation, but the pitch of intonation is iconic in at least two different ways.

First, high pitch is associated with high tension, arousal, excitement, eagerness, activity; low pitch with low tension, relaxation, completion. These relate to the pervasive metaphor of "up and down," and to inconclusiveness and conclusiveness. As Dwight Bolinger put it, "In the course of an action we are up and moving; at the end, we sit or lie down to rest. In a discourse this translates to higher pitches while [an] utterance is in progress and a fall at the end" (Bolinger 1985: 99). Mothers soothe their infants with low-pitched reassuring sounds. High-pitched enthusiasm is stimulating. We raise our voices in fear, anger, excitement, or intense interest. Our voices drop with boredom and fatigue. We hold the floor with a rising intonation. We yield to another speaker with a fall.

Rises and falls of pitch modulate the meaning of the words with which they are used. The end of a statement most often has a fall in pitch, sometimes gradual, sometimes quite abrupt. In English, as in most languages, questions that can be answered with a yes or a no usually end with a terminal rise. It is wrong, however, to consider this rise to be a "question intonation," for questions do not always rise, and rises are found in other places than questions. Subordinate clauses, for example, typically rise: If I had some físh, I’d eat it. Rises are more accurately described as indicating incompletion. As with a subordinate clause, a question shows that something else is expected, although it invites someone else to supply the rest. Statements that end with a terminal rise often suggest uncertainty on the part of the speaker. I think I have enóugh, with a rise at the end, leaves room for doubt. A falling pitch would show more certainty.

Any sentence type can be used with any intonation. We say that we "soften" a command by putting it in the form of a question. Instead of saying Hand me the chisel it is more polite to say Can you hand me the chisel? Even in its question form, however, the sentence is likely to end with a fall rather than a rise, and this shows that it is not really a question at all, but a request. No verbal reply is expected. Conversely, if I say That’s a turníp with a terminal rise I show my own uncertainty. I leave something unfinished. I have asked a question without using question syntax.

The close association of intonation and gesticulation is shown by the fact that raised eyebrows, like a raised voice, can signal uncertainty, lack of completion, or a question. A more persistent association between gesticulation and intonation is the synchronization of spoken accents with gesticulated beats of the hands or head. In English, we most often accent the points of a discourse that interest and excite us with a rise in pitch. The more interesting and exciting the point, the greater the rise. These accents generally coincide with the beats of gesticulation, and it is this kind of close coordination of intonation and gesticulation that makes it awkward to consider intonation to be a part of language while excluding gesticulation. Since accents mark important points of the discourse, often the points where new information is introduced, they reflect the pragmatic flow of information rather than the syntax, and so do the beats of gesticulation.

If a syllable is accented, the strength of its accent can be continuously graded, and the strength is proportional to the strength of the emotions. His name is John can be said with a barely perceptible peak in pitch on John and then a fall. With progressively stronger accents, the assertion becomes progressively more forceful (Bolinger 1978: 474) until it finally becomes a scream.

The second iconic association of pitch is with size. A high or rising pitch suggests small size, weakness, helplessness, submission, courtesy, and a lack of confidence. A low or falling pitch suggests large size, assertiveness, authority, aggression, confidence, self-sufficiency and threat (Ohala 1944). The iconicity here is obvious. We expect high pitches from small musical instruments and small people–women and children. Large musical instruments and large people–men–are pitched lower. Even among animals, high pitch is associated with submission. A dog’s submissive whine is pitched higher than its threatening growl, just as an infant’s pleading whine is high while a sergeant’s forceful command is low.

We try to project an image by the way we speak. Hoping to sound more authoritative than we feel, we may try to mask our nervousness by lowering our voice. In many languages, a rise in pitch is used for polite speech (Brown and Levinson 1987: 267-268). Women’s speech is said to be more often characterized by hesitant and deferential rising intonation than is the speech of men (R. Lakoff 1975: 17). Both raising and lowering pitch can be useful for getting one’s way, but a careful assessment of one’s relative social position is needed before deciding which is more likely to be successful. We easily recognize the meaning of these pitch differences. We react to the degree of deference or authority shown by a person’s speech.

Intonation is linked to the structure of sentences in one important way. The rises, falls, and discontinuities of the melodic line mark the syntactic divisions of sentences. Each phrase is likely to have one particularly important point marked with an accent. That accent helps to set off the phrase from its neighbors. The melody helps the hearer to untangle the structure of sentences. In most respects, however, intonation expresses the attitude and emotions of the speaker.

Syntactic Iconicity.

In addition to the iconicity of onomatopoeia, sound symbolism, and intonation, spoken languages exhibit considerable iconicity in syntax, and since Roman Jakobson first opened the subject (1965, 1966), syntactic iconicity has drawn the attention of a growing minority of linguists (Haiman 1985; Givon 1989).

Syntactic iconicity shows up most clearly in the order of words and morphemes. Word and morpheme order cannot be used for imagistic or even metaphoric iconicity, but it can stand in a relationship of diagrammatic iconicity to the meaning (Matthews 1991: 12). The most obvious example is the tendency for the words of a sentence to follow the order of the events they describe. We understand the famous veni, vidi, vici to mean that Caesar’s first action was to come. Then he saw, and only after that did he conquer. Languages do give us ways to say things out of chronological order, but this comes at the cost of more complex syntax: Before conquering I saw and before seeing I came. A logician might argue that I went inside and ate reveals nothing about where I did my eating, but nonlogicians will normally understand the sentence to mean that the eating took place after going in, so it must have taken place inside. Someone who says I ate and went inside will be understood to have eaten outside.

A less obvious form of syntactic iconicity is revealed by the order of words in an English noun phrase. Consider the old red iron steam engine, as shown in Figure 3. The order of the words in this phrase is almost fixed. It is not quite impossible to say the red old iron steam engine or even the old red steam iron engine but neither of these is quite natural. Why do we prefer the old red iron steam engine over any alternative order? Do we simply learn this order as an arbitrary characteristic of the English language?

The order turns out to be far from arbitrary, for it diagrams the relationship among the concepts. The modifiers that stand closest to the noun are also closest to its meaning. Steam is used with only a handful of nouns such as steam boiler, steamroller, steamship, steam shovel. These are so few and distinctive that dictionaries list them as lexical items. Iron can modify far more words than steam, among them nail, hinge, and key, but only rarely and metaphorically would iron ever be used to modify a word that refers to something soft. Red, on the other hand, can be used as easily with soft things like shirts and cheeks as with hard things like nails or engines. The meaning of old is even more general, for unlike steam, iron, or red, it can easily be used with words for abstractions, such as problem, question, or argument. Finally the is the most general modifier of all, for it can be used with any common noun (i.e. not a "proper" noun) in the language. The pattern is consistent: the modifiers whose meanings are most specific to the meaning of the noun are placed closest to the noun. The most general modifiers are the farthest away.

If this word order conforms diagrammatically to the relationships among the concepts, we should expect to find the same order in other languages. I know of no systematic cross-linguistic study of noun phrases that tests this expectation, and any test would be complicated by languages in which some or all modifiers are placed after the noun rather than before it. The languages whose data I have looked at give me the strong impression that when modifiers precede the noun they appear in the same order as in English.

Other examples have been investigated more carefully, the best known being Bybee’s study of verbal affixes (1985). Verbs in many languages can have prefixes or suffixes that mark aspect, tense, mood, and person. Person refers to the difference between first, second and third person, and between singular and plural. Mood distinguishes assertions (indicative), nonassertions (subjunctive), and commands (imperative). Tense, of course, locates the action in time. Aspect is not prominently marked by affixation in English, but languages with morphological aspect use it to indicate such things as the beginning, continuation or completion of an action. English often has separate verbs to make distinctions that could be made by an aspect marker in some other languages. For example, we use distinct verbs to show the difference between moving into a state and being in it: fall asleep/sleep, sit down/sit, learn/know, grow/be big, grab/hold, lift/carry. In some languages, this distinction is made by aspect affixes. A distinction that is made by using two different words instead of by means of an affix is said to be lexicalized, so we can say that aspect is often lexicalized in English.

When the verbs of a language carry affixes that mark more than one of these four categories, the affix for aspect is almost always placed closest to the verb, just before the verb if it is prefixed, just after the verb if suffixed. Affixes for tense, mood, and person are placed at increasing distance from the verb. As with the noun modifiers, the order of these affixes is diagrammatically related to the degree of involvement of the concepts with the verbs. Lexicalization shows the closest involvement of all. This means that even though English does not have grammatical aspect, the intimate involvement of aspect with the verb is shown by its lexicalization. Tense affects the meaning of the verb less closely than aspect, and mood even less so. Person refers to the participants in the action rather than to the action itself, so it is reasonable to find person markers furthest from the verb.

Syntax can be motivated in still another way. Generally, the categories that linguists refer to as "marked" are expressed by longer forms than unmarked categories. The obvious example is plurality. When a language distinguishes plural from singular nouns, it is almost always the plural that is "marked" by the longer form. English, of course, conforms to this rule by forming the plural with a suffix that is added to the singular form of the noun. Logically, it should be possible to leave the plural "unmarked" and to form the singular by adding a "singular marker" to the plural form. Languages almost never do this, presumably because the singular is cognitively central. Marking something as plural adds an idea, as well as a suffix, to the core word.

If morpheme and word order were consistently motivated, we might expect morphemes and words to occur in the same order in all languages. Of course they do not, and we must ask why. In some cases different orders seem to have little difference in motivation. Whether an adjective comes before or after its noun, for example, seems to matter less than its distance from the noun. In other cases, historical processes of various sorts probably drag languages away from perfect motivation. We might expect that if languages lose too much motivation, pressures would build that would push them back to greater motivation. It is clear, for example, that children learn motivated constructions more easily than those that violate motivation (Slobin 1985, see below). Children make more mistakes with nonmotivated constructions. If they make enough mistakes they could gradually force the language back to a form that is more clearly motivated.

English has one strikingly nonmotivated suffix: the third person singular -s as in I drive/she drives. Languages more often mark first and second person verbs or plural verbs, and leave the third person singular unmarked. The history of this English anomaly is well known, and it resulted from entirely ordinary processes of linguistic change. English once had a rich system of person markers but most were progressively lost until the third singular -s is all that remains, the final relic of verb agreement. If this suffix is as nonmotivated as it appears to be, we might expect it to be unstable. Its loss would remove an anomaly. In fact, the third person singular suffix has already disappeared from a few dialects of English including those spoken by some African-Americans who can say she drive tomorrow as easily as I drive tomorrow. Perhaps in the course of the next century or two the loss of the third person singular -s will spread. Some day it may be remembered only as an historical oddity.

The iconicity and indexicality found in gesticulation, intonation, and syntax are more extensive than linguists have sometimes acknowledged, but linguists have still been justified in emphasizing the arbitrary nature of many other aspects of language. In the concluding section of the paper, I will suggest that motivated signs could have had a much greater importance during the early evolutionary stages of language than they do today. First, however, I must consider the competing advantages of conventionalization. Conventionalization leads to a decline in motivation and a corresponding increase in arbitrariness.

Conventionalization in history.

Having surveyed the use of motivated signs in contemporary languages, I turn in this and the following section to several examples in which signs that began as motivated icons or indexes have gradually become standardized and conventionalized. When conventionalization goes far enough, motivation can finally be undermined and the signs can be drawn into a contrastive system of communication.

To one who has no knowledge of American Sign Language (ASL), its signs are by no means transparent, but many have an underlying motivation (Frishberg 1975; Klima and Bellugi 1979). Without an explanation, the sign in which the thumbs and index fingers of both hands are formed into an "o" shape and moved alternately up and down will appear completely obscure. Once told that the sign represents a balance scale whose pans swing up and down, the motivation for its meaning ‘judge’ becomes obvious. A large number of ASL signs are as clearly iconic as ‘judge’ but still arbitrary enough to require learning. These signs have become partially conventionalized but they have not yet lost all motivation.

When signers lack a name for something, they find it easy to invent one, and their newly invented signs are often clearly iconic. Klima and Bellugi (1979: 11) describe a sign for ‘cinnamon roll’ that was invented by a three-year-old child. She held one hand in a cupped position, and just above it, she circled the index finger of her other hand. She sketched the roll’s swirls, its most salient feature. Klima and Bellugi also describe deaf researchers who needed a sign for a ‘videotape recorder.’ The machine they needed to talk about had two spools that spun together as the tape moved from one to the other. In the sign invented by the researchers, the two index fingers outlined circles in imitation of the turning reels. Gestures like ‘cinnamon roll’ and ‘videotape recorder’ are not very different from the gesticulations that a hearing person might make while mentioning these objects, except that, from the start, they need to carry the full burden of communication. This forces them to be correspondingly explicit and, unlike gesticulations, the invented signs of the deaf tend quickly to become conventionalized. At first, the sign for the videotape recorder had both fingers circling in the same direction, just like the spools. Soon, however, the signers began to circle their fingers in opposite, complementary, directions. At the cost of reduced iconicity, this made the sign easier to make and also brought it closer to the style of established ASL signs. Many ASL signs have had a similar history. They started as clear iconic representations of the objects or actions to be talked about, but were then adapted to the established patterns of the language.

With enough time, the original iconicity of a sign can be completely lost. ASL grew from a form of signing that was brought to the United States from France in the early 19th century, so its history in America is relatively short, and many of the changes it has undergone are well known. An example is the sign for ‘home.’ Since home is the place where you eat and sleep, its sign began as a compound, formed from the sign for ‘eat’ followed by the sign for ‘sleep.’ ‘Eat’ is made by a gesture that suggests bringing food to the mouth, and ‘sleep’ is made by placing the palm of the hand beside the head, as if sleeping. Both signs are transparently iconic, and their meanings could unite in the sign for ‘home.’ With time, however, the two parts of the compound merged. Today ‘home’ is made with two taps of the extended fingers on the cheek. The hand shape of ‘eat’ has been retained but it is now made in approximately the position for ‘sleep.’ The revised sign is quicker to form than the original, but it has lost all trace of the earlier iconicity of its parts. It has become as arbitrary as the English word home.

Early writing was gradually conventionalized in much the same way as ASL. Sumerian, ancient Egyptian, and early Chinese all made extensive use of pictographs. A picture looking a bit like a mountain might represent the word for mountain. A picture of a hand could represent hand. Pictographs offered an obvious, though only partial, solution to the problem of representing the huge number of words in a spoken language on clay or papyrus. Even in the earliest surviving examples of writing, however, the pictures were often quite stylized. For example, the earliest form of the Sumerian word for ‘water’ consisted of two wavy but generally horizontal lines, easily understood as representing waves. This was a pictograph but already highly stylized. Later, for unknown reasons, this sign, along with all the rest of Sumerian writing, was rotated 90 degrees so that the wavy lines were written vertically instead of horizontally, a considerably less iconic representation. Then, with the development of a stylus that made triangular impressions in clay, the sign was further stylized into one long vertical stroke that replaced one wavy line, and two shorter strokes, one above the other, that replaced the second wavy line. By then, the sign had lost any hint of iconicity (Kramer 1963). The same kind of conventionalization came to all other Sumerian signs and to the signs of other writing systems as well. The earliest surviving Chinese characters were much more iconic than modern ones. A trace of iconicity can still be seen in a handful of modern Chinese characters, but most have become completely arbitrary symbols that represent the syllables of the spoken language.

It is more difficult to point to convincing cases in which intonation has become conventionalized. Bolinger speculates that intonation could have contributed to the development of tones in some languages (Bolinger 1989: 27), and this would amount to a conventionalization of intonation. If one looks hard enough, it is possible to find examples of tone-like distinctions even in English. That’s fùnny pronounced with a low pitch on fùn- is likely to be understood as meaning "that’s strange." That’s fúnny with a high pitch on fún- is more likely to be understood as meaning "that’s laughable." English has not become a tone language, however, and there are other well-known ways for tones to arise than by conventionalization of intonation. I know of no certain cases where intonation has been conventionalized into contrastive tone.

Clearer examples of the conventionalization of pitch differences are found in sound symbolism. In many languages, words meaning ‘little’ have high front vowels, while words meaning ‘large’ more often have vowels that are lower and further back. Itsy bitsy and teeny weeny are obvious examples. Part of the appeal of humongous lies in the sound symbolism of its vowels. High front vowels have high second formants and they are perceived as high pitched and thus appropriate for words meaning ‘little’. Back vowels have low second formants and they are perceived as having low pitch, and so more appropriate for words meaning ‘large’. It is true that big and small are counter-examples to the generalization, but so many examples from so many languages fit the generalization that the relationship is convincing in spite of the exceptions.

Conventionalization in the individual.

Centuries were needed for writing to be conventionalized; a few generations have brought considerable conventionalization to American Sign Language. We can watch conventionalization happening even more quickly as every child learns to talk. A compelling example is found in Goldin-Meadow’s study of deaf children of hearing parents (1993). The children with whom Goldin-Meadow worked had no contact with an established sign language but their deafness cut them off from the spoken language of their homes. In spite of the absence of linguistic input, these deaf children devised elaborate gestural systems by which to communicate with other members of their families. The signs they used were all iconic or indexical in origin, and at first, they were not so different from the gesticulations of hearing people. The children pointed to things as a way of naming them, and they formed shapes with their hands just as everyone sometimes does while speaking. The gestures of these deaf children had to stand alone, however, and like the signs of ASL, they became conventionalized into something more like words than like gesticulations. The children created what amounted to simple languages.

All of the children used their gestures as "tools" for communication–to convey information about current, past, and future events, and to manipulate the world around them. Like children learning conventional languages, the deaf children requested objects and actions from others and did so using their gestures; e.g., a pointing gesture at a book, a "give" gesture, and a pointing gesture at the child’s own chest, to request mother to give the child a book; or a "hit" gesture followed by a pointing gesture at mother, to request mother to hit a tower of blocks. Moreover, like children learning conventional languages, the deaf children commented on the actions of objects, people, and themselves, both in the past (e.g., a "high" gesture followed by a "fall" gesture to indicate that the block tower was high and then fell to the ground) and in the future (e.g., a pointing gesture at Lisa with a head-shake, an "eat" gesture, a pointing gesture at the child himself, and an "eat" gesture with a nod, to indicate that Lisa would not eat lunch but that the child would). Gestures were also used to recount events which happened some time ago; e.g., one child produced an "away" gesture, a "drive" gesture, a "beard" gesture, a "moustache" gesture, and a "sleep" gesture to comment on the fact that the family had driven away to the airport to bring his uncle (who wears a beard and a moustache) home so that he could sleep over (Goldin-Meadow 1993: 65-66).

Goldin-Meadow recognized three different kinds of gestures. "Pointing gestures" indicated a person or an object, but unlike the points of hearing people, which indicate the location of an object but do not name it, the pointing gestures of the deaf children came to function like the nouns and pronouns of an established language. They served as names for objects and people. Mostly the children pointed to things present in their immediate environment, but as they grew older they invented a more abstract kind of pointing. One child, for example, made a round gesture to indicate a Christmas tree ball and then indicated its hook by pointing to the place in the gesture where the hook would belong.

Goldin-Meadow gave the name "characterizing gestures" to iconic gestures that denoted actions and attributes. Chewing movements made while a fist was held near the mouth meant ‘eat.’ A hand moving forward in the air when describing the movement of a toy meant ‘go.’ Each child devised his or her own idiosyncratic signs, of course, but each settled on stable forms that, while iconic or indexical in origin, became conventionalized by repetition. At least one child constructed some signs by combining one set of fixed hand shapes with another set of fixed motions.

The term "marker" was used for a third type of gesture. These included nods and head shakes to indicate affirmation and negation, and a finger held in the air to mean ‘wait.’ These, of course, are conventional gestures for hearing speakers of American English, and they were learned from family members, but they were incorporated securely into the children’s linguistic system.

The signs invented by the deaf children came to differ from the gesticulations of hearing people in two ways. First, they were segmented into a linear sequence, and second, they became sufficiently standardized to contrast with one another. The signs of the deaf children had a more consistent form than the gestures of the hearing members of their families for whom the gestures stayed closer to ordinary gesticulation, but even the children’s signs never became so conventionalized as to be arbitrary. They always retained their motivated character. An isolated deaf child who is trying to make his or her needs known to people who do not control the system as well is limited to signs that are clearly motivated. Goldin-Meadow suggests (1993: 78) that at least two language users may be needed before arbitrariness can be introduced into a communication system. Two users can more easily agree on arbitrary conventions.

Hearing children sometimes conventionalize gestures in such subtle ways that we hardly notice. Consider the "arms-up" gesture by which toddlers ask to be picked up. This begins as an instrumental gesture, a part of the baby’s adaptation to the impinging world, in this case a part of his interaction with bigger people. After being lifted often enough by adult hands that have been placed under his armpits, a baby learns to spread and then raise his arms in anticipation. The gesture then becomes conventionalized and turns into a stylized request. This arms-up gesture is so common that we might almost imagine it to be an inborn gesture-call, but it is more dependent upon learning than true gesture-calls. Unlike the words of a language or quotable gestures such as the bye-bye wave, however, it is learned through practice in interaction, not by either imitation or direct instruction. If adults make an arms-up gesture to a baby they are more likely to be imitating the baby than offering instruction. The arms-up gesture is not as clearly culturally patterned as words or quotable gestures, and it is not used symmetrically. Adults do not seriously ask babies to pick them up. The begging gesture–hand extended, palm upward with the fingers together–is learned in much the same way. The communicative grunts of human and primate infants that are described by McCune (this volume) are a parallel vocal example of a signal that is conventionalized from an instrumental act. These and many other common gestures are conventionalized from the background of our instrumental activities, and they remain more clearly iconic than many quotable gestures. I will return later to chimpanzee infants, who conventionalize gestures just as human infants do.

The spoken language of hearing children also progresses from motivated to increasingly conventionalized and arbitrary forms. This can be seen in the word order that often appears in early child language. Small children often use word order that is "incorrect" by the standards of the adult language, but that is also more motivated. Slobin (1985) gives exhaustive documentation from many languages in which the children’s "incorrect" order is more iconic than the "correct" order. Logically, for example, most negatives negate their entire sentence. It would be diagrammatically reasonable to negate The train will come on time as Not–the train will come on time. By placing the sign for negation outside the rest of the sentence we could show that the meaning of the entire sentence has been reversed rather than the meaning of just one of its parts. In ordinary speech, we tuck the negation inside the sentence and say The train will not come on time, but this loses diagrammatic iconicity. As they begin to learn English, small children often place the negative outside the rest of the sentence: no sit there, no the sun shining, no fall, no play that. (Both the examples and the suggested analysis are from Klima and Bellugi 1966.) The order violates adult rules of grammar, but it diagrams the meaning of the sentences better than adult grammar does.

Children sometimes make their first questions simply by using a rising intonation, but a bit later they may attach question words to the beginning of the sentence: What he can ride in?, What you eat? To make the first of these examples conform to adult standards, the subject and auxiliary verb need to change places (What can he ride in?) and the second example requires an added do. These changes appear to be unmotivated, and children generally learn them later than the use of the question word itself.

Even more obviously motivated than word order are the stereotypic children’s words that are, to some degree, onomatopoetic: choo-choo, bow-wow, honk-honk and so forth. Children learn these words from older people, of course, but adults recognize children’s affinity for motivation and expect such motivated forms to appeal to children.

In special circumstances, even adults can conventionalize signals in the course of a single conversation. With sufficiently imaginative gesturing, two adults who share no common language can communicate a good deal. Of course they can use the gestures and calls that all humans share: smiles to show good will, laughs to show solidarity, frowns to show puzzlement, and all the rest. They may also share quotable gestures such as nods and head shakes even if they share no spoken language. In addition, they are likely to make heavy use of pantomime, the most motivated of all gestures. They will point and make shapes with their hands, hold up fingers to represent numbers, and even move their entire bodies to show their meaning. They may draw pictures on paper or in the dirt. With enough patience it is possible to explain to someone that he should go three blocks that way, turn right, walk two blocks more, and find the tall building. It does not take long, in encounters like these, for conventions to arise. Whatever is understood can be seized upon as useful for the future, even if it is soon abbreviated. It may take a good deal of initial effort to get across the idea of blocks in three blocks, but once that has been accomplished, it will be much easier to explain that after turning right the person should go two more blocks. A start is made in agreeing on a conventional way to name blocks.

Groping to give directions to someone with whom one shares no language represents a stage of communication that amounts to a sort of pre-pidgin–communication with no spoken language at all. When people with good ears but no common language are thrown together for any length of time, they soon find spoken words to support, and then to replace, the frantic gesturing with which they must begin. Efficiency of communication quickly encourages conventionalized sequences of sounds with consistent meanings.

Conventionalization and Arbitrariness.

I have described a process that has taken place repeatedly. The early stages of several types of communication relied heavily on motivated signs, but as each communication system became established, the signs became more and more conventional. Finally, they lost all the motivation with which they began, and reached the point of arbitrariness. The transparency of motivated signs would seem to give them a clear advantage over those that are arbitrary, but conventionalization and arbitrariness have so regularly won out that they must have important advantages that compensate for the loss of motivation. What are they?

Conventionalization represents, in part, the victory of the producer (signer, writer, speaker) over the receiver (reader, listener). Even more, it is the victory of skilled and experienced users over learners. Motivated signs are much easier to learn than arbitrary ones so learners should have a clear preference for extensive motivation. Unfortunately for learners, their power over the form of a communication system is limited and, for the most part, they must take signs as they find them. Experienced producers have more control, and they find it advantageous to cut corners, to make a diagram instead of a picture, a stylized hand movement instead of a pantomime, a conventionalized sequence of sounds instead of a realistic imitation of a noise. Receivers may understand more easily when the signals are clear and well motivated, but skilled receivers are also producers and once they have learned the code they can generally understand the same conventionalized and arbitrary signs that they can produce. The learner, with no more power over the communication system than our own school children have over the irrationalities of English spelling, must adapt to the abbreviations and conventionalizations that producers find convenient and that skilled receivers have learned to cope with.

Conventionalization speeds up communication and it makes the job of the producer easier, but these are not its only advantages. As signs become standardized they also become less ambiguous, and clarity can be enhanced if the signs are made to contrast with each other. In a highly iconic system, we might expect the signs for ‘cat,’ ‘tiger,’ and ‘leopard’ to be quite similar. The signs for ‘dog,’ ‘fox,’ and ‘wolf’ ought to differ somewhat from those for the cats, but they would differ less from the signs for the cats than from those for body parts or clothing. For practical communication, it is more important to keep similar objects distinct. It is much more dangerous to risk confusing ‘cat’ with ‘leopard’ than with ‘hat.’ The context will generally suggest whether ‘cat’ or ‘hat’ is the intended word. It is less likely to help us decide between ‘cat’ and ‘leopard.’ Too much iconicity invites ambiguity.

With an increasingly complex syntax, conventionalization and arbitrariness would have had still other advantages. People who are clever enough to agree to keep their words in a consistent order should be able to communicate more successfully than those who jumble their words at random. If modifiers are always kept on the same side of the word they modify, listeners will understand them more easily, simply because they will know which word is the modifier and which is modified. Even this modest degree of conventionalization implies the beginning of syntax and rudimentary parts of speech.

Conventionalization and arbitrariness have one final implication. Conventionalization implies that different people and different groups can create quite different conventions to represent the same meaning. Different groups can agree on different shapes for individual words, and different orders in which to string these words together. Conventionalization, then, opens the way for different dialects and languages. I will return to conventionalization in the final section of this paper, where I will suggest some reasons for the decline in the iconicity with which language may have begun.

Motivated Signs in Gorillas, Chimpanzees, and Bonobos.

Human beings easily produce and understand motivated signs of many sorts, but such signs appear to be rare among other species. It is true that ordinary gesture-calls have sometimes been said to be iconic. Since gesture-calls ordinarily develop by the ritualization of some other behavior, they start with the inherent iconicity of instrumental acts. When a dog snarls in order to threaten, his lip is drawn back as if to bite, so a snarl can be interpreted as an icon of a bite.

Nevertheless, even if animal symbols are regarded as iconic, they are also highly stereotyped and the kind of productive iconicity that lets human beings so easily discover similarities between objects, pictures, gestures and sounds, is hard to find among animals. Bonobos, chimpanzees and gorillas, however, have demonstrated a degree of productive iconicity that brings them closer than other animals to humans. Even the ability to recognize their own reflection in a mirror suggests that these animals are closer than any others to the recognition of icons, and their ability to recognize objects, people, and animals in pictures unambiguously demonstrates an iconic capacity that most animals never show. A number of captive chimpanzees have also been reported to have gestured spontaneously in iconic or indexical ways in order to indicate their wants. Viki, the chimpanzee who was raised by Keith and Catherine Hayes (Hayes and Nissen 1971), and who is remembered most often for her failure to learn to talk, was skillful at other kinds of communication.

At times her gestures became very explicit. Watching bread being kneaded, she begged for a sample of dough by going through the motions for a while, and then holding out her hand, palm up, moving her fingers in the gesture which means "give me" to both her species and ours. A similar incident occurred during the weekly ironing as she grew impatient for her turn to do the napkins. She stood on a nearby table, moving one clenched fist slowly back and forth above the ironing board while her other hand tried to take the iron away from "mamma." (Hayes and Nissen 1971: 107).

The Gardners (1969: 670) reported that when Washoe wanted to go through a door she would hold up both hands and pound on it with her palms or knuckles. The Gardners encouraged Washoe to develop this gesture into a sign, but even without the conventionalization that we expect in a sign, it was certainly motivated. Many of Washoe’s other gestures also seem to have been motivated, although this is obscured in the Gardners’ description by their eagerness to interpret them as signs, and by their efforts to persuade Washoe to make them in a way that would conform more closely to the conventional signs of ASL.

More recently, iconic gesturing has been observed in an adult male gorilla named Kubie who lives among other gorillas in reasonably naturalistic conditions in the San Francisco zoo (Tanner and Byrne 1996). Kubie interacted frequently with a female named Zura, and he used iconic gestures to show her what he wanted her to do. By moving his hand downward, either while touching Zura’s body or simply while moving it where she could see it, he indicated that he wanted Zura to move downward like his hand. Kubie also patted his own chest in a gesture that seemed to call attention to himself. He, and to a lesser extent Zura, used many other gestures in what appears to have been a rather subtle form of communication. These gestures were spontaneous, not taught by humans.

Savage-Rumbaugh gives a striking example of the ability of a chimpanzee named Booee to read a novel iconic gesture of a human being.

One time [Booee] hung by his hands from the top of the cage and did a 360° turn while we were playing. I laughed and, wanting to see that again, held up my hand and spun my index finger around in a 360° arc and pointed to the top of the cage. Booee at once grasped my intent and proceeded to repeat his flip for my benefit. This gesture was not one of Booee’s signs. In fact no one had ever made that sort of gesture or request to him before. . . Yet he was immediately able to comprehend the meaning of my gesture–repeat that flip you did up there (Savage-Rumbaugh and Lewin 1994: 36).

When he was a year and a half old, Kanzi, the famous bonobo studied by Savage-Rumbaugh and her colleagues, began to extend his arm, though not his index finger, to indicate the direction in which he wanted to travel. When riding on Savage-Rumbaugh’s shoulders he would sometimes lean his whole body in the desired direction or even forcefully turn her head to show the direction he wanted to go. He used hitting motions to show that he wanted nuts cracked and a twisting motion to show that he wanted a jar opened (Savage-Rumbaugh and Lewin 1994: 134). Even more striking, perhaps, because no human participants are involved, bonobos use iconic hand and arm gestures to indicate the positions they desire the other to assume for copulation (Savage-Rumbaugh and Lewin 1994: 112). Finally, Savage-Rumbaugh (this volume) reports that Kanzi’s half sister, Panbanisha, referred to a visitor as "mushroom" because she recognized the similarity of the visitor’s hairdo to a mushroom.

These examples of motivated signs among apes are all drawn from captive animals, and almost nothing seems to be known about the use of motivated signs in the wild. It is difficult to imagine that the iconic gestures used by Kubie when communicating with Zura, or those used by bonobos during copulation, would be possible unless they were based upon behavior that is used in the wild as well. It must be very difficult to observe motivated signs among wild apes, but so far as I am aware no one has looked very hard for them. As far as we now know, the ability to communicate by means of motivated signs sets apes apart from other animals, even monkeys, but it is an ability that apes share with human beings.

In addition, chimpanzees appear to come closer than other animals to the human ability to conventionalize instrumental gestures. Human children conventionalize the originally instrumental arms-up gesture and use it to make a request. Tomasello and his colleagues (1985, 1989, 1994) have studied a wide variety of gestures that were conventionalized by young chimpanzees living in a semi-natural group at the Yerkes Primate Center Field Station. Some of these young chimpanzees learned to touch their mother’s side in a characteristic way in order to get her attention when they wanted to nurse. They learned to hold their hand in a begging position below the adult’s mouth while looking at the adult’s face when hoping to be given some food. They engaged in a whole series of stereotyped acts in order to invite other young chimps to play.

While many of these chimpanzee gestures are used by more than a single individual, they are not universal in the species. They have to be learned in the same way that a human child learns the arms-up gesture, but there is no indication that they are learned by imitation. This means that they lack any kind of traditional continuity within a community, and this makes them quite different from the signs and words of human languages. Individuals differ in the gestures they use, and in the forms of their individual gestures. In particular, the gestures used by a young chimpanzee with its mother can be quite idiosyncratic since the mother and her child use them only with each other, never with other individuals. Like human children who conventionalize a request to be carried, each young chimpanzee is capable of conventionalizing his own gestures and putting them to communicative use.

Comprehension and Production.

I need to digress long enough to deal with an old puzzle that hovers over the first stage of language and that concerns the relation between comprehension and production: What could the first speaker have hoped to accomplish with her first words if no one else could understand them? An answer to this puzzle can begin by remembering the way in which other communicative signals have become established in other species, and the most important observation is that it is not production but comprehension that makes a sign. In every case that we understand, the noises or movements that became communicative signals were first made for reasons other than communication (Tinbergen 1952). Only later did they become communicative. A dog’s snarl is the classic example.

Snarls began as part of the preparation for a bite. They helped to get the teeth into the proper position, but at first they had neither communicative intent nor result. Only when the victims began to notice that retracted lips tended to be followed by a bite did the movement become communicative. Potential victims might even be able to use the information provided by retracted lips to try to avoid the bite. Then, with victims understanding, the aggressor had a new opportunity. By developing a ritualized lip movement, perhaps more stereotyped and even exaggerated than before, he might reduce the ambiguity of the sign and might even be able to frighten off his enemy while avoiding the riskier activity of biting. The lip movement, originally instrumental, would have evolved into a communicative snarl, sending information that was useful both to the aggressor and to his potential victim. The snarl was ritualized by being built into the genetic endowment of the species so it does not need to be conventionalized by each individual, but the essential point is that it was comprehension that started this process. The movement of the lip could only start to be ritualized after it was understood. Other animal signals seem to have begun in parallel ways, with comprehension as the first and crucial step. Could the communication that we recognize as language also have begun at the point where it was understood rather than the point at which it was produced?

It helps to recognize that, at every stage of language as we use it today, individuals are better at comprehension than at production. Linguists have not always acknowledged the precocity of comprehension, in part because it is much more difficult to study than production, but in part because of a rather behaviorist bias that makes the "behavior" of speaking seem more important than "passive" comprehension. In spite of pious assertions that their theories are neutral as between speaker and hearer, linguists have generally focused on production. Studies of comprehension are better established in primatology, particularly by the playback experiments of observers such as Cheney and Seyfarth (1990). Savage-Rumbaugh has stressed the importance of studying comprehension if we are to understand ape capabilities (Savage-Rumbaugh et al. 1993).

At every stage of their development, children understand more than they can say. It is difficult to prove this to the satisfaction of a hard-nosed experimental psychologist, but parents are rarely in doubt. The implication is that much that is essential about language is learned silently as children learn to understand. Speaking may be only the final stage, the point at which language that is already under passive control is finally made active. Even as adults, we can understand more than we can say. We all understand dialects that we cannot produce, and we all understand words that we would not use. In some parts of the world, people learn to understand most of what is said in a second or third language without ever saying much of anything. In New Guinea people sometimes say "I can hear that language but I cannot speak it" (Aram Yengoyan, personal communication).

As soon as we recognize that human beings can understand more than they produce, we ought to wonder how much spoken human language primates can learn to understand. Even if they are incapable of uttering a single spoken word, an ability to understand would demonstrate some genuine knowledge of a language. Numerous anecdotal reports have described captive chimps who, in spite of their inability to say anything at all, appeared to understand quite a bit of spoken human language. These reports have been met with some skepticism, partly for the bad reason that our biases push us to emphasize production, but partly for the much better reason that it really is very difficult to demonstrate comprehension skills. Apes, like people, can infer a great deal from the context in which language is used, so it is always difficult to know how much a listener depends upon context, and how much upon the language itself. Hayes and Nissen suggest that Viki learned to understand a considerable amount of spoken English, but they were so eager to teach her to articulate words that they did not systematically study her comprehension.

The ability of apes to learn to comprehend a significant amount of spoken language has now been dramatically confirmed by Savage-Rumbaugh and her colleagues (Savage-Rumbaugh et al. 1993), who have worked with Kanzi. At the age of eight, Kanzi was able to respond correctly to a large number of different words and to a considerable variety of spoken sentences. His ability was compared with, and found remarkably similar to, that of a two-year-old human girl. Kanzi’s skills impress me as a far better demonstration of linguistic ability than has ever been shown by any nonhuman primate who has been trained to produce language or language-like signals, whether by articulating spoken words, by signing, by manipulating plastic chips, or by pressing buttons. Indeed, Kanzi’s ability to comprehend a human language seems extensive enough to force us to grant him a degree of linguistic competence that linguists, at least, have generally presumed to be exclusively human. Neither Kanzi nor any other bonobo is ever likely to give serious competition to human children who learn a language with such apparent ease, but I do not doubt that Kanzi has learned a good deal of English. This does not really change the evolutionary question. We must still ask how a language-using animal like ourselves could have evolved from an earlier animal that lacked language, but we must either suppose that there has been some remarkable parallel evolution of the human and bonobo lines, or push back the beginning of linguistic ability to a point before the split between humans and bonobos.

The priority of comprehension over production suggests that when a few individuals began to produce increasingly word-like signs others would have been able to understand them. This would remove any mystery about the communicative usefulness of the first words or word-like signs, except that it is no more likely that those first signs were produced with communicative intent than was a dog’s first snarl. The question that remains is: Why would anyone produce language-like signs, in the absence of communicative intent?


Up to this point I have tried to remain sober enough to stick to arguments that I am willing to defend, but the time has come to stop being cautious, and in this last section, I offer some speculations about how language might have been launched. The speculations grow out of arguments already offered, and I believe the speculations are not implausible. I claim no more.

At least since the time when Charles Hockett was writing (Hockett 1960; Hockett and Ascher 1964) it has seemed obvious to many workers that if we are to understand language evolution we must explain how primate calls could have evolved into human spoken language. The tradition continues in the enthusiasm with which skilled field workers such as Cheney and Seyfarth (1990) focus on calls rather than gestures, and search for parallels between calls and language rather than between ape and human calls. In searching for similarities between ape calls and language, I believe some primatologists miss the more important continuities between apes and humans. Earlier in this paper I reviewed the profound differences that separate human gesture-calls from language. Human gesture-calls are manifestly homologous to those of apes, but radical changes would be needed to transform a call into anything resembling a word or a sentence. The more promising continuities, I believe, are to be found in the workings of human and ape minds.

I find it much more difficult to distinguish language from the other capabilities of the human mind than to distinguish it from our gesture-call system. The evolution of language has transformed the human mind but it has disturbed our gesture-call system only marginally. If we want to understand how language has changed our species, it is our mind rather than our gesture-call system that requires our attention. Indeed, I believe that the most productive way to understand evolving language is to see it as emerging as one component of an evolving mind.

Even the fact that deaf people so readily develop sign languages that are as rich and versatile as spoken languages argues strongly that what changed most crucially in the course of human evolution was the human mind rather than a pre-existing call system. People who can hear find it convenient to use their voices for language. People who cannot hear simply use an alternative medium–awkward in the dark or when using tools–but fully equal in most respects to the tasks that we give to any language.

Motivated signs depend on a cognitive ability that human beings have in abundance but that neither humans nor animals need for their gesture-calls–the ability to recognize similarities between a sign and the thing it stands for. This ability is one part of a more general human ability that allows us to make and to understand not only icons and indices, but also analogies and metaphors. Any animal that can produce motivated signs must be able to perceive connections between disparate phenomena: between an object and the gestured shape of the object; between a bird and an imitation of its call; between a pointing hand and the thing it points to. Motivated signs like these become possible only with a growing ability to puzzle out the interrelations among the phenomena of the world. They depend on a recognition of cooccurrence and an increasingly nuanced understanding of cause and effect. Most animals show little ability to use motivated signs, but apes do, and apes have clearly moved part way along the same path that has brought such an extensive ability for motivated signs to human beings.

It is now possible to summarize the themes of this paper by proposing several stages by which a species without language might have evolved into a species with language:

1. Let us start with an ancestor living some time back before the split between humans and apes, a species with a versatile mammalian gesture-call system but nothing we would want to call language. Like most other mammals, individuals of this species would have used their gestures and calls to convey a limited amount of referential information and a great deal of information about their own emotions and intentions. They would also have used their bodies in all sorts of instrumental ways, of course, and while their instrumental gestures were not intended for communication, conspecifics and even members of other species would have profited from being able to interpret them. Wilcox (this volume) also points to the importance of "actions with instrumental functions" during the early stages of language.

2. Of necessity, these instrumental gestures had to be adapted to the physical world and so they would have been inherently iconic. In a species that, we assume, was becoming increasingly adaptable and increasingly dependent on learning and on living by its wits, those individuals who were most skillful at interpreting each other’s instrumental gestures should have been advantaged. Chimpanzees are so famously skilled at interpreting the behavior of conspecifics that they find it profitable to act so as to deceive one another, but they must also find it profitable to comprehend well enough to detect attempted deception by others (Byrne and Whiten 1988). Improving comprehension would have allowed, and encouraged, the conventionalization that makes gestures easier to produce and that reduces their ambiguity. Chimpanzees, as Tomasello and his colleagues have shown, have reached this stage.

3. The growing ability to understand instrumental gestures with their inherent iconicity would have fostered, in turn, the ability to produce iconic signs for specific communicative purposes. Chimpanzees and bonobos, who can understand and produce iconic signs, appear to have reached this stage, although their abilities have been demonstrated more clearly in captivity than in the wild.

4. Individuals who were skillful at learning to understand conventional and iconic signs would have had an advantage over those less well endowed. Those who could use imitative learning as a shortcut to the acquisition of these signs would have been particularly advantaged. Individuals who could learn by imitation would no longer need to invent all of their conventionalizations. It has been difficult to find convincing examples of wild primates that learn communicative signals by imitation. Indeed, a good deal of skepticism has recently been expressed about the ability of primates, even of apes, to imitate much of anything (Visalberghi and Fragaszy 1990; Tomasello 1996), but imitation is essential for our kind of language. Only with imitation did traditional transmission become possible.

5. It is easier to outline a hypothetical sequence for the evolution of visible signs than for audible signs, but since the basic adaptations are cognitive, we should expect that the increasing flexibility of visual communication would have entailed a corresponding flexibility in audible communication. The very close parallels between visible and audible communication that I outlined earlier suggest that they evolved together. At some point, our ancestors would have needed to develop better voluntary control over what has become the vocal tract than our even more remote ancestors once had, but this should be seen as an adaptation to improved communicative ability, rather than as a prerequisite for it.

6. As learning by imitation became increasingly important, the retention of iconicity would have mattered less, and the stage would be set for conventionalization to move on to the point of arbitrariness. The development of arbitrariness, however, could have been a very long process. The degree of arbitrariness that took a few generations for American Sign Language and a few centuries for writing may have required tens of thousands of generations for spoken language. We know by the results that conventionalization and arbitrariness won in spoken language, just as they later won in writing and as they continue to win in sign language, but we do not know the steps by which this happened during the early stages of language. With arbitrariness came the likelihood that different groups of people would settle on differing conventions. Differing dialects and languages would have become possible.

7. Clarity of communication would have been improved if the increasingly arbitrary signs could be kept safely distinct from one another. Contrast would be especially important for signs whose meanings could be easily confused, so contrast would have encouraged arbitrariness. Early contrast, however, did not require duality of patterning.

8. The capacity for storing and retrieving what I will now call "words" must have grown relentlessly. Modern humans are able to learn tens of thousands of distinct words, and this ability could only have developed through a long evolutionary process. Successive generations of our ancestors must have passed through many stages of growing vocabulary size. The time when humans and apes first started to diverge may be as recent as five million years ago, hardly a long period in which to have evolved our astonishing capacity to learn and to store words. If bonobos can store as many words as is suggested by Kanzi’s receptive ability, however, the expansion of storage capacity would seem to have begun well before the point of divergence.

9. An expanding vocabulary entails, among other things, a phonology. A few score words, conceivably even a few hundred, could have been kept distinct in the same way that we still keep our quotable gestures and quotable vocalizations distinct. Each word would have its own form and its own meaning, but even these early words could have been in contrast, just as our quotable vocalizations and gestures are still in contrast, without any need to build up the words from smaller distinctive phonological units. As the capacity for learning words expanded, however, a point would have come when their numbers would require a more orderly way to keep them all distinct. This implies the need for a phonological system built with such units as the distinctive features of spoken language, or the contrastive hand shapes, orientations, and movements of sign language. Perhaps there are other ways in which thousands of words could be kept distinct, but the method with which evolution has endowed us is the capacity to use a contrastive phonological system.

10. There would be no reason for vocabulary to keep expanding unless it helped people to think and communicate about ever more complex matters. At some point, thought and communication would have been made richer by the ability to use words in orderly combinations, and syntax would be born. It is hard to imagine any basis by which people could start to join words except in motivated ways. (Armstrong, Stokoe and Wilcox 1995 suggest ways in which motivated gestures could have fostered the growth of syntax. See, especially, Chapter 7 "The Origin of Syntax".) Words used in temporal sequence could have reflected the temporal sequence of the speaker’s experiences just as they still do. Objects closely related in the world could be named by words placed close to each other in speech.

By the time a species had passed through this sequence, most of the general features that we associate with human language would have been in place, although an enormous number of details remain to be accounted for. The sequence is a hypothetical one, of course, but the individual steps seem plausible. The sequence avoids entirely the mysterious process that would be required to turn a gesture-call system into language, but it ties every stage of developing language to expanding cognitive abilities. Motivated signs may have played an important role at the launch, but we find good reasons for their partial replacement under the pressures of conventionalization and arbitrariness.

The ability to use motivated signs was probably one important prerequisite for language, but I do not mean to suggest that it was the only one. The mind is a complex organ. Its evolution must have responded simultaneously to many selective pressures and I see no reason to single out just one as central. Calvin (1983, 1987), for example, has suggested that precision throwing required the nervous system to evolve in ways that could also have supported linguistic ability. Wallace (1989, 1992) has pointed to the importance of cognitive mapping as a component of developing cognitive skills. Dunbar (1993) has argued that early language replaced grooming as a means of social bonding. Donald (1991) has argued for the importance of mimesis as a prerequisite to language. Sorting requires the same kind of sharp discriminations that are needed to assign names to objects, and Hayes and Nissen (1971) have described the enthusiasm with which the chimpanzee Viki sorted small objects. All of these, and many other evolving cognitive skills, could have contributed to a growing ability to use language. I see no inconsistency in supposing that they all worked together.

We would like to know much more than this about the sequence of stages through which evolving language passed, but this should be enough to suggest that, while modern spoken language seems to be characterized more by arbitrariness than by motivated signs, a stage during which communication depended upon motivated signs could have helped to push our ancestors onto the path that led, after a long evolution, to the ability to learn a language of the type we have today.

I hope that primatologists will look more closely at motivated signaling, not only among captive animals but among those in the wild. The ability to understand and use motivated signs seems to distinguish primates, or at least gorillas, chimpanzees, and bonobos, from other mammals. The use of motivated signs is a plausible step along the path toward language. By learning more about the cognitive foundation of the anthropoid ability to produce iconic and indexical signs, we should improve our model of the cognitive abilities of our own ancestors.


Armstrong, D. F., Stokoe, W. C. & Wilcox, S. E. 1995. Gesture and the Nature of Language. Cambridge: Cambridge University Press.

Bolinger, D. 1978. Intonation across languages. In Universals of Human Language, II: Phonology (Ed. by J. H. Greenberg et al), pp. 471-524. Stanford: Stanford University Press.

Bolinger, D. 1982. Intonation and its parts. Language, 58, 505-533.

Bolinger, D. 1985. The inherent iconism of intonation. In Iconicity in Syntax: Proceedings of a Symposium on Iconicity in Syntax, Stanford, June 24-26, 1983 (Ed. by J. Haiman), pp. 97-108. Amsterdam: J. Benjamins.

Bolinger, D. 1986. Intonation and its Parts: Melody in Spoken English. Stanford: Stanford University Press.

Bolinger, D. 1989. Intonation and its Uses: Melody in Grammar and Discourse. Stanford: Stanford University Press.

Brown, P. & Levinson, S. C. 1987. Politeness: Some Universals in Language Usage. Cambridge: Cambridge University Press.

Burling, R. 1993. Primate calls, human language, and nonverbal communication. Current Anthropology, 34, 25-53.

Bybee, J. 1985. Diagrammatic iconicity in stem-inflectional relations. In Iconicity in Syntax: Proceedings of a Symposium on Iconicity in Syntax, Stanford, June 24-26, 1983 (Ed. by J. Haiman), pp. 11-47. Amsterdam: J. Benjamins.

Byrne, R. W. & Whiten, A. 1988. Machiavellian Intelligence. Oxford: Clarendon Press.

Calvin, W. H. 1983. A stone’s throw and its launch window: timing precision and its implications for language and hominid brains. Journal of Theoretical Biology, 104, 121-135.

Calvin, W. H. 1987. The brain as a Darwin machine. Nature, 330, 33-34.

Cheney, D. & Seyfarth, R. M. 1990. How Monkeys See the World: Inside the Mind of Another Species. Chicago: University of Chicago Press.

Donald, M. 1991. Origins of the Modern Mind: Three Stages in the Evolution of Culture and Cognition. Cambridge, Mass.: Harvard University Press.

Dunbar, R. I. M. 1993. Co-evolution of neocortical size, group size and language in humans. Behavioral and Brain Sciences, 16, 681-735.

Ekman, P. (Ed.). 1982. Emotion in the Human Face, 2nd ed. Cambridge, England: Cambridge University Press.

Ekman, P. 1994. Strong evidence for universals in facial expressions: a reply to Russell’s mistaken critique. Psychological Bulletin, 115, 268-287.

Ekman, P., Friesen, W. V. & Ellsworth, P. 1972. Emotion in the Human Face: Guidelines for Research and an Integration of Findings. New York: Pergamon Press.

Ekman, P. & Friesen, W. V. 1969. The repertoire of non-verbal behavior: categories, origins, usage, and coding. Semiotica, 1, 49-98.

Fridlund, A. J. 1994. Human Facial Expression: An Evolutionary View. San Diego: Academic Press.

Frishberg, N. 1975. Arbitrariness and iconicity: historical change in American Sign Language. Language, 51, 696-719.

Gardner, R. A. & Gardner, B. T. 1969. Teaching sign language to a chimpanzee. Science, 165, 664-672.

Gibson, K. R. & Ingold, T. (Eds.). 1993. Tools, Language and Cognition in Human Evolution. Cambridge: Cambridge University Press.

Givon, T. 1989. Mind, Code and Context: Essays in Pragmatics. Hillsdale, N.J.: Erlbaum Associates.

Goldin-Meadow, S. 1993. When does a gesture become language? A study of gesture used as a primary communication system by deaf children of hearing parents. In Tools, Language and Cognition in Human Evolution (Ed. by K. R. Gibson & T. Ingold), pp. 63-85. Cambridge: Cambridge University Press.

Haiman, J. 1985. Iconicity in Syntax: Proceedings of a Symposium on Iconicity in Syntax, Stanford, June 24-26, 1983. Amsterdam: J. Benjamins. 73-95.

Hayes, K. J. & Nissen, C. H. 1971. Higher mental functions of a home-raised chimpanzee. In Behavior of Nonhuman Primates (Ed. by A. M. Schrier & F. Stollnitz), pp. 59-115. New York: Academic Press.

Hinton, L., Nichols, J. & Ohala, J. J. (Eds.). 1994. Sound Symbolism. Cambridge: Cambridge University Press.

Hockett, C. F. 1960. The origin of speech. Scientific American, 203 (October), 89-96.

Hockett, C. F. & Ascher, R. 1964. The human revolution. Current Anthropology, 5, 135-68.

Hooff, J. A. R. A. M. van. 1972. The phylogeny of laughter and smiling. In Non-Verbal Communication (Ed. by R. A. Hinde), pp. 209-241. Cambridge, England: Cambridge University Press.

Hooff, J. A. R. A. M. van. 1976. The comparison of facial expression in man and higher primates. In Methods of Inference from Animal to Human Behavior (Ed. by M. von Cranach), pp. 165-196. Chicago: Aldine.

Izard, C. E. 1994. Innate and universal facial expressions: evidence from developmental and cross-cultural research. Psychological Bulletin, 115, 288-299.

Jakobson, R. 1965. Quest for the essence of language. Diogenes, 51, 21-37.

Jakobson, R. 1966. Implications of language universals for linguistics. In Language Universals, with Special Reference to Feature Hierarchies (Ed. by J. Greenberg), pp. 208-219. The Hague: Mouton.

Kendon, A. 1988. How gestures can become like words. In Cross-Cultural Perspectives in Nonverbal Communication (Ed. by F. Poyatos), pp. 131-141. Toronto: C. J. Hogrefe.

Kendon, A. 1992. Some recent work from Italy on quotable gestures (emblems). Journal of Linguistic Anthropology, 2(1), 92-108.

Kendon, A. 1993. Human gesture. In Tools, Language and Cognition in Human Evolution (Ed. by K. R. Gibson & T. Ingold), pp. 43-62. Cambridge: Cambridge University Press.

Klima, E. S., & Bellugi, U. 1966. Syntactic regulation in the speech of children. In Psycholinguistic Papers (Ed. by J. Lyons and R. J. Wales), pp. 183-203. Edinburgh, Scotland: Edinburgh University Press.

Klima, E. S., & Bellugi, U. 1979. The Signs of Language. Cambridge, Massachusetts: Harvard University Press.

Kramer, S. N. 1963. The Sumerians: Their History, Culture, and Character. Chicago: University of Chicago Press.

Lakoff, R. 1975. Language and a Woman’s Place. New York: Harper & Row.

Matthews, P. H. 1991. Morphology. Second edition. Cambridge: Cambridge University Press.

McNeill, D. 1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: University of Chicago Press.

Noble, W. & Davidson, I. 1996. Human Evolution, Language and Mind: A Psychological and Archaeological Inquiry. Cambridge: Cambridge University Press.

Ohala, J. J. 1994. The frequency code underlies the sound-symbolic use of voice pitch. In Sound Symbolism (Ed. by L. Hinton, J. Nichols, & J. J. Ohala), pp. 325-347. Cambridge, England: Cambridge University Press.

Peirce, C. S. 1940. Logic as semiotic: the theory of signs. In The Philosophical Writings of Peirce (Ed. by J. Buchler), pp. 98-119. Dover edition, 1955. New York: Dover.

Russell, J. A. 1994. Is there universal recognition of emotion from facial expression? a review of the cross-cultural studies. Psychological Bulletin, 115, 102-141.

Russell, J. A. 1995. Facial expressions of emotion: what lies beyond minimal universality? Psychological Bulletin, 118, 379-391.

Saussure, F. de. 1959. Course in General Linguistics. New York: Philosophical Library.

Savage-Rumbaugh, E. S., Murphy, J., Sevcik, R. A., Brakke, K. E., Williams, S. L., & Rumbaugh, D. M. 1993. Language Comprehension in Ape and Child. Monographs of the Society for Research in Child Development, Serial No. 233, Vol. 58, Nos. 3-4.

Savage-Rumbaugh, S., & Lewin, R. 1994. Kanzi: The Ape at the Brink of the Human Mind. New York: John Wiley & Sons.

Slobin, D. I. 1985. The child as a linguistic icon-maker. In Iconicity in Syntax: Proceedings of a Symposium on Iconicity in Syntax, Stanford, June 24-26, 1983 (Ed. by J. Haiman), pp. 221-248. Amsterdam: J. Benjamins.

Tanner, J. E. & Byrne, R. W. 1996. Representation of action through iconic gesture in a captive lowland gorilla. Current Anthropology, 37, 162-173.

Tinbergen, N. 1952. Derived activities: Their causation, biological significance, origin and emancipation during evolution. Quarterly Review of Biology, 27, 1-32.

Tomasello, M. 1996. Do apes ape? In Social Learning in Animals: The Roots of Culture (Ed. by C. Heyes & B. Galef). San Diego: Academic Press.

Tomasello, M., Call, J., Nagell, K., Olguin, R., & Carpenter, M. 1994. The learning and use of gestural signals by young chimpanzees: A trans-generational study. Primates, 35, 137-154.

Tomasello, M., George, B. L., Kruger, A. C., Farrar, M. J., & Evans, A. 1985. The development of gestural communication in young chimpanzees. Journal of Human Evolution, 14, 175-186.

Tomasello, M., Gust, D., & Frost, G. T. 1989. A longitudinal investigation of gestural communication in young chimpanzees. Primates, 30, 35-50.

Visalberghi, E. & Fragaszy, D. M. 1990. Do monkeys ape? In "Language" and Intelligence in Monkeys and Apes (Ed. by S. T. Parker & K. R. Gibson). Cambridge: Cambridge University Press.

Wallace, R. 1989. Cognitive mapping and the origin of language and mind. Current Anthropology, 30, 518-526.

Wallace, R. 1994. Spatial mapping and the origin of language: A paleoneurological model. In Studies in Language Origins, Vol. 3 (Ed. by J. Wind, A. Jonker, R. Allott, & L. Rolfe). Philadelphia: J. Benjamins.