Draft: a speech recognition telephone call-flow

Read it aloud!

(Or better, have an assistant read it aloud while you don't look at the screen.)

I'm making your lunch and you need a fruit.

Do you want: peaches, mandarin oranges, or an apple?


Could you repeat that? (short pause)

I have to put some fruit in your lunch.

Now, do you want peaches, (short pause)

mandarin oranges, (short pause)

or an apple?


[Timeout/NoResponse] [Unrecognizable sounds] ["OK/yes"] ["No"]
["Apple" clearly (or "No, I want apple", or "No, I want an apple", etc.)] ["Apple" indistinctly]
["Mandarin Oranges" clearly (or "No, I want oranges", or "No, I want mandarin oranges", etc.)] ["Mandarin Oranges" indistinctly]
["Peaches" clearly (or "OK, peaches", or "All right, peaches", etc.)] ["Peaches" indistinctly]
["Fruit", "Mixed fruit", "Fruit cup", "I want a fruit cup", etc.]
["Banana"] ["Pineapples"]
["Nothing", "I don't want any", "I don't want anything", "No fruit", etc.]
[4th unrecognized or low-confidence response] ["Whatever!"]
(Still checking all these...)
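
The response branches above can be sketched as a tiny phrase "grammar". This is only an illustration in Python, not the code behind this call flow: the intent names, the RULES table, and the classify function are all invented here, and a real recognizer would use its platform's own grammar format. Rule order matters: "No fruit" has to be caught before plain "fruit", and a bare "no" only counts when no fruit was named.

```python
import re

# Hypothetical rule table: (intent name, pattern). The first matching rule wins,
# so more specific phrases are listed before more general ones.
RULES = [
    ("decline", r"\b(nothing|no fruit|don't want (any|anything))\b"),
    ("give_up", r"\bwhatever\b"),
    ("off_menu", r"\b(bananas?|pineapples?)\b"),
    ("fruit_cup", r"\b(fruit cup|mixed fruit|fruit)\b"),
    ("peaches", r"\bpeach(es)?\b"),
    ("mandarin_oranges", r"\b(mandarin oranges?|mandarins?|oranges?)\b"),
    ("apple", r"\bapples?\b"),
    ("yes", r"\b(ok|okay|yes)\b"),
    ("no", r"\bno\b"),
]


def classify(utterance: str) -> str:
    """Map a transcribed utterance to one of the branch names above."""
    # Lowercase, turn punctuation into spaces (keeping apostrophes), trim.
    text = re.sub(r"[^a-z' ]+", " ", utterance.lower())
    text = " ".join(text.split())
    if not text:
        return "no_response"
    for intent, pattern in RULES:
        if re.search(pattern, text):
            return intent
    return "unrecognized"


if __name__ == "__main__":
    for said in ["No, I want an apple", "Pineapples", "No fruit", "Whatever!"]:
        print(said, "->", classify(said))
```

The word-boundary matching is what keeps "Pineapples" from being heard as "apple".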

(This dialogue elicits an answer to a single question with only three choices.

The simplicity of questioning hides the complexity of programming.

Speech recognition takes more than ten times the design work of touchtone,

and especially long user-testing cycles.

The caller could say anything, even if the best choices are clearly presented.

The program has to be able to handle anything gracefully,

with a "grammar" and lexicon of recognizable phrases,

steering the caller gently back toward the deliverable features.

At the same time, the system has to sound as if it has infinite patience and caring politeness,

especially when the caller gets upset with failed recognition steps. It dare not sound annoyingly repetitious.

The pacing and inflection of every word must be controlled carefully,

to keep the dialogue sounding naturally conversational,

and to stay within the system's intended "persona" and style.

It can take a day or more just to get a single question into good shape,

with retry phrasing, error trapping, timeouts, and the anticipation of "wrong" but reasonable answers,

plus several more days for coding and initial testing of each improvement!

Good systems then go through multiple rounds of testing and tuning, analyzing actual call recordings,

adjusting the questions, adjusting the recognizable phrases and confidence thresholds, etc.)
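
The "clearly" versus "indistinctly" branches come down to confidence thresholds. Here is a minimal sketch in Python, assuming the recognizer reports a confidence score between 0 and 1 along with its best guess. The Thresholds class, the next_step function, and the numbers 0.80 and 0.45 are all assumptions for illustration; real values come out of the tuning rounds described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    accept: float = 0.80   # assumed value: at or above this, heard "clearly"
    confirm: float = 0.45  # assumed value: at or above this, heard "indistinctly"


def next_step(choice: Optional[str], confidence: float,
              t: Thresholds = Thresholds()) -> str:
    """Decide what the dialogue does with one recognition result."""
    if choice is None or confidence < t.confirm:
        return "reprompt"           # nothing usable: ask again
    if confidence < t.accept:
        return f"confirm:{choice}"  # plausible: check with the caller first
    return f"accept:{choice}"       # clear enough: move on


if __name__ == "__main__":
    print(next_step("apple", 0.92))
    print(next_step("apple", 0.60))
    print(next_step("apple", 0.20))
```

Tuning then means changing the two numbers, not the dialogue logic: passing Thresholds(accept=0.55) makes the same 0.60 result an outright accept.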

Personal style:

I generally don't like to hear a strong "personality" coming from the computer.

It's a neutral device and doesn't have emotions.

I like the prompts to be even-keeled, without much pitch inflection.

When I'm calling a system as a customer, I don't need to be entertained, validated,

patronized, or congratulated for choosing that company's service.

I especially don't need to be thanked for cooperating with a rigid machine!

I just need my information delivered quickly, courteously, and with neutral emotions.

It's a machine serving me. It needn't work hard trying to engage me in conversation,

or stroke my presumed feelings. It should just obey my commands efficiently.

I'll be reworking all of this further, as I just finished reading an inspiring book about it:

Bruce Balentine's "It's Better to Be a Good Machine Than a Bad Person".

An especially important point is that the whole prompting strategy must prevent errors

from escalating into an unrecoverable or frustrating mess.
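
One way to keep errors from escalating is to count consecutive failures, vary the retry wording each time, and stop asking after a fixed limit. The sketch below is in Python and is only an illustration. The first retry prompt and the limit of four come from the call flow above; the second and third retry wordings, the fallback text, and the after_failure function are hypothetical.

```python
MAX_FAILURES = 4  # the call flow has a branch for the 4th failed response

RETRY_PROMPTS = [
    # First retry: the wording used in the call flow above.
    "Could you repeat that? I have to put some fruit in your lunch. "
    "Now, do you want peaches, mandarin oranges, or an apple?",
    # Later retries: hypothetical wording, shorter each time.
    "Please say one of these: peaches, mandarin oranges, or apple.",
    "One more time: peaches, oranges, or apple?",
]

# Hypothetical fallback: stop asking and settle the matter without the caller.
FALLBACK_TEXT = "I'll put an apple in your lunch."


def after_failure(failures: int) -> tuple:
    """Return (action, prompt text) after the given number of consecutive failures."""
    if failures < 1:
        raise ValueError("failures counts from 1")
    if failures >= MAX_FAILURES:
        return ("fallback", FALLBACK_TEXT)
    return ("reprompt", RETRY_PROMPTS[failures - 1])


if __name__ == "__main__":
    for n in range(1, 5):
        action, text = after_failure(n)
        print(n, action, "-", text)
```

Because each retry uses different wording, the system never repeats itself verbatim, and the hard limit guarantees the caller gets out of the loop.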

The simpler IVR example is also available, for comparison.

[My automated phone manifesto (and IVR/VUI design principles)!]

© 2008 Bradley Lehman