

Manifesto: automated telephone presence
as if it were music or radio

- Bradley Lehman, May 2008

I'm a performing musician and a composer. Some of the music I perform is fully written-out. Some of it is improvised from only a bass line, shorthand symbols, experience, and listening skills. Whatever is written on the page, it only matters if it affects the resulting sound in a performance. The performance is a crafted stream of sound, constrained to make sense as sound alone: with no visual component either to aid or distract from the delivery. Timing, articulation, pitch, and pacing are some of the most important tools.

Good radio programming is more compelling than television, because it engages my attentive imagination. Television makes things visually explicit, encouraging a more passive experience. The medium of radio forces the soundstream to stand alone, on merit.

Telephone systems (both touchtone and speech recognition) must deliver their interactive information only through sound. The system presents a context and some options, and the caller participates by choosing some path through them. Everything must be clear the first time it's heard. As a musician I assess things by the sound. I listen for delivery, pacing, intelligibility, and the telephone system's ability to supply context. Poorly-chosen ideas, clunky pacing, and confused phrasing bother me as much on the phone as they do in mediocre music. Didn't the designers, programmers, or company care enough to supply a good product? Is telephone support important to their continued business, or not?

The more I listen to telephone systems, some good and some unimpressive, the more I feel the phrase crafting and editing should be done by sensitive musicians. So should the voice coaching when the recordings are made. A well-crafted soundstream is that important. It advertises a company's attention to detail, and commitment to customer service.

Some favorite books and other resources behind my IVR design ideals:

  • Balentine, Bruce. It's better to be a good machine..., 2007. Outtakes and a summary....
  • Harris, Randy Allen. Voice Interaction Design: Crafting the new conversational speech systems, 2005. (Linguistics, vocabulary, turn-taking, speech patterns, novice/expert expectations, "Wizard of Oz" testing, etc.)
  • Cohen, Michael, James Giangola, and Jennifer Balogh (the Nuance people). Voice User Interface Design, 2004.
  • Weinschenk, Susan and Dean Barker. Designing Effective Speech Interfaces, 2000.
  • Kotelly, Blade. The Art and Business of Speech Recognition: Creating the noble voice, 2003.
  • Chen, Fang. Designing Human Interface in Speech Technology, 2006. (Caller attention/distraction, auditory bottlenecks, error handling, evaluation methods, etc.)
  • Pink, Daniel. A Whole New Mind: Why right-brainers will rule the future, 2006. (It's about whole-brain design for human usability, not following left-brained programming patterns of a soon-obsolete paradigm.)
  • Years of listening to poorly-designed phone systems, as a customer.
  • Observing and testing other callers as they try to use phone systems (physically watching them, and studying call recordings to find patterns in problematic points).
  • Nine years of professional experience building better ones, and supporting or upgrading old ones.
  • Daily interaction with my children under age 6, noticing how they learn speech and decision-making processes. Also, reading children's books aloud daily. Good stories have clear sentences, and a fine balance of novelty and repetition for hearing comprehension.
  • Testing systems with eyes closed, as much as possible.

There is more to add to this list, as I continue to study and apply other people's theories of Best Practices. There is always more to learn, and to bring into practical approaches along with my own experiences. Disputes of style will always be with us, just as they are in music.

It's stimulating to identify practical examples that sound "wrong", anywhere that they may be found, and to seek strategies of improvement. It's also stimulating to hear problems that have been solved inspiringly well.

I like to ask acquaintances of all ages, and especially retired people, what's most unpleasant about phone systems they have encountered.

"Our menus have changed. Listen closely to the following fourteen options:" (although humans usually can't remember more than seven or eight...). And many of these items belong in more than one category, which demonstrates that an enforced hierarchy of information doesn't capture reality adequately.

Cognitive dissonance

  1. Choice 1 inviting me to press it "for all general questions", before I get to hear what the other four or five choices are going to offer me on maybe answering my specific question. How am I supposed to know if I have a "general" question from the company's point of view, before I make the probably wrong choice? [Example: Arizona Tea, 800-832-3775]
  2. If "Your call is important to us", why doesn't somebody answer it?
  3. The announcement says the service department opens at 7 a.m. and the main office opens at 8 a.m. It's 7:30 now. Why doesn't it give me any way to transfer to the service department before 8?
  4. I need a "None of those" option on the menu, because my perfectly reasonable question doesn't fit into any of those categories. What should I do now?
  5. Menus with way too many choices. Can't remember them all from the beginning to the end of the list, much less make a carefully considered decision about which one I want.
  6. Menu choice numbers that jump around, like 3 first and then 1, 2, 4. If they want me to hear choice 3 first, make it choice 1!
  7. "For information about an existing account, press 1." I pressed 1 and it said, "Please try again! To get information about an existing account, press 1." The second time, it took it. Why not the first time?
  8. "Listen carefully as our menu options have changed" is terrible, because I never called them before. Changed from what? Why do I care? If they want me to listen carefully, they should just give their information and choices clearly. "Our system was a disorganized mess before, but for you losers who memorized parts of it, forget what you know and listen so you can memorize our supposedly better one."
  9. I hit 4, which was not on the announced menu, and it said: "If you know your extension, please enter it now...." Why are there secret choices? And yeah, I know my own extension, but how's that relevant?
  10. "Please press or say your note number." I do nothing for five seconds, trying to find the paper with my "note number" on it, and the system says, "The note number entered is not valid." What note number entered? Nothing?
  11. "To repeat this INFORMATION, say Repeat", emphasizing the word "INFORMATION" with a big singsongy pitch inflection, instead of "To REPEAT this information". It's in a voice-recognition system where the command they want is "Repeat". Also, the caller isn't allowed to interrupt the system with a choice while it's speaking text-to-speech. [Example: Lincoln dealership locator, 800-521-4140, 1 for English, 5 for Other, 1 for automated system, 1 for Dealer Locations.]
  12. "For all other options, including the blah-de-blah-de-blah-blah-blah, press 4." If I'm really calling to find the blah-de-blah-de-blah-blah-blah, since it wasn't mentioned in options 1 to 3, couldn't I figure out that "all other options" might include it? If they really want to promote blah-de-blah-de-blah-blah-blah, shouldn't it be its own choice on the list? [Example: BJ's Wholesale Club, 800-257-2582]
  13. "To be taken off our calling list, press 1", and then instead of politely hanging up forever when I press 1, it gives me an advertisement to enroll in their services! If they're going to be that way about getting their victims off their list, why don't they say: "To be removed from our calling list, press 5 with a finger of your choice. (...) I'm sorry, that was not the correct finger. Please try again. And press it harder, three times in a row. That might work. No, it won't. We're really just messing with you. Have a nice day." At least the irony would be amusing. Wicked, bad, evil Zoot.
  14. "This call may be monitored or recorded" in a stern-sounding male voice before the main part of the system even greets the caller, or says the company name. Big Brother is watching me already, before the nice lady picks up my ringing phone call? [Example: Country Living, 800-888-0128]
  15. Their web site told me to call this phone number for information, and then the phone system says as its first thing to go see their web site. Bite me, Ouroboros.
  16. When I dial the phone number from the can or box of a product, the greeting is for some other product. [Example: 866-297-6682 (866-AW-ROOTBEER) on a can of A&W Root Beer, the greeting is about Dr Pepper and Snapple. 866-AW-ROOTBEER seems like it should be a number dedicated to root beer, doesn't it...even if DrP/Snapple owns them?]
  17. Use of "looking for" on the phone. I'm not looking. It's a phone. I might be "calling to find" their product, but I'm not "looking for" one...and how would a blind caller take it?
  18. I listened through all four of the choices, and none of them were what I wanted. So, I sat there and did nothing. I thought maybe it would transfer me or something. Less than five seconds later it said (with a noisy recording), "Your entry is invalid! Please try again!" But I didn't enter anything! [Example: Deer Park Water, 800-288-8281]
  19. When it asks me to enter an amount of money on the keypad, how do I enter the cents?
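
Items 10 and 18 above come down to one design point: silence and a wrong key are different events, and they deserve different prompts. Here is a minimal sketch in Python. The menu contents, the function names, the prompt wording, and the convention that None means "the caller pressed nothing before the timeout" are all assumptions for illustration, not taken from any real IVR platform.

```python
from typing import Optional

# Hypothetical menu for illustration: key pressed -> destination name.
MENU = {"1": "billing", "2": "repairs", "3": "store hours"}


def list_choices(menu: dict) -> str:
    """Read the menu back in the order it is defined."""
    return " ".join(f"For {name}, press {key}." for key, name in menu.items())


def respond_to_entry(entry: Optional[str], menu: dict = MENU) -> str:
    """Pick the next prompt for one caller turn.

    entry is None (or empty) when nothing was pressed before the timeout.
    """
    if not entry:
        # Silence is not an error. Never claim that an entry was "invalid".
        return "I didn't hear a choice. " + list_choices(menu)
    if entry not in menu:
        # A real keypress that is not on the menu: say which key was heard.
        return f"{entry} isn't one of the choices. " + list_choices(menu)
    # Speak back what was chosen, so the caller knows where the call is going.
    return f"OK, {menu[entry]}."
```

With this split, a caller who is still hunting for a note number hears a plain re-prompt, and only a caller who actually pressed something hears about that keypress.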

It's a medium of sound, not visual

  1. "To go back to the main menu, press pound" is irritating because I shouldn't have to visualize or memorize any of their menus. I don't want menus. I want personal service.
  2. A company's web site is hopelessly clogged with distracting junk, but it's even worse on the phone because you can't skip ahead through the junk.
  3. When I call their company I'm not looking at any computer screen, or any notes on paper. But their phone system assumes I have my whole account history memorized or something.
  4. Convoluted phrases forcing the caller to figure out subordinate clauses, or complicated logical conditions. It's just bad writing for a spoken medium. The caller is not looking at the sentences, and can't skip back and forth through their clauses visually to figure it out. The phrases in a phone call come at a fixed speed and are heard only once, twisty bits and all. Government applications tend to be especially bad at this convoluted stuff, covering all their conditions. Didn't the bureaucrats listen to their phrases when drafting them? Didn't they try them out on people with only 8th grade educations? Perhaps they could hire a copy editor of news-radio stories for a couple of days, and get those prompts trimmed down to pithy sentences.
  5. Playback of address information from a list, not telling you what city it's in. It announces streets you've never heard of, first, and finally clarifies things by announcing the name of a city 70 miles away. If it would announce the city name first, and then give an option to hear details or skip ahead to something else, I wouldn't waste my time hearing the full reading of the street or phone number. [Example: the Toys-R-Us store locator, 800-869-7787]
  6. "The criminal case against a Kentucky man accused of scamming the Timberville Police Department by raising money for calendars he never produced can now move forward following his arrest after two years on the lam." That's the opening sentence from a front-page news story. It's a bad sentence for a print medium, but at least the eye can hop around a few times (backward and forward), and the referential meaning of all the clauses can be figured out...eventually. It's an impossible sentence for reading the newspaper aloud, or for a radio broadcast. It can scarcely be spoken in one breath. The listener has to keep pushing the clauses into short-term memory while waiting for the verb "can...move". The listener then has to wait further for the end of the sentence, and finally organize all this material mentally.

    For a speech medium, the story would have to be organized in shorter and more coherent bursts. For example: "A criminal case against a Kentucky man can now move forward. The man is accused of scamming the Timberville Police Department. He allegedly raised money for calendars, with no intention of producing them. He was indicted in April 2006 with 15 felony counts for this activity. He has been arrested again this month, after two years of continuing these alleged calendar scams in other states. He is now extradited to face the 15 local Timberville charges from 2006." (It took ten minutes to rewrite that, carefully reading up and down through the printed story's first four sentences! Why didn't the reporter write it this way in the first place? It's easier not only to hear, but also to read in print.)
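
The fix suggested in item 5 (announce the city first, then let the caller ask for details or skip ahead) can be sketched in a few lines of Python. The Store record, the key assignments, and the sample data are hypothetical, chosen only to show the ordering of information.

```python
from dataclasses import dataclass


@dataclass
class Store:
    city: str
    street: str
    phone: str


def headline(store: Store) -> str:
    """First thing the caller hears: the city only, then the two choices."""
    return (
        f"{store.city}. "
        "For the address and phone number, press 1. "
        "For the next store, press 2."
    )


def details(store: Store) -> str:
    """Spoken only if the caller asks for it."""
    return f"{store.city}: {store.street}. Phone: {store.phone}."


# Hypothetical sample record.
store = Store(city="Springfield", street="123 Example Street", phone="555 0100")
print(headline(store))
print(details(store))
```

The caller who hears a city 70 miles away can press 2 after one word, instead of sitting through a street and phone number first.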

Style

  1. Give me an inert machine. My microwave and coffee maker don't say they're "sorry" when I hit a button they didn't expect. Why should my phone handset pretend to be human? It's just a box of buttons.
  2. Voices that sound too cheery, or not old enough, or not intelligent enough to route my call correctly. I hope the company wouldn't hire a real "airhead" as a receptionist, but why did they make the auto-attendant sound like one?
  3. Voices that sound like they have less than zero sense of humor. The company could lighten up a little bit, couldn't they?
  4. If they have a Main Menu I can get back to, to start over, at least call it something other than a "menu". Real people don't speak in menus in real conversation. They rephrase the choices in a different way, or even a different sequence. Listen to a decent waiter/waitress describing the salad dressings. [See also my VUI demo on this.]
  5. Too much fake personality in the recorded voice, with the pretense of being "sorry" and having other emotions when there's been an error. The computer doesn't have emotions and isn't good at reading any human emotions, either. "Ohhhh...I'm sorry! Well, I can still help you with that!" The company was apparently trying too hard on personality/image, and not hard enough on getting me my information quickly. I'm not going to be upbeat and chipper when interacting with the computer, even if the computer is pretending to be eager to please. I'm keying or speaking commands. It's a computer. It's not a person. It's not even a pet. It's supposed to elicit my commands and then follow my instructions obediently. I don't need any emotional reaction to my commands, or to be thanked for them. So, I'd rather have short and emotionally-neutral prompts without too much pitch inflection. I heard one last week where I picked the option for "cancel service" on the system, and it said, "Oh. You want to cancel your service!" with a crestfallen tone. What, the computer's personally disappointed it will never get to talk to me again?
  6. A recording tells me "thank you" for pressing a number. Thank you for what? For following instructions? It's patronizing.
  7. Chatty slang like "I'd love to get your digits!" My hands are cold, ma'am. Still want them? [Example: Tecmo Games really uses that on their web site. Apparently it is a current phrase to request someone's phone number. All righty....]
  8. Ludicrous amounts of pitch inflection in the announcer's voice, making it all sound insincere. It sounds like the type of announcing in really bad local-TV ads. "Browse on over to" our website...oh, get a grip and lose the fake chattiness. "For all other inquiries, please press 4." I press 4, and the call dies, going to a busy signal. [Example: Spirit Airlines, 800-772-7117]
  9. Bureaucrat-speak: "We are currently assisting other customers. Your call will be answered in the order in which it was received." If the system already knows how many "other customers" are waiting in line for service ahead of me, kept in strict order for turns, could it at least give a hint how long it'll take? I know companies are busy. If I'm in line at the post office I can at least see how long the line is in front of me, and decide what to do with my time. And "currently assisting" is stiff, with a fake formality. So is "the order in which". If they're going to be vague about the waiting time on the phone anyway, I'd rather hear: "Please hold on, and someone will speak with you as soon as possible. Our people are still helping other earlier callers."
  10. Recorded persona too cheery. Totally fake and over-acted. "Wwwwwwelcome!!!! [hee HEE!] I can help with that!!!!! OK, no problem!!! I can use your credit card number instead!!!" all spoken with a twelve-inch grin in the voice. Hi, I'm Beckiee!!! I'm really 48 but I'm trying to sound 20!!! Plus my real name is Mary, but Beckiee sounds perkier!! OK, enough. The point is: all that caffeinated emotion in there sounds insincere and patronizing. It's a computer. It shouldn't even sound pleased or amused that I'm calling, let alone eager to make my day. [Example: Blockbuster, 866-692-2789. Sometimes you get this over-acted speech recognition system, and sometimes you get straight to the slightly calmer touchtone system. It's random. That's weird.]
  11. It's a phone. Speech is conversational. Let the prompts sound like ordinary talking by intelligent adults. Pedantically correct grammatical constructions get in the way.

Unfocused composition

  1. The first announced option in the call is to press 5. The second option is a long ad to go look at their web site instead of calling. And finally, the third option is to press 4. Boggle me. [Example: Evenflo parent resources, 800-233-5921]
  2. Getting stuck in a menu with no way to get out, or to get to a live person. Have to hang up and start over after wasting a lot of time already. [At least some good live-person shortcuts are collected at GetHuman.Com]
  3. Have to listen to a spoken advertisement about how great the company is, instead of getting service!
  4. In a speech recognition system, when it seriously tries to confirm a spoken birthdate's year as 2019...who programmed this? Shouldn't it sift out the impossible first, and admit that it didn't hear the answer correctly?
  5. You go through four or five layers of menu options and it still sends you to the wrong person, so they have to transfer you while you wait more.
  6. The phone system reads me a long web address I don't care about. What, are they telling me not to call them anymore? They don't want to serve me unless I'm sitting at a computer? I dialed them on the phone, taking up my own time. I want my answer over the phone.
  7. I picked some choice, and then the next announcement gave me no idea where I was. It should at least speak back what I chose.
  8. Systems asking me to enter my home phone number before they offer me any choices of useful information. Why do they need to know my number? And if they really need it, why don't they just sense it directly from the number I'm calling from? [That's the "ANI" feature. Some telephony switches have it and some don't.]
  9. "For information on blah blah blah, press 1." I press 1, and then it says, "For information on blah blah blah, please call:" and then a different company name and their phone number! How am I supposed to know ahead of time to write any of that down? And why did the first prompt lead me to believe I'd find information here? Don't make it the customer's problem when merged companies can't get their own acts together into a well-organized presentation. [Example: GE Appliances, 800-626-2005]
  10. "If you have a question about the Reward Zone program, you can visit [URL] or press 1. For status of your rebate, or to obtain a rebate form, press 2. For the status of your order, you can visit [long URL] or press 3. For store hours and locations, press 4." Gag. It begs me to bag this phone call and go to their web site during the prompts? [Example: Best Buy, 888-237-8289] Let's try a simple rewrite: "For the Reward Zone program, press 1. For rebates, press 2. To check on something you ordered, you can press 3. Or, to find a store and its hours, press 4. Our web site can also help you with all these things. Visit us at [URL]."
  11. Calling any business and hearing it advertise their web site at all. Business and product web sites have been around for more than 15 years already, and their corporate URLs are usually published at the same places the phone numbers are. If I wanted to "visit them on the web" I would have already done so. Or, maybe I already did so and chose to phone them anyway. Anybody who really wanted to find their web site could get it through a search engine, right? Don't waste phone time on it, especially to tell me to contact them by their web site or e-mail instead of this phone call.
  12. Prompts that editorialize about the products. "Are you looking for one of our great products for the U.S. market? Press 2 now to access our Product Locator Services." I'm calling because I'm already familiar with your product, and I like or dislike it. I want to say my piece about it, and feel that the company is listening to my opinion instead of pressing their own. [Example: Dr Pepper / A&W again]
  13. Waiting through 15 to 30 seconds at the start of the call, while it tells me how great the company is (in their own opinion!), or how proud they are to yadda-yadda-yadda. [Example: Arizona tea, 800-832-3775. If they want us to take in all that information about them, they should print it on the can instead of broadcasting it over a phone call. But their web site's worse, playing annoying and unstoppable music while the overdone graphics are loading!]
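
Item 4 above asks the system to sift out the impossible before confirming it. A minimal Python sketch of that check follows. The 120-year age limit, the function names, and the prompt wording are assumptions for illustration; a real application would set its own limits.

```python
from datetime import date
from typing import Optional

MAX_AGE_YEARS = 120  # assumed upper bound, for illustration only


def plausible_birth_year(year: int, today: Optional[date] = None) -> bool:
    """A birth year cannot be in the future, or absurdly far in the past."""
    today = today or date.today()
    return today.year - MAX_AGE_YEARS <= year <= today.year


def next_prompt(heard_year: int, today: Optional[date] = None) -> str:
    """Confirm a plausible year; otherwise admit the mishearing and ask again."""
    if plausible_birth_year(heard_year, today):
        return f"I heard {heard_year}. Is that right?"
    return "I didn't catch the year. Please say the year you were born."
```

A recognizer that hears "2019" in a 2008 call would then re-ask instead of reading the impossible year back for confirmation.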

Wrong notes, and recovering from finger fumbles

  1. Some of the choices get cut off and make no sense.
  2. If I make a mistake entering an account number or phone number, there is no way to fix the mistake.
  3. "That selection is invalid" sounds like dork-computer-geek-speek. Who uses the word "invalid" (in-VALID, not IN-valid) in human conversation? At least use something like "I didn't recognize your entry" or whatever.
  4. Bureaucrat-speak: "A representative will be with you shortly. Please stay on the line and your call will be answered by the next available representative." Lots of problems there. My call was already "answered"...by automation. Why say "shortly", an obfuscating word, when they mean "soon"? And "representative" is an unnecessary five-syllable word. Why tell the caller that she's stuck in a bureaucratic waiting line, when they could say something polite and cheery like: "Please hold on, and someone will speak with you soon"? Short and sweet.
  5. Getting deep into a decision tree that is already too deep or too wide, I make one mistake...and the only exit it offers is to start again all the way at the top of the tree.
  6. Going through a list of addresses in speech recognition, to skip to the next location you have to say the whole phrase "Next Location". If you say only "Next!", it mistakes it as "Repeat" and plays the same one again. [Example: 866-AW-ROOTBEER again, go to the store locator, and put in a zip code.] Just let me say "Stop!" to halt it, and "Next!" to move on.
  7. Systems that are supposedly for the general public, but that use words a non-native speaker of English wouldn't know.
  8. Prompts that use the verb "access" when "use" would do. Well, at least it's not as bad as marketing fluff that uses "impacts" as a verb.
  9. "Sorry, there are no store locations near the zip code entered." Helpful, but not quite. I can't be absolutely sure it heard my zip code correctly the first time, so I have to do it again to be sure. (Another caller might just give up here and assume there really aren't any....) "Sorry, I couldn't find a store near the zip code 2345" is better. The reason it couldn't find my zip, obviously, is that I keyed the first digit too early (interrupting the prompt too soon) and it missed part of the number. So, I know it's not necessarily a problem with my zip. In a not-found situation it's always nice to know what value it tried to use on the database lookup.
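
Item 9's point, that a not-found message should say what value was looked up, is easy to sketch in Python. The function name and wording are hypothetical. The extra check for a short entry is my own assumption about a sensible extension: a U.S. zip code has five digits, so a four-digit entry can be caught before the database lookup ever happens.

```python
def not_found_prompt(zip_received: str) -> str:
    """Build the message for a failed store lookup, echoing what was received."""
    if not zip_received:
        return "I didn't get a zip code. Please enter it now."
    # Read the digits one at a time: "2345" becomes "2 3 4 5".
    spoken = " ".join(zip_received)
    if len(zip_received) != 5 or not zip_received.isdigit():
        # Probably a keying or timing slip, not a real zip code.
        return f"I got {spoken}, which is not a five-digit zip code. Please enter it again."
    return f"Sorry, I couldn't find a store near zip code {spoken}."
```

Either way, the caller hears the digits the system actually used, and can tell a missed keypress from a genuine lack of stores.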

Rhythm

  1. Menu choices run together like they don't want me to get it. And, it doesn't give me time to digest what I just heard, or press anything, before it gives me the next one. When is it my turn to respond with an action? [An especially bad example of this: Chicago Tribune, 800-874-2863. Another bad one is CompUSA, 800-266-7872.]
  2. In some menus, all the choices have the same singsongy inflection, "To duh-da-da-DUUUH, press duh", and they sound boring. It's like the company didn't care it's talking to people.
  3. The system reads me a phone number too fast, too slowly, or with all the numbers run together. It should group them 3, 3, and 4 with short pauses, the same way people speak phone numbers. [Example: the otherwise pretty good KMart store locator system: 866-562-7848]
  4. Text-to-speech is notoriously difficult to get exactly right...but it shouldn't sound like it's on drugs or something. Some sound too slow or too fast. [Example of a bizarrely slow one: the Toys-R-Us store locator, 800-869-7787]
  5. A list of three things, with rising inflection implying that there will be a fourth choice announced next, but then only silence. Did the company remove something? Do they have a designer on top of his/her job?
  6. Spoken phrases that are twice as long as they need to be.
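
The 3, 3, and 4 grouping from item 3 can be sketched in Python as a small text-preparation step before the number goes to recordings or text-to-speech. The function name is hypothetical, and using a comma to mark a short pause is an assumption; how a pause is actually rendered depends on the speech platform.

```python
def speak_phone_number(number: str, pause: str = ", ") -> str:
    """Format a 10-digit phone number as three spoken groups: 3, 3, and 4 digits."""
    digits = [c for c in number if c.isdigit()]
    if len(digits) != 10:
        raise ValueError("expected a 10-digit phone number")
    groups = [digits[0:3], digits[3:6], digits[6:10]]
    # Digits within a group are separated by spaces; groups by the pause marker.
    return pause.join(" ".join(group) for group in groups)


print(speak_phone_number("866-562-7848"))  # 8 6 6, 5 6 2, 7 8 4 8
```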

Timing

  1. Using a handset with number pad, after pressing a choice, not being given enough time to get it back up to my ear before it's speaking.
  2. The phone system wastes my time and makes me do things their way. My own work has me on a tight schedule and I can't wait 10 minutes to be sent to the wrong place.
  3. I entered my 10-digit phone number and then it just sat there silently. If they wanted me to press pound or something at the end, it should have said so.
  4. "This call may be monitored or recorded for quality or training purposes." Are they really going to record my number choices and everything, or is this just a standard bunch of legalese blather to cover themselves? If they are only going to record the part where I speak with a representative, it should say so right before transferring, instead of way up here at the beginning of the call.
  5. I pressed a number and nothing happened for at least five seconds. Did I kill it?
  6. In a system I call several times a week, shouldn't it let me key ahead when I already know that the next question is going to be about entering my ID number?
  7. "Thank you for calling blah-blah-blah. If your call is regarding our rewards program, please press 1. Otherwise, please remain on the line." And then there's a full 10 seconds of dead air before it continues. Why didn't it just ask me to press 2, which is something proactive and pseudo-useful to do, instead of making me wait through 10 seconds of nothing? [Example: Coke, 800-201-2653]

Technique

  1. The company doesn't answer the phone at all; it just rings forever. Are they out of business? Couldn't afford a $15 answering machine? [IVR outage time is bad, now that consumers expect a 24-hour service line!]
  2. Getting transferred to a busy signal. If there are dead choices in the menu, take them out of the menu! [Example: Spirit Airlines, 800-772-7117, where options 1, 2, and 4 all go to a busy signal. Since the only option that works is 3, why doesn't it send the caller straight into that one?]
  3. The call gets cut off suddenly while you are on hold or in a menu, and you have to start over.
  4. If I hit the same numbered choice twice by mistake it sends me somewhere wrong, and I have to start over.
  5. When hitting a choice that isn't announced on the menu, it just repeats its whole greeting as if nothing happened. [Example: Teleflora, 800-835-3356. Pick anything other than 1 or 2, and it keeps doing it as long as you're willing to play with it. "Thank you for calling Teleflora-dot-com...." Sit there pressing nothing, and it repeats indefinitely. I gave up after it repeated itself ten times. What would they do if some caller accidentally doesn't hang up for hours? Pay a big phone bill?]
  6. An automated call placed to me says, "Hello, yadda yadda yadda; to hear my message now, please press 1. If it's not a convenient time to hear it, press 2." Already annoyed at the interruption of my work or my dinner, I humor it and press 1 to hear the thing...and it hangs up!
  7. I pressed a number not on the menu, and it suddenly ended the call!
  8. Get one little recorded speech from the company, decently done, but then an undignified sudden hangup. I have to call back to hear something else? [Example: UPromise, 888-434-9111, option 1. Go visit our web site, yadda yadda yadda, click. Worse, it says members must have internet access and a valid e-mail address. What about old-fashioned folks who like to conduct business by regular mail?]
  9. Telling me to give my Social Security Number, which is bad, but it's even more insecure when I'm calling from a cell phone. I had a live agent last week ask me for my name, phone number, and SSN...and then she said she'd e-mail all this to the right person and have them call me back. Any idea how terrible a security breach that is?

Preparation

  1. Voices that sound like they're reading their lines, at first sight, with no sense of meaning or pacing. It's worse than the badly-directed voice actors on local radio commercials. Couldn't the company afford $200 for a professional voice, or a director/producer/designer, to do their phone phrases well? I guess we know how important phone-business support is...er, isn't...to the company.
  2. A menu choice that has four or five different things all grouped into the same number to press. Come on, organize your material better, to ask intelligent questions! [Example: Eureka, 800-438-7352]
  3. You enter your account number, then wait forever on hold, and when you finally get to a live person they ask you for the account number again.
  4. If I ever hit the wrong thing there is no way to get back out of Phone Menu Hell, but to hang up and start over. Planning the thing in the first place, they should give multiple ways to get to the information...or at least a "none of these" recovery route, to go back up a level or two.
  5. If the call will need me to get my membership card, it should tell me that up front.
  6. Their list is set up with choices 1, 2, 5, which is silly. When it told me what the third choice was, I hit 3 during the announcement, before it said "press 5", and then it didn't work.
  7. Voice actors stumbling over the prompts themselves, and the designer/programmer leaving it in there instead of demanding a retake. The thing is going to be heard by tens of thousands of customers. Spend the extra five minutes and get it right! [Example: Arizona tea, 800-832-3775...whose opening monologue is 47 seconds long, with the selection numbers shuffled into it! The actor stumbles in selection 4, and again going into 5. "To report a product quality question issue, please press 4." What's he sight-reading from his piece of paper, something like "question/issue"? If so, whoever wrote it didn't have an ear for intelligibility.]

Presentation

  1. Voices that don't match, on different choices in the same menu.
  2. When you get through to a person, they sound aggressive.
  3. The phone system sounds like a mess, and it gives the impression the company doesn't care about my business or their image.
  4. Loud background noise during their recorded announcements. Get the recordings done by a pro in a studio, not by Sharon in Cube 310 who had fifteen minutes available some afternoon to do them on a phone.
  5. Recordings with a lisp, or noise distortions.
  6. Saying "please press" at every number, where the repeated "please" gets annoying. It's false politeness. We're already dealing with a machine instead of a human. The "please" just sounds formulaic instead of sincere.
  7. Spanish menu prompts spoken phonetically by an obviously non-Spanish speaker, at the point where I'm supposed to choose a language to continue. It makes it sound as if they don't sincerely want Spanish-speaking business. [Example: Target, 800-591-3869, then 2 for local store. It transfers to a local store lookup, at which point the Gringa Spanish comes up. The main greeting sequence at Toyota has the same problem: 800-331-4331. OK, it's still better than Nat King Cole's phonetically-sung recordings in Spanish or German....]

Attention

  1. Systems that obviously weren't tested on any real caller base, to observe if the callers are actually able to pay attention to the speeches.
  2. The first three messages I hear are: "Thank you for calling X. Please pay attention, as our menus have changed. This call may be monitored or recorded for quality assurance purposes." Well, they've already lost my attention, and I don't care about their menu changes. It's none of my business. And, "Pay attention!" treats me like a child. All this up-front garbage is as annoying as web sites that take 30 seconds to load before they show you anything.
  3. Spoken disclaimers that the caller has to agree to, pressing some keystroke of acknowledgment, before the call can continue. This is awful. At anything more than about ten words, some callers will already stop paying attention to details...so what's the point? To be legally covered, does the system really have to put out a monologue that the callers supposedly listen to and understand? Or, is it enough just to make the Privacy Policy (or the Fair Use Policy, or whatever) available as an option on the phone, for the callers who choose to listen through it? The rest of us are just waiting it out, la-di-da-di-da!, until it tells us what button to press next.
  4. They stuck some emergency recorded message into the first part of their call, and forgot to take it out. It's more than a couple of days out of date. And, it immediately loses the attention of every caller who doesn't care about that emergency issue. Doesn't the company care?
  5. If I call from my car, I can't press a bunch of buttons safely. I can't keep looking at the keypad either, because my attention has to stay on safe driving. The only reasonably safe thing to do, instead of hanging up, is to get to a live human with the fewest possible buttons pressed. Like 0. But 0 doesn't go there.

Tuning

  1. "This call may be monitored for quality purposes..." If they want to monitor something useful, let it be the amount of time and the sequence of caller keypunching. Monitor the time wasted before getting any real information, and before transferring to any helpful person. Analyze the menu structure and fix the confusing parts, for quality purposes. Analyze the calls where the caller had to enter 8 or 10 menu choices before getting anywhere useful. If it's a voice-recognition system, monitor it for the caller's agitated tone of voice on repeated answers that weren't understood. [In the dozens of IVR systems I've built, I always insist on logging and analyzing plenty of information on the timing and sequence. How else can one know what the callers really do with it? And then fix problems?]
  2. "For blah blah blah, press or say 2", and then you say "2" right away and it ignores you. It just keeps talking. Yell "2" three more times and it's still talking. Then, when you finally get to the next menu by fingering 2, it gives you three or four new choices but it doesn't say you're supposed to reference them by numbers. "Listen, then choose one of the options...." If you interrupt that list, it doesn't understand. Then it repeats them with numbers. If I really like pressing numbers for quick service instead of saying anything, why doesn't it give me number-pressing menus after I already pressed some? [Example: Home Depot, 800-793-3768]

[Comic strip: Dagwood vs the IVR. Blondie, 7/12/08]
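
The kind of logging described in item 1 above can be sketched in a few lines of Python. This is only an illustration, not the code from any real system: the Event record, the event kinds, and the threshold of 8 keypresses are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Event:
    seconds: float    # time since the call was answered
    kind: str         # "key", "timeout", "info", "transfer", or "hangup" (hypothetical kinds)
    detail: str = ""  # e.g. which key was pressed, or which prompt was played

def summarize_call(events, max_keys=8):
    """Report how long, and how many keypresses, a caller needed to reach something useful."""
    keys = 0
    for ev in events:
        if ev.kind == "key":
            keys += 1
        elif ev.kind in ("info", "transfer"):
            # The caller finally got information or a person.
            return {"reached_service": True, "seconds": ev.seconds,
                    "keys": keys, "needs_review": keys >= max_keys}
    # The caller never got anywhere useful: always worth a human look.
    seconds = events[-1].seconds if events else 0.0
    return {"reached_service": False, "seconds": seconds,
            "keys": keys, "needs_review": True}

# A caller who gave up after one keypress and a timeout.
lost_call = [Event(3.0, "key", "9"), Event(20.0, "timeout"), Event(31.0, "hangup")]
print(summarize_call(lost_call))
```

Run over a day's calls, a summary like this points at the menus where callers waste the most time or give up.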

Self-absorbed perspective, ignoring the audience

  1. The system tells me to "enter your zip code." So, I do. It didn't tell me that I'm really supposed to enter the zip code of the place where I want service: a bigger city, not my own small five-horse town. I instinctively enter my own zip code when it asks for one. Well...it looks up the zip code I gave it, and tells me (sort of politely) that it can't help me at all, try calling us back in three months, b'bye. In that error case, with a zip code not found in their services, couldn't it at least loop back around and let me try a different zip code during that same call? Do they want their callers to get service, and to recommend it enthusiastically to their friends as useful, or do they just want to get me off the line without helping me?
  2. Systems where the client or designer assumed all the callers will be metropolitan, not rural. Isn't a special strength of phone systems to serve widely scattered customers who can't get to a city in half an hour?
  3. To have to enter someone's birthdate in a format like "030840" is ridiculous. Invest a couple of days programming it to be user-friendly, even if it means entering the date in three questions instead of one. The caller's not a data-entry clerk the company is paying to type six-digit numbers into a field.
  4. The company probably never calls their own system to hear that it's broken.
  5. A message says they'll call me back, but when? Should I disrupt my whole day or take off work, just because they might call me back when I'm waiting by the phone? Is the company interested in serving the customer's schedule, or only their own? [Example: making the customer wait around at home all day, having taken off work to be there, on the promise that the repairman will show up sometime that day and fix the phone or cable service. And then, when he runs behind and can't make it until tomorrow, no apology from the company for the customer's own lost day of productivity!]
  6. Systems that call me and play messages I don't care about, and I can never get off their calling list. I hang up in the first two seconds and they call back an hour later!
  7. Calling a busy take-out restaurant with a live person answering the phones, they answer it with a rapid-fire: "Thank you for calling Spud's, please hold!" And bang, I'm on hold immediately without getting to say even a single word. If they're getting enough business that they can afford to be rude to their customers, answering live, they should certainly be able to afford a simple phone system. It could greet us calmly and queue us up. The machine could say: "Hello, thank you for choosing Spud's. Someone will take your order as soon as possible. Our people are still helping an earlier caller. While you're waiting, would you like to hear our specials? You can press 1 at any time to hear those." (Pause; then some gently pleasant hold music, preferably instrumental.) That, or they should hire enough employees to keep up with the call volume.
  8. Calling any business that has their office hours as choice 1 on the first menu. The web site, where we got the phone number in the first place, already showed us the office hours. We're usually calling because we have a problem with a product or service, and need help the web site can't give directly. We're calling during office hours already, deliberately, in the hope that we'll get to a helpful person. Maybe the office hours should be choice 1 only when the call is coming in within 90 minutes of closing time, when callers are deciding if there's time to drive in yet today before they close? [Or, have such a dynamic choice come before 1 and ask the caller to press * ?]
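
The dynamic office-hours idea in item 8 above might look something like this in Python. It is a minimal sketch under stated assumptions: the prompt wording and the first_menu function are invented for illustration, and it uses the "press star" variant rather than renumbering choice 1.

```python
from datetime import datetime, timedelta

def first_menu(now, closing, window_minutes=90):
    """Build the opening menu as (key, prompt) pairs.

    The hours choice is offered, ahead of choice 1, only when the office
    is still open and closes within window_minutes.
    """
    menu = [
        ("1", "For help with a product or an order, press 1."),  # hypothetical wording
        ("2", "For billing questions, press 2."),                 # hypothetical wording
    ]
    remaining = closing - now
    if timedelta(0) < remaining <= timedelta(minutes=window_minutes):
        menu.insert(0, ("*", "For today's closing time and directions, press star."))
    return menu

closing_time = datetime(2008, 5, 1, 17, 0)
for key, prompt in first_menu(datetime(2008, 5, 1, 16, 15), closing_time):
    print(key, prompt)
```

Callers in the middle of the day never hear the hours prompt at all, which keeps the first menu short for most of them.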

Transitional cues

  1. "To find our stores near you, press 7", I press 7, and it rings some other system without warning. It just seems abrupt. Is it dropping my call? Sending me to a person for generic questions the computer can't handle? What? It could at least say something in transition, like: "I'll send you to our Store Locator system. If you want to know their direct phone number for your future calls, press 1. Otherwise, I'll just transfer you right now." (One second pause, then transfer.)
  2. The greeting is "Thank you for calling Consumer Services", but it doesn't say what company it is. How do I know I haven't reached a wrong number, or a phishing scam?
  3. The call starts with no company greeting, but instead with: "At any time you can tell me to repeat something...." When I start hitting keys, and eventually it says it will transfer me to a representative, it transfers me to a fast busy signal and the call is dead. Can't I at least get to a representative to tell them their system is broken? If you say "Operator" at the beginning, it similarly sends you to a dead fast-busy signal. [Example: Hoover Appliances, 800-944-9202]
  4. When I've been sitting on immediate hold at one of those "Hello-please-hold-PLONK!" places, and my turn comes up to get service, their greeting had better not be a gratuitous marketing pitch, like: "Thank-you-for-holding, Can-I-interest-you-in-our-super-jumbo-deluxe-special-today?" I've already had three or four minutes to think about what I want, and hanging up is just about at the top of the list. Let me, the customer, have a turn to say something. And since I was stuck on hold anyway, against my will or consent, why couldn't the "on-hold" recording have already played me an announcement about today's specials they are promoting? That would be more useful than dead air or hideously tinny music.
  5. When a call comes off a hold queue and is being answered by a live person, it's nice to have some little sound in there indicating it's about to be picked up. A fake ringing sound would work fine, or it could be something more clever. While waiting indefinitely on hold, I've put the phone down on a table or I've put the call onto speakerphone. The sound tells me it's time to pay attention to my phone again. I don't want a recorded voice breaking into the music (which I might even be enjoying), just to thank me for continuing to hold, or to lie to me obsequiously that my call is important to them.

Too much information runnin' through my brain; too much information drivin' me insane

  1. Not knowing how long to ignore the long announcements, while waiting for the important part. If some of the announcements really aren't important, delete them and speed up the call!
  2. The system is telling me to call a different phone number, but not giving me any chance to write it down.
  3. Long announcements at the beginning that are totally irrelevant to the reason I'm calling. It's just as bad as the pizza place answering the phone with a scripted advertisement, instead of listening to what I want first. I'm the customer. I usually know what I want before I dial. Don't give me the marketing junk unless I say first I'm not sure what I want.
  4. Important information going too fast, and no way to hear it again.
  5. Not knowing when we're getting near the end of the list of choices.
  6. I can't give total attention to their system, and it's noisy where I call from, with lots of distractions. If I miss hearing some of the choices, it doesn't let me hear them again.
  7. Emergency recordings at the start of the call saying: "If you're calling about yadda yadda yadda", and going on for 45 more seconds of badly-worded instructions about it, but I'm not calling about that! I'm calling about something else. What should I press now to skip ahead through all that irrelevant garbage?
  8. Perhaps there should be a separate set of very short "expert" prompts for frequent callers. Let us switch to them by hitting # or something else special at the beginning of the call.

Those are serious design flaws, some little, some big.

There are also some remarkably good and inspiring systems out there: remarkable for their unremarkability. They work fine and courteously, without creating silly user irritations. But, any little thing can go wrong and the designer needs to know about it.

There are at least 14 typical actions that the caller might take at every available decision point. (0,1,2,3,4,5,6,7,8,9,#,*, hang up, or wait through a timeout -- and that's not counting any of the double-keystroke actions, or the entry of multi-digit fields such as membership numbers or phone numbers.) The system has to have a contextually relevant and intelligent response to every possibility, following up with a prompt and a path that make sense. Good systems furthermore do something different at the third or fourth timeout as opposed to the first several, and they take some other intelligent branch if the caller has had three or four keyed errors on a single question.
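
To make that concrete, here is a minimal Python sketch of a single decision point that answers all 14 actions and changes its behavior on the third timeout or the third keyed error. The class, the route names, and the choice to escalate to a live person are assumptions for the example, not a description of any particular product.

```python
KEYS = list("0123456789#*")
ACTIONS = KEYS + ["hangup", "timeout"]  # the 14 typical caller actions

class DecisionPoint:
    def __init__(self, routes, max_timeouts=3, max_errors=3):
        self.routes = routes            # valid key -> name of the next step
        self.max_timeouts = max_timeouts
        self.max_errors = max_errors
        self.timeouts = 0
        self.errors = 0

    def handle(self, action):
        """Return the name of the next step for any of the 14 actions."""
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action!r}")
        if action == "hangup":
            return "end_call"
        if action == "timeout":
            self.timeouts += 1
            if self.timeouts >= self.max_timeouts:
                return "transfer_to_person"   # assumed escalation
            return "reprompt_slower"
        if action in self.routes:
            return self.routes[action]
        # A key that means nothing at this question.
        self.errors += 1
        if self.errors >= self.max_errors:
            return "transfer_to_person"       # assumed escalation
        return "reprompt_with_help"

menu = DecisionPoint({"1": "store_hours", "2": "billing"})
print([menu.handle(a) for a in ["timeout", "7", "timeout", "timeout"]])
```

The point of the sketch is that no action falls through unanswered: an unexpected key, silence, and a hang-up each get a deliberate response.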

It is in everyone's best interest when well-crafted prompts get the caller to cooperate, politely and without fuss. The caller gets some appropriately deliverable customer service, with (one hopes!) not too much wasted time and effort. The company gets a reasonable phone bill plus not-yet-fatally-dissatisfied customers.

Somebody at the company should keep testing the odd paths all the time, and listening to feedback from real-world users of the system. Give the callers a "Suggestion Box" on phone and/or web to leave comments about the phone system. Pay the designer to spend a couple of hours every week calling all the systems, checking the usability for any problems. Remember the movie The Doctor where William Hurt's character has a terrible bedside manner, until he becomes a patient in his own hospital? Well-designed service must show empathy for the users and their frustrations. The phone system ought never make the customers more upset than they already are, when calling in for help.

A corporate IVR/VUI designer is just as important as a corporate webmaster. The company's public image is at stake. Does the company really care about its customers? And keeping its customers?

I like re-prompts that phrase the questions in a slightly different way the second time, and with different inflection. That's how people talk when trying to elicit information from other people. It shows flexibility, and empathy with the other person's point of view (or confusion). The varied tone of voice shows concern, and a desire to help. It shows that the company is interested in communication with human callers.
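
In code, varied re-prompts cost very little. Here is a small Python sketch, with invented prompt wording and a hypothetical pick_prompt helper, that moves to a different phrasing on each repeat instead of replaying the first recording:

```python
def pick_prompt(variants, attempt):
    """attempt 0 is the first asking; later attempts move on to rewordings.

    Once the rewordings run out, the last (most explicit) one is reused.
    """
    if not variants:
        raise ValueError("need at least one wording")
    return variants[min(attempt, len(variants) - 1)]

# Hypothetical wordings for one question, each a separate recording.
ZIP_PROMPTS = [
    "What is the ZIP code where you want service?",
    "Sorry, I missed that. On your keypad, enter the five-digit ZIP code of the town where you need service.",
    "Let's try once more. Press the five numbers of the ZIP code, one at a time.",
]

for attempt in range(3):
    print(pick_prompt(ZIP_PROMPTS, attempt))
```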

If an IVR system merely plays the same small set of prompts over and over, on repeats, it shows that the designer or the company valued computer restrictions ahead of customers. Perhaps it was programming laziness, or a cost-cutting rush during development, or an unwillingness to prepare perfect phrases for each context in which they'll be used. Perhaps the builders didn't even bother to think deeply about usability. Perhaps they built or tested the thing visually, more than listening to it with a fresh mind. Clearly, no human-factors expert was brought in to push for improvements before release.

And why force the caller to conform to one and only one rigid path to get to the needed information? Spoken "menus" stink. The world doesn't fit into singly-connected lists of things. The callers can't see what's coming, but can only hear what arrived before they pressed something to move on. Human conversation doesn't go through (or back to!) stiff menus, and neither should automated support. Ideas and thoughts bounce around, with interconnections.

The caller's keystrokes and timeouts tell a story about a human need. The caller dialed in for some presumably legitimate reason: needing help with something. Callers aren't trained, and they're not looking at anything in particular. (They're certainly not looking at the same thing the system developers were looking at, onscreen or on paper!) Callers could be inattentive or in any sort of mood. They can press anything they want, or nothing. They can be patient or in a hurry. They can be multi-tasking. They can be unsure what they really want, or unsure what the system is able to deliver.

The system has to guide all these callers in useful directions, while the callers feel respected and in reasonable control of their own requests. The system must provide service that seems empathetic with that human need. It has to be flexible and resourceful in response...without the advantage of any normal human clues for interpreting intentions or emotions.

I'm a caller. I need service. Now. I dialed for help. The system can't sense how I'm feeling today, or how well I'm paying attention to it, but it has to handle me anyway. Give me multiple reasonable ways to get to my answer. If possible, give me multiple acceptable ways to frame my own request! There's a better chance I'll get served decently instead of hanging up. There's a better chance the company gets to keep my business. I'm highly educated, but I shouldn't have to use that as the customer of a doggone phone system. Prompt me courteously at a 5th-grade reading level, offering logical choices and sensitivity that would score with a kindergartner.

This is why IVR is difficult, time-consuming, and expensive to do well. [Tip of the iceberg: speech recognition is ten times harder yet.] It needs a conscientious and ruthless person to stay on top of all those possible problems of bad design. Everything must be designed and thoroughly tested from the user's point of view. Er, point of hearing. "View" is irrelevant, and menus are invisible fluff. The sound, pacing, and continued accuracy of the system are everything. The thing either makes crystal clear sense as a soundstream, or it needs improvement. Any broken navigation reflects badly on the company's customer service commitments.

And what about those Store-Locator features on phone systems? They're useful, but they could always be improved. Too many of them sound like brochure-ware, reading addresses exactly as they appear on paper or a computer screen. It doesn't work on IVR. If the caller is riding in a vehicle, or worse, driving around looking for the building, the fast reading of an address is informative but not helpful.

Try this experiment. Go to a company's web site, bring up the Store Locator, put in a zip code, and spend two seconds glancing at the screen listing four or five store locations. What are the two pieces of information your eye looks for first? Driving distance (if available) and City name, most likely. The eye jumps directly to the city name on the third line of the printed address. If you don't want to go to that city, you don't care about its street address or phone number. You don't bother reading them. Your eye skips automatically to the city of the next address.

Now, what do you hear on most IVR store locators? It dutifully reads you a recording starting with the street address, maybe also the name of a shopping center, and only later it tells you what city it's in! The ear hasn't heard the all-important city name or distance first to set mental context, and the ear can't skip ahead.

On the phone, the delivery also has to allow enough padding time before and during the address: giving the caller time and mental context to jot down the wanted information. Here is the way I did it for an Illinois government-office locator in 2007:

Read the following presentation aloud as someone writes down the information you're speaking! (The caller is also allowed to interrupt the address at any time by saying "NO" or pressing any key, and the system moves ahead to the next location found.)

OK, I have the office information. When you are ready to write it down, say yes. "YES"
I found more than one office, so I will read them to you one at a time.
After you have heard the information you want, you can simply hang up.

It is in the city of: Belleville. [Pause 1 second]
Family Community Resource Center. [Pause 1 second]
The street address is: 1-2-2-0 Centreville Avenue.
Belleville, Illinois. [Pause 1 second]
The ZIP code is: 6-2-2-2-0. [Pause 1 second]
The office's phone number is: 6-1-8, 2-5-7, 7-4-0-0. [Pause 2 seconds]
Do you want me to repeat that address? "NO"

Here is the next office. [Pause 1/2 second]
It is in the city of: East Saint Louis. [Pause 1 second]
Family Community Resource Center. [Pause 1 second]
The street address is: 2-2-5 North 9th Street.
East Saint Louis, Illinois. [Pause 1 second]
The ZIP code is: 6-2-2-0-1. [Pause 1 second]
The office's phone number is: 6-1-8, 5-8-3, 2-3-0-0. [Pause 2 seconds]
Do you want me to repeat that address? "NO"

OK, that's all for now. [Pause 1/2 second]
This office search is also available on the internet. Go to w-w-w....
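
The ordering and pacing of the script above can be generated from an ordinary address record. The following Python sketch is not the code behind the Illinois locator; the field names and helper functions are invented for illustration. It puts the city first, spells out digits one at a time, and attaches a pause (in seconds) to each phrase.

```python
def spell(digits):
    """'1220' -> '1-2-2-0', so each digit is read separately."""
    return "-".join(digits)

def spell_phone(phone):
    """'6182577400' -> '6-1-8, 2-5-7, 7-4-0-0' (ten-digit US number assumed)."""
    return ", ".join(spell(part) for part in (phone[:3], phone[3:6], phone[6:]))

def office_script(office):
    """Return (phrase, pause_seconds) pairs, city first."""
    return [
        (f"It is in the city of: {office['city']}.", 1.0),
        (f"{office['name']}.", 1.0),
        # No pause here: the street address runs straight into the city and state.
        (f"The street address is: {spell(office['house_number'])} {office['street']}.", 0.0),
        (f"{office['city']}, {office['state']}.", 1.0),
        (f"The ZIP code is: {spell(office['zip'])}.", 1.0),
        (f"The office's phone number is: {spell_phone(office['phone'])}.", 2.0),
    ]

belleville = {
    "city": "Belleville", "state": "Illinois",
    "name": "Family Community Resource Center",
    "house_number": "1220", "street": "Centreville Avenue",
    "zip": "62220", "phone": "6182577400",
}
for phrase, pause in office_script(belleville):
    print(f"{phrase} [Pause {pause} s]")
```

Keeping the pauses in the data, beside the phrases, makes it easy to tune the pacing after listening to real callers try to write the address down.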

Again, everything in a phone system needs to be designed so it makes usable sense as a sound stream. The callers won't be sitting at desks or looking at anything. They might be distracted by other simultaneous tasks. They might need to write down what they hear.

Poorly-designed systems expect callers to be empathetic with computer restrictions (or design restrictions!), patient and forgiving with bad phrasing, and able to figure out bizarre and unnecessary problems...just to get any service at all. The callers are unwilling and unpaid servants of the bad design. Their own needs have to be squashed into the several rigid things the system is able to deliver, on its own terms, at its own speed. Well, who should drive the requests during the call? The automated system, or the customers needing to be served, getting something of value from the call? Didn't they take the initiative to dial in? Can't they hang up in disgust whenever their needs aren't satisfied, and usually take their business elsewhere?

Good systems let the callers "get in, get out, and get on with their lives" (to paraphrase the slogan of a restaurant chain). The well-designed computer system is the customer's helpful and courteous servant, not the customer's master. The customers' needs must be heard and satisfied, intelligently. Dozens or hundreds of customers, all at once.

I am passionate, maybe even obsessive, about designing these systems well.

End of manifesto, for now....

© 2008 Bradley Lehman