Let’s grab a coffee
To explore entity collection and use, let’s take the theoretical case of a chat assistant for a big, nameless Seattle-based coffee chain. Users tend to enter the app with a pretty narrow goal, or two intents: order a coffee and check out. Simple. But getting them from first touch to mocha latte is less so.
As you probably know, customers are going to buck the script. To paraphrase Churchill, users always do the right thing … but only after exhausting every other option. How frustrating or supportive they perceive your assistant to be as they stumble through depends on:
- How flexibly it responds to their non-formulaic inputs
- How well it manages context switching
- How well it acts upon conditional logic
- How deftly it follows up on incomplete utterances, inquiring about the specific missing information
- And maybe most important, how well it preserves and recalls earlier inputs for later use
Today, our coffee ordering app user could say anything from “black, cream, no sugar” to the much more mysterious “Vente PSL,” “Frapxchino,” or a billion other variants, acronyms, synonyms, or misspellings.
Let’s explore how your assistant can and should handle those.
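One way to tame variants and misspellings like these is to map each token to a canonical menu value, trying an exact alias lookup first and a fuzzy match second. Here is a minimal sketch using Python's standard `difflib`. The `MENU_ALIASES` table, the `normalize_token` helper, and the 0.7 cutoff are all made up for illustration, not taken from any real assistant.

```python
import difflib

# Hypothetical menu vocabulary: canonical entity value -> accepted aliases.
MENU_ALIASES = {
    "venti": ["venti", "vente"],
    "grande": ["grande"],
    "pumpkin spice latte": ["pumpkin spice latte", "psl"],
    "frappuccino": ["frappuccino", "frap"],
}

# Flatten to alias -> canonical value for quick lookups.
ALIAS_TO_CANONICAL = {
    alias: canonical
    for canonical, aliases in MENU_ALIASES.items()
    for alias in aliases
}


def normalize_token(token, cutoff=0.7):
    """Return the canonical menu value for a token, or None if nothing is close."""
    cleaned = token.strip().lower()
    # Exact alias hit: known synonyms, acronyms, and common misspellings.
    if cleaned in ALIAS_TO_CANONICAL:
        return ALIAS_TO_CANONICAL[cleaned]
    # Fuzzy fallback for misspellings we have not listed (cutoff is an assumed value).
    close = difflib.get_close_matches(cleaned, list(ALIAS_TO_CANONICAL), n=1, cutoff=cutoff)
    return ALIAS_TO_CANONICAL[close[0]] if close else None


print(normalize_token("Vente"))       # venti
print(normalize_token("PSL"))         # pumpkin spice latte
print(normalize_token("Frapxchino"))  # frappuccino
```

In practice the alias table would likely grow out of real transcripts, and the cutoff would need tuning so that near-misses resolve without unrelated words matching by accident.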
Perceived intelligence is all in how you capture and store overfilled information
You can sort that user’s utterance into one of three categories: underfilled, filled, or overfilled (for that inquiry). That is, did they give you less than you need, what you need, or too much information for that stage?
- Underfilling: When someone provides too little. New users don’t know how this interaction is supposed to go, and so may give you a partial answer, like, “I’d like a coffee.” Or no answer—like, “Wait, is this a person?”
- Filling: When someone provides just enough. If they’re a repeat user, they probably know what you need and may reply as prompted. E.g. “A large coffee, no cream or sugar.” (The assistant can then elicit more information.)
- Overfilling: When someone provides too much. An experienced user may know the flow so well, they try to skip steps. E.g. “I want a Grande coffee sugar and no Pike, and my name is Braden.” In this example, they gave too much relevant information for the step. The name is only relevant later. This is called a “one-shot”—a completely correct answer—and it’s enough to skip multiple trees.
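The three categories above can be sketched as a simple comparison between the entities a step requires and the entities actually captured from the utterance. This is an illustrative sketch only. The slot names (`drink`, `size`, `name`) and the `classify_fill` helper are assumptions, and entity extraction itself is taken as already done.

```python
from enum import Enum


class FillLevel(Enum):
    UNDERFILLED = "underfilled"
    FILLED = "filled"
    OVERFILLED = "overfilled"


def classify_fill(required_slots, captured):
    """Compare what this step needs with what the user actually gave.

    required_slots: slot names needed at this step (hypothetical names).
    captured: dict of slot name -> value extracted from the utterance.
    """
    required = set(required_slots)
    given = set(captured)
    if required - given:
        # Something needed for this step is still missing.
        return FillLevel.UNDERFILLED
    if given - required:
        # Everything needed is here, plus entities for a later step.
        return FillLevel.OVERFILLED
    return FillLevel.FILLED


step = ["drink", "size"]
print(classify_fill(step, {"drink": "coffee"}))
print(classify_fill(step, {"drink": "coffee", "size": "grande"}))
print(classify_fill(step, {"drink": "coffee", "size": "grande", "name": "Braden"}))
```

Note that an utterance can be missing a required slot and still carry extras (say, a name but no size). This sketch labels that case underfilled, but the extra entities are still worth keeping.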
When a user underfills, the experience hinges on how deftly your assistant fishes for additional information, and only the information it still needs. If they said “coffee” but not which type, it should ask for the type, not for the entire order again. (It sounds obvious, but it is easy to get wrong.)
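As a rough sketch of that follow-up behavior, the assistant can walk the required slots in order and prompt only for the first one that is still empty. The slot names, the `PROMPTS` wording, and the `next_prompt` helper are hypothetical.

```python
# Hypothetical prompts, one per slot the ordering step needs.
PROMPTS = {
    "drink": "What can I get started for you?",
    "type": "Which type of coffee would you like?",
    "size": "What size would you like?",
}


def next_prompt(required_slots, captured):
    """Return the question for the first missing slot, or None if nothing is missing."""
    for slot in required_slots:
        if slot not in captured:
            return PROMPTS[slot]
    return None


required = ["drink", "type", "size"]

# The user said "coffee" and nothing else: ask for the type, not the whole order.
print(next_prompt(required, {"drink": "coffee"}))
# -> Which type of coffee would you like?
```

Because the function only ever asks about what is absent, anything the user already said is never requested twice.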
When a user overfills and you respond appropriately, it’s a real delight. The user gives a one-shot, and your assistant says, “Great, please confirm I got all that”—and it did. This type of interaction is rare and requires you to understand those potentially overfilled entities and store them for use later. (You can add those additional, later entities in the first query, but leave them hidden. So if a user knows how this goes and gives you all the entities needed to fulfill multiple intents, the assistant will know what to do with them.) This is extremely difficult and something you really only arrive at through extensive user testing.
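Here is one way that "store it for later" idea might look in code: every captured entity goes into a session memory, and each intent simply checks that memory for the slots it needs. This is a minimal sketch under assumed names (`INTENT_SLOTS`, `Session`, the two intents and their slots), not how any particular conversation design tool does it.

```python
# Hypothetical mapping of intents to the entities each one needs.
INTENT_SLOTS = {
    "order_coffee": ["drink", "size"],
    "check_out": ["name"],
}


class Session:
    """Keeps every entity the user has given, even ones not needed yet."""

    def __init__(self):
        self.memory = {}

    def absorb(self, entities):
        # Store everything captured, including entities for later intents.
        self.memory.update(entities)

    def missing_for(self, intent):
        # Slots this intent needs that the user has not supplied so far.
        return [slot for slot in INTENT_SLOTS[intent] if slot not in self.memory]

    def ready_to_confirm(self):
        # True when every intent in the flow can be fulfilled from memory.
        return all(not self.missing_for(intent) for intent in INTENT_SLOTS)


session = Session()
# A one-shot: order details plus the name that is only needed at checkout.
session.absorb({"drink": "coffee", "size": "grande", "name": "Braden"})

print(session.missing_for("order_coffee"))  # []
print(session.missing_for("check_out"))     # []
print(session.ready_to_confirm())           # True
```

With this shape, a one-shot lets the assistant jump straight to "Great, please confirm I got all that," while an underfilled turn just leaves some slots in `missing_for`.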
The thing I’ve also learned about overfilling is that users actually tend to underfill intentionally because they’re nervous about confusing the assistant. But if you delight them by using overfilled information once, it’s a great experience—in one shot, they skip the flow—and it teaches them that your assistant is not like the others. Yours is a lot more valuable and adaptive than they thought. This opens their eyes, and little surprises like this are the future of conversation design.
The challenge? Too many intents and entities
In our Seattle-based coffee chain example, intents are inherently narrow—nobody goes to the app for their horoscope or to place bets. It’s (almost) always about ordering coffee. That means there are relatively few intents and entities involved. But for other assistants where the potential is wider, like a customer support assistant for a big (also Seattle-based) e-commerce giant, with sixty departments ranging from online video to groceries, the list grows longer and the complexity compounds.
If you don’t have a good system in place for that entity collection, things can get out of hand and it’s difficult to clean up later without starting over. To get technical, the work grows with the square of the number of entities: two entities means four checks, three entities means nine checks, and nine entities means 81 checks. (For two entities A and B, the four cases are both A and B, A only, B only, and neither.)
The more complex that gets, the more important it is you have a conversation design tool that has functions built in for entity fulfillment. Or if not that, a really well-architected spreadsheet where you can plot out all the matrices with unique answers for each part of the grid. Otherwise, a flowchart won’t suffice. It will become way too big to manage.
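To get a feel for how fast that grid grows, here is a small sketch that lays out one cell per pair of entities, following the count given above (n entities, n squared checks). Treating each cell as an entity pair is an assumption made for illustration, and the entity names and the `planning_grid` helper are hypothetical.

```python
from itertools import product


def planning_grid(entities):
    """Return one (row, column) cell per pair of entities, n * n cells in total."""
    return list(product(entities, repeat=2))


two = ["drink", "size"]
three = ["drink", "size", "name"]

print(len(planning_grid(two)))    # 4
print(len(planning_grid(three)))  # 9
print(9 * 9)                      # 81 cells for nine entities

# Each cell would need its own designed response in the spreadsheet.
for row, column in planning_grid(two):
    print(f"{row} x {column}")
```

Even at nine entities, that is 81 cells to write and maintain by hand, which is why a flowchart stops being practical.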
How you collect is how people feel
This is all to say that the secret to delighting overfillers, like returning power users, is being really thorough and always spending four hours sharpening your ax. That means poring over actual transcripts, having more conversations, and running tabletop user tests.
And then mapping entities to intents, reevaluating your categories, nesting entities within others, and always trying to use the user’s full utterance, even if it’s a one-shot.
That’s how you get someone’s order right.