Building Chatbots that Know What People Mean

A chatbot should be more than a glorified interactive voice response system.

When a customer asks an application for information or help, it shouldn’t matter how she phrases her request or whether she uses specific keywords—asking “Is my income keeping up with my expenses?” should be just as effective as “What’s my current cash flow situation?” That’s not an easy requirement to meet, but it’s absolutely critical for delivering an experience that goes beyond rote functionality to truly delight customers.

In this blog, I’ll talk about some of the work my data science team at Intuit has been doing to build chatbots for our QuickBooks and TurboTax financial software products.

Why chatbots: Where they fit in and what makes them valuable

As with any AI initiative—or any initiative at all—it’s important to understand the nature of the problem you’re solving before you press ahead to a solution. Why do we need chatbots, anyway? What’s wrong with the channels already available?

Let’s look at two options we’ve historically provided to customers:

  • Self-help – The first step for many customers is to search within our FAQ, Q&A forums, and the articles our experts have written. That can be a fast way to get a response, but the results can be less pertinent or personalized than expected.
  • Customer care – When people want real one-on-one help, they can call one of our customer care agents. However, that means picking up the phone, navigating an interactive voice response system to describe their problem, and sometimes waiting for an agent to become available.

These options can be effective for many people much of the time, but they’re not enough to ensure a great result for every customer, and they fall significantly short of the intuitive, intelligent, and differentiated experiences we seek to deliver. By building a chatbot, our intention is to offer a response that’s as fast as a self-help search and highly personalized, which means the chatbot needs to get things right.

The inner workings of a chatbot

When talking about chatbots, three key pieces of terminology come into play:

  • Utterance – This is what the customer says—the question he’s asking, in his own words. The same basic question can be expressed through any number of different utterances.
  • Intent – This is what the customer means in generic terms. Many different utterances can share the same underlying intent.
  • Response – This is the answer provided by the chatbot based on its understanding of the customer’s intent.

As you might expect, getting from intent to response is relatively straightforward. If the chatbot understands that the customer wants to know her current account balances, it’s a simple matter to look up and provide this information.
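
To make that concrete, here’s a minimal sketch of the intent-to-response step in Python. The intent name, the customer ID, and the balance-lookup helper are all hypothetical, invented for illustration; a real handler would query live account data and cover many more intents.

    # Minimal sketch: map a recognized intent to a response handler.
    # Intent names and the balance lookup are hypothetical examples.

    def get_account_balances(customer_id: str) -> str:
        # A real handler would query the customer's live account data.
        return "Checking: $4,200.00 | Savings: $10,500.00"

    RESPONSE_HANDLERS = {
        "check_balance": get_account_balances,
        # ...one handler per intent the designer has built out
    }

    def respond(intent: str, customer_id: str) -> str:
        handler = RESPONSE_HANDLERS.get(intent)
        if handler is None:
            return "Sorry, I can't help with that just yet."
        return handler(customer_id)

    print(respond("check_balance", "cust-123"))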

It’s getting from utterance to intent that can be more problematic. With a virtually unlimited number of ways to state any given question, we can’t hope to code in every possible variation. The system needs to be able to accurately infer intent no matter how a question is asked. For example:

  • “I need to share my information with my accountant.”
  • “Send info to accountant.”
  • “I need to share my information with my bookkeeper.”
  • “Tax preparer needs access to account details.”

With few overlapping words and very different phrasings, how can a chatbot understand that all of these utterances map to the same intent? That’s the job of the natural language understanding (NLU) module. Here’s how it works.

Training the natural language understanding model

The NLU module is based on a supervised model, and as with any supervised model, the data you use for training is critically important. The more data you have, and the higher its quality, the better the chatbot will be able to infer intent from utterances it hasn’t seen before.
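
To make the setup concrete, here’s a toy version of that supervised step. Our production model isn’t described in this post, so this sketch stands in a common baseline, TF-IDF features plus logistic regression from scikit-learn, trained on a handful of made-up utterance/intent pairs.

    # Toy supervised intent model: the (utterance, intent) training pairs
    # are invented, and TF-IDF + logistic regression stands in for
    # whatever model a production NLU actually uses.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    utterances = [
        "I need to share my information with my accountant",
        "Send info to accountant",
        "What's my current cash flow situation?",
        "Is my income keeping up with my expenses?",
    ]
    intents = [
        "share_with_accountant",
        "share_with_accountant",
        "cash_flow_status",
        "cash_flow_status",
    ]

    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(utterances, intents)

    # Infer intent for a phrasing the model hasn't seen verbatim.
    print(model.predict(["send my details to my accountant"]))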

One approach to training data could be for a chatbot designer to think of all the different ways a customer could express the same intent, and repeat this for every intent they might have. Aside from the obvious scalability issues this presents, it’s problematic that this training would reflect the designer’s own words—not the customer’s. In my case, as a domain expert in the financial software industry, the words that I use might be quite different from the words our customers would use. As a result, both the quantity and the quality of the utterances I’d come up with would be less than ideal.

Focusing on actual customer words gives us a variety of rich sources to work with:

  • Live chat – Several Intuit products offer chat with live agents, providing a store of real-life examples of customer utterances.
  • Self-help searches – The text customers enter into our FAQ page or Q&A forum offers further examples of customer phrasing and terminology.
  • Chatbot history – As the chatbot matures over time, we can also go through its growing history of customer interactions to see how they’ve phrased certain questions, further refining its accuracy.

Now that we have a good supply of customer utterances to work with, how do we determine how they match up with the intent that a designer is trying to build out? To help solve this problem, we built something called an utterance generation tool.

The goal for the utterance generation tool is to let the designer enter a phrase that’s generally representative of the intent that they’re trying to build—a sample utterance, in effect—and have the system go through our huge stock of customer words and find the phrases closest to it in meaning. Note that I said “closest”—this isn’t a matter of a binary “yes” or “no” for each phrase. Instead, we’re looking for phrases that are very close to each other in meaning. Closeness implies a measurement of distance, and as it happens, the way we measure the similarity in meaning of two phrases is with a distance metric.

To make the distance math work, we first translate each word into a vector representation, usually about 300 numbers in length, called a word embedding. Two words with similar meanings have vector representations that are mathematically close to each other. Popular publicly available embedding tools include Word2Vec [1] from Google and GloVe [2] from Stanford.
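
As a quick illustration (not our production pipeline), the open-source gensim library can download pretrained GloVe vectors and compare words directly:

    # Word embeddings in practice: pretrained 300-dimensional GloVe
    # vectors via gensim's downloader (the first load takes a while).
    import gensim.downloader as api

    vectors = api.load("glove-wiki-gigaword-300")

    # Similar meanings -> vectors close together -> high similarity.
    print(vectors.similarity("accountant", "bookkeeper"))  # relatively high
    print(vectors.similarity("accountant", "banana"))      # much lower
    print(len(vectors["accountant"]))                      # 300 numbers per word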

Of course, people don’t pose utterances in single-word form; they use phrases. How do we determine when two phrases have a similar meaning? This involves something called the word mover’s distance. The underlying math is pretty dense, but the intuition is simple: it measures how far the embedded words of one phrase must “travel” to match the embedded words of the other. What matters is that we can compute the word mover’s distance for each of the candidate phrases, find the ones that are closest to the sample utterance the designer entered, and then use these to populate the training data. In this way, we can generate a large number of utterances in the words of actual customers for each intent the designer builds out.
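
Here’s a small sketch of that ranking step, again using gensim, whose KeyedVectors expose a wmdistance method (recent versions rely on an optimal-transport package such as POT under the hood). The seed phrase and candidates below are made up; the real tool searches our full stock of customer phrases.

    # Rank candidate customer phrases by word mover's distance to the
    # designer's seed phrase; smaller distance = closer in meaning.
    import gensim.downloader as api

    vectors = api.load("glove-wiki-gigaword-300")

    seed = "share my information with my accountant".split()
    candidates = [
        "send info to accountant".split(),
        "tax preparer needs access to account details".split(),
        "what is my current cash flow".split(),
    ]

    ranked = sorted(candidates, key=lambda doc: vectors.wmdistance(seed, doc))
    for doc in ranked:
        print(round(vectors.wmdistance(seed, doc), 3), " ".join(doc))
    # The closest candidates become new training utterances for the intent.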

What happens when it doesn’t work?

For all our best efforts at training, there will be times that the customer asks something that the chatbot just can’t understand.

In the past, we’d say something like “Sorry, I didn’t understand you,” but it seemed like a shame to give up so easily. Then we thought about what the NLU was trying to do, and decided to allow a kind of best-guess approach. Even if the customer’s utterance and the training data weren’t enough for the NLU to gain a full understanding of the question, maybe we could at least identify a few of the utterances in the training data that were relatively close to it in meaning, list them out, and ask, “Is this what you meant?” If the customer chooses one, great: we can proceed to the response, and in the meantime we’ve captured a new way of expressing that intent that can be added to the training data. If not, the customer can try again. It’s not perfect, but hopefully it gets us closer on the customer’s next attempt.
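
A minimal sketch of that fallback logic, reusing the toy classifier from earlier, might look like the following. The 0.6 confidence cutoff and the three suggestions are hypothetical values; both would be tuned in practice.

    # Best-guess fallback: if the classifier isn't confident, surface the
    # closest training utterances and ask "Is this what you meant?"
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics.pairwise import cosine_similarity
    from sklearn.pipeline import make_pipeline

    texts = [
        "I need to share my information with my accountant",
        "Send info to accountant",
        "What's my current cash flow situation?",
        "Is my income keeping up with my expenses?",
    ]
    labels = ["share_with_accountant", "share_with_accountant",
              "cash_flow_status", "cash_flow_status"]

    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(texts, labels)

    def handle(utterance: str, threshold: float = 0.6, k: int = 3):
        probs = model.predict_proba([utterance])[0]
        if probs.max() >= threshold:
            return ("intent", model.classes_[probs.argmax()])
        # Not confident enough: offer the k closest training utterances.
        # A chosen suggestion also becomes a new training example.
        tfidf = model.named_steps["tfidfvectorizer"]
        sims = cosine_similarity(tfidf.transform([utterance]),
                                 tfidf.transform(texts))[0]
        return ("clarify", [texts[i] for i in sims.argsort()[::-1][:k]])

    # Depending on confidence, this returns an intent or a clarify prompt.
    print(handle("my bookkeeper needs my account info"))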

As you can see, providing a simple chatbot experience can be a highly complex undertaking. But by going deep with advanced concepts and tools like distance metrics, word embeddings, and the word mover’s distance, we can provide customer experiences that go far beyond interactive voice response trees and self-help tools to truly transform the way people interact with Intuit products.

That’s exciting for me as a data scientist, and I hope it’s just as delightful for our customers.

Written by Diane Chang

Diane Chang is a Distinguished Data Scientist at Intuit, where she powers the prosperity of consumers and small businesses with machine learning, behavioral analysis, and risk prediction. Diane initially worked on TurboTax, looking at the effectiveness of our digital marketing campaigns, understanding user behavior in the product, and analyzing how customers get help when they need it. She also helped launch QuickBooks Capital, predicting outcomes for loan applicants. She is currently applying AI/ML techniques to customer care. Diane has a PhD in Operations Research from Stanford. She previously worked for a small mathematical consulting firm, and a start-up in the online advertising space. Prior to joining Intuit, Diane was a stay-at-home mom for 6 years.