This blog is co-authored by Intuit Distinguished Engineer and Architect, Jay Yu, Distinguished Engineer and Director, Kevin McCluskey, and Distinguished Data Scientist, Saikat Mukherjee.
Intuit, maker of TurboTax, QuickBooks and Mint, is on a multi-year transformational journey to become an AI-driven Expert Platform company that helps consumers and small business owners with their taxes and finances. We have been working hard to combine Symbolic AI (Knowledge Engineering) and Machine Learning to make all of our products smart and personalized, so that we can get our customers’ taxes and finances done right with high confidence and minimal effort. In this blog post we will share a technical overview of the Tax Knowledge Engine, the key innovation we’ve pioneered to make TurboTax smarter and more personalized for 37M+ consumers.
U.S. income tax is one of the most complicated compliance systems in the world. There are more than 70,000 pages to describe and interpret the tax code, 800+ federal tax forms and additional tax forms from 45 states. It is estimated that Americans spend 8.9 billion hours every year doing their taxes. While TurboTax is best known for its super-friendly question and answer experience decoupled from the underlying tax calculation logic, we found that the traditional procedural programming paradigm for codifying tax logic was becoming a big barrier to making the TurboTax product experience smarter and more personalized. Below are a few key limitations of the traditional approach:
- Procedural programming – done by programmers who rely on tax domain experts for the logic spec.
- Top-down sequential execution – the entire return has to be recalculated even when a single input changes.
- All inputs required for complete calc – need to collect all information upfront.
- Implicit explainability hidden in code – cost prohibitive to explain calc logic explicitly for each customer.
Thus, we developed the Tax Knowledge Engine: a fundamental paradigm shift in our approach to represent complicated tax compliance calculations and rules at scale via knowledge graphs and connect associated user data together, instead of hard-coding tax logic in procedural programming code.
The Solution: A Knowledge Engine driven by the Tax Knowledge Graph
The Tax Knowledge Engine provides simple-to-use development tools that allow tax domain experts (non-engineers) to specify tax logic at scale in knowledge graphs in a declarative and modular way. It also provides a runtime engine, embedded in the TurboTax Online service or TurboTax desktop app, that meshes user data with the knowledge graphs to drive a dynamic and personalized experience for tax preparation.
The key innovation of the Tax Knowledge Engine is to capture expert knowledge of the tax domain in the following two knowledge graphs.
- Calc Graph: a comprehensive graph with tens of thousands of calculation statements represented as interconnected calc modules, each a calc function node connecting input and output data nodes together.
- Completeness Graph: a special type of graph to represent complicated decision tree logic to determine applicability of specific tax topics, such as eligibility of earned income credit, based on user data.
These two knowledge graphs do more than capture calculation and eligibility rules. They can also be queried and reasoned over to automatically explain a calculation result, determine what is missing, and detect what is wrong based on each user’s data at any point during the tax return preparation process.
Figure 1 shows how the Tax Knowledge Engine is integrated in the TurboTax UI application to drive dynamic personalized experiences. Based on a Calc and Completeness Graph, at any moment and for any (partial) user data, the Tax Knowledge Engine has the knowledge to be able to tell what’s missing, what’s wrong, and explain back to the tax filer how TurboTax arrived at the final calculation.
In the sections below, we will dive a bit deeper into the calc and completeness knowledge graphs to show how knowledge is represented to drive smart and personalized experiences.
Calculation Logic and Rules as Calc Graph
The Calc Graph consists of two layers: the meta-level Generic Calc Patterns (GISTs, short for Generating Interactions between Schemata and Texts), curated by engineers and tax experts to model generic calc patterns, and the detailed Bounded Calc Instances, developed by tax domain experts in the context of a tax form by applying matching GISTs to a specific instruction.
Figure 2 below shows an example of how calculation logic is represented in the knowledge graph format. The calculation rule for each line is an operator node connecting inputs and outputs together. Each operator node is instantiated from a generic calc pattern and bound to the current context: the concrete input variables and output variables on the current line. These operator nodes can be chained together to form the large calc graph.
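To make the operator-node idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the Tax Knowledge Engine's actual implementation: the pattern names (`add`, `subtract_floor_zero`), the `CalcNode` class, the `evaluate` function and the form lines L1 to L5 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical generic calc patterns: reusable operators that know nothing
# about any particular tax form. These are illustrative, not a real catalog.
PATTERNS: Dict[str, Callable[[List[float]], float]] = {
    "add": lambda xs: sum(xs),
    "subtract_floor_zero": lambda xs: max(xs[0] - xs[1], 0.0),
}


@dataclass(frozen=True)
class CalcNode:
    """An operator node: a generic pattern bound to concrete variables."""
    pattern: str
    inputs: List[str]
    output: str


# Made-up form lines, chained through the shared variable L3.
CALC_GRAPH = [
    CalcNode("add", ["L1", "L2"], "L3"),
    CalcNode("subtract_floor_zero", ["L3", "L4"], "L5"),
]


def evaluate(graph: List[CalcNode], data: Dict[str, float]) -> Dict[str, float]:
    """Fire every node whose inputs are known until nothing new can be computed."""
    values = dict(data)
    progress = True
    while progress:
        progress = False
        for node in graph:
            ready = all(name in values for name in node.inputs)
            if node.output not in values and ready:
                args = [values[name] for name in node.inputs]
                values[node.output] = PATTERNS[node.pattern](args)
                progress = True
    return values


result = evaluate(CALC_GRAPH, {"L1": 50000.0, "L2": 1200.0, "L4": 12400.0})
print(result["L5"])  # 38800.0
```

Because each node only declares its inputs and output, the same `evaluate` loop also works on partial data: nodes whose inputs are not yet known simply stay uncomputed.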
This knowledge graph approach provides new capabilities for both internal developer benefits and external customer benefits:
- Declarative programming makes it easy for tax domain experts (non-programmers).
- Granular, incremental composition allows decomposition of complicated tax logic.
- Visible calc dependency and data flow drives execution efficiency and testability.
- Built-in, explicit explainability enables the user experience to open the calc black box.
These are extremely important to represent knowledge from the entire U.S. tax domain, at scale: 80,000+ sentences in the federal tax forms and close to 190,000 sentences in the tax instructions.
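The efficiency benefit of visible dependencies can be sketched in a few lines of Python. This is an assumption-laden illustration rather than a description of the production engine: the `DEPENDS_ON` map, the `affected_outputs` function and the line names are hypothetical. The point is that when one input changes, only the outputs downstream of it need to be recomputed.

```python
from collections import deque
from typing import Dict, List

# Hypothetical dependency map: output variable -> the inputs it is computed from.
DEPENDS_ON: Dict[str, List[str]] = {
    "L3": ["L1", "L2"],
    "L5": ["L3", "L4"],
    "L8": ["L6", "L7"],
    "L9": ["L5", "L8"],
}


def affected_outputs(changed: str) -> List[str]:
    """Return every output that is downstream of the changed variable."""
    # Invert the map so we can walk from an input to the nodes that consume it.
    consumers: Dict[str, List[str]] = {}
    for output, inputs in DEPENDS_ON.items():
        for name in inputs:
            consumers.setdefault(name, []).append(output)

    seen: List[str] = []
    queue = deque([changed])
    while queue:
        variable = queue.popleft()
        for output in consumers.get(variable, []):
            if output not in seen:
                seen.append(output)
                queue.append(output)
    return seen


print(affected_outputs("L6"))  # ['L8', 'L9']
```

In this toy graph, changing L6 leaves L3 and L5 untouched, in contrast to the top-down approach where the whole return is recalculated.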
Automatic Calc Explanation in ExplainWhy
With calc represented as a knowledge graph, we get immediate visibility and insights into the relationship of various calculations via their connected input and output variables. It is straightforward for a machine to explain the outcome of the calculation by analyzing and traversing the graph.
Figure 3 illustrates how calc explanation can be done by traversing the calc graph backward from the output variable (e.g., L20). It also shows an example of the “ExplainWhy” capability in TurboTax, which uses a natural-language-based, personalized experience to provide insights into the underlying calculation logic.
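The backward traversal can be sketched as a short recursive walk. The code below is a simplified illustration and not the ExplainWhy implementation: apart from the output variable L20 mentioned above, the graph shape, the values, the wording of the explanations and the `explain` function are all made up for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical calc graph: output -> (plain-language operation, input variables).
CALC: Dict[str, Tuple[str, List[str]]] = {
    "L20": ("the sum of", ["L18", "L19"]),
    "L18": ("the difference between", ["L16", "L17"]),
}

# Made-up values for one user's return.
VALUES: Dict[str, float] = {
    "L16": 900.0, "L17": 200.0, "L18": 700.0, "L19": 50.0, "L20": 750.0,
}


def explain(variable: str, depth: int = 0) -> List[str]:
    """Walk backward from an output, emitting one sentence per variable."""
    indent = "  " * depth
    if variable not in CALC:
        # Leaf node: a value the user supplied rather than a computed one.
        return [f"{indent}{variable} = {VALUES[variable]} (entered by you)"]
    operation, inputs = CALC[variable]
    joined = " and ".join(inputs)
    lines = [f"{indent}{variable} = {VALUES[variable]} because it is {operation} {joined}"]
    for name in inputs:
        lines.extend(explain(name, depth + 1))
    return lines


print("\n".join(explain("L20")))
```

Each level of indentation corresponds to one step back through the graph, so the explanation can be cut off at whatever depth suits the user experience.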
Completeness Graph: Decision Tree to Drive Dynamic Data Collection
A special type of logic that often appears in tax compliance is the eligibility rule, where a set of conditions is evaluated to decide whether the detailed calculations associated with a particular topic are needed. These eligibility rules usually include a starting point, a set of conditions and their associated input variables, and a finite number of outputs that provide the answer. We model this class of logic with a Completeness Graph backed by a truth table, which consists of a starting node, a number of output nodes for decisions, input variable nodes and conditional nodes. A Completeness Graph can be used to break down the complexity of such tax logic, minimize user data entry, and avoid unnecessary calculations based on a tax filer’s individual situation.
Figure 4 shows an example of a Completeness Graph for determining whether an individual is qualified for a tax benefit, according to the following eligibility rule:
- If a person is not a resident of California, he/she is not qualified;
- Otherwise, he/she must be 18 or older to qualify for the benefit.
In this example, if the current user’s data shows they are younger than 18, then TurboTax will not ask for residence info since we can already reach the outcome. However, if their age is 18 or older, TurboTax can use the Completeness Graph to find out what is missing and collect the rest in order to complete the eligibility test and proceed with the calculation.
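The example above can be sketched as a small truth table evaluated against partial data. This is a simplified illustration that assumes the table covers every case; the names `CONDITIONS`, `NEEDS`, `ROWS` and `check_eligibility` are hypothetical and do not come from the Tax Knowledge Engine.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Each condition reads one input variable from the user's data.
CONDITIONS: Dict[str, Callable[[dict], bool]] = {
    "is_ca_resident": lambda d: d["state"] == "CA",
    "is_adult": lambda d: d["age"] >= 18,
}
NEEDS: Dict[str, str] = {"is_ca_resident": "state", "is_adult": "age"}

# Truth table rows: conditions omitted from a row are "don't care".
ROWS: List[Tuple[Dict[str, bool], str]] = [
    ({"is_ca_resident": False}, "not qualified"),
    ({"is_adult": False}, "not qualified"),
    ({"is_ca_resident": True, "is_adult": True}, "qualified"),
]


def check_eligibility(data: dict) -> Tuple[Optional[str], List[str]]:
    """Return (outcome, missing inputs); outcome is None while undecided."""
    # Evaluate only the conditions whose input variable is already known.
    known = {
        name: test(data)
        for name, test in CONDITIONS.items()
        if NEEDS[name] in data
    }
    # Keep the rows that are still consistent with what we know.
    live = [
        (required, outcome)
        for required, outcome in ROWS
        if all(known.get(cond, want) == want for cond, want in required.items())
    ]
    outcomes = {outcome for _, outcome in live}
    if len(outcomes) == 1:
        return outcomes.pop(), []
    missing = sorted(
        {NEEDS[cond] for required, _ in live for cond in required if cond not in known}
    )
    return None, missing


print(check_eligibility({"age": 16}))  # ('not qualified', [])
print(check_eligibility({"age": 30}))  # (None, ['state'])
```

With only an age of 16 on file, every remaining row gives the same answer, so no residence question is needed. With an age of 30 the outcome is still open, and the function reports that the state of residence is the missing input.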
The Tax Knowledge Engine has helped TurboTax deliver substantial benefits for our customers, helping them get their taxes done in a personalized way with high confidence and minimal effort. In year 1 of the deployment, we saw a 77 percent helpful rating from our customers on the personalized ExplainWhy feature and a 46 percent reduction in customer calls for one of the top call driver topics (W2). The knowledge graph approach also greatly simplified the development experience for our tax domain experts, achieving up to a 7X productivity gain for some of the tax logic development.
Together with other AI/ML innovations, Tax Knowledge Engine has played a major role in Intuit Consumer Group’s year over year growth in the highly competitive tax preparation software marketplace.
In this blog, we’ve described the key innovations in applying knowledge engineering to make TurboTax smarter and more personalized. A detailed paper version of this article will be presented at the KDD 2020 International Workshop on Knowledge Graphs on August 24, 2020 at 7 a.m. PDT. The paper also covers how we apply Natural Language Processing and Machine Learning to automate the construction of the Calculation Graph.
The Knowledge Engineering team at Intuit is working hard to generalize the technology behind the Tax Knowledge Engine into a Knowledge Engine Platform to make all Intuit products smarter and more personalized, in our quest to fulfill Intuit’s mission to power prosperity around the world for more than 50 million small business, consumer and self-employed customers. You can also sign up for the Intuit Tech Webinar on August 19, 2020 at 12 p.m. PDT to learn more about how Knowledge Engine becomes the Big Engine that Powers Prosperity and the No-Code Movement.
You can register for the KDD International Workshop on Knowledge Graphs as part of the KDD 2020 conference at https://www.kdd.org/kdd2020/attending/Registration.
Jay Yu: As a Distinguished Engineer and Architect on Intuit’s Enterprise Architecture and Technology Futures team, I am focused on how to elevate data to knowledge, and integrate symbolic AI (Knowledge Engineering / Knowledge Graph) with machine learning to accelerate Intuit’s journey towards an AI-driven Expert Platform company.
Kevin McCluskey: As a Distinguished Engineer and Director on Intuit’s Technology Futures team, I am focused on the Knowledge Engineering platform that allows non-engineers to quickly author, test, and directly deploy knowledge artifacts powering Intuit’s Products.
Saikat Mukherjee: As a Distinguished Data Scientist on Intuit’s Technology Futures team, I am focused on Knowledge Engineering, one of the key components of AI at Intuit. I am specifically interested in the connection of Knowledge Engineering with machine learning, and automating the generation of KE artifacts using ML and natural language processing (NLP).