*All data is synthetic data based on actual customer data.
Data Classification Challenge 2024
20K+ customer records*
20 minutes
1 Grand Prize
The winner will be invited to meet with senior Intuit executives and win a $10,000 prize.
Think your company has what it takes?
Q&As
Address any questions to: intuit4startups@intuit.com
Answers will be published in the Q&A section of the website.
Companies with a tech-based data classification solution located in the US or Israel. Participants should have a working product with existing customers (whether paying or pilot).
Companies that meet the eligibility requirements and have the highest coverage, accuracy, and precision rates will be given priority.
Two weeks before the challenge, participants will be provided with a rich dataset of golden annotations for training with 20K+ synthetic records. The goal is to recognize 20+ types of entities in unstructured data. Each record may include a single annotation, multiple annotations or none. In the detection phase, given an input string, the analyzer/trained model should return a list of detected entities.
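As a rough illustration of the detection interface described above, the sketch below takes an input string and returns a list of detected entities. The labels (EMAIL, US_PHONE), the regular expressions, and the DetectedEntity fields are assumptions made for illustration only; the actual entity types and output format follow the challenge's own naming conventions, and a real solution would likely use a trained model rather than two patterns.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedEntity:
    entity_type: str  # hypothetical label name, not an official challenge label
    start: int        # inclusive character offset in the input string
    end: int          # exclusive character offset in the input string
    text: str         # the matched substring


# Hypothetical patterns for two illustrative entity types.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def detect_entities(text: str) -> list[DetectedEntity]:
    """Return every pattern match in `text`, ordered by start offset."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append(
                DetectedEntity(label, match.start(), match.end(), match.group())
            )
    return sorted(found, key=lambda entity: entity.start)
```

A record with no matching content simply yields an empty list, which mirrors the case where a record carries no annotation.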
On the day of the challenge, each participant will be provided with an unannotated dataset of 20K+ records at a scheduled time. A JSON file with the associated predictions should be submitted within 20 minutes. Upon submission, participants should also list any third-party services used in completing the challenge.
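One possible way to assemble the predictions into a single JSON document is sketched below. The field names (record_id, detections, type, start, end) and the output file name are hypothetical; the real submission schema is whatever the organizers specify along with the training data.

```python
import json
from pathlib import Path


def build_submission(predictions: dict[str, list[dict]]) -> str:
    """Serialize {record_id: [detection, ...]} into a JSON string.

    The layout used here is an assumption for illustration, not the
    official submission format.
    """
    payload = [
        {"record_id": record_id, "detections": detections}
        for record_id, detections in predictions.items()
    ]
    return json.dumps(payload, indent=2)


def write_submission(predictions: dict[str, list[dict]], out_path: str) -> None:
    """Write the serialized predictions to `out_path` as UTF-8 text."""
    Path(out_path).write_text(build_submission(predictions), encoding="utf-8")
```

For example, write_submission({"r1": [{"type": "EMAIL", "start": 12, "end": 28}], "r2": []}, "predictions.json") would produce a file with one entry per record, including records with no detections.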
The responses will be measured according to the following methodology: the performance for each entity type will be evaluated separately, in addition to a global, cross-label evaluation. Our evaluation metrics take into account partial matches, meaning an overlap (but not an exact match) between a detected entity and a same-type true entity. For each detected entity, we then measure precision and recall. Finally, we evaluate the overall performance by a weighted calculation of the grouped labels, taking into account their necessity and frequency. Each submission will be given a score according to that methodology. The winner will be the company with the highest score.
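To make the partial-match idea concrete, here is a minimal sketch of per-type precision and recall in which any character overlap between a detected span and a same-type true span counts as a match. This is an assumption for illustration: the official scorer may award partial credit differently, and the span representation (type, start, end) is hypothetical.

```python
Span = tuple[str, int, int]  # (entity_type, start, end) with end exclusive


def overlaps(a: Span, b: Span) -> bool:
    """True when two spans share a type and at least one character."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def per_type_scores(
    predicted: list[Span], gold: list[Span]
) -> dict[str, dict[str, float]]:
    """Compute precision and recall separately for each entity type."""
    types = {span[0] for span in predicted} | {span[0] for span in gold}
    scores = {}
    for entity_type in sorted(types):
        pred_t = [s for s in predicted if s[0] == entity_type]
        gold_t = [s for s in gold if s[0] == entity_type]
        # A prediction is correct if it overlaps any true span of its type.
        matched_pred = sum(any(overlaps(p, g) for g in gold_t) for p in pred_t)
        # A true span is recovered if any prediction of its type overlaps it.
        matched_gold = sum(any(overlaps(g, p) for p in pred_t) for g in gold_t)
        scores[entity_type] = {
            "precision": matched_pred / len(pred_t) if pred_t else 0.0,
            "recall": matched_gold / len(gold_t) if gold_t else 0.0,
        }
    return scores
```

The spans are assumed to come from a single record; scoring a full dataset would aggregate the matched counts across records before dividing.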
The winner will receive a $10,000 prize and will be invited to meet with Intuit senior executives.
For this challenge, no, but you can watch a recording or review the presentation at your convenience.
The documents will be fully valid JSON and can be read by any JSON parser. The text record in each JSON document will be the same as the data from Intuit customers. It will not be linked to other text records.
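Since the files are plain JSON, a standard parser is enough to read them. The sketch below assumes, purely for illustration, that a file holds either one record object or a list of them, and that each record exposes its content under a "text" key; the real key names come with the training data.

```python
import json


def load_texts(raw_json: str) -> list[str]:
    """Parse a JSON document and return the text of each record.

    The "text" key and the list-or-object layout are assumptions,
    not the documented challenge schema.
    """
    data = json.loads(raw_json)
    records = data if isinstance(data, list) else [data]
    # Each record is independent, so the texts are returned as a flat list.
    return [record["text"] for record in records]
```

A text value that itself looks like JSON is still returned as an ordinary string here; whether to parse it further is a modeling decision.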
The scoring system uses the precision and recall metrics of the labeled data. It also uses a weighted average of the different labels. Confidence metrics are not required, and if they are shared, they won't affect the scoring. You can learn more about the scoring process here.
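A weighted average over labels could look like the sketch below. Combining precision and recall into an F1 score per label, the default weight of 1.0, and the example weights are all assumptions; the actual weights and combination rule are set by the organizers.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_score(
    per_label: dict[str, tuple[float, float]],
    weights: dict[str, float],
) -> float:
    """Weighted average of per-label F1 scores.

    `per_label` maps a label to (precision, recall); labels missing from
    `weights` fall back to a hypothetical default weight of 1.0.
    """
    total_weight = sum(weights.get(label, 1.0) for label in per_label)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(
        weights.get(label, 1.0) * f1(precision, recall)
        for label, (precision, recall) in per_label.items()
    )
    return weighted_sum / total_weight
```

With per-label results of (1.0, 1.0) for one label weighted 3.0 and (0.5, 0.5) for another weighted 1.0, the overall score is (3 × 1.0 + 1 × 0.5) / 4 = 0.875, so a heavily weighted label dominates the result.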
This challenge is about testing and prioritizing solutions' precision and recall. While speed is critical at our scale, for this challenge it is secondary to ensuring the classification is correct.
The data will only be in text format. The text data will be in different lengths and styles. There will be no images or PDFs in this challenge. Some of the texts might be in JSON format.
Participants will have access to the upload mechanism before the live challenge. The Google Drive folder will be available in advance so that you can test the upload mechanism.
The training and live challenge data sets will have a similar number of records. Both the training data and the test data are drawn from the same collection of data sources. Participants should consider potential variations when preparing for the live challenge.
Yes, and we will share the naming conventions prior to the live challenge. Data must be labeled according to Intuit naming conventions to enable the best assessment of the solution. You can learn more about the label naming process here.
During the live challenge, the data will have the exact same structure as the training data but without detections.
Contact us for more information.
© 2024 Intuit Inc. All rights reserved.
Intuit, QuickBooks, QB, TurboTax, Credit Karma, and Mailchimp are registered trademarks of Intuit Inc. Terms and conditions, features, support, pricing, and service options subject to change without notice.
Photographs © 2018 Jeremy Bittermann Photography. By accessing and using this page you agree to the terms and conditions.