Big Data ≠ Big Information
After a series of AI winters, Big Data became the single biggest accelerator of AI research: when data scientists are unhappy with their model’s performance, the first thing they usually do is gather more data to improve the model’s ability to generalize.
The problem is that collecting and processing more data is neither cheap nor fast, and it leaves you managing large amounts of ROT (redundant, obsolete, or trivial) data. Worse, more data is not even guaranteed to make a significant difference, since not every record contains information that is both novel and relevant.
At Alectio, we believe it’s time for a paradigm shift from Big Data to Big Information. That’s exactly what Data Curation helps you achieve: better results with less data.
How does Data Curation work?
Active Learning is usually described as a data selection process that reduces the amount of data needed to train a model, but at its core it is a process that ranks data by its importance to the model. As such, it can also serve as an explainability tool: it helps you identify model weaknesses and understand how information is transferred from raw data into the model.
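To make the idea of "ranking data by importance" concrete, here is a minimal sketch of entropy-based uncertainty sampling, one common active learning strategy. It is a generic illustration, not Alectio's implementation: it assumes you already have the model's predicted class probabilities for each record, and the function names are hypothetical.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    # Terms with p == 0 contribute nothing (p * log p -> 0), so skip them.
    return -sum(p * math.log(p) for p in probs if p > 0)


def rank_by_uncertainty(predictions: Sequence[Sequence[float]]) -> list[int]:
    """Return record indices ordered from most to least uncertain.

    Hypothetical helper: a high-entropy prediction means the model is unsure,
    which uncertainty sampling treats as a proxy for "more informative".
    """
    scores = [entropy(p) for p in predictions]
    # sorted() stays stable with reverse=True, so ties keep their input order.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


predictions = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # nearly uniform: the model is very unsure
    [0.60, 0.30, 0.10],  # in between
]
print(rank_by_uncertainty(predictions))  # [1, 2, 0]
```

Reading the ranking the other way around is what turns selection into explainability: the records at the top show where the model is least sure of itself.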
Diagnose model issues
The curation experiment report provides a wide range of graphs and metrics that you can use to analyze your model’s weaknesses and prevent overfitting.
Training-Time Explainability Framework
Some of the graphs in the labeling report help you correlate the informational content of the useful records in your data with their impact on model training. You can use this information to develop smart data collection strategies, guide synthetic data generation, decide which augmentations to apply to your data, or, more generally, as an explainability framework.
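As a rough illustration of how per-record information can feed a data collection strategy, the sketch below averages an informativeness score per group of records (for example, by capture condition) and sorts groups from most to least informative. This is an assumption-laden example, not a description of what the labeling report computes: the grouping attribute, the scores, and the function name are all hypothetical.

```python
from collections import defaultdict
from collections.abc import Hashable, Sequence


def mean_score_by_group(
    groups: Sequence[Hashable], scores: Sequence[float]
) -> dict[Hashable, float]:
    """Average a per-record informativeness score per group, highest first.

    Hypothetical helper: `groups[i]` is any metadata label for record i and
    `scores[i]` is its informativeness score from whatever ranking you use.
    """
    totals: dict[Hashable, float] = defaultdict(float)
    counts: dict[Hashable, int] = defaultdict(int)
    for group, score in zip(groups, scores):
        totals[group] += score
        counts[group] += 1
    means = {group: totals[group] / counts[group] for group in totals}
    # Dicts keep insertion order, so insert groups by descending mean score.
    return dict(sorted(means.items(), key=lambda kv: kv[1], reverse=True))


groups = ["night", "day", "night", "rain", "day"]
scores = [0.9, 0.2, 0.7, 0.6, 0.1]
print(list(mean_score_by_group(groups, scores)))  # ['night', 'rain', 'day']
```

Under these assumptions, groups at the top of the list would be natural candidates for targeted collection, synthetic generation, or augmentation, while groups at the bottom may already be well covered.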