Data Filtering

YOUR PRE-TRAINING DATA CLEANING ASSISTANT

With Data Curation, you can select records from an existing raw dataset dynamically during the training process, so that the model is trained on the data that helps it most.

With Data Filtering, you don’t even need to start training your model to select data. In fact, you don’t need a model at all. All you need is a use case, and we will tell you which data to keep and which to discard at the time of collection. You can save on the operational costs of data selection by collecting or generating the right data from the start, instead of collecting everything and curating it later.

Decide which Data to Keep, on the Edge of a Device


Imagine knowing whether a data record will be useful for your use case at the moment you collect it. That’s exactly what Data Filtering does for you. There is no need to collect large datasets only to end up discarding 90% of them. You can now collect more information-dense data and get better models even when you are constrained by data storage and data transfer costs.

Receive Real-Time Feedback on your Data Collection Strategy


Most companies collect data without a strategy. Many autonomous driving companies, for example, send their entire fleet of vehicles to drive aimlessly in the hope of capturing relevant pictures and catching those rare but highly informative records that will boost model performance. Use Alectio’s patented Data Filtering technology to redirect vehicles to places where more information-dense data can be collected. It’s like getting a training dataset on steroids from the get-go!

Clean up your Databases and Organize your Data by Use Case


Data Filters are lightweight models that predict the informational value of data records, one by one, for a specific use case. Need to decide which training data to keep for your use case? Run the data through the Data Filter and purge your data store of irrelevant or harmful data.
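
To make the idea concrete, here is a minimal Python sketch of scoring records one by one and splitting a data store into records to keep and records to purge. It is an illustration only and does not use the Alectio SDK: the `partition_by_score` helper, the `toy_score` function, the `label_frequency` field, and the threshold are all hypothetical stand-ins for whatever a real data filter would provide.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Record = TypeVar("Record")


def partition_by_score(
    records: Iterable[Record],
    score: Callable[[Record], float],
    threshold: float,
) -> Tuple[List[Record], List[Record]]:
    """Split records into (keep, discard) using a per-record score.

    `score` stands in for a data filter: any callable that maps one
    record to an estimated informational value. Higher means more useful.
    """
    keep: List[Record] = []
    discard: List[Record] = []
    for record in records:
        if score(record) >= threshold:
            keep.append(record)
        else:
            discard.append(record)
    return keep, discard


def toy_score(record: dict) -> float:
    """Hypothetical scorer: records with rarer labels score higher."""
    return 1.0 - record["label_frequency"]


if __name__ == "__main__":
    store = [
        {"id": 1, "label_frequency": 0.75},  # common, low value
        {"id": 2, "label_frequency": 0.25},  # rare, high value
        {"id": 3, "label_frequency": 0.5},   # exactly at the threshold
    ]
    kept, purged = partition_by_score(store, toy_score, threshold=0.5)
    print("keep:", [r["id"] for r in kept])
    print("purge:", [r["id"] for r in purged])
```

In practice the scoring function would be the downloaded data filter itself, and the threshold would be chosen to match your storage budget or the quality you need.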

You let the SDK orchestrate training experiments

You set up the Alectio SDK on your system and let it orchestrate a series of short training experiments. The system generates a data filter: a small predictive model that scores each data record according to its value in the context of your model. Data filters are also relatively model-agnostic, so you can reuse a filter with other variations of your model.

Use your data filter to get new training data

You download the data filter to your system and use it to filter new training data before (re)training your model. You can also deploy the filter on the edge of an IoT device to decide in real time whether a given record needs to be stored or transferred, or whether it can be deleted.
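
As a rough sketch of the edge scenario, the Python snippet below makes a keep-or-delete decision for each record as it arrives and tracks how much of the stream is being stored. It is not the Alectio SDK API: `EdgeFilter`, `novelty_score`, and the threshold are hypothetical names and values chosen for illustration, and a real data filter would replace the toy scoring function.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

R = TypeVar("R")


@dataclass
class EdgeFilter(Generic[R]):
    """Decide record by record whether to store or transfer it.

    `score` is a placeholder for a deployed data filter; here it is any
    callable returning a value estimate for a single record.
    """

    score: Callable[[R], float]
    threshold: float
    seen: int = 0
    kept: int = 0

    def should_keep(self, record: R) -> bool:
        """Return True if the record is worth storing; update counters."""
        self.seen += 1
        keep = self.score(record) >= self.threshold
        if keep:
            self.kept += 1
        return keep

    @property
    def keep_rate(self) -> float:
        """Fraction of records kept so far (0.0 before any record)."""
        return self.kept / self.seen if self.seen else 0.0


def novelty_score(reading: float) -> float:
    """Hypothetical scorer: readings far from a 20.0 baseline score higher."""
    return abs(reading - 20.0)


if __name__ == "__main__":
    edge = EdgeFilter(score=novelty_score, threshold=5.0)
    readings = [20.0, 21.5, 30.0, 12.0]
    stored = [r for r in readings if edge.should_keep(r)]
    print("stored:", stored)
    print("keep rate:", edge.keep_rate)
```

Tracking the keep rate on the device is one simple way to check that the filter stays within a storage or bandwidth budget while data is being collected.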

You can now pre-process any dataset to reduce its size before training. This lets you train the same model faster and more efficiently, and it lets you flush useless data out of your storage instances. The filtering process is tuned to your use case and model, and is designed to avoid introducing bias. Data filters can be used for data management, improved data collection, data cataloging and more!

How well your training data is working for your models becomes visible at the edge. Alectio’s Data Filtering can listen to your models in real time and learn what data they need to perform at their best. This is especially useful for use cases like autonomous vehicles, where large volumes of data are collected and stored in the cloud and where labeling costs are especially high.

Let Your Models Tell You What Data They Need!

Alectio’s Reactive Data Filtering gives you real-time feedback on the most useful data for your models.

Works with your existing data sets to predict which data is most useful for your models.

Works to estimate model performance with a given training data set, even without internet access.

Works without needing to collect every byte of data. We’ll help you save only the most important information, in real time.