Data Curation Solutions for Enhanced AI Performance

Data Curation For Labelling

The Issue

Issue Statement

Sifting through large visual datasets and building good training datasets is a cumbersome and time-consuming process.

What exactly is the issue?

Image sensors often produce video streams, which inherently means multiple frames per second (30 to 60 FPS).
If you are trying to build a robust training set, you probably want a diverse set of examples that is representative of the scenario you want the model to learn.
Naturally, this means that for any frame you pick from a video sequence, the neighboring 30 to 60 frames (about one second of footage) will be nearly identical. Picking near-identical frames results in a less diverse dataset and adds little to model performance.
Primitive methods such as downsampling or random sampling may miss valuable information; they amount to a hope-based approach.
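To make the redundancy concrete, here is a minimal sketch of a content-aware alternative to fixed-stride downsampling. It assumes each frame has already been turned into a feature vector (embedding) by some model; the function names, the synthetic data, and the 0.95 similarity threshold are hypothetical illustrations, not part of Akridata Data Explorer. A frame is kept only if it differs enough from the last frame kept.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def select_distinct_frames(features: np.ndarray, threshold: float = 0.95) -> list[int]:
    """Greedily keep a frame only if it differs enough from the last kept frame.

    features: array of shape (num_frames, dim), one embedding per frame.
    threshold: hypothetical cut-off; frames whose similarity to the last kept
               frame is >= threshold are treated as near-duplicates and skipped.
    """
    kept = [0]  # always keep the first frame
    for i in range(1, len(features)):
        if cosine_similarity(features[i], features[kept[-1]]) < threshold:
            kept.append(i)
    return kept


# Synthetic clip: three distinct "scenes", each lasting one second at 30 FPS.
rng = np.random.default_rng(0)
dim, frames_per_scene = 64, 30
scene_vectors = np.eye(dim)[:3]  # three clearly distinct scene embeddings
features = np.vstack([
    scene + 0.01 * rng.standard_normal((frames_per_scene, dim))  # tiny frame-to-frame jitter
    for scene in scene_vectors
])

print(select_distinct_frames(features))   # [0, 30, 60]
print(len(range(0, len(features), 10)))   # 9: a fixed stride of 10 keeps 3 frames per scene
```

On this synthetic clip the greedy filter keeps one frame per scene, while a fixed stride of 10 keeps three near-identical frames from each scene. A threshold on consecutive frames is deliberately simple: it removes temporal redundancy but does not guarantee diversity across the whole dataset.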

The Akridata Solution

Here’s how Akridata Data Explorer helps to solve this issue.

Exploring the Dataset

Get a holistic view of the data

Let’s take a look at the dataset. For the purpose of illustration, we will use the nuScenes dataset.
The images shown are a random sample of the dataset, reflecting different scenes: sunny days, traffic lights, pedestrian crossings, night shots, and rainy days.

Using Our Patch Search Feature

Find more images similar to your images of interest

Suppose you want to use our Patch Search feature to find images that contain a traffic light.
The search results faithfully capture the neighboring frames in the video sequence that contain the traffic light; these are exactly the near-identical frames that make a training set less diverse.
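Patch Search returns images containing something similar to a query patch. The source does not describe how it works internally, so the sketch below assumes a common embedding-based approach: each image is split into patches, every patch is embedded, and images are ranked by the cosine similarity of their best-matching patch to the query. The embeddings are synthetic and `patch_search` is a hypothetical helper, not Akridata's API.

```python
import numpy as np


def patch_search(query: np.ndarray, patch_embeddings: np.ndarray, top_k: int = 5) -> list[int]:
    """Rank images by their best-matching patch (hypothetical helper).

    query: (dim,) embedding of the query patch, e.g. a crop around a traffic light.
    patch_embeddings: (num_images, num_patches, dim) patch embeddings per image.
    Returns the indices of the top_k images, most similar first.
    """
    q = query / np.linalg.norm(query)
    p = patch_embeddings / np.linalg.norm(patch_embeddings, axis=-1, keepdims=True)
    patch_scores = p @ q                     # (num_images, num_patches) cosine similarities
    image_scores = patch_scores.max(axis=1)  # an image scores as high as its best patch
    return [int(i) for i in np.argsort(-image_scores)[:top_k]]


# Synthetic video: 100 frames, 4 patches per frame, random "background" patches.
rng = np.random.default_rng(0)
num_frames, num_patches, dim = 100, 4, 128
patches = rng.standard_normal((num_frames, num_patches, dim))
traffic_light = rng.standard_normal(dim)

# The traffic light appears in six consecutive frames, as it would in a video clip.
for frame in range(40, 46):
    patches[frame, 0] = traffic_light + 0.05 * rng.standard_normal(dim)

print(sorted(patch_search(traffic_light, patches, top_k=6)))  # [40, 41, 42, 43, 44, 45]
```

Because the object appears in consecutive frames, the top results are consecutive frames too: the search is accurate, but the result set is highly redundant.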


Applying Coreset Sampling

Capturing diversity of scenes

To capture the diversity of scenes that contain the traffic light, we can apply Coreset sampling, which reduces the dataset in feature space to a small, representative subset.
As shown in the image, in the Tunables panel we select Coreset as the sampling method and set the sampling fraction to 0.01 (1%).
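One common way to build a coreset is greedy k-center selection: start from one point, then repeatedly add the point farthest from everything selected so far, so the subset spreads across the feature space. The source does not say which algorithm Akridata's Coreset option uses, so treat this as a hypothetical sketch; `coreset_sample` and the synthetic features are illustrative, and `fraction` mirrors the 0.01 sampling fraction above.

```python
import numpy as np


def coreset_sample(features: np.ndarray, fraction: float = 0.01, seed: int = 0) -> list[int]:
    """Greedy k-center coreset (hypothetical helper).

    features: (num_images, dim) feature vectors.
    fraction: share of the dataset to keep (0.01 keeps 1%).
    """
    n = len(features)
    k = max(1, int(round(fraction * n)))
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(n))]  # arbitrary starting point
    # Distance from every point to its nearest selected point.
    min_dist = np.linalg.norm(features - features[selected[0]], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))  # the currently worst-covered point
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(features - features[nxt], axis=1))
    return selected


# Synthetic dataset: 10 well-separated "scenes" with 100 near-identical frames each.
rng = np.random.default_rng(1)
dim, num_scenes, frames_per_scene = 16, 10, 100
centers = 10.0 * np.eye(dim)[:num_scenes]
features = np.vstack([c + 0.1 * rng.standard_normal((frames_per_scene, dim)) for c in centers])
scene_of = np.repeat(np.arange(num_scenes), frames_per_scene)  # scene label per frame

picked = coreset_sample(features, fraction=0.01)
print(len(picked))                                                   # 10 (1% of 1000 frames)
print(sorted(set(scene_of[picked].tolist())) == list(range(num_scenes)))  # True: one frame per scene
```

Each iteration computes one distance per point, so selecting k points takes O(n·k) time. Random sampling at 1% can draw several frames from the same scene and miss others entirely; greedy k-center on well-separated scenes picks one frame from each.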

Finding The Traffic Lights

Run Patch Search

Get Search Results

Trusted By Leaders in Technology

Designed for data science teams to accelerate the path to building production-grade AI models
