
Image Data Set Exploration

Introduction

An image dataset can make or break a computer vision project. A clean dataset paves the way to a great algorithm, model and, ultimately, system. But no matter how good the model or algorithm is, the old rule applies: garbage in, garbage out.

So how do you make sure your algorithm and model are built on strong foundations, i.e. a clean, high-quality dataset?

Data Explorer is a platform built to let us focus on the data: curate it, clean it and make sure we start the development cycle with a great foundation. In a previous blog, we saw how a dataset could be visualized using Data Explorer. Visualization is just the first step; the second is exploration.

Let’s get straight to it.

Image Data Set Exploration

1. Sanity Check

The first check is a sanity check: randomly choose a few images from the data and visually confirm they are as expected.

I’ll continue the example from the previous blog, demonstrating on the Pascal VOC dataset. In the image below, the dataset is visualized as a 2D map (right), and randomly selected images are displayed (left). If we press the “flashlight” icon, we can see the random images’ locations on the full map.

Left: Pascal VOC dataset visualized + randomly selected images. Highlighted “Flashlight” icon pressed. Right: Random images’ locations highlighted
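
Outside the UI, the same sanity check can be approximated in a few lines. The sketch below is only an illustration and not how Data Explorer works internally; the function name `sample_images` and the file names are hypothetical. It picks a handful of distinct image paths uniformly at random so they can be opened and inspected by eye.

```python
import random

def sample_images(image_paths, k=9, seed=None):
    """Return up to k distinct paths chosen uniformly at random."""
    rng = random.Random(seed)  # local generator; a fixed seed makes the pick repeatable
    k = min(k, len(image_paths))  # never ask for more images than exist
    return rng.sample(list(image_paths), k)

# Hypothetical file names standing in for a real image folder
paths = [f"img_{i:04d}.jpg" for i in range(100)]
picked = sample_images(paths, k=5, seed=0)
print(picked)
```

Fixing the seed is optional, but it makes it possible to look at the same random selection again later.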

2. Cluster Review

The second check is to review the clusters and what they hold. Start by choosing “Group” (to the left of “Random”); clicking on an image from a cluster then shows images from that cluster. Below we check the Red cluster and see that these are all air-related images (planes, birds in the sky, etc.).

Note: Processing is done at the raw image level, with no metadata and no prior knowledge.

The “Group” option lets you view examples from a single cluster. Here, the Red cluster is chosen.

While a cluster could be very large, we can view a small number of examples around a single, chosen image. Click on the “knn” option, choose a single dot and view the images around it, as seen below:

View a few images around a chosen point using the “knn” option
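
To make the idea behind the “knn” option concrete, here is a minimal sketch of a k-nearest-neighbors lookup. It assumes each image is represented by a point (for example, its 2D map coordinates) and uses plain Euclidean distance; how Data Explorer actually computes neighbors is not described in this post, and `nearest_neighbors` is a hypothetical name.

```python
import math

def nearest_neighbors(points, query_index, k=5):
    """Return the indices of the k points closest to points[query_index]."""
    query = points[query_index]
    distances = [
        (math.dist(query, p), i)
        for i, p in enumerate(points)
        if i != query_index  # skip the chosen point itself
    ]
    distances.sort()  # closest first
    return [i for _, i in distances[:k]]

# Toy 2D coordinates: three points near the origin, two far away
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0)]
neighbors = nearest_neighbors(points, query_index=0, k=2)
print(neighbors)  # [1, 2]
```

This brute-force version compares the chosen point with every other point, which is fine for a small example; large datasets usually need an index structure instead.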

3. Further Cluster Exploration

The first two steps were a sanity check and brought some understanding of each cluster. A single cluster, or several, can easily be reviewed further. Below, we see how 3 clusters were chosen and random images from just these 3 are displayed:

Left to Right: filter clusters to view and display images from those clusters only

The above interaction allows you to remove whole clusters of undesired images, while keeping only those required for algorithm development, model training and the overall system.
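
In code, this kind of filtering amounts to keeping only the images whose cluster label is in a chosen set. The sketch below assumes the cluster labels have already been exported as one integer per image; the helper `keep_clusters` and the file names are hypothetical.

```python
def keep_clusters(image_paths, labels, wanted):
    """Keep only the images whose cluster label is in `wanted`."""
    wanted = set(wanted)
    return [path for path, label in zip(image_paths, labels) if label in wanted]

# Hypothetical images and their cluster labels
paths = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"]
labels = [0, 1, 2, 1, 3]
kept = keep_clusters(paths, labels, wanted={1, 2})
print(kept)  # ['b.jpg', 'c.jpg', 'd.jpg']
```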

We can also notice that some clusters are bigger and some are smaller, a sign of potential class imbalance (more on that in a future blog).
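
A quick way to put numbers on this observation is to count the images per cluster. The following sketch assumes one cluster label per image; the function names are hypothetical, and the ratio is only a rough indicator, since clusters are not guaranteed to match the classes of the final task.

```python
from collections import Counter

def cluster_sizes(labels):
    """Count how many images fall into each cluster."""
    return Counter(labels)

def imbalance_ratio(labels):
    """Size of the largest cluster divided by the size of the smallest."""
    sizes = cluster_sizes(labels)
    return max(sizes.values()) / min(sizes.values())

# Toy labels: cluster 0 holds 60 images, cluster 1 holds 30, cluster 2 holds 10
labels = [0] * 60 + [1] * 30 + [2] * 10
sizes = cluster_sizes(labels)
ratio = imbalance_ratio(labels)
print(sizes.most_common())  # [(0, 60), (1, 30), (2, 10)]
print(ratio)  # 6.0
```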

How else could Data Explorer help with data curation?

A cluster is a group of images that are similar to one another and different from the other images. Data Explorer automatically decides on the number of clusters, but in some cases further exploration might lead to the conclusion that a cluster should be split further. In the example below, we see how one of the clusters was split into 3, each of which can be evaluated further.

Top left, following arrows: choose a cluster, zoom in on it and split into 2, 3, or more clusters.
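
The post does not say which algorithm Data Explorer uses to split a cluster. As one possible illustration, the sketch below re-clusters the points of a single cluster with plain k-means, using a deterministic farthest-point initialization. The function `split_cluster` and the toy coordinates are assumptions made for this example.

```python
import math

def split_cluster(points, n_parts=3, n_iter=20):
    """Split one cluster's points into n_parts sub-clusters with k-means.

    Returns one sub-cluster index (0..n_parts-1) per point.
    """
    # Farthest-point initialization: start at the first point, then repeatedly
    # add the point that is farthest from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < n_parts:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    assignment = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest center
        assignment = [
            min(range(n_parts), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members
        for c in range(n_parts):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a sub-cluster went empty
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment

# Toy 2D points forming three well-separated groups inside one "cluster"
points = [
    (0, 0), (0, 1), (1, 0),
    (10, 0), (10, 1), (11, 0),
    (0, 10), (1, 10), (0, 11),
]
parts = split_cluster(points, n_parts=3)
print(parts)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

The deterministic initialization keeps the example reproducible; on real data, the number of sub-clusters is a judgment call made after looking at the images, as described above.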

As an example, the NuScenes dataset was explored in a similar fashion; read more here.

Summary

What have we achieved thus far?

By seeing the dataset structure, we could identify potential cases of class imbalance.
Exploring the different clusters allowed us to remove whole clusters of images that are irrelevant to the task.

We now have a clearer understanding of the dataset and what it holds. In the next two blogs, we continue to refine the dataset needed for our development work by two complementary means: sampling the data and image-based search.


Alexander Berkovich

Alex, a principal AI/ML engineer at Akridata, has worked on vision-based systems for almost 20 years, holding positions such as R&D manager, team lead, and algorithm developer in a variety of domains, ranging from smart cities to medical quality inspections, manufacturing and more.
