The quality of the images or videos in a dataset plays a major role in the success or failure of a computer vision task. A clean dataset paves the way to a good algorithm, model, and ultimately system. A lack of clean data while developing the model or algorithm leads to the effect commonly known as "garbage in, garbage out".
Visual datasets are typically so large that manually inspecting each image is impractical. But what if there were an automatic way to validate their quality?
How do we define image quality? Is it a sharp or high-resolution image? For some applications the answer is yes, but what if you want to compress it? Then the answer is no: you’d want a small image with fewer details, which results in a smaller compressed file. What about color? Natural images are colorful, but X-ray, ultrasound, and IR images are not. The same goes for black-and-white surveillance images, which are often used with low-cost infrastructure that has limited communication bandwidth.
While the definition of image quality varies with the application, we can measure certain attributes and filter out unsuitable images based on the attributes that matter for each application.
These attributes could include sharpness, the amount of color, noise level, and so on.
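To make these attributes concrete, here is a minimal sketch of how a few of them could be measured. It is not how Data Explorer computes its scores. The function names are hypothetical, and the techniques are common stand-ins chosen for illustration: variance of a Laplacian response for sharpness, mean intensity for brightness, and the Hasler-Süsstrunk metric for the amount of color. Images are plain nested lists so the example runs with the standard library alone; a real pipeline would more likely use NumPy or OpenCV.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance


def brightness(gray):
    """Mean intensity of a grayscale image given as rows of 0-255 values."""
    return mean(v for row in gray for v in row)


def sharpness(gray):
    """Variance of a 4-neighbour Laplacian over the interior pixels.

    More edges give a larger variance, so higher means sharper.
    Assumes the image is at least 3x3.
    """
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def colorfulness(rgb):
    """Hasler-Suesstrunk colorfulness for rows of (r, g, b) tuples."""
    rg = [r - g for row in rgb for (r, g, b) in row]
    yb = [0.5 * (r + g) - b for row in rgb for (r, g, b) in row]
    spread = sqrt(pstdev(rg) ** 2 + pstdev(yb) ** 2)
    offset = sqrt(mean(rg) ** 2 + mean(yb) ** 2)
    return spread + 0.3 * offset
```

A uniform gray image scores zero on both sharpness and colorfulness, while a checkerboard scores high on sharpness. What counts as "sharp enough" or "colorful enough" still depends on the application, so any threshold has to be chosen per dataset.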
Automatic Image Quality Assessment
Data Explorer is an AI platform that saves time on visual data curation. It lets anyone quickly extract a set of images for training or testing a model, evaluate the model’s accuracy after training, and ultimately develop an application based on a clean dataset.
In a previous blog post we saw how Data Explorer supports text-to-image search based on OpenAI’s CLIP model. In a similar fashion, Data Explorer now supports automatic image quality assessment.
After processing your data, each image is assigned a series of values corresponding to different attributes, as outlined in the list below. Available attributes for filtering:
Below are a few common variations of the same base image that could be filtered out by this flow:
Common variations of the same base image (source: Flowers dataset)
Images can be filtered on these attributes to curate the most suitable list of training or testing images. For example, selecting images that are Sharp but Dark is achieved simply with the filter below:
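For readers who think in code, the logic behind a Sharp but Dark selection can be approximated by hand. The sketch below is not Data Explorer's API or its filter syntax. The `ImageScores` record, the file names, the score values, and both thresholds are hypothetical, and it assumes each image already has a sharpness and a brightness score.

```python
from dataclasses import dataclass


@dataclass
class ImageScores:
    path: str
    sharpness: float   # higher means sharper (hypothetical scale)
    brightness: float  # mean intensity, 0-255


# Hypothetical thresholds; real values depend on the dataset and the metric.
SHARP_MIN = 100.0
DARK_MAX = 60.0


def sharp_but_dark(records, sharp_min=SHARP_MIN, dark_max=DARK_MAX):
    """Keep records that are sharp enough AND dark enough."""
    return [
        r for r in records
        if r.sharpness >= sharp_min and r.brightness <= dark_max
    ]


# Made-up example scores, for illustration only.
records = [
    ImageScores("night_street.jpg", sharpness=250.0, brightness=40.0),
    ImageScores("blurry_night.jpg", sharpness=12.0, brightness=35.0),
    ImageScores("sunny_field.jpg", sharpness=300.0, brightness=180.0),
]

selected = [r.path for r in sharp_but_dark(records)]
print(selected)
```

Combining attributes with a logical AND is what narrows the set: the blurry night shot fails the sharpness test and the sunny field fails the darkness test, so only the sharp night image is kept.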
In this post, we saw how Data Explorer supports automatic image quality filtering. This enables quick filtering of low-quality images from the training or test sets, and because the approach is no-code, anyone can curate the best set of images for their needs.