Imagine you’ve just been given a new batch of 10,000 images, or hours of video, and you need to find the small fraction of images that are actually relevant.
How would you go about isolating that subset of the data?
In many cases, we receive a batch of visual data, images or video, with very limited control over the content. Video from vehicles will contain a lot of unnecessary frames, while surveillance cameras will record plenty of empty ones. In other cases, the same acquisition process may serve several different projects, and it is up to us to extract the images relevant to ours.
Unless you have the proper tools, you’ll most likely be spending a ton of labor and time sifting through the images manually.
That’s why we created Data Explorer.
Akridata’s Data Explorer allows us to search for images by simply marking an example or by marking a patch to find similar images.
In a previous blog post, we explored the nuScenes dataset. In this post, we will demonstrate the search capabilities on other datasets.
Finding the needle(s) in your data haystack
The dataset for this example is a set of surface images, and we will try to isolate the images that show a horizontal crack in the surface (the needle).
Start by visualizing the dataset, sampling it and looking at a few examples, shown in the image below:
With any luck, we can easily find one or two examples by scrolling through the randomly sampled gallery initially shown. If not, it may require manually adding an example of a desired image to the dataset, or introducing an example of the ‘needle’ in a different way.
In this example, luckily, a quick scroll through the images provides the first example. Giving it a ‘thumbs up’ marks it as a positive example for the search algorithm, which will then look for images with matching characteristics.
The next step will be to try and find similar images. This can be done by going to the ‘Search’ page, and clicking ‘search’.
The first result set is promising, but still far from ideal. The results show various cracks and similar textures, not only horizontal cracks.
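This post does not describe how Data Explorer implements search internally. As a rough mental model only, search by example can be pictured as ranking images by the similarity of their feature vectors to the marked example. The minimal sketch below assumes each image already has an embedding; the three-dimensional vectors, the image names and the `search_by_example` helper are all hypothetical and are not part of Data Explorer.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_by_example(query_id, embeddings, top_k=3):
    """Rank all other images by similarity to the marked example."""
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vec), image_id)
        for image_id, vec in embeddings.items()
        if image_id != query_id
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored[:top_k]]


# Hypothetical toy embeddings: [crack-ness, horizontal-ness, vertical-ness].
embeddings = {
    "horizontal_crack_1": [0.9, 0.8, 0.1],
    "horizontal_crack_2": [0.8, 0.9, 0.0],
    "vertical_crack_1": [0.9, 0.1, 0.8],
    "smooth_surface_1": [0.0, 0.1, 0.1],
}

results = search_by_example("horizontal_crack_1", embeddings, top_k=2)
print(results)
```

In this toy data the vertical crack still ranks second, because it shares the ‘crack’ component with the query. A single example cannot tell the search which of its characteristics matter.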
How can we refine the results to isolate only those with horizontal cracks?
Refine the search
Searching based on one example is hardly ever enough. The system allows us to refine the base search by marking the ‘thumbs up’ for the examples we want to see and a ‘thumbs down’ for those we want to avoid.
Note that each example has a weight bar that allows different weights per example.
In our case, we want to keep horizontal cracks and exclude vertical ones:
The result now looks better:
And scrolling further down:
Based on only a handful of examples, Data Explorer returned a strong set of images showing the desired needle.
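One common way to think about this kind of thumbs-up and thumbs-down refinement is Rocchio-style relevance feedback: the query vector is moved toward the weighted positive examples and away from the weighted negative ones. The sketch below illustrates that general technique; it is not a description of Data Explorer’s algorithm. The embeddings, the `refine_query` helper and the `neg_scale` parameter are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def refine_query(positives, negatives, neg_scale=0.5):
    """Build a query from weighted examples, each given as (vector, weight).

    The query is the weighted mean of the positives minus neg_scale times
    the weighted mean of the negatives.
    """
    dim = len(positives[0][0])
    query = [0.0] * dim
    pos_total = sum(weight for _, weight in positives)
    for vec, weight in positives:
        for i in range(dim):
            query[i] += weight * vec[i] / pos_total
    neg_total = sum(weight for _, weight in negatives)
    if neg_total > 0:
        for vec, weight in negatives:
            for i in range(dim):
                query[i] -= neg_scale * weight * vec[i] / neg_total
    return query


# Hypothetical toy embeddings: [crack-ness, horizontal-ness, vertical-ness].
thumbs_up = [([0.9, 0.8, 0.1], 1.0), ([0.8, 0.9, 0.0], 1.0)]
thumbs_down = [([0.9, 0.1, 0.8], 1.0)]

candidates = {
    "horizontal_crack_3": [0.7, 0.9, 0.1],
    "vertical_crack_2": [0.8, 0.0, 0.9],
    "smooth_surface_2": [0.0, 0.1, 0.1],
}

query = refine_query(thumbs_up, thumbs_down)
ranked = sorted(
    candidates,
    key=lambda name: cosine_similarity(query, candidates[name]),
    reverse=True,
)
print(ranked)
```

In this sketch the per-example weights play the role of the weight bar: raising the weight of a negative example pushes the query further away from images that resemble it.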
Finding a tiny needle in your data haystack
Up until now, we have been searching based on an entire image. In some cases, we are interested in a small object that is only part of a larger image. Data Explorer provides a solution for this case too: patch search, where we mark a small patch of an image and search for just that.
In the following example, we use the VOC dataset and try to find crosswalks.
As in the previous case, mark an image with a ‘thumbs up’. Then turn on the ‘Patch Search’ toggle button, which allows us to mark a patch in the image:
Searching now will result in a set of images that contain crosswalks:
This result shows us how to find a set of images based on a patch, rather than the whole image.
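To make the idea of patch search concrete, here is a simplified sketch in which each image is scored by its best-matching window rather than by the image as a whole. It uses raw pixel values on tiny grayscale grids purely for illustration; a practical system would more likely compare learned features. The images, the names and the `patch_search` helper are hypothetical and do not reflect how Data Explorer works internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def crop(image, top, left, height, width):
    """Cut a rectangular patch out of a 2-D list of pixel values."""
    return [row[left:left + width] for row in image[top:top + height]]


def flatten(patch):
    return [float(value) for row in patch for value in row]


def best_patch_score(image, query_patch):
    """Highest similarity between the query patch and any same-size window."""
    patch_h, patch_w = len(query_patch), len(query_patch[0])
    query = flatten(query_patch)
    best = 0.0
    for top in range(len(image) - patch_h + 1):
        for left in range(len(image[0]) - patch_w + 1):
            window = flatten(crop(image, top, left, patch_h, patch_w))
            best = max(best, cosine_similarity(query, window))
    return best


def patch_search(query_patch, images, top_k=1):
    """Rank images by their best-matching window."""
    scored = sorted(
        ((best_patch_score(img, query_patch), name) for name, img in images.items()),
        reverse=True,
    )
    return [name for _, name in scored[:top_k]]


# Hypothetical example image with a striped, crosswalk-like region at top-left.
example = [
    [1, 0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
query_patch = crop(example, top=0, left=0, height=2, width=4)  # the marked patch

images = {
    "street_with_crosswalk": [
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 1, 0],
        [0, 0, 1, 0, 1, 0],
    ],
    "street_with_blob": [
        [0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0, 0],
        [0, 1, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
    ],
    "empty_street": [[0] * 6 for _ in range(4)],
}

matches = patch_search(query_patch, images, top_k=1)
print(matches)
```

Because the score depends only on the best window, the striped region is found even though it sits in a different part of the image than in the marked example.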
Get Started with Data Explorer
Data Explorer is a unique tool for data curation. It allows the user to search for specific examples, refine the search and even search based on small patches of the image.
In future posts, more capabilities of Data Explorer will be demonstrated.