A dataset of images used for computer vision tasks can be the key to success or failure. A clean dataset can lead the way to a great algorithm, model and ultimately system. Conversely, no matter how good the model or algorithm is, the old rule still holds: garbage in, garbage out.
How will you find the most relevant images to the current task from your dataset?
You could annotate everything and use the metadata, or manually sift through the images and hope to get lucky. For a repeated task or a large dataset, either option is very expensive and takes a lot of time and effort.
There is a better way!
Image Based Search
Data Explorer is a platform that was built to allow us to focus on the data, curate it, clean it and make sure we start the development cycles with a great foundation.
Data Explorer allows you to search the dataset for images that are similar to a certain image. Searching is performed based on the feature vector of each image, as mentioned here, with no metadata or prior knowledge involved.
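To make the idea of searching by feature vector concrete, here is a minimal sketch of how such a search could work. It is not Data Explorer's actual implementation: the choice of cosine similarity, the function names and the tiny hand-made vectors are all assumptions made for illustration (real feature vectors typically come from a neural network and have hundreds of dimensions).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def search_similar(query, features, top_k=3):
    """Return the names of the top_k images whose vectors are closest to the query."""
    ranked = sorted(
        features,
        key=lambda name: cosine_similarity(query, features[name]),
        reverse=True,
    )
    return ranked[:top_k]

# Hypothetical feature vectors, one per image (made up for this example).
features = {
    "cracked_wall.jpg": [0.9, 0.1, 0.0],
    "cracked_deck.jpg": [0.8, 0.2, 0.1],
    "clean_pavement.jpg": [0.0, 0.1, 0.9],
    "clean_wall.jpg": [0.1, 0.0, 0.8],
}

# The image marked with a 'thumbs up' acts as the query.
query = features["cracked_wall.jpg"]
results = search_similar(query, features, top_k=2)
print(results)
```

Note that no labels or metadata are used anywhere: the ranking depends only on how close the vectors are to each other. The marked image itself comes back first, followed by its nearest neighbour.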
In this case, we’ll use the SDNET dataset of surfaces. The image below shows the visualization of the data, as well as how to right-click on an image and mark it with the ‘thumbs up’ icon:
Visualization of the SDNET dataset and marking an example to search
Once one or more images are marked, go to the ‘Search’ tab and click the ‘search’ button to find similar images:
Search the dataset for images that are similar to a chosen example
A search can be refined further by adding more examples to look for, and also by removing undesired images. In the image below we mark an example with a “thumbs down”:
Improve the search results by removing undesired examples
This process can be repeated several times, as in the image below:
Refining the search with more desirable and undesirable examples
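One simple way to picture this refinement is to score every image by how close it is to the "thumbs up" examples, minus how close it is to the "thumbs down" examples. The sketch below does exactly that. It is an assumption for illustration only and not a description of what Data Explorer does internally; the scoring rule, the names and the toy vectors are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def refined_search(features, liked, disliked, top_k=3):
    """Rank unmarked images by mean similarity to liked minus mean similarity to disliked."""
    def score(name):
        vec = features[name]
        pos = sum(cosine_similarity(vec, features[p]) for p in liked) / len(liked)
        neg = 0.0
        if disliked:
            neg = sum(cosine_similarity(vec, features[n]) for n in disliked) / len(disliked)
        return pos - neg

    # Images that were already marked are not returned again.
    candidates = [n for n in features if n not in liked and n not in disliked]
    return sorted(candidates, key=score, reverse=True)[:top_k]

# Hypothetical feature vectors (made up for this example).
features = {
    "crack_a": [0.9, 0.1, 0.1],
    "crack_b": [0.8, 0.1, 0.3],
    "shadow_a": [0.6, 0.7, 0.1],
    "shadow_b": [0.5, 0.8, 0.2],
    "plain": [0.1, 0.1, 0.9],
}

# Only a 'thumbs up': the shadow image still ranks second.
before = refined_search(features, liked=["crack_a"], disliked=[])

# Add a 'thumbs down' on a shadow: the other shadow image drops to the bottom.
after = refined_search(features, liked=["crack_a"], disliked=["shadow_a"])

print(before)
print(after)
```

In this toy data, a single negative example is enough to push the look-alike shadow image below an unrelated one, which mirrors the effect of marking undesired results in the UI.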
After starting with a dataset of 25K images, within a few minutes you have found a few examples to continue development on.
Moreover, using the ‘flashlight’ icon, we can see where the images are coming from within the dataset:
Use ‘flashlight’ to view source of found images
As a next step, you could explore the marked area in the above image further, and search only it for more examples.
Image based search is great for finding examples within a large and diverse dataset of images.
There is more!
What if the desired object is small, while the image is large? Will it be represented ‘strongly’ in the feature vector?
Data Explorer has a solution — Patch search!
Mark a region of interest, ROI, or a patch, within the desired image(s) and search based on them, rather than the entire image.
The UI and flow are very similar to the full image search: all you need to do is activate the toggle button at the top, and a grid will appear on the desired images (see image below). Once the grid is visible, mark the relevant area and click the ‘quick search’ or ‘search’ options.
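The intuition behind patch search can be sketched in a few lines. Assume each image is split into grid cells and each cell has its own feature vector; an image is then scored by its best-matching cell rather than by one vector for the whole image. Everything here is a hypothetical illustration, not Data Explorer's implementation: the toy vectors, the use of cosine similarity, and the use of a plain average of the cell vectors as a stand-in for a whole-image feature vector are all assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def mean_vector(vectors):
    """Element-wise average, used here as a stand-in for a whole-image vector."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def best_patch_score(query_patch, patch_vectors):
    """Score an image by its single best-matching grid cell."""
    return max(cosine_similarity(query_patch, p) for p in patch_vectors)

# Hypothetical per-cell vectors; dimensions loosely mean [stripes, road, buildings].
images = {
    "street_with_crosswalk": [
        [0.1, 0.9, 0.1], [0.1, 0.2, 0.9], [0.1, 0.2, 0.9], [0.9, 0.3, 0.0],
    ],
    "street_no_crosswalk": [
        [0.0, 0.9, 0.1], [0.0, 0.2, 0.9], [0.1, 0.2, 0.9], [0.0, 0.9, 0.2],
    ],
}

# Feature vector of the region marked on the grid (hypothetical).
query_patch = [1.0, 0.2, 0.0]

patch_scores = {name: best_patch_score(query_patch, cells) for name, cells in images.items()}
whole_scores = {name: cosine_similarity(query_patch, mean_vector(cells)) for name, cells in images.items()}

ranking = sorted(patch_scores, key=patch_scores.get, reverse=True)
patch_gap = patch_scores["street_with_crosswalk"] - patch_scores["street_no_crosswalk"]
whole_gap = whole_scores["street_with_crosswalk"] - whole_scores["street_no_crosswalk"]

print(ranking)
print(patch_gap > whole_gap)
```

In this toy setup the two images are separated much more clearly by the per-cell score than by the averaged vector, because averaging dilutes the one cell that contains the small object among all the others.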
To illustrate Patch search, we will try to find crosswalks: a very significant area within an image, but a relatively small one. We will use the BDD dataset of dashcam footage, as seen below:
Applying image based search, the results contain crosswalks but some results are just urban scenes:
Searching for ‘crosswalks’
After activating patch search, marking the crosswalks (note the yellow markings) and searching, the results are excellent:
Searching for ‘crosswalks’ using Patch search — excellent results
In this blog we saw how Data Explorer allows us to search based on whole images, or even based on small areas (patches, or ROIs) within images.
This allows us to find the most relevant data within a large dataset to continue development. This also ensures our data is clean and contains exactly what is required.
In future blogs, we will see how to further improve our dataset, and later, how to use Data Explorer to analyze training results.