Overview
Cameras are cheap.
Storage is cheap.
Compute is cheap.
Labeling is expensive!
In this blog, we’ll see how Akridata leverages existing models such as CLIP and DINOv2 to build an improved labeling flow.
With labeling being a bottleneck for model training, Akridata’s efficient flow lets you prepare training and test data, and ultimately deploy a model to production, faster and cheaper.
Image and Text
OpenAI’s CLIP (Contrastive Language–Image Pre-training) model connects the image world with the language world, allowing us to label images based on class names.
This works as a zero-shot operation, i.e. it requires no training data, as opposed to fully supervised training, where a fully labeled dataset is required to train a model.
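The zero-shot mechanics are simple: CLIP embeds the image and each candidate class prompt into the same vector space, and the class whose text embedding is closest (by cosine similarity) to the image embedding wins. A minimal sketch of that comparison step, using random toy vectors in place of real CLIP encoder outputs:

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, class_names):
    """Pick the class whose text embedding is closest to the image embedding."""
    # L2-normalize so the dot product equals cosine similarity,
    # mirroring how CLIP compares its image and text embeddings.
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = text_embs @ image_emb  # one similarity score per class
    return class_names[int(np.argmax(sims))], sims

# Toy embeddings stand in for CLIP encoder outputs (normally ~512-d vectors
# from the image tower and from encoding prompts like "a photo of a <class>").
rng = np.random.default_rng(0)
text_embs = rng.normal(size=(3, 8))
image_emb = text_embs[1] + 0.1 * rng.normal(size=8)  # close to class 1
label, sims = zero_shot_classify(image_emb, text_embs, ["cat", "dog", "bird"])
print(label)  # "dog" — the closest text embedding wins
```

In a real pipeline the embeddings would come from CLIP’s image and text encoders; the classification step itself is exactly this nearest-neighbor comparison.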
Akridata’s Custom Improvements
CLIP’s labeling can be improved by providing a description rather than a single class name. This also avoids ambiguities in class names, such as “crane” (a bird, a company, or a machine).
Additionally, CLIP can be paired with another large model, DINOv2; the two models complement each other and lower the risk of misclassification.
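One way to realize this is to expand each class name into a short description and wrap it in several prompt templates before encoding; the per-template text embeddings are then typically averaged into a single class embedding. A sketch of the prompt-building step (the descriptions and templates here are illustrative, not Akridata’s actual prompts):

```python
# Map ambiguous class names to disambiguating descriptions (illustrative).
CLASS_DESCRIPTIONS = {
    "crane": "a crane, a tall wading bird with long legs and a long neck",
}

# A few prompt templates, in the style used for CLIP zero-shot evaluation.
TEMPLATES = [
    "a photo of {}.",
    "a close-up photo of {}.",
    "a low-resolution photo of {}.",
]

def build_prompts(class_name):
    """Expand a class name into descriptive prompts for the text encoder."""
    desc = CLASS_DESCRIPTIONS.get(class_name, class_name)
    return [t.format(desc) for t in TEMPLATES]

print(build_prompts("crane")[0])
# a photo of a crane, a tall wading bird with long legs and a long neck.
```

Each prompt would be encoded by CLIP’s text tower, and the resulting embeddings averaged to form the class vector used in the zero-shot comparison.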
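Since DINOv2 has no text head, a common way to combine the two models is to score classes with DINOv2 via similarity to per-class image prototypes (e.g. means of a few confidently labeled examples) and blend those scores with CLIP’s image-text similarities. A hedged sketch of one such fusion (the weighting scheme and `alpha` knob are our assumptions, not Akridata’s exact method):

```python
import numpy as np

def softmax(x):
    x = x - x.max()  # shift for numerical stability
    e = np.exp(x)
    return e / e.sum()

def fuse_scores(clip_logits, dino_logits, alpha=0.5):
    """Blend per-class probabilities from the two models.

    clip_logits: CLIP image-text similarities, one per class.
    dino_logits: similarities of the DINOv2 image feature to per-class
                 prototypes (hypothetical, built from a few labeled images).
    alpha:       weight on CLIP (illustrative knob).
    """
    p = alpha * softmax(clip_logits) + (1 - alpha) * softmax(dino_logits)
    return int(np.argmax(p)), p

# CLIP is nearly tied between classes 0 and 1; DINOv2 clearly prefers
# class 1, so the fused prediction resolves the ambiguity toward class 1.
pred, probs = fuse_scores(np.array([2.0, 1.9, -1.0]), np.array([0.1, 3.0, -0.5]))
print(pred)  # 1
```

When the two models disagree strongly, the fused probabilities stay flat, which can also serve as a flag to route that image to a human labeler.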
Results
Below we see the accuracy improvements on publicly available datasets with Akridata’s flow:
Using class description and prompt refinement:
| Dataset | Baseline | Akridata |
|---|---|---|
| Caltech101 | 0.826 | 0.866 |
| Caltech256 | 0.814 | 0.828 |
| Food101 | 0.777 | 0.804 |
| Flowers102 | 0.618 | 0.677 |
| DTD | 0.435 | 0.460 |
| OxfordIIITPets | 0.848 | 0.872 |
Using DINOv2 and score adjustment:
| Dataset | Zero-shot CLIP | Akridata |
|---|---|---|
| OxfordIIITPets | 0.85 | 0.90 |
| Caltech101 | 0.83 | 0.86 |
| CIFAR10 | 0.89 | 0.95 |
| Flowers102 | 0.62 | 0.67 |
| EuroSAT | 0.32 | 0.46 |
| DTD | 0.44 | 0.50 |
Summary
In this blog we showed how Akridata improves on available models to provide high-quality, fast, and cheap data labeling, lowering development cost and shortening the time to deploy models to production.
Learn more at: akridata.ai