A video dataset can make or break a computer vision project. A clean dataset paves the way to a great algorithm, model and ultimately system, but no matter how good the model or algorithm is, the old rule still applies: garbage in, garbage out.
Now, we can go one step further and search for an event.
What’s an event?
An event is viewed as a sequence of frames where an object moves within the field of view. For example, it could be a person crossing the street, or a car overtaking us from the right or left.
Searching for an event requires more than object detection: it combines that ability with temporal processing of the video. An event holds more information than a single object, and its semantic meaning is significantly richer than that of an object in a single frame. In essence, addressing an event allows us to ask a more profound question about the video, and to build more sophisticated and accurate applications based on that information.
Akridata’s Data Explorer is an AI platform that saves hours on visual data curation and lowers overall development costs by reducing annotation spend and eliminating wasted training cycles.
Its latest release supports searching for events in video files. The interactive workflow lets the user choose a few key frames (the frames that capture the essence of the event), mark the object within them, and search for this combination of key frames and object across the dataset.
Below are the main steps to complete the process:
Define the Event Length & Stride
An event has a typical duration. Setting it allows Data Explorer to split the videos into sequences of that length.
For example, crossing the road could be a 10-second event. Splitting a video into 10-second sequences results in the following set: 0–10, 10–20, 20–30 etc.
However, an event that starts at second 4 and ends at second 14 would be split across two sequences. To avoid that, Data Explorer allows sequences to overlap. The overlap is set via the stride, which is the time offset between the starts of consecutive sequences. For example, setting the stride to 3 seconds results in the following set of sequences:
0–10, 3–13, 6–16, 9–19, 12–22, 15–25 etc.
This increases the chances of detecting the full event in one of the sequences.
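To make the effect of the stride concrete, here is a minimal Python sketch of the windowing described above. It is not Data Explorer's implementation or API; the function names (`sliding_windows`, `overlap`, `best_window`) and the 25- and 30-second video lengths are assumptions chosen for illustration only.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # (start_second, end_second)


def sliding_windows(video_length: int, event_length: int, stride: int) -> List[Window]:
    """Split a video into fixed-length sequences whose starts are `stride` seconds apart.

    Hypothetical helper: only sequences that fit entirely inside the video are kept.
    """
    if event_length <= 0 or stride <= 0:
        raise ValueError("event_length and stride must be positive")
    windows = []
    start = 0
    while start + event_length <= video_length:
        windows.append((start, start + event_length))
        start += stride
    return windows


def overlap(window: Window, event: Window) -> int:
    """Number of seconds of `event` that fall inside `window`."""
    return max(0, min(window[1], event[1]) - max(window[0], event[0]))


def best_window(windows: List[Window], event: Window) -> Window:
    """Return the sequence that covers the largest part of the event."""
    return max(windows, key=lambda w: overlap(w, event))


event = (4, 14)  # the event from the example: starts at second 4, ends at second 14

# Stride equal to the event length: no overlap between sequences.
no_overlap = sliding_windows(video_length=30, event_length=10, stride=10)
print(no_overlap)                                 # [(0, 10), (10, 20), (20, 30)]
print(overlap(best_window(no_overlap, event), event))  # 6

# Stride of 3 seconds: overlapping sequences.
strided = sliding_windows(video_length=25, event_length=10, stride=3)
print(strided)  # [(0, 10), (3, 13), (6, 16), (9, 19), (12, 22), (15, 25)]
print(best_window(strided, event), overlap(best_window(strided, event), event))  # (3, 13) 9
```

In this sketch, the best sequence without overlap covers only 6 of the event's 10 seconds, while a 3-second stride yields a sequence (3–13) that covers 9 of them. A smaller stride raises the coverage further at the cost of more sequences to search; a stride of 2 seconds, for instance, would produce a 4–14 sequence that contains the whole event.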
Define the Key Frames & Object
After setting the sequence duration, the user marks the key frames from the query sequence, and marks the object on them. Data Explorer will focus on that object as it finds similar sequences.
The process is illustrated below — first, select a few key frames and then mark the object on them:
Selecting Key Frames from the full sequence — frames 1, 3, 6 are highlighted in green
Marking the Person as the object in the selected Key Frames
Once the search is complete, we can see the results below — 4 video clips of similar events (top left, highlighted in green, is the query event):
Search result of the chosen event — each image represents a video. Top left, highlighted in green, is the query event
This leads the way to a whole new set of video processing capabilities and applications, in different fields such as surveillance, sports and movies.