These capabilities are realized using perception and higher-level planning and routing algorithms. The algorithms take as input data about the vehicle's environment from a growing collection of rich sensors (camera, LIDAR, and RADAR) that augment traditional telematics and log information (GPS, IMU, CAN bus, etc.), as well as from external sources such as HD maps and V2I/V2X platforms.
Successful development and production deployment of ADAS/AV perception and planning/routing algorithms require access to large volumes of real-world data from drives performed by test, validation, and production vehicles.
This data is collected from individual vehicle drives and often amounts to several terabytes (TB) per vehicle per drive. Depending on the type of vehicle (test, validation, or production), it is either offloaded on physical media (HDDs/SSDs or custom logging devices) or sent directly to the destination over cellular networks.
Independent of the mechanism, the collected vehicle data needs to be transferred to a data center or cloud environment where one can store the vast volumes of data cost-effectively, transform the data as required for analytics and perception model training tasks, and enrich the data to create metadata. Model training and general analytics pipelines then use this metadata to retrieve the data.
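The enrich-then-retrieve step described above can be illustrated with a minimal Python sketch. It is not taken from any particular system: the `DriveSegment` record, the `enrich` and `retrieve` helpers, the tag names, and the `store://` URIs are all hypothetical, chosen only to show how metadata created during enrichment lets a downstream pipeline select data without scanning the raw sensor files.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class DriveSegment:
    """One stored chunk of a vehicle drive plus the metadata derived from it."""
    segment_id: str
    vehicle_type: str   # "test", "validation", or "production"
    storage_uri: str    # hypothetical location of the raw sensor data
    tags: Set[str] = field(default_factory=set)


def enrich(segment: DriveSegment, detected_events: List[str]) -> DriveSegment:
    """Attach tags that an upstream transform step is assumed to have produced."""
    segment.tags.update(detected_events)
    return segment


def retrieve(catalog: List[DriveSegment], required_tags: Set[str]) -> List[str]:
    """Return storage URIs of segments whose metadata contains every required tag."""
    return [s.storage_uri for s in catalog if required_tags <= s.tags]


# Illustrative catalog; the segments and tags are made up for the example.
catalog = [
    enrich(DriveSegment("seg-001", "test", "store://drives/seg-001"), ["night", "rain"]),
    enrich(DriveSegment("seg-002", "production", "store://drives/seg-002"), ["night"]),
    enrich(DriveSegment("seg-003", "validation", "store://drives/seg-003"), ["rain", "pedestrian"]),
]

# A training pipeline asks only for night-time rain scenes.
night_rain_uris = retrieve(catalog, {"night", "rain"})
```

In this sketch the query touches only the small metadata records, and the raw data behind each URI is fetched afterwards for the matching segments alone.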
The overall problem of managing the collection, transmittal, transformation, tagging, and retrieval of ADAS/AV data is complex for several interrelated reasons: the geo-distributed nature of the source vehicles, the volume of data produced by each vehicle, the shifting focus on the currently most relevant subset of the data (a function of how well the perception algorithm is performing and where its edge cases lie), and the need to carry out all of the operations in a scalable, timely, and cost-effective fashion.
Existing solutions that attempt to collect and transmit all vehicle data to a central facility before running any data pipelines suffer from long delays (from data collection to data use), large resource needs (for data storage and analytics), and poor team productivity (in retrieving the data of most interest for a specific task).
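To give a rough sense of the delay involved in transmitting everything, the following back-of-the-envelope Python sketch estimates the upload time for a single drive. The inputs are assumptions chosen purely for illustration (a 4 TB drive and a sustained 50 Mbit/s uplink); they are not measurements, and the `transfer_hours` helper is hypothetical.

```python
def transfer_hours(data_terabytes: float, uplink_mbit_per_s: float) -> float:
    """Hours needed to move the data over a link at a sustained rate."""
    bits = data_terabytes * 1e12 * 8             # decimal terabytes to bits
    seconds = bits / (uplink_mbit_per_s * 1e6)   # Mbit/s to bit/s
    return seconds / 3600


# Assumed, illustrative inputs: one 4 TB drive over a 50 Mbit/s uplink.
full_upload_hours = transfer_hours(4, 50)
full_upload_days = full_upload_hours / 24
```

Under these assumed inputs the upload takes roughly 178 hours, or a little over a week, for one drive from one vehicle, before any pipeline has run on the data.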