Concept
Over its three-year duration, just better DATA (jbDATA) will develop and implement as a prototype a process that provides “better data” for training AI models for autonomous driving in the future. The nine project partners will produce the specified dataset within four sub-projects (SP):
Tasks of the 4 sub-projects
SP1
Requirements
In SP1, the requirements for the new system are compiled. It addresses questions regarding the types of data and metadata considered in the project, the characteristic yet diverse scenarios captured, selected, and stored in the vehicle, and the requirements for the sensor set in the subsequent project phases.
SP2
Edge Platform
SP2 develops the core of the project's prototype system for Smart Data Logging and Processing: the Edge Platform, which enables real-time recording, preprocessing, and selection of data in the vehicle.
SP3
Smart Data Loop
In SP3, the traditional data loop is optimized by applying smart algorithms, covering more diversity in datasets while keeping the data volume manageable.
SP4
Prototype & Verification
SP4 is responsible for equipping the vehicles for data collection and for recording real data in selected scenarios, in order to generate a dataset that is suitable for industrial use by all partners and in line with the project’s objectives.
“Smart Data Logging” Concept
Challenge and solution approach
Experts estimate that well over 99 percent of the data collected during a typical test drive is irrelevant for the development of autonomous driving functions and, ideally, should not be recorded at all. The challenge is to identify the data that is genuinely needed.
Onboard pre-processing
The centerpiece of the hardware system prototyped in jbDATA for Smart Data Logging and Processing is the Edge Platform. On this platform, recorded sensor data is analyzed, pre-processed, pre-sorted, and reduced in order to speed up its subsequent processing in the cloud or backend. The aim is to make data handling simpler and more cost-effective for the user.
In optimizing the data loop with smart algorithms and a high degree of automation, a key focus is a reliable method for transferring data from the vehicle to a backend. To reduce the transmission volume without losing necessary information, smart algorithms are intended to process or enrich data directly in the vehicle during recording. This requires that such methods run on the vehicle side at the Edge, i.e., at the transition from the vehicle to the Cloud. This approach ensures that only the data genuinely needed is stored, rather than terabytes of useless information.
Intelligent online procedures can detect Corner Cases, unusual events, and data gaps during recording. This is where AI-based methods come into play: based on predefined criteria, they identify relevant scenarios during data recording and ensure that only these are saved. In addition, a filtering function can mark elements of a dataset that show uncertainties or inconsistencies.
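The selection and marking step described above could look roughly like the following sketch. It is only an illustration under stated assumptions: the `Frame` fields, the two scores, and both thresholds are hypothetical and are not taken from the jbDATA Edge Platform.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One recorded sample with hypothetical scores from an onboard model."""
    frame_id: int
    scenario_score: float  # assumed relevance score in [0, 1]
    confidence: float      # assumed model confidence in [0, 1]
    flags: list = field(default_factory=list)


def select_frames(frames, relevance_threshold=0.8, confidence_floor=0.5):
    """Keep only relevant frames and mark those with low confidence.

    Both thresholds are illustrative placeholders for the
    "predefined criteria" mentioned in the text.
    """
    kept = []
    for frame in frames:
        if frame.scenario_score < relevance_threshold:
            continue  # irrelevant data is never stored
        if frame.confidence < confidence_floor:
            frame.flags.append("uncertain")  # marked, but still saved
        kept.append(frame)
    return kept


frames = [Frame(1, 0.10, 0.9), Frame(2, 0.95, 0.9), Frame(3, 0.90, 0.3)]
kept = select_frames(frames)
print([(f.frame_id, f.flags) for f in kept])
```

In this sketch an uncertain frame is kept and flagged rather than dropped, so that a later stage can decide how to treat it.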
Transmission of requirements
To fill in missing scenarios with the synthetic data generator, the relevant requirements must be communicated to it. In the data processing phase, automated methods can detect Corner Cases or anomalies and determine the relevance of different scenes. The results are sent to the data generation as new requirements, leading to a more comprehensive dataset.
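As a rough illustration of how such requirements might be derived, the sketch below compares the scenario tags found in recorded data with a minimum count per scenario and reports the shortfall. The tag names, the minimum counts, and the record layout are assumptions made for this example; the project's actual interface to the data generator is not described here.

```python
from collections import Counter


def missing_scenario_requirements(recorded_tags, minimum_per_scenario):
    """Return one requirement record per under-represented scenario.

    recorded_tags: scenario labels assigned during data processing.
    minimum_per_scenario: hypothetical target count for each scenario.
    """
    counts = Counter(recorded_tags)
    requirements = []
    for scenario, minimum in sorted(minimum_per_scenario.items()):
        shortfall = minimum - counts[scenario]
        if shortfall > 0:
            requirements.append(
                {"scenario": scenario, "samples_needed": shortfall}
            )
    return requirements


recorded = ["highway", "highway", "night_rain"]
minimum = {"highway": 2, "night_rain": 3, "pedestrian_crossing": 1}
requirements = missing_scenario_requirements(recorded, minimum)
print(requirements)
```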
For the training of AI-based systems, a balanced dataset is crucial. It must include rare events such as critical traffic situations as well as variations among road users in attributes such as gender, skin color, age, and physique. Bias present in current datasets reduces the performance of AI systems. To counteract this bias, real data is supplemented with synthetic data, expanding datasets into hybrid ones. This approach efficiently creates a balanced, characteristic, and fair dataset. Furthermore, cloud-side functions are developed here with synthetically generated and real data, optimizing AI models between cloud and edge in terms of training cycles.
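One simple way to make "balanced" measurable is sketched below: a balance score based on normalized Shannon entropy (1.0 means all classes are equally frequent), together with a top-up rule that adds synthetic samples until every class matches the largest one. Both the metric and the top-up rule are assumptions chosen for illustration, not the method used in the project, and the class labels are hypothetical.

```python
import math


def balance_score(counts):
    """Normalized Shannon entropy of a class distribution, in [0, 1]."""
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        raise ValueError("need at least two classes and one sample")
    entropy = -sum(
        (c / total) * math.log(c / total) for c in counts.values() if c > 0
    )
    return entropy / math.log(len(counts))


def synthetic_top_up(real_counts):
    """Synthetic samples per class needed to match the largest class."""
    target = max(real_counts.values())
    return {label: target - count for label, count in real_counts.items()}


real = {"adult": 900, "child": 100}
top_up = synthetic_top_up(real)
hybrid = {label: real[label] + top_up[label] for label in real}
print(top_up, balance_score(real), balance_score(hybrid))
```

Topping up to the largest class is the most direct strategy; in practice the target per class would more likely follow from the scenario requirements than from the largest class alone.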
The AI model is initially updated in the Cloud and then reloaded into the Edge device. Closing the Smart Data Loop with the model update ensures that the implemented AI functions collecting data for industrial use are continuously improved.