Dataset
A multimodal hybrid dataset for validating perception systems in rare, safety‑critical, and underrepresented driving scenarios
The justbetter DATA (jbDATA) dataset is a multimodal dataset designed to support robust validation and analysis of perception systems in safety‑critical driving scenarios. It has been curated with a strong focus on underrepresented, rare, and safety‑relevant situations, addressing known gaps in conventional real‑world driving data.
Dataset Overview
The dataset consists of three data batches, each collected with an independent test vehicle from one of the following project partners:
- AVL Deutschland GmbH (AVL)
- b-plus technologies GmbH (b-plus)
- FZI Forschungszentrum Informatik (FZI)
The vehicles are equipped with heterogeneous multimodal sensor setups, including:
- LiDAR
- Radar
- Camera
This setup enables comprehensive multimodal perception research and provides realistic sensor diversity reflective of real‑world deployments.
Focus on Safety‑Critical and Long‑Tail Scenarios
Unlike generic driving datasets, the jbDATA dataset explicitly targets challenging and safety‑critical scenarios, including but not limited to:
- Unknown or uncommon objects on the road
- Diverse and adverse weather conditions
- Situations that are typically underrepresented in large‑scale fleet data
This makes the dataset particularly well‑suited for validation, robustness testing, and failure analysis of perception models.
Hybrid Real & Synthetic Data Design
The dataset follows a hybrid data strategy, combining:
- Real‑world sensor recordings
- Carefully generated synthetic data
The synthetic components are not intended to replace real data, but to systematically complement it, enabling controlled variation, improved coverage of rare scenarios, and targeted stress testing. This hybrid approach is especially valuable for model validation and generalization assessment.
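To make the generalization assessment concrete, a model's score can be reported separately for real and synthetic samples and the two compared. The following is a minimal sketch, assuming each evaluated sample has been tagged with a hypothetical `origin` field ("real" or "synthetic") and a per-sample score; neither field is part of the dataset's documented format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EvalResult:
    # Hypothetical per-sample record; "origin" is an assumed tag, not a documented field.
    sample_id: str
    origin: str   # "real" or "synthetic"
    score: float  # any per-sample metric, e.g. detection recall on that frame


def score_by_origin(results):
    """Average the per-sample score separately for each data origin."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in results:
        totals[r.origin] += r.score
        counts[r.origin] += 1
    return {origin: totals[origin] / counts[origin] for origin in totals}


def real_synthetic_gap(results):
    """Mean score on real samples minus mean score on synthetic samples.

    Assumes both origins are present in the results.
    """
    means = score_by_origin(results)
    return means["real"] - means["synthetic"]


if __name__ == "__main__":
    results = [
        EvalResult("r1", "real", 0.8),
        EvalResult("r2", "real", 0.6),
        EvalResult("s1", "synthetic", 0.9),
        EvalResult("s2", "synthetic", 0.7),
    ]
    print(score_by_origin(results))
    print(real_synthetic_gap(results))
```

Averaging per origin, rather than over the pooled samples, keeps a large synthetic share from dominating the overall score.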
Ground Truth and Pseudolabels
To support different stages of development and evaluation, the dataset provides:
- High‑quality ground truth annotations for selected samples
- Pseudolabels for other parts of the dataset
This reflects realistic industrial workflows and allows users to study:
- Model performance under varying label fidelity
- The impact of pseudolabeling in validation and benchmarking pipelines
- Strategies for combining ground truth and weak supervision
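For samples that carry both ground truth and pseudolabels, label fidelity can be quantified by matching pseudolabel boxes to ground-truth boxes. A minimal sketch, assuming axis-aligned 2D boxes in (x1, y1, x2, y2) pixel format and greedy IoU matching at a threshold of 0.5; the box format, threshold, and function names are illustrative assumptions rather than the dataset's annotation schema, and 3D boxes would need an oriented-box IoU instead.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def pseudolabel_fidelity(pseudo, truth, threshold=0.5):
    """Greedy one-to-one matching of pseudolabel boxes to ground-truth boxes.

    Returns (precision, recall) of the pseudolabels with respect to the ground
    truth. An empty list counts as vacuously perfect on its own side (1.0).
    """
    # Score every pseudo/truth pair and visit them from best to worst overlap.
    pairs = sorted(
        ((iou(p, t), i, j) for i, p in enumerate(pseudo) for j, t in enumerate(truth)),
        reverse=True,
    )
    used_p, used_t = set(), set()
    for overlap, i, j in pairs:
        if overlap < threshold:
            break  # all remaining pairs overlap even less
        if i in used_p or j in used_t:
            continue  # each box may be matched at most once
        used_p.add(i)
        used_t.add(j)
    matches = len(used_p)
    precision = matches / len(pseudo) if pseudo else 1.0
    recall = matches / len(truth) if truth else 1.0
    return precision, recall


if __name__ == "__main__":
    truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
    pseudo = [(1, 1, 10, 10), (50, 50, 60, 60)]
    print(pseudolabel_fidelity(pseudo, truth))
```

Greedy matching by descending IoU is a common simplification; an optimal (Hungarian) assignment can produce different matches when several boxes overlap heavily.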
Intended Use
The jbDATA dataset is designed primarily for:
- Validation and robustness evaluation
- Safety‑oriented perception research
- Analysis of long‑tail and rare scenarios
- Benchmarking multimodal perception systems
- Research on hybrid real‑synthetic data strategies
It is not intended as a generic training dataset, but as a high‑value asset for testing, analysis and validation, particularly in safety‑critical contexts.
Key Information about the data batches
| | AVL | b-plus | FZI |
|---|---|---|---|
| Number of sequences | | | |
| Sequence length | 20 s | 10 s | 20 s |
| Data size | | | |
| Data types | Camera images, LiDAR point clouds | Camera images, LiDAR point clouds, radar point clouds | Camera images, LiDAR point clouds, radar point clouds |
| Annotations | 3D bounding boxes (LiDAR; detection 9 Hz, tracking 5 Hz); 2D bounding boxes (front camera, 6 Hz) | None | 3D bounding boxes (LiDAR, 2 Hz); 3D semantic segmentation (LiDAR, 2 Hz); 2D scooter bounding boxes (camera pre-labels, 10 Hz) |
| Weather | Sun, rain, snow | Sun, rain, snow, fog | Sun, rain |
| Road types | Urban, rural, highway | Urban, rural, highway | Urban, rural, highway |
| Triggers | Criticality detection; anomaly detection (unusual acceleration behaviour of the ego vehicle) | None | None |
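The sequence lengths and annotation rates in the table translate into a nominal number of annotated frames per sequence (length × rate), e.g. 20 s × 2 Hz = 40 annotated LiDAR frames per FZI sequence. A small sketch of that arithmetic, assuming annotations are spaced uniformly over the whole sequence; the dictionary only restates the table, and its key names are hypothetical.

```python
# Values restated from the table above; key names are illustrative only.
BATCHES = {
    "AVL": {
        "sequence_length_s": 20,
        "modalities": {"camera", "lidar"},
        "annotation_rates_hz": {
            "3d_boxes_detection_lidar": 9,
            "3d_boxes_tracking_lidar": 5,
            "2d_boxes_front_camera": 6,
        },
    },
    "b-plus": {
        "sequence_length_s": 10,
        "modalities": {"camera", "lidar", "radar"},
        "annotation_rates_hz": {},  # no annotations listed for this batch
    },
    "FZI": {
        "sequence_length_s": 20,
        "modalities": {"camera", "lidar", "radar"},
        "annotation_rates_hz": {
            "3d_boxes_lidar": 2,
            "3d_semantic_segmentation_lidar": 2,
            "2d_scooter_boxes_camera_prelabel": 10,
        },
    },
}


def annotated_frames_per_sequence(batch):
    """Nominal frame count per annotation type: sequence length x annotation rate.

    Assumes annotations are spaced uniformly over the whole sequence.
    """
    info = BATCHES[batch]
    length = info["sequence_length_s"]
    return {name: length * rate for name, rate in info["annotation_rates_hz"].items()}


def batches_with(modality):
    """Sorted names of the batches whose sensor setup includes the given modality."""
    return sorted(name for name, info in BATCHES.items() if modality in info["modalities"])


if __name__ == "__main__":
    print(annotated_frames_per_sequence("FZI"))
    print(batches_with("radar"))
```

The same lookup makes it easy to pick batches for a given modality before loading any data, for instance restricting a radar study to the b-plus and FZI batches.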
Preview of the data batches
AVL Dynamic Ground Truth (DGT) is a highly precise reference measurement system for automated and connected driving. It was developed to operate as an independent, objective environmental reference for sensor evaluation. The modular roof-mounted system is equipped with 3 lidar sensors, 6 cameras, and a dGPS (differential GPS) device for centimeter-precision dynamic positioning.
The AVL data batch was recorded in a variety of scenes, including complex traffic situations and targeted operational design domains (ODDs). It comprises a total of <frames/km/sequences> and provides 3D LiDAR detection and tracking annotations alongside 2D object detection annotations for the front-facing camera.
The b-plus research vehicle “NOVA” was developed as part of the jbDATA project and specifically designed to meet its requirements. It features a multimodal sensor suite comprising cameras, LiDAR, and radar, arranged to closely resemble the configuration found in production vehicles. The setup is further complemented by reference sensors.
The dataset was primarily collected in eastern Bavaria and includes urban, rural, and highway scenarios. Recordings were conducted across multiple seasons to capture a wide range of weather conditions, including heavy snow, rain, and fog, as well as varying lighting conditions.
CoCar NextGen is a multi-purpose research platform for automated and connected driving. It was set up in-house and operates independently of industry manufacturers and OEMs. The Audi A6 Avant plug-in hybrid is equipped with 12 state-of-the-art lidar sensors, 3 radars, 9 cameras, a Car2X onboard unit, and a high-precision IMU with dual-antenna GNSS. Its modular design facilitates use across a range of applications and research fields related to new mobility concepts.
The dataset was recorded in a variety of scenes, including urban, cross-country, and highway driving in various weather conditions. In total, <TBD> frames/km/sequences were recorded, of which <TBD> frames were annotated with 3D bounding boxes and 3D semantic segmentation.
Download
The data batches are published under the CC BY-SA 4.0 license.
Please refer to each partner's download page for more information and for the correct citation of the respective data batch.