Dataset

A multimodal hybrid dataset for validating perception systems in rare, safety‑critical, and underrepresented driving scenarios

The justbetter DATA (jbDATA) dataset is a multimodal dataset designed to support robust validation and analysis of perception systems in safety‑critical driving scenarios. It has been curated with a strong focus on underrepresented, rare, and safety‑relevant situations, addressing known gaps in conventional real‑world driving data.

Dataset Overview

The dataset consists of three data batches, each collected with an independent test vehicle operated by one of the following project partners:

  • AVL Deutschland GmbH (AVL)
  • b-plus technologies GmbH (b-plus)
  • FZI Forschungszentrum Informatik (FZI)

The test vehicles are equipped with heterogeneous multimodal sensor setups, including:

  • LiDAR
  • Radar
  • Camera

This setup enables comprehensive multimodal perception research and provides realistic sensor diversity reflective of real‑world deployments.

Focus on Safety‑Critical and Long‑Tail Scenarios

Unlike generic driving datasets, the jbDATA dataset explicitly targets challenging and safety‑critical scenarios, including but not limited to:

  • Unknown or uncommon objects on the road
  • Diverse and adverse weather conditions
  • Situations that are typically underrepresented in large‑scale fleet data

This makes the dataset particularly well‑suited for validation, robustness testing, and failure analysis of perception models.

Hybrid Real & Synthetic Data Design

The dataset follows a hybrid data strategy, combining:

  • Real‑world sensor recordings
  • Carefully generated synthetic data

The synthetic components are not intended to replace real data, but to systematically complement it, enabling controlled variation, improved coverage of rare scenarios, and targeted stress testing. This hybrid approach is especially valuable for model validation and generalization assessment.

Ground Truth and Pseudolabels

To support different stages of development and evaluation, the dataset provides:

  • High‑quality ground truth annotations for selected samples
  • Pseudolabels for other parts of the dataset

This reflects realistic industrial workflows and allows users to study:

  • Model performance under varying label fidelity
  • The impact of pseudolabeling in validation and benchmarking pipelines
  • Strategies for combining ground truth and weak supervision
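
One way to study the effect of label fidelity is to report a metric separately for samples with ground truth and for samples with pseudolabels. The following is a minimal sketch only: the record layout, the `label_source` field, and the score values are hypothetical placeholders and do not reflect the published data format or any measured results.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-sample evaluation records. The field names and the
# scores are illustrative assumptions, not the dataset's actual schema.
results = [
    {"sample_id": "a", "label_source": "ground_truth", "iou": 0.82},
    {"sample_id": "b", "label_source": "pseudolabel", "iou": 0.64},
    {"sample_id": "c", "label_source": "ground_truth", "iou": 0.78},
    {"sample_id": "d", "label_source": "pseudolabel", "iou": 0.70},
]

def mean_score_by_label_source(records, key="iou"):
    """Average a per-sample score separately for each label source."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["label_source"]].append(record[key])
    return {source: mean(values) for source, values in grouped.items()}

summary = mean_score_by_label_source(results)
for source, score in sorted(summary.items()):
    print(f"{source}: {score:.2f}")
```

Reporting the two groups side by side makes it visible when a model's apparent performance depends on which kind of label it is scored against.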

Intended Use

The jbDATA dataset is designed primarily for:

  • Validation and robustness evaluation
  • Safety‑oriented perception research
  • Analysis of long‑tail and rare scenarios
  • Benchmarking multimodal perception systems
  • Research on hybrid real‑synthetic data strategies

It is not intended as a generic training dataset, but as a high‑value asset for testing, analysis and validation, particularly in safety‑critical contexts.

Key Information about the data batches

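AVL

  • Number of sequences: (not specified)
  • Length of sequences: 20 s
  • Data size: (not specified)
  • Data types: camera images, LiDAR point clouds
  • Annotations: 3D bounding boxes (LiDAR; detection at 9 Hz, tracking at 5 Hz); 2D bounding boxes (front camera, 6 Hz)
  • Weather: sun, rain, snow
  • Road types: urban, rural, highway
  • Triggers: criticality detection; anomaly detection (unusual acceleration behaviour of the ego vehicle)

b-plus

  • Number of sequences: (not specified)
  • Length of sequences: 10 s
  • Data size: (not specified)
  • Data types: camera images, LiDAR point clouds, radar point clouds
  • Annotations: none
  • Weather: sun, rain, snow, fog
  • Road types: urban, rural, highway
  • Triggers: none

FZI

  • Number of sequences: (not specified)
  • Length of sequences: 20 s
  • Data size: (not specified)
  • Data types: camera images, LiDAR point clouds, radar point clouds
  • Annotations: 3D bounding boxes (LiDAR, 2 Hz); 3D semantic segmentation (LiDAR, 2 Hz); 2D scooter bounding boxes (camera prelabels, 10 Hz)
  • Weather: sun, rain
  • Road types: urban, rural, highway
  • Triggers: none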
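
The key figures listed above can be captured in a small data structure to select batches for a given validation question. The sketch below is illustrative only: the `Batch` class, the field names, and the helper functions are hypothetical and not part of any official jbDATA tooling. The frame estimate further assumes that annotations cover an entire sequence at a constant rate, which the source does not state.

```python
from dataclasses import dataclass, field

@dataclass
class Batch:
    partner: str
    sequence_length_s: int
    data_types: set
    weather: set
    # Annotation name -> annotation rate in Hz (empty if the batch has none).
    annotation_rates_hz: dict = field(default_factory=dict)

# Values transcribed from the key information above; names are hypothetical.
BATCHES = [
    Batch("AVL", 20, {"camera", "lidar"}, {"sun", "rain", "snow"},
          {"3d_boxes_detection": 9, "3d_boxes_tracking": 5,
           "2d_boxes_front_camera": 6}),
    Batch("b-plus", 10, {"camera", "lidar", "radar"},
          {"sun", "rain", "snow", "fog"}),
    Batch("FZI", 20, {"camera", "lidar", "radar"}, {"sun", "rain"},
          {"3d_boxes": 2, "3d_semantic_segmentation": 2,
           "2d_scooter_boxes": 10}),
]

def select_batches(data_type=None, weather=None):
    """Return partner names whose batch offers the given modality and weather."""
    return [
        b.partner for b in BATCHES
        if (data_type is None or data_type in b.data_types)
        and (weather is None or weather in b.weather)
    ]

def annotated_frames_per_sequence(batch):
    """Estimate annotated frames per sequence as rate (Hz) x length (s).

    Assumes annotations span the whole sequence at a constant rate.
    """
    return {name: rate * batch.sequence_length_s
            for name, rate in batch.annotation_rates_hz.items()}

print(select_batches(data_type="radar", weather="snow"))
print(annotated_frames_per_sequence(BATCHES[2]))
```

With these values, selecting for radar data in snow returns only the b-plus batch, and a 20-second FZI sequence annotated at 2 Hz corresponds to 40 annotated LiDAR frames under the constant-rate assumption.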

Preview of the data batches

AVL Dynamic Ground Truth (DGT) is a highly precise reference measurement system for automated and connected driving. It was developed to operate as an independent, objective environmental reference for sensor evaluation. The modular roof-mounted system is equipped with 3 LiDAR sensors, 6 cameras, and a differential GPS (dGPS) device for centimeter-precision dynamic positioning.

The dataset was recorded across a variety of scenes, including complex traffic situations and targeted operational design domains. It comprises a total of <frames/km/sequences> and provides 3D LiDAR detection and tracking annotations alongside 2D object detections from the front-facing camera.

 

The b-plus research vehicle “NOVA” was developed as part of the jbDATA project and specifically designed to meet its requirements. It features a multimodal sensor suite comprising cameras, LiDAR, and radar. The sensors are arranged to closely resemble the configuration found in production vehicles. In addition, the setup is complemented by reference sensors.

The dataset was primarily collected in eastern Bavaria and includes urban, rural, and highway scenarios. Recordings were conducted across multiple seasons to capture a wide range of weather conditions, including heavy snow, rain, and fog, as well as varying lighting conditions.

 

CoCar NextGen is a multi-purpose research platform for automated and connected driving. It was set up in-house and operates independently of industry manufacturers and OEMs. The Audi A6 Avant plug-in hybrid is equipped with 12 state-of-the-art LiDAR sensors, 3 radars, 9 cameras, a Car2X onboard unit, and a high-precision IMU with dual-antenna GNSS. The modular design facilitates its use in various applications and research fields related to new mobility concepts.

The dataset was recorded across a variety of scenes, including urban, cross-country, and highway driving in various weather conditions. In total, <TBD> frames/km/sequences were recorded, of which <TBD> frames were annotated with 3D bounding boxes and 3D semantic segmentation.

 

Download

Please note that the data batches are published under the CC BY-SA 4.0 license.

Please refer to the respective partner’s download page for more information and the correct citation of the data batches.

Data batch #01: AVL

Data batch #02: b-plus

Data batch #03: FZI
