GRAID NuImages Question-Answer Dataset

This dataset is presented in the paper "GRAID: Enhancing Spatial Reasoning of VLMs Through High-Fidelity Data Generation". Project page: https://ke7.github.io/graid/

Overview

This dataset was generated using GRAID (Generating Reasoning questions from Analysis of Images via Discriminative artificial intelligence), a framework for creating spatial reasoning datasets from object detection annotations.

GRAID transforms raw object detection data into structured question-answer pairs that test various aspects of object localization, visual reasoning, spatial reasoning, and object relationship comprehension.

Dataset Details

  • Total QA Pairs: 3,293,833
  • Source Dataset: NuImages Dataset
  • Generation Date: 2025-09-19
  • Image Format: Embedded in parquet files (no separate image files)
  • Question Types: 22 different reasoning patterns

Dataset Splits

  • train: 2,652,595 (80.53%)
  • val: 641,238 (19.47%)

Question Type Distribution

  • Is there at least one {object_1} to the left of any {object_2}?: 397,044 (12.05%)
  • Is there at least one {object_1} to the right of any {object_2}?: 397,044 (12.05%)
  • Is there at least one {object_1} that appears closer to the camera than any {object_2}?: 397,044 (12.05%)
  • Is there at least one {object_1} that appears farther from the camera than any {object_2}?: 397,044 (12.05%)
  • Are there more than {target} {object_1}(s) in this image? Respond Yes/No.: 382,378 (11.61%)
  • Are there less than {target} {object_1}(s) in this image? Respond Yes/No.: 382,378 (11.61%)
  • How many {object_1}(s) are there in this image?: 191,189 (5.80%)
  • Are there more {object_1}(s) than {object_2}(s) in this image?: 157,329 (4.78%)
  • What appears the most in this image: {object_1}s, {object_2}s, or {object_3}s?: 98,462 (2.99%)
  • Divide the image into a grid of {N} rows x {M} columns. Number the cells from left to right, then top to bottom, starting with 1. In what cell does the {object_1} appear?: 85,069 (2.58%)
  • Rank the {k} kinds of objects that appear the largest (by pixel area) in the image from largest to smallest. Provide your answer as a comma-separated list of object names only.: 75,780 (2.30%)
  • How many {object_1}(s) are in the image? Choose one: A) {range_a}, B) {range_b}, C) {range_c}, D) Unsure / Not Visible. Respond with the letter only.: 59,105 (1.79%)
  • Divide the image into thirds. In which third does the {object_1} primarily appear? Respond with the letter only: A) left third, B) middle third, C) right third.: 51,563 (1.57%)
  • What kind of object appears the most frequently in the image?: 47,817 (1.45%)
  • If you were to draw a tight box around each object in the image, which type of object would have the biggest box?: 43,291 (1.31%)
  • Rank the {k} kinds of objects that appear the closest to the camera in the image from closest to farthest. Provide your answer as a comma-separated list of object names only.: 35,332 (1.07%)
  • What kind of object appears the least frequently in the image?: 35,313 (1.07%)
  • What is the rightmost object in the image?: 16,381 (0.50%)
  • What is the leftmost object in the image?: 16,278 (0.49%)
  • Is the width of the {object_1} appear to be larger than the height?: 15,958 (0.48%)
  • Does the rightmost object in the image appear to be wider than it is tall?: 6,057 (0.18%)
  • Does the leftmost object in the image appear to be wider than it is tall?: 5,977 (0.18%)
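
The grid and thirds templates in this list reduce to simple index arithmetic. A minimal sketch, assuming an object's position is taken to be the center of its bounding box (the card does not state which rule GRAID actually applies, and the function names here are hypothetical):

```python
def grid_cell(cx: float, cy: float, img_w: float, img_h: float,
              rows: int, cols: int) -> int:
    """Return the 1-based cell number containing the point (cx, cy).

    Cells are numbered left to right, then top to bottom, as in the
    question template. Using the box center is an assumption.
    """
    if not (0 <= cx <= img_w and 0 <= cy <= img_h):
        raise ValueError("point lies outside the image")
    # Clamp so a point exactly on the right/bottom edge stays in the last cell.
    col = min(int(cx * cols / img_w), cols - 1)
    row = min(int(cy * rows / img_h), rows - 1)
    return row * cols + col + 1


def third_letter(cx: float, img_w: float) -> str:
    """Map a horizontal position to A (left), B (middle), or C (right)."""
    return "ABC"[grid_cell(cx, 0.0, img_w, 1.0, rows=1, cols=3) - 1]


# Hypothetical 1600x900 image with a box centered at (800, 450).
print(grid_cell(800, 450, 1600, 900, rows=3, cols=3))  # 5
print(third_letter(1500, 1600))                        # C
```

A stricter rule, such as requiring the whole box to fit inside one cell, would leave some objects without an answer.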

Performance Analysis

Question Processing Efficiency

| Question Type | is_applicable Avg (ms) | apply Avg (ms) | Predicate -> QA Hit Rate | Empty cases |
|---|---|---|---|---|
| Divide the image into thirds. In which third does the {object_1} primarily appear? Respond with the letter only: A) left third, B) middle third, C) right third. | 0.02 | 0.36 | 75.0% | 13092 |
| Is the width of the {object_1} appear to be larger than the height? | 0.01 | 0.68 | 28.5% | 37510 |
| Divide the image into a grid of {N} rows x {M} columns. Number the cells from left to right, then top to bottom, starting with 1. In what cell does the {object_1} appear? | 0.01 | 1.57 | 33.4% | 139683 |
| If you were to draw a tight box around each object in the image, which type of object would have the biggest box? | 0.01 | 3.62 | 74.2% | 15026 |
| Rank the {k} kinds of objects that appear the largest (by pixel area) in the image from largest to smallest. Provide your answer as a comma-separated list of object names only. | 0.01 | 3.37 | 80.2% | 18696 |
| What kind of object appears the most frequently in the image? | 0.01 | 0.01 | 82.0% | 10500 |
| What kind of object appears the least frequently in the image? | 0.01 | 0.01 | 60.6% | 23004 |
| Is there at least one {object_1} to the left of any {object_2}? | 1.39 | 10.00 | 100.0% | 0 |
| Is there at least one {object_1} to the right of any {object_2}? | 1.09 | 7.12 | 100.0% | 0 |
| What is the leftmost object in the image? | 0.02 | 0.35 | 31.0% | 36163 |
| What is the rightmost object in the image? | 0.01 | 0.32 | 31.2% | 36060 |
| How many {object_1}(s) are there in this image? | 0.01 | 0.01 | 100.0% | 0 |
| Are there more {object_1}(s) than {object_2}(s) in this image? | 0.02 | 0.01 | 88.9% | 6494 |
| What appears the most in this image: {object_1}s, {object_2}s, or {object_3}s? | 0.01 | 0.01 | 55.8% | 25766 |
| Does the leftmost object in the image appear to be wider than it is tall? | 0.01 | 0.59 | 11.4% | 46464 |
| Does the rightmost object in the image appear to be wider than it is tall? | 0.01 | 0.57 | 11.6% | 46384 |
| Are there more than {target} {object_1}(s) in this image? Respond Yes/No. | 0.01 | 0.02 | 100.0% | 0 |
| Are there less than {target} {object_1}(s) in this image? Respond Yes/No. | 0.01 | 0.01 | 100.0% | 0 |
| How many {object_1}(s) are in the image? Choose one: A) {range_a}, B) {range_b}, C) {range_c}, D) Unsure / Not Visible. Respond with the letter only. | 0.01 | 0.11 | 57.2% | 32316 |
| Is there at least one {object_1} that appears closer to the camera than any {object_2}? | 1.01 | 3845.03 | 100.0% | 0 |
| Is there at least one {object_1} that appears farther from the camera than any {object_2}? | 1.44 | 938.91 | 100.0% | 0 |
| Rank the {k} kinds of objects that appear the closest to the camera in the image from closest to farthest. Provide your answer as a comma-separated list of object names only. | 0.03 | 198.52 | 37.4% | 59144 |

Notes:
  • is_applicable checks if a question type can be applied to an image
  • apply generates the actual question-answer pairs
  • Predicate -> QA Hit Rate = Percentage of applicable cases that generated at least one QA pair
  • Empty cases = Number of times is_applicable=True but apply returned no QA pairs
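
The two definitions above tie the hit rate to the empty-case count: the hit rate is the share of applicable cases that were not empty. A minimal sketch of that arithmetic follows; the function names are hypothetical, and because the reported rates are rounded to one decimal, the inferred case count is only an estimate:

```python
def hit_rate(applicable_cases: int, empty_cases: int) -> float:
    """Percentage of applicable cases that produced at least one QA pair."""
    if applicable_cases <= 0:
        raise ValueError("applicable_cases must be positive")
    return 100.0 * (applicable_cases - empty_cases) / applicable_cases


def implied_applicable_cases(hit_rate_pct: float, empty_cases: int) -> int:
    """Estimate how many applicable cases a reported row implies."""
    miss_fraction = 1.0 - hit_rate_pct / 100.0
    if miss_fraction <= 0:
        raise ValueError("cannot infer a count from a 100% hit rate")
    return round(empty_cases / miss_fraction)


# Thirds question: 75.0% hit rate with 13092 empty cases.
n = implied_applicable_cases(75.0, 13092)
print(n)                   # 52368
print(hit_rate(n, 13092))  # 75.0
```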

Usage

from datasets import load_dataset

# Load the complete dataset
dataset = load_dataset("kd7/graid-nuimages-wd")

# Access individual splits
train_data = dataset["train"]
val_data = dataset["val"]

# Example of accessing a sample
sample = dataset["train"][0]  # or "val"
print(f"Question: {sample['question']}")
print(f"Answer: {sample['answer']}")  
print(f"Question Type: {sample['question_type']}")

# The image is embedded as a PIL Image object
image = sample["image"]
image.show()  # Display the image
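
With more than three million rows, it is often useful to work on one question type at a time. The sketch below filters and counts samples by the question_type field; the helper names and the toy records are made up for illustration, and the helpers accept any iterable of sample dictionaries:

```python
from collections import Counter
from itertools import islice
from typing import Any, Dict, Iterable, Iterator


def filter_by_type(samples: Iterable[Dict[str, Any]],
                   question_type: str) -> Iterator[Dict[str, Any]]:
    """Lazily yield only the samples with the given question_type."""
    return (s for s in samples if s["question_type"] == question_type)


def type_counts(samples: Iterable[Dict[str, Any]], limit: int) -> Counter:
    """Count question types over the first `limit` samples."""
    return Counter(s["question_type"] for s in islice(samples, limit))


# Made-up records that mimic the fields used above (not real dataset rows).
toy = [
    {"question": "How many car(s) are there in this image?",
     "answer": "3", "question_type": "HowMany"},
    {"question": "Is there at least one car to the left of any truck?",
     "answer": "Yes", "question_type": "LeftOf"},
    {"question": "How many truck(s) are there in this image?",
     "answer": "1", "question_type": "HowMany"},
]

print([s["answer"] for s in filter_by_type(toy, "HowMany")])  # ['3', '1']
print(type_counts(toy, limit=2))

# With the real data, `samples` could instead be a streamed split, e.g.
# load_dataset("kd7/graid-nuimages-wd", split="train", streaming=True),
# which avoids downloading every parquet file up front.
```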

Dataset Schema

  • image: PIL Image object (embedded, no separate files)
  • annotations: COCO-style bounding box annotations
  • question: Generated question text
  • answer: Corresponding answer text
  • reasoning: Additional reasoning information (if applicable)
  • question_type: Type of question (e.g., "HowMany", "LeftOf", "Quadrants")
  • source_id: Original image identifier from NuImages Dataset
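
The annotations field is described only as COCO-style. In the standard COCO convention a box is stored as [x, y, width, height], with (x, y) the top-left corner. A minimal sketch of reading such boxes follows, assuming each annotation is a dictionary with a "bbox" entry and a "category" label; both key names are assumptions, so inspect a real sample before relying on them:

```python
from typing import Any, Dict, List, Sequence, Tuple


def bbox_to_corners(bbox: Sequence[float]) -> Tuple[float, float, float, float]:
    """Convert COCO [x, y, width, height] to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def is_wider_than_tall(bbox: Sequence[float]) -> bool:
    """True when the box width exceeds its height."""
    return bbox[2] > bbox[3]


def largest_box_label(annotations: List[Dict[str, Any]],
                      label_key: str = "category") -> Any:
    """Return the label of the annotation with the largest box area.

    `label_key` is an assumed field name, not confirmed by the card.
    """
    if not annotations:
        raise ValueError("no annotations given")
    biggest = max(annotations, key=lambda a: a["bbox"][2] * a["bbox"][3])
    return biggest[label_key]


# Made-up annotations for illustration only.
toy_annotations = [
    {"bbox": [100, 400, 200, 120], "category": "car"},
    {"bbox": [600, 300, 400, 300], "category": "truck"},
    {"bbox": [50, 500, 40, 110], "category": "pedestrian"},
]

print(bbox_to_corners(toy_annotations[0]["bbox"]))     # (100, 400, 300, 520)
print(is_wider_than_tall(toy_annotations[2]["bbox"]))  # False
print(largest_box_label(toy_annotations))              # truck
```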

License

This generated dataset is licensed under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), which permits free use for non-commercial purposes including academic research and education.

Commercial Use Policy: Commercial entities (including startups and companies) that wish to use this dataset for commercial purposes must obtain a paid license from MESH. The CC BY-NC license prohibits commercial use without explicit permission.

To request a commercial license, please contact Karim Elmaaroufi.

Original Source Compliance: The original source datasets and their licenses still apply to the underlying images and annotations. You must comply with both the CC BY-NC terms and the source dataset terms. This dataset is derived from the nuImages dataset; refer to the nuImages license terms for usage restrictions.

Citation

If you use this dataset in your research, please cite both the original dataset and the GRAID framework:

@article{graid_paper,
    title={{GRAID}: Enhancing Spatial Reasoning of VLMs Through High-Fidelity Data Generation},
    author={},
    year={2025},
    journal={arXiv preprint arXiv:2510.22118},
    url={https://huggingface.co/papers/2510.22118}
}

@dataset{graid_nuimage,
    title={GRAID NuImages Question-Answer Dataset},
    author={GRAID Framework},
    year={2025},
    note={Generated using GRAID: Generating Reasoning questions from Analysis of Images via Discriminative artificial intelligence}
}

@InProceedings{Caesar_2020_CVPR,
    author = {Caesar, Holger and Bankiti, Varun and Lang, Alex H. and Vora, Sourabh and Liong, Venice Erin and Xu, Qiang and Krishnan, Anush and Pan, Yu and Baldan, Giancarlo and Beijbom, Oscar},
    title = {nuScenes: A Multimodal Dataset for Autonomous Driving},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {June},
    year = {2020}
}

Contact

For questions about this dataset or the GRAID framework, please open an issue in the repository.
