Hugging Face
Datasets: 6,703
Sort: Trending
Active filters: 10M<n<100M
nvidia/OCR-Synthetic-Multilingual-v1 • Preview • Updated 10 days ago • 71 • 13
open-index/hacker-news • Updated 2 minutes ago • 28.1k • 306
kd13/bookcorpus-clean • Viewer • Updated 5 days ago • 33.6M • 85 • 6
HuggingFaceM4/FineVisionMax • Viewer • Updated Oct 21, 2025 • 24.2M • 21.5k • 27
Helsinki-NLP/opus-100 • Viewer • Updated Feb 28, 2024 • 55.1M • 17.7k • 231
wikimedia/wikipedia • Viewer • Updated Jan 9, 2024 • 61.6M • 108k • 1.2k
TAAC2025/TencentGR-1M • Viewer • Updated 20 days ago • 36.3M • 10.7k • 15
paperswithbacktest/Stocks-Daily-Price • Viewer • Updated 10 days ago • 25.1M • 6.69k • 52
BAAI/Infinity-Instruct • Viewer • Updated Dec 4, 2025 • 21.9M • 4.18k • 711
amphion/Emilia-Dataset • Viewer • Updated Feb 28, 2025 • 54.8M • 69.8k • 454
htriedman/grokipedia-v0.1-dump • Viewer • Updated Nov 14, 2025 • 39.5M • 50.4k • 11
Vidhaan/LegalCitationWorthiness • Viewer • Updated Oct 23, 2023 • 60.1M • 12 • 7
wendlerc/RenderedText • Viewer • Updated Oct 23, 2025 • 12M • 6.69k • 57
UCSC-VLAA/MedTrinity-25M • Viewer • Updated Oct 11, 2024 • 24.9M • 1.32k • 205
danish-foundation-models/danish-dynaword • Viewer • Updated 5 days ago • 11.3M • 5.29k • 19
HuggingFaceFW/finepdfs-edu • Viewer • Updated Nov 11, 2025 • 49.5M • 5.58k • 87
nvidia/Nemotron-Pretraining-Specialized-v1.1 • Viewer • Updated Mar 11 • 19.8M • 2.7k • 40
ThetaCursed/danbooru-2026-clean-metadata • Viewer • Updated 5 days ago • 10.1M • 175 • 2
bookcorpus/bookcorpus • Updated May 3, 2024 • 8.44k • 352
codeparrot/github-code-clean • Viewer • Updated Jul 5, 2022 • 11M • 16.8k • 137
wanng/wukong100m • Viewer • Updated Dec 11, 2022 • 15.2M • 174 • 17
reazon-research/reazonspeech • Updated Nov 10, 2024 • 2.05k • 112
roneneldan/TinyStoriesInstruct • Viewer • Updated May 18, 2023 • 22M • 461 • 42
Cainiao-AI/LaDe • Preview • Updated May 7, 2024 • 1.77k • 27
JetBrains-Research/commit-chronicle • Viewer • Updated Oct 5, 2023 • 10.9M • 1.71k • 12
joelniklaus/Multi_Legal_Pile_Commercial • Updated Oct 18, 2023 • 131 • 9
Timbrt/SciOL-CI • Preview • Updated Apr 17, 2024 • 705 • 4
bkai-foundation-models/BKAINewsCorpus • Viewer • Updated Mar 5, 2024 • 16.8M • 169 • 14
lmms-lab/GQA • Viewer • Updated Mar 8, 2024 • 24.2M • 34.6k • 30
HuggingFaceTB/cosmopedia • Viewer • Updated Aug 12, 2024 • 31.1M • 14.1k • 683