claude
  sonyashijin/RTL_verilog_claude_verified_to_simulate • Viewer • Updated May 31, 2025 • 316 • 24 • 4
  trentmkelly/USTaxCodeBench • Viewer • Updated Jun 26, 2025 • 12k • 17
  snorkelai/agent-finance-reasoning • Viewer • Updated Aug 20, 2025 • 357 • 324 • 65
  tencent/ArtifactsBenchmark • Viewer • Updated Oct 15, 2025 • 1.83k • 181 • 13
audio-datasets-hindi
  Audio-transcript pairs for Hindi/Hinglish
  ujs/hinglish • Viewer • Updated Jun 29, 2023 • 29k • 82 • 3
  asahi417/seamless-align-enA-hiA • Viewer • Updated May 30, 2024 • 178k • 80 • 1
  TheAIchemist13/gramvaani_preprocessed_hi_train • Viewer • Updated Sep 27, 2023 • 37.1k • 171
papers
  CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases • Paper • 2408.03910 • Published Aug 7, 2024 • 18
datasets to filter
  CohereLabs/aya_dataset • Viewer • Updated Apr 15, 2025 • 206k • 16.1k • 346
  jamescalam/ai-arxiv2-semantic-chunks • Viewer • Updated Apr 28, 2024 • 210k • 79 • 2
uspto
  monology/pile-uncopyrighted • Viewer • Updated Aug 31, 2023 • 177M • 89.5k • 169
  nickypro/minipile-split • Viewer • Updated Jul 18, 2024 • 394k • 462
Small models
  Small models for experimentation
  google/gemma-2-2b • Text Generation • Updated Aug 7, 2024 • 406k • 641
  HuggingFaceTB/SmolLM-1.7B • Text Generation • 2B • Updated Oct 16, 2024 • 47.3k • 181
  h2oai/h2o-danube3-500m-base • Text Generation • 0.5B • Updated Jul 18, 2024 • 979 • 33
  Qwen/Qwen2-1.5B-Instruct • Text Generation • 2B • Updated Jun 6, 2024 • 3.48M • 162