
NFTCID

AI & ML interests

None yet

Recent Activity

reacted to Crownelius's post with 👍 4 days ago
[DAY ONE] PROJECT CROWFEATHER 4/30/2026 ... the day I forgot to attach wandb.ai

Just dropped Crowfeather-50m, the first checkpoint in a series, and yeah, no graphs. https://huggingface.co/Crowfeather/Crowfeather-50m

54.5M params. Pretrain only. 17,500 steps banked on FineWeb-edu before Thunder credits ran dry. About 2.3B tokens, no SFT yet.

Architecture: Gemma-4 alternating sliding/global attention (1024 window, last layer always global), plus DeepSeek-V4 Muon optimizer, WSD scheduler, Gemma-2 logit soft-cap, and PaLM z-loss. Recipe in the model card.

What it can do: writes grammatical English. Knows that France has Rhine-adjacent monasteries (it picked Rouen instead of Paris, but the vocabulary is in there). Tells stories about Mr. Fabien.

What it can't do yet: facts, code, math. Base LM, no SFT, no instruction tuning.

The series:
- Every additional training run becomes another model card here.
- Every model card gets a matching post on this profile.
- Continuation goes to Colab next, picking up from step 17,500 out of 100k.
- Limited to one post a day on Hugging Face, so updates will trickle out at that pace.

Follow [@Crownelius](https://huggingface.co/Crownelius) and [@Crowfeather](https://huggingface.co/Crowfeather) if you want to watch this thing learn in public. Next drop will either come with the finished pre-train or whatever step I land on before the bank takes my credit card away. Graphs will be available on my NEXT model lol

-Shane
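A minimal sketch of two of the recipe pieces named in the post, the Gemma-2-style logit soft-cap and the PaLM-style z-loss, under stated assumptions: the cap value and z-loss weight below are illustrative placeholders, not the values actually used for Crowfeather-50m (those are in the model card).

```python
import torch
import torch.nn.functional as F

def soft_cap(logits: torch.Tensor, cap: float = 30.0) -> torch.Tensor:
    # Gemma-2-style logit soft-capping: tanh squashes logits into (-cap, cap)
    # so no single token can blow up the softmax. cap=30.0 is a placeholder.
    return cap * torch.tanh(logits / cap)

def lm_loss_with_z_loss(logits: torch.Tensor, targets: torch.Tensor,
                        z_weight: float = 1e-4) -> torch.Tensor:
    # PaLM-style z-loss: penalize log(Z)^2, where Z is the softmax normalizer,
    # to keep logit magnitudes from drifting. z_weight=1e-4 is a placeholder.
    logits = soft_cap(logits)
    ce = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
    log_z = torch.logsumexp(logits, dim=-1)  # log normalizer per position
    return ce + z_weight * (log_z ** 2).mean()
```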
reacted to ManniX-ITA's post with 🚀 4 days ago
🚀 Two releases this week pushing merge methodology forward.

▶ Qwen3.6-27B-Omnimerge-v4-MLP
https://huggingface.co/ManniX-ITA/Qwen3.6-27B-Omnimerge-v4

Same-base DARE-TIES merge of Qwen3.6-27B + 3 fine-tunes (rico03 Claude distill, Esper3.1, kai-os Opus reasoning anchor) via my Omnimerge_v2 method (OBIM-lite + DAREx-q + EMR election).

Hit a Qwen3.6-specific fragility: hyperparams that work flawlessly on 3.5 produced 80% unclosed-<think> on 3.6, collapsing pass@1 to ~20%. Per-tensor delta forensics localized the failure to mlp.{gate,up,down}_proj in layers 27–52. Fix: MLP-passthrough surgery, copying MLPs verbatim from base while keeping merged attn + linear_attn. Leak → 0%.

Q6_K results (vs Qwen3.6 base / vs Omnimerge-v2 on Qwen3.5):
• HumanEval: 84.76% (= base, +5.49 pp vs v2)
• MBPP corrected: 73.40% (+15.80 pp vs base, ≈ v2)
• GPQA Diamond: ~84.75% partial 192/198 (+15.5 pp vs v2)

▶ Qwen3.5-4B Importance-Signal Study (M1..M5)

Controlled 5-way comparison: same Qwen3.5-4B base, same 2 fine-tunes (Jackrong Claude-4.5 distill + Crow Opus-4.6 distill); only the importance signal driving DARE-TIES sparsification varies.

Q6_K HE / MBPP pass@1:
• M1 Vanilla DARE-TIES → 51.22 / 47.00
• M2 OMv2 (no signal) → 52.44 / 49.40
• M3 OMv2 + Fisher → 57.93 🥇 / 48.80
• M4 mergekit ex-LRP (PR #682) → 51.22 / 49.40
• M5 OMv2 + LRP → 53.05 / 51.40 🥇

Findings: Fisher wins HE (+4.88 pp over vanilla), LRP wins MBPP (+2.60 pp). Both signals + the Omnimerge_v2 recipe beat vanilla. To make multimodal-LM ex-LRP work end-to-end against Qwen3_5ForConditionalGeneration, I filed 5 patches against arcee-ai/mergekit PR #682 + 1 against rachtibat/lxt. All five Mx checkpoints + Fisher/LRP signal safetensors + reproducer scripts published.
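A rough sketch of the MLP-passthrough idea described in the post, assuming standard transformers checkpoints; this is not the actual Omnimerge pipeline, and the checkpoint paths below are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder paths: substitute the real base and merged checkpoints.
BASE_ID = "path/to/qwen3.6-27b-base"
MERGED_ID = "path/to/omnimerge-v4-output"
AFFECTED_LAYERS = range(27, 53)  # layers 27-52, per the per-tensor forensics

base = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype=torch.bfloat16)
merged = AutoModelForCausalLM.from_pretrained(MERGED_ID, torch_dtype=torch.bfloat16)

base_sd = base.state_dict()
merged_sd = merged.state_dict()

# Copy mlp.{gate,up,down}_proj verbatim from the base for the affected layers,
# leaving the merged attention (and everything else) untouched.
for name, tensor in base_sd.items():
    if ".mlp." in name and any(f".layers.{i}." in name for i in AFFECTED_LAYERS):
        merged_sd[name] = tensor.clone()

merged.load_state_dict(merged_sd)
merged.save_pretrained("omnimerge-v4-mlp-passthrough")
```

The surgery only touches the tensors the delta forensics flagged, so any gains attributable to the merged attention weights are preserved while the fragile MLP projections revert to the base model.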

Organizations

None yet