Okay, I may have been talking out of my ass about my scheduler using less VRAM than a full fine-tune (FFT). What I did find, though: training only ~30% of the model's weights per step consistently beat dense SFT on Hendrycks Math across 3 different seeds.
What makes it interesting isn't just the sparsity: no two consecutive windows share the same set of active layers, and adjacent layers are rarely trainable at the same time, so the model never has a stable input-to-output path it can build shortcuts through. I started developing this to reduce semantic redundancy across layers and stumbled onto something I didn't expect.
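For concreteness, the selection logic is roughly this (a simplified sketch, not the exact code; `pick_active_layers` and `apply_mask` are illustrative names, and the ~30% fraction and adjacency rule are as described above):

```python
import random
import torch.nn as nn

def pick_active_layers(num_layers: int, frac: float, prev_active: set[int]) -> set[int]:
    """Pick ~frac of the layers for this window, excluding the previous
    window's layers and avoiding adjacent pairs within the new pick."""
    k = max(1, int(num_layers * frac))
    candidates = [i for i in range(num_layers) if i not in prev_active]
    random.shuffle(candidates)
    active: set[int] = set()
    for i in candidates:
        # skip layers next to an already-selected one so neighbours
        # are rarely trainable in the same window
        if (i - 1) in active or (i + 1) in active:
            continue
        active.add(i)
        if len(active) == k:
            break
    return active

def apply_mask(layers: nn.ModuleList, active: set[int]) -> None:
    """Freeze everything except the active layers for this window."""
    for idx, layer in enumerate(layers):
        trainable = idx in active
        for p in layer.parameters():
            p.requires_grad_(trainable)
```

Call `pick_active_layers` / `apply_mask` once per window (e.g. from a per-step or per-N-steps hook) and the optimizer only ever sees gradients for the currently active slice of the stack.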
Setup: Qwen2.5-3B-Instruct, simplescaling/s1K (1k reasoning traces), 5 epochs, LR 1e-5, adamw_torch_fused optimizer, cosine LR schedule plus my lucky-pick layer scheduler, on an AMD MI300X (192 GB).
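In Hugging Face terms the run config is roughly the following (a sketch, not the exact script; `output_dir` is a placeholder, and the lucky-pick layer scheduler runs on top of this via a per-step hook rather than through any built-in argument):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen2.5-3b-s1k-luckypick",  # placeholder path
    num_train_epochs=5,
    learning_rate=1e-5,
    optim="adamw_torch_fused",
    lr_scheduler_type="cosine",
)
```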