Mechanist Interpretability for Alignment Algorithms
community
AI & ML interests
AI Safety, Mechanistic Interpretability
Recent Activity
ArthT updated a model 17 days ago: MInAlA/Qwen3-4B-ORPO-merged
ArthT published a model 17 days ago: MInAlA/Qwen3-4B-ORPO-merged
ArthT updated a model 17 days ago: MInAlA/Llama-3.2-3B-ORPO-merged
Team members: 5
Models: 18
MInAlA/Llama-3.2-3B-Instruct-PPO-merged • Text Generation • 3B • Updated 5 days ago • 252
MInAlA/SmolLM3-3B-PPO-merged • 3B • Updated 5 days ago • 19
MInAlA/Qwen3-4B-Instruct-2507-PPO-merged • Text Generation • 4B • Updated 7 days ago • 413
MInAlA/Llama-3.2-3B-SimPO-merged • Text Generation • 3B • Updated 10 days ago • 295
MInAlA/Qwen3-4B-Instruct-2507-SimPO-merged • Text Generation • 4B • Updated 10 days ago • 33
MInAlA/SmolLM3-3B-SimPO-merged • Text Generation • 3B • Updated 10 days ago • 21
MInAlA/Llama-3.2-3B-Instruct-GRPO-merged • Text Generation • 3B • Updated 12 days ago • 38
MInAlA/Qwen3-4B-Instruct-2507-GRPO-merged • Text Generation • 4B • Updated 13 days ago • 214
MInAlA/SmolLM3-3B-GRPO-merged • Text Generation • 3B • Updated 15 days ago • 23
MInAlA/Llama-3.2-3B-Instruct-KTO-merged • Text Generation • 3B • Updated 16 days ago • 264
Datasets: 1
MInAlA/medical-tampering-eval • Viewer • Updated 17 days ago • 535 • 39