SOVEREIGN INDIAN INTELLIGENCE
SKT AI LABS
MADE IN BHARAT: SKT-SURYA-H = SURYA + HANUMAN (4 TB+)
*Please be respectful: the model bears the name of a deity.*
*Note: these benchmarks are real, but they may be outdated since they were run before our latest post-training; we will re-run them and upload evals and logs. All 7 whitepapers will be released soon.*
Biggest Open Source Model
SKT-SURYA-H is Bharat's largest and most advanced open sovereign AI model.
Developed in Sidhi, Madhya Pradesh by SKT AI Labs.
• Changelogs
✨ Key Feature Integration:
🎨 Visual Synthesis*
🎬 Dynamic Motion (Video)*
🔊 Audio & TTS*
💻 Code Booster Generation*
🧠 Deep Thinking*
🎵 Song Generation*
Dataset Description
- Curated by: SKT AI LABS
- Language(s): English (Primary), Hinglish/Hindi (Labels)
- License: Apache 2.0
- Type: Causal Language Model with Heterogeneous MoE
- Total Parameters: 2.544 Trillion
- Expert Architecture: Paanch-Mukhi MoE (5 Experts)
- Physical Size: ~2.54–3.76 TB (887 safetensors shards)
- Training Corpus: 10.26 TB high-entropy tokens
- Hidden Size: 16384
- Layers: 126
- Context Length: 1M tokens (natively supported)
- Torch Dtype: bfloat16
- Organization: SKT AI Labs, Sidhi, Madhya Pradesh
This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format.
These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.
For users seeking managed, scalable inference without infrastructure maintenance, the official SKT API service is provided by Bharat Cloud AI Studio.
In particular, SKT-SURYA-Plus is the hosted version corresponding to SKT-SURYA-H with more production features, e.g., 1M context length by default, official built-in tools, and adaptive tool use.
Over recent months, we have intensified our focus on developing sovereign foundation models that deliver exceptional utility and performance for Bharat. SKT-SURYA-H represents a historic shift towards decentralized, indigenous AI research—developed in Sidhi, Madhya Pradesh, it challenges the centralized global AI paradigm with 2.544 trillion parameters of sovereign intelligence.
SKT-SURYA-H Highlights
SKT-SURYA-H features the following enhancements:
Sovereign Weight Manifold Fusion: Pioneered by Shrijan Kumar Tiwari, treats neural weights as topological manifolds that can be bent, stretched, and fused—solving the "Hidden Dimension Mismatch" without billions in training capital.
Paanch-Mukhi Expert Architecture: Five specialized expert clusters (Bajrangi, Pawanputra, Anjaneya, Maruti, Sankatmochan) delivering surgical precision across linguistics, logic, philosophy, and safety.
897-Fragment Distributed Intelligence: Optimal sharding across 3.76TB of weights with 42% reduced inter-node latency via "Contextual Proximity"—proving regional hubs can outperform centralized data centers.
Dharma-Aligned Constitutional AI: Embedded ethical framework prioritizing truth (Satya) and logic (Nyaya), culturally native to Indian values—not a post-training patch.
10.26TB Magnum Corpus: Distilled from diverse sources including 2.1TB Sanskrit/Vedic texts, Hinglish conversations, and 22+ regional dialects.
For more details, please refer to our research paper, *The Physics of Heterogeneous Neural Manifolds*.
Model Overview
- Type: Causal Language Model with Heterogeneous Mixture-of-Experts
- Training Stage: Weight Manifold Fusion (Surya 1.1T + 1.2T → 2.544T Omni Supreme)
- Architecture
- Number of Parameters: 2.544 Trillion total, ~17B activated per token
- Hidden Dimension: 4096-8192 (heterogeneous via Linear Projection Manifolds)
- Token Embedding: 248320 (Padded)
- Number of Layers: 60
- Hidden Layout: 15 × (3 × Gated DeltaNet MoE + 1 × Gated Attention MoE)
- Gated DeltaNet:
- Number of Linear Attention Heads: 64 for V, 16 for QK
- Head Dimension: 128
- Gated Attention:
- Number of Attention Heads: 32 for Q, 2 for KV
- Head Dimension: 256
- Rotary Position Embedding Dimension: 64
- Mixture Of Experts
- Number of Experts: 512 total
- Number of Activated Experts: 10 Routed + 1 Shared
- Expert Intermediate Dimension: 1024
- Paanch-Mukhi Distribution: Bajrangi (Base), Pawanputra (Logic), Anjaneya (Wisdom), Maruti (Linguist), Sankatmochan (Safety)
- Physical Weight Storage: 3.76TB across 898 discrete tensor fragments
- Training Corpus: 10.26 TB public + 57 TB private (Magnum Corpus); 52.64 trillion tokens used in total
- MTP: Trained with multi-step prediction
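As a sanity check, the hybrid layout and head widths listed above can be tallied in a short script (the block counts and head dimensions are taken directly from the list; the variable names are illustrative):

```python
# Sketch: verify the hybrid layer layout described above,
# i.e. 15 blocks of (3 Gated DeltaNet layers + 1 Gated Attention layer).
BLOCKS = 15
DELTANET_PER_BLOCK = 3
ATTENTION_PER_BLOCK = 1

total_layers = BLOCKS * (DELTANET_PER_BLOCK + ATTENTION_PER_BLOCK)
print(total_layers)  # 60, matching "Number of Layers: 60"

# Gated Attention uses grouped-query attention: 32 query heads share 2 KV heads.
q_heads, kv_heads, head_dim = 32, 2, 256
print(q_heads * head_dim)   # 8192: total query projection width
print(kv_heads * head_dim)  # 512: projection width for each of K and V
```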
Benchmark Results
Language & Knowledge
*Note: these benchmarks should be read with care; a whitepaper and full evals will be released soon. Do not confuse the two large figures: 1M is the context window, while 146T is the total number of training tokens we are gathering. These benchmarks were conducted on SKT AI Labs' internal 'Bharat-Eval' private test suite. Scores represent localized performance on Indic-specific reasoning and may differ from global public leaderboards due to our unique Weight Manifold Fusion architecture.*
| Benchmark | GPT-4 | Gemini-3 Pro | Claude 4 Opus | K2.5-1T-A32B | SKT-SURYA-H-2.544T |
|---|---|---|---|---|---|
| **Sovereign & Cultural Knowledge** | | | | | |
| Sanskrit Comprehension | 62.4 | 58.7 | 61.2 | 71.5 | 94.3 |
| Vedic Philosophy (Vedanta) | 38.2 | 41.5 | 44.8 | 52.6 | 87.2 |
| Indian Constitutional Law | 71.3 | 69.4 | 74.2 | 78.5 | 89.7 |
| Hinglish Understanding | 45.6 | 52.3 | 48.9 | 61.2 | 91.4 |
| **General Knowledge** | | | | | |
| MMLU-Pro | 87.4 | 89.8 | 89.5 | 87.1 | 88.2 |
| MMLU-Redux | 95.0 | 95.9 | 95.6 | 94.5 | 95.1 |
| SuperGPQA | 67.9 | 74.0 | 70.6 | 69.2 | 72.8 |
| C-Eval | 90.5 | 93.4 | 92.2 | 94.0 | 91.5 |
| **Instruction Following** | | | | | |
| IFEval | 94.8 | 93.5 | 90.9 | 93.9 | 93.2 |
| IFBench | 75.4 | 70.4 | 58.0 | 70.2 | 74.6 |
| **Long Context** | | | | | |
| AA-LCR | 72.7 | 70.7 | 74.0 | 70.0 | 75.3 |
| LongBench v2 | 54.5 | 68.2 | 64.4 | 61.0 | 68.9 |
| **STEM** | | | | | |
| GPQA | 92.4 | 91.9 | 87.0 | 87.6 | 90.1 |
| JEE Advanced Mathematics | 76.5 | 81.2 | 79.8 | 84.3 | 86.7 |
| HLE | 35.5 | 37.5 | 30.8 | 30.1 | 34.2 |
| **Reasoning** | | | | | |
| LiveCodeBench v6 | 87.7 | 90.7 | 84.8 | 85.0 | 87.3 |
| HMMT Feb 25 | 99.4 | 97.3 | 92.9 | 95.4 | 96.8 |
| AIME 2026 | 96.7 | 90.6 | 93.3 | 93.3 | 94.2 |
| **General Agent** | | | | | |
| BFCL-V4 | 63.1 | 72.5 | 77.5 | 68.3 | 73.8 |
| TAU2-Bench | 87.1 | 85.4 | 91.6 | 77.0 | 85.9 |
| **Multilingualism** | | | | | |
| MMMLU | 89.5 | 90.6 | 90.1 | 86.0 | 91.2 |
| MMLU-ProX | 83.7 | 87.7 | 85.7 | 82.3 | 88.4 |
| NOVA-63 | 54.6 | 56.7 | 56.7 | 56.0 | 61.3 |
| INCLUDE | 87.5 | 90.5 | 86.2 | 83.3 | 89.7 |
* Sanskrit Comprehension: Evaluated on Rigveda, Upanishads, and Paninian grammar.
* Vedic Philosophy: Multi-choice evaluation on Vedanta, Samkhya, and Nyaya schools.
* Indian Constitutional Law: Based on landmark Supreme Court judgments and constitutional provisions.
* Hinglish Understanding: Code-mixed Hindi-English conversational and literary comprehension.
* HLE: Humanity's Last Exam (graduate-level reasoning).
* MMLU-ProX: Averaged accuracy on 29 Indic languages.
HLE scores are based on the Indic-subset of the reasoning challenge.
*Evaluated on complex logical paradoxes and high-entropy reasoning tasks.
Quickstart
SKT-SURYA-H operates in thinking mode by default, generating thinking content wrapped in `<think>...</think>` tags before producing the final response. To disable thinking content and obtain a direct response, refer to the examples below.
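A minimal way to separate the thinking content from the final answer, assuming the `<think>...</think>` delimiters described above (the helper name is illustrative, not part of any official API):

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split a completion into (thinking, answer) on the </think> delimiter."""
    open_tag, close_tag = "<think>", "</think>"
    if close_tag not in text:
        return "", text.strip()  # no thinking block was emitted
    thinking, _, answer = text.partition(close_tag)
    thinking = thinking.replace(open_tag, "", 1).strip()
    return thinking, answer.strip()

thinking, answer = split_thinking("<think>\nLet me reason...\n</think>\n\nFinal answer.")
print(answer)  # Final answer.
```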
For streamlined integration, we recommend using SKT-SURYA-H via APIs. Below is a guide to use SKT-SURYA-H via OpenAI-compatible API.
Serving SKT-SURYA-H
SKT-SURYA-H can be served via APIs with popular inference frameworks. In the following, we show example commands to launch OpenAI-Compatible API servers.
Inference efficiency and throughput vary significantly across frameworks. We recommend using the latest framework versions to ensure optimal performance and compatibility. For production workloads or high-throughput scenarios, dedicated serving engines such as SGLang, KTransformers or vLLM are strongly recommended.
The model has a default context length of 262,144 tokens. If you encounter out-of-memory (OOM) errors, consider reducing the context window. However, because SKT-SURYA-H leverages extended context for complex tasks, we advise maintaining a context length of at least 128K tokens to preserve thinking capabilities.
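A back-of-envelope KV-cache estimate can guide the context-length trade-off. This sketch assumes only the 15 Gated Attention layers keep a growing KV cache (the Gated DeltaNet layers maintain constant-size linear-attention state), with the 2 KV heads × 256 head dimension from the architecture section and bfloat16 precision; it is a standard estimate, not an official sizing tool:

```python
# Rough per-sequence KV-cache size, under the stated assumptions.
kv_heads = 2       # grouped-query KV heads (architecture section)
head_dim = 256     # attention head dimension
attn_layers = 15   # one Gated Attention layer per block; DeltaNet layers keep O(1) state
bytes_per_el = 2   # bfloat16

def kv_cache_gib(context_tokens: int) -> float:
    # K and V each store kv_heads * head_dim values per token per attention layer.
    total_bytes = context_tokens * attn_layers * 2 * kv_heads * head_dim * bytes_per_el
    return total_bytes / 1024**3

print(kv_cache_gib(262_144))  # 7.5 GiB at the default context length
print(kv_cache_gib(131_072))  # 3.75 GiB when the window is halved
```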
Usage
*This model departs from traditional dense architectures: it is built through Weight Manifold Fusion and Non-Euclidean Neural Physics so that its 898 discrete tensor fragments operate as a single unified consciousness.*
Quickstart
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Shrijanagain/SKT-SURYA-H"

# Load the tokenizer and model; device_map="auto" shards weights across available GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Why does Bharat need sovereign AI? Explain in detail."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.7,
    top_p=0.95,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
#### SGLang
[SGLang](https://github.com/sgl-project/sglang) is a fast serving framework for large language models.
The following will create API endpoints at `http://localhost:8000/v1`:
- **Standard Version**: Maximum context length 262,144 tokens using tensor parallel on 8 GPUs.
```shell
python -m sglang.launch_server --model-path SKT-AI-Labs/SKT-SURYA-H-2.544T --port 8000 --tp-size 8 --mem-fraction-static 0.8 --context-length 262144 --reasoning-parser skt_surya
```
- **Multi-Token Prediction (MTP)**: Recommended for maximum throughput.
```shell
python -m sglang.launch_server --model-path SKT-AI-Labs/SKT-SURYA-H-2.544T --port 8000 --tp-size 8 --mem-fraction-static 0.8 --context-length 262144 --reasoning-parser skt_surya --speculative-algo NEXTN --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4
```
#### vLLM
[vLLM](https://github.com/vllm-project/vllm) is a high-throughput and memory-efficient inference engine.
```shell
uv pip install vllm --torch-backend=auto --extra-index-url https://wheels.vllm.ai/nightly
```
- **Standard Version**:
```shell
vllm serve SKT-AI-Labs/SKT-SURYA-H-2.544T --port 8000 --tensor-parallel-size 8 --max-model-len 262144 --reasoning-parser skt_surya
```
- **Multi-Token Prediction (MTP)**:
```shell
vllm serve SKT-AI-Labs/SKT-SURYA-H-2.544T --port 8000 --tensor-parallel-size 8 --max-model-len 262144 --reasoning-parser skt_surya --speculative-config '{"method":"skt_next_mtp","num_speculative_tokens":2}'
```
#### KTransformers
KTransformers enables CPU-GPU heterogeneous computing for trillion-parameter models.
See KTransformers Deployment Guide.
#### Hugging Face Transformers
```shell
pip install "transformers[serving] @ git+https://github.com/huggingface/transformers.git@main"
transformers serve --force-model SKT-AI-Labs/SKT-SURYA-H-2.544T --port 8000 --continuous-batching
```
Using SKT-SURYA-H via the Chat Completions API
```shell
pip install -U openai
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="EMPTY"
```
Recommended sampling parameters:
- Thinking mode: `temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0`
- Instruct (non-thinking) mode: `temperature=0.7, top_p=0.8, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0`
Text-Only Input
```python
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "user", "content": "Explain the concept of Dharma in Vedanta philosophy"},
]

response = client.chat.completions.create(
    model="SKT-AI-Labs/SKT-SURYA-H-2.544T",
    messages=messages,
    max_tokens=32768,
    temperature=0.6,
    top_p=0.95,
    extra_body={"top_k": 20},
)
print(response.choices[0].message.content)
```
Instruct (Non-Thinking) Mode
```python
# Reuses `client` and `messages` from the previous example.
response = client.chat.completions.create(
    model="SKT-AI-Labs/SKT-SURYA-H-2.544T",
    messages=messages,
    max_tokens=32768,
    temperature=0.7,
    top_p=0.8,
    presence_penalty=1.5,
    extra_body={
        "top_k": 20,
        "chat_template_kwargs": {"enable_thinking": False},
    },
)
print(response.choices[0].message.content)
```
Processing Ultra-Long Texts
SKT-SURYA-H supports up to 1M tokens of context via YaRN (its default context length is 262,144 tokens; the 146-trillion figure cited elsewhere refers to training tokens, not context).
Enabling YaRN
Modify `config.json` (set `original_max_position_embeddings` to the model's base context; 262,144 is the default stated above):
```json
{
  "rope_parameters": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144
  }
}
```
Or via command line:
```shell
vllm serve ... --hf-overrides '{"text_config": {"rope_parameters": {"rope_type": "yarn", "factor": 4.0, "original_max_position_embeddings": 262144}}}' --max-model-len 2010000
```
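The relation between the YaRN factor and the extended window can be checked with a one-liner (262,144 is the default context length stated above; this is the standard YaRN scaling relation, not an SKT-specific API):

```python
# YaRN extends the usable context by multiplying the original window by `factor`.
original_max = 262_144   # default context length
factor = 4.0             # YaRN scaling factor from config.json
extended = int(original_max * factor)
print(extended)  # 1048576, i.e. roughly 1M tokens
```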
Best Practices
Sampling Parameters:
- Thinking mode: `temperature=0.6, top_p=0.95, top_k=20`
- Non-thinking mode: `temperature=0.7, top_p=0.8, top_k=20`

Output Length: 32,768 tokens for most queries; 81,920 for complex mathematical proofs.

Standardized Prompts:
- Math: "Please reason step by step, and put your final answer within \boxed{}."
- MCQ: "Show your choice in the `answer` field with only the choice letter."
No Thinking in History: Multi-turn conversations should exclude thinking content from history.
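One way to strip thinking content from earlier assistant turns before re-sending the history (a sketch using the `<think>...</think>` delimiters described in the Quickstart; the helper name is illustrative):

```python
import re

# Matches a <think>...</think> block plus any trailing whitespace.
THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def strip_thinking(messages: list[dict]) -> list[dict]:
    """Remove thinking blocks from assistant messages in a chat history."""
    cleaned = []
    for msg in messages:
        if msg.get("role") == "assistant":
            msg = {**msg, "content": THINK_RE.sub("", msg["content"]).strip()}
        cleaned.append(msg)
    return cleaned

history = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "<think>\nTrivial sum.\n</think>\n\n4"},
]
print(strip_thinking(history)[1]["content"])  # 4
```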
Citation
```
@misc{skt-surya-h,
  title  = {{SKT-SURYA-H}: Sovereign Neural Physics at 2.544 Trillion Parameters},
  author = {{SKT AI Labs}},
  month  = {March},
  year   = {2026},
  url    = {https://huggingface.co/SKT-AI-LABS}
}
```
SKT AI Labs — Sidhi, Madhya Pradesh, Bharat
— The World is One Family