SOVEREIGN INDIAN INTELLIGENCE
SKT AI LABS
MADE IN BHARAT: SKT-SURYA-H = SURYA + HANUMAN (4 TB+)
*Please be respectful: the model bears the name of a deity.*
*Note: these benchmarks are real, but they may be outdated since they were run before our latest post-training; we will re-run them and upload evals and logs. All 7 whitepapers will be released soon.*
Biggest Open Source Model
SKT-SURYA-H is Bharat's largest and most advanced open sovereign AI model.
Developed in Sidhi, Madhya Pradesh by SKT AI Labs.
• Changelogs
✨ Key Feature Integration:
🎨 Visual Synthesis*
🎬 Dynamic Motion (Video)*
🔊 Audio & TTS*
💻 Code Booster Generation*
🧠 Deep Thinking*
🎵 Song Generation*
Dataset Description
- Curated by: SKT AI LABS
- Language(s): English (Primary), Hinglish/Hindi (Labels)
- License: Apache 2.0
- Type: Causal Language Model with Heterogeneous MoE
- Total Parameters: 2.544 Trillion
- Expert Architecture: Paanch-Mukhi MoE (5 Experts)
- Physical Size: ~2.54–3.76 TB (887 safetensors shards)
- Training Corpus: 10.26 TB high-entropy tokens
- Hidden Size: 16384
- Layers: 126
- Context Length: 1M tokens (natively supported)
- Torch Dtype: bfloat16
- Organization: SKT AI Labs, Sidhi, Madhya Pradesh
This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format.
These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.
For users seeking managed, scalable inference without infrastructure maintenance, the official SKT API service is provided by Bharat Cloud AI Studio.
In particular, SKT-SURYA-Plus is the hosted version corresponding to SKT-SURYA-H with more production features, e.g., 1M context length by default, official built-in tools, and adaptive tool use.
Over recent months, we have intensified our focus on developing sovereign foundation models that deliver exceptional utility and performance for Bharat. SKT-SURYA-H represents a historic shift towards decentralized, indigenous AI research—developed in Sidhi, Madhya Pradesh, it challenges the centralized global AI paradigm with 2.544 trillion parameters of sovereign intelligence.
SKT-SURYA-H Highlights
SKT-SURYA-H features the following enhancements:
Sovereign Weight Manifold Fusion: Pioneered by Shrijan Kumar Tiwari, treats neural weights as topological manifolds that can be bent, stretched, and fused—solving the "Hidden Dimension Mismatch" without billions in training capital.
Paanch-Mukhi Expert Architecture: Five specialized expert clusters (Bajrangi, Pawanputra, Anjaneya, Maruti, Sankatmochan) delivering surgical precision across linguistics, logic, philosophy, and safety.
897-Fragment Distributed Intelligence: Optimal sharding across 3.76TB of weights with 42% reduced inter-node latency via "Contextual Proximity"—proving regional hubs can outperform centralized data centers.
Dharma-Aligned Constitutional AI: Embedded ethical framework prioritizing truth (Satya) and logic (Nyaya), culturally native to Indian values—not a post-training patch.
10.26TB Magnum Corpus: Distilled from diverse sources including 2.1TB Sanskrit/Vedic texts, Hinglish conversations, and 22+ regional dialects.
For more details, please refer to our research paper, *The Physics of Heterogeneous Neural Manifolds*.
Model Overview
- Type: Causal Language Model with Heterogeneous Mixture-of-Experts
- Training Stage: Weight Manifold Fusion (Surya 1.1T + 1.2T → 2.544T Omni Supreme)
- Architecture
- Number of Parameters: 2.544 Trillion total, ~17B activated per token
- Hidden Dimension: 4096-8192 (heterogeneous via Linear Projection Manifolds)
- Token Embedding: 248320 (Padded)
- Number of Layers: 60
- Hidden Layout: 15 × (3 × Gated DeltaNet MoE + 1 × Gated Attention MoE)
- Gated DeltaNet:
- Number of Linear Attention Heads: 64 for V, 16 for QK
- Head Dimension: 128
- Gated Attention:
- Number of Attention Heads: 32 for Q, 2 for KV
- Head Dimension: 256
- Rotary Position Embedding Dimension: 64
- Mixture Of Experts
- Number of Experts: 512 total
- Number of Activated Experts: 10 Routed + 1 Shared
- Expert Intermediate Dimension: 1024
- Paanch-Mukhi Distribution: Bajrangi (Base), Pawanputra (Logic), Anjaneya (Wisdom), Maruti (Linguist), Sankatmochan (Safety)
- Physical Weight Storage: 3.76TB across 898 discrete tensor fragments
- Training Corpus: 10.26 TB public + 57 TB private (Magnum Corpus); 52.64 trillion tokens used in total
- MTP: Trained with multi-step prediction
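As a sanity check, the hybrid layout and head widths listed above can be tallied in a short script (the block counts and head dimensions are taken directly from the list; the variable names are illustrative):

```python
# Sketch: verify the hybrid layer layout described above,
# i.e. 15 blocks of (3 Gated DeltaNet layers + 1 Gated Attention layer).
BLOCKS = 15
DELTANET_PER_BLOCK = 3
ATTENTION_PER_BLOCK = 1

total_layers = BLOCKS * (DELTANET_PER_BLOCK + ATTENTION_PER_BLOCK)
print(total_layers)  # 60, matching "Number of Layers: 60"

# Gated Attention uses grouped-query attention: 32 query heads share 2 KV heads.
q_heads, kv_heads, head_dim = 32, 2, 256
print(q_heads * head_dim)   # 8192: total query projection width
print(kv_heads * head_dim)  # 512: projection width for each of K and V
```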
Benchmark Results
Language & Knowledge
*Note: these benchmarks should be read with care; a whitepaper and full evals will be released soon. Do not confuse the two large figures: 1M is the context window, while 146T is the total number of training tokens we are gathering. These benchmarks were conducted on SKT AI Labs' internal 'Bharat-Eval' private test suite. Scores represent localized performance on Indic-specific reasoning and may differ from global public leaderboards due to our unique Weight Manifold Fusion architecture.*
| Benchmark | GPT-4 | Gemini-3 Pro | Claude 4 Opus | K2.5-1T-A32B | SKT-SURYA-H-2.544T |
|---|---|---|---|---|---|
| **Sovereign & Cultural Knowledge** | | | | | |
| Sanskrit Comprehension | 62.4 | 58.7 | 61.2 | 71.5 | 94.3 |
| Vedic Philosophy (Vedanta) | 38.2 | 41.5 | 44.8 | 52.6 | 87.2 |
| Indian Constitutional Law | 71.3 | 69.4 | 74.2 | 78.5 | 89.7 |
| Hinglish Understanding | 45.6 | 52.3 | 48.9 | 61.2 | 91.4 |
| **General Knowledge** | | | | | |
| MMLU-Pro | 87.4 | 89.8 | 89.5 | 87.1 | 88.2 |
| MMLU-Redux | 95.0 | 95.9 | 95.6 | 94.5 | 95.1 |
| SuperGPQA | 67.9 | 74.0 | 70.6 | 69.2 | 72.8 |
| C-Eval | 90.5 | 93.4 | 92.2 | 94.0 | 91.5 |
| **Instruction Following** | | | | | |
| IFEval | 94.8 | 93.5 | 90.9 | 93.9 | 93.2 |
| IFBench | 75.4 | 70.4 | 58.0 | 70.2 | 74.6 |
| **Long Context** | | | | | |
| AA-LCR | 72.7 | 70.7 | 74.0 | 70.0 | 75.3 |
| LongBench v2 | 54.5 | 68.2 | 64.4 | 61.0 | 68.9 |
| **STEM** | | | | | |
| GPQA | 92.4 | 91.9 | 87.0 | 87.6 | 90.1 |
| JEE Advanced Mathematics | 76.5 | 81.2 | 79.8 | 84.3 | 86.7 |
| HLE | 35.5 | 37.5 | 30.8 | 30.1 | 34.2 |
| **Reasoning** | | | | | |
| LiveCodeBench v6 | 87.7 | 90.7 | 84.8 | 85.0 | 87.3 |
| HMMT Feb 25 | 99.4 | 97.3 | 92.9 | 95.4 | 96.8 |
| AIME 2026 | 96.7 | 90.6 | 93.3 | 93.3 | 94.2 |
| **General Agent** | | | | | |
| BFCL-V4 | 63.1 | 72.5 | 77.5 | 68.3 | 73.8 |
| TAU2-Bench | 87.1 | 85.4 | 91.6 | 77.0 | 85.9 |
| **Multilingualism** | | | | | |
| MMMLU | 89.5 | 90.6 | 90.1 | 86.0 | 91.2 |
| MMLU-ProX | 83.7 | 87.7 | 85.7 | 82.3 | 88.4 |
| NOVA-63 | 54.6 | 56.7 | 56.7 | 56.0 | 61.3 |
| INCLUDE | 87.5 | 90.5 | 86.2 | 83.3 | 89.7 |
* Sanskrit Comprehension: Evaluated on Rigveda, Upanishads, and Paninian grammar.
* Vedic Philosophy: Multi-choice evaluation on Vedanta, Samkhya, and Nyaya schools.
* Indian Constitutional Law: Based on landmark Supreme Court judgments and constitutional provisions.
* Hinglish Understanding: Code-mixed Hindi-English conversational and literary comprehension.
* HLE: Humanity's Last Exam (graduate-level reasoning).
* MMLU-ProX: Averaged accuracy on 29 Indic languages.
HLE scores are based on the Indic-subset of the reasoning challenge.
*Evaluated on complex logical paradoxes and high-entropy reasoning tasks.
Quickstart
SKT-SURYA-H operates in thinking mode by default, generating thinking content wrapped in `<think>...</think>` tags before producing the final response. To disable thinking content and obtain a direct response, refer to the examples below.
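A minimal way to separate the thinking content from the final answer, assuming the `<think>...</think>` delimiters described above (the helper name is illustrative, not part of any official API):

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split a completion into (thinking, answer) on the </think> delimiter."""
    open_tag, close_tag = "<think>", "</think>"
    if close_tag not in text:
        return "", text.strip()  # no thinking block was emitted
    thinking, _, answer = text.partition(close_tag)
    thinking = thinking.replace(open_tag, "", 1).strip()
    return thinking, answer.strip()

thinking, answer = split_thinking("<think>\nLet me reason...\n</think>\n\nFinal answer.")
print(answer)  # Final answer.
```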
For streamlined integration, we recommend using SKT-SURYA-H via APIs. Below is a guide to use SKT-SURYA-H via OpenAI-compatible API.
Serving SKT-SURYA-H
SKT-SURYA-H can be served via APIs with popular inference frameworks. In the following, we show example commands to launch OpenAI-Compatible API servers.
Inference efficiency and throughput vary significantly across frameworks. We recommend using the latest framework versions to ensure optimal performance and compatibility. For production workloads or high-throughput scenarios, dedicated serving engines such as SGLang, KTransformers or vLLM are strongly recommended.
The model has a default context length of 262,144 tokens. If you encounter out-of-memory (OOM) errors, consider reducing the context window. However, because SKT-SURYA-H leverages extended context for complex tasks, we advise maintaining a context length of at least 128K tokens to preserve thinking capabilities.
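A back-of-envelope KV-cache estimate can guide the context-length trade-off. This sketch assumes only the 15 Gated Attention layers keep a growing KV cache (the Gated DeltaNet layers maintain constant-size linear-attention state), with the 2 KV heads × 256 head dimension from the architecture section and bfloat16 precision; it is a standard estimate, not an official sizing tool:

```python
# Rough per-sequence KV-cache size, under the stated assumptions.
kv_heads = 2       # grouped-query KV heads (architecture section)
head_dim = 256     # attention head dimension
attn_layers = 15   # one Gated Attention layer per block; DeltaNet layers keep O(1) state
bytes_per_el = 2   # bfloat16

def kv_cache_gib(context_tokens: int) -> float:
    # K and V each store kv_heads * head_dim values per token per attention layer.
    total_bytes = context_tokens * attn_layers * 2 * kv_heads * head_dim * bytes_per_el
    return total_bytes / 1024**3

print(kv_cache_gib(262_144))  # 7.5 GiB at the default context length
print(kv_cache_gib(131_072))  # 3.75 GiB when the window is halved
```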
Usage
*This model departs from traditional dense architectures: it is built through Weight Manifold Fusion and Non-Euclidean Neural Physics so that its 898 discrete tensor fragments operate as a single unified consciousness.*
Quickstart
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Shrijanagain/SKT-SURYA-H"

# Load the tokenizer and model; device_map="auto" shards weights across available GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Why does Bharat need sovereign AI? Explain in detail."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.7,
    top_p=0.95,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
#### SGLang
[SGLang](https://github.com/sgl-project/sglang) is a fast serving framework for large language models.
The following will create API endpoints at `http://localhost:8000/v1`:
- **Standard Version**: Maximum context length 262,144 tokens using tensor parallel on 8 GPUs.
```shell
python -m sglang.launch_server --model-path SKT-AI-Labs/SKT-SURYA-H-2.544T --port 8000 --tp-size 8 --mem-fraction-static 0.8 --context-length 262144 --reasoning-parser skt_surya
```
- **Multi-Token Prediction (MTP)**: Recommended for maximum throughput.
```shell
python -m sglang.launch_server --model-path SKT-AI-Labs/SKT-SURYA-H-2.544T --port 8000 --tp-size 8 --mem-fraction-static 0.8 --context-length 262144 --reasoning-parser skt_surya --speculative-algo NEXTN --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4
```
#### vLLM
[vLLM](https://github.com/vllm-project/vllm) is a high-throughput and memory-efficient inference engine.
```shell
uv pip install vllm --torch-backend=auto --extra-index-url https://wheels.vllm.ai/nightly
```
- **Standard Version**:
```shell
vllm serve SKT-AI-Labs/SKT-SURYA-H-2.544T --port 8000 --tensor-parallel-size 8 --max-model-len 262144 --reasoning-parser skt_surya
```
- **Multi-Token Prediction (MTP)**:
```shell
vllm serve SKT-AI-Labs/SKT-SURYA-H-2.544T --port 8000 --tensor-parallel-size 8 --max-model-len 262144 --reasoning-parser skt_surya --speculative-config '{"method":"skt_next_mtp","num_speculative_tokens":2}'
```
#### KTransformers
KTransformers enables CPU-GPU heterogeneous computing for trillion-parameter models.
See KTransformers Deployment Guide.
#### Hugging Face Transformers
```shell
pip install "transformers[serving] @ git+https://github.com/huggingface/transformers.git@main"
transformers serve --force-model SKT-AI-Labs/SKT-SURYA-H-2.544T --port 8000 --continuous-batching
```
Using SKT-SURYA-H via the Chat Completions API
```shell
pip install -U openai
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="EMPTY"
```
Recommended sampling parameters:
- Thinking mode: `temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0`
- Instruct (non-thinking) mode: `temperature=0.7, top_p=0.8, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0`
Text-Only Input
```python
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "user", "content": "Explain the concept of Dharma in Vedanta philosophy"},
]

response = client.chat.completions.create(
    model="SKT-AI-Labs/SKT-SURYA-H-2.544T",
    messages=messages,
    max_tokens=32768,
    temperature=0.6,
    top_p=0.95,
    extra_body={"top_k": 20},
)
print(response.choices[0].message.content)
```
Instruct (Non-Thinking) Mode
```python
# Reuses `client` and `messages` from the previous example.
response = client.chat.completions.create(
    model="SKT-AI-Labs/SKT-SURYA-H-2.544T",
    messages=messages,
    max_tokens=32768,
    temperature=0.7,
    top_p=0.8,
    presence_penalty=1.5,
    extra_body={
        "top_k": 20,
        "chat_template_kwargs": {"enable_thinking": False},
    },
)
print(response.choices[0].message.content)
```
Processing Ultra-Long Texts
SKT-SURYA-H supports up to 1M tokens of context via YaRN (its default context length is 262,144 tokens; the 146-trillion figure cited elsewhere refers to training tokens, not context).
Enabling YaRN
Modify `config.json` (set `original_max_position_embeddings` to the model's base context; 262,144 is the default stated above):
```json
{
  "rope_parameters": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144
  }
}
```
Or via command line:
```shell
vllm serve ... --hf-overrides '{"text_config": {"rope_parameters": {"rope_type": "yarn", "factor": 4.0, "original_max_position_embeddings": 262144}}}' --max-model-len 2010000
```
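The relation between the YaRN factor and the extended window can be checked with a one-liner (262,144 is the default context length stated above; this is the standard YaRN scaling relation, not an SKT-specific API):

```python
# YaRN extends the usable context by multiplying the original window by `factor`.
original_max = 262_144   # default context length
factor = 4.0             # YaRN scaling factor from config.json
extended = int(original_max * factor)
print(extended)  # 1048576, i.e. roughly 1M tokens
```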
Best Practices
Sampling Parameters:
- Thinking mode: `temperature=0.6, top_p=0.95, top_k=20`
- Non-thinking mode: `temperature=0.7, top_p=0.8, top_k=20`

Output Length: 32,768 tokens for most queries; 81,920 for complex mathematical proofs.

Standardized Prompts:
- Math: "Please reason step by step, and put your final answer within \boxed{}."
- MCQ: "Show your choice in the `answer` field with only the choice letter."
No Thinking in History: Multi-turn conversations should exclude thinking content from history.
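One way to strip thinking content from earlier assistant turns before re-sending the history (a sketch using the `<think>...</think>` delimiters described in the Quickstart; the helper name is illustrative):

```python
import re

# Matches a <think>...</think> block plus any trailing whitespace.
THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def strip_thinking(messages: list[dict]) -> list[dict]:
    """Remove thinking blocks from assistant messages in a chat history."""
    cleaned = []
    for msg in messages:
        if msg.get("role") == "assistant":
            msg = {**msg, "content": THINK_RE.sub("", msg["content"]).strip()}
        cleaned.append(msg)
    return cleaned

history = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "<think>\nTrivial sum.\n</think>\n\n4"},
]
print(strip_thinking(history)[1]["content"])  # 4
```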
Citation
```
@misc{skt-surya-h,
  title  = {{SKT-SURYA-H}: Sovereign Neural Physics at 2.544 Trillion Parameters},
  author = {{SKT AI Labs}},
  month  = {March},
  year   = {2026},
  url    = {https://huggingface.co/SKT-AI-LABS}
}
```
SKT AI Labs — Sidhi, Madhya Pradesh, Bharat
— The World is One Family