VAJRAM Edge & Cloud Models: Clinical Decision Support Toolkit

This repository contains the highly optimized .gguf weights, vision projectors, and LoRA adapters powering Project VAJRAM, an agentic Clinical Decision Support System (CDSS) for Multiple Myeloma.

These models are heavily quantized and formatted to run entirely offline: on CPU, on Hugging Face Spaces, and even on Android phones (via llama.rn), driven by a LangGraph-based Mixture of Adapters (MoA) architecture.

📦 Repository Contents (The MoA Arsenal)

This is a modular toolkit. Rather than running one massive monolithic model, VAJRAM utilizes a fast base model and hot-swaps lightweight LoRA adapters into memory depending on the clinical task.
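As an illustration of the hot-swap pattern, here is a minimal sketch using llama-cpp-python (the wheel bundled below). The `load_expert` helper is hypothetical, not part of the VAJRAM codebase; the .gguf filenames are the ones shipped in this repo:

```python
# Minimal MoA sketch with llama-cpp-python. `load_expert` is a hypothetical
# helper; the model and adapter filenames come from this repository.
from llama_cpp import Llama

def load_expert(lora_path=None):
    """Load the shared base model, optionally with a task-specific LoRA adapter."""
    return Llama(
        model_path="medgemma_q4km.gguf",
        lora_path=lora_path,  # None -> plain base model
        n_ctx=4096,
    )

base_llm = load_expert()                     # general medical reasoning
risk_llm = load_expert("lora_module2.gguf")  # risk-stratification expert
```

Note that `lora_path` applies the adapter at model-load time; this sketch shows the routing idea rather than a true in-memory weight swap.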

1. Base Models & Vision

  • medgemma_q4km.gguf (2.49 GB): The core foundational medical reasoning engine, quantized to Q4_K_M to balance speed, RAM footprint, and preservation of the 256k-token MedGemma vocabulary. Fits comfortably in <4 GB of RAM.
  • medgemma_vision_base.gguf: The vision-aligned base model for multi-modal tasks.
  • medgemma_Bone_marrow_vision.gguf: The LLaVA-style multimodal projector (mmproj). It converts clinical WSI (Whole Slide Image) patch pixels into embeddings the LLM can understand (see the sketch below).
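For reference, multimodal inference with a GGUF projector can be driven through llama-cpp-python's LLaVA-style chat handler. This is an unverified sketch: whether `Llava15ChatHandler` is the right handler for this MedGemma projector is an assumption, and the image path is a placeholder.

```python
# Sketch: pairing the vision base model with the mmproj file.
# Assumption: the LLaVA-1.5 chat handler is compatible with this projector.
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

chat_handler = Llava15ChatHandler(clip_model_path="medgemma_Bone_marrow_vision.gguf")
llm = Llama(
    model_path="medgemma_vision_base.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,
)

response = llm.create_chat_completion(messages=[{
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": "file:///path/to/wsi_patch.png"}},
        {"type": "text", "text": "Describe the plasma cell morphology in this patch."},
    ],
}])
print(response["choices"][0]["message"]["content"])
```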

2. The Clinical LoRA Adapters

These are lightweight (<120MB), hot-swappable domain experts trained for specific LangGraph agent nodes:

  • lora_module2.gguf: Agent Tool2 - For Multiple Myeloma risk stratification.
  • lora_module3.gguf: Agent Tool3 - For bone marrow biopsy analysis, estimating the percentage of malignant myeloma cells.
  • lora_module4.gguf: Agent Tool4 - For Myeloma Progression analysis.
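A LangGraph node can pick the right adapter with a simple routing table. The sketch below is illustrative: the task keys are hypothetical names, while the filenames are the adapters listed above.

```python
# Hypothetical task -> adapter routing for the agent nodes described above.
ADAPTERS = {
    "risk_stratification": "lora_module2.gguf",   # Agent Tool2
    "biopsy_analysis": "lora_module3.gguf",       # Agent Tool3
    "progression_analysis": "lora_module4.gguf",  # Agent Tool4
}

def adapter_for(task):
    """Return the adapter file for a task, or None to use the plain base model."""
    return ADAPTERS.get(task)
```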

3. Build Artifacts

  • llama_cpp_python-0.3.16-cp310-cp310-linux_x86_64.whl (4.75 MB): A custom, pre-compiled Python wheel with OpenBLAS hardware acceleration baked in. Used to bypass strict compilation timeouts when deploying the VAJRAM orchestrator to serverless environments like Hugging Face Spaces.
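After installing the wheel (`pip install llama_cpp_python-0.3.16-cp310-cp310-linux_x86_64.whl`), a quick import check confirms the build. This is a sketch using the low-level system-info binding, whose availability may vary across llama-cpp-python versions:

```python
# Sanity check for the pre-compiled wheel (sketch; API may differ by version).
import llama_cpp

print("llama-cpp-python version:", llama_cpp.__version__)
# The low-level system-info string lists compiled-in backends (e.g. OpenBLAS).
print(llama_cpp.llama_print_system_info().decode())
```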

💻 Open Source Architecture & Code

The complete orchestrator codebase, including the LangGraph MoA setup, the Python backend, and the native CPU- and GPU-based applications, is fully open source.

For the full code and brief documentation: 👉 View Project VAJRAM on GitHub

🌐 Live Cloud Demo

Want to test the cloud-based MoA orchestrator without installing anything? The LangGraph Python backend is currently deployed as a live interactive web app.

👉 Test VAJRAM on Hugging Face Spaces (Note: this demo is hosted on a free-tier CPU environment, so inference will be significantly slower than the native, hardware-accelerated edge deployment.)

🚀 Quick Start: Inference via CLI

To test the base model locally in your terminal using llama.cpp:

```bash
# Standard text inference
./llama-cli -m medgemma_q4km.gguf -p "User: What are the distinct morphological features of a myeloblast? \n\nAssistant:" -n 256 --temp 0.2

# Testing a LoRA hot-swap
./llama-cli -m medgemma_q4km.gguf --lora lora_module2.gguf -p "User: Analyze this patient protocol... \n\nAssistant:" -n 256
```
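The same call can be made programmatically with the bundled wheel; a minimal sketch whose parameters mirror the CLI flags above:

```python
# Programmatic equivalent of the CLI example, via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(model_path="medgemma_q4km.gguf", n_ctx=4096)
out = llm(
    "User: What are the distinct morphological features of a myeloblast?\n\nAssistant:",
    max_tokens=256,   # -n 256
    temperature=0.2,  # --temp 0.2
)
print(out["choices"][0]["text"])
```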