VAJRAM Edge & Cloud Models: Clinical Decision Support Toolkit

This repository contains the highly optimized .gguf weights, vision projectors, and LoRA adapters powering Project VAJRAM, an agentic Clinical Decision Support System (CDSS) for Multiple Myeloma.

These models are heavily quantized and formatted to run entirely offline: on CPU, on Hugging Face Spaces, and even on Android phones (via llama.rn), driven by a LangGraph-based Mixture of Adapters (MoA) architecture.

📦 Repository Contents (The MoA Arsenal)

This is a modular toolkit. Rather than running one massive monolithic model, VAJRAM utilizes a fast base model and hot-swaps lightweight LoRA adapters into memory depending on the clinical task.
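As an illustration of the hot-swap pattern, here is a minimal sketch using llama-cpp-python (the wheel bundled below). The `load_expert` helper is hypothetical, not part of the VAJRAM codebase; the .gguf filenames are the ones shipped in this repo:

```python
# Minimal MoA sketch with llama-cpp-python. `load_expert` is a hypothetical
# helper; the model and adapter filenames come from this repository.
from llama_cpp import Llama

def load_expert(lora_path=None):
    """Load the shared base model, optionally with a task-specific LoRA adapter."""
    return Llama(
        model_path="medgemma_q4km.gguf",
        lora_path=lora_path,  # None -> plain base model
        n_ctx=4096,
    )

base_llm = load_expert()                     # general medical reasoning
risk_llm = load_expert("lora_module2.gguf")  # risk-stratification expert
```

Note that `lora_path` applies the adapter at model-load time; this sketch shows the routing idea rather than a true in-memory weight swap.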

1. Base Models & Vision

  • medgemma_q4km.gguf (2.49 GB): The core foundational medical reasoning engine, quantized to Q4_K_M to balance speed, RAM footprint, and preservation of the 256k-token MedGemma vocabulary. Fits comfortably in <4 GB of RAM.
  • medgemma_vision_base.gguf: The vision-aligned base model for multi-modal tasks.
  • medgemma_Bone_marrow_vision.gguf: The LLaVA-style multimodal projector (mmproj). It converts clinical WSI (Whole Slide Image) patch pixels into embeddings the LLM can understand (see the sketch below).
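For reference, multimodal inference with a GGUF projector can be driven through llama-cpp-python's LLaVA-style chat handler. This is an unverified sketch: whether `Llava15ChatHandler` is the right handler for this MedGemma projector is an assumption, and the image path is a placeholder.

```python
# Sketch: pairing the vision base model with the mmproj file.
# Assumption: the LLaVA-1.5 chat handler is compatible with this projector.
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

chat_handler = Llava15ChatHandler(clip_model_path="medgemma_Bone_marrow_vision.gguf")
llm = Llama(
    model_path="medgemma_vision_base.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,
)

response = llm.create_chat_completion(messages=[{
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": "file:///path/to/wsi_patch.png"}},
        {"type": "text", "text": "Describe the plasma cell morphology in this patch."},
    ],
}])
print(response["choices"][0]["message"]["content"])
```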

2. The Clinical LoRA Adapters

These are lightweight (<120MB), hot-swappable domain experts trained for specific LangGraph agent nodes:

  • lora_module2.gguf: Agent Tool2 - For Multiple Myeloma risk stratification.
  • lora_module3.gguf: Agent Tool3 - For bone marrow biopsy analysis, estimating the percentage of malignant myeloma cells.
  • lora_module4.gguf: Agent Tool4 - For Myeloma Progression analysis.
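A LangGraph node can pick the right adapter with a simple routing table. The sketch below is illustrative: the task keys are hypothetical names, while the filenames are the adapters listed above.

```python
# Hypothetical task -> adapter routing for the agent nodes described above.
ADAPTERS = {
    "risk_stratification": "lora_module2.gguf",   # Agent Tool2
    "biopsy_analysis": "lora_module3.gguf",       # Agent Tool3
    "progression_analysis": "lora_module4.gguf",  # Agent Tool4
}

def adapter_for(task):
    """Return the adapter file for a task, or None to use the plain base model."""
    return ADAPTERS.get(task)
```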

3. Build Artifacts

  • llama_cpp_python-0.3.16-cp310-cp310-linux_x86_64.whl (4.75 MB): A custom, pre-compiled Python wheel with OpenBLAS hardware acceleration baked in. Used to bypass strict compilation timeouts when deploying the VAJRAM orchestrator to serverless environments like Hugging Face Spaces.
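After installing the wheel (`pip install llama_cpp_python-0.3.16-cp310-cp310-linux_x86_64.whl`), a quick import check confirms the build. This is a sketch using the low-level system-info binding, whose availability may vary across llama-cpp-python versions:

```python
# Sanity check for the pre-compiled wheel (sketch; API may differ by version).
import llama_cpp

print("llama-cpp-python version:", llama_cpp.__version__)
# The low-level system-info string lists compiled-in backends (e.g. OpenBLAS).
print(llama_cpp.llama_print_system_info().decode())
```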

💻 Open Source Architecture & Code

The complete orchestrator codebase, including the LangGraph MoA setup, the Python backend, and the native CPU- and GPU-based applications, is fully open source.

For the full code and brief documentation: 👉 View Project VAJRAM on GitHub

🌐 Live Cloud Demo

Want to test the cloud-based MoA orchestrator without installing anything? The LangGraph Python backend is currently deployed as a live interactive web app.

👉 Test VAJRAM on Hugging Face Spaces (Note: this demo is hosted on a free-tier CPU environment, so inference will be significantly slower than the native, hardware-accelerated edge deployment.)

🚀 Quick Start: Inference via CLI

To test the base model locally in your terminal using llama.cpp:

```bash
# Standard text inference
./llama-cli -m medgemma_q4km.gguf -p "User: What are the distinct morphological features of a myeloblast? \n\nAssistant:" -n 256 --temp 0.2

# Testing a LoRA hot-swap
./llama-cli -m medgemma_q4km.gguf --lora lora_module2.gguf -p "User: Analyze this patient protocol... \n\nAssistant:" -n 256
```
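The same call can be made programmatically with the bundled wheel; a minimal sketch whose parameters mirror the CLI flags above:

```python
# Programmatic equivalent of the CLI example, via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(model_path="medgemma_q4km.gguf", n_ctx=4096)
out = llm(
    "User: What are the distinct morphological features of a myeloblast?\n\nAssistant:",
    max_tokens=256,   # -n 256
    temperature=0.2,  # --temp 0.2
)
print(out["choices"][0]["text"])
```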