VAJRAM Edge & Cloud Models: Clinical Decision Support Toolkit
This repository contains the highly optimized .gguf weights, vision projectors, and LoRA adapters powering Project VAJRAM, an agentic Clinical Decision Support System (CDSS) for Multiple Myeloma.
These models are heavily quantized and formatted to run entirely offline on CPU, in Hugging Face Spaces, and even on Android phones (via llama.rn), using a LangGraph-driven Mixture of Adapters (MoA) architecture.
📦 Repository Contents (The MoA Arsenal)
This is a modular toolkit. Rather than running one massive monolithic model, VAJRAM utilizes a fast base model and hot-swaps lightweight LoRA adapters into memory depending on the clinical task.
1. Base Models & Vision
- `medgemma_q4km.gguf` (2.49 GB): The core foundational medical reasoning engine. Quantized to Q4_K_M to balance speed, RAM footprint, and preservation of the 256k Gemma medical vocabulary. Fits comfortably in <4 GB RAM.
- `medgemma_vision_base.gguf`: The vision-aligned base model for multi-modal tasks.
- `medgemma_Bone_marrow_vision.gguf`: The LLaVA-style multimodal projector (mmproj). It converts clinical WSI (Whole Slide Image) patch pixels into embeddings the LLM can understand.
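Wiring the projector to the vision base could look roughly like the sketch below, using llama-cpp-python's LLaVA-style chat handler. This is a hypothetical sketch, not the project's actual loader: the file names come from this repository, but the choice of `Llava15ChatHandler` and the parameters are assumptions.

```python
# Hypothetical loader sketch: file names are from this repo; the use of
# Llava15ChatHandler for the mmproj file is an assumption, not VAJRAM's code.
def load_vision_model(base_path: str = "medgemma_vision_base.gguf",
                      mmproj_path: str = "medgemma_Bone_marrow_vision.gguf"):
    # Imports are deferred so this sketch can be read and parsed
    # without llama-cpp-python installed.
    from llama_cpp import Llama
    from llama_cpp.llama_chat_format import Llava15ChatHandler

    # The chat handler loads the projector weights that map image
    # patches into the LLM's embedding space.
    handler = Llava15ChatHandler(clip_model_path=mmproj_path)
    return Llama(model_path=base_path, chat_handler=handler, n_ctx=4096)
```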
2. The Clinical LoRA Adapters
These are lightweight (<120 MB), hot-swappable domain experts trained for specific LangGraph agent nodes:
- `lora_module2.gguf`: Agent Tool 2, for Multiple Myeloma risk stratification.
- `lora_module3.gguf`: Agent Tool 3, for bone marrow biopsy analysis to estimate the percentage of malignant myeloma cells.
- `lora_module4.gguf`: Agent Tool 4, for Myeloma progression analysis.
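The per-node adapter routing described above could be sketched as a simple lookup in the orchestrator. Only the adapter file names below come from this repository; the node names and routing logic are illustrative assumptions, not VAJRAM's actual LangGraph code.

```python
# Map each (hypothetical) LangGraph agent node to its LoRA adapter file.
# Adapter file names are from this repo; node names are illustrative.
ADAPTERS = {
    "risk_stratification": "lora_module2.gguf",   # Agent Tool 2
    "biopsy_analysis": "lora_module3.gguf",       # Agent Tool 3
    "progression_analysis": "lora_module4.gguf",  # Agent Tool 4
}

def adapter_for(node: str):
    """Return the adapter to hot-swap for a node, or None for the bare base model."""
    return ADAPTERS.get(node)
```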
3. Build Artifacts
`llama_cpp_python-0.3.16-cp310-cp310-linux_x86_64.whl` (4.75 MB): A custom, pre-compiled Python wheel with OpenBLAS hardware acceleration baked in. It bypasses strict compilation timeouts when deploying the VAJRAM orchestrator to serverless environments such as Hugging Face Spaces.
💻 Open Source Architecture & Code
The complete orchestrator codebase, including the LangGraph MoA setup, the Python backend, and the native CPU- and GPU-based applications, is fully open source.
For the full code and brief documentation: 👉 View Project VAJRAM on GitHub
🌐 Live Cloud Demo
Want to test the cloud-based MoA orchestrator without installing anything? The LangGraph Python backend is currently deployed as a live interactive web app.
👉 Test VAJRAM on Hugging Face Spaces (Note: this demo is hosted on a free-tier CPU environment, so inference will be significantly slower than the native, hardware-accelerated Edge deployment.)
🚀 Quick Start: Inference via CLI
To test the base model locally on your terminal using llama.cpp:
```shell
# Standard text inference
./llama-cli -m medgemma_q4km.gguf \
  -p "User: What are the distinct morphological features of a myeloblast? \n\nAssistant:" \
  -n 256 --temp 0.2

# Testing a LoRA hot-swap
./llama-cli -m medgemma_q4km.gguf --lora lora_module2.gguf \
  -p "User: Analyze this patient protocol... \n\nAssistant:" -n 256
```
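The same two calls can be made from Python using the pre-compiled llama-cpp-python wheel shipped in this repo. A minimal sketch assuming llama-cpp-python's standard `Llama` API (with `lora_path` mirroring the `--lora` flag); the helper names are ours, only the file names come from this repository.

```python
def format_prompt(user_msg: str) -> str:
    """Build the User/Assistant prompt layout used by the CLI examples above."""
    return f"User: {user_msg} \n\nAssistant:"

def run_inference(user_msg: str, lora_path=None,
                  max_tokens: int = 256, temperature: float = 0.2) -> str:
    # Deferred import so this sketch parses without llama-cpp-python installed.
    from llama_cpp import Llama

    # lora_path=None runs the bare base model; pass e.g. "lora_module2.gguf"
    # to mirror the --lora hot-swap above.
    llm = Llama(model_path="medgemma_q4km.gguf", lora_path=lora_path,
                n_ctx=4096, verbose=False)
    out = llm(format_prompt(user_msg), max_tokens=max_tokens,
              temperature=temperature)
    return out["choices"][0]["text"]
```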
Model tree for shrishSVaidya/VAJRAM-Models
- Base model: google/medgemma-1.5-4b-it