MLLM PRO

Anran-MLLM

posted an update 3 days ago

🚀 Introducing PerceptionDLM — the first multimodal diffusion LLM for parallel region perception!

Most MLLMs are autoregressive, so captioning N regions costs N sequential passes. PerceptionDLM instead describes ALL masked regions in a single denoising process. 🧩

✨ Highlights
• ⚡ Up to 3.4× faster on dense multi-region captioning, with stable per-image latency
• 🏆 PerceptionDLM-Base beats LLaDA-V on 15/16 multimodal benchmarks (new SOTA among open diffusion VLMs)
• 📊 New benchmark: ParaDLC-Bench — jointly evaluates caption quality AND inference efficiency
• 🔓 Code, models & benchmark all open-sourced

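🤖 Models
https://huggingface.co/MSALab/PerceptionDLM-Base
https://huggingface.co/MSALab/PerceptionDLM

📊 Benchmark
https://huggingface.co/datasets/MSALab/ParaDLC-Bench

📄 Paper: PerceptionDLM: Parallel Region Perception with Multimodal Diffusion Language Models (https://huggingface.co/papers/2606.19534)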
💻 Code: https://github.com/MSALab-PKU/PerceptionDLM

Diffusion LLMs aren't just for text — they unlock efficient, parallel visual perception. 👁️✨

#multimodal #diffusion #VLM #perception
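
The cost argument in the post (N sequential passes for an autoregressive model versus one denoising process for all regions) can be illustrated with a toy cost model. This is only a sketch under loud assumptions: the `CostModel` class, its fields, and the example numbers are hypothetical and are not taken from the paper or the PerceptionDLM code. It also treats the cost of one denoising process as independent of the number of regions, which is a simplification of the "stable per-image latency" claim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    """Hypothetical costs in arbitrary units; not measured values."""
    ar_pass_cost: float          # assumed cost of one autoregressive pass (one region)
    denoise_process_cost: float  # assumed cost of one full denoising process (all regions)


def autoregressive_cost(n_regions: int, model: CostModel) -> float:
    """One sequential pass per region, so cost grows linearly with N."""
    if n_regions < 1:
        raise ValueError("n_regions must be at least 1")
    return n_regions * model.ar_pass_cost


def parallel_diffusion_cost(n_regions: int, model: CostModel) -> float:
    """All regions share one denoising process; assumed independent of N."""
    if n_regions < 1:
        raise ValueError("n_regions must be at least 1")
    return model.denoise_process_cost


def speedup(n_regions: int, model: CostModel) -> float:
    """Ratio of sequential cost to parallel cost under this toy model."""
    return autoregressive_cost(n_regions, model) / parallel_diffusion_cost(n_regions, model)


if __name__ == "__main__":
    # Made-up numbers: a denoising process assumed twice as costly as one AR pass.
    toy = CostModel(ar_pass_cost=1.0, denoise_process_cost=2.0)
    for n in (1, 2, 4, 8):
        print(f"regions={n}  speedup={speedup(n, toy):.2f}")
```

Under these made-up costs the parallel approach is slower for a single region and only pulls ahead as the region count grows, which is consistent with the post quoting its speedup for dense multi-region captioning. The real speedup depends on measured costs that this sketch does not model.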
