Fu-En Yang

FuEnYang

https://fuenyang1127.github.io/

AI & ML interests

Computer Vision, Deep Learning, Vision-Language Models (VLMs), Vision-Language-Action Models (VLAs), Reasoning Models, Embodied AI

Recent Activity

upvoted 5 papers 1 day ago

LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence

Paper • 2605.25979 • Published 5 days ago • 24

EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation

Paper • 2605.23271 • Published 8 days ago • 76

GEM: Generative Supervision Helps Embodied Intelligence

Paper • 2605.28548 • Published 3 days ago • 37

ProRL: Effective Reinforcement Learning for Proactive Recommendation via Rectified Policy Gradient Estimation

Paper • 2605.28293 • Published 3 days ago • 78

Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players

Paper • 2605.28816 • Published 3 days ago • 353

upvoted 2 papers 3 days ago

SpatialBench: Is Your Spatial Foundation Model an All-Round Player?

Paper • 2605.27367 • Published 4 days ago • 66

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

Paper • 2605.27365 • Published 4 days ago • 118

upvoted a paper 25 days ago

MolmoAct2: Action Reasoning Models for Real-world Deployment

Paper • 2605.02881 • Published 26 days ago • 345

upvoted 7 papers 3 months ago

Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control

Paper • 2602.18422 • Published Feb 20 • 30

RISE: Self-Improving Robot Policy with Compositional World Model

Paper • 2602.11075 • Published Feb 11 • 29

GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning

Paper • 2602.12099 • Published Feb 12 • 62

Xiaomi-Robotics-0: An Open-Sourced Vision-Language-Action Model with Real-Time Execution

Paper • 2602.12684 • Published Feb 13 • 7

RLinf-Co: Reinforcement Learning-Based Sim-Real Co-Training for VLA Models

Paper • 2602.12628 • Published Feb 13 • 12

OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence

Paper • 2602.08683 • Published Feb 9 • 52

Olaf-World: Orienting Latent Actions for Video World Modeling

Paper • 2602.10104 • Published Feb 10 • 27

upvoted 5 papers 4 months ago

Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision-Language-Action Models via Latent Iterative Reasoning

Paper • 2602.07845 • Published Feb 8 • 71

MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents

Paper • 2602.02474 • Published Feb 2 • 63
