$π$-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows