Abstract: Safe reinforcement learning (RL) aims to learn a policy while also satisfying safety constraints. An increasingly common approach is to design a safety filter based on control barrier ...
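For readers unfamiliar with the idea, a safety filter minimally modifies the action proposed by the RL policy so that a barrier condition keeps the system inside a safe set. The sketch below is a generic illustration, not the method of the truncated abstract. It assumes a control-affine system x_dot = f(x) + g(x) u, a single barrier function h(x) >= 0 on the safe set, and a linear class-K term alpha * h. The names `cbf_safety_filter` and `alpha` are hypothetical. With one affine constraint, the usual quadratic program reduces to a closed-form projection onto a half-space.

```python
import numpy as np


def cbf_safety_filter(u_rl, h, grad_h, f, g, alpha=1.0):
    """Minimally modify the RL action u_rl so that a barrier condition holds.

    Solves  min_u ||u - u_rl||^2  s.t.  grad_h @ (f + g @ u) >= -alpha * h
    for a control-affine system x_dot = f(x) + g(x) u with one barrier h.
    Shapes: u_rl (m,), grad_h (n,), f (n,), g (n, m); h and alpha are scalars.
    """
    a = grad_h @ g                    # constraint row: a @ u >= b
    b = -alpha * h - grad_h @ f
    slack = a @ u_rl - b
    if slack >= 0:                    # proposed action already satisfies the condition
        return u_rl
    if not np.any(a):                 # constraint does not depend on u; filter cannot help
        return u_rl
    # Closed-form QP solution: project u_rl onto the boundary {u : a @ u = b}.
    return u_rl - slack * a / (a @ a)


# Usage: 1-D single integrator x_dot = u with safe set x <= 1, i.e. h(x) = 1 - x.
x = 0.9
u_safe = cbf_safety_filter(
    u_rl=np.array([0.5]),
    h=1.0 - x,
    grad_h=np.array([-1.0]),
    f=np.array([0.0]),
    g=np.array([[1.0]]),
)
print(u_safe)  # the filter caps the velocity at alpha * h = 0.1
```

Solving the projection in closed form avoids a QP solver in this single-constraint case; with several barriers or input bounds, a QP solver would normally be used instead.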
Abstract: In multi-robot systems (MRS) operating across various applications, real-time task allocation and path planning pose significant challenges, often requiring extensive human intervention ...
DR Tulu-8B is the first open Deep Research (DR) model trained for long-form DR tasks. DR Tulu-8B matches OpenAI DR on long-form DR benchmarks. February 9, 2026: We released a free interactive demo ...
To address data selection for reinforcement learning with verifiable rewards (RLVR) post-training, LearnAlign is proposed. It uses "gradient alignment" as a representativeness metric and "success rate $V(\xi)=p(1 ...
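To make "gradient alignment as a representativeness metric" concrete, the sketch below scores each candidate training sample by the cosine similarity between its gradient and the mean gradient over the candidate pool, then keeps the top-k. This is a generic stand-in chosen for illustration. LearnAlign's exact formula is not given in the truncated snippet, and the success-rate term is deliberately omitted. The function names and the assumption that per-sample gradients are already available as flattened vectors are hypothetical.

```python
import numpy as np


def gradient_alignment_scores(per_sample_grads):
    """Cosine similarity of each sample's gradient with the pool's mean gradient.

    per_sample_grads: array of shape (num_samples, num_params), e.g. flattened
    per-sample policy gradients (hypothetical input; computing them is out of scope).
    Returns scores in [-1, 1]; a higher score means the sample's update direction
    agrees more with the pool's average update, i.e. it is more "representative".
    """
    grads = np.asarray(per_sample_grads, dtype=float)
    mean_grad = grads.mean(axis=0)
    denom = np.linalg.norm(grads, axis=1) * np.linalg.norm(mean_grad)
    return (grads @ mean_grad) / np.maximum(denom, 1e-12)  # guard against zero norms


def select_top_k(per_sample_grads, k):
    """Indices of the k samples with the highest alignment scores."""
    scores = gradient_alignment_scores(per_sample_grads)
    return np.argsort(-scores)[:k]


# Usage with three toy 2-D "gradients"; the last one points against the average.
toy_grads = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]
scores = gradient_alignment_scores(toy_grads)
chosen = select_top_k(toy_grads, k=2)
print(scores, chosen)
```

Cosine similarity is used here so that a sample's score does not depend on its gradient magnitude; a real selection method might instead weight by magnitude or combine this score with a learnability term.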