Reinforcement Learning Tutorial Code

Differential High Order Control Barrier Function-Based Safe Reinforcement Learning

Abstract: Safe reinforcement learning (RL) aims to learn policy while also ensuring the safety constraints. An increasingly common approach is to design a safety filter based on control barrier ...

IEEE

Prompt Optimization Through Reinforcement Learning for Generative Language Model Code Synthesis in Multi-Robot Systems

Abstract: In multi-robot systems (MRS) operating across various applications, real-time task allocation and path planning pose significant challenges, often requiring extensive human intervention ...

GitHub

DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research

DR Tulu-8B is the first open Deep Research (DR) model trained for long-form DR tasks. DR Tulu-8B matches OpenAI DR on long-form DR benchmarks. Feburary 9, 2026: 🔥 We released a free interactive demo ...

GitHub

learnalign_data_selection_for_llm_reinforcement_learning_with_improved_gradient_.md

To address data selection for RLVR post-training, LearnAlign is proposed—utilizing "gradient alignment" as a representativeness metric and "success rate $V(\xi)=p(1 ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results