240119 AA289 Annie Chen

03 Feb 2024

Reinforcement Learning for Autonomous Robots

Recent advances have produced autonomous robots that can perform tasks in controlled environments.
However, these robots often struggle to adapt to unexpected circumstances and novel scenarios during real-world deployment.
Reinforcement learning provides a framework for robots to adapt autonomously, but it is challenging to apply directly during deployment because it typically requires reward feedback, repeated retries, and learning from scratch.

Reset-free reinforcement learning addresses some of these challenges by allowing robots to practice both performing the task and undoing it, so that no human has to reset the environment between attempts.
Single-life reinforcement learning is introduced as a paradigm where the agent is given prior experience and must adapt to a new scenario without human intervention or supervision within a single episode.
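
The alternation behind reset-free practice can be illustrated with a toy example. This is only a sketch under stated assumptions: `ChainEnv` and `practice_without_resets` are hypothetical names, the 1-D world is invented for illustration, and the fixed +1 / -1 actions stand in for learned forward and backward policies.

```python
from dataclasses import dataclass


@dataclass
class ChainEnv:
    """Toy 1-D world (hypothetical): the agent sits at an integer position
    and the environment is never reset by a human."""
    position: int = 0
    goal: int = 3

    def step(self, action: int) -> int:
        # action is +1 or -1; the position is clamped to [0, goal]
        self.position = max(0, min(self.goal, self.position + action))
        return self.position


def practice_without_resets(env: ChainEnv, total_steps: int) -> int:
    """Alternate a forward phase (do the task) and a backward phase (undo it).

    Returns how many times the task was completed in one uninterrupted run.
    """
    going_forward = True
    completions = 0
    for _ in range(total_steps):
        # Stand-ins for learned policies: forward moves right, backward moves left.
        action = 1 if going_forward else -1
        position = env.step(action)
        if going_forward and position == env.goal:
            completions += 1
            going_forward = False  # task done: switch to undoing it
        elif not going_forward and position == 0:
            going_forward = True   # back at the start: practice the task again
    return completions
```

The backward phase plays the role that a human reset would otherwise play, which is why the loop never calls a reset function.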

The proposed method, Robust Autonomous Modulation (ROAM), leverages the expressive power of each behavior's value function to guide behavior selection during adaptation.
ROAM fine-tunes the value functions of pre-trained behaviors to correct for overestimation in out-of-distribution states.
The selection mechanism in ROAM quickly identifies appropriate behaviors in a given situation, eliminating the need for a separate high-level controller or adaptation module.
ROAM is agnostic to how the policies and value functions of the prior behaviors are trained and can provide improvements in new situations with either a small or large number of pre-trained behaviors.
The adaptation process in ROAM happens within a single episode at test time, allowing robots to adapt to a variety of situations without the need for extensive online training.
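
The value-based selection described above can be sketched as follows. This is a minimal illustration, not the method's actual implementation. It assumes behaviors are sampled with probability proportional to the exponentiated value estimates, and it assumes the fine-tuning correction takes the form of a cross-entropy penalty. The summary confirms neither detail. All names (`softmax`, `own_state_regularizer`, `select_behavior`) are hypothetical.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

State = Sequence[float]
ValueFn = Callable[[State], float]


def softmax(values: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over per-behavior value estimates."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(values)
    weights = [math.exp((v - top) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def own_state_regularizer(values: List[float], source_behavior: int) -> float:
    """Assumed fine-tuning penalty (cross-entropy, computed via log-sum-exp).

    Small when the behavior whose data produced this state has the highest
    value estimate; large when another behavior overestimates its value there.
    """
    top = max(values)
    log_norm = top + math.log(sum(math.exp(v - top) for v in values))
    return log_norm - values[source_behavior]


def select_behavior(
    state: State,
    value_fns: Sequence[ValueFn],
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick a behavior index using only the behaviors' own value functions.

    No separate high-level controller is consulted: each value function
    scores the current state and the index is sampled from the softmax.
    """
    rng = rng or random.Random()
    values = [value_fn(state) for value_fn in value_fns]
    probabilities = softmax(values, temperature)
    return rng.choices(range(len(value_fns)), weights=probabilities, k=1)[0]
```

For example, with two stand-in value functions `lambda s: -abs(s[0])` and `lambda s: -abs(s[0] - 1.0)`, calling `select_behavior([0.9], ...)` at a low temperature almost always returns the second behavior, because its value estimate is higher in that state. Calling the selector at every time step lets the robot switch behaviors within a single episode.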

ROAM is a simple algorithm for autonomous deployment-time adaptation.
ROAM outperforms prior methods in simulated and real-world experiments.
ROAM can adapt to novel situations within a single episode.
ROAM can handle dynamically changing payloads and unseen objects.
ROAM can leverage parts of each relevant behavior to complete tasks.
ROAM provides a mechanism for single-life test-time adaptation to unseen situations.