Stanford Seminar - Towards Safe and Efficient Learning in the Physical World

Safe Bayesian Optimization

  • Safe Bayesian optimization addresses the challenge of learning efficiently and safely by interacting with the real world.
  • It models unknown rewards and constraints with a stochastic process prior, such as Gaussian process models or Bayesian neural networks.
  • Uncertainty estimates from these models guide exploration within plausibly optimal regions while ensuring constraint satisfaction.
  • Safe Bayesian optimization has been successfully applied in various domains, including tuning scientific instruments, industrial manufacturing tasks, and quadruped robots.
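The safe exploration loop described above can be sketched in a few lines: fit a Gaussian process to past observations, and only evaluate candidates whose lower confidence bound clears a safety threshold, picking the most optimistic among them. This is a minimal illustration, not the exact algorithm from the talk; the toy kernel, `beta`, and the choice to let the reward double as the safety constraint are all simplifying assumptions.

```python
import numpy as np

def rbf_kernel(a, b, length=0.5):
    # Squared-exponential kernel between two 1-D input arrays.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Posterior mean and per-point std of a zero-mean GP."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_query, x_train)
    K_ss = rbf_kernel(x_query, x_query)
    alpha = np.linalg.solve(K, y_train)
    mean = K_s @ alpha
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))

def safe_ucb_step(x_train, y_train, candidates, safety_threshold, beta=2.0):
    """Pick the UCB-maximizing candidate whose lower confidence bound
    stays above the safety threshold; return None if no candidate is
    provably safe (the agent then stays with what it knows)."""
    mean, std = gp_posterior(x_train, y_train, candidates)
    lcb, ucb = mean - beta * std, mean + beta * std
    safe = lcb >= safety_threshold
    if not safe.any():
        return None
    idx = np.where(safe)[0]
    return candidates[idx[np.argmax(ucb[idx])]]
```

Candidates far from all observations have wide confidence intervals, so their lower bound falls below the threshold and they are excluded; exploration expands the safe region gradually from what has been observed.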

Learning Informative Priors

  • To scale safe Bayesian optimization to richer and more complex applications, learning informative priors is crucial.
  • The speaker proposes using Bayesian meta-learning to learn priors from related tasks.
  • A flexible neural architecture based on Transformer models predicts the score of the stochastic process prior.
  • Empirical results demonstrate the effectiveness of the proposed approach in meta-learning probabilistic models for sequential decision-making.
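The talk's approach uses a Transformer to predict the score of the stochastic process prior; the toy sketch below illustrates only the underlying idea of distilling related tasks into an informative prior, using a much simpler stand-in: estimating a shared prior mean and spread from function draws on a grid. The task family and all names here are illustrative.

```python
import numpy as np

def meta_learn_prior(task_functions, grid):
    """Estimate a prior mean and per-point std from related tasks
    by evaluating each task's function on a shared input grid."""
    samples = np.stack([f(grid) for f in task_functions])
    return samples.mean(axis=0), samples.std(axis=0)

# Related tasks: small random shifts of a common underlying function.
rng = np.random.default_rng(0)
tasks = [lambda x, s=s: np.sin(x) + s for s in rng.normal(0, 0.1, size=20)]
grid = np.linspace(0.0, np.pi, 25)
prior_mean, prior_std = meta_learn_prior(tasks, grid)
```

A new task from the same family can then start from `prior_mean` instead of an uninformed zero-mean prior, which is what makes downstream Bayesian optimization sample-efficient.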

Safe Reinforcement Learning

  • The speaker explores theoretical questions and parametric regimes of Bayesian optimization.
  • They discuss the importance of safety in tasks where conservative uncertainty estimates are crucial.
  • They introduce the idea of using the Gaussian process as a hyperprior and shaping it through key hyperparameters.
  • They propose a frontier search algorithm to find the optimal hyperparameter settings that maximize informativeness while ensuring calibration.
  • They demonstrate substantial acceleration in performance using meta-learning ideas in hardware experiments.
  • They explore the application of ideas from Bayesian optimization to learning-based control, specifically model-based reinforcement learning.
  • They introduce the concept of quantifying uncertainty in the dynamics of an unknown dynamical system using confidence sets.
  • They suggest using epistemic uncertainty in the transition model for introspective planning to avoid unsafe states.
  • They present an optimistic exploration protocol for model-based RL, where a policy is optimized under the most favorable realization within a set of plausible transition models.
  • They describe a method for reducing the problem of propagating uncertainty in the dynamics model to a standard approximate dynamic programming problem.
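The optimistic protocol above can be sketched as a one-step planner that optimizes jointly over actions and over an auxiliary variable selecting a next state inside the model's epistemic confidence interval, which is how the problem reduces to standard dynamic programming over an augmented action space. The dynamics model, reward, and coarse grid over the interval below are toy assumptions, not the talk's implementation.

```python
import numpy as np

def optimistic_one_step(state, actions, mean_dyn, std_dyn, reward, beta=2.0):
    """Return (best_action, best_value), maximizing reward over actions
    AND over plausible next states s' in [mu - beta*sigma, mu + beta*sigma].
    The variable eta picks where inside the confidence interval to land."""
    best_a, best_v = None, -np.inf
    for a in actions:
        mu, sigma = mean_dyn(state, a), std_dyn(state, a)
        for eta in (-1.0, 0.0, 1.0):  # coarse grid over the interval
            v = reward(mu + beta * eta * sigma)
            if v > best_v:
                best_a, best_v = a, v
    return best_a, best_v
```

Because `eta` enters the dynamics like an extra action, any standard policy-optimization routine can handle it; no special uncertainty-propagation machinery is needed.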

Optimistic Exploration

  • The speaker introduces a method for exploration in reinforcement learning called optimistic exploration.
  • In optimistic exploration, the agent chooses where within a set of plausible next states it wants to end up, effectively controlling its luck.
  • This approach is more efficient than standard policy gradients, especially when action penalties are used.
  • The speaker also discusses how optimistic exploration can be combined with pessimistic constraint satisfaction to ensure safety in reinforcement learning.
  • Experiments show that the optimistic-pessimistic algorithm outperforms other model-based and model-free algorithms in terms of task completion, constraint satisfaction, and safety during training.
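The optimistic-pessimistic combination can be sketched as follows: an action is admissible only if the constraint holds for the worst plausible next state, and among admissible actions the agent picks the one whose best plausible next state has the highest reward. All models below are illustrative toys, not the algorithm evaluated in the talk's experiments.

```python
import numpy as np

def opt_pess_action(state, actions, mean_dyn, std_dyn,
                    reward, constraint_ok, beta=2.0):
    """Optimism for the objective, pessimism for the constraint."""
    best_a, best_v = None, -np.inf
    for a in actions:
        mu, sigma = mean_dyn(state, a), std_dyn(state, a)
        plausible = [mu - beta * sigma, mu, mu + beta * sigma]
        # Pessimism: every plausible outcome must satisfy the constraint.
        if not all(constraint_ok(s) for s in plausible):
            continue
        # Optimism: value the action by its best plausible outcome.
        v = max(reward(s) for s in plausible)
        if v > best_v:
            best_a, best_v = a, v
    return best_a
```

The asymmetry is deliberate: being wrong about reward costs sample efficiency, while being wrong about the constraint costs safety during training.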

Bridging the Sim-to-Real Gap

  • The speaker concludes by discussing how optimistic exploration can be used to bridge the sim-to-real gap in reinforcement learning.
  • They propose a method for training reinforcement learning agents using a learned neural network prior that is regularized towards a physics simulator.
  • This approach outperforms uninformed neural network models and gray-box models that combine physics-informed priors with neural networks.
  • The speaker argues that models should learn to know what they don't know, which is a key challenge in developing safe and efficient agents that can learn by interacting with the real world.
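The simulator-regularized training idea can be illustrated with a deliberately tiny stand-in: fit a one-parameter model to real data while penalizing disagreement with the simulator on probe inputs. The linear model, closed-form solution, and `lam` weighting are simplifying assumptions standing in for the neural network prior and physics engine discussed in the talk.

```python
import numpy as np

def fit_sim_regularized(xs_real, ys_real, xs_probe, simulator, lam=1.0):
    """Fit y = w * x to real data with a penalty pulling predictions
    towards simulator(x) on probe inputs:
        minimize  sum (w*x - y)^2  +  lam * sum (w*x_p - sim(x_p))^2
    Closed form for this scalar toy model."""
    num = xs_real @ ys_real + lam * (xs_probe @ simulator(xs_probe))
    den = xs_real @ xs_real + lam * (xs_probe @ xs_probe)
    return num / den
```

With `lam = 0` the fit trusts only the (scarce) real data; as `lam` grows it falls back to the simulator wherever data is missing, which is exactly the behavior wanted from a physics-regularized prior.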
