In “Learning Agile Robotic Locomotion Skills by Imitating Animals”, we present a framework that takes a reference motion clip recorded from an animal (a dog, in this case) and uses RL to train a control policy that enables a robot to imitate the motion in the real world. By providing the system with different reference motions, we are able to train a quadruped robot to perform a diverse set of agile behaviors, ranging from fast walking gaits to dynamic hops and turns. The policies are trained primarily in simulation, and then transferred to the real world using a latent space adaptation technique that can efficiently adapt a policy using only a few minutes of data from the real robot. All simulations are performed using PyBullet.
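To make the imitation objective a bit more concrete, here is a minimal sketch of a pose-tracking reward of the kind often used for motion imitation. The post does not spell out the reward, so the function name `pose_tracking_reward`, the exponential form and the `scale` value are assumptions for illustration only, not the paper's exact formulation.

```python
import math

def pose_tracking_reward(robot_joints, reference_joints, scale=5.0):
    """Return a reward in (0, 1]; it equals 1 when the pose matches the reference.

    `scale` is a hypothetical tuning constant, not a value from the post.
    """
    if len(robot_joints) != len(reference_joints):
        raise ValueError("joint vectors must have the same length")
    # Sum of squared joint-angle differences between robot and reference clip.
    squared_error = sum(
        (q - q_ref) ** 2 for q, q_ref in zip(robot_joints, reference_joints)
    )
    # Exponentiate so that larger tracking errors decay smoothly toward 0.
    return math.exp(-scale * squared_error)

reference = [0.0, 0.6, -1.2]  # made-up joint angles (radians) for one frame
print(pose_tracking_reward(reference, reference))         # perfect tracking
print(pose_tracking_reward([0.1, 0.5, -1.0], reference))  # imperfect tracking
```

In an RL loop, a reward like this would be evaluated at every simulation step against the matching frame of the reference clip.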
Robots can use meta-learning to quickly adapt from simulation to various real world conditions.
See paper at https://arxiv.org/abs/2003.01239
Become the squishy glowing creature you’ve always wanted to be. Take control of the world’s inhabitants to solve puzzles. Overthrow the Mastermote! Check it out at https://store.steampowered.com/app/791240/Lumote
Check out the Wired article about the Alphabet ‘Everyday Robot’.
PyBullet and Bullet Physics are used in the collaboration, as discussed in the “Speeding up robot learning by 100x with simulation” paper and described in the sim-to-real slides and the “Challenges of Self-Supervision via Interaction in Robotics” slides.
Facebook AI Habitat is a new open source simulation platform created by Facebook AI that’s designed to train embodied agents (such as virtual robots) in photo-realistic 3D environments. The latest version adds Bullet Physics.
The github repo is here: https://github.com/facebookresearch/habitat-sim
This Robot Table Tennis project shows some very exciting research by Reza Mahjourian. It uses PyBullet and its Virtual Reality physics server support. Check out https://sites.google.com/corp/view/robottabletennis and his PhD thesis, Arxiv paper and video links here: https://www.cs.utexas.edu/~reza
TossingBot, a new paper by Google Robotics (Andy Zeng, Shuran Song, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser) using PyBullet.
We investigate whether a robot arm can learn to pick and throw arbitrary objects into selected boxes quickly and accurately. Throwing has the potential to increase the physical reachability and picking speed of a robot arm. However, precisely throwing arbitrary objects in unstructured settings presents many challenges: from acquiring reliable pre-throw conditions (e.g. initial pose of object in manipulator) to handling varying object-centric properties (e.g. mass distribution, friction, shape) and dynamics (e.g. aerodynamics). In this work, we propose an end-to-end formulation that jointly learns to infer control parameters for grasping and throwing motion primitives from visual observations (images of arbitrary objects in a bin) through trial and error. Within this formulation, we investigate the synergies between grasping and throwing (i.e., learning grasps that enable more accurate throws) and between simulation and deep learning (i.e., using deep networks to predict residuals on top of control parameters predicted by a physics simulator). The resulting system, TossingBot, is able to grasp and throw arbitrary objects into boxes located outside its maximum reach range at 500+ mean picks per hour (600+ grasps per hour with 85% throwing accuracy); and generalizes to new objects and target locations.
See https://arxiv.org/abs/1903.11239 and a video here: https://www.youtube.com/watch?v=f5Zn2Up2RjQ&feature=youtu.be
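The idea of predicting residuals on top of physics-based control parameters can be sketched in a few lines. This is only an illustration under strong assumptions: the physics estimate is ideal drag-free projectile motion with equal release and landing height, and `toy_residual` is a hypothetical stand-in for a trained network, not anything from the TossingBot code.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2

def ballistic_release_speed(distance_m, angle_rad=math.radians(45.0)):
    """Ideal release speed to land `distance_m` away (no drag, equal heights)."""
    return math.sqrt(G * distance_m / math.sin(2.0 * angle_rad))

def residual_release_speed(distance_m, predict_residual):
    """Physics estimate plus a learned correction for unmodeled effects."""
    return ballistic_release_speed(distance_m) + predict_residual(distance_m)

def toy_residual(distance_m):
    # Hypothetical placeholder for a network output: a small speed correction
    # that grows with distance. A real system would learn this from data.
    return 0.05 * distance_m

print(ballistic_release_speed(2.0))               # physics-only estimate
print(residual_release_speed(2.0, toy_residual))  # physics plus residual
```

The appeal of this split is that the analytic model supplies a reasonable first guess, so the learned part only has to account for what the model leaves out, such as aerodynamics or grasp-dependent effects.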
A new ICRA 2019 paper by INRIA/CNRS (Eloïse Dalin, Pierre Desreumaux, Jean-Baptiste Mouret) using PyBullet:
See a preprint here https://hal.inria.fr/hal-02084619
A new paper by Google Robotics using PyBullet.
Interest in derivative-free optimization (DFO) and “evolutionary strategies” (ES) has recently surged in the Reinforcement Learning (RL) community, with growing evidence that they match state of the art methods for policy optimization tasks. However, blackbox DFO methods suffer from high sampling complexity since they require a substantial number of policy rollouts for reliable updates. They can also be very sensitive to noise in the rewards, actuators or the dynamics of the environment. In this paper we propose to replace the standard ES derivative-free paradigm for RL based on simple reward-weighted averaged random perturbations for policy updates, that has recently become a subject of voluminous research, by an algorithm where gradients of blackbox RL functions are estimated via regularized regression methods. In particular, we propose to use L1/L2 regularized regression-based gradient estimation to exploit sparsity and smoothness, as well as LP decoding techniques for handling adversarial stochastic and deterministic noise. Our methods can be naturally aligned with sliding trust region techniques for efficient samples reuse to further reduce sampling complexity. This is not the case for standard ES methods requiring independent sampling in each epoch. We show that our algorithms can be applied in locomotion tasks, where training is conducted in the presence of substantial noise, e.g. for learning in sim transferable stable walking behaviors for quadruped robots or training quadrupeds how to follow a path. We further demonstrate our methods on several OpenAI Gym Mujoco RL tasks. We manage to train effective policies even if up to 25% of all measurements are arbitrarily corrupted, where standard ES methods produce sub-optimal policies or do not manage to learn at all. Our empirical results are backed by theoretical guarantees.
See also https://arxiv.org/abs/1903.02993
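As a rough illustration of regression-based gradient estimation, the sketch below fits a gradient to observed reward differences using L2-regularized (ridge) least squares. It covers only the L2 case and leaves out LP decoding and trust regions. The function names, the toy linear reward and all hyperparameter values are assumptions, not taken from the paper.

```python
import numpy as np

def regression_gradient(f, theta, num_samples=50, sigma=0.1, ridge=1e-3, seed=0):
    """Estimate the gradient of a blackbox f at theta by ridge regression."""
    rng = np.random.default_rng(seed)
    dim = theta.shape[0]
    # Each row is a random offset from the current parameters.
    perturbations = sigma * rng.standard_normal((num_samples, dim))
    base = f(theta)
    # Observed change in f for each perturbed parameter vector.
    deltas = np.array([f(theta + p) - base for p in perturbations])
    # Closed-form solution of: min_g ||P g - deltas||^2 + ridge * ||g||^2
    gram = perturbations.T @ perturbations + ridge * np.eye(dim)
    return np.linalg.solve(gram, perturbations.T @ deltas)

TRUE_GRADIENT = np.array([1.0, -2.0, 0.5])

def toy_reward(theta):
    # Toy blackbox with a known gradient, standing in for a policy rollout.
    return float(TRUE_GRADIENT @ theta)

estimate = regression_gradient(toy_reward, np.zeros(3))
print(estimate)
```

Because the fit is a regression over stored (perturbation, reward change) pairs rather than an average over fresh samples, earlier rollouts could in principle be kept in the data matrix and reused, which is the sample-reuse benefit the abstract mentions.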
A new ICRA 2019 paper using PyBullet:
Abhik Singla, Shounak Bhattacharya and Dhaivat Dholakiya are with the Robert Bosch Centre for Cyber-Physical Systems, IISc, Bangalore, India.
Humans and animals are believed to use a very minimal set of trajectories to perform a wide variety of tasks including walking. Our main objective in this paper is twofold: 1) obtain an effective tool to realize these basic motion patterns for quadrupedal walking, called the kinematic motion primitives (kMPs), via trajectories learned from deep reinforcement learning (D-RL) and 2) realize a set of behaviors, namely trot, walk, gallop and bound from these kinematic motion primitives in our custom four legged robot, called the ‘Stoch’. D-RL is a data driven approach, which has been shown to be very effective for realizing all kinds of robust locomotion behaviors, both in simulation and in experiment. On the other hand, kMPs are known to capture the underlying structure of walking and yield a set of derived behaviors. We first generate walking gaits from D-RL, which uses policy gradient based approaches. We then analyze the resulting walking by using principal component analysis. We observe that the kMPs extracted from PCA followed a similar pattern irrespective of the type of gaits generated. Leveraging this underlying structure, we then realize walking in Stoch by a straightforward reconstruction of joint trajectories from kMPs. This type of methodology improves the transferability of these gaits to real hardware, lowers the computational overhead on-board, and also avoids multiple training iterations by generating a set of derived behaviors from a single learned gait.
See also https://arxiv.org/abs/1810.03842 and a video here: https://www.youtube.com/watch?v=kiLKSqI4KhE&feature=youtu.be
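The PCA step described in the abstract can be sketched as follows: treat one gait cycle as a matrix of joint angles over time, extract a few principal components as primitives, and rebuild the joint trajectories from them. This is one possible reading under stated assumptions; the synthetic 8-joint gait, the function names and the choice of two primitives are hypothetical and not taken from the paper.

```python
import numpy as np

def extract_kmps(joint_trajectories, num_primitives):
    """PCA via SVD on a (num_timesteps, num_joints) trajectory matrix."""
    mean = joint_trajectories.mean(axis=0)
    centered = joint_trajectories - mean
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Time-varying primitives, one column per retained component.
    primitives = u[:, :num_primitives] * s[:num_primitives]
    # Per-joint weights that mix the primitives back into joint angles.
    mixing = vt[:num_primitives]
    explained = float((s[:num_primitives] ** 2).sum() / (s ** 2).sum())
    return mean, primitives, mixing, explained

def reconstruct(mean, primitives, mixing):
    """Rebuild joint trajectories from the mean pose and the primitives."""
    return mean + primitives @ mixing

# Synthetic gait: 8 joints driven by two underlying periodic signals.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 200)
basis = np.stack([np.sin(t), np.cos(t)], axis=1)        # shape (200, 2)
gait = basis @ rng.standard_normal((2, 8)) + rng.standard_normal(8)

mean, primitives, mixing, explained = extract_kmps(gait, num_primitives=2)
print(explained)  # fraction of variance captured by the two primitives
print(np.abs(reconstruct(mean, primitives, mixing) - gait).max())
```

On a robot, only the mean pose, the primitives and the mixing weights would need to be stored, which is in line with the abstract's point about lower on-board computational overhead.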