Part 3 – Reinforcement learning for robotics applications

Reinforcement learning is an extremely active research field. In this article, I will review some of the latest research publications on reinforcement learning for robotics applications. Want more good news? Most of these publications are available in open access!

Year 2019

Data-efficient Learning of Morphology and Controller for a Microrobot

Authors: Thomas Liao, Grant Wang, Brian Yang, Rene Lee, Kristofer Pister, Sergey Levine, and Roberto Calandra

Research group: University of California, Berkeley / Facebook AI research

Task: Locomotion – jointly learning the leg morphology and the gait controller of a walking microrobot

Algorithm: Hierarchical Process Constrained Batch Bayesian Optimization (HPC-BBO) – Bayesian optimization with Gaussian Processes (sketched after this entry)

Sample efficiency: 25

Supporting web page: https://sites.google.com/view/learning-robot-morphology

Robot used: 6-legged microrobot

Virtual environment: V-REP
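
To give a feel for the Bayesian optimization loop behind HPC-BBO, here is a minimal, generic sketch (my own illustration, not the authors' code): a Gaussian Process surrogate is fit to the evaluations collected so far, and the next candidate design is picked by maximizing an upper-confidence-bound acquisition over random samples. The objective function and the 4-dimensional parameter space below are placeholders.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def walking_speed(params):
    # Placeholder objective: in HPC-BBO this would be the measured walking
    # speed of a fabricated or simulated morphology + gait.
    return -np.sum((params - 0.3) ** 2)

dim, n_init, n_iter = 4, 5, 20
rng = np.random.default_rng(0)

X = rng.uniform(0.0, 1.0, size=(n_init, dim))   # initial random designs
y = np.array([walking_speed(x) for x in X])     # their measured scores

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for it in range(n_iter):
    gp.fit(X, y)
    # Upper-confidence-bound acquisition evaluated on random candidates
    candidates = rng.uniform(0.0, 1.0, size=(2000, dim))
    mean, std = gp.predict(candidates, return_std=True)
    ucb = mean + 2.0 * std
    x_next = candidates[np.argmax(ucb)]
    y_next = walking_speed(x_next)              # one expensive evaluation
    X = np.vstack([X, x_next])
    y = np.append(y, y_next)

print("best design:", X[np.argmax(y)], "score:", y.max())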

Manipulation by Feel: Touch-Based Control with Deep Predictive Models

Authors: Stephen Tian, Frederik Ebert, Dinesh Jayaraman, Mayur Mudigonda, Chelsea Finn, Roberto Calandra, and Sergey Levine

Research group: Google / University of California, Berkeley

Task: Ball repositioning, analog stick deflection, rolling a 20-sided dice

Algorithm: Model Predictive Control over a learned deep predictive model (model-based RL; sketched after this entry)

Sample efficiency: ?

Supporting web page:
https://sites.google.com/view/deeptactilempc

Robot used: 3-axis CNC machine

Virtual environment: ?
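
Model predictive control with a learned model boils down to: imagine many candidate action sequences with the model, score them, and execute only the first action of the best sequence. Here is a generic random-shooting MPC sketch with a hand-coded stand-in for the dynamics model (the paper learns a deep tactile prediction model instead):

import numpy as np

rng = np.random.default_rng(1)

def predict_next_state(state, action):
    # Stand-in for a learned predictive model.
    return state + 0.1 * action

def cost(state, goal):
    return np.sum((state - goal) ** 2)

def mpc_action(state, goal, horizon=5, n_samples=500, action_dim=2):
    sequences = rng.uniform(-1.0, 1.0, size=(n_samples, horizon, action_dim))
    total_costs = np.zeros(n_samples)
    for i, seq in enumerate(sequences):
        s = state.copy()
        for a in seq:
            s = predict_next_state(s, a)
            total_costs[i] += cost(s, goal)
    best = np.argmin(total_costs)
    return sequences[best, 0]    # execute only the first action (receding horizon)

state, goal = np.zeros(2), np.array([1.0, -0.5])
for step in range(20):
    action = mpc_action(state, goal)
    state = predict_next_state(state, action)   # in reality: the robot / environment
print("final state:", state)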

Soft Actor-Critic Algorithms and Applications

Authors: Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, and Sergey Levine

Research group: Google Brain / University of California, Berkeley

Task: Walking and rotating a valve

Algorithm: Soft Actor-Critic (off-policy; sketched after this entry)

Sample efficiency: 160k to 300k

Supporting web page:
https://sites.google.com/view/sac-and-applications/

Robot used: Minitaur robot / Dynamixel Claw

Virtual environment: ?
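
The heart of Soft Actor-Critic is an entropy-regularized Bellman backup: the critic target adds an entropy bonus -alpha * log pi(a'|s') and takes the minimum of two target Q-networks. A tiny numpy illustration of just that target computation, with made-up numbers standing in for network outputs:

import numpy as np

# Made-up quantities standing in for network outputs on a batch of transitions.
rewards      = np.array([1.0, 0.0, 0.5])     # r(s, a)
dones        = np.array([0.0, 0.0, 1.0])     # episode-termination flags
q1_next      = np.array([2.0, 1.5, 0.3])     # Q_target1(s', a'), a' ~ pi(.|s')
q2_next      = np.array([2.2, 1.4, 0.4])     # Q_target2(s', a')
logp_next    = np.array([-1.2, -0.8, -2.0])  # log pi(a'|s')
gamma, alpha = 0.99, 0.2                     # discount and temperature

# Soft Bellman target: r + gamma * (min(Q1, Q2) - alpha * log pi), masked at terminals.
soft_value = np.minimum(q1_next, q2_next) - alpha * logp_next
target_q   = rewards + gamma * (1.0 - dones) * soft_value
print(target_q)

The critics are then regressed toward target_q with the usual TD loss; the actor maximizes the same entropy-regularized objective.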

Hierarchical Policy Design for Sample-Efficient Learning of Robot Table Tennis Through Self-Play

Authors: Reza Mahjourian, Risto Miikkulainen, Nevena Lazic, Sergey Levine, and Navdeep Jaitly.

Research group: Google Brain / University of Texas, Austin

Task: Playing table tennis

Algorithm: Proximal Policy Optimization (PPO) and Augmented Random Search (ARS; sketched after this entry)

Sample efficiency: 24k

Supporting web page:
https://sites.google.com/view/robottabletennis

Robot used: ?

Virtual environment: ?
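
Augmented Random Search, one of the two optimizers used here, is refreshingly simple: perturb the policy parameters in random directions, run an episode with the + and - version of each perturbation, and step along the return-weighted average direction. A bare-bones sketch (basic random search without ARS's reward normalization and state whitening, on a dummy return function):

import numpy as np

rng = np.random.default_rng(7)

def episode_return(policy_params):
    # Dummy stand-in for running one episode with a linear policy and summing rewards.
    target = np.array([0.5, -0.3, 0.8])
    return -np.sum((policy_params - target) ** 2)

params, step_size, noise_std, n_directions = np.zeros(3), 0.1, 0.05, 8
for iteration in range(200):
    directions = rng.normal(size=(n_directions, 3))
    update = np.zeros(3)
    for delta in directions:
        r_plus = episode_return(params + noise_std * delta)
        r_minus = episode_return(params - noise_std * delta)
        update += (r_plus - r_minus) * delta
    params += step_size / (n_directions * noise_std) * update

print("learned parameters:", params)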

Year 2018

QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation

Authors: Dmitry Kalashnikov, Alex Irpan, Peter Pastor, Julian Ibarz, Alexander Herzog, Eric Jang, Deirdre Quillen, Ethan Holly, Mrinal Kalakrishnan, Vincent Vanhoucke, and Sergey Levine

Research group: Google / University of California, Berkeley

Task: Grasping (pick and place)

Algorithm: QT-Opt (off-policy) + Cross Entropy Method (sketched after this entry)

Sample efficiency: 580k

Supporting web page:
https://sites.google.com/view/end2endgrasping
https://ai.googleblog.com/2018/06/scalable-deep-reinforcement-learning.html

Robot used: KUKA LBR IIWA

Virtual environment: Bullet Physics simulator
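
QT-Opt has no actor network: at decision time, the action is found by running the Cross Entropy Method against the learned Q-function. A generic CEM maximization sketch, with a dummy quadratic Q-function standing in for the deep network:

import numpy as np

rng = np.random.default_rng(2)

def q_value(state, actions):
    # Dummy stand-in for the learned Q(s, a); QT-Opt evaluates a deep network here.
    best_action = np.array([0.2, -0.4, 0.7])
    return -np.sum((actions - best_action) ** 2, axis=1)

def cem_argmax_action(state, action_dim=3, n_iters=5, n_samples=64, n_elite=6):
    mean, std = np.zeros(action_dim), np.ones(action_dim)
    for _ in range(n_iters):
        actions = rng.normal(mean, std, size=(n_samples, action_dim))
        scores = q_value(state, actions)
        elites = actions[np.argsort(scores)[-n_elite:]]   # keep the best samples
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mean   # approximate argmax_a Q(s, a)

print(cem_argmax_action(state=None))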

Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations

Authors: Aravind Rajeswaran, Vikash Kumar, Abhishek Gupta, Giulia Vezzani, John Schulman, Emanuel Todorov, and Sergey Levine

Research group: University of Washington Seattle, OpenAI, University of California Berkeley

Task: Dexterous tasks (object relocation, in-hand manipulation, door opening, hammering a nail)

Algorithm: Demo Augmented Policy Gradient (DAPG)

Sample efficiency: 3 to 6 hours

Supporting web page:
https://sites.google.com/view/deeprl-dexterous-manipulation

Robot used: None

Virtual environment: MuJoCo physics simulator (24 degree of freedom ADROIT hand)

Dexterous Manipulation with Deep Reinforcement Learning: Efficient, General, and Low-Cost

Authors: Henry Zhu, Abhishek Gupta, Aravind Rajeswaran, Sergey Levine, and Vikash Kumar.

Research group: University of Washington Seattle, University of California Berkeley, Google Brain

Task: Dexterous tasks (valve rotation, vertical box flipping, door opening)

Algorithm: Demo Augmented Policy Gradient (DAPG)

Sample efficiency: very low (only a few trials)

Supporting web page:
https://sites.google.com/view/deeprl-handmanipulation

Robot used: D’Claw (9 dof) / Allegro (16 dof)

Virtual environment: None

Setting up a Reinforcement Learning Task with a Real-World Robot

Authors: A. Rupam Mahmood, Dmytro Korenkevych, Brent J. Komer, and James Bergstra

Research group: Kindred Inc

Task: Reaching a target position

Algorithm: Trust Region Policy Optimization (TRPO)

Sample efficiency: 750 min

Supporting web page: None

Robot used: UR5

Virtual environment: None

Residual Reinforcement Learning for Robot Control

Authors: Tobias Johannink, Shikhar Bahl, Ashvin Nair, Jianlan Luo, Avinash Kumar, Matthias Loskyll, Juan Aparicio Ojea, Eugen Solowjow, and Sergey Levine.

Research group: Siemens Corporation / University of California, Berkeley / Hamburg University of Technology

Task: Inserting a foam block between 2 other foam blocks

Algorithm: Twin Delayed Deep Deterministic Policy Gradients (TD3), learning a residual on top of a hand-engineered controller (sketched after this entry)

Sample efficiency: 3k

Supporting web page: https://residualrl.github.io/

Robot used: Sawyer robot arm

Virtual environment: MuJoCo
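
The idea of residual reinforcement learning is that the executed command is the sum of a fixed hand-engineered controller and a learned correction, and only the correction is trained (with TD3 in this paper). A minimal sketch of that superposition, with placeholder controllers:

import numpy as np

def hand_engineered_controller(state, goal):
    # Conventional feedback controller (e.g. a P-controller toward the goal).
    return 2.0 * (goal - state)

def learned_residual_policy(state, params):
    # Stand-in for the learned policy network; in the paper it is trained with TD3.
    return params @ state

def residual_action(state, goal, params):
    # Executed command = hand-designed part + learned correction.
    return hand_engineered_controller(state, goal) + learned_residual_policy(state, params)

state, goal = np.array([0.1, 0.0]), np.array([0.5, 0.2])
params = np.zeros((2, 2))   # before training, the residual is zero -> pure classical control
print(residual_action(state, goal, params))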

Composable Action-Conditioned Predictors: Flexible Off-Policy Learning for Robot Navigation

Authors: Gregory Kahn, Adam Villaflor, Pieter Abbeel, and Sergey Levine.

Research group: University of California, Berkeley

Task: Collision avoidance for remote controlled car

Algorithm: Composable Action-Conditioned Predictors (CAPs) – off-policy

Sample efficiency: 11 hours

Supporting web page: https://github.com/gkahn13/CAPs

Robot used: Remote controlled car

Virtual environment: CARLA (robot car simulator)

Self-Supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation

Authors: Gregory Kahn, Adam Villaflor, Bosen Ding, Pieter Abbeel, and Sergey Levine

Research group: University of California, Berkeley

Task: Autonomous driving

Algorithm: Generalized Computation Graph – combines model-free and model-based RL

Sample efficiency: 4 hours

Supporting web page: https://github.com/gkahn13/gcg

Robot used: Remote controlled car

Virtual environment: None

Data-Efficient Hierarchical Reinforcement Learning

Authors: Ofir Nachum, Shixiang Gu, Honglak Lee, and Sergey Levine.

Research group: Google Brain

Task: Maze navigation

Algorithm: HIerarchical Reinforcement learning with Off-policy correction (HIRO) (sketched after this entry)

Sample efficiency: 2 to 4 million steps

Supporting web page: https://sites.google.com/view/efficient-hrl

Robot used: None

Virtual environment: MuJoCo
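
In HIRO, a high-level policy emits a goal (a desired change of state) every c steps, and the low-level policy is rewarded for reaching it; the off-policy goal relabeling that gives the paper its name is omitted here. A stripped-down sketch of the two-level loop and the intrinsic reward:

import numpy as np

rng = np.random.default_rng(3)

def high_level_policy(state):
    # Stand-in: proposes a desired state change (the "goal") for the next c steps.
    return rng.uniform(-1.0, 1.0, size=state.shape)

def low_level_policy(state, goal):
    # Stand-in: goal-conditioned controller; in HIRO both levels are trained with TD3.
    return 0.5 * goal

def env_step(state, action):
    return state + action                    # toy dynamics

state, c, intrinsic_rewards = np.zeros(2), 5, []
for t in range(20):
    if t % c == 0:                           # the high level acts every c steps
        goal = high_level_policy(state)
        goal_state = state + goal            # absolute state the low level should reach
    action = low_level_policy(state, goal)
    next_state = env_step(state, action)
    # Intrinsic reward for the low level: negative distance to the commanded goal state.
    intrinsic_rewards.append(-np.linalg.norm(goal_state - next_state))
    state = next_state

print("mean intrinsic reward:", np.mean(intrinsic_rewards))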

Composable Deep Reinforcement Learning for Robotic Manipulation

Authors: Tuomas Haarnoja, Vitchyr Pong, Aurick Zhou, Murtaza Dalal, Pieter Abbeel, and Sergey Levine

Research group: University of California, Berkeley / OpenAI

Task: Pushing objects, Lego stacking, obstacle avoidance

Algorithm: Soft Q-learning

Sample efficiency: 2 hours

Supporting web page: https://sites.google.com/view/composing-real-world-policies/

Robot used: Sawyer robot

Virtual environment: MuJoCo

Vision-based multi-task manipulation for inexpensive robots using end-to-end learning from demonstration

Authors: Rouhollah Rahmatizadeh, Pooya Abolghasemi, Ladislau Boloni, and Sergey Levine

Research group: University of California, Berkeley / University of Central Florida

Task: Picking and pushing objects

Algorithm: Behaviour cloning (not reinforcement learning) – Variational Autoencoder + Generative Adversarial Network + Long Short-Term Memory (VAE + GAN + LSTM) (sketched after this entry)

Sample efficiency: ?

Supporting web page: https://github.com/rrahmati/roboinstruct-2

Robot used: 6-axis Lynxmotion AL5D robot with a two-finger gripper

Virtual environment: None
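
Behaviour cloning is plain supervised learning on demonstrations: fit a policy that maps observations to the demonstrated actions. The paper uses a VAE + GAN + LSTM over camera images; in the toy sketch below, a linear least-squares fit on made-up vectors stands in for all of that:

import numpy as np

rng = np.random.default_rng(4)

# Made-up demonstration data: observation -> action pairs from a teleoperator.
observations = rng.normal(size=(200, 6))
true_mapping = rng.normal(size=(6, 2))
actions = observations @ true_mapping + 0.01 * rng.normal(size=(200, 2))

# Behaviour cloning reduces to regression: find W minimizing ||obs @ W - actions||^2.
W, *_ = np.linalg.lstsq(observations, actions, rcond=None)

new_obs = rng.normal(size=(1, 6))
print("cloned action:", new_obs @ W)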

Year 2017

Data-efficient Deep Reinforcement Learning for Dexterous Manipulation

Authors: Ivaylo Popov, Nicolas Heess, Timothy Lillicrap, Roland Hafner, Gabriel Barth-Maron, Matej Vecerik, Thomas Lampe, Yuval Tassa, Tom Erez, and Martin Riedmiller

Research group: DeepMind

Task: Picking and stacking a lego brick

Algorithm: Deep Deterministic Policy Gradient (DDPG)

Sample efficiency: 200k to 300k

Supporting web page: None

Robot used: None

Virtual environment: MuJoCo (9 dof robotic arm)

Learning Robotic Manipulation of Granular Media

Authors: Connor Schenck, Jonathan Tompson, Dieter Fox, and Sergey Levine

Research group: University of Washington, Google, University of California, Berkeley

Task: Scooping beans from a tray and dumping them into another tray

Algorithm: Convolutional Neural Network (not reinforcement learning)

Sample efficiency: ?

Supporting web page: None

Robot used: KUKA LBR IIWA

Virtual environment: ?

Deep Reinforcement Learning for Tensegrity Robot Locomotion

Authors: Marvin Zhang, Xinyang Geng, Jonathan Bruce, Ken Caluwaerts, Massimo Vespignani, Vytas SunSpiral, Pieter Abbeel, and Sergey Levine

Research group: University of California, Berkeley / OpenAI

Task: Locomotion task

Algorithm: Mirror Descent Guided Policy Search (MDGPS)

Sample efficiency: 200 (2 hours)

Supporting web page: http://rll.berkeley.edu/drl_tensegrity/

Robot used: SUPERball tensegrity robot

Virtual environment: NASA Tensegrity Robotics Toolkit

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

Authors: Chelsea Finn, Pieter Abbeel, and Sergey Levine

Research group: University of California, Berkeley / OpenAI

Task: Locomotion task

Algorithm: Model-Agnostic Meta-Learning (MAML) (sketched after this entry)

Sample efficiency: ?

Supporting web page: https://sites.google.com/view/maml

Robot used: None

Virtual environment: MuJoCo
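
MAML meta-learns an initialization such that one or a few gradient steps on a new task already give a good policy; the meta-gradient flows through the inner update. The sketch below is a first-order simplification on a toy 1-D regression family (gradients written out by hand), meant only to convey the inner-loop/outer-loop structure, not the full second-order method:

import numpy as np

rng = np.random.default_rng(5)

def task_batch(slope, n=20):
    x = rng.uniform(-1.0, 1.0, size=n)
    return x, slope * x                      # regression target: y = slope * x

def grad(w, x, y):
    # d/dw of the mean squared error of the 1-parameter model y_hat = w * x
    return np.mean(2.0 * x * (w * x - y))

w_meta, lr_inner, lr_outer = 0.0, 0.1, 0.01
for meta_step in range(2000):
    meta_grads = []
    for _ in range(5):                       # sample a few tasks per meta-update
        slope = rng.uniform(-2.0, 2.0)
        x_tr, y_tr = task_batch(slope)
        w_adapted = w_meta - lr_inner * grad(w_meta, x_tr, y_tr)   # inner adaptation step
        x_val, y_val = task_batch(slope)
        meta_grads.append(grad(w_adapted, x_val, y_val))           # first-order meta-gradient
    w_meta -= lr_outer * np.mean(meta_grads)

print("meta-learned init:", w_meta)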

Year 2016

Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection

Authors: Sergey Levine, Peter Pastor, Alex Krizhevsky, and Deirdre Quillen

Research group: Google / University of California, Berkeley

Task: Grasping (pick and place)

Algorithm: Cross Entropy Method

Sample efficiency: 800k

Supporting web page:
https://sites.google.com/site/brainrobotdata/home
https://ai.googleblog.com/2016/03/deep-learning-for-robots-learning-from.html

Robot used: ?

Virtual environment: None

End-to-End Training of Deep Visuomotor Policies

Authors: Sergey Levine, Chelsea Finn, Trevor Darrell, and Pieter Abbeel

Research group: Google / University of California, Berkeley

Task: Inserting a block into a shape sorting cube; screwing a cap onto a bottle; fitting the claw of a toy hammer under a nail with various grasps; placing a coat hanger on a rack

Algorithm: Guided Policy Search

Sample efficiency: 156 to 288 (3 to 4 hours)

Supporting web page:
https://sites.google.com/site/visuomotorpolicy/

Robot used: PR2

Virtual environment: ?

Deep Visual Foresight for Planning Robot Motion

Authors: Chelsea Finn and Sergey Levine

Research group: Google Brain / University of California, Berkeley

Task: Pushing objects

Algorithm: Model Predictive Control

Sample efficiency: 50,000

Supporting web page:
https://sites.google.com/site/robotforesight/

Robot used: 7-DoF arm

Virtual environment: None

Optimal Control with Learned Local Models: Application to Dexterous Manipulation

Authors: Vikash Kumar, Emanuel Todorov, and Sergey Levine

Research group: University of Washington / University of California, Berkeley

Task: Object manipulation (rotation)

Algorithm: Linear-Gaussian controllers + linear quadratic regulator (LQR) (sketched after this entry)

Sample efficiency: ?

Supporting web page:
https://bair.berkeley.edu/blog/2018/08/31/dexterous-manip/

Robot used: Pneumatically-actuated, tendon-driven 24-DoF hand

Virtual environment: None
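
The controllers in this work come from fitting local linear dynamics and running an LQR-style backward pass. For reference, here is a generic finite-horizon discrete-time LQR backward recursion on fixed, hand-picked A, B, Q, R matrices (not the paper's fitted time-varying models):

import numpy as np

# Hand-picked double-integrator dynamics x' = A x + B u and quadratic costs.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)            # state cost
R = 0.1 * np.eye(1)      # control cost
horizon = 50

# Backward Riccati recursion giving time-varying feedback gains K_t.
P = Q.copy()
gains = []
for t in reversed(range(horizon)):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)
    gains.append(K)
gains.reverse()

# Forward pass: apply u_t = -K_t x_t from an initial state.
x = np.array([1.0, 0.0])
for t in range(horizon):
    u = -gains[t] @ x
    x = A @ x + B @ u
print("final state:", x)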

Learning dexterous manipulation for a soft robotic hand from human demonstrations

Authors: Abhishek Gupta, Clemens Eppner, Sergey Levine, and Pieter Abbeel

Research group: University of California, Berkeley

Task: Turning a valve, pushing beads on an abacus and grasping a bottle from a table

Algorithm: Guided Policy Search

Sample efficiency: ?

Supporting web page:
https://bair.berkeley.edu/blog/2018/08/31/dexterous-manip/

Robot used: RBO Hand 2

Virtual environment: None

Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates

Authors: Shixiang Gu, Ethan Holly, Timothy Lillicrap, and Sergey Levine

Research group: Google Brain / University of California, Berkeley / University of Cambridge / MPI Tübingen / Google DeepMind

Task: Reaching, door pushing and pulling, and picking up a stick suspended in the air by a string and placing it near a target

Algorithm: Asynchronous Normalized Advantage Functions (NAF) (sketched after this entry)

Sample efficiency: 20 trials

Supporting web page:
https://sites.google.com/site/deeproboticmanipulation/

Robot used: 6-DoF Kinova JACO arm

Virtual environment: MuJoCo
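
NAF makes Q-learning workable with continuous actions by forcing the Q-function into a form that is quadratic in the action, so the greedy action is simply the predicted mean mu(s). A tiny illustration of that decomposition, with made-up numbers standing in for network outputs:

import numpy as np

# Made-up network outputs for one state s.
V = 1.5                                  # state value V(s)
mu = np.array([0.2, -0.1])               # action that maximizes Q(s, .)
L = np.array([[1.0, 0.0],
              [0.3, 0.8]])               # lower-triangular factor, P = L L^T is positive semi-definite
P = L @ L.T

def q_value(action):
    # NAF form: Q(s, a) = V(s) - 0.5 * (a - mu)^T P (a - mu)
    diff = action - mu
    return V - 0.5 * diff @ P @ diff

print(q_value(mu))                       # the maximum, equal to V(s)
print(q_value(np.array([1.0, 1.0])))     # any other action scores lower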

Deep Object-Centric Representations for Generalizable Robot Learning

Authors: Coline Devin, Pieter Abbeel, Trevor Darrell, and Sergey Levine

Research group: University of California, Berkeley / OpenAI

Task: Pouring into mugs, sweeping fruit into a dustpan

Algorithm: Guided Policy Search

Sample efficiency: ?

Supporting web page:
https://sites.google.com/berkeley.edu/object-representations

Robot used: PR2

Virtual environment: ?

Year 2015

Learning Contact-Rich Manipulation Skills with Guided Policy Search

Authors: Sergey Levine, Nolan Wagener, and Pieter Abbeel

Research group: Google / University of California, Berkeley

Task: stacking large lego blocks; threading wooden rings onto a tight-fitting peg; assembling a toy airplane by inserting the wheels into a slot; inserting a shoe tree into a shoe; screwing caps onto bottles

Algorithm: Guided Policy Search

Sample efficiency: 20 to 25 (10 minutes)

Supporting web page:
http://rll.berkeley.edu/icra2015gps/index.htm

Robot used: PR2

Virtual environment: ?

Year 2011

Learning to Control a Low-Cost Manipulator Using Data-Efficient Reinforcement Learning

Authors: Marc Peter Deisenroth, Carl Edward Rasmussen, and Dieter Fox

Research group: University of Washington / University of Cambridge

Task: stacking cubes and collision avoidance

Algorithm: PILCO: Gaussian process dynamics model (model-based)

Sample efficiency: 30

Supporting web page: None

Robot used: off-the-shelf robotic manipulator ($370)

Virtual environment: ?

Policy search for learning robot control using sparse data

Authors: B. Bischoff, D. Nguyen-Tuong, H. Van Hoof, A. McHutchon, C. E. Rasmussen, A. Knoll, J. Peters, and M. P. Deisenroth

Research group: Bosch Corporate Research, TU Darmstadt, Univ. of Cambridge, TU München, Max Planck Institute for Intelligent Systems, Imperial College London

Task: stacking cubes and collision avoidance

Algorithm: PILCO: Gaussian process dynamics model (model-based)

Sample efficiency: ?

Supporting web page: None

Robot used: Festo Robotino XT

Virtual environment: ?

PILCO: A Model-Based and Data-Efficient Approach to Policy Search

Authors: Marc Peter Deisenroth and Carl Edward Rasmussen

Research group: University of Washington / University of Cambridge

Task: Cart-pole swing-up and balancing (1 DOF)

Algorithm: PILCO – Gaussian process dynamics model (model-based; simplified sketch after this entry)

Sample efficiency: 7 (17 s)

Supporting web page: None

Robot used: Real cart-pole system

Virtual environment: ?
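
PILCO owes its data efficiency to fitting a Gaussian process dynamics model on every observed transition and optimizing the policy against long-horizon predictions of that model. The rough sketch below keeps only the skeleton (fit a GP model, score candidate policies by rolling them out in the model, keep the best, collect one more real rollout); the point-mass dynamics and linear policy are placeholders, and PILCO's analytic uncertainty propagation is dropped entirely in favour of plain mean predictions:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(6)

def real_step(state, action):
    # Placeholder "real system" (a damped point mass), unknown to the learner.
    pos, vel = state
    vel = 0.95 * vel + 0.1 * action
    return np.array([pos + 0.1 * vel, vel])

def rollout(policy_w, step_fn, horizon=25):
    state, total_cost, transitions = np.array([1.0, 0.0]), 0.0, []
    for _ in range(horizon):
        action = float(policy_w @ state)          # simple linear policy
        next_state = step_fn(state, action)
        transitions.append((np.append(state, action), next_state - state))
        total_cost += next_state[0] ** 2          # drive the position to zero
        state = next_state
    return total_cost, transitions

# Collect one random real rollout, then alternate model fitting and policy search.
policy_w = np.zeros(2)
_, data = rollout(rng.normal(size=2), real_step)
X = np.array([t[0] for t in data]); Y = np.array([t[1] for t in data])

for episode in range(5):
    gp = GaussianProcessRegressor().fit(X, Y)     # GP dynamics model: (s, a) -> delta s
    model_step = lambda s, a: s + gp.predict(np.append(s, a).reshape(1, -1))[0]
    # Crude random policy search in the model (PILCO uses analytic gradients instead).
    candidates = [policy_w] + [policy_w + 0.5 * rng.normal(size=2) for _ in range(30)]
    costs = [rollout(w, model_step)[0] for w in candidates]
    policy_w = candidates[int(np.argmin(costs))]
    real_cost, data = rollout(policy_w, real_step)  # one more real rollout
    X = np.vstack([X, [t[0] for t in data]]); Y = np.vstack([Y, [t[1] for t in data]])
    print("episode", episode, "real cost", round(real_cost, 2))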
