Part 3 – Reinforcement learning for robotics applications

Reinforcement learning is an extremely active research field. In this article, I will review some of the latest research publications on reinforcement learning for robotics applications. Want more good news? Most of these publications are available in open access!

Year 2019

Simultaneously Learning Vision and Feature-based Control Policies for Real-world Ball-in-a-Cup

Authors: Devin Schwab, Tobias Springenberg, Murilo F. Martins, Thomas Lampe, Michael Neunert, Abbas Abdolmaleki, Tim Hertweck, Roland Hafner, Francesco Nori, and Martin Riedmiller

Research group: Google DeepMind

Task: Swing a tennis ball attached by a string and throw it in a cup.

Algorithm: Extension of Scheduled Auxiliary Control (SAC-X)

Sample efficiency: 8636 episodes (72h)

Supporting web page: https://sites.google.com/view/rss-2019-sawyer-bic/

Robot used: 7-Dof Sawyer robot arm

Virtual environment: MuJoCo

Regularized Hierarchical Policies for Compositional Transfer in Robotics

Authors: Markus Wulfmeier, Abbas Abdolmaleki, Roland Hafner, Jost Tobias Springenberg, Michael Neunert, Tim Hertweck, Thomas Lampe, Noah Siegel, Nicolas Heess, and Martin Riedmiller

Research group: Google DeepMind

Task: Stacking blocks on top of each other, cleaning the scene (i.e. placing cubes inside a box with a closed lid)

Algorithm: Regularized Hierarchical Policy Optimization (RHPO)

Sample efficiency: 5,000 to 80,000 episodes (depending on the task)

Supporting web page: https://sites.google.com/view/rhpo/

Robot used: Kinova Jaco + Rethink Robotics Sawyer robot

Virtual environment: MuJoCo

End-to-End Robotic Reinforcement Learning without Reward Engineering

Authors: Avi Singh, Larry Yang, Kristian Hartikainen, Chelsea Finn, and Sergey Levine

Research group: University of California, Berkeley

Task: Push a mug onto a coaster, place a cloth over a box, insert a book in an empty slot on a bookshelf

Algorithm: VICE-RAQ (Variational Inverse Control with Events – Reinforcement learning with Active Queries)

Sample efficiency: 500k timesteps

Supporting web page: https://sites.google.com/view/reward-learning-rl/home
https://bair.berkeley.edu/blog/2019/05/28/end-to-end/

Robot used: Rethink Sawyer

Virtual environment: MuJoCo

Learning Latent Plans from Play

Authors: Corey Lynch, Mohi Khansari, Ted Xiao, Vikash Kumar, Jonathan Tompson, Sergey Levine, and Pierre Sermanet

Research group: Google Brain

Task: Grasping, opening and closing a sliding door or a drawer, knocking over an object, pushing buttons, sweeping

Algorithm: Play-LMP (play-supervised Latent Motor Plans) (not RL)

Sample efficiency: ?

Supporting web page: https://learning-from-play.github.io/

Robot used: NA

Virtual environment: MuJoCo HAPTIX

Robust Recovery Controller for a Quadrupedal Robot using Deep Reinforcement Learning

Authors: Jemin Hwangbo, Joonho Lee, Alexey Dosovitskiy, Dario Bellicoso, Vassilios Tsounis, Vladlen Koltun, and Marco Hutter

Research group: ETH Zurich / Intel

Task: Recovery maneuver from a fall (standing up and walking) / locomotion

Algorithm: Trust Region Policy Optimization algorithm (TRPO) + Generalized Advantage Estimator (GAE)

Sample efficiency: 1500 to 2400

Supporting web page: None

Robot used: quadrupedal robot ANYmal (dog-sized quadrupedal system with 12 degrees of freedom)

Virtual environment: Robotic Artificial Intelligence (RAI), a software framework developed by ETH Zurich (https://bitbucket.org/leggedrobotics/rai/src/master/)
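
Since TRPO + GAE appears in several of the legged-robot papers below, here is a minimal numpy sketch of the GAE-λ advantage computation (my own toy illustration, not the authors' code):

import numpy as np

def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.

    rewards: shape (T,); values: V(s_t) for t = 0..T-1;
    last_value: bootstrap value V(s_T) after the final step.
    """
    values = np.append(values, last_value)
    advantages = np.zeros(len(rewards))
    gae = 0.0
    # Backward recursion: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages

print(gae_advantages(np.ones(5), np.zeros(5), 0.0))

The λ parameter trades bias (λ = 0, one-step TD) against variance (λ = 1, Monte Carlo returns).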

Learning agile and dynamic motor skills for legged robots

Authors: Joonho Lee, Jemin Hwangbo, and Marco Hutter

Research group: ETH Zurich

Task: Recovery maneuver from a fall (standing up and walking)

Algorithm: Trust Region Policy Optimization algorithm (TRPO) + Generalized Advantage Estimator (GAE)

Sample efficiency: ?

Supporting web page: None

Robot used: quadrupedal robot ANYmal (dog-sized quadrupedal system with 12 degrees of freedom)

Virtual environment: Robotic Artificial Intelligence (RAI), a software framework developed by ETH Zurich (https://bitbucket.org/leggedrobotics/rai/src/master/)

Data-efficient Learning of Morphology and Controller for a Microrobot

Authors: Thomas Liao, Grant Wang, Brian Yang, Rene Lee, Kristofer Pister, Sergey Levine, and Roberto Calandra

Research group: University of California, Berkeley / Facebook AI research

Task: Locomotion

Algorithm: Hierarchical Process Constrained Batch Bayesian Optimization (HPC-BBO) – Bayesian optimization and Gaussian Process

Sample efficiency: 25

Supporting web page: https://sites.google.com/view/learning-robot-morphology

Robot used: Six-legged microrobot

Virtual environment: V-REP

Manipulation by Feel: Touch-Based Control with Deep Predictive Models

Authors: Stephen Tian, Frederik Ebert, Dinesh Jayaraman, Mayur Mudigonda, Chelsea Finn, Roberto Calandra, and Sergey Levine

Research group: Google / University of California, Berkeley

Task: Ball re-positioning, analog stick deflection, rolling a 20-sided die

Algorithm: Model Predictive Control (Model-based RL)

Sample efficiency: ?

Supporting web page:
https://sites.google.com/view/deeptactilempc

Robot used: 3-axis CNC machine

Virtual environment: ?

Soft Actor-Critic Algorithms and Applications

Authors: Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, and Sergey Levine

Research group: Google Brain / University of California, Berkeley

Task: Walking and rotating a valve

Algorithm: Soft Actor Critic (off-policy)

Sample efficiency: 160k to 300k

Supporting web page:
https://sites.google.com/view/sac-and-applications/

Robot used: Minitaur robot / Dynamixel Claw

Virtual environment: ?
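
SAC's sample efficiency on hardware comes from an off-policy, entropy-regularized critic update. A minimal numpy sketch of the critic target, assuming batched transitions and a next action sampled from the current policy (illustrative only, not the authors' implementation):

import numpy as np

def sac_critic_target(rewards, dones, next_q1, next_q2, next_logp,
                      gamma=0.99, alpha=0.2):
    """Soft Bellman backup: the -alpha * log pi term rewards entropy.

    next_q1, next_q2: twin target-critic values Q'(s', a') for a' ~ pi;
    next_logp: log pi(a'|s').
    """
    next_v = np.minimum(next_q1, next_q2) - alpha * next_logp
    return rewards + gamma * (1.0 - dones) * next_v

print(sac_critic_target(np.ones(3), np.zeros(3),
                        np.ones(3), 2 * np.ones(3), -np.ones(3)))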

Self-Supervised Exploration via Disagreement

Authors: Deepak Pathak, Dhiraj Gandhi, and Abhinav Gupta

Research group: University of California, Berkeley / Facebook AI Research

Task: Exploration / manipulation

Algorithm: Self supervised exploration (not RL)

Sample efficiency: NA

Supporting web page: https://pathak22.github.io/exploration-by-disagreement/

Robot used: 7 DoF Sawyer arm

Virtual environment: MuJoCo / Unity
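
The exploration signal here is disagreement among an ensemble of learned forward-dynamics models: the agent is rewarded for visiting states where the models' predictions diverge. A toy numpy sketch with the ensemble stubbed out as random linear models (purely illustrative):

import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, ENSEMBLE = 4, 2, 5

# Stand-ins for an ensemble of learned forward models f_i(s, a) -> s'
models = [rng.normal(size=(STATE_DIM + ACTION_DIM, STATE_DIM))
          for _ in range(ENSEMBLE)]

def intrinsic_reward(state, action):
    """Variance of ensemble next-state predictions, averaged over dims."""
    x = np.concatenate([state, action])
    preds = np.stack([x @ W for W in models])  # (ENSEMBLE, STATE_DIM)
    return preds.var(axis=0).mean()            # high where models disagree

print(intrinsic_reward(rng.normal(size=STATE_DIM), rng.normal(size=ACTION_DIM)))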

Hierarchical Policy Design for Sample-Efficient Learning of Robot Table Tennis Through Self-Play

Authors: Reza Mahjourian, Risto Miikkulainen, Nevena Lazic, Sergey Levine, and Navdeep Jaitly.

Research group: Google Brain / University of Texas, Austin

Task: Playing table tennis

Algorithm: Proximal Policy Optimization (PPO) and Augmented Random Search (ARS)

Sample efficiency: 24k

Supporting web page:
https://sites.google.com/view/robottabletennis

Robot used: None

Virtual environment: ?

Risk Averse Robust Adversarial Reinforcement Learning

Authors: Xinlei Pan, Daniel Seita, Yang Gao, and John Canny

Research group: University of California, Berkeley

Task: Autonomous driving

Algorithm: Risk Averse Robust Adversarial Reinforcement Learning (RARARL) + Ensemble DQN (BSDQN)

Sample efficiency: ?

Supporting web page: https://sites.google.com/view/rararl

Robot used: None

Virtual environment: TORCS (The Open Racing Car Simulator)

REPLAB: A Reproducible Low-Cost Arm Benchmark Platform for Robotic Learning

Authors: Brian Yang, Jesse Zhang, Vitchyr Pong, Sergey Levine, and Dinesh Jayaraman

Research group: University of California, Berkeley

Task: Grasping (pick and place)

Algorithm: Twin Delayed Deep Deterministic policy gradient (TD3) (model-free)

Sample efficiency: N/A

Supporting web page:
https://sites.google.com/view/replab/

Robot used: REPLAB

Virtual environment: N/A
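
For reference, TD3's critic target combines twin critics (take the minimum), target-policy smoothing (clipped noise on the target action) and, not shown here, delayed actor updates. A minimal sketch with toy stand-ins for the learned networks:

import numpy as np

def td3_target(rewards, dones, next_states, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    """TD3 critic target: smoothed target action + min of twin critics."""
    a_next = target_actor(next_states)
    noise = np.clip(np.random.normal(0.0, noise_std, a_next.shape),
                    -noise_clip, noise_clip)
    a_next = np.clip(a_next + noise, -act_limit, act_limit)
    q_next = np.minimum(target_q1(next_states, a_next),
                        target_q2(next_states, a_next))
    return rewards + gamma * (1.0 - dones) * q_next

# Toy stubs standing in for learned networks
actor = lambda s: np.tanh(s.sum(axis=1, keepdims=True))
q = lambda s, a: s.sum(axis=1) + a.sum(axis=1)
s = np.random.rand(4, 3)
print(td3_target(np.random.rand(4), np.zeros(4), s, actor, q, q))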

PyRobot: An Open-source Robotics Framework for Research and Benchmarking

Authors: Adithyavairavan Murali, Tao Chen, Kalyan Vasudev Alwala, Dhiraj Gandhi, Lerrel Pinto, Saurabh Gupta, and Abhinav Gupta

Research group: Carnegie Mellon University / Facebook AI Research

Task: Manipulation and navigation

Algorithm: Robotic benchmark platform

Sample efficiency: N/A

Supporting web page: https://www.pyrobot.org/

Robot used: LoCoBot and Sawyer

Virtual environment: N/A

TossingBot: Learning to Throw Arbitrary Objects with Residual Physics

Authors: Andy Zeng, Shuran Song, Johnny Lee, Alberto Rodriguez, and Thomas Funkhouser

Research group: Princeton University / Google / Columbia University / Massachusetts Institute of Technology

Task: Grasping objects and tossing them into boxes

Algorithm: Deep Neural Networks

Sample efficiency: 10k (14 hours)

Supporting web page: https://tossingbot.cs.princeton.edu/

Robot used: UR5 Universal Robot + RG2 gripper

Virtual environment: PyBullet

DensePhysNet: Learning Dense Physical Object Representations via Multi-step Dynamic Interactions

Authors: Zhenjia Xu, Jiajun Wu, Andy Zeng, Joshua B. Tenenbaum, and Shuran Song

Research group: Shanghai Jiao Tong University / Massachusetts Institute of Technology / Princeton University / Google / MIT Center for Brains, Minds and Machines / Columbia University

Task: Sliding and colliding objects

Algorithm: DensePhysNet

Sample efficiency: 8k

Supporting web page: http://www.zhenjiaxu.com/DensePhysNet/

Robot used: UR5 Universal Robot + RG2 gripper + Intel RealSense D415 camera

Virtual environment: PyBullet

Learning ambidextrous robot grasping policies

Authors: Jeffrey Mahler, Michael Danielczuk, Bill DeRose, Vishal Satish, Ken Goldberg, Matthew Matl, and Stephen McKinley

Research group: University of California, Berkeley

Task: Ambidextrous grasping

Algorithm: Grasp Quality Convolutional Neural Networks (GQ-CNNs) (not RL)

Sample efficiency: ?

Supporting web page: http://berkeleyautomation.github.io/dex-net/#dexnet_4

Robot used: ABB YuMi

Virtual environment: N/A

Year 2018

QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation

Authors: Dmitry Kalashnikov, Alex Irpan, Peter Pastor, Julian Ibarz, Alexander Herzog, Eric Jang, Deirdre Quillen, Ethan Holly, Mrinal Kalakrishnan, Vincent Vanhoucke, and Sergey Levine

Research group: Google / University of California, Berkeley

Task: Grasping (pick and place)

Algorithm: QT-Opt (off-policy) + Cross-Entropy Method

Sample efficiency: 580k

Supporting web page:
https://sites.google.com/view/end2endgrasping
https://ai.googleblog.com/2018/06/scalable-deep-reinforcement-learning.html

Robot used: Kuka LBR IIWA

Virtual environment: Bullet Physics simulator
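
Instead of a separate actor network, QT-Opt maximizes the learned Q-function at decision time with the Cross-Entropy Method. A minimal sketch of that inner optimization loop, with a toy Q-function standing in for the learned critic:

import numpy as np

def cem_argmax_q(q_fn, state, act_dim, iters=3, pop=64, elite_frac=0.1):
    """Refit a Gaussian over actions toward the elites under Q(state, a)."""
    mu, sigma = np.zeros(act_dim), np.ones(act_dim)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        actions = np.random.normal(mu, sigma, size=(pop, act_dim))
        scores = np.array([q_fn(state, a) for a in actions])
        elites = actions[np.argsort(scores)[-n_elite:]]
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mu  # approximate argmax_a Q(state, a)

# Toy Q-function with a known maximum at a = (0.5, -0.3)
q_toy = lambda s, a: -np.sum((a - np.array([0.5, -0.3])) ** 2)
print(cem_argmax_q(q_toy, state=None, act_dim=2))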

Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations

Authors: Aravind Rajeswaran, Vikash Kumar, Abhishek Gupta, Giulia Vezzani, John Schulman, Emanuel Todorov, and Sergey Levine

Research group: University of Washington Seattle, OpenAI, University of California Berkeley

Task: Dexterous tasks (Object relocation, in-hand manipulation, door opening, nail and hammer)

Algorithm: Demo Augmented Policy Gradient (DAPG)

Sample efficiency: 3 to 6 hours

Supporting web page:
https://sites.google.com/view/deeprl-dexterous-manipulation

Robot used: None

Virtual environment: MuJoCo physics simulator (24 degree of freedom ADROIT hand)

Learning Dexterous In-Hand Manipulation

Authors: Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Rafal Jozefowicz, Bob McGrew, Jakub Pachocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, Jonas Schneider, Szymon Sidor, Josh Tobin, Peter Welinder, Lilian Weng, and Wojciech Zaremba

Research group: OpenAI

Task: Rotating objects

Algorithm: Proximal Policy Optimization (PPO) + Generalized Advantage Estimator (GAE) + domain randomization

Sample efficiency: 5 hours of training

Supporting web page: https://openai.com/blog/learning-dexterity/

Robot used: 24 DOF Shadow Dexterous Hand

Virtual environment: MuJoCo physics engine + Unity renderer

Sim-to-Real via Sim-to-Sim: Data-efficient Robotic Grasping via Randomized-to-Canonical Adaptation

Authors: Stephen James, Paul Wohlhart, Mrinal Kalakrishnan, Dmitry Kalashnikov, Alex Irpan, Julian Ibarz, Sergey Levine, Raia Hadsell, and Konstantinos Bousmalis

Research group: Imperial College London / X Development / Google Brain / DeepMind / University of California, Berkeley

Task: Grasping objects unseen during training

Algorithm: Randomized-to-Canonical Adaptation Networks (RCANs) + QT-Opt

Sample efficiency: 5,000 grasps in the real-world

Supporting web page: https://sites.google.com/view/rcan/

Robot used: Kuka IIWA

Virtual environment: PyBullet

Sim-to-Real Reinforcement Learning for Deformable Object Manipulation

Authors: Jan Matas, Stephen James, and Andrew J. Davison

Research group: Imperial College London

Task: Diagonal folding of a cloth, draping a cloth over a hanger, folding a cloth up to a designated mark

Algorithm: DDPG with various extensions

Sample efficiency: 150 epochs

Supporting web page: https://sites.google.com/view/sim-to-real-deformable

Robot used: Kinova Mico 7DOF robotic arm

Virtual environment: PyBullet

Learning to Grasp Without Seeing

Authors: Adithyavairavan Murali, Yin Li, Dhiraj Gandhi, and Abhinav Gupta

Research group: Carnegie Mellon University

Task: Grasping

Algorithm: Recurrent Conditional AutoEncoder (not RL)

Sample efficiency: 28k grasps

Supporting web page: http://www.cs.cmu.edu/afs/cs/user/amurali/www/projects/GraspingWithoutSeeing/

Robot used: Fetch robot + Robotiq gripper

Virtual environment: None ?

Robot Learning in Homes: Improving Generalization and Reducing Dataset Bias

Authors: Abhinav Gupta, Adithyavairavan Murali, Dhiraj Gandhi, and Lerrel Pinto.

Research group: Carnegie Mellon University

Task: Grasping

Algorithm: ?

Sample efficiency: 28k grasps

Supporting web page: None

Robot used: Dobot Magician robotic arm + Kobuki mobile base + Intel R200 RGBD camera

Virtual environment: None ?

Sim-to-Real: Learning Agile Locomotion For Quadruped Robots

Authors: Jie Tan, Tingnan Zhang, Erwin Coumans, Atil Iscen, Yunfei Bai, Danijar Hafner, Steven Bohez, and Vincent Vanhoucke

Research group: Google Brain / Google DeepMind

Task: Walking and running

Algorithm: Proximal Policy Optimization (PPO)

Sample efficiency: 4 million steps

Supporting web page:
https://github.com/bulletphysics/bullet3/tree/master/examples/pybullet/gym/pybullet_envs/minitaur/envs

Robot used: Minitaur (Ghost Robotics)

Virtual environment: PyBullet

Hardware Conditioned Policies for Multi-Robot Transfer Learning

Authors: Tao Chen, Adithyavairavan Murali, and Abhinav Gupta

Research group: Carnegie Mellon University

Task: Peg insertion + reaching

Algorithm: Hardware Conditioned Policies (HCP) + Proximal Policy Optimization (PPO) + Deep deterministic policy gradient (DDPG)

Sample efficiency: ?

Supporting web page: https://sites.google.com/view/robot-transfer-hcp

Robot used: Sawyer + Fetch

Virtual environment: MuJoCo

Learning by Playing – Solving Sparse Reward Tasks from Scratch

Authors: Martin Riedmiller, Roland Hafner, Thomas Lampe, Michael Neunert, Jonas Degrave, Tom Van de Wiele, Volodymyr Mnih, Nicolas Heess, and Jost Tobias Springenberg

Research group: Google DeepMind

Task: Stacking blocks, cleaning up objects into a box with a lid

Algorithm: Scheduled Auxiliary Control (SAC-X)

Sample efficiency: 10,000 episodes

Supporting web page: https://deepmind.com/blog/learning-playing/

Robot used: Jaco robot arm

Virtual environment: MuJoCo
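
The key mechanism in SAC-X is a scheduler that sequences auxiliary intentions (reach, grasp, lift, ...) so that the sparse main-task reward is eventually discovered. A toy epsilon-greedy sketch of the scheduling idea; the intention names are hypothetical and the real scheduler is itself learned:

import numpy as np

rng = np.random.default_rng(1)
INTENTIONS = ["reach", "grasp", "lift", "main_task"]

# Running record of main-task return observed after executing each intention
returns = {name: [0.0] for name in INTENTIONS}

def schedule(eps=0.3):
    """Epsilon-greedy over intentions (stand-in for the learned scheduler)."""
    if rng.random() < eps:
        return str(rng.choice(INTENTIONS))
    means = {k: np.mean(v) for k, v in returns.items()}
    return max(means, key=means.get)

for episode in range(10):
    intention = schedule()
    # ... execute that intention's policy for a fixed horizon on the robot ...
    main_reward = rng.random()  # placeholder for the observed main-task return
    returns[intention].append(main_reward)
    print(episode, intention)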

Deep reinforcement learning for vision-based robotic grasping: A simulated comparative evaluation of off-policy methods

Authors: Deirdre Quillen, Eric Jang, Ofir Nachum, Chelsea Finn, Julian Ibarz, and Sergey Levine

Research group: University of California, Berkeley / Google Brain

Task: Grasping

Algorithm: supervised learning, Q-learning, Monte Carlo, Corrected Monte Carlo, DDPG, Path Consistency Learning (PCL)

Sample efficiency: 1 million grasps

Supporting web page:
https://github.com/bulletphysics/bullet3/blob/master/examples/pybullet/gym/pybullet_envs/bullet/kuka_diverse_object_gym_env.py

Robot used: None

Virtual environment: PyBullet

Dexterous Manipulation with Deep Reinforcement Learning: Efficient, General, and Low-Cost

Authors: Henry Zhu, Abhishek Gupta, Aravind Rajeswaran, Sergey Levine, and Vikash Kumar.

Research group: University of Washington Seattle, University of California Berkeley, Google Brain

Task: Dexterous tasks (valve rotation, vertical box flipping, door opening)

Algorithm: Demo Augmented Policy Gradient (DAPG)

Sample efficiency: very low (only a few trials)

Supporting web page:
https://sites.google.com/view/deeprl-handmanipulation

Robot used: D’Claw (9 dof) / Allegro (16 dof)

Virtual environment: None

Setting up a Reinforcement Learning Task with a Real-World Robot

Authors: A. Rupam Mahmood, Dmytro Korenkevych, Brent J. Komer, and James Bergstra

Research group: Kindred Inc

Task: Reaching a target position

Algorithm: Trust Region Policy Optimization (TRPO)

Sample efficiency: 750 min

Supporting web page: None

Robot used: UR5

Virtual environment: None

Residual Reinforcement Learning for Robot Control

Authors: Tobias Johannink, Shikhar Bahl, Ashvin Nair, Jianlan Luo, Avinash Kumar, Matthias Loskyll, Juan Aparicio Ojea, Eugen Solowjow, and Sergey Levine.

Research group: Siemens Corporation / University of California, Berkeley / Hamburg University of Technology

Task: Inserting a foam block between 2 other foam blocks

Algorithm: Twin delayed deep deterministic policy gradients (TD3) – model-free

Sample efficiency: 3k

Supporting web page: https://residualrl.github.io/

Robot used: Sawyer robot arm

Virtual environment: MuJoCo

Sample-Efficient Learning of Nonprehensile Manipulation Policies via Physics-Based Informed State Distributions

Authors: Lerrel Pinto, Aditya Mandalika, Brian Hou, and Siddhartha Srinivasa.

Research group: Carnegie Mellon University / University of Washington

Task: Rearrangement manipulation tasks (reach a target object through a clutter of other movable objects)

Algorithm: Learning with Planned Episodic Resets (LeaPER)

Sample efficiency: 5,000 episodes

Supporting web page: https://www.groundai.com/project/sample-efficient-learning-of-nonprehensile-manipulation-policies-via-physics-based-informed-state-distributions/

Robot used: 7-DOF robot manipulator

Virtual environment: MuJoCo

Composable Action-Conditioned Predictors: Flexible Off-Policy Learning for Robot Navigation

Authors: Gregory Kahn, Adam Villaflor, Pieter Abbeel, and Sergey Levine.

Research group: University of California, Berkeley

Task: Collision avoidance for remote controlled car

Algorithm: Composable Action-Conditioned Predictors (CAPs) – off-policy

Sample efficiency: 11 hours

Supporting web page: https://github.com/gkahn13/CAPs

Robot used: Remote controlled car

Virtual environment: CARLA (robot car simulator)

Self-Supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation

Authors: Gregory Kahn, Adam Villaflor, Bosen Ding, Pieter Abbeel, and Sergey Levine

Research group: University of California, Berkeley

Task: Autonomous driving

Algorithm: Generalized Computation Graph – combines model-free and model-based learning

Sample efficiency: 4 hours

Supporting web page: https://github.com/gkahn13/gcg

Robot used: Remote controlled car

Virtual environment: None

Data-Efficient Hierarchical Reinforcement Learning

Authors: Ofir Nachum, Shixiang Gu, Honglak Lee, and Sergey Levine.

Research group: Google Brain

Task: Maze navigation

Algorithm: HIerarchical Reinforcement learning with Off-policy correction (HIRO)

Sample efficiency: 2 to 4 million steps

Supporting web page: https://sites.google.com/view/efficient-hrl

Robot used: None

Virtual environment: MuJoCo

Composable Deep Reinforcement Learning for Robotic Manipulation

Authors: Tuomas Haarnoja, Vitchyr Pong, Aurick Zhou, Murtaza Dalal, Pieter Abbeel, and Sergey Levine

Research group: University of California, Berkeley / OpenAI

Task: Pushing objects, Lego stacking, Obstacle avoidance

Algorithm: Soft Q learning

Sample efficiency: 2 hours

Supporting web page: https://sites.google.com/view/composing-real-world-policies/

Robot used: Sawyer robot

Virtual environment: MuJoCo

Vision-based multi-task manipulation for inexpensive robots using end-to-end learning from demonstration

Authors: Rouhollah Rahmatizadeh, Pooya Abolghasemi, Ladislau Boloni, and Sergey Levine

Research group: University of California, Berkeley / University of Central Florida

Task: Picking and pushing objects

Algorithm: Behaviour cloning (not reinforcement learning) – Variational Autoencoder + Generative Adversarial Network + Long Short Term Memory (VAE + GAN + LSTM)

Sample efficiency: ?

Supporting web page: https://github.com/rrahmati/roboinstruct-2

Robot used: 6-axis Lynxmotion AL5D robot with a two-finger gripper

Virtual environment: None

Learning Synergies between Pushing and Grasping with Self-Supervised Deep Reinforcement Learning

Authors: Andy Zeng, Shuran Song, Stefan Welker, Johnny Lee, Alberto Rodriguez, and Thomas Funkhouser

Research group: Princeton University / Google / Massachusetts Institute of Technology

Task: Pushing and grasping objects

Algorithm: Deep Q Network + CNN

Sample efficiency: 2.5k

Supporting web page: http://vpg.cs.princeton.edu/

Robot used: UR5 robot arm with an RG2 gripper

Virtual environment: V-REP

Rearrangement with Nonprehensile Manipulation Using Deep Reinforcement Learning

Authors: Weihao Yuan, Johannes A. Stork, Danica Kragic, Michael Y. Wang, and Kaiyu Hang

Research group: Hong Kong University of Science and Technology and HKUST Robotics Institute / KTH Royal Institute of Technology

Task: Pushing an object and avoiding collisions with obstacles + moving target

Algorithm: DQN

Sample efficiency: 1k

Supporting web page: https://www.semanticscholar.org/paper/Rearrangement-with-Nonprehensile-Manipulation-Using-Yuan-Stork/52f665301aa1e217d3069a5c97a84933be6ef178

Robot used: Baxter robot arm

Virtual environment: Gazebo
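
The DQN backup used here is the standard one-step Q-learning target. As a quick reminder (a generic sketch, not the paper's code):

import numpy as np

def dqn_targets(rewards, dones, next_q_values, gamma=0.99):
    """y = r + gamma * max_a' Q_target(s', a') for a batch of transitions.

    next_q_values: (batch, n_actions) outputs of the target network.
    """
    return rewards + gamma * (1.0 - dones) * next_q_values.max(axis=1)

y = dqn_targets(np.array([0.0, 1.0]), np.array([0.0, 1.0]),
                np.array([[0.2, 0.7], [0.1, 0.4]]))
print(y)  # [0.693, 1.0]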

Year 2017

Using simulation and domain adaptation to improve efficiency of deep robotic grasping

Authors: Konstantinos Bousmalis, Alex Irpan, Paul Wohlhart, Yunfei Bai, Matthew Kelcey, Mrinal Kalakrishnan, Laura Downs, Julian Ibarz, Peter Pastor, Kurt Konolige, Sergey Levine, and Vincent Vanhoucke

Research group: Google Brain / X Development

Task: Grasping objects unseen during training

Algorithm: GraspGAN (not reinforcement learning)

Sample efficiency: 25,000 grasps

Supporting web page: https://sites.google.com/view/graspgan
https://ai.googleblog.com/2017/10/closing-simulation-to-reality-gap-for.html

Robot used: 6 Kuka IIWA robot arms

Virtual environment: PyBullet

Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards

Authors: Mel Vecerik, Todd Hester, Jonathan Scholz, Fumin Wang, Olivier Pietquin, Bilal Piot, Nicolas Heess, Thomas Rothörl, Thomas Lampe, and Martin Riedmiller.

Research group: Google DeepMind

Task: Insertion task

Algorithm: Deep Deterministic Policy Gradient + demonstration

Sample efficiency: ?

Supporting web page: None

Robot used: Sawyer

Virtual environment: MuJoCo

Transferring End-to-End Visuomotor Control from Simulation to Real World for a Multi-Stage Task

Authors: Stephen James, Andrew J. Davison, and Edward Johns

Research group: Imperial College London / Dyson Robotics Lab

Task: Grasping a cube and dropping it into a basket

Algorithm: CNN + LSTM (not reinforcement learning)

Sample efficiency: 1 million images (trained only in simulation, then deployed directly in the real world)

Supporting web page: None

Robot used: Kinova Jaco (?)

Virtual environment: V-REP

Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World

Authors: Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, and Pieter Abbeel

Research group: OpenAI / University of California, Berkeley

Task: Detecting an object's location and grasping it

Algorithm: CNN (VGG-16)

Sample efficiency: 5,000 training samples

Supporting web page: https://sites.google.com/view/domainrandomization/

Robot used: Fetch robot arm

Virtual environment: MuJoCo
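
Domain randomization trains the perception network on thousands of randomly perturbed renderings so that, at test time, the real world looks like just one more variation. A schematic sketch of the sampling loop; the parameter names and ranges are illustrative, not the paper's:

import numpy as np

rng = np.random.default_rng(2)

def sample_sim_params():
    """Draw one randomized simulator configuration (illustrative ranges)."""
    return {
        "light_position": rng.uniform(-1.0, 1.0, size=3),
        "camera_jitter": rng.normal(0.0, 0.05, size=3),
        "texture_id": int(rng.integers(0, 1000)),   # random surface textures
        "n_distractors": int(rng.integers(0, 10)),  # clutter objects in view
    }

for step in range(3):
    params = sample_sim_params()
    # render_scene(params) -> image, object_pose; then one supervised step
    print(params)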

Time-Contrastive Networks: Self-Supervised Learning from Video

Authors: Pierre Sermanet, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, Eric Jang, Stefan Schaal, and Sergey Levine

Research group: Google Brain / University of Southern California

Task: Pouring, dish placement, human pose imitation

Algorithm: Time-Contrastive Networks (TCN) + PILQR

Sample efficiency: ?

Supporting web page: https://sermanet.github.io/imitate/

Robot used: 7-DoF KUKA IIWA

Virtual environment: PyBullet

Data-efficient Deep Reinforcement Learning for Dexterous Manipulation

Authors: Ivaylo Popov, Nicolas Heess, Timothy Lillicrap, Roland Hafner, Gabriel Barth-Maron, Matej Vecerik, Thomas Lampe, Yuval Tassa, Tom Erez, and Martin Riedmiller

Research group: DeepMind

Task: Picking and stacking a lego brick

Algorithm: Deep Deterministic Policy Gradient (DDPG)

Sample efficiency: 200k to 300k

Supporting web page: None

Robot used: None

Virtual environment: MuJoCo (9 dof robotic arm)

Asymmetric Actor Critic for Image-Based Robot Learning

Authors: Lerrel Pinto, Marcin Andrychowicz, Peter Welinder, Wojciech Zaremba, and Pieter Abbeel

Research group: OpenAI / Carnegie Mellon University

Task: Picking, forward pushing, moving a block

Algorithm: Asymmetric Hindsight Experience Replay (HER)

Sample efficiency: 9,000 to 50,000 (depending on the task)

Supporting web page: http://www.cs.cmu.edu/~lerrelp/asym_ac.html

Robot used: 7-DOF Fetch robot arm

Virtual environment: MuJoCo

CASSL: Curriculum Accelerated Self-Supervised Learning

Authors: Adithyavairavan Murali, Lerrel Pinto, Dhiraj Gandhi, and Abhinav Gupta

Research group: Carnegie Mellon University

Task: Grasping

Algorithm: Curriculum Accelerated Self-Supervised Learning (CASSL)

Sample efficiency: ?

Supporting web page: None

Robot used: Fetch robot + adaptive 3-fingered gripper from Robotiq

Virtual environment: ?

Learning Robotic Manipulation of Granular Media

Authors: Connor Schenck, Jonathan Tompson, Dieter Fox, and Sergey Levine

Research group: University of Washington, Google, University of California, Berkeley

Task: Scoop beans in a tray and dump them in another tray

Algorithm: Convolutional Neural Network (not reinforcement learning)

Sample efficiency: ?

Supporting web page: None

Robot used: KUKA LBR IIWA

Virtual environment: ?

Learning Invariant Feature Spaces to Transfer Skills with Reinforcement Learning

Authors: Abhishek Gupta, Coline Devin, YuXuan Liu, Pieter Abbeel, and Sergey Levine

Research group: University of California, Berkeley / OpenAI

Task: Pulling a block in the direction indicated, button pushing, inserting a peg

Algorithm: Guided Policy Search with skills transfer

Sample efficiency: ?

Supporting web page: https://sites.google.com/site/invariantfeaturetransfer/

Robot used: None

Virtual environment: MuJoCo

Control of a Quadrotor with Reinforcement Learning

Authors: Jemin Hwangbo, Inkyu Sa, Roland Siegwart, and Marco Hutter

Research group: ETH Zurich

Task: Fly a quadrotor to a target position

Algorithm: Policy network + value network

Sample efficiency: 2,150 iterations

Supporting web page: None

Robot used: Hummingbird quadrotor from Ascending Technologies

Virtual environment: Robotic Artificial Intelligence (RAI), a software framework developed by ETH Zurich

Learning to fly by crashing

Authors: Dhiraj Gandhi, Lerrel Pinto, and Abhinav Gupta

Research group: Carnegie Mellon University

Task: Fly a quadrotor and avoid obstacles

Algorithm: AlexNet (CNN)

Sample efficiency: 11k crashes

Supporting web page: None

Robot used: Parrot AR Drone 2.0 quadrotor

Virtual environment: None

Deep Reinforcement Learning for Tensegrity Robot Locomotion

Authors: Marvin Zhang, Xinya Geng, Jonathan Bruce, Ken Caluwaerts, Massimo Vespignani, Vytas SunSpiral, Pieter Abbeel, and Sergey Levine

Research group: University of California, Berkeley / NASA Ames Research Center

Task: Locomotion task

Algorithm: Mirror Descent Guided Policy Search (MDGPS)

Sample efficiency: 200 (2 hours)

Supporting web page: http://rll.berkeley.edu/drl_tensegrity/

Robot used: SUPERball tensegrity robot

Virtual environment: NASA Tensegrity Robotics Toolkit

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

Authors: Chelsea Finn, Pieter Abbeel, and Sergey Levine

Research group: University of California, Berkeley / OpenAI

Task: Locomotion task

Algorithm: Model-Agnostic Meta-Learning (MAML)

Sample efficiency: 500k time steps (depending on the task)

Supporting web page: https://sites.google.com/view/maml

Robot used: None

Virtual environment: MuJoCo

Hindsight Experience Replay

Authors: Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba

Research group: OpenAI

Task: Pushing, sliding, pick-and-place

Algorithm: Hindsight Experience Replay (HER) + DDPG

Sample efficiency: 200 episodes

Supporting web page: https://sites.google.com/site/hindsightexperiencereplay/

Robot used: 7-DOF Fetch Robotics arm

Virtual environment: OpenAI Gym + MuJoCo
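
HER's trick is to replay each trajectory a second time as if the goal had been the state the robot actually reached, turning sparse failures into useful supervision. A minimal sketch of "final-state" relabeling with a sparse goal-reaching reward (toy example, assuming states double as achieved goals):

import numpy as np

def her_relabel(trajectory, tol=0.05):
    """Relabel a trajectory with its achieved final state as the goal.

    trajectory: list of (state, action, next_state) tuples.
    """
    new_goal = trajectory[-1][2]  # what the agent actually achieved
    relabeled = []
    for state, action, next_state in trajectory:
        reward = 0.0 if np.linalg.norm(next_state - new_goal) < tol else -1.0
        relabeled.append((state, action, next_state, new_goal, reward))
    return relabeled

traj = [(np.zeros(2), None, np.array([0.1, 0.0])),
        (np.array([0.1, 0.0]), None, np.array([0.2, 0.1]))]
for t in her_relabel(traj):
    print(t[-1])  # the final transition earns reward 0 under the new goal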

Proximal Policy Optimization Algorithms

Authors: John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov

Research group: OpenAI

Task: Locomotion task

Algorithm: Proximal Policy Optimization (PPO)

Sample efficiency: ?

Supporting web page: https://openai.com/blog/openai-baselines-ppo/

Robot used: None

Virtual environment: OpenAI Gym + MuJoCo
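
At its core, PPO optimizes a clipped surrogate objective that keeps the updated policy close to the one that collected the data. A minimal numpy sketch of that loss:

import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective (returned as a loss to minimize)."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))

adv = np.array([1.0, -1.0])
print(ppo_clip_loss(np.log([1.5, 1.5]), np.log([1.0, 1.0]), adv))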

Action Branching Architectures for Deep Reinforcement Learning

Authors: Arash Tavakoli, Fabio Pardo, and Petar Kormushev

Research group: Imperial College London

Task: Reaching (2 to 6 DOF) and locomotion tasks

Algorithm: Branching Dueling Q-Network (BDQ)

Sample efficiency: 5k to 15k (depending on the task)

Supporting web page:
https://github.com/atavakol/action-branching-agents

Robot used: None

Virtual environment: OpenAI Gym + MuJoCo

Learning Deep Policies for Robot Bin Picking by Simulating Robust Grasping Sequences

Authors: Jeffrey Mahler and Ken Goldberg

Research group: University of California, Berkeley

Task: Grasping

Algorithm: Dex-Net (Dexterity network): Grasp Quality Convolutional Neural Network (GQ-CNN)

Sample efficiency: ?

Supporting web page: https://berkeleyautomation.github.io/dex-net/#dexnet_21

Robot used: ABB YuMi

Virtual environment: ?

Dex-Net 3.0: Computing Robust Vacuum Suction Grasp Targets in Point Clouds Using a New Analytic Model and Deep Learning

Authors: Jeffrey Mahler, Matthew Matl, Xinyu Liu, Albert Li, David Gealy, and Ken Goldberg

Research group: University of California, Berkeley

Task: Suction grasping

Algorithm: Dex-Net (Dexterity network): Grasp Quality Convolutional Neural Network (GQ-CNN)

Sample efficiency: 1.5k grasps

Supporting web page: https://berkeleyautomation.github.io/dex-net/#dexnet_3

Robot used: ABB YuMi fitted with a pneumatic suction gripper

Virtual environment: ?

Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics

Authors: Jeffrey Mahler, Jacky Liang, Sherdil Niyaz, Michael Laskey, Richard Doan, Xinyu Liu, Juan Aparicio Ojea, and Ken Goldberg

Research group: University of California, Berkeley

Task: Grasping

Algorithm: Dex-Net (Dexterity network): Grasp Quality Convolutional Neural Network (GQ-CNN)

Sample efficiency: 1k grasps

Supporting web page: https://berkeleyautomation.github.io/dex-net/#dexnet_2

Robot used: ABB YuMi with a parallel-jaw gripper

Virtual environment: ?

Robotic pick-and-place of novel objects in clutter with multi-affordance grasping and cross-domain image matching

Authors: Andy Zeng, Shuran Song, Kuan Ting Yu, Elliott Donlon, Francois R. Hogan, Maria Bauza, Daolin Ma, Orion Taylor, Melody Liu, Eudald Romo, Nima Fazeli, Ferran Alet, Nikhil Chavan Dafle, Rachel Holladay, Isabella Morena, Prem Qu Nair, Druck Green, Ian Taylor, Weber Liu, Thomas Funkhouser, and Alberto Rodriguez.

Research group: Princeton University / Massachusetts Institute of Technology

Task: Grasp, recognize and place objects

Algorithm: CNN (not RL)

Sample efficiency: ?

Supporting web page: https://vision.princeton.edu/projects/2017/arc/

Robot used: 6DOF ABB IRB 1600id robot arm

Virtual environment: None

PRM-RL: Long-range robotic navigation tasks by combining reinforcement learning and sampling-based planning

Authors: Aleksandra Faust, Kenneth Oslund, Oscar Ramirez, Anthony Francis, Lydia Tapia, Marek Fiser, and James Davidson

Research group: Google Brain / University of New Mexico

Task: Indoor navigation and aerial cargo delivery

Algorithm: Probabilistic Roadmaps – Reinforcement Learning (PRM-RL)

Sample efficiency: ?

Supporting web page: None

Robot used: Fetch robot + quadrotor UAV

Virtual environment: ?

Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning

Authors: Yevgen Chebotar, Karol Hausman, Marvin Zhang, Gaurav Sukhatme, Stefan Schaal, and Sergey Levine

Research group: University of Southern California / Max Planck Institute for Intelligent Systems / University of California, Berkeley

Task: Pushing an object to a target position / door opening / plugging into a power socket / playing hockey

Algorithm: PILQR algorithm (model based + model free)

Sample efficiency: 200 to 1,000 samples depending on the task

Supporting web page: https://sites.google.com/site/icml17pilqr/

Robot used: PR2

Virtual environment: ?

Year 2016

Dex-Net 1.0: A Cloud-Based Network of 3D Objects for Robust Grasp Planning Using a Multi-Armed Bandit Model with Correlated Rewards

Authors: Jeffrey Mahler, Florian T. Pokorny, Brian Hou, Melrose Roderick, Michael Laskey, Mathieu Aubry, Kai Kohlhoff, Torsten Kröger, James Kuffner, and Ken Goldberg

Research group: University of California, Berkeley / Google

Task: Grasping

Algorithm: Multi-View Convolutional Neural Networks (MV-CNNs) (not RL)

Sample efficiency: ?

Supporting web page: https://berkeleyautomation.github.io/dex-net/#dexnet_1

Robot used: ?

Virtual environment: ?

Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection

Authors: Sergey Levine, Peter Pastor, Alex Krizhevsky, and Deirdre Quillen

Research group: Google / University of California, Berkeley

Task: Grasping (pick and place)

Algorithm: Cross Entropy Method

Sample efficiency: 800k

Supporting web page:
https://sites.google.com/site/brainrobotdata/home
https://ai.googleblog.com/2016/03/deep-learning-for-robots-learning-from.html

Robot used: ?

Virtual environment: None

Unsupervised perceptual rewards for imitation learning

Authors: Pierre Sermanet, Kelvin Xu, and Sergey Levine

Research group: Google Brain

Task: Door opening

Algorithm: Maximum Entropy Inverse Reinforcement Learning (MaxEnt IRL)

Sample efficiency: ?

Supporting web page: https://sermanet.github.io/rewards/

Robot used: 7-DoF robotic arm (r3d32) with a two-finger gripper

Virtual environment: NA

Path integral guided policy search

Authors: Yevgen Chebotar, Mrinal Kalakrishnan, Ali Yahya, Adrian Li, Stefan Schaal, and Sergey Levine

Research group: University of Southern California / X / Google Brain

Task: Door opening, pick and place

Algorithm: Policy improvement with path integrals (PI 2) – extension of Guided Policy Search

Sample efficiency: ?

Supporting web page: None

Robot used: 7-DoF robotic arm (r3d12) with a two-finger gripper

Virtual environment: NA

Sim-to-Real Robot Learning from Pixels with Progressive Nets

Authors: Andrei A. Rusu, Mel Vecerik, Thomas Rothörl, Nicolas Heess, Razvan Pascanu, and Raia Hadsell

Research group: Google DeepMind

Task: Conveyor task (grasping?)

Algorithm: Progressive Neural Networks + A3C + LSTM

Sample efficiency: 60,000 steps (4 hours)

Supporting web page: None

Robot used: Jaco robot arm

Virtual environment: MuJoCo

Learning to push by grasping: Using multiple tasks for effective learning

Authors: Lerrel Pinto and Abhinav Gupta

Research group: Carnegie Mellon University

Task: Pushing and grasping

Algorithm: CNN (not reinforcement learning?)

Sample efficiency: 2,500 grasps

Supporting web page: None

Robot used: Baxter robot

Virtual environment: GraspIt! simulator

Supervision via competition: Robot adversaries for learning tasks

Authors: Lerrel Pinto, James Davidson, and Abhinav Gupta

Research group: Carnegie Mellon University / Google Brain / Google Research

Task: Grasping

Algorithm: Adversarial learning

Sample efficiency: 56k grasps

Supporting web page: None

Robot used: Sawyer (?)

Virtual environment: ?

End-to-End Training of Deep Visuomotor Policies

Authors: Sergey Levine, Chelsea Finn, Trevor Darrell, and Pieter Abbeel

Research group: Google / University of California, Berkeley

Task: Inserting a block into a shape sorting cube; screwing a cap onto a bottle; fitting the claw of a toy hammer under a nail with various grasps; placing a coat hanger on a rack

Algorithm: Guided Policy Search

Sample efficiency: 156 to 288 (3 to 4 hours)

Supporting web page:
https://sites.google.com/site/visuomotorpolicy/

Robot used: PR2

Virtual environment: ?

Learning modular neural network policies for multi-task and multi-robot transfer

Authors: Coline Devin, Abhishek Gupta, Trevor Darrell, Pieter Abbeel, and Sergey Levine

Research group: Google / University of California, Berkeley

Task: Opening a drawer, pushing a block, reaching a target, inserting a peg

Algorithm: Bregman Alternating Direction Method of Multipliers (BADMM) variant of the Guided Policy Search (GPS)

Sample efficiency: ?

Supporting web page: https://sites.google.com/site/modularpolicynetworks/

Robot used: None

Virtual environment: MuJoCo (3 to 5 links robot arm)

Unsupervised Learning for Physical Interaction through Video Prediction

Authors: Chelsea Finn, Ian Goodfellow, and Sergey Levine

Research group: University of California, Berkeley / OpenAI / Google Brain

Task: Pushing objects, rotating objects

Algorithm: Convolutional dynamic neural advection (CDNA) + Spatial Transformer Predictors (STP) + Convolutional LSTM (Not reinforcement learning)

Sample efficiency: 57,000 interaction sequences

Supporting web page:
https://sites.google.com/site/robotprediction/

Robot used: 10 7-DoF robot arms (R3D31)

Virtual environment: None

Deep Visual Foresight for Planning Robot Motion

Authors: Chelsea Finn and Sergey Levine

Research group: Google Brain / University of California, Berkeley

Task: Pushing objects

Algorithm: Visual MPC (Model Predictive Control) – model-based RL

Sample efficiency: 50,000

Supporting web page:
https://sites.google.com/site/robotforesight/

Robot used: 7 Dof arm

Virtual environment: None
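
Visual foresight plans by shooting random action sequences through the learned video-prediction model, scoring the predicted outcomes against the goal, and executing only the first action before replanning. A generic random-shooting sketch, with a toy linear model standing in for the video predictor:

import numpy as np

def plan_action(state, goal, model, act_dim, horizon=5, n_candidates=200):
    """Random-shooting MPC: best first action under the learned model."""
    seqs = np.random.uniform(-1, 1, size=(n_candidates, horizon, act_dim))
    costs = np.zeros(n_candidates)
    for i, seq in enumerate(seqs):
        s = state
        for a in seq:  # roll the candidate sequence through the model
            s = model(s, a)
        costs[i] = np.linalg.norm(s - goal)  # distance of predicted outcome
    return seqs[np.argmin(costs)][0]  # execute the first action, then replan

model = lambda s, a: s + 0.1 * a  # toy "learned" dynamics
print(plan_action(np.zeros(2), np.array([0.3, -0.2]), model, act_dim=2))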

Optimal Control with Learned Local Models: Application to Dexterous Manipulation

Authors: Vikash Kumar, Emanuel Todorov, and Sergey Levine

Research group: University of California, Berkeley

Task: Object manipulation (rotation)

Algorithm: Linear Gaussian controller + linear quadratic regulator (LQR)

Sample efficiency: ?

Supporting web page:
https://bair.berkeley.edu/blog/2018/08/31/dexterous-manip/

Robot used: Pneumatically-actuated tendon-driven 24-DoF hand

Virtual environment: None

Learning dexterous manipulation for a soft robotic hand from human demonstrations

Authors: Abhishek Gupta, Clemens Eppner, Sergey Levine, and Pieter Abbeel

Research group: University of California, Berkeley

Task: Turning a valve, pushing beads on an abacus and grasping a bottle from a table

Algorithm: Guided Policy Search

Sample efficiency: ?

Supporting web page:
https://bair.berkeley.edu/blog/2018/08/31/dexterous-manip/

Robot used: RBO Hand 2

Virtual environment: None

Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates

Authors: Shixiang Gu, Ethan Holly, Timothy Lillicrap, and Sergey Levine

Research group: Google Brain / University of California, Berkeley / University of Cambridge / MPI Tübingen / Google DeepMind

Task: Reaching, door pushing and pulling, picking up a stick suspended in the air by a string and placing it upright near a target

Algorithm: Asynchronous Normalized Advantage Function (NAF)

Sample efficiency: 20 trials

Supporting web page:
https://sites.google.com/site/deeproboticmanipulation/

Robot used: 6-DoF Kinova JACO arm

Virtual environment: MuJoCo

Comparing human-centric and robot-centric sampling for robot deep learning from demonstrations

Authors: Michael Laskey, Caleb Chuck, Jonathan Lee, Jeffrey Mahler, Sanjay Krishnan, Kevin Jamieson, Anca Dragan, and Ken Goldberg.

Research group: University of California, Berkeley

Task: Singulation (separate an object from a cluster of other objects)

Algorithm: DAgger (Dataset Aggregation) – imitation learning (not RL)

Sample efficiency: ?

Supporting web page: https://berkeleyautomation.github.io/lfd_icra2017/

Robot used: 2-DOF Zymark robot

Virtual environment: ?

Deep Object-Centric Representations for Generalizable Robot Learning

Authors: Coline Devin, Pieter Abbeel, Trevor Darrell, and Sergey Levine

Research group: University of California, Berkeley / OpenAI

Task: Pouring liquid into mugs, sweeping fruits in a dustpan

Algorithm: Guided Policy Search

Sample efficiency: ?

Supporting web page:
https://sites.google.com/berkeley.edu/object-representations

Robot used: PR2

Virtual environment: ?

TSC-DL: Unsupervised trajectory segmentation of multi-modal surgical demonstrations with Deep Learning

Authors: Adithyavairavan Murali, Animesh Garg, Sanjay Krishnan, Florian T. Pokorny, Pieter Abbeel, Trevor Darrell, and Ken Goldberg

Research group: University of California, Berkeley

Task: Surgery (suturing and needle passing)

Algorithm: Transition State Clustering with Deep Learning (not RL)

Sample efficiency: ?

Supporting web page: http://berkeleyautomation.github.io/tsc-dl/

Robot used: Da Vinci Research Kit (DVRK)

Virtual environment: ?

Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search

Authors: Ali Yahya, Adrian Li, Mrinal Kalakrishnan, Yevgen Chebotar, and Sergey Levine.

Research group: Google Brain / University of Southern California, Los Angeles / X Development

Task: Door opening task

Algorithm: Asynchronous Distributed Guided Policy Search

Sample efficiency: ?

Supporting web page: None

Robot used: R3D

Virtual environment: ?

Continuous Deep Q-Learning with Model-based Acceleration

Authors: Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, and Sergey Levine

Research group: University of Cambridge, Max Planck Institute for Intelligent Systems, Google Brain, Google DeepMind

Task: Manipulation tasks and locomotion tasks

Algorithm: Normalized Advantage Function (NAF) with model-based acceleration (imagination rollouts)

Sample efficiency: 100 to 300 episodes (depending on the task)

Supporting web page: None

Robot used: None

Virtual environment: MuJoCo
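
NAF makes Q-learning tractable with continuous actions by constraining Q to be quadratic in the action, so the greedy action is simply the predicted mean mu(s). A minimal sketch of that parameterization (toy values):

import numpy as np

def naf_q(action, mu, P, v):
    """NAF: Q(s, a) = V(s) - 0.5 * (a - mu(s))^T P(s) (a - mu(s)).

    mu, P, v are network outputs for one state; with P positive
    definite, argmax_a Q(s, a) = mu in closed form.
    """
    d = action - mu
    return v - 0.5 * d @ P @ d

mu = np.array([0.2, -0.1])
P = np.array([[2.0, 0.0], [0.0, 1.0]])  # positive definite
print(naf_q(mu, mu, P, v=1.0))           # maximal at a = mu -> 1.0
print(naf_q(np.zeros(2), mu, P, v=1.0))  # lower anywhere else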

Multi-view self-supervised deep learning for 6D pose estimation in the Amazon Picking Challenge

Authors: Andy Zeng, Kuan Ting Yu, Shuran Song, Daniel Suo, Ed Walker, Alberto Rodriguez, and Jianxiong Xiao

Research group: Princeton University / Massachusetts Institute of Technology / Google / AutoX

Task: Amazon Picking Challenge

Algorithm: CNN (not RL)

Sample efficiency: 136,575 RGB-D images of 39 objects

Supporting web page: http://apc.cs.princeton.edu/

Robot used: 6DOF industrial manipulator ABB IRB1600id + RealSense F200 RGB-D Camera

Virtual environment: ?

The curious robot: Learning visual representations via physical interactions

Authors: Lerrel Pinto, Dhiraj Gandhi, Yuanfeng Han, Yong Lae Park, and Abhinav Gupta

Research group: Carnegie Mellon University

Task: Grasping, pushing, poking

Algorithm: CNN (not RL)

Sample efficiency: N/A

Supporting web page: None

Robot used: Baxter robot

Virtual environment: NA

Year 2015

Learning Contact-Rich Manipulation Skills with Guided Policy Search

Authors: Sergey Levine, Nolan Wagener, and Pieter Abbeel

Research group: Google / University of California, Berkeley

Task: stacking large lego blocks; threading wooden rings onto a tight-fitting peg; assembling a toy airplane by inserting the wheels into a slot; inserting a shoe tree into a shoe; screwing caps onto bottles

Algorithm: Guided Policy Search

Sample efficiency: 20 to 25 (10 minutes)

Supporting web page:
http://rll.berkeley.edu/icra2015gps/index.htm

Robot used: PR2

Virtual environment: ?

Adapting Deep Visuomotor Representations with Weak Pairwise Constraints

Authors: Eric Tzeng, Coline Devin, Judy Hoffman, Chelsea Finn, Pieter Abbeel, Sergey Levine, Kate Saenko, and Trevor Darrell

Research group: University of California, Berkeley / Boston University

Task: Positioning a loop of rope over the hook of a supermarket scale

Algorithm: Guided Policy Search

Sample efficiency: 20 to 25 (10 minutes)

Supporting web page: None

Robot used: PR2

Virtual environment: ?

Learning by observation for surgical subtasks: Multilateral cutting of 3D viscoelastic and 2D Orthotropic Tissue Phantoms.

Authors: Adithyavairavan Murali, Siddarth Sen, Ben Kehoe, Animesh Garg, Seth McFarland, Sachin Patil, W. Douglas Boyd, Susan Lim, Pieter Abbeel, and Ken Goldberg

Research group: University of California, Berkeley / University of California Davis Medical Center; Sacramento

Task: Cutting deformable cancer tissues (surgery), cutting a circular pattern in a deformable sheet

Algorithm: Learning By Observation (LBO)

Sample efficiency: ?

Supporting web page: None

Robot used: Da Vinci Research Kit (DVRK) robotic surgical assistant

Virtual environment: ?

A single-use haptic palpation probe for locating subcutaneous blood vessels in robot-assisted minimally invasive surgery

Authors: Stephen McKinley, Animesh Garg, Siddarth Sen, Rishi Kapadia, Adithyavairavan Murali, Kirk Nichols, Susan Lim, Sachin Patil, Pieter Abbeel, Allison M. Okamura, and Ken Goldberg.

Research group: University of California, Berkeley / Stanford University

Task: Surgical manipulation

Algorithm: ?

Sample efficiency: ?

Supporting web page: http://cal-mr.berkeley.edu/

Robot used: Da Vinci Research Kit (DVRK) robotic surgical assistant

Virtual environment: ?

Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours

Authors: Lerrel Pinto and Abhinav Gupta

Research group: Carnegie Mellon University

Task: Grasping

Algorithm: AlexNet (CNN)

Sample efficiency: 50k grasps (700 hours)

Supporting web page: None

Robot used: Baxter robot

Virtual environment: None ?

Trust Region Policy Optimization

Authors: John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, and Pieter Abbeel

Research group: University of California, Berkeley

Task: Locomotion task, cart pole, swimmer, hopper, walker

Algorithm: Trust Region Policy Optimization (TRPO)

Sample efficiency: 20 to 300 policy iterations.

Supporting web page: https://sites.google.com/site/trpopaper/

Robot used: None

Virtual environment: MuJoCo

Continuous control with deep reinforcement learning

Authors: Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra

Research group: Google DeepMind

Task: Swing-up, reaching, grasp-and-move, puck hitting, monoped balancing, and locomotion tasks

Algorithm: Deep Deterministic Policy Gradient (DDPG)

Sample efficiency: ?

Supporting web page: None

Robot used: None

Virtual environment: MuJoCo
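
For reference, DDPG's critic target: a deterministic target actor picks the next action and a target critic evaluates it. A minimal sketch with toy stand-ins for the networks:

import numpy as np

def ddpg_target(rewards, dones, next_states, target_actor, target_critic,
                gamma=0.99):
    """DDPG critic target: y = r + gamma * Q'(s', mu'(s'))."""
    a_next = target_actor(next_states)
    return rewards + gamma * (1.0 - dones) * target_critic(next_states, a_next)

actor = lambda s: np.tanh(s)               # toy deterministic policy
critic = lambda s, a: (s * a).sum(axis=1)  # toy Q-function
s = np.random.rand(4, 2)
print(ddpg_target(np.ones(4), np.zeros(4), s, actor, critic))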

Efficient reinforcement learning for robots using informative simulated priors

Authors: Mark Cutler and Jonathan P. How

Research group: Massachusetts Institute of Technology

Task: Inverted pendulum balancing

Algorithm: Probabilistic Inference for Learning Control (PILCO)

Sample efficiency: 2 episodes

Supporting web page: None

Robot used: 1D pendulum

Virtual environment: None

Year 2014

Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics

Authors: Sergey Levine and Pieter Abbeel

Research group: University of California, Berkeley

Task: Peg insertion / swimming / walking

Algorithm: Guided Policy Search (GPS)

Sample efficiency: 400 (depending on the task)

Supporting web page: http://rll.berkeley.edu/nips2014gps/

Robot used: None

Virtual environment: ?

Real-time grasp detection using convolutional neural networks

Authors: Joseph Redmon and Anelia Angelova

Research group: University of Washington / Google Research

Task: Grasp detection

Algorithm: CNN (not RL)

Sample efficiency: N/A

Supporting web page: None

Robot used: None

Virtual environment: None

Learning accurate kinematic control of cable-driven surgical robots using data cleaning and Gaussian Process Regression

Authors: Jeffrey Mahler, Sanjay Krishnan, Michael Laskey, Siddarth Sen, Adithyavairavan Murali, Ben Kehoe, Sachin Patil, Jiannan Wang, Mike Franklin, Pieter Abbeel, and Ken Goldberg

Research group: University of California, Berkeley

Task: Find, grasp, and transport “damaged tissue” fragments

Algorithm: Gaussian Process Regression (not RL)

Sample efficiency: ?

Supporting web page: http://rll.berkeley.edu/surgical/

Robot used: Raven II surgical robot (open-architecture surgical robot for laparoscopic surgery)

Virtual environment: ?

Year 2013

Hierarchical Reinforcement Learning for Robot Navigation

Authors: B. Bischoff, D Nguyen-Tuong, I-h. Lee, F. Streichert, and A. Knoll

Research group: Robert Bosch / TU Munich

Task: Robot navigation with dynamic obstacles

Algorithm: Hierarchical Reinforcement Learning (HRL) + Probabilistic Inference for Learning Control (PILCO)

Sample efficiency: ?

Supporting web page: None

Robot used: Festo Robotino

Virtual environment: ?

Year 2011

Learning to Control a Low-Cost Manipulator Using Data-Efficient Reinforcement Learning

Authors: Marc Peter Deisenroth, Carl Edward Rasmussen, and Dieter Fox

Research group: University of Washington / University of Cambridge

Task: stacking cubes and collision avoidance

Algorithm: Probabilistic Inference for Learning Control (PILCO): Gaussian process dynamics model (model-based)

Sample efficiency: 30

Supporting web page: None

Robot used: off-the-shelf robotic manipulator ($370)

Virtual environment: ?

Policy search for learning robot control using sparse data

Authors: B. Bischoff, D. Nguyen-Tuong, H. Van Hoof, A. McHutchon, C. E. Rasmussen, A. Knoll, J. Peters, and M. P. Deisenroth

Research group: Bosch Corporate Research, TU Darmstadt, University of Cambridge, TU München, Max Planck Institute for Intelligent Systems, Imperial College London

Task: stacking cubes and collision avoidance

Algorithm: Probabilistic Inference for Learning Control (PILCO): Gaussian process dynamics model (model-based)

Sample efficiency: ?

Supporting web page: None

Robot used: Festo Robotino XT

Virtual environment: ?

PILCO: A Model-Based and Data-Efficient Approach to Policy Search

Authors: Marc Peter Deisenroth and Carl Edward Rasmussen

Research group: University of Washington / University of Cambridge

Task: Cart-pole balancing (1 DOF)

Algorithm: Probabilistic Inference for Learning Control (PILCO): Gaussian process dynamics model (model-based)

Sample efficiency: 7 (17 s)

Supporting web page: None

Robot used: Real cart-pole system

Virtual environment: ?
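
PILCO owes its extreme data efficiency to a probabilistic Gaussian-process model of the dynamics: candidate policies are evaluated on the model rather than on the hardware. A compact scikit-learn sketch of the idea (illustrative only; the real method propagates full Gaussian state distributions analytically instead of sampling point rollouts):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(3)

# A handful of real transitions (s, a) -> s', e.g. from two short episodes
states = rng.uniform(-1, 1, size=(30, 2))
actions = rng.uniform(-1, 1, size=(30, 1))
next_states = states + 0.1 * actions + 0.01 * rng.normal(size=(30, 2))

X = np.hstack([states, actions])
gps = [GaussianProcessRegressor().fit(X, next_states[:, d]) for d in range(2)]

def rollout_return(policy_gain, start, horizon=10):
    """Score a linear policy a = K s on the learned model, not the robot."""
    s, total = start, 0.0
    for _ in range(horizon):
        a = np.clip(policy_gain @ s, -1, 1)
        s = np.array([gp.predict(np.hstack([s, a])[None])[0] for gp in gps])
        total -= np.sum(s ** 2)  # cost: squared distance from the origin
    return total

print(rollout_return(np.array([[-1.0, 0.0]]), start=np.ones(2)))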

Year 2007

Hierarchical reinforcement learning for robot navigation using the intelligent space concept

Authors: L. A. Jeni, Z. Istenes, P. Korondi, and H. Hashimoto.

Research group: Eötvös Loránd University, Budapest University of Technology and Economics, University of Tokyo

Task: Navigation

Algorithm: TD-learning + Hierarchical Execution (HEXQ)

Sample efficiency: ?

Supporting web page: None

Robot used: Unicycle-type wheeled LEGO robot with ultrasonic sensor

Virtual environment: ?

Year 2006

Policy gradient methods for robotics

Authors: Jan Peters and Stefan Schaal

Research group: University of Southern California, Los Angeles / ATR Computational Neuroscience Laboratories, Kyoto

Task: Hit a ball with a baseball bat

Algorithm: Natural Actor-Critic (policy gradient)

Sample efficiency: 200-300 trials

Supporting web page: None

Robot used: SARCOS Master Arm

Virtual environment: ?

Non-robotic applications

Reinforcement learning with unsupervised auxiliary tasks

Authors: Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z Leibo, David Silver, and Koray Kavukcuoglu

Research group: Google DeepMind

Year: 2016

Algorithm: UNsupervised REinforcement and Auxiliary Learning (UNREAL)

Supporting web page: https://deepmind.com/blog/reinforcement-learning-unsupervised-auxiliary-tasks/

Curiosity-Driven Exploration by Self-Supervised Prediction

Authors: Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, and Trevor Darrell

Research group: University of California, Berkeley

Year: 2017

Algorithm: Intrinsic Curiosity Module (ICM)

Supporting web page: https://pathak22.github.io/noreward-rl/
