Part 8 – Virtual environments for reinforcement learning

The need for custom virtual environments

As we have seen before, reinforcement learning agents interact with an “environment” by sending it an action and receiving a new state and a reward in return. This environment is very often a virtual physics simulator that represents a real-world situation. OpenAI has developed a Python library called Gym that implements benchmark virtual environments for comparing the performance of different reinforcement learning algorithms. Gym makes it very easy to query the environment, perform an action, receive a reward and a new state, or render the environment. Examples of such virtual environments include the cart-pole, mountain-car and inverted pendulum problems.

Even though these toy benchmark problems are useful for comparing RL algorithms, they do not represent real-life situations. In particular, in robotics – where RL has found many successful applications – the agent is usually trained in a virtual environment first and only then deployed to the real robot. Otherwise, we would have to execute millions of actions on a physical robot, which could damage it and would be very slow anyway.

Gym integrates very nicely with physics engines, which makes it possible to create custom virtual environments. One of the most widely used physics engines is MuJoCo (Multi-Joint dynamics with Contact). However, at the time of writing it requires a paid license, which can be an issue for some projects. That’s why in this post I will focus on Pybullet, which is free. (MuJoCo is faster though, according to one of their own papers…)

Getting started with Pybullet

Please follow the installation instructions here. Let’s check that everything was installed correctly. Run this code, say hi to Pybullet and play with R2D2.
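
The snippet referred to above is not reproduced here, so what follows is only a minimal sketch of what such a first test could look like. It assumes that the plane.urdf and r2d2.urdf models bundled in the pybullet_data package are used; the function name run_r2d2_demo and its parameters are hypothetical and only serve the illustration.

```python
import time


def run_r2d2_demo(steps=240, gui=False):
    """Drop R2D2 onto a ground plane and return its final base position."""
    # Imported here so the sketch can be loaded even before Pybullet is installed.
    import pybullet as p
    import pybullet_data

    # p.GUI opens the interactive viewer, p.DIRECT runs without a window.
    p.connect(p.GUI if gui else p.DIRECT)
    try:
        # Make the models shipped with Pybullet (plane, R2D2, ...) findable by name.
        p.setAdditionalSearchPath(pybullet_data.getDataPath())
        p.setGravity(0, 0, -9.81)
        p.loadURDF("plane.urdf")
        robot_id = p.loadURDF("r2d2.urdf", [0, 0, 1])  # start 1 m above the plane

        for _ in range(steps):
            p.stepSimulation()
            if gui:
                time.sleep(1.0 / 240.0)  # Pybullet's default time step is 1/240 s

        position, _orientation = p.getBasePositionAndOrientation(robot_id)
        return position
    finally:
        p.disconnect()
```

Calling run_r2d2_demo(steps=2400, gui=True) should open the viewer for about ten seconds, which is enough to grab R2D2 with the mouse and throw it around.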

The Pybullet commands are described in the documentation. The robot’s geometry and mechanics are described in XML files and can be loaded in Pybullet. The XML file formats compatible with Pybullet are:

  • URDF: the most common format. Many robots have a public URDF file. URDF files are used by the ROS project (Robot Operating System), see here.
  • SDF: this file format was developed as part of the Gazebo robot simulator, see here.
  • MJCF: file format developed for the MuJoCo physics engine.

Many of these files are already included in Pybullet, see here. Let’s see another example where we import a Kuka robot. First, let’s download the Kuka URDF files from the ROS GitHub.

$ git clone

Then run this code:
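
The original snippet and the path of the cloned repository are not shown here, so the sketch below makes a loud assumption: instead of the downloaded file, it loads the Kuka iiwa model that ships with pybullet_data (kuka_iiwa/model.urdf). To use the cloned files, the urdf_path argument would have to point to the URDF inside that repository. The function name load_kuka and its parameters are hypothetical.

```python
import time


def load_kuka(urdf_path="kuka_iiwa/model.urdf", steps=0, gui=False):
    """Load a Kuka arm with a fixed base and return the names of its joints."""
    # Imported here so the sketch can be loaded even before Pybullet is installed.
    import pybullet as p
    import pybullet_data

    p.connect(p.GUI if gui else p.DIRECT)
    try:
        # Assumption: the default path is resolved against Pybullet's bundled models.
        p.setAdditionalSearchPath(pybullet_data.getDataPath())
        p.setGravity(0, 0, -9.81)
        p.loadURDF("plane.urdf")
        # useFixedBase bolts the arm to the ground so it does not tip over.
        kuka_id = p.loadURDF(urdf_path, [0, 0, 0], useFixedBase=True)

        # getJointInfo returns a tuple whose second entry is the joint name (bytes).
        joint_names = [
            p.getJointInfo(kuka_id, index)[1].decode("utf-8")
            for index in range(p.getNumJoints(kuka_id))
        ]

        for _ in range(steps):
            p.stepSimulation()
            if gui:
                time.sleep(1.0 / 240.0)
        return joint_names
    finally:
        p.disconnect()
```

Listing the joint names is a quick way to check that the file was parsed correctly before sending any motor commands.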

The Pybullet – Gym interface

Ok so we know how to import robots in Pybullet. Now, how do we train them? We need a way to interact with the simulation in a similar fashion as with Gym. Fortunately, Pybullet interfaces very nicely with Gym using its pybullet_envs library. For example, you can import the cart-pole environment this way:
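
The snippet itself is not reproduced here, so this is only a sketch of the idea. It assumes the classic Gym API (versions before 0.26, where reset returns the observation and step returns four values) and the environment ID CartPoleBulletEnv-v1 registered by pybullet_envs. The function name run_random_cartpole is hypothetical, and the agent simply picks random actions.

```python
def run_random_cartpole(max_steps=200):
    """Run one episode of the Bullet cart-pole with random actions."""
    # Imported here so the sketch can be loaded even before the libraries are installed.
    import gym
    import pybullet_envs  # noqa: F401  (importing it registers the Bullet environments)

    env = gym.make("CartPoleBulletEnv-v1")
    total_reward = 0.0
    try:
        env.reset()
        for _ in range(max_steps):
            action = env.action_space.sample()  # random action, no learning yet
            # Assumption: classic Gym API, step returns (observation, reward, done, info).
            _observation, reward, done, _info = env.step(action)
            total_reward += reward
            if done:
                break
    finally:
        env.close()
    return float(total_reward)
```

Replacing the random action with the output of a policy is all it takes to plug in one of the RL algorithms from the previous parts.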

You can also enjoy pre-trained agents here. For example, you can watch this Kuka grasping robot following a continuous downward policy by running the following command.

$ python -m pybullet_envs.baselines.enjoy_kuka_diverse_object_grasping

However, the environments found in pybullet_envs are not exactly the same as those offered by MuJoCo. Fortunately, the Pybullet-gym library has just re-implemented most MuJoCo and Roboschool environments in Pybullet and they seamlessly integrate with OpenAI Gym. For example, the MuJoCo reacher environment can be loaded using this code:
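
The original snippet is not reproduced here, so the sketch below rests on two assumptions that should be checked against the Pybullet-gym README: that importing the pybulletgym package registers the environments with Gym, and that the reacher is available under the ID ReacherPyBulletEnv-v0 (the ID of the MuJoCo variant may differ). It also assumes the classic Gym API, and the function name describe_reacher is hypothetical.

```python
def describe_reacher(env_id="ReacherPyBulletEnv-v0"):
    """Create a reacher environment and return its observation and action shapes."""
    # Imported here so the sketch can be loaded even before the libraries are installed.
    import gym
    import pybulletgym  # noqa: F401  (assumption: importing it registers the environments)

    # Assumption: env_id is one of the IDs listed in the Pybullet-gym README.
    env = gym.make(env_id)
    try:
        env.reset()
        return env.observation_space.shape, env.action_space.shape
    finally:
        env.close()
```

Once created, the environment is used like any other Gym environment, with the usual reset and step calls.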
