Part 2 – Creating a simple Gym environment – Tic-Tac-Toe

In the previous article, we created, installed and registered a minimalist Gym environment. However, this environment did nothing, since we had not implemented the 4 methods of the environment class: __init__, step, reset and render.

In this article, we will see how to implement these 4 methods for a simple game: tic-tac-toe.

Let’s recall the rules of the game. It is played on a grid of 3 by 3 squares. There are 2 players, one playing X and the other O. Players take turns placing their marks in empty squares. The first player to get 3 of their marks in a line (horizontally, vertically, or diagonally) is the winner. A reward of +100 is given to the player who wins the game.
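To make the winning rule concrete, here is a minimal sketch of a win check. It assumes the board is stored as a flat list of 9 squares numbered 0 to 8, row by row, with '-' for an empty square; the names WIN_LINES and winner are illustrative and are not taken from the environment's code.

```python
# The eight winning lines on a 3x3 board whose squares are numbered 0..8, row by row.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board):
    """Return 'x' or 'o' if that mark fills a winning line, else None.

    `board` is a list of 9 one-character strings: 'x', 'o', or '-' (empty).
    """
    for a, b, c in WIN_LINES:
        if board[a] != '-' and board[a] == board[b] == board[c]:
            return board[a]
    return None


board = list("oxoxoxo--")
print(winner(board))  # o  (squares 2, 4 and 6 form a diagonal)
```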

The tic-tac-toe

Here is an example of implementation of the 4 methods.

This code is largely based on this article. The code can be found on GitHub.
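For readers who want a rough picture before opening the repository, the following is a minimal sketch of what the 4 methods could look like. It is not the code from the repository: the class name TicTacToeSketch, the flat 9-square board, and the handling of occupied squares are all assumptions made for illustration. It is written as a plain Python class so that it runs without Gym installed; a real environment would subclass gym.Env and declare its action and observation spaces.

```python
class TicTacToeSketch:
    """Illustrative stand-in for the environment class (hypothetical name)."""

    WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
                 (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
                 (0, 4, 8), (2, 4, 6)]              # diagonals

    def __init__(self):
        self.reset()

    def reset(self):
        """Clear the board; player 1 ('o') moves first."""
        self.board = ['-'] * 9
        self.player = 'o'
        self.reward = 0
        self.done = False
        return list(self.board)

    def step(self, action):
        """Place the current player's mark on square `action` (0..8)."""
        if self.done:
            print("Game Over")
        elif self.board[action] == '-':
            self.board[action] = self.player
            if any(all(self.board[i] == self.player for i in line)
                   for line in self.WIN_LINES):
                print("Player 1 wins." if self.player == 'o' else "Player 2 wins.")
                self.reward, self.done = 100, True
            elif '-' not in self.board:
                self.done = True  # draw: the board is full and nobody won
            else:
                self.player = 'x' if self.player == 'o' else 'o'
        # Assumption: a move onto an occupied square is ignored and the
        # same player simply moves again on the next step.
        return list(self.board), self.reward, self.done, {}

    def render(self):
        """Print the board as three rows of space-separated marks."""
        for row in range(3):
            print(' '.join(self.board[3 * row:3 * row + 3]))
```

In this sketch the reward stays at 100 once a player has won, and any further call to step only prints "Game Over" and leaves the board unchanged.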

As in the previous article, we install the environment package in editable mode so that it can be registered.

pip install -e .

We can test the environment using this code.

import gym
import gym_tictac

env = gym.make('tictac-v0')

for e in range(3):
    env.reset()
    print("######")
    print("EPISODE: ", e)
    print("######")

    # Play the 9 squares in order, one move per step
    for t in range(9):
        env.render()
        action = t
        state, reward, done, info = env.step(action)
        print("reward: ", reward)
        print("")

env.close()

You should see the following output (shown here for the first episode).

######
EPISODE:  0
######
 - - -
 - - -
 - - -
 reward:  0
 ****** 
 o - - 
 - - -
 - - -
 reward:  0
 ******
 o x - 
 - - -
 - - -
 reward:  0
 ******
 o x o 
 - - -
 - - -
 reward:  0
 ******
 o x o 
 x - - 
 - - -
 reward:  0
 ******
 o x o 
 x o - 
 - - -
 reward:  0
 ******
 o x o 
 x o x 
 - - -
 Player 1 wins.
 reward:  100
 ******
 o x o 
 x o x 
 o - - 
 Game Over
 reward:  100
 ******
 o x o 
 x o x 
 o - - 
 Game Over
 reward:  100
 ******

This is not very exciting, since the players simply place their tokens in consecutive squares, but it illustrates how to use the environment and shows that the 4 methods run without errors.
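A slightly less predictable test is to sample actions at random and stop as soon as the episode is done. The helper below is a sketch under two assumptions: the environment defines an action_space (for example a Discrete(9) space), and step returns the same 4 values as in the test loop above. The name run_random_episode is hypothetical. How the environment treats a move onto an occupied square is not covered here, so random play may behave differently from what you expect.

```python
def run_random_episode(env, max_steps=9):
    """Play one episode with randomly sampled actions and return the total reward."""
    env.reset()
    total_reward = 0
    for _ in range(max_steps):
        # Assumes the environment exposes a Gym action space
        action = env.action_space.sample()
        state, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break  # stop once the game has ended
    return total_reward


# Example usage (requires the gym_tictac package from this article):
# import gym, gym_tictac
# env = gym.make('tictac-v0')
# print(run_random_episode(env))
```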

In the next article, we will see how to create a more interesting Gym environment using the PyBullet physics engine.
