In the previous article, we created, installed and registered a minimalist Gym environment. However, this environment did nothing, since we had not implemented the 4 methods of the environment class: __init__, step, reset and render.
In this article, we will see how to implement these 4 methods for a simple game: tic-tac-toe.
Let’s remind ourselves of the rules of the game. The game is played on a grid that’s 3 squares by 3 squares. There are 2 players, one with X and the other with O. Players take turns putting their marks in empty squares. The first player to get 3 of her marks in a row (up, down, across, or diagonally) is the winner. A reward of +100 is given to the player winning the game.
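The rules above boil down to two checks: which squares are still empty, and whether one mark fills a complete line. Here is a minimal sketch of both, assuming the board is stored as a flat list of 9 strings with '-' for an empty square. The names WIN_LINES, winner and empty_squares are illustrative and not taken from the gym_tictac package.

```python
from typing import List, Optional

# All eight ways to get three in a row on a 3x3 board whose squares
# are numbered 0..8, left to right and top to bottom.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board: List[str]) -> Optional[str]:
    """Return 'o' or 'x' if that mark fills a winning line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != '-' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def empty_squares(board: List[str]) -> List[int]:
    """Indices of the squares that can still be played."""
    return [i for i, mark in enumerate(board) if mark == '-']


board = ['o', 'x', 'o',
         'x', 'o', 'x',
         'o', '-', '-']
print(winner(board))         # 'o' holds the 2-4-6 diagonal
print(empty_squares(board))  # squares 7 and 8 are still free
```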
Here is an example implementation of the 4 methods.
As before, we install and register the environment.
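As a rough guide to what such an implementation can look like, here is a minimal sketch. It is not necessarily the exact code of the gym_tictac package. It is written as a plain Python class so that it runs without Gym installed; in the package, the class would subclass gym.Env and would typically also declare its action and observation spaces. The class name TicTacEnv, the flat-list board, and the handling of illegal moves and draws are assumptions made for illustration.

```python
class TicTacEnv:
    """Sketch of the environment; the real class would subclass gym.Env."""

    WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
                 (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
                 (0, 4, 8), (2, 4, 6)]              # diagonals
    MARKS = ('o', 'x')  # player 1 plays 'o', player 2 plays 'x'

    def __init__(self):
        self.reset()

    def reset(self):
        self.board = ['-'] * 9   # squares 0..8, row by row
        self.player = 0          # index into MARKS; player 1 starts
        self.done = False
        self.reward = 0
        return list(self.board)

    def step(self, action):
        if self.done:
            # The game has already ended: report it and change nothing.
            print("Game Over")
            return list(self.board), self.reward, True, {}
        if self.board[action] != '-':
            # Assumption: an illegal move is ignored and the turn is kept.
            return list(self.board), 0, False, {"illegal": True}
        mark = self.MARKS[self.player]
        self.board[action] = mark
        if any(all(self.board[i] == mark for i in line)
               for line in self.WIN_LINES):
            print("Player {} wins.".format(self.player + 1))
            self.done, self.reward = True, 100
        elif '-' not in self.board:
            self.done = True     # draw: board full, nobody gets the +100
        self.player = 1 - self.player
        return list(self.board), self.reward, self.done, {}

    def render(self):
        # Print the board as three rows of three squares.
        for row in range(3):
            print(' '.join(self.board[3 * row:3 * row + 3]))


env = TicTacEnv()
env.reset()
for action in range(7):          # players alternate on squares 0..6
    state, reward, done, info = env.step(action)
env.render()
print(reward, done)
```

Playing squares 0 to 6 in order gives player 1 the 2-4-6 diagonal on the seventh move, so the last step returns the +100 reward and sets done to True.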
pip install -e .
We can test the environment using this code.
import gym
import gym_tictac

env = gym.make('tictac-v0')

for e in range(3):
    env.reset()
    print("######")
    print("EPISODE: ", e)
    print("######")
    for t in range(9):
        env.render()
        action = t
        state, reward, done, info = env.step(action)
        print("reward: ", reward)
        print("")

env.close()
You should see the following output (only the first episode is shown).
######
EPISODE: 0
######
- - -
- - -
- - -
reward: 0
******
o - -
- - -
- - -
reward: 0
******
o x -
- - -
- - -
reward: 0
******
o x o
- - -
- - -
reward: 0
******
o x o
x - -
- - -
reward: 0
******
o x o
x o -
- - -
reward: 0
******
o x o
x o x
- - -
Player 1 wins.
reward: 100
******
o x o
x o x
o - -
Game Over
reward: 100
******
o x o
x o x
o - -
Game Over
reward: 100
******
This is not very exciting, as the players simply place their marks one square after the other, but it illustrates how to use the environment. The 4 methods work without errors.
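The test loop always plays 9 steps, which is why "Game Over" is printed after the game has already been won. A loop that stops as soon as step reports done avoids this. The sketch below assumes only the reset/step interface used above. The helper run_episode and the stand-in class ThreeStepEnv are hypothetical and exist only to make the example runnable on its own; with the real environment, you would pass the object returned by gym.make('tictac-v0') instead.

```python
def run_episode(env, choose_action, max_steps=9):
    """Play one episode, stopping as soon as the environment reports done.

    Returns the number of steps played and the sum of the rewards.
    """
    state = env.reset()
    total_reward = 0
    for t in range(max_steps):
        action = choose_action(state)
        state, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            return t + 1, total_reward
    return max_steps, total_reward


class ThreeStepEnv:
    """Hypothetical stand-in with the same interface: ends after 3 steps."""

    def reset(self):
        self.count = 0
        return self.count

    def step(self, action):
        self.count += 1
        done = self.count == 3
        return self.count, (100 if done else 0), done, {}


steps, total = run_episode(ThreeStepEnv(), choose_action=lambda state: 0)
print(steps, total)
```

The choose_action argument is where a policy plugs in: it receives the current state and returns the next action, so the same loop works for a fixed move order, a random player, or a trained agent.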
In the next article, we will see how to create a more interesting Gym environment using the PyBullet physics engine.