Here is a list of the most common reinforcement learning algorithms, grouped by family.

1. Model-Free

- Value-based
- Policy-based
- Actor-Critic
- General Agents
- Imitation Learning Agents
- Hierarchical Reinforcement Learning Agents
- Memory Types
- Exploration Techniques
- Meta Learning

2. Model-Based

- Dyna-Style Algorithms / Model-based data generation (illustrated in the sketch after this list)
- Policy Search with Backpropagation through Time / Analytic gradient computation
- Shooting Algorithms / sampling-based planning
- Value-equivalence prediction
- Model-free Model-Based
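
To make the two families concrete, below is a minimal Dyna-Q sketch in Python (the toy chain environment, constants, and helper names are assumptions chosen for illustration, not any particular library's API). Each real step is used for an ordinary model-free Q-learning update, and is also stored in a learned transition model that generates imagined transitions for extra planning updates, which is the Dyna-style, model-based data-generation idea listed above.

```python
import random

n_states, n_actions = 10, 2
Q = [[0.0] * n_actions for _ in range(n_states)]   # model-free action values
model = {}                                         # (s, a) -> (r, s') learned from real experience
alpha, gamma, eps, planning_steps = 0.1, 0.95, 0.1, 20

def step(s, a):
    # Toy deterministic chain (an assumption for this sketch): action 1 moves right,
    # action 0 moves left; reaching the right end gives reward 1 and ends the episode.
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return (1.0 if s2 == n_states - 1 else 0.0), s2

def q_update(s, a, r, s2):
    # One-step Q-learning (model-free, off-policy TD) update.
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])

s = 0
for _ in range(3000):
    # epsilon-greedy behaviour policy
    if random.random() < eps:
        a = random.randrange(n_actions)
    else:
        a = max(range(n_actions), key=lambda i: Q[s][i])
    r, s2 = step(s, a)
    q_update(s, a, r, s2)          # model-free: learn from the real transition
    model[(s, a)] = (r, s2)        # model-based: remember what the environment did
    for _ in range(planning_steps):
        ps, pa = random.choice(list(model))    # pick a previously seen state-action pair
        pr, ps2 = model[(ps, pa)]              # imagined transition generated by the model
        q_update(ps, pa, pr, ps2)              # planning update on generated data
    s = 0 if s2 == n_states - 1 else s2        # restart the episode at the goal

print("greedy action per state:", [max(range(n_actions), key=lambda i: Q[st][i]) for st in range(n_states)])
```

The only difference between the two update paths is where the transition comes from: the environment (model-free learning) or the learned model (model-based planning).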

Table summary of model-free RL algorithms (blank cells are left unspecified; a minimal actor-critic sketch follows the table):

| Algorithm | Agent type | On-/off-policy | Policy type | Monte Carlo (MC) or temporal difference (TD) | Action space | State space |
| --- | --- | --- | --- | --- | --- | --- |
| Tabular Q-learning (= SARSA max), Q-learning(λ) | Value-based | Off-policy | Pseudo-deterministic (epsilon-greedy) | TD | Discrete only | Discrete only |
| SARSA, SARSA(λ) | Value-based | On-policy | Pseudo-deterministic (epsilon-greedy) | TD | Discrete only | Discrete only |
| DQN, n-step DQN, Double DQN, Noisy DQN, Prioritized Replay DQN, Dueling DQN, Categorical DQN, Distributional DQN (C51) | Value-based | Off-policy | Pseudo-deterministic (epsilon-greedy) | TD | Discrete only | Discrete or continuous |
| NAF (= continuous DQN) | Value-based | | | | Continuous | Continuous |
| CEM | Policy-based | On-policy | | MC | | |
| REINFORCE (vanilla policy gradient) | Policy-based | On-policy | Stochastic | MC | | |
| Policy gradient with softmax | Policy-based | | Stochastic | | | |
| Natural Policy Gradient | Policy-based | | Stochastic | | | |
| TRPO | Actor-critic | On-policy | Stochastic | | Discrete or continuous | Discrete or continuous |
| PPO | Actor-critic | On-policy | Stochastic | | Discrete or continuous | Discrete or continuous |
| Distributed PPO | Actor-critic | | | | Continuous | Continuous |
| A2C / A3C | Actor-critic | On-policy | Stochastic | TD | Discrete or continuous | Discrete or continuous |
| DDPG | Actor-critic | Off-policy | Deterministic | | Continuous only | Discrete or continuous |
| TD3 | Actor-critic | | | | Continuous only | Discrete or continuous |
| D4PG | Actor-critic | | | | Continuous only | Discrete or continuous |
| SAC | Actor-critic | Off-policy | | | Continuous only | Discrete or continuous |
| ACER | Actor-critic | | | | Discrete | Discrete or continuous |
| ACKTR | Actor-critic | | | | Discrete or continuous | Discrete or continuous |
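
As a companion to the actor-critic rows above (for example A2C/A3C: on-policy, stochastic policy, TD), here is a minimal one-step tabular actor-critic sketch; the 5-state chain environment and all constants are assumptions chosen for illustration, not a reference implementation of any listed algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2                 # toy chain (an assumption): action 1 = right, 0 = left
logits = np.zeros((n_states, n_actions))   # actor parameters (softmax policy per state)
V = np.zeros(n_states)                     # critic (state-value estimates)
alpha_actor, alpha_critic, gamma = 0.1, 0.2, 0.99

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(500):                       # episodes
    s = 0
    while True:
        probs = softmax(logits[s])
        a = int(rng.choice(n_actions, p=probs))        # sample from the stochastic policy
        s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        done = s2 == n_states - 1
        r = 1.0 if done else 0.0
        # TD(0) error: critic target minus current estimate; also the actor's advantage signal.
        td_error = r + (0.0 if done else gamma * V[s2]) - V[s]
        V[s] += alpha_critic * td_error                # critic update (TD learning)
        grad_log_pi = -probs
        grad_log_pi[a] += 1.0                          # gradient of log softmax w.r.t. the logits
        logits[s] += alpha_actor * td_error * grad_log_pi   # actor update (policy gradient)
        if done:
            break
        s = s2

print("P(move right) per state:", [round(float(softmax(row)[1]), 2) for row in logits])
```

The same TD(0) error drives both updates: the critic uses it to improve its value estimates, and the actor uses it as an advantage signal for a policy-gradient step on its stochastic (softmax) policy.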

Conclusion

We have just seen some of the most widely used RL algorithms. In the next article, we will look at the challenges and applications of RL in robotics.