Here is a list of the most common reinforcement learning algorithms grouped by family.

1. Model-Free

General Agents

Imitation Learning Agents

Hierarchical Reinforcement Learning Agents

Memory Types

Exploration Techniques

Meta Learning

2. Model-Based

Dyna-Style Algorithms / Model-based data generation

Policy Search with Backpropagation through Time / Analytic gradient computation

Shooting Algorithms / sampling-based planning

Value-equivalence prediction

Combined Model-Free / Model-Based Approaches
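To make the "Shooting Algorithms / sampling-based planning" entry above more concrete, here is a minimal sketch of random shooting, the simplest member of that family. It assumes a dynamics model is already available; `toy_model`, `random_shooting`, and all parameter values are hypothetical names and numbers chosen for illustration, not taken from any particular library.

```python
import random

def toy_model(state, action):
    """Hypothetical known dynamics: the state moves by the action,
    and the reward is higher the closer the next state is to zero."""
    next_state = state + action
    return next_state, -abs(next_state)

def random_shooting(state, model, horizon, n_candidates, rng):
    """Sample random action sequences, roll each one out in the model,
    and return the first action of the best-scoring sequence."""
    best_return, best_first_action = float("-inf"), None
    for _ in range(n_candidates):
        # Assumption: actions are scalars bounded in [-1, 1].
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total = state, 0.0
        for a in actions:
            s, r = model(s, a)
            total += r
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action

rng = random.Random(0)
action = random_shooting(state=2.0, model=toy_model, horizon=3,
                         n_candidates=200, rng=rng)
```

In a control loop, only this first action would be executed and the planner would be called again from the next state. Swapping the uniform sampler for one that is refit to the best candidates would turn this sketch into a cross-entropy-style planner.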

Table summary of model-free RL algorithms

| Algorithm | Agent type | Policy | Policy type | Monte Carlo (MC) or temporal difference (TD) | Action space | State space |
| --- | --- | --- | --- | --- | --- | --- |
| Tabular Q-learning (= SARSA max), Q-learning lambda | Value-based | Off-policy | Pseudo-deterministic (epsilon greedy) | TD | Discrete only | Discrete only |
| SARSA lambda | Value-based | On-policy | Pseudo-deterministic (epsilon greedy) | TD | Discrete only | Discrete only |
| N-step DQN, Double DQN, Noisy DQN, Prioritized Replay DQN, Dueling DQN, Categorical DQN, Distributional DQN (C51) | Value-based | Off-policy | Pseudo-deterministic (epsilon greedy) | | Discrete only | Discrete or continuous |
| NAF (= continuous DQN) | Value-based | | | | Continuous | Continuous |
| CEM | Policy-based | On-policy | | | | |
| REINFORCE (vanilla policy gradient) | Policy-based | On-policy | Stochastic | MC | | |
| Policy gradient softmax | Policy-based | | | | | |
| Natural Policy Gradient | Policy-based | | | | | |
| TRPO | Actor-critic | On-policy | Stochastic | | Discrete or continuous | Discrete or continuous |
| PPO | Actor-critic | On-policy | Stochastic | | Discrete or continuous | Discrete or continuous |
| Distributed PPO | Actor-critic | | | | Continuous | Continuous |
| A2C / A3C | Actor-critic | On-policy | Stochastic | TD | Discrete or continuous | Discrete or continuous |
| DDPG | Actor-critic | Off-policy | Deterministic | | Continuous only | Discrete or continuous |
| TD3 | Actor-critic | | | | Continuous only | Discrete or continuous |
| D4PG | Actor-critic | | | | Continuous only | Discrete or continuous |
| SAC | Actor-critic | Off-policy | | | Continuous only | Discrete or continuous |
| ACER | Actor-critic | | | | Discrete | Discrete or continuous |
| ACKTR | Actor-critic | | | | Discrete or continuous | Discrete or continuous |
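The off-policy versus on-policy distinction in the table is easiest to see in the tabular case, where Q-learning and SARSA differ only in the TD target they bootstrap from. The following is a minimal sketch under stated assumptions: the Q-table is a plain list of lists, and the function names, toy values, and hyperparameters are illustrative choices rather than part of any library.

```python
import random

def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])

def q_learning_target(q, reward, next_state, gamma):
    """Off-policy TD target: bootstrap from the best next action."""
    return reward + gamma * max(q[next_state])

def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy TD target: bootstrap from the action actually taken next."""
    return reward + gamma * q[next_state][next_action]

def td_update(q, state, action, target, alpha):
    """Move Q(state, action) a fraction alpha toward the TD target."""
    q[state][action] += alpha * (target - q[state][action])

# Toy Q-table: 2 states x 2 actions (values are made up for illustration).
q = [[0.0, 0.0], [1.0, 3.0]]
rng = random.Random(0)
next_action = epsilon_greedy(q[1], epsilon=0.1, rng=rng)

q_target = q_learning_target(q, reward=1.0, next_state=1, gamma=0.9)
s_target = sarsa_target(q, reward=1.0, next_state=1,
                        next_action=next_action, gamma=0.9)
td_update(q, state=0, action=0, target=q_target, alpha=0.5)
```

Because the Q-learning target always takes the maximum over next actions, it never falls below the SARSA target computed from the same table. The two coincide whenever the behaviour policy happens to pick the greedy action, which is why the table describes tabular Q-learning as "SARSA max".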


We have just seen some of the most widely used RL algorithms. In the next article, we will look at the challenges and applications of RL in robotics.