RL Tutorial

Published:

In this presentation, I introduce RL algorithms, moving from dynamic programming (value iteration, policy iteration) to Q-learning (Monte Carlo and temporal-difference prediction, deep Q-network) to policy gradient methods (REINFORCE, proximal policy optimization, and soft actor-critic). I also discuss recent advances in safe and multi-agent RL, along with some other directions.
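As a rough illustration of the first algorithm named above, here is a minimal value iteration sketch in plain Python. It is not taken from the slides: the two-state MDP `P`, the helper `q_value`, and the defaults `gamma=0.9` and `tol=1e-8` are all assumptions made up for this example.

```python
# Hypothetical 2-state, 2-action MDP, invented purely for illustration.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}


def q_value(P, V, s, a, gamma):
    """Expected one-step return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=0.9, tol=1e-8):
    """Apply the Bellman optimality backup until the largest change is below tol."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Back up the best action value; V is updated in place within a sweep.
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Read off a greedy policy from the converged value estimates.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


V, policy = value_iteration(P)
```

In this toy MDP the greedy policy picks action 1 in both states, since staying in state 1 keeps collecting the larger reward. Policy iteration would alternate a full policy evaluation step with this same greedy improvement step rather than folding them into a single backup.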

Slides