RL: Tricks of the Trade

Getting RL to work is hard. Collecting some tricks here.
Jul 5, 2020Last updated: Jul 5, 2020

Raw collection of random things I've read around for better RL but now forgotten the sources. Hopefully, I can refine this sometime.

Deadly triad

  • Off-policy learning
  • Flexible function approx.
  • Bootstrapping


  • Experience replay buffer + mini-batch SGD
  • Target network
  • TD-error clipping
  • Double Q-Learning - reduce maximization bias
  • Average Q-Learning - reduce variance

Optimistic initializations - initialize to upper bound of Q-values

© 2023 Sanyam Kapoor