RL: Tricks of the Trade

Getting RL to work is hard. Collecting some tricks here.

A raw collection of things I've read about making RL work better; I have since forgotten the sources. Hopefully, I can refine this sometime.

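Deadly triad - combining function approximation, bootstrapping, and off-policy learning can make value estimates unstable or divergent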


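Optimistic initialization - initialize Q-values to an upper bound on the return (e.g. r_max / (1 - gamma) when discounted rewards are bounded by r_max), so untried actions look attractive and even a greedy agent gets pushed to explore them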
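
To make the optimistic-initialization trick concrete, here is a minimal sketch on a toy Bernoulli bandit. Everything in it is an assumption for illustration, not from any particular source: the names `q_upper_bound` and `greedy_bandit`, the arm probabilities, the constant step size `alpha`, and the choice of 2.0 as an initial value above the largest possible reward of 1.0.

```python
import random


def q_upper_bound(r_max, gamma):
    """Largest possible discounted return when every reward is <= r_max."""
    return r_max / (1.0 - gamma)


def greedy_bandit(q_init, arm_means, steps=200, alpha=0.1, seed=0):
    """Purely greedy agent on a Bernoulli bandit (hypothetical toy setup).

    There is no epsilon: any exploration comes only from the initial values.
    Returns the final value estimates and the pull count per arm.
    """
    rng = random.Random(seed)
    q = [float(q_init)] * len(arm_means)
    counts = [0] * len(arm_means)
    for _ in range(steps):
        # Greedy action; max() keeps the first index on ties.
        a = max(range(len(q)), key=q.__getitem__)
        r = 1.0 if rng.random() < arm_means[a] else 0.0
        counts[a] += 1
        # Constant step size, so the optimistic start decays gradually.
        q[a] += alpha * (r - q[a])
    return q, counts


arm_means = [0.2, 0.5, 0.8]                      # assumed success probabilities
_, counts_zero = greedy_bandit(0.0, arm_means)   # pessimistic / zero init
_, counts_opt = greedy_bandit(2.0, arm_means)    # above the max reward of 1.0
print("zero init pulls:      ", counts_zero)
print("optimistic init pulls:", counts_opt)
```

In this setup the zero-initialized agent never leaves the first arm: its estimate never drops below zero, and ties go to the first index. The optimistic agent is forced to try every arm, because each pull drags that arm's estimate below the untouched ones. For a discounted MDP, `q_upper_bound(r_max, gamma)` would play the role of the 2.0 used here.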