RL: Tricks of the Trade

Getting RL to work is hard. Collecting some tricks here.


A raw collection of tricks for better RL that I've picked up from reading around; I've since forgotten the sources. Hopefully I can refine this sometime.

Deadly triad

Stabilization

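Optimistic initialization - initialize Q-values to an upper bound on the achievable return, so that untried actions look better than tried ones and even a greedy policy is pushed to explore them.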
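A minimal sketch of what this could look like for tabular Q-learning. It assumes per-step rewards are bounded above by a known r_max and a discount gamma < 1, so r_max / (1 - gamma) bounds any discounted return. The function names and the tiny 3-state, 2-action setup are made up for illustration, not taken from any particular source.

```python
import numpy as np


def optimistic_q_table(n_states, n_actions, r_max, gamma):
    """Q-table filled with r_max / (1 - gamma).

    Assumes rewards are at most r_max and 0 <= gamma < 1, so this value
    is an upper bound on any discounted return.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    return np.full((n_states, n_actions), r_max / (1.0 - gamma))


def q_learning_step(q, s, a, r, s_next, alpha, gamma):
    """One in-place tabular Q-learning update."""
    target = r + gamma * q[s_next].max()
    q[s, a] += alpha * (target - q[s, a])


# Hypothetical toy problem: 3 states, 2 actions, rewards in [0, 1].
q = optimistic_q_table(n_states=3, n_actions=2, r_max=1.0, gamma=0.9)

# A purely greedy agent in state 0 picks action 0 first
# (np.argmax breaks ties toward the lowest index).
first_action = int(np.argmax(q[0]))
q_learning_step(q, s=0, a=first_action, r=0.0, s_next=1, alpha=0.5, gamma=0.9)

# The tried action's estimate has dropped below the optimistic value,
# so the greedy choice now moves to the untried action.
next_action = int(np.argmax(q[0]))
```

Here the exploration comes entirely from the initial values: there is no epsilon or noise, and the effect fades once every action has been tried enough for its estimate to fall to a realistic level.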