Reinforcement Learning
Notes based on Sutton & Barto (2018) — Reinforcement Learning: An Introduction
All parts in this series:
- Part 1: A Comprehensive Introduction to Reinforcement Learning (RL)
- Part 2: The Multi-Armed Bandit – Mastering the Art of Choices
- Part 3: Stepping into the World – Tabular Value-Based Methods
- Part 4: A Cliffhanger – Comparing SARSA, Q-Learning, and Expected SARSA
- Part 5: Breaking the Table – Function Approximation
- Part 6: Cutting Out the Middleman – Policy Gradient Methods
- Part 7: The Complete Alignment Pipeline – From SFT to Advanced RL
A complete guide to how PPO, DPO, and GRPO transform language models from pattern copiers into reasoning agents.