Reinforcement Learning

Notes based on Sutton & Barto (2018) — Reinforcement Learning: An Introduction

Part 1: A Comprehensive Introduction to Reinforcement Learning (RL)

7 min read · April 01, 2026
Part 2: The Multi-Armed Bandit – Mastering the Art of Choices

6 min read · April 02, 2026
Part 3: Stepping into the World – Tabular Value-Based Methods

9 min read · April 03, 2026
Part 4: A Cliffhanger – Comparing SARSA, Q-Learning, and Expected SARSA

7 min read · April 06, 2026
Part 5: Breaking the Table – Function Approximation

6 min read · April 07, 2026
Part 7: The Complete Alignment Pipeline – From SFT to Advanced RL

A complete guide to how PPO, DPO, and GRPO transform language models from pattern copiers into reasoning agents.

8 min read · April 08, 2026
Part 6: Cutting Out the Middleman – Policy Gradient Methods

6 min read · April 08, 2026