From REINFORCE to PPO/GRPO - Homepage

This is the home page of my RL series, From REINFORCE to PPO/GRPO. The series starts with REINFORCE, the most basic policy gradient algorithm, and builds up to the widely used modern RL algorithms PPO and GRPO.

Introduction to RL

REINFORCE

On-policy and Off-policy

I will keep adding posts to this series.