From REINFORCE to PPO/GRPO - Homepage
The home page of the policy gradient (PG) series.
This is the home page of my RL series, From REINFORCE to PPO/GRPO. The series starts from REINFORCE, the most basic policy gradient RL algorithm, and builds up to the widely used modern RL algorithms PPO and GRPO.
I will keep updating this series.