From REINFORCE to PPO/GRPO - Homepage

Harry Huang (aka Wenyuan Huang, 黄问远)

This is the home page of my RL series, From REINFORCE to PPO/GRPO. The series starts from REINFORCE, the most basic policy gradient RL algorithm, and builds up to the widely used modern RL algorithms PPO and GRPO.

Introduction to RL

REINFORCE

On-policy and Off-policy

I will keep updating this series.

  • Title: From REINFORCE to PPO/GRPO - Homepage
  • Author: Harry Huang (aka Wenyuan Huang, 黄问远)
  • Created at: 2025-11-16 21:40:09
  • Updated at : 2025-11-16 21:52:48
  • Link: https://whuang369.com/blog/2025/11/16/CS/Machine_Learning/Reinforcement_Learning/Policy_Gradient_Homepage/
  • License: This work is licensed under CC BY-NC-SA 4.0.