From REINFORCE to PPO/GRPO - Homepage

Harry Huang (aka Wenyuan Huang, 黄问远)

Created: 2025-11-16 21:40:09
Updated: 2025-11-16 21:52:48


This is the home page of my RL series, From REINFORCE to PPO/GRPO. The series starts with REINFORCE, the most basic policy gradient algorithm, and builds up to the widely used modern algorithms PPO and GRPO.

Introduction to RL

REINFORCE

On-policy and Off-policy

I will keep updating this series.

Link: https://whuang369.com/blog/2025/11/16/CS/Machine_Learning/Reinforcement_Learning/Policy_Gradient_Homepage/
License: This work is licensed under CC BY-NC-SA 4.0.

#Machine Learning
#Reinforcement Learning
