Introduction to Reinforcement Learning

Harry Huang (aka Wenyuan Huang, 黄问远)

What is RL?

Reinforcement Learning (RL) is about making decisions in a given situation. Go is a great example of the kind of problem RL is good at: a player is given the current board and needs to choose the best move. In RL terms, the current board is the "state" and the player's moves are "actions". A good RL algorithm for Go (like AlphaGo) should be able to choose a good, ideally optimal, action for any possible state.
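To make "state" and "action" concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than how a real Go engine works: the board is a tiny 3x3 grid, the names `legal_actions`, `apply_action` and `first_empty_policy` are made up for this post, and real Go rules (captures, ko, suicide) are ignored.

```python
from typing import List, Tuple

# Hypothetical representation: a state is the board, stored as a tuple of rows.
# "." is an empty point, "B" a black stone, "W" a white stone.
State = Tuple[Tuple[str, ...], ...]
# An action is the (row, column) where the player places a stone.
Action = Tuple[int, int]


def legal_actions(state: State) -> List[Action]:
    """Return every empty point as a candidate move (real Go rules are ignored)."""
    return [
        (r, c)
        for r, row in enumerate(state)
        for c, cell in enumerate(row)
        if cell == "."
    ]


def apply_action(state: State, action: Action, stone: str) -> State:
    """Return a new state with the stone placed; the old state is left untouched."""
    r, c = action
    rows = [list(row) for row in state]
    rows[r][c] = stone
    return tuple(tuple(row) for row in rows)


def first_empty_policy(state: State) -> Action:
    """A deliberately naive policy: always play the first empty point."""
    return legal_actions(state)[0]


board: State = (
    (".", "B", "."),
    ("W", ".", "."),
    (".", ".", "."),
)
move = first_empty_policy(board)
new_board = apply_action(board, move, "B")
```

A function like `first_empty_policy`, which maps a state to an action, is usually called a "policy". The goal of RL is to learn a much better policy than this naive one.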

There are many other kinds of Machine Learning (ML) that make choices from given information. For instance, Computer Vision models classify the objects in given images. What sets RL apart is that it learns from interactions with an "environment", and especially from the "rewards" those interactions produce.

"Environment" refers to everything outside the RL agent that the agent interacts with. It receives actions from the agent and gives "feedback", usually consisting of states and rewards, back to the agent. In the example of Go, the environment receives the agent's move and responds by updating the game board and possibly signaling the end of the game. It may also include an opponent who makes the next move. After each interaction, the environment returns a new state (the updated board) and a reward (such as +1 for winning, 0 for a draw, or -1 for losing, typically given only once the game has ended). Over time, the RL agent learns which actions lead to better long-term rewards by exploring different strategies and adjusting its behavior based on the feedback it receives. This trial-and-error learning process is what distinguishes RL from other forms of machine learning.
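The agent-environment loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Go is far too large for a short example, so the sketch swaps in a hypothetical toy game (a pile of stones, each player removes 1 to 3, whoever takes the last stone wins) with a random opponent living inside the environment. The `NimEnv` class, the `train` function and the simple table-based update rule are all made up for this post, not a standard library API or the method AlphaGo uses.

```python
import random


class NimEnv:
    """Hypothetical toy environment: players alternate removing 1-3 stones,
    and whoever takes the last stone wins. The opponent plays randomly."""

    def __init__(self, pile=10, seed=0):
        self.start_pile = pile
        self.pile = pile
        self.rng = random.Random(seed)

    def reset(self):
        self.pile = self.start_pile
        return self.pile  # the state is simply the number of stones left

    def legal_actions(self, state):
        return [n for n in (1, 2, 3) if n <= state]

    def step(self, action):
        """Apply the agent's move, let the opponent reply,
        and return (next_state, reward, done)."""
        self.pile -= action
        if self.pile == 0:
            return self.pile, 1, True    # agent took the last stone: win
        self.pile -= self.rng.choice(self.legal_actions(self.pile))
        if self.pile == 0:
            return self.pile, -1, True   # opponent took the last stone: loss
        return self.pile, 0, False       # game goes on, no reward yet


def train(env, episodes=2000, epsilon=0.1, lr=0.1, seed=1):
    """Trial-and-error learning: play many games and nudge the value of each
    (state, action) pair that was used toward the final outcome of its game."""
    rng = random.Random(seed)
    q = {}  # q[(state, action)] -> running estimate of the final reward
    for _ in range(episodes):
        state, done, history = env.reset(), False, []
        while not done:
            actions = env.legal_actions(state)
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore: try something random
            else:
                # exploit: pick the action with the best estimate so far
                action = max(actions, key=lambda a: q.get((state, a), 0.0))
            history.append((state, action))
            state, reward, done = env.step(action)
        # The loop ends with the terminal reward (+1 or -1) in `reward`.
        for s, a in history:
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + lr * (reward - old)
    return q


env = NimEnv(pile=10, seed=0)
q_table = train(env)
```

Note how the pieces map onto the description: `step` is the environment receiving an action and returning a new state and a reward, and `train` is the agent adjusting its behavior based on that feedback. The `epsilon` parameter controls how often the agent explores instead of using what it already believes is best.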

To be continued

  • Title: Introduction to Reinforcement Learning
  • Author: Harry Huang (aka Wenyuan Huang, 黄问远)
  • Created at : 2025-03-22 02:13:34
  • Updated at : 2025-08-11 15:06:40
  • Link: https://whuang369.com/blog/2025/03/22/CS/Machine_Learning/Reinforcement_Learning/RL_Intro/
  • License: This work is licensed under CC BY-NC-SA 4.0.