Introduction to Reinforcement Learning
Reinforcement Learning (RL) is a field of machine learning focused on decision-making. A simple way to understand RL is through the game of Go. At any moment, a player sees the current board configuration (the state) and must choose one of many possible legal actions (moves). A strong RL algorithm for Go (such as AlphaGo) is able to choose an effective action for any given state.
RL differs from other types of machine learning, such as image classification in computer vision, in one fundamental way: RL agents learn by continuously interacting with an external "environment," guided primarily by the rewards they receive.
What Is an Environment?
The environment includes everything the agent interacts with. It accepts the agent’s actions and produces feedback, typically in the form of new states and rewards.
In Go, the environment receives a move, updates the board, checks if
the game ended, and provides a corresponding reward (e.g., +1 for a win,
0 for a draw, −1 for a loss). Over repeated interactions, the RL agent
learns which strategies produce higher long-term rewards.
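To make this interaction loop concrete, the sketch below shows how an agent might play one episode against an environment. It assumes a Gym-style interface where `reset()` returns an initial state and `step(action)` returns the next state, a reward, and a done flag; the names are illustrative and not tied to any particular library.

```python
# A minimal sketch of the agent-environment loop.
# Assumes env.reset() -> state and env.step(action) -> (state, reward, done);
# these names mirror common RL libraries but are not tied to a specific one.

def run_episode(env, policy):
    """Play one episode with the given policy and return the total reward."""
    state = env.reset()                 # observe the initial state
    total_reward, done = 0.0, False
    while not done:
        action = policy(state)                      # agent selects an action
        state, reward, done = env.step(action)      # environment responds
        total_reward += reward
    return total_reward
```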
This trial-and-error learning process is what distinguishes RL from
other machine learning paradigms.
The Markov Property
In many RL settings, the agent must choose actions based only on the current observation. For this to work, we require the Markov Property:
The future evolution of the process depends only on its current state, not on the sequence of past events that led there.
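Formally, writing $s_t$ for the state and $a_t$ for the action at step $t$, this says

$$
P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \dots, s_0, a_0) = P(s_{t+1} \mid s_t, a_t).
$$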
This ensures that an optimal policy can be defined as a function of the current state alone.
Of course, many real-world scenarios violate the Markov
Property.
For example, in RoboCup, a robot’s camera cannot capture the full soccer
field. The agent’s observation is incomplete, making the true state
partially observable. Researchers often design clever observation
functions or memory mechanisms (e.g., recurrent networks) to approximate
Markovian behavior, but a perfectly Markovian state is rarely achievable in practice.
Formalizing Reinforcement Learning
A typical RL problem is defined by the following components:
- State space ($S$): all possible states the agent can observe.
- Action space ($A$): all actions the agent can take.
- Reward function ($r(s, a)$): the immediate reward received after taking action $a$ in state $s$.
- Policy ($\pi(s)$): a rule or function describing how the agent selects actions.
The objective in RL is to find an optimal policy—one that maximizes the expected long-term reward.
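Written out, if $s_t$ is the state at step $t$ and $a_t \sim \pi(s_t)$ is the action the policy chooses there, the goal is

$$
\pi^* = \arg\max_{\pi} \; \mathbb{E}\!\left[ \sum_{t=0}^{T} r(s_t, a_t) \right],
$$

where the expectation is over trajectories generated by following $\pi$. (In infinite-horizon settings a discount factor is usually added; we omit it here for simplicity.)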
Example: Gridworld
To make this concrete, consider a simple environment called Gridworld.
An agent moves on an $N \times M$ grid and tries to reach a specific goal cell $(G_x, G_y)$. Each step allows the agent to move up, down, left, or right. When it reaches the goal, the episode ends and the agent receives a reward of $+1$; otherwise, each step incurs a small penalty of $-0.01$ to discourage wandering.
The formal components are:
State space:

$$
S = \{\, ((A_x, A_y), (G_x, G_y)) \mid A_x, G_x \in \{1, \dots, N\},\ A_y, G_y \in \{1, \dots, M\} \,\}.
$$

Here $(A_x, A_y)$ is the agent's location, and $(G_x, G_y)$ is the goal's location.

Action space:

$$
a \in \{(0, 1),\ (0, -1),\ (1, 0),\ (-1, 0)\},
$$

corresponding to the four possible movement directions.

Reward function:
- $+1$ if the agent reaches the goal
- $-0.01$ otherwise
Optimal policy:
Move toward the goal using the shortest path.
This is one of the simplest examples of an RL task, but it already illustrates how states, actions, and rewards interact.
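As a rough illustration, this environment can be written down in a few lines of Python. The class below is only a sketch that follows the definitions in this post (the `Gridworld` name and its `reset`/`step` methods are our own choices, not an existing library's API); the `greedy_policy` at the end implements the shortest-path behavior described above.

```python
import random

class Gridworld:
    """A minimal Gridworld matching the definitions above (a sketch, not a library API)."""

    def __init__(self, n, m):
        self.n, self.m = n, m           # grid dimensions N and M

    def reset(self):
        """Place the agent and the goal at random cells and return the initial state."""
        self.agent = (random.randint(1, self.n), random.randint(1, self.m))
        self.goal = (random.randint(1, self.n), random.randint(1, self.m))
        return (self.agent, self.goal)

    def step(self, action):
        """Apply one of the four moves and return (state, reward, done)."""
        dx, dy = action                                  # one of (0,1), (0,-1), (1,0), (-1,0)
        x = min(max(self.agent[0] + dx, 1), self.n)      # clamp to stay on the grid
        y = min(max(self.agent[1] + dy, 1), self.m)
        self.agent = (x, y)
        if self.agent == self.goal:
            return (self.agent, self.goal), 1.0, True    # goal reached: +1, episode ends
        return (self.agent, self.goal), -0.01, False     # otherwise: small step penalty


def greedy_policy(state):
    """Move one step toward the goal: the shortest-path policy described above."""
    (ax, ay), (gx, gy) = state
    if ax != gx:
        return (1, 0) if gx > ax else (-1, 0)
    return (0, 1) if gy > ay else (0, -1)
```

Combined with the earlier `run_episode` sketch, calling `run_episode(Gridworld(5, 5), greedy_policy)` yields a total reward slightly below $+1$: the goal bonus minus the per-step penalties accumulated along the shortest path.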
Final Thoughts
This post gives only a brief introduction to RL. Real RL research involves far more complexity. Even seemingly simple components—like the reward function—can dramatically affect an agent’s ability to learn. For instance, in the Gridworld example, the agent rarely reaches the goal early in training, so relying solely on the sparse +1 reward makes learning extremely difficult.
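One common way around this (a remedy beyond what this post covers, shown only as an illustration) is to hand the agent a denser signal, for example a small penalty proportional to its distance from the goal, so it receives useful feedback even on episodes where it never reaches the goal:

```python
def shaped_reward(agent, goal, reached):
    """An illustrative denser reward: the sparse goal bonus plus a penalty that
    shrinks as the agent gets closer. This is NOT the reward defined in the
    Gridworld example above, just one possible hand-designed alternative."""
    distance = abs(agent[0] - goal[0]) + abs(agent[1] - goal[1])  # Manhattan distance
    return (1.0 if reached else 0.0) - 0.01 * distance
```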
Designing the state representation, action space, and reward function is often one of the hardest parts of building an RL system.
Among the many RL algorithm families, Policy Gradient
methods have become increasingly popular in modern research and
applications.
In the next article, we will begin exploring them—starting from the most
fundamental algorithm: REINFORCE.
- Title: Introduction to Reinforcement Learning
- Author: Harry Huang (aka Wenyuan Huang, 黄问远)
- Created at: 2025-03-22 02:13:34
- Updated at: 2025-11-16 21:44:06
- Link: https://whuang369.com/blog/2025/03/22/CS/Machine_Learning/Reinforcement_Learning/RL_Intro/
- License: This work is licensed under CC BY-NC-SA 4.0.