createGridWorld
Create a two-dimensional grid world object
Description
A grid world is a two-dimensional grid in which each position is a possible state
that the agent can occupy. The actions that an agent can attempt represent moves from one
position to the next. Many introductory reinforcement learning examples use grid worlds. Use
the createGridWorld function to create a GridWorld
object with a specified size and move types. You can then modify some of the object properties
and pass it to rlMDPEnv to create an
environment that agents can interact with. For more information, see Create Custom Grid World Environments.
Examples
For this example, create a 5-by-5 grid world object with these rules:
A 5-by-5 grid world bounded by borders, with four possible actions: North = 1, South = 2, East = 3, West = 4.
The agent begins from cell [2,1] (second row, first column, indicated by the red circle in the figure).
The agent receives reward +10 if it reaches the terminal state at cell [5,5] (blue cell).
The environment contains a special jump from cell [2,4] to cell [4,4] with +5 reward (blue arrow).
The agent is blocked by obstacles in cells [3,3], [3,4], [3,5], and [4,3] (black cells).
All other actions result in –1 reward.

Then, use the GridWorld object to create an environment in which you can train and simulate an agent.
First, create a GridWorld object using the createGridWorld function.
gw = createGridWorld(5,5)
gw =
GridWorld with properties:
GridSize: [5 5]
CurrentState: "[1,1]"
States: [25×1 string]
Actions: [4×1 string]
T: [25×25×4 double]
R: [25×25×4 double]
ObstacleStates: [0×1 string]
TerminalStates: [0×1 string]
ProbabilityTolerance: 8.8818e-16
Display the action names.
gw.Actions
ans = 4×1 string
"N"
"S"
"E"
"W"
Then set the initial, terminal, and obstacle states.
gw.CurrentState = "[2,1]";
gw.TerminalStates = "[5,5]";
gw.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[4,3]"];
Update the state transition matrix for the obstacle states.
updateStateTranstionForObstacles(gw)
To set the jump rule over the obstacle states, first zero out all the transitions out from state "[2,4]" for any action. Note that, because each number in one row represents a probability of moving into a specific cell, all the numbers along a row must always add to either one or zero, otherwise an error is thrown.
Set to zero the probability of transitioning out from state "[2,4]". Use the state2idx function to obtain the index associated with the state "[2,4]".
gw.T(state2idx(gw,"[2,4]"),:,:) = 0;
Then, for any action, set to one the probability of transitioning from state "[2,4]" to state "[4,4]".
gw.T(state2idx(gw,"[2,4]"),state2idx(gw,"[4,4]"),:) = 1;
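As a quick sanity check of the jump rule, the following sketch rebuilds the grid world from the steps above and inspects the row of T that belongs to state "[2,4]". The variable names s24, s44, rowSums, and jumpProb are illustrative and not part of the toolbox.

```matlab
% Rebuild the grid world with the obstacles and the jump rule from the steps above.
gw = createGridWorld(5,5);
gw.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[4,3]"];
updateStateTranstionForObstacles(gw)

s24 = state2idx(gw,"[2,4]");   % index of the jump origin
s44 = state2idx(gw,"[4,4]");   % index of the jump destination
gw.T(s24,:,:) = 0;             % each row now sums to zero, which is allowed
gw.T(s24,s44,:) = 1;           % each row sums to one again

% Sum the row for state "[2,4]" over all next states (dimension 2), per action.
rowSums = squeeze(sum(gw.T(s24,:,:),2))
% Probability of landing on "[4,4]" from "[2,4]", per action.
jumpProb = squeeze(gw.T(s24,s44,:))
```

Both vectors contain one entry per action, and every entry should equal one, because the only nonzero element in each of these rows is the one assigned in the last step.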
Next, define the rewards in the reward transition matrix.
nS = numel(gw.States);
nA = numel(gw.Actions);
gw.R = -1*ones(nS,nS,nA);
gw.R(state2idx(gw,"[2,4]"),state2idx(gw,"[4,4]"),:) = 5;
gw.R(:,state2idx(gw,gw.TerminalStates),:) = 10;
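To see how the three reward assignments combine, the sketch below reads back individual entries of R. It repeats the reward setup so that it can run on its own. The variable names aE, rStep, rJump, and rGoal are illustrative.

```matlab
% Repeat the reward setup from the step above on a fresh grid world.
gw = createGridWorld(5,5);
gw.TerminalStates = "[5,5]";
nS = numel(gw.States);
nA = numel(gw.Actions);
gw.R = -1*ones(nS,nS,nA);                                   % default reward
gw.R(state2idx(gw,"[2,4]"),state2idx(gw,"[4,4]"),:) = 5;    % jump reward
gw.R(:,state2idx(gw,gw.TerminalStates),:) = 10;             % terminal reward

% R is indexed as R(currentState,nextState,action).
aE = action2idx(gw,"E");
rStep = gw.R(state2idx(gw,"[2,1]"),state2idx(gw,"[2,2]"),aE)   % ordinary move
rJump = gw.R(state2idx(gw,"[2,4]"),state2idx(gw,"[4,4]"),aE)   % special jump
rGoal = gw.R(state2idx(gw,"[5,4]"),state2idx(gw,"[5,5]"),aE)   % entering the terminal state
```

Because the terminal reward is assigned last, it overrides the default of -1 for every transition that ends in "[5,5]".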
Use rlMDPEnv to create the grid world environment env from the GridWorld object gw.
env = rlMDPEnv(gw)
env =
rlMDPEnv with properties:
Model: [1×1 rl.env.GridWorld]
ResetFcn: []
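The ResetFcn property is empty by default. As a minimal sketch, assuming the reset function takes no inputs and returns the index of the initial state, you can make every episode start from cell [2,1] as follows. The variable names x0 and startName are illustrative.

```matlab
gw = createGridWorld(5,5);
env = rlMDPEnv(gw);

% Assumption: ResetFcn is a function handle with no inputs that returns
% the index of the state in which each episode starts.
env.ResetFcn = @() state2idx(env.Model,"[2,1]");

x0 = reset(env)                        % initial observation (a state index)
startName = idx2state(env.Model,x0)    % corresponding state name
```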
You can visualize the grid world environment using the plot function.
plot(env)
Use the action2idx function to obtain the index associated with the "E" action. Then use the environment step function to move the agent eastward.
[xn,rn,id]=step(env,action2idx(env.Model,"E"))
xn = 7
rn = -1
id = logical
0
Use the idx2state function to display the name of the next state.
idx2state(env.Model,xn)
ans = "[2,2]"
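The relation between a state index and its "[row,column]" name can be sketched with base MATLAB indexing functions. This sketch assumes that states are numbered in column-major order, which is consistent with index 7 mapping to "[2,2]" in a 5-by-5 grid. In real code, prefer state2idx and idx2state, which do not depend on this assumption. The variable names are illustrative.

```matlab
gridSize = [5 5];

% State name "[2,2]" to linear index (column-major assumption).
idx = sub2ind(gridSize,2,2)

% Linear index back to row and column, then to a state name.
[row,col] = ind2sub(gridSize,7);
name = "[" + row + "," + col + "]"
```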
Use the getActionInfo and getObservationInfo functions to extract the action and observation specification objects from the environment.
actInfo = getActionInfo(env)
actInfo =
rlFiniteSetSpec with properties:
Elements: [4×1 double]
Name: "MDP Actions"
Description: [0×0 string]
Dimension: [1 1]
DataType: "double"
obsInfo = getObservationInfo(env)
obsInfo =
rlFiniteSetSpec with properties:
Elements: [25×1 double]
Name: "MDP Observations"
Description: [0×0 string]
Dimension: [1 1]
DataType: "double"
You can now use the action and observation specifications to create an agent for env, and then use the train and sim functions to train and simulate the agent within the environment.
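As one possible continuation, the sketch below builds a tabular Q-learning agent from the two specifications and trains it. It is not part of the original example. It assumes a release in which rlQValueFunction is available (R2022a or later), it uses a simplified reward setup without obstacles or the jump rule, and the option values (200 episodes, 50 steps per episode) are arbitrary choices for illustration.

```matlab
% Simplified grid world: -1 per move, +10 for reaching the terminal state.
gw = createGridWorld(5,5);
gw.TerminalStates = "[5,5]";
nS = numel(gw.States);
nA = numel(gw.Actions);
gw.R = -1*ones(nS,nS,nA);
gw.R(:,state2idx(gw,gw.TerminalStates),:) = 10;

env = rlMDPEnv(gw);
env.ResetFcn = @() state2idx(env.Model,"[2,1]");   % assumed reset signature

obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Tabular critic and Q-learning agent.
qTable = rlTable(obsInfo,actInfo);
critic = rlQValueFunction(qTable,obsInfo,actInfo);
agent = rlQAgent(critic);

% Illustrative training settings, not tuned values.
trainOpts = rlTrainingOptions( ...
    "MaxEpisodes",200, ...
    "MaxStepsPerEpisode",50, ...
    "Verbose",false, ...
    "Plots","none");
trainStats = train(agent,env,trainOpts);

% Simulate the trained agent for at most 50 steps.
simOpts = rlSimulationOptions("MaxSteps",50);
experience = sim(env,agent,simOpts);
```

Whether the agent reaches the terminal state within these limits depends on the agent options and on the random exploration, so treat the settings as a starting point.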
Input Arguments
m
Number of grid world rows, specified as a positive integer.
Example: 5
n
Number of grid world columns, specified as a positive integer.
Example: 5
moves
Action names, specified as either "Standard" or "Kings".
When moves is set to "Standard", the actions are ["N";"S";"E";"W"].
When moves is set to "Kings", the actions are ["N";"S";"E";"W";"NE";"NW";"SE";"SW"].
Example: "Kings"
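The effect of the move type on the object can be checked directly. The following sketch creates two 3-by-4 grid worlds and compares the number of actions and the size of the transition matrix. The grid size and variable names are arbitrary.

```matlab
gwStd = createGridWorld(3,4);             % four actions
gwKings = createGridWorld(3,4,"Kings");   % eight actions, including diagonals

nStd = numel(gwStd.Actions)
nKings = numel(gwKings.Actions)

% T has one page per action and K = m*n = 12 rows and columns.
sizeT = size(gwKings.T)
```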
Output Arguments
Two-dimensional grid world, returned as a GridWorld object with the
properties listed below. For more information, see Create Custom Grid World Environments.
Name of the current state, specified as a string. This name corresponds to the
current agent position in the grid, which is specified as a string or character
vector such as "[a,b]".
For more information on this property, see the
CurrentState property in Create Custom Grid World Environments.
Example: GW.CurrentState="[2,3]"
State names, specified as a string vector of length
m*n. Each state name is a string
specified as indicated in CurrentState. This property is
read-only.
For more information on this property, see the States
property in Create Custom Grid World Environments.
Action names, specified as a string vector. This property is read-only.
The length of the Actions vector is determined by the
moves argument.
Actions is a string vector of length:
Four, if moves is specified as "Standard"
Eight, if moves is specified as "Kings"
For more information on this property, see the Actions
property in Create Custom Grid World Environments.
State transition matrix, specified as a 3-D array in which every row of each page contains nonnegative numbers that must add up to one or zero.
The state transition matrix T is a probability matrix that
indicates the likelihood of the agent moving from the current state
s to any possible next state s' by
performing action a. T is given by
T(s,s',a) = probability(s'|s,a)
where s is the row index, s' is the column index, and a is the page index of T. The size of T depends on moves:
T is a K-by-K-by-4 array if moves is specified as "Standard". Here, K = m*n.
T is a K-by-K-by-8 array if moves is specified as "Kings".
When you create a grid world object, the transition matrix contains standard deterministic transitions corresponding to the four or eight actions that the agent can execute.
Note
Because each number in a row represents the probability of moving from the
cell indexed by the row into the cell indexed by the column, all the numbers
along a row must always add to either one or zero (within the tolerance
specified in ProbabilityTolerance). To set transition
probabilities, first set an entire row to zero, then set the nonzero
probabilities all at once. For an example, see the Examples section or createMDP.
For more information on this property, see the T property
in Create Custom Grid World Environments.
Example: GW.T(1,[1 2 3],1) = [0.25 0.5 0.25] assigns the
first three elements of the first row of T in the object
GW.
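The row-sum constraint on T can be illustrated without a GridWorld object. The following base MATLAB sketch uses a hypothetical 3-state, 2-action transition array and checks that every row adds to one or zero. The tolerance 4*eps is chosen to match the default ProbabilityTolerance value of 8.8818e-16; all values and variable names are illustrative.

```matlab
% Toy transition array: T(s,sNext,a) for 3 states and 2 actions.
T = zeros(3,3,2);
T(:,:,1) = [0 1 0; 0 0 1; 0 0 0];               % last row sums to zero
T(:,:,2) = [0.5 0.5 0; 0.25 0.5 0.25; 0 0 1];   % all rows sum to one

tol = 4*eps;                 % equals 8.8818e-16
rowSums = sum(T,2);          % 3-by-1-by-2: one sum per state and action
isValid = abs(rowSums - 1) <= tol | abs(rowSums) <= tol;
allValid = all(isValid(:))

% A row that adds to 1.3 violates the constraint.
Tbad = T;
Tbad(1,1,1) = 0.3;
badSums = sum(Tbad,2);
badValid = all(abs(badSums(:) - 1) <= tol | abs(badSums(:)) <= tol)
```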
Reward transition matrix, specified as a 3-D array that determines how much reward
the agent receives after performing an action in the environment.
R has the same shape and size as the state transition matrix
T. The reward transition matrix R is given by
r = R(s,s',a)
where r is the reward for moving from state s to state s' by performing action a. The size of R depends on moves:
R is a K-by-K-by-4 array if moves is specified as "Standard". Here, K = m*n.
R is a K-by-K-by-8 array if moves is specified as "Kings".
When you create a grid world object, the reward matrix is zero.
For more information on this property, see the R property
in Create Custom Grid World Environments.
Example: GW.R(1,[1 2 3],1) = [0.2 -0.5 0.2] assigns the
first three elements of the first row of R in the object
GW.
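Because R and T share the same indexing, they can be combined element by element. As an illustration that goes slightly beyond this page, the following base MATLAB sketch computes the expected immediate reward for each state-action pair, using the standard Markov decision process definition (the sum over next states of transition probability times reward). The toy arrays and variable names are hypothetical.

```matlab
% Toy MDP with 3 states and 2 actions (illustrative values only).
T = zeros(3,3,2);
T(:,:,1) = [0 1 0; 0 0 1; 0 0 0];
T(:,:,2) = [0.5 0.5 0; 0.25 0.5 0.25; 0 0 1];

% Reward of -1 for every transition, +10 for any transition into state 3.
R = -1*ones(3,3,2);
R(:,3,:) = 10;

% Expected immediate reward: sum over next states of T(s,sNext,a)*R(s,sNext,a).
% The result is 3-by-2: rows are states, columns are actions.
expectedR = squeeze(sum(T.*R,2))
```

For example, the entry for state 2 and action 2 is 0.25*(-1) + 0.5*(-1) + 0.25*10 = 1.75.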
State names that cannot be reached in the grid world, specified as a string vector.
For more information on this property, see the
ObstacleStates property in Create Custom Grid World Environments.
Example: GW.ObstacleStates =
["[3,3]";"[3,4]";"[3,5]";"[4,3]"]; sets four states as obstacles in
the object GW.
Terminal state names in the grid world, specified as a string vector.
For more information on this property, see the
TerminalStates property in Create Custom Grid World Environments.
Example: GW.TerminalStates = "[5,5]"; sets the state
"[5,5]" as terminal state in the object
GW.
Tolerance for the sum of probabilities along a row of the transition matrix, specified as a positive scalar.
Because each row of the transition matrix holds the probabilities of
moving out of the state indexed by the row number, all the numbers
along a row must add to either one or zero, within the tolerance specified in
ProbabilityTolerance. If this condition is not satisfied, an
error is thrown.
To set transition probabilities, first set an entire row to zero, then set
the nonzero probabilities all at once. For an example, see createMDP. Alternatively, copy the transition matrix into a variable,
modify the variable, and then assign it back as the transition matrix of your grid
world object.
Example: GW.ProbabilityTolerance = 1e-15; sets to
1e-15 the probability tolerance of the object
GW.
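The copy, modify, and assign approach described above can be sketched as follows. The example makes the "E" action from cell [1,1] of a 3-by-3 grid world stochastic; the probabilities 0.8 and 0.2 and the variable names are arbitrary choices for illustration.

```matlab
gw = createGridWorld(3,3);
s11 = state2idx(gw,"[1,1]");
s12 = state2idx(gw,"[1,2]");
s21 = state2idx(gw,"[2,1]");
aE = action2idx(gw,"E");

T = gw.T;                          % work on a copy, so no check runs midway
T(s11,:,aE) = 0;                   % clear the row for this state and action
T(s11,[s12 s21],aE) = [0.8 0.2];   % move east with 0.8, slip south with 0.2
gw.T = T;                          % assign the finished matrix back

rowSum = sum(gw.T(s11,:,aE))
```

Working on a copy means intermediate states of the row never have to satisfy the row-sum condition; only the final matrix is assigned to the object.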
Version History
Introduced in R2019a