



  1. Create the grid world model.

  2. Configure the grid world model.

  3. Use the grid world model to create your own grid world environment.

Grid World Model



Property Read-Only Description
GridSize Yes


CurrentState No

Name of the current state of the agent, specified as a string. You can use this property to set the initial state of the agent. By default, the agent always starts from cell [1,1].

The agent starts from CurrentState once you use the reset function in the rlMDPEnv environment object.

States Yes

A string vector containing the state names of the grid world. For instance, for a 2-by-2 grid world model GW, specify the following:

GW.States = ["[1,1]";"[2,1]";"[1,2]";"[2,2]"];

Actions Yes


The moves argument of createGridWorld determines the available actions:

GW = createGridWorld(m,n,moves)

moves GW.Actions
'Standard' ['N';'S';'E';'W']
'Kings' ['N';'S';'E';'W';'NE';'NW';'SE';'SW']
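
As a minimal sketch of how the moves argument relates to the Actions property, the following creates one model per setting and displays the resulting actions. It assumes Reinforcement Learning Toolbox is installed; the 5-by-5 size and the variable names GWStandard and GWKings are illustrative choices, not part of the original text.

```matlab
% Sketch only: requires Reinforcement Learning Toolbox.
m = 5;                                       % number of grid rows (illustrative)
n = 5;                                       % number of grid columns (illustrative)

GWStandard = createGridWorld(m,n);           % moves defaults to 'Standard'
GWKings    = createGridWorld(m,n,'Kings');   % adds the four diagonal moves

disp(GWStandard.Actions)                     % four actions
disp(GWKings.Actions)                        % eight actions
```

Because Actions is read-only, choosing between 'Standard' and 'Kings' has to happen when the model is created.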
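T No

State transition matrix, specified as a 3-D array. T is a probability matrix that indicates the likelihood of the agent moving from the current state s to any possible next state s' by performing action a.


$$T(s, s', a) = \mathrm{probability}(s' \mid s, a)$$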

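For instance, consider a 5-by-5 deterministic grid world object GW with the agent in cell [3,1]. View the state transition matrix for the north direction.

northStateTransition = GW.T(:,:,1)

From the above figure, the value of northStateTransition(3,2) is 1 because the agent moves from cell [3,1] to cell [2,1] with action 'N'. A probability of 1 indicates that from a given state, if the agent goes north, it has a 100% chance of moving one cell north on the grid. For an example showing how to set up the state transition matrix, see Train Reinforcement Learning Agent in Basic Grid World.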
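
To make the indexing concrete, here is a hand-built sketch of a deterministic north transition slice for a 5-by-5 grid, written in plain MATLAB without the toolbox. It is not how the toolbox constructs T. It assumes states are numbered column by column (consistent with the 2-by-2 States example, where "[2,1]" comes second) and that an agent in the top row stays in place when it moves north. The names northT, s, and sNext are illustrative.

```matlab
% Sketch only: hand-built north transition slice, no toolbox required.
m = 5;  n = 5;                   % grid rows and columns
nS = m*n;                        % number of states
northT = zeros(nS,nS);           % northT(s,sNext) = probability(sNext | s, 'N')

for c = 1:n
    for r = 1:m
        s     = r + (c-1)*m;     % column-major index of cell [r,c]
        rNext = max(r-1,1);      % assumption: stay in place at the top edge
        sNext = rNext + (c-1)*m; % index of the cell one row up
        northT(s,sNext) = 1;     % deterministic: all probability on one state
    end
end

% Cell [3,1] is state 3 and cell [2,1] is state 2.
disp(northT(3,2))
```

Each row of northT sums to 1, which is what makes it a valid probability distribution over next states.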

R No



r = R(s, s', a)

Set up R such that there is a reward to the agent after every action. For instance, you can set up a positive reward if the agent transitions over obstacle states and when it reaches the terminal state. You can also set up a default reward of -11 for all actions the agent takes, independent of the current state and next state. For an example that shows how to set up the reward transition matrix, see Train Reinforcement Learning Agent in Basic Grid World.
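
One possible way to fill in R is sketched below. It assumes Reinforcement Learning Toolbox is installed and uses the state2idx helper to convert a state name to its index. The reward values (-1 per step, 10 for reaching the terminal state) are arbitrary illustrative numbers, not recommended settings.

```matlab
% Sketch only: requires Reinforcement Learning Toolbox.
GW = createGridWorld(5,5);
GW.TerminalStates = "[5,5]";

nS = numel(GW.States);            % number of states
nA = numel(GW.Actions);           % number of actions

% Default reward for every (state, next state, action) triple.
GW.R = -1*ones(nS,nS,nA);         % illustrative value

% Larger reward for any transition that ends in the terminal state.
termIdx = state2idx(GW,GW.TerminalStates);
GW.R(:,termIdx,:) = 10;           % illustrative value
```

Indexing R the same way as T, by current state, next state, and action, keeps the two arrays aligned.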

ObstacleStates No

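ObstacleStates are states that cannot be reached in the grid world, specified as a string vector. Consider the following 5-by-5 grid world model GW. You can specify its obstacle states as follows:


GW.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[4,3]"];

For a workflow example, see Train Reinforcement Learning Agent in Basic Grid World.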
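
After setting obstacle states, the transition matrix typically needs to reflect them. The sketch below follows the pattern used in the referenced grid world example and assumes Reinforcement Learning Toolbox is installed. Note that the toolbox function name is spelled updateStateTranstionForObstacles (without the second "i" in "Transition").

```matlab
% Sketch only: requires Reinforcement Learning Toolbox.
GW = createGridWorld(5,5);

% Mark four cells as obstacles.
GW.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[4,3]"];

% Update T so that it accounts for the obstacle cells.
GW = updateStateTranstionForObstacles(GW);
```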

TerminalStates No

TerminalStates are the final states in the grid world, specified as a string vector. Consider the previous 5-by-5 grid world model GW. The blue cell is the terminal state, and you can specify it by:

GW.TerminalStates = "[5,5]";

For a workflow example, see Train Reinforcement Learning Agent in Basic Grid World.


You can create a Markov decision process (MDP) environment using rlMDPEnv from the grid world model from the previous step. An MDP is a discrete-time stochastic control process. It provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of the decision maker. The agent uses the grid world environment object rlMDPEnv to interact with the grid world model object GridWorld.

For more information, see rlMDPEnv and Train Reinforcement Learning Agent in Basic Grid World.
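
As a rough sketch of this last step, the following wraps a configured grid world model in an rlMDPEnv environment and resets it. It assumes Reinforcement Learning Toolbox is installed. The starting cell [2,1] and the custom reset function are illustrative choices modeled on the referenced grid world example.

```matlab
% Sketch only: requires Reinforcement Learning Toolbox.
GW = createGridWorld(5,5);
GW.CurrentState   = "[2,1]";      % illustrative starting cell
GW.TerminalStates = "[5,5]";

% Create the MDP environment from the grid world model.
env = rlMDPEnv(GW);

% Optional: make every reset start from state index 2, which is cell [2,1].
env.ResetFcn = @() 2;

% Reset the environment and get the initial observation.
obs = reset(env);

% Inspect the observation and action specifications that an agent would use.
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
```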


