Load Predefined Simulink Environments
Reinforcement Learning Toolbox™ software provides predefined Simulink® environments for which the actions, observations, rewards, and dynamics are already defined. You can use these environments to:
Learn reinforcement learning concepts.
Gain familiarity with Reinforcement Learning Toolbox software features.
Test your own reinforcement learning agents.
You can load the following predefined Simulink environments using the rlPredefinedEnv function.
Environment | Agent Task |
---|---|
Simple pendulum Simulink model | Swing up and balance a simple pendulum using either a discrete or continuous action space. |
Cart-pole Simscape™ model | Balance a pole on a moving cart by applying forces to the cart using either a discrete or continuous action space. |
For predefined Simulink environments, the environment dynamics, observations, and reward signal are defined in a corresponding Simulink model. The rlPredefinedEnv function creates a SimulinkEnvWithAgent object, which the train function uses to interact with the Simulink model.
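As an end-to-end sketch of this workflow (the agent type and option values below are illustrative assumptions, not prescribed by the text; any agent compatible with the specifications would work):

```matlab
% Sketch: create a predefined Simulink environment and train an agent
% against it. rlDQNAgent and the training options are assumptions
% chosen for illustration.
env = rlPredefinedEnv('SimplePendulumModel-Discrete');

obsInfo = getObservationInfo(env);   % observation specifications
actInfo = getActionInfo(env);        % action specification

agent = rlDQNAgent(obsInfo, actInfo);              % default-architecture agent
trainOpts = rlTrainingOptions('MaxEpisodes', 500); % illustrative training budget
trainStats = train(agent, env, trainOpts);         % train() runs the Simulink model
```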
Simple Pendulum Simulink Model
This environment is a simple frictionless pendulum that initially hangs in a downward position. The training goal is to make the pendulum stand upright without falling over using minimal control effort. The model for this environment is defined in the rlSimplePendulumModel Simulink model.

open_system('rlSimplePendulumModel')
There are two simple pendulum environment variants, which differ by the agent action space.
Discrete — Agent can apply a torque of either Tmax, 0, or -Tmax to the pendulum, where Tmax is the max_tau variable in the model workspace.
Continuous — Agent can apply any torque within the range [-Tmax, Tmax].
To create a simple pendulum environment, use the rlPredefinedEnv function.
Discrete action space
env = rlPredefinedEnv('SimplePendulumModel-Discrete');
Continuous action space
env = rlPredefinedEnv('SimplePendulumModel-Continuous');
For examples that train agents in the simple pendulum environment, see the Reinforcement Learning Toolbox examples.
Actions
In the simple pendulum environment, the agent interacts with the environment using a single action signal, the torque applied at the base of the pendulum. The environment contains a specification object for this action signal. For the environment with a:
Discrete action space, the specification is an rlFiniteSetSpec object.
Continuous action space, the specification is an rlNumericSpec object.
For more information on obtaining action specifications from an environment, see getActionInfo.
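As a minimal sketch, the two variants can be distinguished by querying their action specifications (the displayed values depend on the model's max_tau setting):

```matlab
% Discrete variant: the action specification enumerates the allowed torques.
envD = rlPredefinedEnv('SimplePendulumModel-Discrete');
actD = getActionInfo(envD);                 % rlFiniteSetSpec
disp(actD.Elements)                         % finite torque set: -Tmax, 0, Tmax

% Continuous variant: the action specification is a bounded numeric range.
envC = rlPredefinedEnv('SimplePendulumModel-Continuous');
actC = getActionInfo(envC);                 % rlNumericSpec
disp([actC.LowerLimit actC.UpperLimit])     % torque bounds [-Tmax, Tmax]
```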
Observations
In the simple pendulum environment, the agent receives the following three observation signals, which are constructed within the create observations subsystem.
Sine of the pendulum angle
Cosine of the pendulum angle
Derivative of the pendulum angle
For each observation signal, the environment contains an rlNumericSpec observation specification. All the observations are continuous and unbounded.
For more information on obtaining observation specifications from an environment, see getObservationInfo.
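For example, a short sketch of inspecting these specifications (the names printed by the loop are whatever the model defines):

```matlab
% Query the observation specifications of the pendulum environment.
env = rlPredefinedEnv('SimplePendulumModel-Continuous');
obsInfo = getObservationInfo(env);           % one rlNumericSpec per signal
for k = 1:numel(obsInfo)
    fprintf('%s: dimension %s\n', ...
        obsInfo(k).Name, mat2str(obsInfo(k).Dimension));
end
```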
Reward
The reward signal for this environment, which is constructed in the calculate reward subsystem, is:
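The equation itself did not survive extraction; from the variable definitions that follow, it is likely the quadratic penalty commonly documented for this environment (the coefficients shown should be verified against the calculate reward subsystem):

```latex
r_t = -\left( \theta_t^2 + 0.1\,\dot{\theta}_t^2 + 0.001\,u_{t-1}^2 \right)
```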
Here:
θt is the pendulum angle of displacement from the upright position.
θ̇t is the derivative of the pendulum angle.
ut-1 is the control effort from the previous time step.
Cart-Pole Simscape Model
The goal of the agent in the predefined cart-pole environments is to balance a pole on a moving cart by applying horizontal forces to the cart. The pole is considered successfully balanced if both of the following conditions are satisfied:
The pole angle remains within a given threshold of the vertical position, where the vertical position is zero radians.
The magnitude of the cart position remains below a given threshold.
The model for this environment is defined in the rlCartPoleSimscapeModel Simulink model. The dynamics of this model are defined using Simscape™ Multibody™.

open_system('rlCartPoleSimscapeModel')
In the Environment subsystem, the model dynamics are defined using Simscape components, and the reward and observation are constructed using Simulink blocks.

open_system('rlCartPoleSimscapeModel/Environment')
There are two cart-pole environment variants, which differ by the agent action space.
Discrete — Agent can apply a force of 15, 0, or -15 to the cart.
Continuous — Agent can apply any force within the range [-15, 15].
To create a cart-pole environment, use the rlPredefinedEnv function.
Discrete action space
env = rlPredefinedEnv('CartPoleSimscapeModel-Discrete');
Continuous action space
env = rlPredefinedEnv('CartPoleSimscapeModel-Continuous');
For an example that trains an agent in this cart-pole environment, see Train DDPG Agent to Swing Up and Balance Cart-Pole System.
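A minimal related sketch, assuming the default rlDDPGAgent constructor and illustrative simulation options (an untrained agent is used here only to show the interface):

```matlab
% Create the continuous cart-pole environment and simulate one episode.
env = rlPredefinedEnv('CartPoleSimscapeModel-Continuous');
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

agent = rlDDPGAgent(obsInfo, actInfo);          % default agent; an assumption
simOpts = rlSimulationOptions('MaxSteps', 500); % illustrative step limit
experience = sim(env, agent, simOpts);          % logged observations/actions/rewards
```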
Actions
In the cart-pole environments, the agent interacts with the environment using a single action signal, the force applied to the cart. The environment contains a specification object for this action signal. For the environment with a:
Discrete action space, the specification is an rlFiniteSetSpec object.
Continuous action space, the specification is an rlNumericSpec object.
For more information on obtaining action specifications from an environment, see getActionInfo.
Observations
In the cart-pole environments, the agent receives the following five observation signals.
Sine of the pole angle
Cosine of the pole angle
Derivative of the pole angle
Cart position
Derivative of cart position
For each observation signal, the environment contains an rlNumericSpec observation specification. All the observations are continuous and unbounded.
For more information on obtaining observation specifications from an environment, see getObservationInfo.
Reward
The reward signal for this environment is the sum of two components (r = rqr + rp):
A quadratic regulator control reward, constructed in the Environment/qr reward subsystem.
A cart limit penalty, constructed in the Environment/x limit penalty subsystem. This subsystem generates a negative reward when the magnitude of the cart position exceeds a given threshold.
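The component equations did not survive extraction; a plausible reconstruction, based on the commonly documented form for this environment (the coefficients and the limit indicator B are assumptions to verify against the subsystems), is:

```latex
r = r_{qr} + r_p, \qquad
r_{qr} = -0.1\left( 5\theta^2 + x^2 + 0.05\,u_{t-1}^2 \right), \qquad
r_p = -100\,B
```

where B is 1 when the magnitude of the cart position exceeds its threshold and 0 otherwise.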
Here:
x is the cart position.
θ is the pole angle of displacement from the upright position.
ut-1 is the control effort from the previous time step.