
Load Predefined Simulink Environments

Reinforcement Learning Toolbox™ software provides predefined Simulink® environments for which the actions, observations, rewards, and dynamics are already defined. You can use these environments to:

  • Learn reinforcement learning concepts.

  • Become familiar with Reinforcement Learning Toolbox software features.

  • Test your own reinforcement learning agents.

You can load the following predefined Simulink environments using the rlPredefinedEnv function.

Environment                        Agent Task
Simple pendulum Simulink model     Swing up and balance a simple pendulum using either a discrete or continuous action space.
Cart-pole Simscape™ model          Balance a pole on a moving cart by applying forces to the cart using either a discrete or continuous action space.

For predefined Simulink environments, the environment dynamics, observations, and reward signal are defined in a corresponding Simulink model. The rlPredefinedEnv function creates a SimulinkEnvWithAgent object that the train function uses to interact with the Simulink model.
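For instance, creating a predefined environment and displaying the returned object looks like the following sketch (it assumes Reinforcement Learning Toolbox and Simulink are installed):

```matlab
% Create a predefined Simulink environment. Requires Reinforcement
% Learning Toolbox; the model ships with the toolbox.
env = rlPredefinedEnv('SimplePendulumModel-Discrete');

% env is a SimulinkEnvWithAgent object; displaying it shows properties
% such as the model name and the agent block path.
disp(env)
```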

Simple Pendulum Simulink Model

This environment is a simple frictionless pendulum that initially hangs in a downward position. The training goal is to make the pendulum stand upright without falling over using minimal control effort. The model for this environment is defined in the rlSimplePendulumModel Simulink model.

open_system('rlSimplePendulumModel')

There are two simple pendulum environment variants, which differ by the agent action space.

  • Discrete — Agent can apply a torque of either Tmax, 0, or -Tmax to the pendulum, where Tmax is the max_tau variable in the model workspace.

  • Continuous — Agent can apply any torque within the range [-Tmax, Tmax].

To create a simple pendulum environment, use the rlPredefinedEnv function.

  • Discrete action space

    env = rlPredefinedEnv('SimplePendulumModel-Discrete');
  • Continuous action space

    env = rlPredefinedEnv('SimplePendulumModel-Continuous');
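A short sketch comparing the two variants (assumes the toolbox is installed; the variable names are illustrative):

```matlab
% Create both action-space variants of the simple pendulum environment.
envDiscrete   = rlPredefinedEnv('SimplePendulumModel-Discrete');
envContinuous = rlPredefinedEnv('SimplePendulumModel-Continuous');

% Compare their action specifications: the discrete variant has a
% finite set of torque values, the continuous variant a bounded range.
disp(getActionInfo(envDiscrete))
disp(getActionInfo(envContinuous))
```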

For examples that train agents in the simple pendulum environment, see the examples provided with Reinforcement Learning Toolbox.

Actions

In the simple pendulum environment, the agent interacts with the environment using a single action signal, the torque applied at the base of the pendulum. The environment contains a specification object for this action signal; its type depends on whether the action space is discrete or continuous.

For more information on obtaining action specifications from an environment, see getActionInfo.

Observations

In the simple pendulum environment, the agent receives the following three observation signals, which are constructed within the create observations subsystem.

  • Sine of the pendulum angle

  • Cosine of the pendulum angle

  • Derivative of the pendulum angle

For each observation signal, the environment contains an rlNumericSpec observation specification. All the observations are continuous and unbounded.

For more information on obtaining observation specifications from an environment, see getObservationInfo.
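For example, a sketch of querying the observation specifications (toolbox required):

```matlab
% Query the observation specifications of the simple pendulum environment.
env = rlPredefinedEnv('SimplePendulumModel-Continuous');
obsInfo = getObservationInfo(env);

% The rlNumericSpec objects describe the continuous, unbounded
% observation signals listed above.
disp(obsInfo)
```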

Reward

The reward signal for this environment, constructed in the calculate reward subsystem, is

$$r_t = -\left(\theta_t^2 + 0.1\,\dot{\theta}_t^2 + 0.001\,u_{t-1}^2\right)$$

Here:

  • $\theta_t$ is the pendulum angle of displacement from the upright position.

  • $\dot{\theta}_t$ is the derivative of the pendulum angle.

  • $u_{t-1}$ is the control effort from the previous time step.
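As a sanity check, the reward expression can be evaluated directly. This is a plain-MATLAB sketch of the formula above, not code taken from the model:

```matlab
% Reward from the formula above: penalizes angle error, angular
% velocity, and control effort (theta in radians).
rewardFcn = @(theta, thetaDot, uPrev) ...
    -(theta^2 + 0.1*thetaDot^2 + 0.001*uPrev^2);

% The maximum reward of 0 occurs at the upright equilibrium (theta = 0)
% with no angular velocity and no control effort.
r = rewardFcn(0, 0, 0)   % r = 0
```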

Cart-Pole Simscape Model

The goal of the agent in the predefined cart-pole environments is to balance a pole on a moving cart by applying horizontal forces to the cart. The pole is considered successfully balanced if both of the following conditions are satisfied:

  • The pole angle remains within a given threshold of the vertical position, where the vertical position is zero radians.

  • The magnitude of the cart position remains below a given threshold.

The model for this environment is defined in the rlCartPoleSimscapeModel Simulink model. The dynamics of this model are defined using Simscape Multibody™.

open_system('rlCartPoleSimscapeModel')

In the Environment subsystem, the model dynamics are defined using Simscape components, and the reward and observations are constructed using Simulink blocks.

open_system('rlCartPoleSimscapeModel/Environment')

There are two cart-pole environment variants, which differ by the agent action space.

  • Discrete — Agent can apply a force of 15, 0, or -15 to the cart.

  • Continuous — Agent can apply any force within the range [-15, 15].

To create a cart-pole environment, use the rlPredefinedEnv function.

  • Discrete action space

    env = rlPredefinedEnv('CartPoleSimscapeModel-Discrete');
  • Continuous action space

    env = rlPredefinedEnv('CartPoleSimscapeModel-Continuous');
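Once created, the environment can be checked before training; for example (assumes the toolbox and Simscape Multibody are installed):

```matlab
% Create the continuous cart-pole environment.
env = rlPredefinedEnv('CartPoleSimscapeModel-Continuous');

% validateEnvironment runs a short simulation to confirm that the
% environment is consistent with its action and observation
% specifications before you start training.
validateEnvironment(env)
```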

For an example that trains an agent in this cart-pole environment, see Train DDPG Agent to Swing Up and Balance Cart-Pole System.
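A minimal training sketch along these lines might look as follows. The default-agent constructor and the option values are illustrative, not taken from that example:

```matlab
env = rlPredefinedEnv('CartPoleSimscapeModel-Continuous');
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Create a default DDPG agent from the specifications (illustrative).
agent = rlDDPGAgent(obsInfo, actInfo);

% Illustrative training options; tune these values for real use.
opts = rlTrainingOptions( ...
    MaxEpisodes=1000, ...
    MaxStepsPerEpisode=500, ...
    StopTrainingCriteria="AverageReward", ...
    StopTrainingValue=-400);

trainingStats = train(agent, env, opts);
```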

Actions

In the cart-pole environments, the agent interacts with the environment using a single action signal, the force applied to the cart. The environment contains a specification object for this action signal; its type depends on whether the action space is discrete or continuous.

For more information on obtaining action specifications from an environment, see getActionInfo.

Observations

In the cart-pole environments, the agent receives the following five observation signals.

  • Sine of the pole angle

  • Cosine of the pole angle

  • Derivative of the pole angle

  • Cart position

  • Derivative of cart position

For each observation signal, the environment contains an rlNumericSpec observation specification. All the observations are continuous and unbounded.

For more information on obtaining observation specifications from an environment, see getObservationInfo.

Reward

The reward signal for this environment is the sum of two components ($r = r_{qr} + r_p$):

  • A quadratic regulator control reward, constructed in the Environment/QR Reward subsystem.

    $$r_{qr} = -\left(0.1\,x^2 + 0.5\,\theta^2 + 0.005\,u_{t-1}^2\right)$$

  • A cart limit penalty, constructed in the Environment/x limit penalty subsystem. This subsystem generates a negative reward when the magnitude of the cart position exceeds a given threshold.

    $$r_p = -100\,\left(|x| \ge 3.5\right)$$

Here:

  • $x$ is the cart position.

  • $\theta$ is the pole angle of displacement from the upright position.

  • $u_{t-1}$ is the control effort from the previous time step.
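The combined reward can be sketched in plain MATLAB from the formulas above (the 3.5 threshold comes from the cart limit penalty; this is an illustration, not code from the model):

```matlab
% Quadratic regulator reward and cart limit penalty from the
% formulas above.
qrReward     = @(x, theta, uPrev) -(0.1*x^2 + 0.5*theta^2 + 0.005*uPrev^2);
limitPenalty = @(x) -100*(abs(x) >= 3.5);   % -100 once the cart exceeds the limit

% Total reward near the upright position, inside the cart limits.
r = qrReward(0.2, 0.05, 1) + limitPenalty(0.2)   % r = -0.01025
```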

See Also

Functions

Related Topics