
Load Predefined Simulink Environments

Reinforcement Learning Toolbox™ software provides predefined Simulink® environments for which the actions, observations, rewards, and dynamics are already defined. You can use these environments to:

  • Learn reinforcement learning concepts.

  • Become familiar with Reinforcement Learning Toolbox software features.

  • Test your own reinforcement learning agents.

You can load the following predefined Simulink environments using the rlPredefinedEnv function.

Environment                        Agent Task
Simple pendulum Simulink model     Swing up and balance a simple pendulum using either a discrete or continuous action space.
Cart-pole Simscape™ model          Balance a pole on a moving cart by applying forces to the cart using either a discrete or continuous action space.

For predefined Simulink environments, the environment dynamics, observations, and reward signal are defined in a corresponding Simulink model. The rlPredefinedEnv function creates a SimulinkEnvWithAgent object that the train function uses to interact with the Simulink model.
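For instance, creating a predefined environment and displaying the returned object looks like the following sketch (it assumes Reinforcement Learning Toolbox and Simulink are installed):

```matlab
% Create a predefined Simulink environment. Requires Reinforcement
% Learning Toolbox; the model ships with the toolbox.
env = rlPredefinedEnv('SimplePendulumModel-Discrete');

% env is a SimulinkEnvWithAgent object; displaying it shows properties
% such as the model name and the agent block path.
disp(env)
```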

Simple Pendulum Simulink Model

This environment is a simple frictionless pendulum that initially hangs in a downward position. The training goal is to make the pendulum stand upright without falling over using minimal control effort. The model for this environment is defined in the rlSimplePendulumModel Simulink model.

open_system('rlSimplePendulumModel')

There are two simple pendulum environment variants, which differ by the agent action space.

  • Discrete — Agent can apply a torque of either Tmax, 0, or -Tmax to the pendulum, where Tmax is the max_tau variable in the model workspace.

  • Continuous — Agent can apply any torque within the range [-Tmax, Tmax].

To create a simple pendulum environment, use the rlPredefinedEnv function.

  • Discrete action space

    env = rlPredefinedEnv('SimplePendulumModel-Discrete');
  • Continuous action space

    env = rlPredefinedEnv('SimplePendulumModel-Continuous');
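A short sketch comparing the two variants (assumes the toolbox is installed; the variable names are illustrative):

```matlab
% Create both action-space variants of the simple pendulum environment.
envDiscrete   = rlPredefinedEnv('SimplePendulumModel-Discrete');
envContinuous = rlPredefinedEnv('SimplePendulumModel-Continuous');

% Compare their action specifications: the discrete variant has a
% finite set of torque values, the continuous variant a bounded range.
disp(getActionInfo(envDiscrete))
disp(getActionInfo(envContinuous))
```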

For examples that train agents in the simple pendulum environment, see the examples provided with Reinforcement Learning Toolbox.

Actions

In the simple pendulum environment, the agent interacts with the environment using a single action signal, the torque applied at the base of the pendulum. The environment contains a specification object for this action signal; its type depends on whether the action space is discrete or continuous.

For more information on obtaining action specifications from an environment, see getActionInfo.

Observations

In the simple pendulum environment, the agent receives the following three observation signals, which are constructed within the create observations subsystem.

  • Sine of the pendulum angle

  • Cosine of the pendulum angle

  • Derivative of the pendulum angle

For each observation signal, the environment contains an rlNumericSpec observation specification. All the observations are continuous and unbounded.

For more information on obtaining observation specifications from an environment, see getObservationInfo.
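For example, a sketch of querying the observation specifications (toolbox required):

```matlab
% Query the observation specifications of the simple pendulum environment.
env = rlPredefinedEnv('SimplePendulumModel-Continuous');
obsInfo = getObservationInfo(env);

% The rlNumericSpec objects describe the continuous, unbounded
% observation signals listed above.
disp(obsInfo)
```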

Reward

The reward signal for this environment, constructed in the calculate reward subsystem, is

$$r_t = -\left(\theta_t^2 + 0.1\,\dot{\theta}_t^2 + 0.001\,u_{t-1}^2\right)$$

Here:

  • $\theta_t$ is the pendulum angle of displacement from the upright position.

  • $\dot{\theta}_t$ is the derivative of the pendulum angle.

  • $u_{t-1}$ is the control effort from the previous time step.
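As a sanity check, the reward expression can be evaluated directly. This is a plain-MATLAB sketch of the formula above, not code taken from the model:

```matlab
% Reward from the formula above: penalizes angle error, angular
% velocity, and control effort (theta in radians).
rewardFcn = @(theta, thetaDot, uPrev) ...
    -(theta^2 + 0.1*thetaDot^2 + 0.001*uPrev^2);

% The maximum reward of 0 occurs at the upright equilibrium (theta = 0)
% with no angular velocity and no control effort.
r = rewardFcn(0, 0, 0)   % r = 0
```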

Cart-Pole Simscape Model

The goal of the agent in the predefined cart-pole environments is to balance a pole on a moving cart by applying horizontal forces to the cart. The pole is considered successfully balanced if both of the following conditions are satisfied:

  • The pole angle remains within a given threshold of the vertical position, where the vertical position is zero radians.

  • The magnitude of the cart position remains below a given threshold.

The model for this environment is defined in the rlCartPoleSimscapeModel Simulink model. The dynamics of this model are defined using Simscape Multibody™.

open_system('rlCartPoleSimscapeModel')

In the Environment subsystem, the model dynamics are defined using Simscape components, and the reward and observations are constructed using Simulink blocks.

open_system('rlCartPoleSimscapeModel/Environment')

There are two cart-pole environment variants, which differ by the agent action space.

  • Discrete — Agent can apply a force of 15, 0, or -15 to the cart.

  • Continuous — Agent can apply any force within the range [-15, 15].

To create a cart-pole environment, use the rlPredefinedEnv function.

  • Discrete action space

    env = rlPredefinedEnv('CartPoleSimscapeModel-Discrete');
  • Continuous action space

    env = rlPredefinedEnv('CartPoleSimscapeModel-Continuous');
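Once created, the environment can be checked before training; for example (assumes the toolbox and Simscape Multibody are installed):

```matlab
% Create the continuous cart-pole environment.
env = rlPredefinedEnv('CartPoleSimscapeModel-Continuous');

% validateEnvironment runs a short simulation to confirm that the
% environment is consistent with its action and observation
% specifications before you start training.
validateEnvironment(env)
```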

For an example that trains an agent in this cart-pole environment, see Train DDPG Agent to Swing Up and Balance Cart-Pole System.
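A minimal training sketch along these lines might look as follows. The default-agent constructor and the option values are illustrative, not taken from that example:

```matlab
env = rlPredefinedEnv('CartPoleSimscapeModel-Continuous');
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Create a default DDPG agent from the specifications (illustrative).
agent = rlDDPGAgent(obsInfo, actInfo);

% Illustrative training options; tune these values for real use.
opts = rlTrainingOptions( ...
    MaxEpisodes=1000, ...
    MaxStepsPerEpisode=500, ...
    StopTrainingCriteria="AverageReward", ...
    StopTrainingValue=-400);

trainingStats = train(agent, env, opts);
```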

Actions

In the cart-pole environments, the agent interacts with the environment using a single action signal, the force applied to the cart. The environment contains a specification object for this action signal; its type depends on whether the action space is discrete or continuous.

For more information on obtaining action specifications from an environment, see getActionInfo.

Observations

In the cart-pole environments, the agent receives the following five observation signals.

  • Sine of the pole angle

  • Cosine of the pole angle

  • Derivative of the pole angle

  • Cart position

  • Derivative of cart position

For each observation signal, the environment contains an rlNumericSpec observation specification. All the observations are continuous and unbounded.

For more information on obtaining observation specifications from an environment, see getObservationInfo.

Reward

The reward signal for this environment is the sum of two components ($r = r_{qr} + r_p$):

  • A quadratic regulator control reward, constructed in the Environment/QR Reward subsystem.

    $$r_{qr} = -\left(0.1\,x^2 + 0.5\,\theta^2 + 0.005\,u_{t-1}^2\right)$$

  • A cart limit penalty, constructed in the Environment/x limit penalty subsystem. This subsystem generates a negative reward when the magnitude of the cart position exceeds a given threshold.

    $$r_p = -100\,\left(|x| \ge 3.5\right)$$

Here:

  • $x$ is the cart position.

  • $\theta$ is the pole angle of displacement from the upright position.

  • $u_{t-1}$ is the control effort from the previous time step.
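The combined reward can be sketched in plain MATLAB from the formulas above (the 3.5 threshold comes from the cart limit penalty; this is an illustration, not code from the model):

```matlab
% Quadratic regulator reward and cart limit penalty from the
% formulas above.
qrReward     = @(x, theta, uPrev) -(0.1*x^2 + 0.5*theta^2 + 0.005*uPrev^2);
limitPenalty = @(x) -100*(abs(x) >= 3.5);   % -100 once the cart exceeds the limit

% Total reward near the upright position, inside the cart limits.
r = qrReward(0.2, 0.05, 1) + limitPenalty(0.2)   % r = -0.01025
```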

See Also

Functions

Related Topics