
RL Agent

Reinforcement learning agent

  • Library: Reinforcement Learning Toolbox

  • RL Agent block

Description

Use the RL Agent block to simulate and train a reinforcement learning agent in Simulink®. You associate the block with an agent stored in the MATLAB® workspace or a data dictionary as an agent object, such as an rlACAgent or rlDDPGAgent object. You connect the block so that it receives an observation and a computed reward. For instance, consider the following block diagram of the rlSimplePendulumModel model.

The observation input port of the RL Agent block receives a signal that is derived from the instantaneous angle and angular velocity of the pendulum. The reward port receives a reward calculated from the same two values and the applied action. You configure the observation and reward computations that are appropriate to your system.
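As an illustration only (not necessarily the exact computation used in rlSimplePendulumModel), a reward of the following form penalizes deviation from the upright position, high angular velocity, and large control effort; the function name and weights are hypothetical:

    function r = pendulumReward(theta,thetaDot,u)
    % Hypothetical reward for a pendulum swing-up task. theta is the angle
    % measured from the upright position, thetaDot is the angular velocity,
    % and u is the previously applied torque. The weights are illustrative.
    r = -(theta^2 + 0.1*thetaDot^2 + 0.001*u^2);
    end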

The block uses the agent to generate an action based on the observation and reward you provide. Connect the action output port to the appropriate input of your system. For example, in rlSimplePendulumModel, the action port is a torque applied to the pendulum system. For more information about this model, see Train DQN Agent to Swing Up and Balance Pendulum.

To train a reinforcement learning agent in Simulink, you generate an environment from the Simulink model. You then create and configure the agent for training against that environment. For more information, see Create Simulink Environments for Reinforcement Learning. When you call train using the environment, train simulates the model and updates the agent associated with the block.
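The following MATLAB sketch outlines that workflow. The model name, block path, specifications, and default DQN agent are hypothetical placeholders, not values taken from this page, and default-agent creation from specifications requires a recent toolbox release:

    % Sketch of the Simulink training workflow (hypothetical names and specs).
    mdl = 'myModel';                      % model containing an RL Agent block
    blk = [mdl '/RL Agent'];              % path to the RL Agent block
    obsInfo = rlNumericSpec([3 1]);       % must match the observation signal
    actInfo = rlFiniteSetSpec([-2 0 2]);  % must match the action signal
    env = rlSimulinkEnv(mdl,blk,obsInfo,actInfo);

    agentObj = rlDQNAgent(obsInfo,actInfo);            % default-agent syntax
    trainOpts = rlTrainingOptions('MaxEpisodes',500);
    trainStats = train(agentObj,env,trainOpts);        % simulates the model and
                                                        % updates agentObj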

Ports

Input


This port receives observation signals from the environment. Observation signals represent measurements or other instantaneous system data. If you have multiple observations, you can use a Mux block to combine them into a vector signal. To use a nonvirtual bus signal, use bus2RLSpec.
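For the bus case, a minimal sketch, assuming a Simulink.Bus object named ObservationBus is already defined in the base workspace:

    % Create observation specifications from a nonvirtual bus (assumed bus name).
    obsInfo = bus2RLSpec('ObservationBus');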

This port receives the reward signal, which you compute based on the observation data. The reward signal is used during agent training to maximize the expectation of the long-term reward.

Use this signal to specify conditions under which to terminate a training episode. You must configure logic appropriate to your system to determine the conditions for episode termination. One application is to terminate an episode that is clearly going well or going poorly. For instance, you can terminate an episode if the agent reaches its goal or goes irrecoverably far from its goal.

Output


Action computed by the agent based on the observation and reward inputs. Connect this port to the inputs of your system. To use a nonvirtual bus signal, use bus2RLSpec.

Note

When an agent such as rlACAgent, rlPGAgent, or rlPPOAgent uses an rlStochasticActorRepresentation actor with a continuous action space, the agent does not enforce the constraints set by the action specification. In these cases, you must enforce the action space constraints within the environment.

Cumulative sum of the reward signal during simulation. Observe or log this signal to track how the cumulative reward evolves over time.

Dependencies

To enable this port, select the Provide cumulative reward signal parameter.

Parameters


Enter the name of an agent object stored in the MATLAB workspace or a data dictionary, such as an rlACAgent or rlDDPGAgent object. For information about agent objects, see Reinforcement Learning Agents.

Programmatic Use

Block Parameter: Agent
Type: string, character vector
Default: "agentObj"
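For example, a sketch of setting this parameter from the command line with set_param, assuming a block at the hypothetical path 'myModel/RL Agent':

    % Associate the block with an agent object named agentObj in the workspace.
    set_param('myModel/RL Agent','Agent','agentObj')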

Enable the cumulative_reward block output by selecting this parameter.

Programmatic Use

Block Parameter: ProvideCumRwd
Type: string, character vector
Values: "off", "on"
Default: "off"
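A sketch of enabling this output programmatically, again assuming the hypothetical block path 'myModel/RL Agent':

    % Enable the cumulative reward output port of the RL Agent block.
    set_param('myModel/RL Agent','ProvideCumRwd','on')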
Introduced in R2019a