
RL Agent

Reinforcement learning agent

  • Library: Reinforcement Learning Toolbox

  • RL Agent block

Description

Use the RL Agent block to simulate and train a reinforcement learning agent in Simulink®. You associate the block with an agent stored in the MATLAB® workspace or a data dictionary as an agent object, such as an rlACAgent or rlDDPGAgent object. You connect the block so that it receives an observation and a computed reward. For instance, consider the following block diagram of the rlSimplePendulumModel model.

The observation input port of the RL Agent block receives a signal that is derived from the instantaneous angle and angular velocity of the pendulum. The reward port receives a reward calculated from the same two values and the applied action. You configure the observation and reward computations that are appropriate to your system.
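As an illustration only (not necessarily the exact computation used in rlSimplePendulumModel), a reward of the following form penalizes deviation from the upright position, high angular velocity, and large control effort; the function name and weights are hypothetical:

    function r = pendulumReward(theta,thetaDot,u)
    % Hypothetical reward for a pendulum swing-up task. theta is the angle
    % measured from the upright position, thetaDot is the angular velocity,
    % and u is the previously applied torque. The weights are illustrative.
    r = -(theta^2 + 0.1*thetaDot^2 + 0.001*u^2);
    end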

The block uses the agent to generate an action based on the observation and reward you provide. Connect the action output port to the appropriate input of your system. For example, in rlSimplePendulumModel, the action port is a torque applied to the pendulum system. For more information about this model, see Train DQN Agent to Swing Up and Balance Pendulum.

To train a reinforcement learning agent in Simulink, you generate an environment from the Simulink model. You then create and configure the agent for training against that environment. For more information, see Create Simulink Environments for Reinforcement Learning. When you call train using the environment, train simulates the model and updates the agent associated with the block.
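The following MATLAB sketch outlines that workflow. The model name, block path, specifications, and default DQN agent are hypothetical placeholders, not values taken from this page, and default-agent creation from specifications requires a recent toolbox release:

    % Sketch of the Simulink training workflow (hypothetical names and specs).
    mdl = 'myModel';                      % model containing an RL Agent block
    blk = [mdl '/RL Agent'];              % path to the RL Agent block
    obsInfo = rlNumericSpec([3 1]);       % must match the observation signal
    actInfo = rlFiniteSetSpec([-2 0 2]);  % must match the action signal
    env = rlSimulinkEnv(mdl,blk,obsInfo,actInfo);

    agentObj = rlDQNAgent(obsInfo,actInfo);            % default-agent syntax
    trainOpts = rlTrainingOptions('MaxEpisodes',500);
    trainStats = train(agentObj,env,trainOpts);        % simulates the model and
                                                        % updates agentObj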

Ports

Input


This port receives observation signals from the environment. Observation signals represent measurements or other instantaneous system data. If you have multiple observations, you can use a Mux block to combine them into a vector signal. To use a nonvirtual bus signal, use bus2RLSpec.
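For the bus case, a minimal sketch, assuming a Simulink.Bus object named ObservationBus is already defined in the base workspace:

    % Create observation specifications from a nonvirtual bus (assumed bus name).
    obsInfo = bus2RLSpec('ObservationBus');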

This port receives the reward signal, which you compute based on the observation data. The reward signal is used during agent training to maximize the expectation of the long-term reward.

Use this signal to specify conditions under which to terminate a training episode. You must configure logic appropriate to your system to determine the conditions for episode termination. One application is to terminate an episode that is clearly going well or going poorly. For instance, you can terminate an episode if the agent reaches its goal or goes irrecoverably far from its goal.

Output


Action computed by the agent based on the observation and reward inputs. Connect this port to the inputs of your system. To use a nonvirtual bus signal, use bus2RLSpec.

Note

When an agent such as rlACAgent, rlPGAgent, or rlPPOAgent uses an rlStochasticActorRepresentation actor with a continuous action space, the agent does not enforce the constraints set by the action specification. In these cases, you must enforce the action space constraints within the environment.

Cumulative sum of the reward signal during simulation. Observe or log this signal to track how the cumulative reward evolves over time.

Dependencies

To enable this port, select the Provide cumulative reward signal parameter.

Parameters


Enter the name of an agent object stored in the MATLAB workspace or a data dictionary, such as an rlACAgent or rlDDPGAgent object. For information about agent objects, see Reinforcement Learning Agents.

Programmatic Use

Block Parameter: Agent
Type: string, character vector
Default: "agentObj"
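For example, a sketch of setting this parameter from the command line with set_param, assuming a block at the hypothetical path 'myModel/RL Agent':

    % Associate the block with an agent object named agentObj in the workspace.
    set_param('myModel/RL Agent','Agent','agentObj')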

Enable the cumulative_reward block output by selecting this parameter.

Programmatic Use

Block Parameter: ProvideCumRwd
Type: string, character vector
Values: "off", "on"
Default: "off"
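A sketch of enabling this output programmatically, again assuming the hypothetical block path 'myModel/RL Agent':

    % Enable the cumulative reward output port of the RL Agent block.
    set_param('myModel/RL Agent','ProvideCumRwd','on')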
Introduced in R2019a