rlSARSAAgent

SARSA reinforcement learning agent

Description

The SARSA algorithm is a model-free, online, on-policy reinforcement learning method. A SARSA agent is a value-based reinforcement learning agent that trains a critic to estimate the return or future rewards.
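As a rough illustration of the update the critic learns (a sketch of tabular SARSA, not the toolbox's internal implementation; Q, alpha, and gamma are assumed variables):

% One tabular SARSA step for transition (s,a,r,sNext), where aNext is the
% next action chosen by the same epsilon-greedy policy
tdTarget = r + gamma*Q(sNext,aNext);           % bootstrapped target
Q(s,a) = Q(s,a) + alpha*(tdTarget - Q(s,a));   % move estimate toward target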

For more information on SARSA agents, see SARSA Agents.

For more information on the different types of reinforcement learning agents, see Reinforcement Learning Agents.

Creation

Description

agent = rlSARSAAgent(critic,agentOptions) creates a SARSA agent with the specified critic network and sets the AgentOptions property.
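A minimal creation sketch, assuming critic is an rlQValueFunction already built for your environment (the full workflow appears in the Examples section):

opt = rlSARSAAgentOptions;         % default agent options
agent = rlSARSAAgent(critic,opt);  % critic must match the environment specifications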

Input Arguments

Critic, specified as an rlQValueFunction object. For more information on creating critics, see Create Policies and Value Functions.

Properties

Agent options, specified as an rlSARSAAgentOptions object.

Option to use the exploration policy when selecting actions, specified as one of the following logical values (a usage sketch follows the list).

  • false — Use the agent greedy policy when selecting actions.

  • true — Use the agent exploration policy when selecting actions.
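For example, to have the agent act with its exploration policy during simulation, you can set this property after creating the agent (a brief sketch):

agent.UseExplorationPolicy = true;  % actions are now sampled from the exploration policy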

This property is read-only.

Observation specifications, specified as a reinforcement learning specification object defining properties such as dimensions, data type, and name of the observation signal.

The value of ObservationInfo matches the corresponding value specified in critic.

This property is read-only.

Action specification, specified as an rlFiniteSetSpec object.

The value of ActionInfo matches the corresponding value specified in critic.
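Both specifications can be queried from the agent directly, for example:

obsInfo = getObservationInfo(agent);  % observation specification object
actInfo = getActionInfo(agent);       % rlFiniteSetSpec object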

Sample time of the agent, specified as a positive scalar or as -1. Setting this parameter to -1 allows for event-based simulations. The initial value of SampleTime matches the value specified in AgentOptions.

Within a Simulink® environment, the RL Agent block in which the agent is specified executes every SampleTime seconds of simulation time. If SampleTime is -1, the block inherits the sample time from its parent subsystem.

Within a MATLAB® environment, the agent is executed every time the environment advances. In this case, SampleTime is the time interval between consecutive elements in the output experience returned by sim or train. If SampleTime is -1, the time interval between consecutive elements in the returned output experience reflects the timing of the event that triggers the agent execution.
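For example, to run the agent every 0.1 seconds of simulation time, set the sample time through the agent options before creating the agent (the 0.1 value is illustrative):

opt = rlSARSAAgentOptions;
opt.SampleTime = 0.1;              % agent executes every 0.1 s
agent = rlSARSAAgent(critic,opt);  % assumes critic already exists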

Object Functions

train Train reinforcement learning agents within a specified environment
sim Simulate trained reinforcement learning agents within a specified environment
getAction Obtain action from agent or actor given environment observations
getActor Get actor from reinforcement learning agent
setActor Set actor of reinforcement learning agent
getCritic Get critic from reinforcement learning agent
setCritic Set critic of reinforcement learning agent
generatePolicyFunction Create function that evaluates trained policy of reinforcement learning agent
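For example, to extract the critic from an agent and reinstall a modified copy (a minimal sketch):

critic = getCritic(agent);        % rlQValueFunction used by the agent
agent = setCritic(agent,critic);  % returns the agent with the new critic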

Examples

Create or load an environment interface. For this example, load the Basic Grid World environment interface that is also used in the example Train Reinforcement Learning Agent in Basic Grid World.

env = rlPredefinedEnv("BasicGridWorld");

Create a table approximation model derived from the environment observation and action specifications.

qTable = rlTable( ...
    getObservationInfo(env), ...
    getActionInfo(env));

Create the critic using qTable. SARSA agents use an rlQValueFunction object to implement the critic.

critic = rlQValueFunction(qTable, ...
    getObservationInfo(env), ...
    getActionInfo(env));

Create a SARSA agent using the specified critic and an epsilon value of 0.05.

opt = rlSARSAAgentOptions;
opt.EpsilonGreedyExploration.Epsilon = 0.05;
agent = rlSARSAAgent(critic,opt)
agent = 
  rlSARSAAgent with properties:

            AgentOptions: [1x1 rl.option.rlSARSAAgentOptions]
    UseExplorationPolicy: 0
         ObservationInfo: [1x1 rl.util.rlFiniteSetSpec]
              ActionInfo: [1x1 rl.util.rlFiniteSetSpec]
              SampleTime: 1

To check your agent, use getAction to return the action from a random observation.

getAction(agent,{randi(25)})
ans = 1x1 cell array
    {[1]}

You can now test and train the agent against the environment.
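For instance, a training run can be sketched as follows (the stopping values are illustrative, not tuned):

trainOpts = rlTrainingOptions( ...
    "MaxEpisodes",200, ...
    "StopTrainingCriteria","AverageReward", ...
    "StopTrainingValue",10);
trainStats = train(agent,env,trainOpts);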

Version History

Introduced in R2019a