rlSARSAAgent

SARSA reinforcement learning agent

Description

The SARSA algorithm is a model-free, online, on-policy reinforcement learning method. A SARSA agent is a value-based reinforcement learning agent that trains a critic to estimate the return or future rewards.
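As a rough illustration of the update the critic learns (a sketch of tabular SARSA, not the toolbox's internal implementation; Q, alpha, and gamma are assumed variables):

% One tabular SARSA step for transition (s,a,r,sNext), where aNext is the
% next action chosen by the same epsilon-greedy policy
tdTarget = r + gamma*Q(sNext,aNext);           % bootstrapped target
Q(s,a) = Q(s,a) + alpha*(tdTarget - Q(s,a));   % move estimate toward target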

For more information on SARSA agents, see SARSA Agents.

For more information on the different types of reinforcement learning agents, see Reinforcement Learning Agents.

Creation

Description

agent = rlSARSAAgent(critic,agentOptions) creates a SARSA agent with the specified critic network and sets the AgentOptions property.
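A minimal creation sketch, assuming critic is an rlQValueFunction already built for your environment (the full workflow appears in the Examples section):

opt = rlSARSAAgentOptions;         % default agent options
agent = rlSARSAAgent(critic,opt);  % critic must match the environment specifications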

Input Arguments

Critic, specified as an rlQValueFunction object. For more information on creating critics, see Create Policies and Value Functions.

Properties

Agent options, specified as an rlSARSAAgentOptions object.

Option to use the exploration policy when selecting actions, specified as one of the following logical values (a usage sketch follows the list).

  • false — Use the agent greedy policy when selecting actions.

  • true — Use the agent exploration policy when selecting actions.
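For example, to have the agent act with its exploration policy during simulation, you can set this property after creating the agent (a brief sketch):

agent.UseExplorationPolicy = true;  % actions are now sampled from the exploration policy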

This property is read-only.

Observation specifications, specified as a reinforcement learning specification object defining properties such as dimensions, data type, and name of the observation signal.

The value of ObservationInfo matches the corresponding value specified in critic.

This property is read-only.

Action specification, specified as an rlFiniteSetSpec object.

The value of ActionInfo matches the corresponding value specified in critic.
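Both specifications can be queried from the agent directly, for example:

obsInfo = getObservationInfo(agent);  % observation specification object
actInfo = getActionInfo(agent);       % rlFiniteSetSpec object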

Sample time of the agent, specified as a positive scalar or as -1. Setting this parameter to -1 allows for event-based simulations. The initial value of SampleTime matches the value specified in AgentOptions.

Within a Simulink® environment, the RL Agent block in which the agent is specified executes every SampleTime seconds of simulation time. If SampleTime is -1, the block inherits the sample time from its parent subsystem.

Within a MATLAB® environment, the agent is executed every time the environment advances. In this case, SampleTime is the time interval between consecutive elements in the output experience returned by sim or train. If SampleTime is -1, the time interval between consecutive elements in the returned output experience reflects the timing of the event that triggers the agent execution.
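For example, to run the agent every 0.1 seconds of simulation time, set the sample time through the agent options before creating the agent (the 0.1 value is illustrative):

opt = rlSARSAAgentOptions;
opt.SampleTime = 0.1;              % agent executes every 0.1 s
agent = rlSARSAAgent(critic,opt);  % assumes critic already exists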

Object Functions

train Train reinforcement learning agents within a specified environment
sim Simulate trained reinforcement learning agents within a specified environment
getAction Obtain action from agent or actor given environment observations
getActor Get actor from reinforcement learning agent
setActor Set actor of reinforcement learning agent
getCritic Get critic from reinforcement learning agent
setCritic Set critic of reinforcement learning agent
generatePolicyFunction Create function that evaluates trained policy of reinforcement learning agent
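For example, to extract the critic from an agent and reinstall a modified copy (a minimal sketch):

critic = getCritic(agent);        % rlQValueFunction used by the agent
agent = setCritic(agent,critic);  % returns the agent with the new critic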

Examples

Create or load an environment interface. For this example, load the Basic Grid World environment interface that is also used in the example Train Reinforcement Learning Agent in Basic Grid World.

env = rlPredefinedEnv("BasicGridWorld");

Create a table approximation model derived from the environment observation and action specifications.

qTable = rlTable( ...
    getObservationInfo(env), ...
    getActionInfo(env));

Create the critic using qTable. SARSA agents use an rlQValueFunction object to implement the critic.

critic = rlQValueFunction(qTable, ...
    getObservationInfo(env), ...
    getActionInfo(env));

Create a SARSA agent using the specified critic and an epsilon value of 0.05.

opt = rlSARSAAgentOptions;
opt.EpsilonGreedyExploration.Epsilon = 0.05;
agent = rlSARSAAgent(critic,opt)
agent = 
  rlSARSAAgent with properties:

            AgentOptions: [1x1 rl.option.rlSARSAAgentOptions]
    UseExplorationPolicy: 0
         ObservationInfo: [1x1 rl.util.rlFiniteSetSpec]
              ActionInfo: [1x1 rl.util.rlFiniteSetSpec]
              SampleTime: 1

To check your agent, use getAction to return the action from a random observation.

getAction(agent,{randi(25)})
ans = 1x1 cell array
    {[1]}

You can now test and train the agent against the environment.
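For instance, a training run can be sketched as follows (the stopping values are illustrative, not tuned):

trainOpts = rlTrainingOptions( ...
    "MaxEpisodes",200, ...
    "StopTrainingCriteria","AverageReward", ...
    "StopTrainingValue",10);
trainStats = train(agent,env,trainOpts);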

Version History

Introduced in R2019a