getMaxQValue

Obtain maximum estimated value over all possible actions from a Q-value function critic with discrete action space, given environment observations

Description

[maxQ,maxActionIndex] = getMaxQValue(qValueFcnObj,obs) evaluates the discrete-action-space Q-value function critic qValueFcnObj and returns the maximum estimated value over all possible actions maxQ, with the corresponding action index maxActionIndex, given environment observations obs.

[maxQ,maxActionIndex,state] = getMaxQValue(___) also returns the updated state of qValueFcnObj when it contains a recurrent neural network.
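
For instance, a typical pattern with a recurrent critic is to feed the returned state back into the critic before the next call. The following is an illustrative sketch only; criticRNN stands for a hypothetical critic containing a recurrent neural network, and obs for a suitable observation cell array.

% Illustrative pattern (criticRNN and obs are hypothetical placeholders)
[maxQ,maxActionIndex,state] = getMaxQValue(criticRNN,obs);
criticRNN = setState(criticRNN,state); % carry the state to the next call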

Examples

Create observation and action specification objects (or alternatively use getObservationInfo and getActionInfo to extract the specification objects from an environment). For this example, define the observation space as a continuous three-dimensional space, and the action space as a finite set consisting of three possible values (namely -1, 0, and 1).

obsInfo = rlNumericSpec([3 1]);
actInfo = rlFiniteSetSpec([-1 0 1]);

Create a custom basis function to approximate the Q-value function within the critic, and define an initial parameter vector.

myBasisFcn = @(myobs,myact) [ ...
    ones(4,1);
    myobs(:); myact;
    myobs(:).^2; myact.^2;
    sin(myobs(:)); sin(myact);
    cos(myobs(:)); cos(myact) ];
W0 = rand(20,1);

Create the critic.

critic = rlQValueFunction({myBasisFcn,W0},obsInfo,actInfo);

Use getMaxQValue to return the maximum value, among the possible actions, given a random observation. Also return the index corresponding to the action that maximizes the value.

[v,i] = getMaxQValue(critic,{rand(3,1)})
v = 9.0719
i = 3
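
As a sanity check, you can compare this result against evaluating the critic at each discrete action with getValue and taking the maximum. The following loop is an illustrative sketch, not part of the original example; obs, acts, qvals, vCheck, and iCheck are names introduced here for illustration.

obs = {rand(3,1)};
acts = [-1 0 1];
qvals = zeros(1,numel(acts));
for k = 1:numel(acts)
    qvals(k) = getValue(critic,obs,{acts(k)}); % Q-value of the k-th action
end
[vCheck,iCheck] = max(qvals) % matches [v,i] = getMaxQValue(critic,obs)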

Create a batch set of 64 random independent observations. The third dimension is the batch size, and the fourth is the sequence length for any recurrent neural network used by the critic (not used in this case).

batchobs = rand(3,1,64,1);

Obtain maximum values for all the observations.

bv = getMaxQValue(critic,{batchobs});
size(bv)

ans = 1×2

     1    64

Select the maximum value corresponding to the 44th observation.

bv(44)
ans = 10.4138
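
Similarly, you can also request the second output to obtain the maximizing action index for every observation in the batch. This step is an illustrative addition to the original example; bi is a name introduced here.

[bv,bi] = getMaxQValue(critic,{batchobs});
bi(44) % index of the maximizing action for the 44th observation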

Input Arguments

Q-value function critic, specified as an rlQValueFunction or rlVectorQValueFunction object.

Environment observations, specified as a cell array with as many elements as there are observation input channels. Each element of obs contains an array of observations for a single observation input channel.

The dimensions of each element in obs are MO-by-LB-by-LS, where:

  • MO corresponds to the dimensions of the associated observation input channel.

  • LB is the batch size. To specify a single observation, set LB = 1. To specify a batch of observations, specify LB > 1. If qValueFcnObj has multiple observation input channels, then LB must be the same for all elements of obs.

  • LS specifies the sequence length for a recurrent neural network. If qValueFcnObj does not use a recurrent neural network, then LS = 1. If qValueFcnObj has multiple observation input channels, then LS must be the same for all elements of obs.

For more information on input and output formats for recurrent neural networks, see the Algorithms section of lstmLayer.
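
For example, the following sketch shows plausible observation shapes for a recurrent critic whose observation channel is 3-by-1, as in the example above. This is illustrative only; criticRNN is a hypothetical critic containing a recurrent neural network, and the batch size and sequence length are arbitrary.

seqObs = rand(3,1,8,5);                  % 3-by-1 observations, LB = 8, LS = 5
maxQ = getMaxQValue(criticRNN,{seqObs}); % maxQ is a 1-by-8-by-5 array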

Output Arguments

Maximum Q-value estimate across all possible discrete actions, returned as a 1-by-LB-by-LS array, where:

  • LB is the batch size.

  • LS specifies the sequence length for a recurrent neural network. If qValueFcnObj does not use a recurrent neural network, then LS = 1.

Action index corresponding to the maximum Q value, returned as a 1-by-LB-by-LS array, where:

  • LB is the batch size.

  • LS specifies the sequence length for a recurrent neural network. If qValueFcnObj does not use a recurrent neural network, then LS = 1.

Updated state of qValueFcnObj, returned as a cell array. If qValueFcnObj does not use a recurrent neural network, then state is an empty cell array.

You can set the state of the critic to state using the setState function. For example:

qValueFcnObj = setState(qValueFcnObj,state);
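
For instance, a recurrent critic can be stepped through a sequence one observation at a time while carrying the state between calls. The following is an illustrative sketch; obsSequence (a 3-by-1-by-1-by-T observation array) and T are hypothetical placeholders.

for t = 1:T
    % evaluate one time step (single observation, LB = 1, LS = 1)
    [maxQ,maxActionIndex,state] = getMaxQValue(qValueFcnObj,{obsSequence(:,:,1,t)});
    qValueFcnObj = setState(qValueFcnObj,state); % carry state to the next step
end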

Version History

Introduced in R2020a