rlFunctionEnv

Specify custom reinforcement learning environment dynamics using functions

Description

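Use rlFunctionEnv to define a custom reinforcement learning environment. You provide MATLAB® functions that define the step and reset behavior for the environment. This object is useful when you want to customize your environment beyond the predefined environments available with rlPredefinedEnv.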

Creation

Description

env = rlFunctionEnv(obsInfo,actInfo,stepfcn,resetfcn) creates a reinforcement learning environment using the provided observation and action specifications, obsInfo and actInfo, respectively. You also set the StepFcn and ResetFcn properties using MATLAB functions.

Input Arguments

Observation specification, specified as an rlFiniteSetSpec or rlNumericSpec object or an array containing a mix of such objects. These objects define properties such as the dimensions, data types, and names of the observation signals.

Action specification, specified as an rlFiniteSetSpec or rlNumericSpec object. These objects define properties such as the dimensions, data types, and names of the action signals.
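As a brief illustration of the two specification types, the sketch below creates a bounded continuous observation and a discrete action. It assumes Reinforcement Learning Toolbox is installed; the dimensions, limits, and signal names are illustrative values, not ones required by rlFunctionEnv.

```matlab
% Continuous observation: a 2-by-1 vector with per-element bounds (assumed values).
obsInfo = rlNumericSpec([2 1],'LowerLimit',[-5;-10],'UpperLimit',[5;10]);
obsInfo.Name = 'position and velocity';   % illustrative name

% Discrete action: the agent picks one of three force values (assumed values).
actInfo = rlFiniteSetSpec([-1 0 1]);
actInfo.Name = 'force';                   % illustrative name
```

These two objects are what you would pass as the first and second arguments of rlFunctionEnv.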

Properties

Step behavior for the environment, specified as a function name, function handle, or anonymous function.

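StepFcn is a function that you provide which describes how the environment advances to the next state from a given action. When you use a function name or function handle, this function must have two inputs and four outputs, as illustrated by the following signature.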

[Observation,Reward,IsDone,LoggedSignals] = myStepFunction(Action,LoggedSignals)

To use additional input arguments beyond the required set, specify StepFcn using an anonymous function handle.
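A minimal sketch of this pattern follows. It assumes a hypothetical step function, scaledStep, that takes a third input, a structure envConstants of fixed parameters. Neither name comes from the toolbox, and the scalar dynamics are only a placeholder. The anonymous function captures the structure and exposes the two-input signature that the environment expects.

```matlab
% Hypothetical fixed parameters, captured by the anonymous function below.
envConstants.Gain = 0.5;    % scales the effect of the action (assumed)
envConstants.Limit = 3;     % episode ends beyond this magnitude (assumed)

% Wrap the three-input function so that it presents the required two inputs.
stepHandle = @(Action,LoggedSignals) ...
    scaledStep(Action,LoggedSignals,envConstants);

% Call the handle directly to check that it returns the four outputs.
s0.State = 1;               % hypothetical field holding a scalar state
[obs,reward,isDone,logged] = stepHandle(2,s0);

function [Observation,Reward,IsDone,LoggedSignals] = ...
        scaledStep(Action,LoggedSignals,envConstants)
    % Placeholder dynamics: add the scaled action to the scalar state.
    LoggedSignals.State = LoggedSignals.State + envConstants.Gain*Action;
    Observation = LoggedSignals.State;
    Reward = -abs(Observation);
    IsDone = abs(Observation) > envConstants.Limit;
end
```

You would then pass stepHandle as the third argument of rlFunctionEnv.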

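The step function computes the values of the observation and reward for the given action in the environment. The required input and output arguments are as follows.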

  • Action - Current action, which must match the dimensions and data type specified in actInfo.

  • Observation - Returned observation, which must match the dimensions and data types specified in obsInfo.

  • Reward - Reward for the current step, returned as a scalar value.

  • IsDone - Logical value indicating whether to end the simulation episode. The step function that you define can include logic to decide whether to end the simulation based on the observation, reward, or any other values.

  • LoggedSignals - Any data that you want to pass from one step to the next, specified as a structure.
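The argument contract above can be sketched as a function file. This is a placeholder, not the cart-pole dynamics from the linked example: it assumes a unit point mass whose state (position and velocity) is stored in a hypothetical State field of LoggedSignals, a scalar force action, and arbitrary values for the sample time and the termination threshold. The function name pointMassStep is also hypothetical.

```matlab
function [Observation,Reward,IsDone,LoggedSignals] = ...
        pointMassStep(Action,LoggedSignals)
% Placeholder step function: advance a unit point mass by one sample.
% Assumes LoggedSignals.State = [position; velocity] (hypothetical field).

Ts = 0.1;          % sample time in seconds (assumed)
maxPosition = 5;   % end the episode beyond this distance (assumed)

% Read the state carried over from the previous step or from reset.
x = LoggedSignals.State;

% Forward Euler update with the action applied as a force.
x = x + Ts*[x(2); Action];

% Store the new state so that the next step can read it.
LoggedSignals.State = x;

% The observation must match the dimensions in obsInfo ([2 1] here).
Observation = x;

% Scalar reward: penalize distance from the origin.
Reward = -abs(x(1));

% Logical flag that ends the episode.
IsDone = abs(x(1)) > maxPosition;
end
```

Because the state lives in LoggedSignals rather than in a global or persistent variable, the function stays stateless and each episode starts from whatever the reset function returns.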

For an example showing multiple ways to define a step function, see Create MATLAB Environment Using Custom Functions.

Reset behavior for the environment, specified as a function name, function handle, or anonymous function handle.

The reset function that you provide must have no inputs and two outputs, as illustrated by the following signature.

[InitialObservation,LoggedSignals] = myResetFunction

To use input arguments with your reset function, specify ResetFcn using an anonymous function handle.

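The reset function sets the environment to an initial state and computes the initial values of the observation signals. For example, you can create a reset function that randomizes certain state values, so that each training episode begins from different initial conditions.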

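The sim function calls the reset function to reset the environment at the start of each simulation, and the train function calls it at the start of each training episode.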

The InitialObservation output must match the dimensions and data type of obsInfo.

To pass information from the reset condition into the first step, specify that information in the reset function as the output structure LoggedSignals.
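The reset behavior described above can be sketched as follows. The sketch assumes a hypothetical reset function, randomReset, that takes one input (a bound on the initial position) and stores the state in a hypothetical State field. An anonymous function wraps it so that it has the required zero-input signature.

```matlab
% Hypothetical bound on the randomized initial position.
maxOffset = 0.05;

% Wrap the one-input function so that it takes no inputs, as ResetFcn requires.
resetHandle = @() randomReset(maxOffset);

% Call the handle directly to check the two outputs.
[obs0,logged0] = resetHandle();

function [InitialObservation,LoggedSignals] = randomReset(maxOffset)
    % Draw the initial position uniformly from [-maxOffset, maxOffset].
    x0 = maxOffset*(2*rand - 1);
    % Hypothetical State field; the first step reads this structure.
    LoggedSignals.State = [x0; 0];
    % The initial observation must match the dimensions in obsInfo ([2 1] here).
    InitialObservation = LoggedSignals.State;
end
```

You would then pass resetHandle as the fourth argument of rlFunctionEnv.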

For an example showing multiple ways to define a reset function, see Create MATLAB Environment Using Custom Functions.

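Information to pass to the next step, specified as a structure. When you create the environment, whatever you define as the LoggedSignals output of ResetFcn initializes this property. When a step occurs, the software populates this property with data to pass to the next step, as defined in StepFcn.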

Object Functions

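getActionInfo Obtain action data specifications from reinforcement learning environment or agent
getObservationInfo Obtain observation data specifications from reinforcement learning environment or agent
train Train reinforcement learning agents within a specified environment
sim Simulate trained reinforcement learning agents within specified environment
validateEnvironment Validate custom reinforcement learning environment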

Examples

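Create a reinforcement learning environment by supplying custom dynamic functions in MATLAB®. Using rlFunctionEnv, you can create a MATLAB reinforcement learning environment from an observation specification, an action specification, and step and reset functions that you define.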

For this example, create an environment that represents a system for balancing a pole on a cart. The observations from the environment are the cart position, cart velocity, pendulum angle, and pendulum angle derivative. (For additional details about this environment, see Create MATLAB Environment Using Custom Functions.) Create an observation specification for those signals.

oinfo = rlNumericSpec([4 1]);
oinfo.Name = 'CartPole States';
oinfo.Description = 'x, dx, theta, dtheta';

The environment has a discrete action space where the agent can apply one of two possible force values to the cart, –10 N or 10 N. Create the action specification for those actions.

ActionInfo = rlFiniteSetSpec([-10 10]);
ActionInfo.Name = 'CartPole Action';

Next, specify the custom step and reset functions. For this example, use the supplied functions myResetFunction.m and myStepFunction.m. For details about these functions and how they are constructed, see Create MATLAB Environment Using Custom Functions.

Construct the custom environment using the defined observation specification, action specification, and function names.

env = rlFunctionEnv(oinfo,ActionInfo,'myStepFunction','myResetFunction');

You can create agents for env and train them within the environment as you would for any other reinforcement learning environment.

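As an alternative to using function names, you can specify the functions as function handles. For more details and an example, see Create MATLAB Environment Using Custom Functions.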
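As a rough sketch of the function-handle form, the script below builds an environment from handles to two local functions and then checks it. It assumes Reinforcement Learning Toolbox is installed. The functions sketchStep and sketchReset, the State field, and the point-mass dynamics are hypothetical placeholders, not the supplied cart-pole functions.

```matlab
% Specifications for a 2-by-1 observation and a two-valued action (assumed values).
obsInfo = rlNumericSpec([2 1]);
actInfo = rlFiniteSetSpec([-1 1]);

% Pass function handles instead of function names.
env = rlFunctionEnv(obsInfo,actInfo,@sketchStep,@sketchReset);

% Check that the step and reset functions are consistent with the specifications.
validateEnvironment(env)

% Query the specifications back from the environment.
obsFromEnv = getObservationInfo(env);
actFromEnv = getActionInfo(env);

function [InitialObservation,LoggedSignals] = sketchReset()
    % Start at rest at the origin; State is a hypothetical field name.
    LoggedSignals.State = [0; 0];
    InitialObservation = LoggedSignals.State;
end

function [Observation,Reward,IsDone,LoggedSignals] = ...
        sketchStep(Action,LoggedSignals)
    % Placeholder dynamics: unit point mass, forward Euler, Ts = 0.1 s (assumed).
    x = LoggedSignals.State;
    x = x + 0.1*[x(2); Action];
    LoggedSignals.State = x;
    Observation = x;
    Reward = -abs(x(1));
    IsDone = abs(x(1)) > 5;
end
```

Function handles let the step and reset functions live in the same script as the environment definition, which can be convenient while prototyping.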

Version History

Introduced in R2019a