Deploy Trained Reinforcement Learning Policies

Once you train a reinforcement learning agent, you can generate code to deploy the optimal policy. You can generate:

  • CUDA® code for deep neural network policies using GPU Coder™

  • C/C++ code for table, deep neural network, or linear basis function policies using MATLAB® Coder™

Note

Code generation for deep neural network policies supports only networks with a single input layer.

For more information on training reinforcement learning agents, see Train Reinforcement Learning Agents.

To create a policy evaluation function that selects an action based on a given observation, use the generatePolicyFunction command. This command generates a MATLAB script, which contains the policy evaluation function, and a MAT-file, which contains the optimal policy data.

You can generate code to deploy this policy function using GPU Coder or MATLAB Coder.

Generate Code Using GPU Coder

If your trained optimal policy uses a deep neural network, you can generate CUDA code for the policy using GPU Coder. There are several required and recommended prerequisite products for generating CUDA code for deep neural networks. For more information, see Installing Prerequisite Products (GPU Coder) and Setting Up the Prerequisite Products (GPU Coder).

Not all deep neural network layers support GPU code generation. For a list of supported layers, see Supported Networks and Layers (GPU Coder). For more information and examples on GPU code generation, see Deep Learning with GPU Coder (GPU Coder).

Generate CUDA Code for Deep Neural Network Policy

As an example, generate GPU code for the policy gradient agent trained in Train PG Agent to Balance Cart-Pole System.

Load the trained agent.

load('MATLABCartpolePG.mat','agent')

Create a policy evaluation function for this agent.

generatePolicyFunction(agent)

This command creates the evaluatePolicy.m file, which contains the policy function, and the agentData.mat file, which contains the trained deep neural network actor. For a given observation, the policy function evaluates a probability for each potential action using the actor network. Then, the policy function randomly selects an action based on these probabilities.
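
Before generating code, you can optionally call the policy function in MATLAB to confirm that it runs and returns an action. This is a minimal sketch; the observation values are placeholders, not a meaningful cart-pole state.

% Optional sanity check (observation values are placeholders)
obs = [0; 0; 0.05; 0];      % four-element observation vector
action = evaluatePolicy(obs)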

Since the actor network for this PG agent has a single input layer and single output layer, you can generate code for this network using GPU Coder. For example, you can generate a CUDA compatible MEX function.
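
You can also confirm this programmatically before configuring code generation. The following is a minimal sketch; it assumes your release provides the getActor and getModel functions (R2020b or later) and that the returned network object exposes an InputNames property.

% Sketch: confirm the actor network has a single input layer
% (assumes getActor/getModel are available and the returned network
% exposes an InputNames property, for example a layerGraph or dlnetwork)
actor = getActor(agent);
net = getModel(actor);
assert(numel(net.InputNames) == 1, ...
    'Code generation supports only single-input networks.')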

Configure the codegen function to create a CUDA-compatible C++ MEX function.

cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
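
If you target the NVIDIA TensorRT library instead of cuDNN, a similar configuration applies. This is a sketch and assumes that the TensorRT prerequisites for GPU Coder are installed.

% Alternative sketch: target TensorRT instead of cuDNN
% (assumes the TensorRT prerequisites for GPU Coder are installed)
cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('tensorrt');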

Set the dimensions of the policy evaluation input argument, which correspond to the observation specification dimensions for the agent. To find the observation dimensions, use the getObservationInfo function. In this case, the observations are in a four-element vector.

argstr = '{ones(4,1)}';
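
To confirm these dimensions programmatically, you can query the observation specification, as in this short sketch.

% Confirm the observation dimensions from the observation specification
obsInfo = getObservationInfo(agent);
obsInfo.Dimension      % expected to be [4 1] for this four-element observation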

Generate code using the codegen function.

codegen('-config','cfg','evaluatePolicy','-args',argstr,'-report');

This command generates the MEX function evaluatePolicy_mex.
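
You can then call the generated MEX function from MATLAB with a placeholder observation and compare its output with the original policy function. Because this policy selects actions randomly based on probabilities, the two calls are not guaranteed to return the same action.

% Call the generated MEX function with a placeholder observation
obs = ones(4,1);
actionMex = evaluatePolicy_mex(obs)
actionMatlab = evaluatePolicy(obs)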

Generate Code Using MATLAB Coder

You can generate C/C++ code for table, deep neural network, or linear basis function policies using MATLAB Coder.

Using MATLAB Coder, you can generate:

  • C++ code for policies that use deep neural networks

  • C code for policies that use Q tables or linear basis functions

Generate C++ Code for Deep Neural Network Policy

As an example, generate C++ code for the policy gradient agent trained in Train PG Agent to Balance Cart-Pole System.

Load the trained agent.

load('MATLABCartpolePG.mat','agent')

Create a policy evaluation function for this agent.

generatePolicyFunction(agent)

This command creates the evaluatePolicy.m file, which contains the policy function, and the agentData.mat file, which contains the trained deep neural network actor. For a given observation, the policy function evaluates a probability for each potential action using the actor network. Then, the policy function randomly selects an action based on these probabilities.

Configure the codegen function to generate code suitable for targeting a static library.

cfg = coder.config('lib');

On the configuration object, set the target language to C++, and set DeepLearningConfig to the target library 'mkldnn'. This option generates code that uses the Intel Math Kernel Library for Deep Neural Networks (Intel MKL-DNN).

cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('mkldnn');
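
For deployment to an embedded ARM CPU, you could instead target the ARM Compute Library. This is a sketch; it assumes the ARM Compute prerequisites are installed, and the architecture and version values shown are placeholders for your target.

% Alternative sketch: target the ARM Compute Library for an embedded ARM CPU
% (assumes the ARM Compute prerequisites are installed; the version and
% architecture values are placeholders for your target)
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('arm-compute');
cfg.DeepLearningConfig.ArmComputeVersion = '19.05';
cfg.DeepLearningConfig.ArmArchitecture = 'armv8';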

Set the dimensions of the policy evaluation input argument, which correspond to the observation specification dimensions for the agent. To find the observation dimensions, use the getObservationInfo function. In this case, the observations are in a four-element vector.

argstr = '{ones(4,1)}';

Generate code using the codegen function.

codegen('-config','cfg','evaluatePolicy','-args',argstr,'-report');

This command generates the C++ code for the policy gradient agent containing the deep neural network actor.
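
Because a static library cannot be run directly from MATLAB, one way to verify the generated code is to also build a MEX target with the same deep learning configuration and compare its output with the MATLAB policy function. This is a sketch of that workflow, not part of the original example.

% Sketch: build a MEX target with the same deep learning configuration
% to verify the generated code against the MATLAB policy function
mexCfg = coder.config('mex');
mexCfg.TargetLang = 'C++';
mexCfg.DeepLearningConfig = coder.DeepLearningConfig('mkldnn');
codegen('-config','mexCfg','evaluatePolicy','-args',argstr,'-report');
evaluatePolicy_mex(ones(4,1))    % placeholder observation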

Generate C Code for Q Table Policy

As an example, generate C code for the Q-learning agent trained in Train Reinforcement Learning Agent in Basic Grid World.

Load the trained agent.

load('basicGWQAgent.mat','qAgent')

Create a policy evaluation function for this agent.

generatePolicyFunction(qAgent)

This command creates the evaluatePolicy.m file, which contains the policy function, and the agentData.mat file, which contains the trained Q table value function. For a given observation, the policy function looks up the value function for each potential action using the Q table. Then, the policy function selects the action for which the value function is greatest.
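
You can optionally call the policy function in MATLAB first to confirm that it runs. The observation value below is a placeholder state index for the grid world.

% Optional sanity check (the observation value is a placeholder state index)
action = evaluatePolicy(1)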

Set the dimensions of the policy evaluation input argument, which correspond to the observation specification dimensions for the agent. To find the observation dimensions, use the getObservationInfo function. In this case, there is a single finite observation.

argstr = '{[1]}';

Configure the codegen function to generate embeddable C code suitable for targeting a static library, and set the output folder to buildFolder.

cfg = coder.config('lib');
outFolder = 'buildFolder';

Generate C code using the codegen function.

codegen('-c','-d',outFolder,'-config','cfg',...
    'evaluatePolicy','-args',argstr,'-report');
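
After code generation, you could package the generated source files for integration into an external build system using packNGo. This is a sketch and assumes that codegen saved a buildInfo.mat file in the output folder specified by the -d option.

% Sketch: package the generated code for external integration
% (assumes buildInfo.mat is saved in the codegen output folder)
load(fullfile(outFolder,'buildInfo.mat'),'buildInfo');
packNGo(buildInfo,'packType','flat','fileName','evaluatePolicyCode');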

See Also

Related Topics