Create and Train Reinforcement Learning Agents Interactively
Design, train, and simulate reinforcement learning agents using a visual, interactive workflow in the Reinforcement Learning Designer app. Use this app to set up a reinforcement learning problem in Reinforcement Learning Toolbox™ without writing MATLAB® code. Work through the entire reinforcement learning workflow to:
- Import an existing environment into the app
- Import or create a new agent for your environment and select the appropriate agent hyperparameters
- Use the default neural network architectures suggested by Reinforcement Learning Toolbox, or import custom architectures
- Train the agent on single or multiple workers and simulate the trained agent against the environment
- Analyze simulation results and refine agent parameters
- Export the final agent to the MATLAB workspace for further use and deployment
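Before walking through the app, note that it can also be opened programmatically. A minimal sketch, assuming Reinforcement Learning Toolbox R2021a or later is installed:

```matlab
% Open the Reinforcement Learning Designer app from the MATLAB command line
reinforcementLearningDesigner
```

This is equivalent to launching the app from the Apps tab in the MATLAB toolstrip.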
With the R2021a release of MATLAB, Reinforcement Learning Toolbox lets you interactively design, train, and simulate RL agents with the new Reinforcement Learning Designer app. You can open the app from the command line or from the MATLAB Apps tab.

First, you need to create the environment object that your agent will train against. Reinforcement Learning Designer lets you import environment objects from the MATLAB workspace, select from several predefined environments, or create your own custom environment. For this example, let's create a predefined cart-pole MATLAB environment with a discrete action space, and let's also import from the MATLAB workspace a custom Simulink environment of a four-legged robot with a continuous action space. You can delete or rename environment objects in the Environments pane as needed, and you can view the dimensions of the observation and action spaces in the Preview pane.

To create an agent, click New in the Agent section on the Reinforcement Learning tab. Based on the selected environment and the nature of the observation and action spaces, the app shows a list of compatible built-in training algorithms. For this demonstration, we will select the DQN algorithm. The app generates a DQN agent with a default critic architecture. You can adjust some of the critic's default values as needed before creating the agent. The new agent will appear in the Agents pane, and the Agent Editor will show a summary view of the agent and the available hyperparameters that can be tuned. For example, let's change the agent's sample time and the critic's learn rate. Here, we can also adjust the agent's exploration strategy and see how exploration will progress with respect to the number of training steps.

To view the default critic network, click View Critic Model on the DQN Agent tab. The Deep Learning Network Analyzer opens and displays the critic structure. You can change the critic neural network by importing a different critic network from the workspace. You can also import a different set of agent options or a different critic representation object altogether.

Click Train to specify training options such as stopping criteria for the agent. Here, let's set the maximum number of episodes to 1000 and leave the rest at their default values. To parallelize training, click the Use Parallel button. Parallelization options include additional settings such as the type of data workers will send back, whether data will be sent synchronously, and more. After setting the training options, you can generate a MATLAB script with the specified settings that you can use outside the app if needed. To start training, click Train. During the training process, the app opens the Training Session tab and displays the training progress. If a visualization of the environment is available, you can also view how the environment responds during training.
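The same cart-pole setup can be reproduced outside the app with documented toolbox functions. A sketch under the assumption of R2021a or later; the specific sample time and learn rate values are illustrative, and option property names can vary slightly between releases:

```matlab
% Create the predefined cart-pole environment with a discrete action space
env = rlPredefinedEnv("CartPole-Discrete");

% Inspect observation and action specifications (shown in the app's Preview pane)
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Create a DQN agent with a default critic architecture
agent = rlDQNAgent(obsInfo, actInfo);

% Tune hyperparameters, e.g. the agent's sample time (value is illustrative)
agent.AgentOptions.SampleTime = 0.1;

% Training options mirroring the settings chosen in the app
trainOpts = rlTrainingOptions( ...
    "MaxEpisodes", 1000, ...
    "UseParallel", true);

% Train outside the app if desired
trainingStats = train(agent, env, trainOpts);
```

This corresponds to the MATLAB script the app can generate from your chosen training settings.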
You can stop training at any time and choose to accept or discard the training results. Accepted results appear under the Results pane, and a new trained agent also appears under Agents.

To simulate an agent, go to the Simulate tab and select the appropriate agent and environment object from the drop-down list. For this task, let's import a pretrained agent for the four-legged robot environment we imported at the beginning. Double-click the agent object to open the Agent Editor. You can see that this is a DDPG agent that takes in 44 continuous observations and outputs 8 continuous torques. In the Simulate tab, select the desired number of simulations and the simulation length. If you need to run a large number of simulations, you can run them in parallel. After clicking Simulate, the app opens the Simulation Session tab. If available, you can view the visualization of the environment at this stage as well.

When the simulations are completed, you will be able to see the reward for each simulation, as well as the mean and standard deviation of the reward. Remember that the reward signal is provided as part of the environment. To analyze the simulation results, click Inspect Simulation Data. In the Simulation Data Inspector you can view the saved signals for each simulation episode. If you want to keep the simulation results, click Accept.

When you finish your work, you can choose to export any of the agents shown under the Agents pane. For convenience, you can also directly export the underlying actor or critic representations, actor or critic neural networks, and agent options. To save the app session for future use, click Save Session on the Reinforcement Learning tab. For more information, please refer to the Reinforcement Learning Toolbox documentation.
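The simulation and export steps above also have programmatic equivalents. A sketch using documented toolbox functions; the simulation length and count are illustrative placeholders, and `env` and `agent` are assumed to already exist in the workspace:

```matlab
% Configure simulations (equivalent to the Simulate tab settings;
% 500 steps and 5 runs are illustrative values)
simOpts = rlSimulationOptions("MaxSteps", 500, "NumSimulations", 5);
experiences = sim(env, agent, simOpts);

% Summarize the reward across runs, as the app reports after simulation
totalRewards = arrayfun(@(e) sum(e.Reward.Data), experiences);
rewardMean = mean(totalRewards);
rewardStd  = std(totalRewards);

% Export part of the trained agent for further use, e.g. the critic
critic = getCritic(agent);
```

Exporting an agent from the Agents pane places the same kind of object in the MATLAB workspace for deployment or further scripting.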