Reinforcement learning actions getting saturated in a band of values

Hello,
I have a reinforcement learning environment with 4 observations and 6 actions. Each action has a lower limit of 0.05 and an upper limit of 1. I find that during training the action values become saturated within a narrow band.
For example, the action limits are specified as 0.05 to 1, but the action outputs during training only vary in the range 0 to 0.16 and never leave that band.
I have attached a capture of the action outputs during training.
The code is below:
clc;
clear;
close all;
% Load the Simulink model parameters
SPWM_RL_Data;
% Open the Simulink model
mdl = 'RL_Debug';
open_system(mdl);
% Create the environment interface
open_system('RL_Debug/Firing Unit');
% Create the observation specification
numObservations = 4;
observationInfo = rlNumericSpec([numObservations 1]);
observationInfo.Name = 'observations';
observationInfo.Description = 'error signals';
% Create the action specification
numActions = 6;
actionInfo = rlNumericSpec([numActions 1],'LowerLimit',[0.05;0.05;0.05;0.05;0.05;0.05],'UpperLimit',[1;1;1;1;1;1]);
actionInfo.Name = 'switchingPulses';
% Create the Simulink environment from the observation and action specifications
agentblk = 'RL_Debug/Firing Unit/RL Agent';
env = rlSimulinkEnv(mdl,agentblk,observationInfo,actionInfo);
% Get the observation and action specifications from the environment
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
rng(0) % fix the random seed
statePath = [featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(64,'Name','fc1')];
actionPath = [featureInputLayer(numActions,'Normalization','none','Name','Action')
    fullyConnectedLayer(64,'Name','fc2')];
commonPath = [additionLayer(2,'Name','add')
    reluLayer('Name','relu2')
    fullyConnectedLayer(32,'Name','fc3')
    reluLayer('Name','relu3')
    fullyConnectedLayer(16,'Name','fc4')
    fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'fc1','add/in1');
criticNetwork = connectLayers(criticNetwork,'fc2','add/in2');
% Create the two critic representations
criticOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1);
critic1 = rlQValueRepresentation(criticNetwork,observationInfo,actionInfo, ...
    'Observation',{'State'},'Action',{'Action'},criticOptions);
critic2 = rlQValueRepresentation(criticNetwork,observationInfo,actionInfo, ...
    'Observation',{'State'},'Action',{'Action'},criticOptions);
actorNetwork = [featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(64,'Name','actorFC1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(32,'Name','actorFC2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numActions,'Name','Action')
    tanhLayer('Name','tanh1')
    scalingLayer('Name','scale','Scale',actionInfo.UpperLimit)];
actorOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1,'L2RegularizationFactor',0.001);
actor = rlDeterministicActorRepresentation(actorNetwork,observationInfo,actionInfo, ...
    'Observation',{'State'},'Action',{'scale'},actorOptions);
% Ts_agent = t;
agentOptions = rlTD3AgentOptions('SampleTime',Ts_agent, ...
    'DiscountFactor',0.995, ...
    'ExperienceBufferLength',2e6, ...
    'MiniBatchSize',512, ...
    'NumStepsToLookAhead',5, ...
    'TargetSmoothFactor',0.005, ...
    'TargetUpdateFrequency',2);
agentOptions.ExplorationModel.Variance = 0.05;
agentOptions.ExplorationModel.VarianceDecayRate = 2e-4;
agentOptions.ExplorationModel.VarianceMin = 0.001;
agentOptions.TargetPolicySmoothModel.Variance = 0.1;
agentOptions.TargetPolicySmoothModel.VarianceDecayRate = 1e-4;
agent = rlTD3Agent(actor,[critic1,critic2],agentOptions);
% T = 1.0;
maxepisodes = 10000;
maxsteps = ceil(Tf/Ts_agent);
trainingOpts = rlTrainingOptions( ...
    'MaxEpisodes',maxepisodes, ...
    'MaxStepsPerEpisode',maxsteps, ...
    'StopTrainingCriteria','AverageReward', ...
    'StopTrainingValue',8000, ...
    'ScoreAveragingWindowLength',100);
if (doTraining)
    trainStats = train(agent,env,trainingOpts);
    save('Agent.mat','agent')
else
    load('Agent.mat')
end
% Simulate the agent
rng(0);
simOptions = rlSimulationOptions('MaxSteps',maxsteps,'NumSimulations',1);
sim(env,agent,simOptions);
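
For reference, one quick way to see what range the actor can actually produce, independent of the Simulink model, is to query the agent for a batch of random observations. This is only a diagnostic sketch: the random observations are made up for illustration, and getAction may return either a numeric array or a cell array depending on the toolbox release.

% Diagnostic sketch: sample the policy on random observations and report per-action min/max
rng(1);
nSamples = 1000;
sampleActions = zeros(numActions,nSamples);
for k = 1:nSamples
    obs = randn(numObservations,1);          % hypothetical observation, for illustration only
    a = getAction(agent,{obs});              % query the agent's current policy
    if iscell(a), a = a{1}; end              % unwrap cell output on releases that return a cell
    sampleActions(:,k) = a;
end
disp([min(sampleActions,[],2) max(sampleActions,[],2)])  % observed per-action range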

Accepted Answer

Emmanouil Tzorakoleftherakis on 15 Apr 2021
The scaling layer is not set up properly. You want to scale according to (upper limit - lower limit) and then adjust the bias accordingly, for example:
scalingLayer('Scale',(actionInfo.UpperLimit-actionInfo.LowerLimit)/2,'Bias',(actionInfo.UpperLimit+actionInfo.LowerLimit)/2)
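
As a quick sanity check of this mapping (assuming the standard scalingLayer behavior, output = Scale.*input + Bias), the scale and bias above map the tanh extremes of -1 and +1 onto the action limits 0.05 and 1:

% Verify that the scaling maps tanh outputs of -1 and +1 onto [0.05, 1]
scale = (actionInfo.UpperLimit - actionInfo.LowerLimit)/2;   % 0.475 for every action
bias  = (actionInfo.UpperLimit + actionInfo.LowerLimit)/2;   % 0.525 for every action
disp([scale.*(-1) + bias, scale.*(+1) + bias])               % each row prints 0.05 and 1.00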

More Answers (0)
