Loading trained reinforcement learning multi-agents into sim
Hello,
I trained four agents with the Q-learning reinforcement learning method. After training, the trained agents are loaded into the simulation, but they always select the same action and keep it unchanged, failing to reproduce the behavior seen during training.
Here is my code:
clc;
clear;
mdl = "FOUR_DG_0331";
open_system(mdl);
agentBlk = ["FOUR_DG_0331/RL Agent1", "FOUR_DG_0331/RL Agent2", "FOUR_DG_0331/RL Agent3", "FOUR_DG_0331/RL Agent4"];
oInfo = rlFiniteSetSpec([1 2 3 4 5 6 7 8 9]);
aInfo = rlFiniteSetSpec([150 160 170]);
aInfo1 = rlFiniteSetSpec([150 170]);
obsInfos = {oInfo, oInfo, oInfo, oInfo};
actInfos = {aInfo1, aInfo, aInfo, aInfo};
env = rlSimulinkEnv(mdl, agentBlk, obsInfos, actInfos);
Ts = 0.01;
Tf = 4;
rng(0);
qTable1 = rlTable(oInfo, aInfo1);
qTable2 = rlTable(oInfo, aInfo);
qTable3 = rlTable(oInfo, aInfo);
qTable4 = rlTable(oInfo, aInfo);
criticOpts = rlRepresentationOptions("LearnRate", 0.1);
Critic1 = rlQValueRepresentation(qTable1, oInfo, aInfo1, criticOpts);
Critic2 = rlQValueRepresentation(qTable2, oInfo, aInfo, criticOpts);
Critic3 = rlQValueRepresentation(qTable3, oInfo, aInfo, criticOpts);
Critic4 = rlQValueRepresentation(qTable4, oInfo, aInfo, criticOpts);
% Agent options code
% ......
% ......
agent1 = rlQAgent(Critic1, QAgent_opt);
agent2 = rlQAgent(Critic2, QAgent_opt);
agent3 = rlQAgent(Critic3, QAgent_opt);
agent4 = rlQAgent(Critic4, QAgent_opt);
trainOpts = rlTrainingOptions;
trainOpts.MaxEpisodes = 1000;
trainOpts.MaxStepsPerEpisode = ceil(Tf/Ts);
trainOpts.StopTrainingCriteria = "EpisodeCount";
trainOpts.StopTrainingValue = 1000;
trainOpts.SaveAgentCriteria = "EpisodeCount";
trainOpts.SaveAgentValue = 15;
trainOpts.SaveAgentDirectory = "savedAgents";
trainOpts.Verbose = false;
trainOpts.Plots = "training-progress";
doTraining = false;
if doTraining
    stats = train([agent1, agent2, agent3, agent4], env, trainOpts);
else
    load(trainOpts.SaveAgentDirectory + "/Agents16.mat", "agents");
    simOpts = rlSimulationOptions("MaxSteps", ceil(Tf/Ts));
    experience = sim(env, [agent1 agent2 agent3 agent4], simOpts)
end
The result of the sim call is that all four agents select action 150. The agents do not select the other actions the way they did during training.
I don't understand why… Can someone help me?
Answers (2)
Ari Biswas
16 Apr 2021
It probably means the agents converged to a suboptimal policy. You could train the agents further to see if they improve. Note that the behavior you see during training is associated with exploration. If the EpsilonGreedyExploration.Epsilon parameter has not decayed much, then the agents are still exploring. That could be one reason you see a difference in sim behavior.
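As a sanity check on the exploration schedule, the epsilon settings can be set explicitly on the agent options before training. A minimal sketch, reusing the QAgent_opt name from the question; the specific decay values here are illustrative assumptions, not taken from the original post:

```matlab
% Sketch: configure epsilon-greedy exploration for a Q agent.
QAgent_opt = rlQAgentOptions("SampleTime", 0.01);
QAgent_opt.EpsilonGreedyExploration.Epsilon = 1;          % start fully exploratory
QAgent_opt.EpsilonGreedyExploration.EpsilonDecay = 0.005; % larger value -> faster decay toward greedy
QAgent_opt.EpsilonGreedyExploration.EpsilonMin = 0.01;    % exploration floor during training
```

With a very small EpsilonDecay, the agents still act mostly at random even late in training, so the varied actions seen during training come from exploration rather than from the learned policy; sim, which runs the greedy policy, then exposes whatever single action the Q-table currently prefers.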