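classdef eGreedy < handle
    % EGREEDY implements the epsilon-greedy strategy for the Multi-Armed Bandit problem

    properties
        n_arms          % number of arms
        epsilon         % ratio of exploration
        trials          % per-arm trial counter
        rewards         % cumulative rewards earned per arm
        scores          % history of expected rewards (one row per update)
    end

    methods
        function self = eGreedy(n_arms, epsilon)    % initialize the object
            self.n_arms = n_arms;                   % number of arms
            self.trials = zeros(1, n_arms);         % initialize counters with 0
            self.rewards = zeros(1, n_arms);        % initialize rewards with 0
            self.scores = ones(1, n_arms);          % initialize scores with 1
            if nargin == 1                          % if no epsilon is given
                self.epsilon = 0.1;                 % use default 0.1
            else                                    % otherwise
                self.epsilon = epsilon;             % use user input
            end
        end

        function s = score(self, arm)               % compute expected reward
            s = self.scores(end,:);                 % carry over current scores
            s(arm) = (self.rewards(arm) + 1)...     % update score for this arm
                / (self.trials(arm) + 1);
        end

        function choice = choose(self)              % choose an arm
            if rand <= self.epsilon                 % (epsilon) of the time
                choice = randi(self.n_arms);        % explore: pick any arm at random
            else                                    % (1 - epsilon) of the time
                [~, choice] = ...                   % exploit the best arm so far
                    max(self.scores(end,:));
            end
        end

        function update(self, arm, reward)          % update data
            self.trials(arm) = ...                  % increment counter
                self.trials(arm) + 1;
            self.rewards(arm) = ...                 % accumulate reward
                self.rewards(arm) + reward;
            self.scores(end + 1,:) = ...            % append updated scores
                self.score(arm);
        end

        function plot(self, title_str, options)     % plot scores
            n = size(self.scores, 1);               % rows: initial scores plus one per update
            figure                                  % new figure
            plot(1:n, self.scores)                  % one line per arm
            xlim([0 n])                             % x-axis limits
            title(title_str)                        % add title
            xlabel('Trials')                        % add x-axis label
            ylabel('Expected Reward')               % add y-axis label
            legend(options)                         % add legend (one label per arm)
        end
    end
end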