Compute Mean Value with MapReduce

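This example shows how to compute the mean of a single variable in a data set using mapreduce. It demonstrates a simple use of mapreduce with one key, minimal computation, and an intermediate state (accumulating an intermediate sum and count).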

Prepare Data

Create a datastore using the airlinesmall.csv data set. This 12-megabyte data set contains 29 columns of flight information for several airline carriers, including arrival and departure times. In this example, select ArrDelay (flight arrival delay) as the variable of interest.

    ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
    ds.SelectedVariableNames = 'ArrDelay';

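The datastore treats 'NA' values as missing and replaces them with NaN values by default. Additionally, the SelectedVariableNames property lets you work with only the selected variable of interest, which you can verify using preview.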

    preview(ds)

    ans = 8×1 table

        ArrDelay
        ________
            8
            8
           21
           13
            4
           59
            3
           11

Run MapReduce

The mapreduce function requires a map function and a reduce function as inputs. The mapper receives blocks of data and outputs intermediate results. The reducer reads the intermediate results and produces a final result.

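In this example, the mapper finds the count and sum of the arrival delays in each block of data. The mapper then stores these values as the intermediate values associated with the key "PartialCountSumDelay".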

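Display the map function file.

    function meanArrivalDelayMapper(data, info, intermKVStore)
    % data is an n-by-1 table containing ArrDelay. Remove missing values first:
    data(isnan(data.ArrDelay),:) = [];

    % Record the partial count and sum; the reducer will accumulate them.
    partCountSum = [length(data.ArrDelay), sum(data.ArrDelay)];
    add(intermKVStore, "PartialCountSumDelay", partCountSum);
    end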

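The reducer accepts the count and sum for each block stored by the mapper. It sums these values to obtain the total count and the total sum. The overall mean arrival delay is then the total sum divided by the total count. mapreduce calls this reducer only once, since the mapper adds only a single unique key. The reducer uses add to add a single key-value pair to the output.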
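The reason for carrying a count and a sum per block, rather than a per-block mean, can be written out explicitly. In the sketch below, the symbols are notation introduced here for illustration and do not appear in the original: $B$ is the number of blocks, $n_i$ is the number of non-missing delays in block $i$, and $s_i$ is their sum.

```latex
s_i = \sum_{j=1}^{n_i} x_{ij},
\qquad
\bar{x} \;=\; \frac{\sum_{i=1}^{B} s_i}{\sum_{i=1}^{B} n_i}
```

Averaging the per-block means $s_i / n_i$ would give the same result only when every block contributes the same number of non-missing values, which is not guaranteed once missing entries are dropped.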

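Display the reduce function file.

    function meanArrivalDelayReducer(intermKey, intermValIter, outKVStore)
    count = 0;
    sum = 0;
    % Accumulate the partial counts and sums stored by the mapper.
    while hasnext(intermValIter)
        countSum = getnext(intermValIter);
        count = count + countSum(1);
        sum = sum + countSum(2);
    end
    meanDelay = sum/count;

    % The key-value pair added to outKVStore becomes the output of mapreduce.
    add(outKVStore, "MeanArrivalDelay", meanDelay);
    end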
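The partial count-and-sum logic of the two functions can be tried in memory without calling mapreduce. The following is a minimal sketch under stated assumptions: the delay values are the eight rows shown by the preview, while the inserted NaN, the split into three blocks, and the variable names (delays, blocks, partials, totals, blockwiseMean, directMean) are made up for illustration.

```matlab
% Hypothetical in-memory check of the partial count/sum logic (no mapreduce call).
% The NaN entry and the three-way split are illustrative assumptions.
delays = [8; 8; 21; NaN; 13; 4; 59; 3; 11];
blocks = {delays(1:3), delays(4:6), delays(7:9)};

% "Map" step: one [count, sum] row per block, ignoring missing values.
partials = zeros(numel(blocks), 2);
for k = 1:numel(blocks)
    block = blocks{k};
    block(isnan(block)) = [];
    partials(k, :) = [numel(block), sum(block)];
end

% "Reduce" step: accumulate the partial counts and sums, then divide.
totals = sum(partials, 1);
blockwiseMean = totals(2) / totals(1);

% Reference value computed directly on the whole vector.
directMean = mean(delays, 'omitnan');
fprintf('block-wise: %.4f, direct: %.4f\n', blockwiseMean, directMean);
```

The two values agree because the block with the missing entry contributes a count of 2 rather than 3, so the division uses only the observations that were actually summed.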

Use mapreduce to apply the map and reduce functions to the datastore, ds.

    meanDelay = mapreduce(ds, @meanArrivalDelayMapper, @meanArrivalDelayReducer);

    ********************************
    *      MAPREDUCE PROGRESS      *
    ********************************
    Map   0% Reduce   0%
    Map  16% Reduce   0%
    Map  32% Reduce   0%
    Map  48% Reduce   0%
    Map  65% Reduce   0%
    Map  81% Reduce   0%
    Map  97% Reduce   0%
    Map 100% Reduce   0%
    Map 100% Reduce 100%

mapreduce returns a datastore, meanDelay, with files in the current folder.

Read the final result from the output datastore, meanDelay.

    readall(meanDelay)

    ans = 1×2 table

                Key                 Value
        ____________________      __________
        {'MeanArrivalDelay'}      {[7.1201]}
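Because this particular data set is small, the result can be cross-checked by loading the selected column fully into memory. This is a sketch for verification only, not part of the original example; the variable names allDelays and directMean are introduced here, and the approach assumes the whole column fits in memory, which is exactly the assumption mapreduce is designed to avoid.

```matlab
% Cross-check: read the single selected column fully into memory.
ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = 'ArrDelay';
allDelays = readall(ds);                            % table with one variable, ArrDelay
directMean = mean(allDelays.ArrDelay, 'omitnan');   % skip the NaN placeholders
disp(directMean)
```

The displayed value should agree, up to rounding, with the mean reported by readall on the mapreduce output above.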

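Local Functions

Listed here are the map and reduce functions that mapreduce applies to the data.

    function meanArrivalDelayMapper(data, info, intermKVStore)
    % data is an n-by-1 table containing ArrDelay. Remove missing values first:
    data(isnan(data.ArrDelay),:) = [];

    % Record the partial count and sum; the reducer will accumulate them.
    partCountSum = [length(data.ArrDelay), sum(data.ArrDelay)];
    add(intermKVStore, "PartialCountSumDelay", partCountSum);
    end
    %-------------------------------------------------------------------
    function meanArrivalDelayReducer(intermKey, intermValIter, outKVStore)
    count = 0;
    sum = 0;
    % Accumulate the partial counts and sums stored by the mapper.
    while hasnext(intermValIter)
        countSum = getnext(intermValIter);
        count = count + countSum(1);
        sum = sum + countSum(2);
    end
    meanDelay = sum/count;

    % The key-value pair added to outKVStore becomes the output of mapreduce.
    add(outKVStore, "MeanArrivalDelay", meanDelay);
    end
    %-------------------------------------------------------------------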
