Example on Running a Standalone MATLAB MapReduce Application
Supported Platform: Linux® only.
This example shows you how to create a standalone MATLAB® MapReduce application using the mcc
command and run it against a Hadoop® cluster.
Goal: Calculate the maximum arrival delay of an airline from the given dataset.
Dataset: | airlinesmall.csv |
Description: | Airline departure and arrival information from 1987 to 2008. |
Location: | /usr/local/MATLAB/R2022a/toolbox/matlab/demos |
Prerequisites
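Start this example by creating a new work folder that is visible to the MATLAB search path.
Before starting MATLAB, at a terminal, set the environment variable HADOOP_PREFIX to point to the Hadoop installation folder. For example:
Shell | Command |
csh / tcsh | % setenv HADOOP_PREFIX /usr/lib/hadoop |
bash | $ export HADOOP_PREFIX=/usr/lib/hadoop |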
Note
This example uses /usr/lib/hadoop as the directory where Hadoop is installed. Your Hadoop installation directory may be different.
If you forget to set the HADOOP_PREFIX environment variable before starting MATLAB, set it with the MATLAB function setenv at the MATLAB command prompt as soon as you start MATLAB. For example:
setenv('HADOOP_PREFIX','/usr/lib/hadoop')
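The check-and-set step described above can be wrapped in a few lines and run at the start of a MATLAB session. This is a minimal sketch, assuming the same /usr/lib/hadoop installation folder that this example uses; the variable names defaultHadoopFolder and hadoopFolder are illustrative and not part of the original example.

```matlab
% Set HADOOP_PREFIX from inside MATLAB only if the shell did not provide it.
% ASSUMPTION: /usr/lib/hadoop is this example's install folder; yours may differ.
defaultHadoopFolder = '/usr/lib/hadoop';

if isempty(getenv('HADOOP_PREFIX'))
    setenv('HADOOP_PREFIX', defaultHadoopFolder);
end

% Warn early if the folder does not exist on this machine.
hadoopFolder = getenv('HADOOP_PREFIX');
if ~isfolder(hadoopFolder)
    warning('HADOOP_PREFIX points to "%s", which is not a folder on this machine.', ...
        hadoopFolder);
end
```

A value set this way applies only to the current MATLAB session; the standalone application still needs HADOOP_PREFIX set in the shell that launches it.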
Install the MATLAB Runtime in a folder that is accessible by every worker node in the Hadoop cluster. This example uses /usr/local/MATLAB/MATLAB_Runtime/v912 as the location of the MATLAB Runtime folder.
If you don't have the MATLAB Runtime, you can download it from the website at: //www.tatmou.com/products/compiler/mcr.
Note
For information about which MATLAB Runtime version numbers correspond to which MATLAB releases, see this list.
Copy the map function maxArrivalDelayMapper.m from the /usr/local/MATLAB/R2022a/toolbox/matlab/demos folder to the work folder. For more information, see Write a Map Function.
Copy the reduce function maxArrivalDelayReducer.m from the matlabroot/toolbox/matlab/demos folder to the work folder. For more information, see Write a Reduce Function.
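Both copy steps can also be done from the MATLAB command prompt. The sketch below assumes that the work folder is the current folder; matlabroot resolves to the installation folder, so no release-specific path is needed.

```matlab
% Copy the shipped map and reduce functions into the work folder.
demoFolder = fullfile(matlabroot, 'toolbox', 'matlab', 'demos');
workFolder = pwd;   % ASSUMPTION: the work folder is the current folder

files = {'maxArrivalDelayMapper.m', 'maxArrivalDelayReducer.m'};
for k = 1:numel(files)
    [ok, msg] = copyfile(fullfile(demoFolder, files{k}), workFolder);
    if ~ok
        error('Could not copy %s: %s', files{k}, msg);
    end
end

% Show which copy of each function MATLAB will resolve.
which maxArrivalDelayMapper
which maxArrivalDelayReducer
```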
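Create the directory /user/<username>/datasets on HDFS™ and copy the file airlinesmall.csv to that directory. Here, <username> refers to your user name in HDFS.
$ ./hadoop fs -copyFromLocal airlinesmall.csv hdfs://host:54310/user/<username>/datasets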
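If you prefer to stay inside MATLAB, the same upload can be driven with system. This is only a sketch under several assumptions: the Hadoop launcher is at $HADOOP_PREFIX/bin/hadoop, the cluster's default file system is the target (rather than the explicit hdfs://host:54310 URI shown above), and username is a placeholder for your HDFS user name. If no launcher is found, the commands are printed instead of executed.

```matlab
% Build the HDFS commands for this step and run them if Hadoop is present.
% ASSUMPTIONS: the launcher is $HADOOP_PREFIX/bin/hadoop, and 'username'
% is a placeholder for your HDFS user name.
hadoopBin = fullfile(getenv('HADOOP_PREFIX'), 'bin', 'hadoop');
hdfsDir   = '/user/username/datasets';
localCsv  = fullfile(matlabroot, 'toolbox', 'matlab', 'demos', 'airlinesmall.csv');

cmds = { ...
    sprintf('"%s" fs -mkdir -p %s', hadoopBin, hdfsDir), ...
    sprintf('"%s" fs -copyFromLocal "%s" %s', hadoopBin, localCsv, hdfsDir)};

if isfile(hadoopBin)
    for k = 1:numel(cmds)
        [status, out] = system(cmds{k});
        if status ~= 0
            error('Command failed: %s\n%s', cmds{k}, out);
        end
    end
else
    % Hadoop client not found: print the commands so they can be run by hand.
    fprintf('%s\n', cmds{:});
end
```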
Procedure
Start MATLAB and verify that the HADOOP_PREFIX environment variable has been set. At the command prompt, type:
>> getenv('HADOOP_PREFIX')
If ans is empty, review the Prerequisites section above to see how you can set the HADOOP_PREFIX environment variable.
Create a new MATLAB script named depMapRedStandAlone.m. You will add the code listed in the steps below to this script file.
Create a datastore that points to the airline data in Hadoop Distributed File System (HDFS).
ds = datastore('hdfs:///user/username/datasets/airlinesmall.csv',...
    'TreatAsMissing','NA',...
    'SelectedVariableNames',{'UniqueCarrier','ArrDelay'});
For more information, see Work with Remote Data.
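Before pointing the datastore at HDFS, it can help to confirm the options against the local copy of the data set. The sketch below uses the airlinesmall.csv file that ships in toolbox/matlab/demos; localDs and sample are illustrative names, not part of the original example.

```matlab
% Preview the same two variables from the local copy of the data set.
% airlinesmall.csv ships in toolbox/matlab/demos, which is on the MATLAB path.
localDs = datastore('airlinesmall.csv', ...
    'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', {'UniqueCarrier', 'ArrDelay'});

% preview reads only the first few rows, so this returns quickly.
sample = preview(localDs);
disp(sample)
```

Because 'NA' is treated as missing, ArrDelay is imported as a numeric variable, with NaN where the file contains NA.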
Configure the application for deployment against Hadoop with default settings.
config = matlab.mapreduce.DeployHadoopMapReducer;
The class matlab.mapreduce.DeployHadoopMapReducer can be used to configure a standalone application based on the Hadoop environment where it is going to be deployed. For example, if you want to specify the location of the MATLAB Runtime on each of the worker nodes on the cluster, include a line of code similar to this:
config = matlab.mapreduce.DeployHadoopMapReducer('MCRRoot','/opt/MATLAB/MATLAB_Runtime/v912');
This line assumes that the MATLAB Runtime is installed under /opt/MATLAB/MATLAB_Runtime on the worker nodes.
For information on specifying additional cluster-specific properties, see matlab.mapreduce.DeployHadoopMapReducer.
Note
Specifying a MATLAB Runtime location as part of the class matlab.mapreduce.DeployHadoopMapReducer overrides any MATLAB Runtime location specified during the execution of the standalone application.
Define the execution environment using mapreducer.
mr = mapreducer(config);
Apply the mapreduce function.
result = mapreduce(...
    ds,...
    @maxArrivalDelayMapper,@maxArrivalDelayReducer,...
    mr,...
    'OutputType','Binary',...
    'OutputFolder','hdfs:///user/<username>/results/myresults');
Note
An HDFS directory such as .../myresults can be written to only once. If you plan on running your standalone application multiple times against the Hadoop cluster, make sure you delete the .../myresults directory on HDFS before each execution. Another option is to change the .../myresults directory in the MATLAB code and recompile the application.
Read the result from the resulting datastore.
myAppResult = readall(result)
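A further way to live with the write-once restriction, which the original example does not describe, is to build a unique output folder name at run time, so that repeated runs need neither a manual delete nor a recompile. The sketch below is illustrative only: the timestamp suffix and the names stamp and outputFolder are assumptions, and username is a placeholder for your HDFS user name.

```matlab
% Build a unique HDFS output folder per run so that reruns do not collide.
% ASSUMPTION: 'username' is a placeholder; the timestamp suffix is illustrative.
stamp = char(datetime('now', 'Format', 'yyyyMMdd_HHmmss'));
outputFolder = ['hdfs:///user/username/results/myresults_' stamp];

% Pass outputFolder to mapreduce in place of the fixed folder name, e.g.
% result = mapreduce(ds, @maxArrivalDelayMapper, @maxArrivalDelayReducer, mr, ...
%     'OutputType', 'Binary', 'OutputFolder', outputFolder);
disp(outputFolder)
```

The trade-off is that result folders from earlier runs accumulate on HDFS until you remove them.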
Use the mcc command with the -m flag to create a standalone application.
mcc -m depMapRedStandAlone.m
The -m flag creates a standard executable that can be run from a command line. However, the mcc command cannot package the results in an installer.
Run the standalone application from a Linux shell using the following command:
$ ./run_depMapRedStandAlone.sh /usr/local/MATLAB/MATLAB_Runtime/v912
/usr/local/MATLAB/MATLAB_Runtime/v912 is an argument indicating the location of the MATLAB Runtime.
Before executing the above command, verify that the HADOOP_PREFIX environment variable is set in the terminal by typing:
$ echo $HADOOP_PREFIX
If echo prints nothing, review the Prerequisites section above to see how you can set the HADOOP_PREFIX environment variable. Your application will fail to execute if the HADOOP_PREFIX environment variable is not set.
You will see the following output:
myAppResult =
    Key                  Value
    _________________    ______
    'MaxArrivalDelay'    [1014]
Other examples of map and reduce functions are available in the toolbox/matlab/demos folder. You can use these other examples to prototype similar standalone applications that run against Hadoop. For more information, see Build Effective Algorithms with MapReduce.
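When prototyping, it is often quicker to run the map and reduce functions in the local MATLAB session before compiling anything. The sketch below assumes the shipped airlinesmall.csv and the maxArrivalDelayMapper and maxArrivalDelayReducer functions are on the MATLAB path; localDs and localResult are illustrative names. No Hadoop cluster or MATLAB Runtime is involved.

```matlab
% Run the same map and reduce functions serially in the local MATLAB session.
mapreducer(0);   % 0 selects local, serial execution

localDs = datastore('airlinesmall.csv', ...
    'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', {'UniqueCarrier', 'ArrDelay'});

localResult = mapreduce(localDs, @maxArrivalDelayMapper, @maxArrivalDelayReducer);

% The result is a key-value datastore; read it into a table.
localTable = readall(localResult);
disp(localTable)
```

If the local result matches what you expect, the same two function handles can be passed unchanged to the Hadoop-configured mapreduce call in the procedure above.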
Complete code for the standalone application depMapRedStandAlone can be found here:
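For convenience, the listing below assembles the script from the procedure steps in this example. It is a reconstruction from those steps rather than a copy of the shipped file, and username is a placeholder for your HDFS user name. It is meant to be compiled with mcc and run against a Hadoop cluster, so it will not run in a plain MATLAB session without one.

```matlab
% depMapRedStandAlone.m, assembled from the procedure steps in this example.
% 'username' is a placeholder for your HDFS user name.

% Datastore over the airline data in HDFS.
ds = datastore('hdfs:///user/username/datasets/airlinesmall.csv', ...
    'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', {'UniqueCarrier', 'ArrDelay'});

% Deployment configuration with default settings.
config = matlab.mapreduce.DeployHadoopMapReducer;

% Execution environment based on that configuration.
mr = mapreducer(config);

% Run the job; the output folder must not already exist on HDFS.
result = mapreduce( ...
    ds, ...
    @maxArrivalDelayMapper, @maxArrivalDelayReducer, ...
    mr, ...
    'OutputType', 'Binary', ...
    'OutputFolder', 'hdfs:///user/username/results/myresults');

% Read and display the result.
myAppResult = readall(result)
```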
See Also
datastore | TabularTextDatastore | KeyValueDatastore | matlab.mapreduce.DeployHadoopMapReducer | mcc