Example on Running a Standalone MATLAB MapReduce Application

Supported Platform: Linux® only.

This example shows you how to create a standalone MATLAB® MapReduce application using the mcc command and run it against a Hadoop® cluster.

Goal: Calculate the maximum arrival delay of an airline from the given dataset.

Dataset: airlinesmall.csv

Description: Airline departure and arrival information from 1987 to 2008.

Location: /usr/local/matlab/R2022a/toolbox/matlab/demos

Prerequisites

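  1. Start this example by creating a new work folder that is visible to the MATLAB search path.

  2. Before starting MATLAB, at a terminal, set the environment variable HADOOP_PREFIX to point to the Hadoop installation folder. For example: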

    Shell         Command
    csh / tcsh    % setenv HADOOP_PREFIX /usr/lib/hadoop
    bash          $ export HADOOP_PREFIX=/usr/lib/hadoop

    Note

    This example uses /usr/lib/hadoop as the directory where Hadoop is installed. Your Hadoop installation directory may be different.

    If you forget to set the HADOOP_PREFIX environment variable before starting MATLAB, set it with the MATLAB function setenv at the MATLAB command prompt as soon as you start MATLAB. For example:

    setenv('HADOOP_PREFIX','/usr/lib/hadoop')

  3. Install the MATLAB Runtime in a folder that is accessible by every worker node in the Hadoop cluster. This example uses /usr/local/MATLAB/MATLAB_Runtime/v912 as the location of the MATLAB Runtime folder.

    If you don't have the MATLAB Runtime, you can download it from the website at: //www.tatmou.com/products/compiler/mcr.

    Note

    For information about which MATLAB Runtime version number corresponds to which MATLAB release, see this list.

  4. Copy the map function maxArrivalDelayMapper.m from the /usr/local/matlab/R2022a/toolbox/matlab/demos folder to the work folder.

    maxArrivalDelayMapper.m

    For more information, see Write a Map Function.

  5. Copy the reduce function maxArrivalDelayReducer.m from the matlabroot/toolbox/matlab/demos folder to the work folder.

    maxArrivalDelayReducer.m

    For more information, see Write a Reduce Function.

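  6. Create the directory /user/<username>/datasets on HDFS™ and copy the file airlinesmall.csv to that directory. Here <username> refers to your user name in HDFS.

    $ ./hadoop fs -copyFromLocal airlinesmall.csv hdfs://host:54310/user/<username>/datasets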
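
The environment-related prerequisites above can be checked from a shell before MATLAB is started. The following is a minimal sketch and not part of the original example: the function name check_prereqs and its messages are hypothetical. It only tests that HADOOP_PREFIX is set to an existing directory and that the MATLAB Runtime folder passed as its argument exists on the local machine.

```shell
# Hypothetical pre-flight check; the function name and messages are illustrative.
#
# check_prereqs RUNTIME_DIR
# Returns 0 when HADOOP_PREFIX is set to an existing directory and
# RUNTIME_DIR exists; otherwise prints a message to stderr and returns 1.
check_prereqs() {
    runtime_dir="$1"

    # HADOOP_PREFIX must be set and non-empty.
    if [ -z "${HADOOP_PREFIX:-}" ]; then
        echo "HADOOP_PREFIX is not set" >&2
        return 1
    fi

    # It must point to a directory that exists on this machine.
    if [ ! -d "$HADOOP_PREFIX" ]; then
        echo "HADOOP_PREFIX is not a directory: $HADOOP_PREFIX" >&2
        return 1
    fi

    # The MATLAB Runtime folder must exist locally.
    if [ ! -d "$runtime_dir" ]; then
        echo "MATLAB Runtime folder not found: $runtime_dir" >&2
        return 1
    fi

    echo "prerequisites look OK"
}
```

Call it with the MATLAB Runtime folder used in this example as the argument. The check is local only: it does not confirm that the folder is visible from every worker node, which still has to be verified on the cluster itself.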

Procedure

  1. Start MATLAB and verify that the HADOOP_PREFIX environment variable has been set. At the command prompt, type:

    >> getenv('HADOOP_PREFIX')

    If ans is empty, review the Prerequisites section above to see how you can set the HADOOP_PREFIX environment variable.

  2. Create a new MATLAB script named depMapRedStandAlone.m. You will add the code from the steps below to this script file.

  3. Create a datastore that points to the airline data in Hadoop Distributed File System (HDFS).

    ds = datastore('hdfs:///user/<username>/datasets/airlinesmall.csv',...
        'TreatAsMissing','NA',...
        'SelectedVariableNames',{'UniqueCarrier','ArrDelay'});

    For more information, see Work with Remote Data.

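  4. Configure the application for deployment against Hadoop with default settings.

    config = matlab.mapreduce.DeployHadoopMapReducer;

    The class matlab.mapreduce.DeployHadoopMapReducer can be used to configure a standalone application based on the Hadoop environment where it is going to be deployed.

    For example, if you want to specify the location of the MATLAB Runtime on each of the worker nodes on the cluster, include a line of code similar to this:

    config = matlab.mapreduce.DeployHadoopMapReducer('MCRRoot','/opt/MATLAB/MATLAB_Runtime/v912');

    In this scenario, we assume that the MATLAB Runtime is installed in a non-default location such as /opt/MATLAB/MATLAB_Runtime on the worker nodes.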

    For information on specifying additional cluster-specific properties, see matlab.mapreduce.DeployHadoopMapReducer.

    Note

    Specifying a MATLAB Runtime location as part of the class matlab.mapreduce.DeployHadoopMapReducer will override any MATLAB Runtime location specified during the execution of the standalone application.

  5. Define the execution environment using the mapreducer function.

    mr = mapreducer(config);

  6. Apply the mapreduce function.

    result = mapreduce(...
        ds,...
        @maxArrivalDelayMapper,@maxArrivalDelayReducer,...
        mr,...
        'OutputType','Binary',...
        'OutputFolder','hdfs:///user/<username>/results/myresults');

    Note

    An HDFS directory such as .../myresults can be written to only once. If you plan on running your standalone application multiple times against the Hadoop cluster, make sure you delete the .../myresults directory on HDFS before each execution. Another option is to change the name of the .../myresults directory in the MATLAB code and recompile the application.

  7. Read the result from the resulting datastore.

    myAppResult = readall(result)
  8. Use the mcc command with the -m flag to create a standalone application.

    mcc -m depMapRedStandAlone.m

    The -m flag creates a standard executable that can be run from a command line. However, the mcc command cannot package the results in an installer.

  9. Run the standalone application from a Linux shell using the following command:

    $ ./run_depMapRedStandAlone.sh /usr/local/MATLAB/MATLAB_Runtime/v912

    /usr/local/MATLAB/MATLAB_Runtime/v912 is an argument indicating the location of the MATLAB Runtime.

    Before executing the above command, verify that the HADOOP_PREFIX environment variable is set in the terminal by typing:

    $ echo $HADOOP_PREFIX

    If echo prints nothing, see the Prerequisites section above to see how you can set the HADOOP_PREFIX environment variable.

    Your application will fail to execute if the HADOOP_PREFIX environment variable is not set.

  10. You will see the following output:

    myAppResult =

               Key            Value
        _________________     ______

        'MaxArrivalDelay'     [1014]
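
Steps 6 and 9 above imply the same sequence for every rerun: confirm HADOOP_PREFIX, delete the previous output directory on HDFS, then launch the application. A minimal sketch of a wrapper for that sequence follows. The function name run_app and its two arguments are hypothetical, and the sketch assumes that the hadoop launcher lives at $HADOOP_PREFIX/bin/hadoop and that run_depMapRedStandAlone.sh is in the current folder.

```shell
# Hypothetical rerun wrapper; names and argument order are illustrative.
#
# run_app RUNTIME_DIR OUTPUT_DIR
#   RUNTIME_DIR  location of the MATLAB Runtime
#   OUTPUT_DIR   HDFS output directory that the application writes to
run_app() {
    runtime_dir="$1"
    output_dir="$2"

    # The standalone application fails if HADOOP_PREFIX is not set.
    if [ -z "${HADOOP_PREFIX:-}" ]; then
        echo "HADOOP_PREFIX is not set" >&2
        return 1
    fi

    # An HDFS output directory can be written to only once, so remove
    # any earlier results. -f keeps the command quiet if nothing exists.
    # Assumption: the hadoop launcher is under $HADOOP_PREFIX/bin.
    "$HADOOP_PREFIX/bin/hadoop" fs -rm -r -f "$output_dir" || return 1

    # Launch the application with the MATLAB Runtime location.
    ./run_depMapRedStandAlone.sh "$runtime_dir"
}
```

The output directory passed in must match the 'OutputFolder' value compiled into the application, because the wrapper only clears the directory and cannot change where the application writes.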

Other examples of map and reduce functions are available in the toolbox/matlab/demos folder. You can use these examples to prototype similar standalone applications that run against Hadoop. For more information, see Build Effective Algorithms with MapReduce.

Complete code for the standalone application depMapRedStandAlone can be found here:

depMapRedStandAlone.m
