Extend Tall Arrays with Other Products

Products Used: Statistics and Machine Learning Toolbox™, Database Toolbox™, Parallel Computing Toolbox™, MATLAB® Parallel Server™, MATLAB Compiler™

Several toolboxes enhance the capabilities of tall arrays. These enhancements include writing machine learning algorithms, integrating with big data systems, and deploying standalone apps.

Statistics and Machine Learning

Statistics and Machine Learning Toolbox enables you to perform advanced statistical calculations on tall arrays. Capabilities include:

  • K-means clustering

  • Linear regression fitting

  • Grouped statistics

  • Classification

See Analysis of Big Data with Tall Arrays (Statistics and Machine Learning Toolbox) for more information.
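As an illustration of one capability from the list above, the following is a minimal sketch of linear regression fitting on a tall table. It assumes that Statistics and Machine Learning Toolbox is installed and uses the airlinesmall.csv sample file that ships with MATLAB. The choice of variables (ArrDelay as the response, DepDelay as the predictor) is only for demonstration.

```matlab
% Create a datastore over the sample file and treat 'NA' as missing data.
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = {'ArrDelay', 'DepDelay'};

% Convert the datastore to a tall table. Evaluation is deferred until needed.
tt = tall(ds);

% Drop rows that contain missing values.
tt = rmmissing(tt);

% Fit a linear model. With tall inputs, fitlm returns a compact model object.
mdl = fitlm(tt, 'ArrDelay ~ DepDelay');

% Show the estimated intercept and slope.
disp(mdl.Coefficients)
```

Because the input is tall, the fit runs in passes over the data rather than loading the full table into memory.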

Control Where Your Code Runs

When you execute calculations on tall arrays, the default execution environment is either the local MATLAB session or, if you have Parallel Computing Toolbox, a local parallel pool. Use the mapreducer function to change the execution environment of tall arrays when using Parallel Computing Toolbox, MATLAB Parallel Server, or MATLAB Compiler:

  • Parallel Computing Toolbox: Run calculations in parallel using local or cluster workers to speed up large tall array calculations. See Use Tall Arrays on a Parallel Pool (Parallel Computing Toolbox) or Process Big Data in the Cloud (Parallel Computing Toolbox) for more information.

  • MATLAB Parallel Server: Run tall array calculations on a cluster, including Apache Spark™ enabled Hadoop® clusters. This can significantly reduce the execution time of very large calculations. See Use Tall Arrays on a Spark Enabled Hadoop Cluster (Parallel Computing Toolbox) for more information.

  • MATLAB Compiler: Deploy MATLAB applications containing tall arrays as standalone apps on Apache Spark. See Spark Applications (MATLAB Compiler) for more information.

One of the benefits of developing your algorithms with tall arrays is that you only need to write the code once. You can develop your code locally, then use mapreducer to scale up and take advantage of the capabilities offered by Parallel Computing Toolbox, MATLAB Parallel Server, or MATLAB Compiler, without needing to rewrite your algorithm.

Note

Each tall array is bound to a single execution environment when it is constructed using tall(ds). If that execution environment is later modified or deleted, then the tall array becomes invalid.

For this reason, each time you change the execution environment you must reconstruct the tall array.
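The following sketch shows one way to apply this: the algorithm is written once as a function handle, and the tall array is rebuilt from the datastore after each change of execution environment. It uses the airlinesmall.csv sample file that ships with MATLAB. The name meanDelay is a hypothetical helper introduced for illustration, and the second half assumes that Parallel Computing Toolbox is installed so that gcp can return a parallel pool.

```matlab
% Datastore over the sample file; 'NA' entries are treated as missing.
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = {'ArrDelay'};

% Hypothetical helper: the algorithm is written once and reused unchanged.
meanDelay = @(t) gather(mean(t.ArrDelay, 'omitnan'));

% Run in the local MATLAB session.
mapreducer(0);
tt = tall(ds);            % tt is bound to the local session
m1 = meanDelay(tt);

% Switch to a parallel pool (assumes Parallel Computing Toolbox).
mapreducer(gcp);
tt = tall(ds);            % reconstruct: the earlier tt is no longer valid
m2 = meanDelay(tt);

fprintf('Local session: %.4f, parallel pool: %.4f\n', m1, m2);
```

Only the mapreducer call and the tall(ds) reconstruction change between the two runs. The computation itself stays the same.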

Work with Databases

Database Toolbox enables you to create a tall table from a DatabaseDatastore that is backed by data in a database. For more information, see Analyze Large Data in Database Using Tall Arrays (Database Toolbox).

Note

DatabaseDatastore has these limitations:

  • DatabaseDatastore must use the local MATLAB session as the execution environment. Set this environment using the command mapreducer(0).

  • Standalone apps containing tall arrays that use DatabaseDatastore cannot be deployed against Apache Spark using MATLAB Compiler.
