Big Data Processing
mapreduce, on Spark® and Hadoop® clusters
You can use Parallel Computing Toolbox™ to distribute large arrays in parallel across multiple MATLAB® workers, so that you can run big-data applications that use the combined memory of your cluster. You operate on the entire array as a single entity; the workers, however, operate only on their part of the array and automatically transfer data among themselves when necessary. Parallel Computing Toolbox also enables you to execute MATLAB tall array and datastore calculations in parallel, so that you can analyze big data sets that do not fit in the memory of your cluster. You can use MATLAB Parallel Server™ to run tall array and datastore calculations in parallel on Spark-enabled Hadoop clusters. Doing so can significantly reduce the execution time of very large data calculations.
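The two approaches described above can be illustrated with a minimal sketch. It assumes Parallel Computing Toolbox is installed, that a default cluster profile can open a parallel pool, and that the sample file airlinesmall.csv shipped with MATLAB is on the path; the array size and the variable names (D, colMeans, avgDelay) are chosen only for illustration.

```matlab
% Open a parallel pool if none is running (assumes a usable default profile).
if isempty(gcp('nocreate'))
    parpool;
end

% Distributed array: the data is partitioned across the workers' memory.
D = rand(4000, 'distributed');      % 4000-by-4000 matrix created on the workers
colMeans = mean(D, 1);              % computed on the workers; result stays distributed
colMeansLocal = gather(colMeans);   % bring the 1-by-4000 result back to the client

% Tall array: backed by a datastore and evaluated lazily, chunk by chunk.
% airlinesmall.csv is assumed to be the sample data set shipped with MATLAB.
ds = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA');
ds.SelectedVariableNames = {'ArrDelay'};
tt = tall(ds);
avgDelay = mean(tt.ArrDelay, 'omitnan');  % deferred: nothing is read yet
avgDelay = gather(avgDelay);              % triggers the evaluation on the pool
```

In both cases, gather is the point at which results return to the client. With a distributed array the full data set must fit in the combined memory of the workers, whereas a tall array reads the underlying data in chunks and so can exceed that memory.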
Categories
- Distributed Arrays
  Analyze big data sets in parallel using distributed arrays and simultaneous execution
- Tall Arrays and mapreduce
  Analyze big data sets in parallel using MATLAB tall arrays and datastores or mapreduce on Spark and Hadoop clusters, and parallel pools