
Use Experiment Manager to Train Networks in Parallel

By default, Experiment Manager runs one trial of your experiment at a time on a single CPU. If you have Parallel Computing Toolbox™, you can configure your experiment to run multiple trials at the same time or to run a single trial at a time on multiple GPUs, on a cluster, or in the cloud.

Training Scenario: Run multiple trials at the same time using one parallel worker for each trial.

Recommendation:

Set up your parallel environment, set Mode to Simultaneous, and click Run. Experiment Manager runs as many simultaneous trials as there are workers in your parallel pool. All other trials in your experiment are queued for later evaluation.

Alternatively, to offload the experiment as a batch job, set Mode to Batch Simultaneous, specify your Cluster and Pool Size, and click Run. For more information, see Offload Experiments as Batch Jobs to Cluster.

Experiment Manager does not support Simultaneous or Batch Simultaneous execution when you set the training option ExecutionEnvironment to "multi-gpu" or "parallel" or when you enable the training option DispatchInBackground. Use these options to speed up your training only if you intend to run one trial of your experiment at a time.

Training Scenario: Run a single trial at a time on multiple parallel workers.

Recommendation:

Built-In Training Experiments:

In the experiment setup function, set the training option ExecutionEnvironment to "multi-gpu" or "parallel". For more information, see Scale Up Deep Learning in Parallel, on GPUs, and in the Cloud.

If you are using a partitionable datastore, enable background dispatching by setting the training option DispatchInBackground to true. For more information, see Use Datastore for Parallel Training and Background Dispatching.
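For illustration, a setup function might combine these two training options when it returns the options for a trial. This is a minimal sketch; the solver, epoch count, and other values are placeholders, not values the documentation prescribes.

```matlab
% Hedged sketch: options for one trial that trains across all workers in
% the parallel pool, with minibatches prefetched in the background from
% a partitionable datastore.
options = trainingOptions("sgdm", ...
    ExecutionEnvironment="parallel", ...  % one trial, many workers
    DispatchInBackground=true, ...        % requires a partitionable datastore
    MaxEpochs=10, ...                     % placeholder value
    Verbose=false);
```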

Set up your parallel environment, set Mode to Sequential, and click Run.

Alternatively, to offload the experiment as a batch job, set Mode to Batch Sequential, specify your Cluster and Pool Size, and click Run. Experiment Manager does not support this execution mode when you set the training option ExecutionEnvironment to "multi-gpu". For more information, see Offload Experiments as Batch Jobs to Cluster.

Custom Training Experiments:

In the experiment training function, set up your parallel environment and use an spmd block to define a custom parallel training loop. For more information, see Custom Training with Multiple GPUs in Experiment Manager.
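As an illustrative sketch of that structure, the training function can open a pool and run per-worker code inside spmd. The data-partitioning and per-worker training helpers below are hypothetical stand-ins for your own code, and spmdIndex/spmdSize require R2022b or later (use labindex/numlabs in earlier releases).

```matlab
% Hedged sketch of an spmd-based custom training loop.
% partitionForWorker and trainOnPartition are hypothetical helpers.
if isempty(gcp("nocreate"))
    parpool;  % open a pool with the default cluster profile
end
spmd
    if canUseGPU
        gpuDevice(spmdIndex);  % dedicate one GPU to each worker
    end
    workerData = partitionForWorker(spmdIndex, spmdSize);
    % Train on this worker's partition; aggregate gradients across
    % workers inside the loop, for example with spmdPlus.
    net = trainOnPartition(workerData);
end
```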

Set Mode to Sequential and click Run.

Alternatively, to offload the experiment as a batch job, set Mode to Batch Sequential, specify your Cluster and Pool Size, and click Run. For more information, see Offload Experiments as Batch Jobs to Cluster.

Tip

To run an experiment in parallel using MATLAB® Online™, you must have access to a Cloud Center cluster. For more information, see Use Parallel Computing Toolbox with Cloud Center Cluster in MATLAB Online (Parallel Computing Toolbox).

Set Up Parallel Environment

Train on Multiple GPUs

If you have multiple GPUs, parallel execution typically increases the speed of your experiment. Using a GPU for deep learning requires Parallel Computing Toolbox and a supported GPU device. For more information, see GPU Support by Release (Parallel Computing Toolbox).

  • For built-in training experiments, GPU support is automatic. By default, these experiments use a GPU if one is available.

  • For custom training experiments, computations occur on a CPU by default. To train on a GPU, convert your data to gpuArray objects. To determine whether a usable GPU is available, call the canUseGPU function.
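For example, a custom training function might guard the conversion like this. This is a minimal sketch; the matrix is placeholder data standing in for a real minibatch.

```matlab
% Move data to the GPU only when a supported GPU is available;
% otherwise training proceeds on the CPU.
X = rand(128, 10, "single");  % placeholder minibatch
if canUseGPU
    X = gpuArray(X);          % subsequent operations on X run on the GPU
end
```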

For best results, before you run your experiment, create a parallel pool with as many workers as GPUs. You can check the number of available GPUs by using the gpuDeviceCount (Parallel Computing Toolbox) function.

numGPUs = gpuDeviceCount("available");
parpool(numGPUs)

Note

If you create a parallel pool with multiple workers on a machine that has a single GPU, all workers share that GPU, so you do not get the training speed-up and you increase the chances of the GPU running out of memory.

Train on Cluster or in Cloud

If your experiments take a long time to run on your local machine, you can accelerate training by using a computer cluster on your onsite network or by renting high-performance GPUs in the cloud. After you complete the initial setup, you can run your experiments with minimal changes to your code. Working on a cluster or in the cloud requires MATLAB Parallel Server™. For more information, see Deep Learning in the Cloud.
