Train Deep Learning Networks in Parallel
This example shows how to run multiple deep learning experiments on your local machine. Using this example as a template, you can modify the network layers and training options to suit your specific application needs. You can use this approach with a single or multiple GPUs. If you have a single GPU, the networks train one after the other in the background. The approach in this example enables you to continue using MATLAB® while deep learning experiments are in progress.
As an alternative, you can use Experiment Manager to interactively train multiple deep networks in parallel. For more information, see Use Experiment Manager to Train Networks in Parallel.
Prepare Data Set
Before you run the example, you must have access to a local copy of a deep learning data set. This example uses a data set with synthetic images of digits from 0 to 9. In the following code, change the location to point to your data set.
datasetLocation = fullfile(matlabroot,'toolbox','nnet', ...
    'nndemos','nndatasets','DigitDataset');
If you want to run the experiments with more resources, you can run this example in a cluster in the cloud.
1. Upload the data set to an Amazon S3 bucket. For an example, see Upload Deep Learning Data to the Cloud.
2. Create a cloud cluster. In MATLAB, you can create clusters in the cloud directly from the MATLAB Desktop. For more information, see Create Cloud Cluster (Parallel Computing Toolbox).
3. Select your cloud cluster as the default. On the Home tab, in the Environment section, select Parallel > Select a Default Cluster.
Load Data Set
Load the data set by using an imageDatastore object. Split the data set into training, validation, and test sets.
imds = imageDatastore(datasetLocation, ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');
[imdsTrain,imdsValidation,imdsTest] = splitEachLabel(imds,0.8,0.1);
To train the network with augmented image data, create an augmentedImageDatastore. Use random translations and horizontal reflections. Data augmentation helps prevent the network from overfitting and memorizing the exact details of the training images.
imageSize = [28 28 1];
pixelRange = [-4 4];
imageAugmenter = imageDataAugmenter( ...
    'RandXReflection',true, ...
    'RandXTranslation',pixelRange, ...
    'RandYTranslation',pixelRange);
augmentedImdsTrain = augmentedImageDatastore(imageSize,imdsTrain, ...
    'DataAugmentation',imageAugmenter);
Train Networks in Parallel
Start a parallel pool with as many workers as GPUs. You can check the number of available GPUs by using the gpuDeviceCount (Parallel Computing Toolbox) function. MATLAB assigns a different GPU to each worker. By default, parpool uses your default cluster profile. If you have not changed the default, it is local. This example was run using a machine with 2 GPUs.
numGPUs = gpuDeviceCount("available");
parpool(numGPUs);
Starting parallel pool (parpool) using the 'local' profile ... Connected to the parallel pool (number of workers: 2).
To send training progress information from the workers during training, use a parallel.pool.DataQueue (Parallel Computing Toolbox) object. To learn more about how to use data queues to obtain feedback during training, see the example Use parfeval to Train Multiple Deep Learning Networks.
dataqueue = parallel.pool.DataQueue;
Define the network layers and training options. For code readability, you can define them in a separate function that returns several network architectures and training options. In this case, networkLayersAndOptions returns a cell array of network layers and an array of training options of the same length. Open this example in MATLAB and then click networkLayersAndOptions to open the supporting function. Paste in your own network layers and options. The file contains sample training options that show how to send information to the data queue using an output function.
[layersCell,options] = networkLayersAndOptions(augmentedImdsTrain,imdsValidation,dataqueue);
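The exact contents of networkLayersAndOptions depend on the experiments you want to run. As an illustration only, a minimal sketch of such a function might look like the following; the two architectures, the layer sizes, and the training options here are assumptions for demonstration, not the ones shipped with the example.

```matlab
function [layersCell,options] = networkLayersAndOptions(augmentedImdsTrain,imdsValidation,dataqueue)
% Sketch: two candidate architectures that differ only in filter count.
% Replace with your own networks and options.
numFilters = [8 16];
layersCell = cell(1,numel(numFilters));
for i = 1:numel(numFilters)
    layersCell{i} = [
        imageInputLayer([28 28 1])
        convolution2dLayer(3,numFilters(i),'Padding','same')
        batchNormalizationLayer
        reluLayer
        fullyConnectedLayer(10)
        softmaxLayer
        classificationLayer];
    % The output function forwards training progress to the data queue
    % so the client session can update the plots.
    options(i) = trainingOptions('sgdm', ...
        'ValidationData',imdsValidation, ...
        'MaxEpochs',10, ...
        'Verbose',false, ...
        'OutputFcn',@(info) sendToQueue(dataqueue,i,info));
end
end

function stop = sendToQueue(dataqueue,i,info)
% Send the progress struct for experiment i; never request a stop.
send(dataqueue,struct('experiment',i,'info',info));
stop = false;
end
```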
Prepare the training progress plots, and set a callback function to update these plots after each worker sends data to the queue. preparePlots and updatePlots are supporting functions for this example.
handles = preparePlots(numel(layersCell));
afterEach(dataqueue,@(data) updatePlots(handles,data));
To hold the computation results in parallel workers, use future objects. Preallocate an array of future objects for the result of each training.
trainingFuture(1:numel(layersCell)) = parallel.FevalFuture;
Loop through the network layers and options by using a for loop, and use parfeval (Parallel Computing Toolbox) to train the networks on a parallel worker. To request two output arguments from trainNetwork, specify 2 as the second input argument to parfeval.
for i = 1:numel(layersCell)
    trainingFuture(i) = parfeval(@trainNetwork,2,augmentedImdsTrain,layersCell{i},options(i));
end
parfeval does not block MATLAB, so you can continue working while the computations take place.
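Because the call returns immediately, you can, for example, check on the training jobs without waiting for them to finish. This small check is not part of the original example, but the State property of a future ('queued', 'running', or 'finished') is a convenient way to see how far along each experiment is:

```matlab
% Query the state of each future without blocking MATLAB.
for i = 1:numel(trainingFuture)
    fprintf('Network %d: %s\n',i,trainingFuture(i).State);
end
```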
To fetch results from future objects, use the fetchOutputs function. For this example, fetch the trained networks and their training information. fetchOutputs blocks MATLAB until the results are available. This step can take a few minutes.
[network,trainingInfo] = fetchOutputs(trainingFuture);
Save the results to disk using the save function. To load the results again later, use the load function. Use sprintf and datetime to name the file using the current date and time.
filename = sprintf('experiment-%s',datetime('now','Format','yyyyMMdd''T''HHmmss'));
save(filename,'network','trainingInfo');
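In a later session, you can recover the saved variables with load. This short sketch assumes a file name matching the pattern saved above:

```matlab
% Load the saved networks and training information into a struct.
results = load(filename);
network = results.network;
trainingInfo = results.trainingInfo;
```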
Plot Results
After the networks complete training, plot their training progress by using the information in trainingInfo.
Use subplots to distribute the different plots for each network. For this example, use the first row of subplots to plot the training accuracy against the epoch number, along with the validation accuracy.
figure('Units','normalized','Position',[0.1 0.1 0.6 0.6]);
title('Training Progress Plots');

for i = 1:numel(layersCell)
    subplot(2,numel(layersCell),i);
    hold on; grid on;
    ylim([0 100]);
    iterationsPerEpoch = floor(augmentedImdsTrain.NumObservations/options(i).MiniBatchSize);
    epoch = (1:numel(trainingInfo(i).TrainingAccuracy))/iterationsPerEpoch;
    plot(epoch,trainingInfo(i).TrainingAccuracy);
    plot(epoch,trainingInfo(i).ValidationAccuracy,'.k','MarkerSize',10);
end
subplot(2,numel(layersCell),1), ylabel('Accuracy');
Then, use the second row of subplots to plot the training loss against the epoch number, along with the validation loss.
for i = 1:numel(layersCell)
    subplot(2,numel(layersCell),numel(layersCell) + i);
    hold on; grid on;
    ylim([0 max([trainingInfo.TrainingLoss])]);
    iterationsPerEpoch = floor(augmentedImdsTrain.NumObservations/options(i).MiniBatchSize);
    epoch = (1:numel(trainingInfo(i).TrainingAccuracy))/iterationsPerEpoch;
    plot(epoch,trainingInfo(i).TrainingLoss);
    plot(epoch,trainingInfo(i).ValidationLoss,'.k','MarkerSize',10);
    xlabel('Epoch');
end
subplot(2,numel(layersCell),numel(layersCell)+1), ylabel('Loss');
After you choose a network, you can use classify and obtain its accuracy on the test data imdsTest.
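For instance, a minimal sketch of evaluating one of the trained networks on the test set might look like this; the index 1 is an assumption for illustration, so substitute the index of the network you chose:

```matlab
% Classify the held-out test images with the chosen network and
% compare the predictions against the true labels.
chosenNetwork = network(1);
YPred = classify(chosenNetwork,imdsTest);
accuracy = mean(YPred == imdsTest.Labels)
```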
See Also
Experiment Manager | augmentedImageDatastore | imageDatastore | parfeval (Parallel Computing Toolbox) | fetchOutputs | trainNetwork | trainingOptions
Related Examples
- Train Network Using Automatic Multi-GPU Support
- Use parfeval to Train Multiple Deep Learning Networks
- Use Experiment Manager to Train Networks in Parallel