
Transfer Learning with Pretrained Audio Networks

This example shows how to use transfer learning to retrain YAMNet, a pretrained convolutional neural network, to classify a new set of audio signals. To get started with audio deep learning from scratch, see Classify Sound Using Deep Learning.

Transfer learning is commonly used in deep learning applications. You can take a pretrained network and use it as a starting point to learn a new task. Fine-tuning a network with transfer learning is usually much faster and easier than training a network from scratch with randomly initialized weights. You can quickly transfer learned features to a new task using a smaller number of training signals.

Audio Toolbox™ additionally provides the classifySound function, which implements the necessary preprocessing for YAMNet and convenient postprocessing to interpret the results. Audio Toolbox also provides the pretrained VGGish network (vggish) as well as the vggishFeatures function, which implements preprocessing and postprocessing for the VGGish network.
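As a quick illustration of the convenience functions mentioned above, the following sketch shows one way they can be called. The file name 'mySound.wav' is a hypothetical placeholder for your own audio file; this is a minimal sketch, not part of the example's workflow.

```matlab
% Sketch only: 'mySound.wav' is a hypothetical file name.
% Assumes the Audio Toolbox models for YAMNet and VGGish are installed.
[audioIn,fs] = audioread('mySound.wav');

% classifySound handles YAMNet preprocessing and postprocessing,
% returning the sound classes detected in the signal.
sounds = classifySound(audioIn,fs)

% The VGGish counterpart returns feature embeddings rather than labels.
embeddings = vggishFeatures(audioIn,fs);
```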

Create Data

Generate 100 white noise signals, 100 brown noise signals, and 100 pink noise signals. Each signal has a duration of 0.98 seconds, assuming a 16 kHz sample rate.

fs = 16e3;
duration = 0.98;
N = duration*fs;
numSignals = 100;

wNoise = 2*rand([N,numSignals]) - 1;
wLabels = repelem(categorical("white"),numSignals,1);

bNoise = filter(1,[1,-0.999],wNoise);
bNoise = bNoise./max(abs(bNoise),[],'all');
bLabels = repelem(categorical("brown"),numSignals,1);

pNoise = pinknoise([N,numSignals]);
pLabels = repelem(categorical("pink"),numSignals,1);

Split the data into training and test sets. Normally, the training set consists of most of the data. However, to illustrate the power of transfer learning, you will use only a few samples for training and the majority for validation.

K = 5;

trainAudio = [wNoise(:,1:K),bNoise(:,1:K),pNoise(:,1:K)];
trainLabels = [wLabels(1:K);bLabels(1:K);pLabels(1:K)];

validationAudio = [wNoise(:,K+1:end),bNoise(:,K+1:end),pNoise(:,K+1:end)];
validationLabels = [wLabels(K+1:end);bLabels(K+1:end);pLabels(K+1:end)];

fprintf("Number of samples per noise color in train set = %d\n" + ...
    "Number of samples per noise color in validation set = %d\n",K,numSignals-K);
Number of samples per noise color in train set = 5
Number of samples per noise color in validation set = 95

Extract Features

Use melSpectrogram to extract log-mel spectrograms from both the training set and the validation set, using the same parameters the YAMNet model was trained with.

FFTLength = 512;
numBands = 64;
frequencyRange = [125 7500];
windowLength = 0.025*fs;
overlapLength = 0.015*fs;

trainFeatures = melSpectrogram(trainAudio,fs, ...
    'Window',hann(windowLength,'periodic'), ...
    'OverlapLength',overlapLength, ...
    'FFTLength',FFTLength, ...
    'FrequencyRange',frequencyRange, ...
    'NumBands',numBands, ...
    'FilterBankNormalization','none', ...
    'WindowNormalization',false, ...
    'SpectrumType','magnitude', ...
    'FilterBankDesignDomain','warped');

trainFeatures = log(trainFeatures + single(0.001));
trainFeatures = permute(trainFeatures,[2,1,4,3]);

validationFeatures = melSpectrogram(validationAudio,fs, ...
    'Window',hann(windowLength,'periodic'), ...
    'OverlapLength',overlapLength, ...
    'FFTLength',FFTLength, ...
    'FrequencyRange',frequencyRange, ...
    'NumBands',numBands, ...
    'FilterBankNormalization','none', ...
    'WindowNormalization',false, ...
    'SpectrumType','magnitude', ...
    'FilterBankDesignDomain','warped');

validationFeatures = log(validationFeatures + single(0.001));
validationFeatures = permute(validationFeatures,[2,1,4,3]);

Transfer Learning

To load the pretrained network, call yamnet. If the Audio Toolbox model for YAMNet is not installed, then the function provides a link to the location of the network weights. To download the model, click the link. Unzip the file to a location on the MATLAB path. The YAMNet model can classify audio into one of 521 sound categories, including white noise and pink noise (but not brown noise).

net = yamnet;
net.Layers(end).Classes
ans = 521×1 categorical
     Speech 
     Child speech, kid speaking 
     Narration, monologue 
     Speech synthesizer 
     Shout 
     Bellow 
     Yell 
     Children shouting 
     Screaming 
     Laughter 
     Baby laughter 
     Giggle 
     Snicker 
     Belly laugh 
     Chuckle, chortle 
     Crying, sobbing 
     Baby cry, infant cry 
     Whimper 
     Wail, moan 
     Singing 
     ⋮

Prepare the model for transfer learning by first converting the network to a layerGraph (Deep Learning Toolbox). Use replaceLayer (Deep Learning Toolbox) to replace the fully connected layer with an untrained fully connected layer. Replace the classification layer with a classification layer that classifies the input as "white", "pink", or "brown". See List of Deep Learning Layers (Deep Learning Toolbox) for deep learning layers supported in MATLAB®.

uniqueLabels = unique(trainLabels);
numLabels = numel(uniqueLabels);

lgraph = layerGraph(net.Layers);

lgraph = replaceLayer(lgraph,"dense",fullyConnectedLayer(numLabels,"Name","dense"));
lgraph = replaceLayer(lgraph,"Sound",classificationLayer("Name","Sounds","Classes",uniqueLabels));

To define training options, use trainingOptions (Deep Learning Toolbox).

options = trainingOptions('adam','ValidationData',{single(validationFeatures),validationLabels});

To train the network, use trainNetwork (Deep Learning Toolbox). The network achieves 100% validation accuracy using only 5 signals per noise type.

trainNetwork(single(trainFeatures),trainLabels,lgraph,options);
Training on single CPU.
|======================================================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Validation  |  Mini-batch  |  Validation  |  Base Learning  |
|         |             |   (hh:mm:ss)   |   Accuracy   |   Accuracy   |     Loss     |     Loss     |      Rate       |
|======================================================================================================================|
|       1 |           1 |       00:00:02 |       20.00% |       88.77% |       1.1922 |       0.6619 |          0.0010 |
|      30 |          30 |       00:00:14 |      100.00% |      100.00% |   9.1076e-06 |   5.0431e-05 |          0.0010 |
|======================================================================================================================|
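The call to trainNetwork above discards its output. If you want to use the retrained network on new signals, you can capture the returned network and check its predictions with classify (Deep Learning Toolbox). The following is a minimal sketch, assuming the variables defined earlier in this example; the variable names trainedNet, predictedLabels, and accuracy are illustrative.

```matlab
% Sketch only: capture the trained network instead of discarding it,
% then verify its predictions on the held-out validation features.
trainedNet = trainNetwork(single(trainFeatures),trainLabels,lgraph,options);

predictedLabels = classify(trainedNet,single(validationFeatures));
accuracy = mean(predictedLabels == validationLabels)
```

Because the example reports 100% validation accuracy, accuracy should come out as 1 when run under the same conditions.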