
Transfer Learning with Pretrained Audio Networks

This example shows how to use transfer learning to retrain YAMNet, a pretrained convolutional neural network, to classify a new set of audio signals. To get started with audio deep learning from scratch, see Classify Sound Using Deep Learning.

Transfer learning is commonly used in deep learning applications. You can take a pretrained network and use it as a starting point to learn a new task. Fine-tuning a network with transfer learning is usually much faster and easier than training a network from scratch with randomly initialized weights. You can quickly transfer learned features to a new task using a smaller number of training signals.

Audio Toolbox™ additionally provides the classifySound function, which implements the necessary preprocessing for YAMNet and convenient postprocessing to interpret the results. Audio Toolbox also provides the pretrained VGGish network (vggish) as well as the vggishFeatures function, which implements preprocessing and postprocessing for the VGGish network.
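As a quick illustration of the convenience functions mentioned above, the following sketch shows one way they can be called. The file name 'mySound.wav' is a hypothetical placeholder for your own audio file; this is a minimal sketch, not part of the example's workflow.

```matlab
% Sketch only: 'mySound.wav' is a hypothetical file name.
% Assumes the Audio Toolbox models for YAMNet and VGGish are installed.
[audioIn,fs] = audioread('mySound.wav');

% classifySound handles YAMNet preprocessing and postprocessing,
% returning the sound classes detected in the signal.
sounds = classifySound(audioIn,fs)

% The VGGish counterpart returns feature embeddings rather than labels.
embeddings = vggishFeatures(audioIn,fs);
```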

Create Data

Generate 100 white noise signals, 100 brown noise signals, and 100 pink noise signals. Each signal has a duration of 0.98 seconds, assuming a 16 kHz sample rate.

fs = 16e3;
duration = 0.98;
N = duration*fs;
numSignals = 100;

wNoise = 2*rand([N,numSignals]) - 1;
wLabels = repelem(categorical("white"),numSignals,1);

bNoise = filter(1,[1,-0.999],wNoise);
bNoise = bNoise./max(abs(bNoise),[],'all');
bLabels = repelem(categorical("brown"),numSignals,1);

pNoise = pinknoise([N,numSignals]);
pLabels = repelem(categorical("pink"),numSignals,1);

Split the data into training and test sets. Normally, the training set consists of most of the data. However, to illustrate the power of transfer learning, you will use only a few samples for training and the majority for validation.

K = 5;

trainAudio = [wNoise(:,1:K),bNoise(:,1:K),pNoise(:,1:K)];
trainLabels = [wLabels(1:K);bLabels(1:K);pLabels(1:K)];

validationAudio = [wNoise(:,K+1:end),bNoise(:,K+1:end),pNoise(:,K+1:end)];
validationLabels = [wLabels(K+1:end);bLabels(K+1:end);pLabels(K+1:end)];

fprintf("Number of samples per noise color in train set = %d\n" + ...
    "Number of samples per noise color in validation set = %d\n",K,numSignals-K);
Number of samples per noise color in train set = 5
Number of samples per noise color in validation set = 95

Extract Features

Use melSpectrogram to extract log-mel spectrograms from both the training set and the validation set, using the same parameters the YAMNet model was trained with.

FFTLength = 512;
numBands = 64;
frequencyRange = [125 7500];
windowLength = 0.025*fs;
overlapLength = 0.015*fs;

trainFeatures = melSpectrogram(trainAudio,fs, ...
    'Window',hann(windowLength,'periodic'), ...
    'OverlapLength',overlapLength, ...
    'FFTLength',FFTLength, ...
    'FrequencyRange',frequencyRange, ...
    'NumBands',numBands, ...
    'FilterBankNormalization','none', ...
    'WindowNormalization',false, ...
    'SpectrumType','magnitude', ...
    'FilterBankDesignDomain','warped');

trainFeatures = log(trainFeatures + single(0.001));
trainFeatures = permute(trainFeatures,[2,1,4,3]);

validationFeatures = melSpectrogram(validationAudio,fs, ...
    'Window',hann(windowLength,'periodic'), ...
    'OverlapLength',overlapLength, ...
    'FFTLength',FFTLength, ...
    'FrequencyRange',frequencyRange, ...
    'NumBands',numBands, ...
    'FilterBankNormalization','none', ...
    'WindowNormalization',false, ...
    'SpectrumType','magnitude', ...
    'FilterBankDesignDomain','warped');

validationFeatures = log(validationFeatures + single(0.001));
validationFeatures = permute(validationFeatures,[2,1,4,3]);

Transfer Learning

To load the pretrained network, call yamnet. If the Audio Toolbox model for YAMNet is not installed, then the function provides a link to the location of the network weights. To download the model, click the link. Unzip the file to a location on the MATLAB path. The YAMNet model can classify audio into one of 521 sound categories, including white noise and pink noise (but not brown noise).

net = yamnet;
net.Layers(end).Classes
ans = 521×1 categorical
     Speech 
     Child speech, kid speaking 
     Narration, monologue 
     Speech synthesizer 
     Shout 
     Bellow 
     Yell 
     Children shouting 
     Screaming 
     Laughter 
     Baby laughter 
     Giggle 
     Snicker 
     Belly laugh 
     Chuckle, chortle 
     Crying, sobbing 
     Baby cry, infant cry 
     Whimper 
     Wail, moan 
     Singing 
     ⋮

Prepare the model for transfer learning by first converting the network to a layerGraph (Deep Learning Toolbox). Use replaceLayer (Deep Learning Toolbox) to replace the fully connected layer with an untrained fully connected layer. Replace the classification layer with a classification layer that classifies the input as "white", "pink", or "brown". See List of Deep Learning Layers (Deep Learning Toolbox) for deep learning layers supported in MATLAB®.

uniqueLabels = unique(trainLabels);
numLabels = numel(uniqueLabels);

lgraph = layerGraph(net.Layers);

lgraph = replaceLayer(lgraph,"dense",fullyConnectedLayer(numLabels,"Name","dense"));
lgraph = replaceLayer(lgraph,"Sound",classificationLayer("Name","Sounds","Classes",uniqueLabels));

To define training options, use trainingOptions (Deep Learning Toolbox).

options = trainingOptions('adam','ValidationData',{single(validationFeatures),validationLabels});

To train the network, use trainNetwork (Deep Learning Toolbox). The network achieves 100% validation accuracy using only 5 signals per noise type.

trainNetwork(single(trainFeatures),trainLabels,lgraph,options);
Training on single CPU.
|======================================================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Validation  |  Mini-batch  |  Validation  |  Base Learning  |
|         |             |   (hh:mm:ss)   |   Accuracy   |   Accuracy   |     Loss     |     Loss     |      Rate       |
|======================================================================================================================|
|       1 |           1 |       00:00:02 |       20.00% |       88.77% |       1.1922 |       0.6619 |          0.0010 |
|      30 |          30 |       00:00:14 |      100.00% |      100.00% |   9.1076e-06 |   5.0431e-05 |          0.0010 |
|======================================================================================================================|
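The call to trainNetwork above discards its output. If you want to use the retrained network on new signals, you can capture the returned network and check its predictions with classify (Deep Learning Toolbox). The following is a minimal sketch, assuming the variables defined earlier in this example; the variable names trainedNet, predictedLabels, and accuracy are illustrative.

```matlab
% Sketch only: capture the trained network instead of discarding it,
% then verify its predictions on the held-out validation features.
trainedNet = trainNetwork(single(trainFeatures),trainLabels,lgraph,options);

predictedLabels = classify(trainedNet,single(validationFeatures));
accuracy = mean(predictedLabels == validationLabels)
```

Because the example reports 100% validation accuracy, accuracy should come out as 1 when run under the same conditions.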