
Transfer Learning with Pretrained Audio Networks

This example shows how to use transfer learning to retrain YAMNet, a pretrained convolutional neural network, to classify a new set of audio signals. To get started with deep learning for audio from scratch, see Classify Sound Using Deep Learning.

Transfer learning is commonly used in deep learning applications. You can take a pretrained network and use it as a starting point to learn a new task. Fine-tuning a network with transfer learning is usually much faster and easier than training a network from scratch with randomly initialized weights. You can quickly transfer learned features to a new task using a smaller number of training signals.

Audio Toolbox™ additionally provides the classifySound function, which implements the necessary preprocessing for YAMNet and convenient postprocessing to interpret the results. Audio Toolbox also provides the pretrained VGGish network (vggish), as well as the vggishFeatures function, which implements preprocessing and postprocessing for the VGGish network.

Create Data

Generate 100 white noise signals, 100 brown noise signals, and 100 pink noise signals. Each signal represents 0.98 seconds of time, assuming a 16 kHz sample rate.

fs = 16e3;
duration = 0.98;
N = duration*fs;
numSignals = 100;

wNoise = 2*rand([N,numSignals]) - 1;
wLabels = repelem(categorical("white"),numSignals,1);

bNoise = filter(1,[1,-0.999],wNoise);
bNoise = bNoise./max(abs(bNoise),[],"all");
bLabels = repelem(categorical("brown"),numSignals,1);

pNoise = pinknoise([N,numSignals]);
pLabels = repelem(categorical("pink"),numSignals,1);
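The brown noise above is obtained by leaky integration of white noise followed by peak normalization. A minimal NumPy/SciPy sketch of the same two steps, with scipy.signal.lfilter standing in for MATLAB's filter (pink noise is omitted because NumPy has no built-in counterpart of pinknoise):

```python
import numpy as np
from scipy.signal import lfilter

fs = 16000
n = int(0.98 * fs)          # 15680 samples per signal
num_signals = 100
rng = np.random.default_rng(0)

# White noise: uniform in [-1, 1), one signal per column
w_noise = 2 * rng.random((n, num_signals)) - 1

# "Brown" noise: leaky integrator y[t] = x[t] + 0.999*y[t-1],
# i.e. IIR filter with b = [1], a = [1, -0.999], applied down each column
b_noise = lfilter([1.0], [1.0, -0.999], w_noise, axis=0)

# Peak-normalize over the whole batch, like ./max(abs(.),[],"all")
b_noise = b_noise / np.max(np.abs(b_noise))
```

The near-unity pole at 0.999 boosts low frequencies, giving the roughly 1/f^2 power spectrum characteristic of brown noise.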

Split the data into training and test sets. Normally, the training set contains most of the data. However, to illustrate the power of transfer learning, you will use only a few samples for training and the majority for validation.

K = 5;

trainAudio = [wNoise(:,1:K),bNoise(:,1:K),pNoise(:,1:K)];
trainLabels = [wLabels(1:K);bLabels(1:K);pLabels(1:K)];

validationAudio = [wNoise(:,K+1:end),bNoise(:,K+1:end),pNoise(:,K+1:end)];
validationLabels = [wLabels(K+1:end);bLabels(K+1:end);pLabels(K+1:end)];

fprintf("Number of samples per noise color in train set = %d\n" + ...
    "Number of samples per noise color in validation set = %d\n",K,numSignals-K);
Number of samples per noise color in train set = 5
Number of samples per noise color in validation set = 95

Extract Features

Use melSpectrogram to extract log-mel spectrograms from both the training set and the validation set, using the same parameters the YAMNet model was trained with.

FFTLength = 512;
numBands = 64;
frequencyRange = [125 7500];
windowLength = 0.025*fs;
overlapLength = 0.015*fs;

trainFeatures = melSpectrogram(trainAudio,fs, ...
    "Window",hann(windowLength,"periodic"), ...
    "OverlapLength",overlapLength, ...
    "FFTLength",FFTLength, ...
    "FrequencyRange",frequencyRange, ...
    "NumBands",numBands, ...
    "FilterBankNormalization","none", ...
    "WindowNormalization",false, ...
    "SpectrumType","magnitude", ...
    "FilterBankDesignDomain","warped");

trainFeatures = log(trainFeatures + single(0.001));
trainFeatures = permute(trainFeatures,[2,1,4,3]);

validationFeatures = melSpectrogram(validationAudio,fs, ...
    "Window",hann(windowLength,"periodic"), ...
    "OverlapLength",overlapLength, ...
    "FFTLength",FFTLength, ...
    "FrequencyRange",frequencyRange, ...
    "NumBands",numBands, ...
    "FilterBankNormalization","none", ...
    "WindowNormalization",false, ...
    "SpectrumType","magnitude", ...
    "FilterBankDesignDomain","warped");

validationFeatures = log(validationFeatures + single(0.001));
validationFeatures = permute(validationFeatures,[2,1,4,3]);
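Two details of this step are easy to miss: the small 0.001 offset keeps the logarithm finite when a mel band is zero, and the permute rearranges the bands-by-frames-by-signals array into the time-by-bands-by-channel-by-batch layout the network expects. A NumPy sketch of just these two post-processing steps, using a random stand-in for the melSpectrogram output (the 96-frame count is hypothetical):

```python
import numpy as np

# Toy stand-in for melSpectrogram output:
# (numBands, numFrames, numSignals) = (64, 96, 15)
rng = np.random.default_rng(0)
mel = rng.random((64, 96, 15)).astype(np.float32)

# Log compression with a small offset so log(0) never occurs
log_mel = np.log(mel + np.float32(0.001))

# Rearrange to (numFrames, numBands, 1, numSignals): time x bands as
# the "image", one channel, signals as the batch, mirroring
# MATLAB's permute(trainFeatures, [2 1 4 3])
features = np.transpose(log_mel[:, :, np.newaxis, :], (1, 0, 2, 3))
```

Because mel + 0.001 is at least 0.001, every feature value is bounded below by log(0.001), so the network never sees -inf.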

Transfer Learning

Load the pretrained network by calling yamnet. If the Audio Toolbox model for YAMNet is not installed, then the function provides a link to the location of the network weights. To download the model, click the link, then unzip the file to a location on the MATLAB path. The YAMNet model can classify audio into one of 521 sound categories, including white noise and pink noise (but not brown noise).

net = yamnet;
net.Layers(end).Classes
ans = 521×1 categorical
     Speech
     Child speech, kid speaking
     Conversation
     Narration, monologue
     Babbling
     Speech synthesizer
     Shout
     Bellow
     Whoop
     Yell
     Children shouting
     Screaming
     Whispering
     Laughter
     Baby laughter
     Giggle
     Snicker
     Belly laugh
     Chuckle, chortle
     Crying, sobbing
     Baby cry, infant cry
     Whimper
     Wail, moan
     Sigh
     Singing
     Choir
     Yodeling
     Chant
     Mantra
     Child singing
     ⋮

To prepare the model for transfer learning, first convert the network to a layerGraph (Deep Learning Toolbox). Use replaceLayer (Deep Learning Toolbox) to replace the fully connected layer with an untrained fully connected layer. Replace the classification layer with a classification layer that classifies input as "white", "pink", or "brown". See List of Deep Learning Layers (Deep Learning Toolbox) for the deep learning layers supported in MATLAB®.

uniqueLabels = unique(trainLabels);
numLabels = numel(uniqueLabels);

lgraph = layerGraph(net.Layers);

lgraph = replaceLayer(lgraph,"dense",fullyConnectedLayer(numLabels,"Name","dense"));
lgraph = replaceLayer(lgraph,"Sound",classificationLayer("Name","Sounds","Classes",uniqueLabels));
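Conceptually, swapping in a fresh fully connected layer means training only a small linear classifier on top of frozen pretrained features, which is why so few training signals suffice. A minimal NumPy sketch of that idea with softmax regression on a frozen feature matrix (the feature dimension of 1024 matches YAMNet's penultimate activation width; the features themselves are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, feat_dim = 3, 1024

# Frozen "pretrained" features for 15 training signals, 5 per class
features = rng.standard_normal((15, feat_dim)).astype(np.float32)
labels = np.repeat(np.arange(num_classes), 5)

# New untrained head, analogous to fullyConnectedLayer(numLabels)
W = np.zeros((feat_dim, num_classes), dtype=np.float32)
b = np.zeros(num_classes, dtype=np.float32)

# A few steps of gradient descent on the softmax cross-entropy,
# updating only the head; the feature extractor stays fixed
onehot = np.eye(num_classes, dtype=np.float32)[labels]
for _ in range(200):
    logits = features @ W + b
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    grad = (p - onehot) / len(labels)
    W -= 0.5 * features.T @ grad
    b -= 0.5 * grad.sum(axis=0)

train_acc = ((features @ W + b).argmax(axis=1) == labels).mean()
```

With only the head trainable, the optimization is a small convex problem, so it converges quickly even on a handful of samples.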

Define training options using trainingOptions (Deep Learning Toolbox).

options = trainingOptions("adam","ValidationData",{single(validationFeatures),validationLabels});

Train the network using trainNetwork (Deep Learning Toolbox). The network achieves 100% validation accuracy using only 5 signals per noise type.

trainNetwork(single(trainFeatures),trainLabels,lgraph,options);
Training on single CPU.
|======================================================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Validation  |  Mini-batch  |  Validation  |  Base Learning  |
|         |             |   (hh:mm:ss)   |   Accuracy   |   Accuracy   |     Loss     |     Loss     |      Rate       |
|======================================================================================================================|
|       1 |           1 |       00:00:02 |       20.00% |       88.77% |       1.1922 |       0.6619 |          0.0010 |
|      30 |          30 |       00:00:14 |      100.00% |      100.00% |   9.1076e-06 |   5.0431e-05 |          0.0010 |
|======================================================================================================================|