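ivectorSystem

Create i-vector system

Description

i-vectors are compact statistical representations of identity extracted from audio signals. ivectorSystem creates a trainable i-vector system to extract i-vectors and perform classification tasks such as speaker recognition, speaker diarization, and sound classification. You can also determine thresholds for open-set tasks and enroll labels into the system for both open-set and closed-set classification.

Creation

Syntax

ivs = ivectorSystem

ivs = ivectorSystem(Name=Value)

Description

ivs = ivectorSystem creates a default i-vector system. You can train the i-vector system to extract i-vectors and perform classification tasks.

ivs = ivectorSystem(Name=Value) specifies nondefault properties for ivs using one or more name-value arguments.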

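Properties

`InputType` - Type of input
`"audio"` (default) | `"features"`

Type of input, specified as "audio" or "features".

"audio": The i-vector system accepts mono audio signals as input. The audio data is processed to extract 20 mel frequency cepstral coefficients (MFCCs), delta MFCCs, and delta-delta MFCCs, for a total of 60 coefficients per frame.
If InputType is set to "audio" when the i-vector system is created, the training data can be:
- A cell array of single-channel audio signals, each specified as a column vector with underlying type single or double.
- An audioDatastore object or a signalDatastore object that points to a data set of mono audio signals.
- A TransformedDatastore with an underlying audioDatastore or signalDatastore that points to a data set of mono audio signals. The output from calls to read on the transformed datastore must be mono audio signals with underlying data type single or double.
"features": The i-vector system accepts pre-extracted audio features as input.
If InputType is set to "features" when the i-vector system is created, the training data can be:
- A cell array of matrices with underlying type single or double. The matrices must consist of audio features, where the number of features (columns) is locked the first time trainExtractor is called and the number of hops (rows) is variable-sized. The number of features input in any subsequent call to any of the object functions must equal the number of features used when calling trainExtractor.
- A TransformedDatastore object with an underlying audioDatastore or signalDatastore whose read function has output as described in the previous bullet.
- A signalDatastore object whose read function has output as described in the first bullet.

Example: ivs = ivectorSystem(InputType="audio")

Data Types: char | string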
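As a rough illustration of the "features" input layout described above, the following sketch builds a cell array of feature matrices with a variable number of hops (rows) and a fixed number of features (columns), then checks that the column counts agree. The random data and the variable names are hypothetical, and only base MATLAB functions are used.

```matlab
% Hypothetical feature matrices: rows are hops, columns are features.
numFeatures = 60;   % for example, 20 MFCCs + 20 delta + 20 delta-delta
features = {randn(120,numFeatures), randn(87,numFeatures), randn(200,numFeatures)};

% The number of columns (features) must match across all matrices.
numColumns = cellfun(@(x) size(x,2), features);
isConsistent = all(numColumns == numColumns(1));

% The number of rows (hops) is free to vary from signal to signal.
numHops = cellfun(@(x) size(x,1), features);
fprintf('Consistent feature count: %d, hops per signal: %s\n', ...
    isConsistent, mat2str(numHops));
```

A check like this can be run before training to catch a mismatch in feature count early, since the feature count is locked at the first trainExtractor call.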

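`SampleRate` - Sample rate of audio input in Hz
`16000` (default) | positive scalar

Sample rate of the audio input in Hz, specified as a positive scalar.

Note

The SampleRate property applies only when InputType is set to "audio".

Example: ivs = ivectorSystem(InputType="audio",SampleRate=48000)

Data Types: single | double

`DetectSpeech` - Apply speech detection
`true` (default) | `false`

Apply speech detection, specified as true or false. With DetectSpeech set to true, the i-vector system extracts features only from regions where speech is detected.

Note

The DetectSpeech property applies only when InputType is set to "audio".

ivectorSystem uses the detectSpeech function to detect regions of speech.

Example: ivs = ivectorSystem(InputType="audio",DetectSpeech=true)

Data Types: logical | single | double

`Verbose` - Display training progress
`true` (default) | `false`

Display training progress, specified as true or false. With Verbose set to true, the i-vector system displays the training progress in the Command Window or the Live Editor.

Tip

To toggle between verbose and non-verbose behavior, use dot notation to set the Verbose property between object function calls.

Example: ivs = ivectorSystem(InputType="audio",Verbose=false)

Data Types: logical | single | double

`EnrolledLabels` - Table containing enrolled labels
0-by-2 table (default)

This property is read-only.

Table containing enrolled labels, specified as a table. The table row names correspond to labels. The column names correspond to the template i-vector and the number of individual i-vectors used to generate the template i-vector. The number of i-vectors used to generate the template i-vector can be viewed as a measure of confidence in the template.

Use enroll to enroll new labels or update existing labels.
Use unenroll to remove labels from the system.

Data Types: table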

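Object Functions

`trainExtractor`	Train i-vector extractor
`trainClassifier`	Train i-vector classifier
`calibrate`	Train i-vector system calibrator
`enroll`	Enroll labels
`unenroll`	Unenroll labels
`detectionErrorTradeoff`	Evaluate binary classification system
`verify`	Verify label
`identify`	Identify label
`ivector`	Extract i-vector
`info`	Return training configuration and data info
`addInfoHeader`	Add custom information about i-vector system
`release`	Allow property values and input characteristics to change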

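Examples

Train Speaker Verification System

Use the Pitch Tracking Database from Graz University of Technology (PTDB-TUG) [1]. The data set consists of native English speakers reading 2342 phonetically rich sentences from the TIMIT corpus. Download and extract the data set. Depending on your system, downloading and extracting the data set can take approximately 1.5 hours.

    url = "https://www2.spsc.tugraz.at/databases/PTDB-TUG/SPEECH_DATA_ZIPPED.zip";
    downloadFolder = tempdir;
    datasetFolder = fullfile(downloadFolder,"PTDB-TUG");

    if ~exist(datasetFolder,"dir")
        disp("Downloading PTDB-TUG (3.9 G) ...")
        unzip(url,datasetFolder)
    end

Create an audioDatastore object that points to the data set. The data set was originally intended for pitch-tracking training and evaluation, and it includes laryngograph readings and baseline pitch decisions. Use only the original audio recordings.

    ads = audioDatastore([fullfile(datasetFolder,"SPEECH DATA","FEMALE","MIC"),fullfile(datasetFolder,"SPEECH DATA","MALE","MIC")], ...
        IncludeSubfolders=true, ...
        FileExtensions=".wav");

The file names contain the speaker IDs. Decode the file names to set the labels on the audioDatastore object.

    ads.Labels = extractBetween(ads.Files,"mic_","_");
    countEachLabel(ads)

    ans=20×2 table
        Label    Count
        _____    _____
         F01      236
         F02      236
         F03      236
         F04      236
         F05      236
         F06      236
         F07      236
         F08      234
         F09      236
         F10      236
         M01      236
         M02      236
         M03      236
         M04      236
         M05      236
         M06      236
          ⋮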
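The label decoding step can be tried without downloading the data set. This sketch assumes file names of the form mic_F01_sa1.wav (a hypothetical example of the naming pattern) and captures the text between mic_ and the next underscore with a regular expression, which mirrors what extractBetween does in the code above.

```matlab
% Hypothetical file names following an assumed mic_<speaker>_<sentence>.wav pattern.
files = {'mic_F01_sa1.wav','mic_M05_si1234.wav','mic_F10_sx99.wav'};

% Capture the characters between 'mic_' and the next underscore.
tokens = regexp(files,'mic_([^_]+)_','tokens','once');
speakerIDs = cellfun(@(t) t{1}, tokens, 'UniformOutput', false);

disp(speakerIDs)
```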

Read an audio file from the data set, listen to it, and plot it.

    [audioIn,audioInfo] = read(ads);
    fs = audioInfo.SampleRate;

    t = (0:size(audioIn,1)-1)/fs;
    sound(audioIn,fs)
    plot(t,audioIn)
    xlabel("Time (s)")
    ylabel("Amplitude")
    axis([0 t(end) -1 1])
    title("Sample Utterance from Data Set")

Separate the audioDatastore object into four: one for training, one for enrollment, one to evaluate the detection-error tradeoff, and one for testing. The training set contains 16 speakers. The enrollment, detection-error tradeoff, and test sets contain the other four speakers.

    speakersToTest = categorical(["M01","M05","F01","F05"]);

    adsTrain = subset(ads,~ismember(ads.Labels,speakersToTest));

    ads = subset(ads,ismember(ads.Labels,speakersToTest));
    [adsEnroll,adsTest,adsDET] = splitEachLabel(ads,3,1);

Display the label distributions of the audioDatastore objects.

    countEachLabel(adsTrain)

    ans=16×2 table
        Label    Count
        _____    _____
         F02      236
         F03      236
         F04      236
         F06      236
         F07      236
         F08      234
         F09      236
         F10      236
         M02      236
         M03      236
         M04      236
         M06      236
         M07      236
         M08      236
         M09      236
         M10      236

    countEachLabel(adsEnroll)

    ans=4×2 table
        Label    Count
        _____    _____
         F01       3
         F05       3
         M01       3
         M05       3

    countEachLabel(adsTest)

    ans=4×2 table
        Label    Count
        _____    _____
         F01       1
         F05       1
         M01       1
         M05       1

    countEachLabel(adsDET)

    ans=4×2 table
        Label    Count
        _____    _____
         F01      232
         F05      232
         M01      232
         M05      232

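Create an i-vector system. By default, the i-vector system assumes the input to the system is mono audio signals.

    speakerVerification = ivectorSystem(SampleRate=fs)

    speakerVerification = 
      ivectorSystem with properties:

             InputType: 'audio'
            SampleRate: 48000
          DetectSpeech: 1
               Verbose: 1
        EnrolledLabels: [0×2 table]

To train the extractor of the i-vector system, call trainExtractor. Specify the number of universal background model (UBM) components as 128 and the number of expectation-maximization iterations as 5. Specify the total variability space (TVS) rank as 64 and the number of iterations as 3.

    trainExtractor(speakerVerification,adsTrain, ...
        UBMNumComponents=128,UBMNumIterations=5, ...
        TVSRank=64,TVSNumIterations=3)

    Calculating standardization factors ....done.
    Training universal background model ........done.
    Training total variability space ...done.
    i-vector extractor training complete.

To train the classifier of the i-vector system, use trainClassifier. To reduce the dimensionality of the i-vectors, specify the number of eigenvectors in the projection matrix as 16. Specify the number of dimensions in the probabilistic linear discriminant analysis (PLDA) model as 16 and the number of iterations as 3.

    trainClassifier(speakerVerification,adsTrain,adsTrain.Labels, ...
        NumEigenvectors=16, ...
        PLDANumDimensions=16,PLDANumIterations=3)

    Extracting i-vectors ...done.
    Training projection matrix .....done.
    Training PLDA model ...done.
    i-vector classifier training complete.

To calibrate the system so that scores can be interpreted as a measure of confidence in a positive decision, use calibrate.

    calibrate(speakerVerification,adsTrain,adsTrain.Labels)

    Extracting i-vectors ...done.
    Calibrating CSS scorer ...done.
    Calibrating PLDA scorer ...done.
    Calibration complete.

To inspect the parameters previously used to train the i-vector system, use info.

    info(speakerVerification)

    i-vector system input
      Input feature vector length: 60
      Input data type: double

    trainExtractor
      Train signals: 3774
      UBMNumComponents: 128
      UBMNumIterations: 5
      TVSRank: 64
      TVSNumIterations: 3

    trainClassifier
      Train signals: 3774
      Train labels: F02 (236), F03 (236) ... and 14 more
      NumEigenvectors: 16
      PLDANumDimensions: 16
      PLDANumIterations: 3

    calibrate
      Calibration signals: 3774
      Calibration labels: F02 (236), F03 (236) ... and 14 more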

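Split the enrollment set.

    [adsEnrollPart1,adsEnrollPart2] = splitEachLabel(adsEnroll,1,2);

To enroll speakers in the i-vector system, call enroll.

    enroll(speakerVerification,adsEnrollPart1,adsEnrollPart1.Labels)

    Extracting i-vectors ...done.
    Enrolling i-vectors .......done.
    Enrollment complete.

When you enroll speakers, the read-only EnrolledLabels property is updated with the enrolled labels and the corresponding template i-vectors. The table also keeps track of the number of signals used to create each template i-vector. Generally, using more signals results in a better template.

    speakerVerification.EnrolledLabels

    ans=4×2 table
                  ivector         NumSamples
               _____________      __________
        F01    {16×1 double}          1
        F05    {16×1 double}          1
        M01    {16×1 double}          1
        M05    {16×1 double}          1

Enroll the second part of the enrollment set, and then view the enrolled labels table again. The i-vector templates and the number of samples are updated.

    enroll(speakerVerification,adsEnrollPart2,adsEnrollPart2.Labels)

    Extracting i-vectors ...done.
    Enrolling i-vectors .......done.
    Enrollment complete.

    speakerVerification.EnrolledLabels

    ans=4×2 table
                  ivector         NumSamples
               _____________      __________
        F01    {16×1 double}          3
        F05    {16×1 double}          3
        M01    {16×1 double}          3
        M05    {16×1 double}          3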
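One way to picture why the sample count grows while the template changes is a running average. The sketch below assumes that a template is the mean of the i-vectors enrolled for a label. That is an illustrative assumption, not a statement about how ivectorSystem computes templates internally, and the 3-element vectors are hypothetical stand-ins for the 16-element i-vectors in the table.

```matlab
% Hypothetical template after enrolling one signal.
template = [0.2; -0.1; 0.4];
numSamples = 1;

% Two additional hypothetical i-vectors for the same label (one per column).
newIvectors = [0.4  0.0; ...
               0.1 -0.3; ...
               0.2  0.6];

% Incremental mean update: the template moves toward each new i-vector
% by a step that shrinks as more samples are enrolled.
for k = 1:size(newIvectors,2)
    numSamples = numSamples + 1;
    template = template + (newIvectors(:,k) - template)/numSamples;
end
```

Under this assumption, each new signal shifts the template less than the previous one did, which is consistent with treating the sample count as a confidence measure.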

To evaluate the i-vector system and determine a decision threshold for speaker verification, call detectionErrorTradeoff.

    [results,eerThreshold] = detectionErrorTradeoff(speakerVerification,adsDET,adsDET.Labels);

    Extracting i-vectors ...done.
    Scoring i-vector pairs ...done.
    Detection error tradeoff evaluation complete.

The first output from detectionErrorTradeoff is a structure with two fields: CSS and PLDA. Each field contains a table. Each row of the table contains a possible decision threshold for speaker verification tasks, and the corresponding false alarm rate (FAR) and false rejection rate (FRR). The FAR and FRR are determined using the enrolled speaker labels and the data input to the detectionErrorTradeoff function.

    results

    results = struct with fields:
        PLDA: [1000×3 table]
         CSS: [1000×3 table]

    results.CSS

    ans=1000×3 table
        Threshold       FAR       FRR
        __________    _______    ___
        1.7736e-09          1     0
        1.8233e-09    0.99964     0
        1.8745e-09    0.99964     0
         1.927e-09    0.99964     0
        1.9811e-09    0.99964     0
        2.0366e-09    0.99964     0
        2.0937e-09    0.99964     0
        2.1524e-09    0.99964     0
        2.2128e-09    0.99964     0
        2.2748e-09    0.99964     0
        2.3386e-09    0.99964     0
        2.4042e-09    0.99964     0
        2.4716e-09    0.99964     0
        2.5409e-09    0.99964     0
        2.6122e-09    0.99964     0
        2.6854e-09    0.99964     0
            ⋮

    results.PLDA

    ans=1000×3 table
        Threshold       FAR       FRR
        __________    _______    ___
        4.7045e-34          1     0
         5.143e-34    0.99964     0
        5.6225e-34    0.99964     0
        6.1466e-34    0.99964     0
        6.7197e-34    0.99964     0
        7.3461e-34    0.99964     0
        8.0309e-34    0.99964     0
        8.7796e-34    0.99964     0
        9.5981e-34    0.99964     0
        1.0493e-33    0.99964     0
        1.1471e-33    0.99964     0
         1.254e-33    0.99964     0
         1.371e-33    0.99964     0
        1.4988e-33    0.99964     0
        1.6385e-33    0.99964     0
        1.7912e-33    0.99964     0
            ⋮
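To make the table columns concrete, this sketch computes the FAR and FRR for a single threshold from hypothetical scores. It assumes a trial is accepted when its score is greater than or equal to the threshold. The exact comparison rule used by the toolbox is not stated here, so treat that choice as an assumption.

```matlab
% Hypothetical calibrated scores in the range [0,1].
targetScores    = [0.91 0.85 0.40 0.77 0.95];       % claimed label is correct
nontargetScores = [0.10 0.35 0.62 0.05 0.20 0.15];  % claimed label is wrong

threshold = 0.5;

% A false alarm accepts a non-target trial.
FAR = mean(nontargetScores >= threshold);

% A false rejection rejects a target trial.
FRR = mean(targetScores < threshold);

fprintf('FAR = %0.3f, FRR = %0.3f\n',FAR,FRR);
```

Raising the threshold lowers the FAR and raises the FRR, which is the tradeoff each row of the table captures.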

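The second output from detectionErrorTradeoff is a structure with two fields: CSS and PLDA. The corresponding value is the decision threshold that results in the equal error rate (the point at which the FAR and FRR are equal).

    eerThreshold

    eerThreshold = struct with fields:
        PLDA: 0.0021
         CSS: 0.9366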
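The equal error rate threshold can be approximated from a table of thresholds, FAR, and FRR by finding the row where the two rates are closest. The values below are hypothetical and much coarser than the 1000-row tables returned by detectionErrorTradeoff.

```matlab
% Hypothetical detection-error tradeoff data.
thresholds = [0.1  0.3  0.5  0.7  0.9];
FAR        = [0.80 0.40 0.20 0.05 0.00];
FRR        = [0.00 0.05 0.20 0.45 0.90];

% The equal error rate lies where the FAR and FRR curves cross.
[~,idx] = min(abs(FAR - FRR));
eerThresholdSketch = thresholds(idx);
eer = (FAR(idx) + FRR(idx))/2;

fprintf('EER threshold = %0.2f, EER = %0.2f\n',eerThresholdSketch,eer);
```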

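The first time you call detectionErrorTradeoff, you must provide data and corresponding labels to evaluate. Subsequently, you can get the same information, or a different analysis of the same underlying data, by calling detectionErrorTradeoff without data and labels.

Call detectionErrorTradeoff a second time with no data arguments or output arguments to visualize the detection-error tradeoff.

    detectionErrorTradeoff(speakerVerification)

Call detectionErrorTradeoff again. This time, visualize only the detection-error tradeoff for the PLDA scorer.

    detectionErrorTradeoff(speakerVerification,Scorer="plda")

Depending on your application, you may want to use a threshold that weights the error cost of a false alarm higher or lower than the error cost of a false rejection. You may also be using data that is not representative of the prior probability of the speaker being present. You can use the minDCF parameter to specify custom costs and a prior probability. Call detectionErrorTradeoff again. This time, specify the cost of a false rejection as 1, the cost of a false acceptance as 2, and the prior probability that a speaker is present as 0.1.

    costFR = 1;
    costFA = 2;
    priorProb = 0.1;
    detectionErrorTradeoff(speakerVerification,Scorer="plda",minDCF=[costFR,costFA,priorProb])

Call detectionErrorTradeoff again. This time, get the minDCF threshold for the PLDA scorer and the specified parameters of the detection cost function.

    [~,minDCFThreshold] = detectionErrorTradeoff(speakerVerification,Scorer="plda",minDCF=[costFR,costFA,priorProb])

    minDCFThreshold = 0.0595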
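A minimal sketch of the idea, assuming the commonly used detection cost function DCF = costFR*priorProb*FRR + costFA*(1-priorProb)*FAR. Whether detectionErrorTradeoff uses exactly this form is an assumption, and the FAR and FRR values are hypothetical.

```matlab
% Costs and prior from the example above.
costFR = 1;
costFA = 2;
priorProb = 0.1;

% Hypothetical detection-error tradeoff data.
thresholds = [0.1  0.3  0.5  0.7  0.9];
FAR        = [0.80 0.40 0.20 0.05 0.00];
FRR        = [0.00 0.05 0.20 0.45 0.90];

% Weighted cost at each candidate threshold (assumed form of the DCF).
dcf = costFR*priorProb*FRR + costFA*(1 - priorProb)*FAR;

% The minDCF threshold is the one with the lowest weighted cost.
[minCost,idx] = min(dcf);
minDCFThresholdSketch = thresholds(idx);
```

With these numbers, the FAR and FRR are equal at a threshold of 0.5, but the minimum-cost threshold is 0.9: a low prior and a higher false-acceptance cost both push the decision threshold upward.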

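Test Speaker Verification System

Read a signal from the test set.

    adsTest = shuffle(adsTest);
    [audioIn,audioInfo] = read(adsTest);
    knownSpeakerID = audioInfo.Label

    knownSpeakerID = 1×1 cell array
        {'F05'}

To perform speaker verification, call verify with the audio signal and specify the speaker ID, a scorer, and a threshold for the scorer. The verify function returns a logical value indicating whether the speaker identity is accepted or rejected, and a score indicating the similarity between the input audio and the template i-vector corresponding to the enrolled label.

    [tf,score] = verify(speakerVerification,audioIn,knownSpeakerID,"plda",eerThreshold.PLDA);
    if tf
        fprintf('Success!\nSpeaker accepted.\nSimilarity score = %0.2f\n\n',score)
    else
        fprintf('Failure!\nSpeaker rejected.\nSimilarity score = %0.2f\n\n',score)
    end

    Success!
    Speaker accepted.
    Similarity score = 1.00

Call verify again. This time, specify an incorrect speaker ID.

    possibleSpeakers = speakerVerification.EnrolledLabels.Properties.RowNames;
    imposterIdx = find(~ismember(possibleSpeakers,knownSpeakerID));
    imposter = possibleSpeakers(imposterIdx(randperm(numel(imposterIdx),1)))

    imposter = 1×1 cell array
        {'F01'}

    [tf,score] = verify(speakerVerification,audioIn,imposter,"plda",eerThreshold.PLDA);
    if tf
        fprintf('Failure!\nSpeaker accepted.\nSimilarity score = %0.2f\n\n',score)
    else
        fprintf('Success!\nSpeaker rejected.\nSimilarity score = %0.2f\n\n',score)
    end

    Success!
    Speaker rejected.
    Similarity score = 0.00

References

[1] Signal Processing and Speech Communication Laboratory. https://www.spsc.tugraz.at/databases-and-tools/ptdb-tug-pitch-tracking-database-from-graz-university-of-technology.html. Accessed 12 Dec. 2019.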

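Train Speaker Identification System

Use the Census Database (also known as the AN4 Database) from the CMU Robust Speech Recognition Group [1]. The data set contains recordings of male and female subjects speaking words and numbers. The helper function in this example downloads the data set for you, converts the raw files to FLAC, and returns two audioDatastore objects containing the training set and the test set. By default, the data set is reduced so that the example runs quickly. You can use the full data set by setting ReduceDataset to false.

    [adsTrain,adsTest] = HelperAN4Download(ReduceDataset=true);

Split the test data set into enrollment and test sets. Use two utterances for enrollment and the remaining utterances for the test set. Generally, the more utterances you use for enrollment, the better the performance of the system. However, most practical applications are limited to a small set of enrollment utterances.

    [adsEnroll,adsTest] = splitEachLabel(adsTest,2);

Inspect the distribution of speakers in the training, test, and enrollment sets. The speakers in the training set do not overlap with the speakers in the test and enrollment sets.

    summary(adsTrain.Labels)

         fejs      13 
         fmjd      13 
         fsrb      13 
         ftmj      13 
         fwxs      12 
         mcen      13 
         mrcb      13 
         msjm      13 
         msjr      13 
         msmn       9 

    summary(adsEnroll.Labels)

         fvap      2 
         marh      2 

    summary(adsTest.Labels)

         fvap      11 
         marh      11 

Create an i-vector system that accepts feature input.

    fs = 16e3;
    iv = ivectorSystem(SampleRate=fs,InputType="features");

Create an audioFeatureExtractor object to extract the gammatone cepstral coefficients (GTCC), the delta GTCC, the delta-delta GTCC, and the pitch from 50 ms periodic Hann windows with 45 ms overlap.

    afe = audioFeatureExtractor(gtcc=true,gtccDelta=true,gtccDeltaDelta=true,pitch=true,SampleRate=fs);
    afe.Window = hann(round(0.05*fs),"periodic");
    afe.OverlapLength = round(0.045*fs);
    afe

    afe = 
      audioFeatureExtractor with properties:

       Properties
                         Window: [800×1 double]
                  OverlapLength: 720
                     SampleRate: 16000
                      FFTLength: []
        SpectralDescriptorInput: 'linearSpectrum'
            FeatureVectorLength: 40

       Enabled Features
         gtcc, gtccDelta, gtccDeltaDelta, pitch

       Disabled Features
         linearSpectrum, melSpectrum, barkSpectrum, erbSpectrum, mfcc, mfccDelta
         mfccDeltaDelta, spectralCentroid, spectralCrest, spectralDecrease, spectralEntropy, spectralFlatness
         spectralFlux, spectralKurtosis, spectralRolloffPoint, spectralSkewness, spectralSlope, spectralSpread
         harmonicRatio, zerocrossrate, shortTimeEnergy

       To extract a feature, set the corresponding property to true.
       For example, obj.mfcc = true, adds mfcc to the list of enabled features.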
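The window and overlap values in the display follow directly from the sample rate. This sketch reproduces that arithmetic in base MATLAB and, assuming one feature vector per full window with no padding of the final frame, estimates the number of hops for a hypothetical one-second signal. It also assumes 13 coefficients per GTCC set, which matches the 40-element FeatureVectorLength shown above.

```matlab
fs = 16e3;                                  % sample rate in Hz
windowLength  = round(0.05*fs);             % 50 ms window in samples
overlapLength = round(0.045*fs);            % 45 ms overlap in samples
hopLength = windowLength - overlapLength;   % samples between window starts

% Hypothetical one-second signal; assumes no padding of the final frame.
numSamples = fs;
numHops = floor((numSamples - windowLength)/hopLength) + 1;

% Assumed 13 GTCC + 13 delta + 13 delta-delta + 1 pitch value per hop.
featureVectorLength = 3*13 + 1;

fprintf('Window %d, overlap %d, hop %d, hops per second %d\n', ...
    windowLength,overlapLength,hopLength,numHops);
```

Each hop becomes one row of the feature matrix passed to the i-vector system, so a one-second signal contributes roughly 191 rows of 40 features under these assumptions.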

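Create transformed datastores by adding feature extraction to the read function of adsTrain and adsEnroll.

    trainLabels = adsTrain.Labels;
    adsTrain = transform(adsTrain,@(x)extract(afe,x));
    enrollLabels = adsEnroll.Labels;
    adsEnroll = transform(adsEnroll,@(x)extract(afe,x));

Train both the extractor and the classifier using the training set.

    trainExtractor(iv,adsTrain, ...
        UBMNumComponents=64, ...
        UBMNumIterations=5, ...
        TVSRank=32, ...
        TVSNumIterations=3);

    Calculating standardization factors ....done.
    Training universal background model ........done.
    Training total variability space ...done.
    i-vector extractor training complete.

    trainClassifier(iv,adsTrain,trainLabels, ...
        NumEigenvectors=16, ...
        PLDANumDimensions=16, ...
        PLDANumIterations=5);

    Extracting i-vectors ...done.
    Training projection matrix .....done.
    Training PLDA model ........done.
    i-vector classifier training complete.

To calibrate the system so that scores can be interpreted as a measure of confidence in a positive decision, use calibrate.

    calibrate(iv,adsTrain,trainLabels)

    Extracting i-vectors ...done.
    Calibrating CSS scorer ...done.
    Calibrating PLDA scorer ...done.
    Calibration complete.

Enroll the speakers from the enrollment set.

    enroll(iv,adsEnroll,enrollLabels)

    Extracting i-vectors ...done.
    Enrolling i-vectors .....done.
    Enrollment complete.

Evaluate the file-level prediction accuracy on the test set.

    numCorrect = 0;
    reset(adsTest)
    for index = 1:numel(adsTest.Files)
        features = extract(afe,read(adsTest));

        results = identify(iv,features);

        trueLabel = adsTest.Labels(index);
        predictedLabel = results.Label(1);
        isPredictionCorrect = trueLabel==predictedLabel;

        numCorrect = numCorrect + isPredictionCorrect;
    end
    display("File Accuracy: " + round(100*numCorrect/numel(adsTest.Files),2) + " (%)")

        "File Accuracy: 100 (%)"

References

[1] "CMU Sphinx Group - Audio Databases." http://www.speech.cs.cmu.edu/databases/an4/. Accessed 19 Dec. 2019.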

Train Environmental Sound Classification System

Download and unzip the environmental sound classification data set. This data set consists of recordings labeled as one of 10 different audio sound classes (ESC-10).

    loc = matlab.internal.examples.downloadSupportFile("audio","ESC-10.zip");
    unzip(loc,pwd)

Create an audioDatastore object to manage the data. Call countEachLabel to display the distribution of sound classes and the number of unique labels.

    ads = audioDatastore(pwd,IncludeSubfolders=true,LabelSource="foldernames");
    countEachLabel(ads)

    ans=10×2 table
            Label         Count
        ______________    _____
        chainsaw           40
        clock_tick         40
        crackling_fire     40
        crying_baby        40
        dog                40
        helicopter         40
        rain               40
        rooster            38
        sea_waves          40
        sneezing           40

Listen to one of the files.

    [audioIn,audioInfo] = read(ads);
    fs = audioInfo.SampleRate;
    sound(audioIn,fs)
    audioInfo.Label

    ans = categorical
         chainsaw 

Split the datastore into training and test sets.

    [adsTrain,adsTest] = splitEachLabel(ads,0.8);

Create an audioFeatureExtractor to extract all possible features from the audio.

    afe = audioFeatureExtractor(SampleRate=fs, ...
        Window=hamming(round(0.03*fs),"periodic"), ...
        OverlapLength=round(0.02*fs));
    params = info(afe,"all");
    params = structfun(@(x)true,params,UniformOutput=false);
    set(afe,params);
    afe

    afe = 
      audioFeatureExtractor with properties:

       Properties
                         Window: [1323×1 double]
                  OverlapLength: 882
                     SampleRate: 44100
                      FFTLength: []
        SpectralDescriptorInput: 'linearSpectrum'
            FeatureVectorLength: 862

       Enabled Features
         linearSpectrum, melSpectrum, barkSpectrum, erbSpectrum, mfcc, mfccDelta
         mfccDeltaDelta, gtcc, gtccDelta, gtccDeltaDelta, spectralCentroid, spectralCrest
         spectralDecrease, spectralEntropy, spectralFlatness, spectralFlux, spectralKurtosis, spectralRolloffPoint
         spectralSkewness, spectralSlope, spectralSpread, pitch, harmonicRatio, zerocrossrate
         shortTimeEnergy

       Disabled Features
         none

       To extract a feature, set the corresponding property to true.
       For example, obj.mfcc = true, adds mfcc to the list of enabled features.

Create two directories in your current folder: train and test. Extract features from the training and test data sets and write the features as MAT files to the respective directories. Pre-extracting features can save time when you want to evaluate different feature combinations or training configurations.

    if ~isfolder("train")
        mkdir("train")
        mkdir("test")

        outputType = ".mat";
        writeall(adsTrain,"train",WriteFcn=@(x,y,z)writeFeatures(x,y,z,afe))
        writeall(adsTest,"test",WriteFcn=@(x,y,z)writeFeatures(x,y,z,afe))
    end

Create signal datastores that point to the audio features.

    sdsTrain = signalDatastore("train",IncludeSubfolders=true);
    sdsTest = signalDatastore("test",IncludeSubfolders=true);

Create label arrays that are in the same order as the signalDatastore files.

    labelsTrain = categorical(extractBetween(sdsTrain.Files,"ESC-10"+filesep,filesep));
    labelsTest = categorical(extractBetween(sdsTest.Files,"ESC-10"+filesep,filesep));

Create transformed datastores from the signal datastores to isolate and use only the desired features. You can use the output from info on the audioFeatureExtractor to map your chosen features to column indices in the features matrix. You can experiment with the example by choosing different features.

    featureIndices = info(afe)

    featureIndices = struct with fields:
              linearSpectrum: [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 … ]
                 melSpectrum: [663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694]
                barkSpectrum: [695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726]
                 erbSpectrum: [727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769]
                        mfcc: [770 771 772 773 774 775 776 777 778 779 780 781 782]
                   mfccDelta: [783 784 785 786 787 788 789 790 791 792 793 794 795]
              mfccDeltaDelta: [796 797 798 799 800 801 802 803 804 805 806 807 808]
                        gtcc: [809 810 811 812 813 814 815 816 817 818 819 820 821]
                   gtccDelta: [822 823 824 825 826 827 828 829 830 831 832 833 834]
              gtccDeltaDelta: [835 836 837 838 839 840 841 842 843 844 845 846 847]
            spectralCentroid: 848
               spectralCrest: 849
            spectralDecrease: 850
             spectralEntropy: 851
            spectralFlatness: 852
                spectralFlux: 853
            spectralKurtosis: 854
        spectralRolloffPoint: 855
            spectralSkewness: 856
               spectralSlope: 857
              spectralSpread: 858
                       pitch: 859
               harmonicRatio: 860
               zerocrossrate: 861
             shortTimeEnergy: 862

    idxToUse = [...
        featureIndices.harmonicRatio ...
        ,featureIndices.spectralRolloffPoint ...
        ,featureIndices.spectralFlux ...
        ,featureIndices.spectralSlope ...
        ];
    tdsTrain = transform(sdsTrain,@(x)x(:,idxToUse));
    tdsTest = transform(sdsTest,@(x)x(:,idxToUse));
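The column selection can be tried on a small matrix. This sketch uses a hypothetical index map shaped like the output of info and a hypothetical 6-by-5 feature matrix, then keeps only the chosen columns in the order they are listed.

```matlab
% Hypothetical index map, shaped like the struct returned by info(afe).
featureIndices = struct('mfcc',1:3,'spectralFlux',4,'harmonicRatio',5);

% Hypothetical feature matrix: 6 hops (rows) by 5 features (columns).
features = reshape(1:30,6,5);

% Concatenate the column indices of the features to keep.
idxToUse = [featureIndices.harmonicRatio, featureIndices.spectralFlux];

% Keep every hop, but only the selected feature columns.
selected = features(:,idxToUse);
```

The output column order follows idxToUse, not the original column order, so the same idxToUse must be applied to both the training and the test data.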

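Create an i-vector system that accepts feature input.

    soundClassifier = ivectorSystem(InputType="features");

Train the extractor and the classifier using the training set.

    trainExtractor(soundClassifier,tdsTrain,UBMNumComponents=128,TVSRank=64);

    Calculating standardization factors ....done.
    Training universal background model .....done.
    Training total variability space ...done.
    i-vector extractor training complete.

    trainClassifier(soundClassifier,tdsTrain,labelsTrain,NumEigenvectors=32,PLDANumIterations=0)

    Extracting i-vectors ...done.
    Training projection matrix .....done.
    i-vector classifier training complete.

Enroll the labels from the training set to create i-vector templates for each of the environmental sounds.

    enroll(soundClassifier,tdsTrain,labelsTrain)

    Extracting i-vectors ...done.
    Enrolling i-vectors .............done.
    Enrollment complete.

Calibrate the i-vector system.

    calibrate(soundClassifier,tdsTrain,labelsTrain)

    Extracting i-vectors ...done.
    Calibrating CSS scorer ...done.
    Calibration complete.

Use the identify function on the test set to return the system's inferred labels.

    inferredLabels = labelsTest;
    inferredLabels(:) = inferredLabels(1);
    for ii = 1:numel(labelsTest)
        features = read(tdsTest);
        tableOut = identify(soundClassifier,features,"css",NumCandidates=1);
        inferredLabels(ii) = tableOut.Label(1);
    end

Create a confusion matrix to visualize performance on the test set.

    uniqueLabels = unique(labelsTest);
    cm = zeros(numel(uniqueLabels),numel(uniqueLabels));
    for ii = 1:numel(uniqueLabels)
        for jj = 1:numel(uniqueLabels)
            cm(ii,jj) = sum((labelsTest==uniqueLabels(ii)) & (inferredLabels==uniqueLabels(jj)));
        end
    end
    labelStrings = replace(string(uniqueLabels),"_"," ");
    heatmap(labelStrings,labelStrings,cm)
    colorbar off
    ylabel("True Labels")
    xlabel("Predicted Labels")
    accuracy = mean(inferredLabels==labelsTest);
    title(sprintf("Accuracy = %0.2f %%",accuracy*100))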
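The nested loops above count label pairs one cell at a time. As an alternative sketch, not part of the original example, the same counts can be accumulated in one call with accumarray when the labels are expressed as integer class indices. The label values below are hypothetical.

```matlab
% Hypothetical class indices (1 to 3) for the true and predicted labels.
trueIdx = [1 1 2 2 3 3 3];
predIdx = [1 2 2 2 3 1 3];
numClasses = 3;

% Rows are true classes and columns are predicted classes.
cm = accumarray([trueIdx(:) predIdx(:)],1,[numClasses numClasses]);

% Correct predictions lie on the diagonal.
accuracy = sum(diag(cm))/sum(cm(:));

disp(cm)
```

For categorical labels, one way to obtain such integer indices is the third output of unique, applied to the combined set of true and predicted labels.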

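Release the i-vector system.

    release(soundClassifier)

Supporting Functions

    function writeFeatures(audioIn,info,~,afe)
        % Convert to single precision
        audioIn = single(audioIn);

        % Extract features
        features = extract(afe,audioIn);

        % Replace the file extension of the suggested output name with .mat
        filename = strrep(info.SuggestedOutputName,".wav",".mat");

        % Save the extracted features to the MAT file
        save(filename,"features")
    end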

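Train Acoustic Fault Recognition System

Download and unzip the air compressor data set [1]. This data set consists of recordings from air compressors in a healthy state or in one of seven faulty states.

    loc = matlab.internal.examples.downloadSupportFile("audio", ...
        "AirCompressorDataset/AirCompressorDataset.zip");
    unzip(loc,pwd)

Create an audioDatastore object to manage the data and split it into training and test sets.

    ads = audioDatastore(pwd,IncludeSubfolders=true,LabelSource="foldernames");
    [adsTrain,adsTest] = splitEachLabel(ads,0.8,0.2);

Read an audio file from the datastore and save the sample rate. Listen to the audio signal and plot the signal in the time domain.

    [x,fileInfo] = read(adsTrain);
    fs = fileInfo.SampleRate;

    sound(x,fs)

    t = (0:size(x,1)-1)/fs;
    plot(t,x)
    xlabel("Time (s)")
    title("State = " + string(fileInfo.Label))
    axis tight

Create an i-vector system with DetectSpeech set to false. Turn off the verbose behavior.

    faultRecognizer = ivectorSystem(SampleRate=fs,DetectSpeech=false, ...
        Verbose=false)

    faultRecognizer = 
      ivectorSystem with properties:

             InputType: 'audio'
            SampleRate: 16000
          DetectSpeech: 0
               Verbose: 0
        EnrolledLabels: [0×2 table]

Train the i-vector extractor and the i-vector classifier using the training datastore.

    trainExtractor(faultRecognizer,adsTrain, ...
        UBMNumComponents=80, ...
        UBMNumIterations=3, ...
        TVSRank=40, ...
        TVSNumIterations=3)

    trainClassifier(faultRecognizer,adsTrain,adsTrain.Labels, ...
        NumEigenvectors=7, ...
        PLDANumDimensions=32, ...
        PLDANumIterations=5)

Calibrate the scores output by faultRecognizer so that they can be interpreted as a measure of confidence in a positive decision. Turn the verbose behavior back on. Enroll all of the labels from the training set.

    calibrate(faultRecognizer,adsTrain,adsTrain.Labels)

    faultRecognizer.Verbose = true;

    enroll(faultRecognizer,adsTrain,adsTrain.Labels)

    Extracting i-vectors ...done.
    Enrolling i-vectors ...........done.
    Enrollment complete.

Use the read-only property EnrolledLabels to view the enrolled labels and the corresponding i-vector templates.

    faultRecognizer.EnrolledLabels

    ans=8×2 table
                       ivector        NumSamples
                     ____________     __________
        Bearing      {7×1 double}        180
        Flywheel     {7×1 double}        180
        Healthy      {7×1 double}        180
        LIV          {7×1 double}        180
        LOV          {7×1 double}        180
        NRV          {7×1 double}        180
        Piston       {7×1 double}        180
        Riderbelt    {7×1 double}        180

Use the identify function with the PLDA scorer to predict the condition of machines in the test set. The identify function returns a table of possible labels sorted in descending order of confidence.

    [audioIn,audioInfo] = read(adsTest);
    trueLabel = audioInfo.Label

    trueLabel = categorical
         Bearing 

    predictedLabels = identify(faultRecognizer,audioIn,"plda")

    predictedLabels=8×2 table
          Label        Score
        _________    __________
        Bearing         0.99997
        Flywheel      2.265e-05
        Piston       8.6076e-08
        LIV          1.4237e-15
        NRV          4.5529e-16
        Riderbelt    3.7359e-16
        LOV          6.3025e-19
        Healthy      4.2094e-30

By default, the identify function returns all possible candidate labels and their corresponding scores. Use NumCandidates to reduce the number of candidates returned.

    results = identify(faultRecognizer,audioIn,"plda",NumCandidates=3)

    results=3×2 table
         Label        Score
        ________    __________
        Bearing        0.99997
        Flywheel     2.265e-05
        Piston      8.6076e-08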
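The effect of limiting the number of candidates can be sketched with a plain sort. The labels and scores below are hypothetical, and the code only illustrates the ranking idea rather than reproducing what identify does internally.

```matlab
% Hypothetical labels and scores, in no particular order.
labels = {'Bearing','Flywheel','Healthy','Piston'};
scores = [0.90 0.06 0.01 0.03];
numCandidates = 3;

% Rank the candidates from most to least likely.
[sortedScores,order] = sort(scores,'descend');

% Keep only the top candidates.
topLabels = labels(order(1:numCandidates));
topScores = sortedScores(1:numCandidates);

disp(topLabels)
```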

References

[1] Verma, Nishchal K., et al. "Intelligent Condition Based Monitoring Using Acoustic Signals for Air Compressors." IEEE Transactions on Reliability, vol. 65, no. 1, Mar. 2016, pp. 291-309. DOI.org (Crossref), doi:10.1109/TR.2015.2459684.

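Train Speech Emotion Recognition System

Download the Berlin Database of Emotional Speech [1]. The database contains 535 utterances spoken by 10 actors intended to convey one of the following emotions: anger, boredom, disgust, anxiety/fear, happiness, sadness, or neutral. The emotions are text independent.

    url = "http://emodb.bilderbar.info/download/download.zip";
    downloadFolder = tempdir;
    datasetFolder = fullfile(downloadFolder,"Emo-DB");

    if ~exist(datasetFolder,"dir")
        disp("Downloading Emo-DB (40.5 MB) ...")
        unzip(url,datasetFolder)
    end

Create an audioDatastore that points to the audio files.

    ads = audioDatastore(fullfile(datasetFolder,"wav"));

The file names are codes indicating the speaker ID, the text spoken, the emotion, and the version. The website contains a key for interpreting the code and additional information about the speakers, such as gender and age. Create a table with the variables Speaker and Emotion. Decode the file names into the table.

    filepaths = ads.Files;
    emotionCodes = cellfun(@(x)x(end-5),filepaths,"UniformOutput",false);
    emotions = replace(emotionCodes,{'W','L','E','A','F','T','N'}, ...
        {'Anger','Boredom','Disgust','Anxiety','Happiness','Sadness','Neutral'});

    speakerCodes = cellfun(@(x)x(end-10:end-9),filepaths,"UniformOutput",false);
    labelTable = table(categorical(speakerCodes),categorical(emotions),VariableNames=["Speaker","Emotion"]);
    summary(labelTable)

    Variables:

        Speaker: 535×1 categorical

            Values:

                03       49   
                08       58   
                09       43   
                10       38   
                11       55   
                12       35   
                13       61   
                14       69   
                15       56   
                16       71   

        Emotion: 535×1 categorical

            Values:

                Anger          127   
                Anxiety         69   
                Boredom         81   
                Disgust         46   
                Happiness       71   
                Neutral         79   
                Sadness         62   

labelTable is in the same order as the files in the audioDatastore. Set the Labels property of the audioDatastore to labelTable.

    ads.Labels = labelTable;

Read a signal from the datastore and listen to it. Display the speaker ID and the emotion of the audio signal.

    [audioIn,audioInfo] = read(ads);
    fs = audioInfo.SampleRate;
    sound(audioIn,fs)
    audioInfo.Label

    ans=1×2 table
        Speaker     Emotion 
        _______    _________
          03       Happiness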

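Split the datastore into a training set and a test set. Assign two speakers to the test set and the remaining speakers to the training set.

    testSpeakerIdx = ads.Labels.Speaker=="12" | ads.Labels.Speaker=="13";
    adsTrain = subset(ads,~testSpeakerIdx);
    adsTest = subset(ads,testSpeakerIdx);

Read all of the training and test audio data into cell arrays. If your data fits in memory, training is usually faster when you pass cell arrays to the i-vector system rather than datastores.

    trainSet = readall(adsTrain);
    trainLabels = adsTrain.Labels.Emotion;
    testSet = readall(adsTest);
    testLabels = adsTest.Labels.Emotion;

Create an i-vector system that does not apply speech detection. When DetectSpeech is set to true (the default), only regions of detected speech are used to train the i-vector system. When DetectSpeech is set to false, the entire input audio is used to train the i-vector system. The usefulness of applying speech detection depends on the data input to the system.

    emotionRecognizer = ivectorSystem(SampleRate=fs,DetectSpeech=false)

    emotionRecognizer = 
      ivectorSystem with properties:

             InputType: 'audio'
            SampleRate: 16000
          DetectSpeech: 0
               Verbose: 1
        EnrolledLabels: [0×2 table]

Call trainExtractor using the training set.

    rng default
    trainExtractor(emotionRecognizer,trainSet, ...
        UBMNumComponents=256, ...
        UBMNumIterations=5, ...
        TVSRank=128, ...
        TVSNumIterations=5);

    Calculating standardization factors .....done.
    Training universal background model ........done.
    Training total variability space ........done.
    i-vector extractor training complete.

Copy the emotion recognition system for use later in the example.

    sentimentRecognizer = copy(emotionRecognizer);

Call trainClassifier using the training set.

    rng default
    trainClassifier(emotionRecognizer,trainSet,trainLabels, ...
        NumEigenvectors=32, ...
        PLDANumDimensions=16, ...
        PLDANumIterations=10);

    Extracting i-vectors ...done.
    Training projection matrix .....done.
    Training PLDA model .............done.
    i-vector classifier training complete.

Call calibrate using the training set. In practice, the calibration set should be different from the training set.

    calibrate(emotionRecognizer,trainSet,trainLabels)

    Extracting i-vectors ...done.
    Calibrating CSS scorer ...done.
    Calibrating PLDA scorer ...done.
    Calibration complete.

Enroll the training labels into the i-vector system.

    enroll(emotionRecognizer,trainSet,trainLabels)

    Extracting i-vectors ...done.
    Enrolling i-vectors ..........done.
    Enrollment complete.

You can use detectionErrorTradeoff as a quick sanity check on the performance of a multilabel closed-set classification system. However, detectionErrorTradeoff provides information more suitable to open-set binary classification problems, such as speaker verification tasks.

    detectionErrorTradeoff(emotionRecognizer,testSet,testLabels)

    Extracting i-vectors ...done.
    Scoring i-vector pairs ...done.
    Detection error tradeoff evaluation complete.

For a more detailed view of the i-vector system's performance in a multilabel closed-set application, you can use the identify function and create a confusion matrix. The confusion matrix enables you to identify which emotions are misidentified and what they are misidentified as. Use the supporting function plotConfusion to display the results.

    trueLabels = testLabels;
    predictedLabels = trueLabels;
    scorer = "plda";
    for ii = 1:numel(testSet)
        tableOut = identify(emotionRecognizer,testSet{ii},scorer);
        predictedLabels(ii) = tableOut.Label(1);
    end

    plotConfusion(trueLabels,predictedLabels)

Call info to inspect how emotionRecognizer was trained and evaluated.

    info(emotionRecognizer)

    i-vector system input
      Input feature vector length: 60
      Input data type: double

    trainExtractor
      Train signals: 439
      UBMNumComponents: 256
      UBMNumIterations: 5
      TVSRank: 128
      TVSNumIterations: 5

    trainClassifier
      Train signals: 439
      Train labels: Anger (103), Anxiety (56) ... and 5 more
      NumEigenvectors: 32
      PLDANumDimensions: 16
      PLDANumIterations: 10

    calibrate
      Calibration signals: 439
      Calibration labels: Anger (103), Anxiety (56) ... and 5 more

    detectionErrorTradeoff
      Evaluation signals: 96
      Evaluation labels: Anger (24), Anxiety (13) ... and 5 more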

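Next, modify the i-vector system to recognize emotions as positive, neutral, or negative. Update the labels to include only the categories negative, positive, and neutral.

    trainLabelsSentiment = trainLabels;
    trainLabelsSentiment(ismember(trainLabels,categorical(["Anger","Anxiety","Boredom","Sadness","Disgust"]))) = categorical("Negative");
    trainLabelsSentiment(ismember(trainLabels,categorical("Happiness"))) = categorical("Positive");
    trainLabelsSentiment = removecats(trainLabelsSentiment);

    testLabelsSentiment = testLabels;
    testLabelsSentiment(ismember(testLabels,categorical(["Anger","Anxiety","Boredom","Sadness","Disgust"]))) = categorical("Negative");
    testLabelsSentiment(ismember(testLabels,categorical("Happiness"))) = categorical("Positive");
    testLabelsSentiment = removecats(testLabelsSentiment);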
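The relabeling logic can be checked on a handful of labels. This sketch uses cell arrays of character vectors instead of categorical arrays so that it depends only on base functions. The label list is hypothetical.

```matlab
% Hypothetical emotion labels.
emotions = {'Anger','Happiness','Neutral','Sadness','Disgust','Happiness'};

% Emotions that map to the negative sentiment class.
negative = {'Anger','Anxiety','Boredom','Sadness','Disgust'};

% Start from the original labels so that Neutral is left unchanged.
sentiment = emotions;
sentiment(ismember(emotions,negative)) = {'Negative'};
sentiment(strcmp(emotions,'Happiness')) = {'Positive'};

disp(sentiment)
```

Both assignments index with masks computed from the original labels rather than the partially updated ones, so the order of the two assignments does not matter.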

Train the i-vector system classifier using the updated labels. You do not need to retrain the extractor. Recalibrate the system.

    rng default
    trainClassifier(sentimentRecognizer,trainSet,trainLabelsSentiment, ...
        NumEigenvectors=64, ...
        PLDANumDimensions=32, ...
        PLDANumIterations=10);

    Extracting i-vectors ...done.
    Training projection matrix .....done.
    Training PLDA model .............done.
    i-vector classifier training complete.

    calibrate(sentimentRecognizer,trainSet,trainLabels)

    Extracting i-vectors ...done.
    Calibrating CSS scorer ...done.
    Calibrating PLDA scorer ...done.
    Calibration complete.

Enroll the training labels into the system, and then plot the confusion matrix for the test set.

    enroll(sentimentRecognizer,trainSet,trainLabelsSentiment)

    Extracting i-vectors ...done.
    Enrolling i-vectors ...done.
    Enrollment complete.

    trueLabels = testLabelsSentiment;
    predictedLabels = trueLabels;
    scorer = "plda";
    for ii = 1:numel(testSet)
        tableOut = identify(sentimentRecognizer,testSet{ii},scorer);
        predictedLabels(ii) = tableOut.Label(1);
    end

    plotConfusion(trueLabels,predictedLabels)

An i-vector system does not require the labels used to train the classifier to be the same as the enrolled labels.

Unenroll the sentiment labels from the system, and then enroll the original emotion categories in the system. Analyze the system's classification performance.

    unenroll(sentimentRecognizer)
    enroll(sentimentRecognizer,trainSet,trainLabels)

    Extracting i-vectors ...done.
    Enrolling i-vectors ..........done.
    Enrollment complete.

    trueLabels = testLabels;
    predictedLabels = trueLabels;
    scorer = "plda";
    for ii = 1:numel(testSet)
        tableOut = identify(sentimentRecognizer,testSet{ii},scorer);
        predictedLabels(ii) = tableOut.Label(1);
    end

    plotConfusion(trueLabels,predictedLabels)

Supporting Functions

    function plotConfusion(trueLabels,predictedLabels)
    uniqueLabels = unique(trueLabels);
    cm = zeros(numel(uniqueLabels),numel(uniqueLabels));
    for ii = 1:numel(uniqueLabels)
        for jj = 1:numel(uniqueLabels)
            cm(ii,jj) = sum((trueLabels==uniqueLabels(ii)) & (predictedLabels==uniqueLabels(jj)));
        end
    end

    heatmap(uniqueLabels,uniqueLabels,cm)
    colorbar off
    ylabel("True Labels")
    xlabel("Predicted Labels")
    accuracy = mean(trueLabels==predictedLabels);
    title(sprintf("Accuracy = %0.2f %%",accuracy*100))
    end

References

[1] Burkhardt, F., A. Paeschke, M. Rolfes, W.F. Sendlmeier, and B. Weiss, "A Database of German Emotional Speech." In Proceedings Interspeech 2005. Lisbon, Portugal: International Speech Communication Association, 2005.

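Train Word Recognition System

An i-vector system consists of a trainable front end that learns how to extract i-vectors based on unlabeled data, and a trainable back end that learns how to classify i-vectors based on labeled data. In this example, you apply an i-vector system to the task of word recognition. First, evaluate the accuracy of the i-vector system using the classifiers included in a traditional i-vector system: probabilistic linear discriminant analysis (PLDA) and cosine similarity scoring (CSS). Next, evaluate the accuracy of the system if you replace the classifier with a bidirectional long short-term memory (BiLSTM) network or a k-nearest neighbors classifier.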
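As a small illustration of cosine similarity scoring, the sketch below compares a template vector with two test vectors. The 3-element vectors are hypothetical stand-ins for i-vectors, and the raw cosine value shown here is not the calibrated score that the i-vector system reports.

```matlab
% Hypothetical low-dimensional i-vectors.
template = [1; 2; 2];
testVec  = [2; 4; 4];     % same direction as the template
otherVec = [2; -1; 0];    % orthogonal to the template

% Cosine similarity: the dot product of the two vectors divided by
% the product of their lengths.
cosineScore = @(a,b) dot(a,b)/(norm(a)*norm(b));

scoreSame  = cosineScore(template,testVec);
scoreOther = cosineScore(template,otherVec);

fprintf('Same direction: %0.2f, orthogonal: %0.2f\n',scoreSame,scoreOther);
```

Because the score depends only on direction, scaling an i-vector does not change its cosine similarity to a template.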

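Create Training and Validation Sets

Download the Free Spoken Digit Dataset (FSDD) [1]. FSDD consists of short audio files with spoken digits (0-9).

    loc = matlab.internal.examples.downloadSupportFile("audio","FSDD.zip");
    unzip(loc,pwd)

Create an audioDatastore to point to the recordings. Get the sample rate of the data set.

    ads = audioDatastore(pwd,IncludeSubfolders=true);
    [~,adsInfo] = read(ads);
    fs = adsInfo.SampleRate;

The first element of each file name is the digit spoken in the file. Get the first element of the file names, convert them to categorical, and then set the Labels property of the audioDatastore.

    [~,filenames] = cellfun(@(x)fileparts(x),ads.Files,UniformOutput=false);
    ads.Labels = categorical(string(cellfun(@(x)x(1),filenames)));

To split the datastore into a development set and a validation set, use splitEachLabel. Allocate 80% of the data for development and the remaining 20% for validation.

    [adsTrain,adsValidation] = splitEachLabel(ads,0.8);

Evaluate Traditional i-vector Backend Performance

Create an i-vector system that expects audio input at a sample rate of 8 kHz and does not perform speech detection.

    wordRecognizer = ivectorSystem(DetectSpeech=false,SampleRate=fs)

    wordRecognizer = 
      ivectorSystem with properties:

             InputType: 'audio'
            SampleRate: 8000
          DetectSpeech: 0
               Verbose: 1
        EnrolledLabels: [0×2 table]

Train the i-vector extractor using the data in the training set.

    trainExtractor(wordRecognizer,adsTrain, ...
        UBMNumComponents=64, ...
        UBMNumIterations=5, ...
        TVSRank=32, ...
        TVSNumIterations=5);

    Calculating standardization factors ....done.
    Training universal background model ........done.
    Training total variability space ........done.
    i-vector extractor training complete.

Train the i-vector classifier using the data in the training set and the corresponding labels.

    trainClassifier(wordRecognizer,adsTrain,adsTrain.Labels, ...
        NumEigenvectors=10, ...
        PLDANumDimensions=10, ...
        PLDANumIterations=5);

    Extracting i-vectors ...done.
    Training projection matrix .....done.
    Training PLDA model ........done.
    i-vector classifier training complete.

Calibrate the scores output by wordRecognizer so that they can be interpreted as a measure of confidence in a positive decision. Enroll labels into the system using the entire training set.

    calibrate(wordRecognizer,adsTrain,adsTrain.Labels)

    Extracting i-vectors ...done.
    Calibrating CSS scorer ...done.
    Calibrating PLDA scorer ...done.
    Calibration complete.

    enroll(wordRecognizer,adsTrain,adsTrain.Labels)

    Extracting i-vectors ...done.
    Enrolling i-vectors .............done.
    Enrollment complete.

In a loop, read audio from the validation datastore, identify the most likely word present according to the specified scorer, and save the prediction for analysis.

    trueLabels = adsValidation.Labels;
    predictedLabels = trueLabels;

    reset(adsValidation)

    scorer = "plda";
    for ii = 1:numel(trueLabels)

        audioIn = read(adsValidation);

        to = identify(wordRecognizer,audioIn,scorer);

        predictedLabels(ii) = to.Label(1);

    end

Display a confusion chart of the i-vector system's performance on the validation set.

    figure(Units="normalized",Position=[0.2 0.2 0.5 0.5])
    confusionchart(trueLabels,predictedLabels, ...
        ColumnSummary="column-normalized", ...
        RowSummary="row-normalized", ...
        Title=sprintf('Accuracy = %0.2f (%%)',100*mean(predictedLabels==trueLabels)))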

Evaluate Deep Learning Backend Performance

Next, train a fully connected network using i-vectors as input.

    ivectorsTrain = (ivector(wordRecognizer,adsTrain))';
    ivectorsValidation = (ivector(wordRecognizer,adsValidation))';

Define a fully connected network.

    layers = [ ...
        featureInputLayer(size(ivectorsTrain,2),Normalization="none")
        fullyConnectedLayer(128)
        dropoutLayer(0.4)
        fullyConnectedLayer(256)
        dropoutLayer(0.4)
        fullyConnectedLayer(256)
        dropoutLayer(0.4)
        fullyConnectedLayer(128)
        dropoutLayer(0.4)
        fullyConnectedLayer(numel(unique(adsTrain.Labels)))
        softmaxLayer
        classificationLayer];

Define the training parameters.

    miniBatchSize = 256;
    validationFrequency = floor(numel(adsTrain.Labels)/miniBatchSize);
    options = trainingOptions("adam", ...
        MaxEpochs=10, ...
        MiniBatchSize=miniBatchSize, ...
        Plots="training-progress", ...
        Verbose=false, ...
        Shuffle="every-epoch", ...
        ValidationData={ivectorsValidation,adsValidation.Labels}, ...
        ValidationFrequency=validationFrequency);

Train the network.

    net = trainNetwork(ivectorsTrain,adsTrain.Labels,layers,options);

Evaluate the performance of the deep learning back end using a confusion chart.

    predictedLabels = classify(net,ivectorsValidation);
    trueLabels = adsValidation.Labels;

    figure(Units="normalized",Position=[0.2 0.2 0.5 0.5])
    confusionchart(trueLabels,predictedLabels, ...
        ColumnSummary="column-normalized", ...
        RowSummary="row-normalized", ...
        Title=sprintf('Accuracy = %0.2f (%%)',100*mean(predictedLabels==trueLabels)))

Evaluate KNN Backend Performance

Train and evaluate i-vectors with a k-nearest neighbor (KNN) back end.

Use fitcknn to train a KNN model.

    classificationKNN = fitcknn(...
        ivectorsTrain, ...
        adsTrain.Labels, ...
        Distance="Euclidean", ...
        Exponent=[], ...
        NumNeighbors=10, ...
        DistanceWeight="SquaredInverse", ...
        Standardize=true, ...
        ClassNames=unique(adsTrain.Labels));

Evaluate the KNN back end.

    predictedLabels = predict(classificationKNN,ivectorsValidation);
    trueLabels = adsValidation.Labels;

    figure(Units="normalized",Position=[0.2 0.2 0.5 0.5])
    confusionchart(trueLabels,predictedLabels, ...
        ColumnSummary="column-normalized", ...
        RowSummary="row-normalized", ...
        Title=sprintf('Accuracy = %0.2f (%%)',100*mean(predictedLabels==trueLabels)))

References

[1] Jakobovski. "Jakobovski/Free-Spoken-Digit-Dataset." GitHub, May 30, 2019. https://github.com/Jakobovski/free-spoken-digit-dataset.


Version History

Introduced in R2021a

See Also

audioDatastore | audioFeatureExtractor | audioDataAugmenter | speakerRecognition
