audioFeatureExtractor

Streamline audio feature extraction

Description

audioFeatureExtractor encapsulates multiple audio feature extractors into a streamlined and modular implementation.

Creation

Description

aFE = audioFeatureExtractor() creates an audio feature extractor with default property values.

aFE = audioFeatureExtractor(Name,Value) specifies nondefault properties for aFE using one or more name-value pair arguments.

Properties

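Main Properties

Analysis window, specified as a real vector.

Data Types: single | double

Overlap length of adjacent analysis windows, specified as an integer in the range [0, numel(Window)).

Data Types: single | double

FFT length, specified as an integer. The default, [], means that the FFT length is equal to the window length (numel(Window)).

Data Types: single | double

Input sample rate in Hz, specified as a nonnegative scalar.

Data Types: single | double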
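The main properties are usually set together at creation. The following is a minimal sketch, assuming a 16 kHz signal and the property names Window, OverlapLength, FFTLength, and SampleRate as they appear in the object display later on this page. The durations and FFT length are illustrative choices, not recommended values.

```matlab
% Hypothetical 16 kHz configuration; all numeric values are illustrative.
fs = 16e3;
win = hamming(round(0.025*fs),"periodic");   % 25 ms analysis window (400 samples)
overlapLength = round(0.015*fs);             % 15 ms overlap (240 samples)

aFE = audioFeatureExtractor( ...
    "SampleRate",fs, ...
    "Window",win, ...
    "OverlapLength",overlapLength, ...
    "FFTLength",512, ...                     % FFT longer than the 400-sample window
    "linearSpectrum",true);

% The hop between successive windows follows from the two lengths.
hopLength = numel(aFE.Window) - aFE.OverlapLength;   % 160 samples (10 ms)
```

Because OverlapLength must lie in [0, numel(Window)), the hop length is always at least one sample.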

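Input to spectral descriptors, specified as "linearSpectrum", "melSpectrum", "barkSpectrum", or "erbSpectrum".

The spectral descriptors affected by this property are spectralCentroid, spectralCrest, spectralDecrease, spectralEntropy, spectralFlatness, spectralFlux, spectralKurtosis, spectralRolloffPoint, spectralSkewness, spectralSlope, and spectralSpread.

The spectrum input to the spectral descriptors is the same as the output from the corresponding feature (linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum).

For example, if you set "SpectralDescriptorInput" to "barkSpectrum", and "spectralCentroid" to true, then aFE returns the centroid of the default Bark spectrum.

[audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');
aFE = audioFeatureExtractor("SampleRate",fs, ...
    "SpectralDescriptorInput","barkSpectrum", ...
    "spectralCentroid",true);
barkSpectralCentroid = extract(aFE,audioIn);

If you specify a nondefault barkSpectrum using setExtractorParams, then the nondefault Bark spectrum is the input to the spectral descriptors. For example, if you call setExtractorParams(aFE,"barkSpectrum","NumBands",40), then aFE returns the centroid of the 40-band Bark spectrum.

setExtractorParams(aFE,"barkSpectrum","NumBands",40)
bark40SpectralCentroid = extract(aFE,audioIn);

Data Types: char | string

This property is read-only.

Total number of features output from extract for the current object configuration, specified as a positive integer. FeatureVectorLength is equal to the second dimension of the output from the extract function.

Data Types: single | double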
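The relationship between FeatureVectorLength and the output of extract can be checked directly. This is a small sketch under assumptions: the input is synthetic noise rather than a real recording, and the choice of enabled features is arbitrary.

```matlab
% Enable two features; the default sample rate is used.
aFE = audioFeatureExtractor("mfcc",true,"pitch",true);

audioIn = randn(44100,1);            % synthetic mono input, for illustration only
features = extract(aFE,audioIn);     % numHops-by-numFeatures matrix

% The read-only property reports the width of the feature matrix.
numFeatures = aFE.FeatureVectorLength;
isConsistent = (size(features,2) == numFeatures);
```

Reading FeatureVectorLength before calling extract is a convenient way to preallocate buffers or size the input layer of a downstream model.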

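Features to Extract

Extract the one-sided linear spectrum, specified as true or false.

To set parameters of the linear spectrum extraction, use setExtractorParams:

setExtractorParams(aFE,"linearSpectrum",Name,Value)

Settable parameters for the linear spectrum extraction are:

  • "FrequencyRange" - Frequency range of the extracted spectrum in Hz, specified as the comma-separated pair consisting of "FrequencyRange" and a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].

  • "SpectrumType" - Spectrum type, specified as the comma-separated pair consisting of "SpectrumType" and "power" or "magnitude". If unspecified, SpectrumType defaults to "power".

  • "WindowNormalization" - Apply window normalization, specified as the comma-separated pair consisting of "WindowNormalization" and true or false. If unspecified, WindowNormalization defaults to true.

Data Types: logical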

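Extract the one-sided mel spectrum, specified as true or false.

To set parameters of the mel spectrum extraction, use setExtractorParams:

setExtractorParams(aFE,"melSpectrum",Name,Value)

Settable parameters for the mel spectrum extraction are:

  • "FrequencyRange" - Frequency range of the extracted spectrum in Hz, specified as the comma-separated pair consisting of "FrequencyRange" and a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].

  • "SpectrumType" - Spectrum type, specified as the comma-separated pair consisting of "SpectrumType" and "power" or "magnitude". If unspecified, SpectrumType defaults to "power".

  • "NumBands" - Number of mel bands, specified as the comma-separated pair consisting of "NumBands" and an integer. If unspecified, NumBands defaults to 32.

  • "FilterBankNormalization" - Normalization applied to bandpass filters, specified as the comma-separated pair consisting of "FilterBankNormalization" and "bandwidth", "area", or "none". If unspecified, FilterBankNormalization defaults to "bandwidth".

  • "WindowNormalization" - Apply window normalization, specified as the comma-separated pair consisting of "WindowNormalization" and true or false. If unspecified, WindowNormalization defaults to true.

  • "FilterBankDesignDomain" - Domain in which the filter bank is designed, specified as the comma-separated pair consisting of "FilterBankDesignDomain" and either "linear" or "warped". If unspecified, FilterBankDesignDomain defaults to "linear".

Data Types: logical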
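The filter-bank parameters listed above are applied with a single setExtractorParams call. The sketch below assumes a 16 kHz sample rate and synthetic input; the band count and frequency range are arbitrary illustrative values. The same pattern applies to the Bark and ERB spectra.

```matlab
% Enable the mel spectrum, then override three of its parameters.
aFE = audioFeatureExtractor("SampleRate",16e3,"melSpectrum",true);
setExtractorParams(aFE,"melSpectrum", ...
    "NumBands",64, ...                 % default is 32
    "FrequencyRange",[50 8000], ...    % must lie within [0, SampleRate/2]
    "SpectrumType","magnitude");       % default is "power"

audioIn = randn(16e3,1);               % one second of synthetic noise
melSpec = extract(aFE,audioIn);        % one column per mel band
numBandsOut = size(melSpec,2);
```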

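Extract the one-sided Bark spectrum, specified as true or false.

To set parameters of the Bark spectrum extraction, use setExtractorParams:

setExtractorParams(aFE,"barkSpectrum",Name,Value)

Settable parameters for the Bark spectrum extraction are:

  • "FrequencyRange" - Frequency range of the extracted spectrum in Hz, specified as the comma-separated pair consisting of "FrequencyRange" and a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].

  • "SpectrumType" - Spectrum type, specified as the comma-separated pair consisting of "SpectrumType" and "power" or "magnitude". If unspecified, SpectrumType defaults to "power".

  • "NumBands" - Number of Bark bands, specified as the comma-separated pair consisting of "NumBands" and an integer. If unspecified, NumBands defaults to 32.

  • "FilterBankNormalization" - Normalization applied to bandpass filters, specified as the comma-separated pair consisting of "FilterBankNormalization" and "bandwidth", "area", or "none". If unspecified, FilterBankNormalization defaults to "bandwidth".

  • "WindowNormalization" - Apply window normalization, specified as the comma-separated pair consisting of "WindowNormalization" and true or false. If unspecified, WindowNormalization defaults to true.

  • "FilterBankDesignDomain" - Domain in which the filter bank is designed, specified as the comma-separated pair consisting of "FilterBankDesignDomain" and either "linear" or "warped". If unspecified, FilterBankDesignDomain defaults to "linear".

Data Types: logical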

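Extract the one-sided ERB spectrum, specified as true or false.

To set parameters of the ERB spectrum extraction, use setExtractorParams:

setExtractorParams(aFE,"erbSpectrum",Name,Value)

Settable parameters for the ERB spectrum extraction are:

  • "FrequencyRange" - Frequency range of the extracted spectrum in Hz, specified as the comma-separated pair consisting of "FrequencyRange" and a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].

  • "SpectrumType" - Spectrum type, specified as the comma-separated pair consisting of "SpectrumType" and "power" or "magnitude". If unspecified, SpectrumType defaults to "power".

  • "NumBands" - Number of ERB bands, specified as the comma-separated pair consisting of "NumBands" and an integer. If unspecified, NumBands defaults to ceil(hz2erb(FrequencyRange(2)) - hz2erb(FrequencyRange(1))).

  • "FilterBankNormalization" - Normalization applied to bandpass filters, specified as the comma-separated pair consisting of "FilterBankNormalization" and "bandwidth", "area", or "none". If unspecified, FilterBankNormalization defaults to "bandwidth".

  • "WindowNormalization" - Apply window normalization, specified as the comma-separated pair consisting of "WindowNormalization" and true or false. If unspecified, WindowNormalization defaults to true.

Data Types: logical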

Extract mel-frequency cepstral coefficients (MFCC), specified as true or false.

To set parameters of the MFCC extraction, use setExtractorParams:

setExtractorParams(aFE,"mfcc",Name,Value)

Settable parameters for the MFCC extraction are:

  • "NumCoeffs" - Number of coefficients returned for each window, specified as the comma-separated pair consisting of "NumCoeffs" and a positive integer. If unspecified, NumCoeffs defaults to 13.

  • "DeltaWindowLength" - Delta window length, specified as the comma-separated pair consisting of "DeltaWindowLength" and an odd integer greater than 2. If unspecified, DeltaWindowLength defaults to 9. This parameter affects the mfccDelta and mfccDeltaDelta features.

  • "Rectification" - Type of nonlinear rectification, specified as the comma-separated pair consisting of "Rectification" and "log" or "cubic-root".

The mel-frequency cepstral coefficients are calculated using the melSpectrum.

Data Types: logical
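Because the delta features inherit the MFCC parameters, changing NumCoeffs changes the width of both outputs. The following sketch illustrates this with synthetic input; the coefficient count and delta window length are arbitrary example values.

```matlab
% Enable MFCC and its delta; parameters set on "mfcc" apply to both.
aFE = audioFeatureExtractor("mfcc",true,"mfccDelta",true);
setExtractorParams(aFE,"mfcc", ...
    "NumCoeffs",20, ...               % default is 13
    "DeltaWindowLength",5);           % odd integer greater than 2; default is 9

audioIn = randn(44100,1);             % one second of synthetic noise
features = extract(aFE,audioIn);

% info maps each enabled feature to its columns in the output matrix.
idx = info(aFE);
numMfccCols = numel(idx.mfcc);
numDeltaCols = numel(idx.mfccDelta);
```

Indexing with idx.mfcc and idx.mfccDelta, rather than hard-coded column ranges, keeps downstream code valid when the parameters change.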

Extract delta of MFCC, specified as true or false.

The delta MFCC is calculated based on the extracted MFCC. Parameters set on mfcc affect mfccDelta.

Data Types: logical

Extract delta-delta of MFCC, specified as true or false.

The delta-delta MFCC is calculated based on the extracted MFCC. Parameters set on mfcc affect mfccDeltaDelta.

Data Types: logical

Extract gammatone cepstral coefficients (GTCC), specified as true or false.

To set parameters of the GTCC extraction, use setExtractorParams:

setExtractorParams(aFE,"gtcc",Name,Value)

Settable parameters for the GTCC extraction are:

  • "NumCoeffs" - Number of coefficients returned for each window, specified as the comma-separated pair consisting of "NumCoeffs" and a positive integer. If unspecified, NumCoeffs defaults to 13.

  • "DeltaWindowLength" - Delta window length, specified as the comma-separated pair consisting of "DeltaWindowLength" and an odd integer greater than 2. If unspecified, DeltaWindowLength defaults to 9. This parameter affects the gtccDelta and gtccDeltaDelta features.

  • "Rectification" - Type of nonlinear rectification, specified as the comma-separated pair consisting of "Rectification" and "log" or "cubic-root".

The gammatone cepstral coefficients are calculated using the erbSpectrum.

Data Types: logical

Extract delta of GTCC, specified as true or false.

The delta GTCC is calculated based on the extracted GTCC. Parameters set on gtcc affect gtccDelta.

Data Types: logical

Extract delta-delta of GTCC, specified as true or false.

The delta-delta GTCC is calculated based on the extracted GTCC. Parameters set on gtcc affect gtccDeltaDelta.

Data Types: logical

Extract spectral centroid, specified as true or false.

The spectral centroid is calculated on the spectral representation specified by the SpectralDescriptorInput property.

Data Types: logical

Extract spectral crest, specified as true or false.

The spectral crest is calculated on the spectral representation specified by the SpectralDescriptorInput property.

Data Types: logical

Extract spectral decrease, specified as true or false.

The spectral decrease is calculated on the spectral representation specified by the SpectralDescriptorInput property.

Data Types: logical

Extract spectral entropy, specified as true or false.

The spectral entropy is calculated on the spectral representation specified by the SpectralDescriptorInput property.

Data Types: logical

Extract spectral flatness, specified as true or false.

The spectral flatness is calculated on the spectral representation specified by the SpectralDescriptorInput property.

Data Types: logical

Extract spectral flux, specified as true or false.

The spectral flux is calculated on the spectral representation specified by the SpectralDescriptorInput property.

To set parameters of the spectral flux extraction, use setExtractorParams:

setExtractorParams(aFE,"spectralFlux",Name,Value)

Settable parameters for the spectral flux extraction are:

  • "NormType" - Norm type used to calculate the spectral flux, specified as the comma-separated pair consisting of "NormType" and 1 or 2. If unspecified, NormType defaults to 2.

Data Types: logical

Extract spectral kurtosis, specified as true or false.

The spectral kurtosis is calculated on the spectral representation specified by the SpectralDescriptorInput property.

Data Types: logical

Extract spectral rolloff point, specified as true or false.

The spectral rolloff point is calculated on the spectral representation specified by the SpectralDescriptorInput property.

To set parameters of the spectral rolloff point extraction, use setExtractorParams:

setExtractorParams(aFE,"spectralRolloffPoint",Name,Value)

Settable parameters for the spectral rolloff point extraction are:

  • "Threshold" - Threshold of the rolloff point, specified as the comma-separated pair consisting of "Threshold" and a scalar in the range (0,1). If unspecified, Threshold defaults to 0.95.

Data Types: logical
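Spectral flux and spectral rolloff point are the only spectral descriptors on this page with their own settable parameters. The sketch below sets both, computing the descriptors on the mel spectrum. The input is synthetic noise and the parameter values are illustrative assumptions, not recommendations.

```matlab
% Enable two spectral descriptors computed on the mel spectrum.
aFE = audioFeatureExtractor( ...
    "SpectralDescriptorInput","melSpectrum", ...
    "spectralFlux",true, ...
    "spectralRolloffPoint",true);

% Nondefault descriptor parameters (values chosen for illustration only).
setExtractorParams(aFE,"spectralFlux","NormType",1);              % default is 2
setExtractorParams(aFE,"spectralRolloffPoint","Threshold",0.85);  % default is 0.95

audioIn = randn(44100,1);            % one second of synthetic noise
features = extract(aFE,audioIn);     % one column per enabled descriptor
idx = info(aFE);                     % column index of each descriptor
```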

Extract spectral skewness, specified as true or false.

The spectral skewness is calculated on the spectral representation specified by the SpectralDescriptorInput property.

Data Types: logical

Extract spectral slope, specified as true or false.

The spectral slope is calculated on the spectral representation specified by the SpectralDescriptorInput property.

Data Types: logical

Extract spectral spread, specified as true or false.

The spectral spread is calculated on the spectral representation specified by the SpectralDescriptorInput property.

Data Types: logical

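Extract pitch, specified as true or false.

To set parameters of the pitch extraction, use setExtractorParams:

setExtractorParams(aFE,"pitch",Name,Value)

Settable parameters for the pitch extraction are:

  • "Method" - Method used to calculate the pitch, specified as the comma-separated pair consisting of "Method" and "PEF", "NCF", "CEP", "LHS", or "SRH". If unspecified, Method defaults to "NCF". For a description of the available pitch extraction methods, see pitch.

  • "Range" - Range within which to search for the pitch in Hz, specified as the comma-separated pair consisting of "Range" and a two-element row vector of increasing values. If unspecified, Range defaults to [50,400].

  • "MedianFilterLength" - Median filter length used to smooth pitch estimates over time, specified as the comma-separated pair consisting of "MedianFilterLength" and a positive integer. If unspecified, MedianFilterLength defaults to 1 (no median filtering).

Data Types: logical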
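The three pitch parameters can be combined in one call. This is a minimal sketch assuming a synthetic 220 Hz tone sampled at 16 kHz; the method, search range, and filter length are illustrative, and the accuracy of the estimate depends on the method and the signal.

```matlab
% Synthetic test tone (an assumption for illustration, not a real recording).
fs = 16e3;
t = (0:fs-1)'/fs;
audioIn = sin(2*pi*220*t);

aFE = audioFeatureExtractor("SampleRate",fs,"pitch",true);
setExtractorParams(aFE,"pitch", ...
    "Method","PEF", ...               % default is "NCF"
    "Range",[60 500], ...             % search range in Hz; default is [50,400]
    "MedianFilterLength",3);          % smooth estimates over 3 hops

f0 = extract(aFE,audioIn);            % one pitch estimate per hop, in Hz
```

A median filter length greater than 1 trades time resolution for robustness against isolated estimation errors.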

Extract harmonic ratio, specified as true or false.

Data Types: logical

Object Functions

extract                  Extract audio features
setExtractorParams       Set nondefault parameter values for individual feature extractors
info                     Output mapping and individual feature extractor parameters
generateMATLABFunction   Create MATLAB function compatible with C/C++ code generation

Examples

Read in an audio signal.

[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");

Create an audioFeatureExtractor object that extracts the MFCC, delta MFCC, delta-delta MFCC, pitch, and spectral centroid of an audio signal. Use a 30 ms analysis window with 20 ms overlap.

aFE = audioFeatureExtractor( ...
    "SampleRate",fs, ...
    "Window",hamming(round(0.03*fs),"periodic"), ...
    "OverlapLength",round(0.02*fs), ...
    "mfcc",true, ...
    "mfccDelta",true, ...
    "mfccDeltaDelta",true, ...
    "pitch",true, ...
    "spectralCentroid",true);

Call extract to extract the audio features from the audio signal.

features = extract(aFE,audioIn);

Use info to determine which column of the feature extraction matrix corresponds to the requested pitch extraction.

idx = info(aFE)

idx = struct with fields:
                mfcc: [1 2 3 4 5 6 7 8 9 10 11 12 13]
           mfccDelta: [14 15 16 17 18 19 20 21 22 23 24 25 26]
      mfccDeltaDelta: [27 28 29 30 31 32 33 34 35 36 37 38 39]
    spectralCentroid: 40
               pitch: 41

Plot the detected pitch over time.

t = linspace(0,size(audioIn,1)/fs,size(features,1));
plot(t,features(:,idx.pitch))
title('Pitch')
xlabel('Time (s)')
ylabel('Frequency (Hz)')

Figure contains an axes object. The axes object with title Pitch contains an object of type line.

Create an audio datastore that points to audio samples included with Audio Toolbox®.

folder = fullfile(matlabroot,'toolbox','audio','samples');
ads = audioDatastore(folder);

Find all files that correspond to a sample rate of 44.1 kHz and then subset the datastore.

keepFile = cellfun(@(x)contains(x,'44p1'),ads.Files);
ads = subset(ads,keepFile);

Convert the data to a tall array. tall arrays are evaluated only when you request them explicitly using gather. MATLAB® automatically optimizes the queued calculations by minimizing the number of passes through the data. If you have Parallel Computing Toolbox™, you can spread the calculations across multiple machines. The audio data is represented as an M-by-1 tall cell array, where M is the number of files in the audio datastore.

adsTall = tall(ads)

Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 6).

adsTall =

  M×1 tall cell array

    { 539648×1 double}
    { 227497×1 double}
    {   8000×1 double}
    { 685056×1 double}
    { 882688×2 double}
    {1115760×2 double}
    { 505200×2 double}
    {3195904×2 double}
        :        :
        :        :

Create an audioFeatureExtractor object to extract the mel spectrum, Bark spectrum, ERB spectrum, and linear spectrum from each audio file. Use the default analysis window and overlap length for the spectrum extraction.

aFE = audioFeatureExtractor('SampleRate',44.1e3, ...
    'melSpectrum',true, ...
    'barkSpectrum',true, ...
    'erbSpectrum',true, ...
    'linearSpectrum',true);

Define a cellfun function so that audio features are extracted from each cell of the tall array. Call gather to evaluate the tall array.

specsTall = cellfun(@(x)extract(aFE,x),adsTall,"UniformOutput",false);
specs = gather(specsTall);

Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 12 sec
Evaluation completed in 12 sec

The specs variable returned from gather is a numFiles-by-1 cell array, where numFiles is the number of files in the datastore. Each element of the cell array is a numHops-by-numFeatures-by-numChannels array, where the number of hops and the number of channels depend on the length and the number of channels of the audio file, and the number of features is the number of features requested from the audio data.

numFiles = numel(specs)
numFiles = 12

[numHops1,numFeaturesFile1,numChannelsFile1] = size(specs{1})
numHops1 = 1053
numFeaturesFile1 = 620
numChannelsFile1 = 1

[numHops2,numFeaturesFile2,numChannelsFile2] = size(specs{2})
numHops2 = 443
numFeaturesFile2 = 620
numChannelsFile2 = 1
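The sizes reported above can be reproduced by hand. The sketch below assumes that the extractor keeps only complete analysis windows (dropping any trailing partial frame) and uses the default 1024-sample window and 512-sample overlap. Both assumptions are consistent with the displayed output but are inferred here rather than stated by the page.

```matlab
% Default framing, as shown in the object display (assumed here).
winLength = 1024;
overlapLength = 512;
hopLength = winLength - overlapLength;

% Lengths of the first two files in adsTall.
numSamples = [539648 227497];
numHops = floor((numSamples - winLength)/hopLength) + 1;

% Feature count: one-sided linear spectrum plus the three auditory spectra.
numLinearBins = winLength/2 + 1;                  % FFT length equals window length
numMel = 32;                                      % default NumBands
numBark = 32;                                     % default NumBands
numERB = ceil(hz2erb(44.1e3/2) - hz2erb(0));      % default ERB NumBands formula
numFeatures = numLinearBins + numMel + numBark + numERB;
```

Under these assumptions, numHops matches numHops1 and numHops2, and numFeatures matches the 620 columns reported for each file.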

Algorithms

audioFeatureExtractor creates a feature extraction pipeline based on your selected features. To reduce computations, audioFeatureExtractor reuses intermediary representations. Some intermediary representations can also be output as features.

For example, to create an object that extracts the centroid of the Bark spectrum, the flux of the Bark spectrum, the pitch, the harmonic ratio, and the delta-delta of the MFCC, specify the audioFeatureExtractor as:

aFE = audioFeatureExtractor( ...
    "SpectralDescriptorInput","barkSpectrum", ...
    "spectralCentroid",true, ...
    "spectralFlux",true, ...
    "pitch",true, ...
    "harmonicRatio",true, ...
    "mfccDeltaDelta",true)

aFE = 
  audioFeatureExtractor with properties:

   Properties
                     Window: [1024×1 double]
              OverlapLength: 512
                 SampleRate: 44100
                  FFTLength: []
    SpectralDescriptorInput: 'barkSpectrum'

   Enabled Features
     mfccDeltaDelta, spectralCentroid, spectralFlux, pitch, harmonicRatio

   Disabled Features
     linearSpectrum, melSpectrum, barkSpectrum, erbSpectrum, mfcc, mfccDelta
     gtcc, gtccDelta, gtccDeltaDelta, spectralCrest, spectralDecrease, spectralEntropy
     spectralFlatness, spectralKurtosis, spectralRolloffPoint, spectralSkewness, spectralSlope, spectralSpread

   To extract a feature, set the corresponding property to true.
   For example, obj.mfcc = true, adds mfcc to the list of enabled features.

This configuration corresponds to the highlighted feature extraction pipeline:

[Figure: feature extraction pipeline with the selected features highlighted]

Note

Because audioFeatureExtractor reuses intermediary representations, the features output from audioFeatureExtractor might not correspond to the default configuration of features output from the corresponding individual feature extractors.

Compatibility Considerations

Behavior changed in R2020b

Extended Capabilities

Introduced in R2019b