Audio Processing Using Deep Learning

Extend deep learning workflows with audio and speech processing applications

Apply deep learning to audio and speech processing applications by using Deep Learning Toolbox™ together with Audio Toolbox™. For signal processing applications, see Signal Processing Using Deep Learning. For applications in wireless communications, see Wireless Communications Using Deep Learning.

Apps

Audio Labeler Define and visualize ground-truth labels

Functions

audioDatastore Datastore for collection of audio files
audioDataAugmenter Augment audio data
audioFeatureExtractor Streamline audio feature extraction
ivectorSystem Create i-vector system
classifySound Classify sounds in audio signal
crepe CREPE neural network
crepePreprocess Preprocess audio for CREPE deep learning network
crepePostprocess Postprocess output of CREPE deep learning network
openl3 OpenL3 neural network
openl3Features Extract OpenL3 features
openl3Preprocess Preprocess audio for OpenL3 feature extraction
pitchnn Estimate pitch with deep learning neural network
vggish VGGish neural network
vggishFeatures Extract VGGish features
vggishPreprocess Preprocess audio for VGGish feature extraction
yamnet YAMNet neural network
yamnetGraph Graph of YAMNet AudioSet ontology
yamnetPreprocess Preprocess audio for YAMNet classification
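
As a minimal sketch of how one of the functions listed above is typically used, the following MATLAB snippet extracts features from a signal with audioFeatureExtractor. The synthetic test tone, the 16 kHz sample rate, the window and overlap lengths, and the choice of features (MFCCs and spectral centroid) are illustrative assumptions and are not taken from this page. The hann window function is assumed to be available from Signal Processing Toolbox.

```matlab
% Assumed input: a one-second 440 Hz tone stands in for real audio.
fs = 16e3;                      % sample rate in Hz (illustrative choice)
t = (0:fs-1)'/fs;               % column vector of time stamps
x = 0.5*sin(2*pi*440*t);

% Configure the extractor; window and overlap values are illustrative.
afe = audioFeatureExtractor( ...
    "SampleRate",fs, ...
    "Window",hann(512,"periodic"), ...
    "OverlapLength",256, ...
    "mfcc",true, ...
    "spectralCentroid",true);

features = extract(afe,x);      % one row per analysis frame, one column per feature value
idx = info(afe);                % struct mapping each enabled feature to its columns

mfccs = features(:,idx.mfcc);   % select the MFCC columns by name
```

The resulting feature matrix could then be used as a sequence input to a network. For a collection of files rather than a single signal, an audioDatastore would usually supply the audio instead of the synthetic tone used here.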

Topics

Introduction to Deep Learning for Audio Applications (Audio Toolbox)

Learn common tools and workflows to apply deep learning to audio applications.

Classify Sound Using Deep Learning (Audio Toolbox)

Train, validate, and test a simple long short-term memory (LSTM) network to classify sounds.

Transfer Learning with Pretrained Audio Networks

Use transfer learning to retrain YAMNet, a pretrained convolutional neural network (CNN), to classify a new set of audio signals.

Speaker Identification Using Custom SincNet Layer and Deep Learning

Perform speaker identification using a custom deep learning layer that implements a mel-scale filter bank.

Dereverberate Speech Using Deep Learning Networks

Train a deep learning model that removes reverberation from speech.

Speech Command Recognition in Simulink

Detect the presence of speech commands in audio using a Simulink® model.

Spoken Digit Recognition with Wavelet Scattering and Deep Learning

This example shows how to classify spoken digits using both machine learning and deep learning techniques.

Cocktail Party Source Separation Using Deep Learning Networks

This example shows how to isolate a speech signal using a deep learning network.

Sequential Feature Selection for Audio Features

This example shows a typical workflow for feature selection applied to the task of spoken digit recognition.

Learn Pre-Emphasis Filter Using Deep Learning

Use a convolutional deep network to learn a pre-emphasis filter for speech recognition.

Featured Examples