Jumpstart your DCASE Challenge 2021 using MATLAB

Posted by Johanna Pingel, June 28, 2021


The following post is from Brian Hemmat, audio signal processing developer at MathWorks.

The Detection and Classification of Acoustic Scenes and Events (DCASE) community organizes a yearly workshop and series of challenges that advance the state of the art in computational scene and event analysis by bringing together researchers from both academic and industrial backgrounds.

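Each year, new and updated datasets and competitions are released, exploring different applications, requirements, and goals. This year, the DCASE 2021 Task 1A challenge is to perform low-complexity acoustic scene classification that works across a variety of recording devices, from studio-quality microphones to smartphones and cameras. The goal is to classify audio into one of 10 acoustic scenes, such as airport, metro station, traveling by tram, or public square. Samples were collected in cities such as Prague, Paris, and Barcelona.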

Creating the baseline in MATLAB

The official baseline for Task 1A was released in Python, using TensorFlow for deep learning and the provided dcase_util utility toolbox for preprocessing.

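I re-implemented the baseline in MATLAB. The MATLAB implementation is contained in a single script, which makes it easy for non-experts to explore the data, understand the baseline implementation, and modify it for their own submissions. Audio Toolbox in MATLAB provides functions and apps to extract audio features (audioFeatureExtractor) and augment data (audioDataAugmenter), making it easy to explore modifications to the system.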
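As a rough sketch of how these two objects can be used, the snippet below extracts a mel spectrum from a synthetic signal and creates two augmented copies of it. The sample rate, window length, overlap, and augmentation probabilities are illustrative assumptions, not the settings used in the baseline script, and the noise signal stands in for a real recording.

```matlab
% Illustrative settings (assumptions, not the baseline's values).
fs = 44100;                          % sample rate in Hz
x  = randn(fs,1);                    % one second of noise standing in for a recording

windowLength  = 2048;                % analysis window length in samples
overlapLength = 1024;                % overlap between consecutive windows

% Configure an extractor that returns only the mel spectrum.
afe = audioFeatureExtractor( ...
    'SampleRate',fs, ...
    'Window',hann(windowLength,'periodic'), ...
    'OverlapLength',overlapLength, ...
    'melSpectrum',true);

% extract returns one row per analysis hop and one column per mel band.
features = extract(afe,x);
logMel = log10(features + eps);      % log compression, a common choice for CNN input

% Create two randomly augmented copies using only volume changes and added noise.
augmenter = audioDataAugmenter( ...
    'NumAugmentations',2, ...
    'TimeStretchProbability',0, ...
    'PitchShiftProbability',0, ...
    'TimeShiftProbability',0, ...
    'VolumeControlProbability',1, ...
    'AddNoiseProbability',0.5);
augmented = augment(augmenter,x,fs); % table with Audio and AugmentationInfo variables
```

Changing a probability or the window settings and re-running is usually enough to try a new variation of the front end.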

Part of the challenge is to build a model that stays within a 128 KB upper bound for non-zero parameters. This may be accomplished by developing a small model to begin with, by pruning a model, or by quantizing a model from the standard 32-bit floating point used for training to a smaller number of bits. This MATLAB baseline code leverages the dlquantizer object and quantizes the network to use 8-bit integers with the Deep Learning Toolbox Model Quantization Library.
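To make the size budget concrete, here is a small worked calculation of how many non-zero parameters fit in 128 KB at different precisions. It assumes 1 KB = 1024 bytes, and the parameter count of the example model is a hypothetical number chosen for illustration, not the size of the baseline network.

```matlab
limitBytes   = 128*1024;                     % assumption: 1 KB = 1024 bytes
bitsPerParam = [32 16 8];                    % float32, float16, int8
bytesPerParam = bitsPerParam/8;

% Largest number of non-zero parameters that fits the budget at each precision.
maxParams = limitBytes ./ bytesPerParam;

% Hypothetical model with 46,000 non-zero parameters.
numNonZero = 46000;
sizeKB     = numNonZero .* bytesPerParam / 1024;
fitsBudget = sizeKB <= 128;

disp(table(bitsPerParam',maxParams',sizeKB',fitsBudget', ...
    'VariableNames',{'Bits','MaxParams','ModelSizeKB','FitsBudget'}))
```

Under these assumptions the budget allows 32,768 parameters in float32, 65,536 in float16, and 131,072 in int8, which is why quantization leaves more room for model capacity.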

Note: If you don't already have access to MATLAB, Deep Learning Toolbox, and Audio Toolbox, you can get a free 30-day trial.

Quantizing the baseline

Applying quantization is a straightforward task using dlquantizer. To use it, you specify the network you want to quantize and the execution environment, and then calibrate with calibration data.

quantObj = dlquantizer(net,'ExecutionEnvironment','GPU');

The dlquantizer object requires image datastores to perform calibration. Wrap the features and labels in augmentedImageDatastore objects.

augimdsTrain = augmentedImageDatastore([numFeatures,numHops],trainFeatures,trainLabels);
augimdsTest = augmentedImageDatastore([numFeatures,numHops],testFeatures,testLabels);

Use the training set to calibrate the dlquantizer object.

calResults = calibrate(quantObj,augimdsTrain);

One tip to keep in mind: Currently, dlquantizer does not support audioDatastore input. To use it with audio-based data (in this case, mel spectrograms), you must place the training data in memory, and then wrap it in an augmentedImageDatastore, as shown in the code above. Then, you specify the augmentedImageDatastore as the calibration data to use when calibrating the network.
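A minimal sketch of the in-memory wrapping step, using random numbers in place of real mel spectrograms. The sizes, the number of classes, and the variable names X, Y, and augimds are assumptions made for illustration; in practice the array would hold the spectrograms extracted from the training audio.

```matlab
% Hypothetical sizes: 40 mel bands, 51 hops, 8 clips, 3 classes.
numFeatures = 40;
numHops     = 51;
numObs      = 8;

% Features held in memory as a 4-D array: height x width x channels x observations.
X = rand(numFeatures,numHops,1,numObs,'single');

% One categorical label per observation.
Y = categorical(randi(3,numObs,1));

% Treat each spectrogram as a single-channel image of size numFeatures-by-numHops.
augimds = augmentedImageDatastore([numFeatures,numHops],X,Y);

% Reading returns a table with one row per observation in the mini-batch.
batch = read(augimds);
```

Because the output size matches the spectrogram size, no resizing takes place here; the datastore is only acting as a container that dlquantizer can read from.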

Mel spectrogram provides visualization for audio data

One advantage of using dlquantizer is that it quantizes to int8, a low-precision data type that can be deployed to many embedded systems. It achieves this quantization with minimal loss of accuracy, effectively producing the same network as the Python baseline, which was quantized to float16.

Showing the Deep Network Quantizer app in action. You can use the app version of dlquantizer to quickly see which layers are quantized, and the dynamic range of the weights, biases, and activations, based on the dataset.

Tools to get started

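The goal of this baseline code is to inspire you to use MATLAB for your solution to this interesting challenge. The extensive functionality in Audio Toolbox can be used to jumpstart your design exploration. The result is a smaller, more self-contained baseline that is easier to improve. While the Python baseline is quantized to float16, int8 provides a smaller inference model.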

The baseline is available on GitHub, and you can download a free trial of MATLAB.

We hope that this baseline encourages new members to join the DCASE community, participate in the yearly competitions, and advance the state-of-the-art.

Download or fork the code and get started! If you have any questions, let me know in the comments below.