


Detect and classify animal sounds in huge sets of acoustic data acquired from oceans, fields, forests, and jungles


Develop a high-performance computing platform for acoustic data analysis using MATLAB, Parallel Computing Toolbox, and MATLAB Parallel Server


  • Years of development time saved
  • Analysis time cut from weeks to hours
  • Previously unprocessed data analyzed in days

“High-performance computing with MATLAB enables us to process previously unanalyzed big data. We translate what we learn into an understanding of how human activities affect the health of ecosystems to inform responsible decisions about what humans do in the ocean and on land.”

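Acoustic analysis equipment used by the Bioacoustics Research Program to collect data on large whales and other marine mammals. Photo courtesy of Dimitri Ponirakis.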

For more than 30 years, scientists have studied local animal populations by recording animal sounds in oceans, jungles, forests, and other natural environments. They use the results to assess the effect of man-made noise on natural environments, monitor endangered animal populations, and investigate animal communication. Passive acoustic monitoring systems record sounds continuously, generating terabytes of data. Scientists are often unable to process even 1% of this data because they lack the necessary advanced algorithms and processing capacity.

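Scientists in the Bioacoustics Research Program (BRP) at the Cornell Lab of Ornithology analyze large volumes of acoustic data using MATLAB®, Parallel Computing Toolbox™, and MATLAB Parallel Server™. The project, funded by grants from the Office of Naval Research and the National Oceanographic Partnership Program, is led by two principal investigators at Cornell University: Dr. Christopher Clark, senior scientist and director of BRP, and Dr. Peter Dugan, lead data scientist for BRP.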

“MATLAB and MATLAB parallel computing tools gave us the flexibility to dynamically improve and adapt the algorithms that we use to process our big acoustic data sets,” says Dr. Clark. “If we were using C++ or a similar language, we would not be able to move as quickly or explore as many scenarios.”


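Researchers analyzing acoustic data must contend with noise from weather, other animals, and nearby machinery and vehicles. The variability of sounds among individual animals within a species is another complication. These two factors, noise and variability, increase the number of false positives and false negatives, which reduces the accuracy of detection algorithms.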

Processing the hundreds of terabytes of data that BRP is gathering presents another challenge. A typical project involves processing years of raw acoustic data (up to 10 TB) recorded on multiple channels. Each channel may capture hundreds of millions of events, which are sounds that stand out when the data is viewed as a spectrogram. Algorithms tested on small, high-quality samples are often considerably less accurate when applied to larger, noisier data sets.
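Because the data is recorded on multiple channels, one natural way to spread the work is to scan each channel independently. The following is a minimal Python sketch of that idea, not BRP's MATLAB code: the function names, the amplitude-threshold notion of an "event", and the toy data are all assumptions made for illustration. A thread pool keeps the example short; real CPU-bound analysis would more likely use a process pool or a cluster scheduler.

```python
from concurrent.futures import ThreadPoolExecutor


def count_events(samples, threshold):
    """Count runs of consecutive samples whose magnitude exceeds threshold."""
    events = 0
    in_event = False
    for value in samples:
        loud = abs(value) > threshold
        if loud and not in_event:
            events += 1  # rising edge: a new event starts here
        in_event = loud
    return events


def count_events_per_channel(channels, threshold, max_workers=4):
    """Scan every channel independently; return one event count per channel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the input channels
        return list(pool.map(lambda ch: count_events(ch, threshold), channels))


# Toy data (made up for illustration): three short channels
channels = [
    [0.0, 0.9, 0.8, 0.1, 0.0, 0.7],
    [0.1, 0.0, 0.2, 0.1, 0.0, 0.0],
    [0.6, 0.0, 0.6, 0.0, 0.6, 0.0],
]
counts = count_events_per_channel(channels, threshold=0.5)
print(counts)
```

Each channel's result depends only on that channel's samples, so the same function could be handed to any pool of workers without changes.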





The algorithms use pattern matching, edge detection, connected region analysis, convolution, and other techniques supported by Image Processing Toolbox™ and Signal Processing Toolbox™, as well as machine learning techniques supported by Fuzzy Logic Toolbox™ and Deep Learning Toolbox™.
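To illustrate one of the techniques named above, connected region analysis, here is a small Python sketch that thresholds a toy spectrogram and returns the bounding box of each connected region. It is an assumption-laden illustration rather than BRP's implementation: the function name `label_regions`, the 4-connectivity rule, the fixed threshold, and the toy values are all hypothetical.

```python
from collections import deque


def label_regions(spectrogram, threshold):
    """Return bounding boxes of 4-connected regions whose cells exceed threshold.

    Each box is (first_row, first_col, last_row, last_col), where rows index
    frequency bins and columns index time frames.
    """
    rows, cols = len(spectrogram), len(spectrogram[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or spectrogram[r][c] <= threshold:
                continue
            # Breadth-first flood fill from this unvisited loud cell
            queue = deque([(r, c)])
            seen[r][c] = True
            r0, c0, r1, c1 = r, c, r, c
            while queue:
                y, x = queue.popleft()
                r0, c0 = min(r0, y), min(c0, x)
                r1, c1 = max(r1, y), max(c1, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and spectrogram[ny][nx] > threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((r0, c0, r1, c1))
    return boxes


# Toy spectrogram (made up): two separate loud regions
toy = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 9, 0, 0, 0],
    [0, 9, 9, 0, 0, 8],
    [0, 9, 0, 0, 0, 8],
]
events = label_regions(toy, threshold=5)
print(events)
```

Each bounding box gives the time and frequency extent of one candidate event, which a later classification step could accept or reject.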




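In collaboration with MarineXplore and the Kaggle community, BRP sponsored a global competition in which more than 240 participants submitted algorithms to detect and classify the upsweep contact calls of North Atlantic right whales. BRP used its MATLAB HPC platform to identify the most accurate algorithm, which will be used to help prevent collisions with whales.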

In addition to detection and classification algorithms, BRP uses MATLAB for noise analysis and acoustic modeling, in which the time and frequency dispersion effects of marine or terrestrial environments are captured and simulated.


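  • Years of development time saved. “A study of projected costs showed that if we had to do this ourselves, it would take three years, $1 million, and a lot of outside help to develop the HPC platform we needed,” says Dr. Dugan. “Using Parallel Computing Toolbox and MATLAB Parallel Server, we developed the platform in three months.”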

  • Analysis time cut from weeks to hours. “It took one of our algorithms 19 weeks to process 90 days of data,” says Dr. Dugan. “Using Parallel Computing Toolbox and MATLAB Parallel Server, we completed the same analysis on our cluster in 8 hours.”

  • Previously unprocessed data analyzed in days. “One data set captured 100,000 hours of sound. It was so large that we had previously processed less than 1% of it, estimating that it would take a year or more to process the rest,” says Dr. Dugan. “With our MATLAB HPC platform, we processed the data six times, using different detection algorithms, in two days.”
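The figures quoted above can be turned into rough ratios. The short Python calculation below is only back-of-the-envelope arithmetic, and it assumes that the 19 weeks and the two days were continuous, around-the-clock processing, which the quotes do not state.

```python
HOURS_PER_WEEK = 7 * 24

# "19 weeks" before versus "8 hours" on the cluster (assumes nonstop runs)
before_hours = 19 * HOURS_PER_WEEK
after_hours = 8
speedup = before_hours / after_hours

# 100,000 hours of sound processed six times in two days (assumes 48 wall-clock hours)
audio_hours_processed = 100_000 * 6
wall_clock_hours = 2 * 24
audio_hours_per_wall_hour = audio_hours_processed / wall_clock_hours

print(f"approximate speedup: {speedup:.0f}x")
print(f"audio hours processed per wall-clock hour: {audio_hours_per_wall_hour:.0f}")
```

Under those assumptions the first case works out to a speedup of roughly 400 times, and the second to about 12,500 hours of audio analyzed per hour of elapsed time.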

Cornell University is among the 1300 universities worldwide that provide campus-wide access to MATLAB and Simulink. With the Campus-Wide License, researchers, faculty, and students have access to a common configuration of products, at the latest release level, for use anywhere—in the classroom, at home, in the lab or in the field.