Environmental sound recognition based on channel and frame-level feature attention model
Authors: Su Ruixuan, Ge Dongyuan, Yao Xifan

CLC number:

TP391.42

Fund project:

Research on several key technologies of precision inspection based on machine learning (No.517650, National Natural Science Foundation of China)

    Abstract:

    This paper proposes a convolutional neural network model for environmental sound classification based on channel and frame-level feature attention. The model first applies one-dimensional convolution to the two-dimensional log-mel spectrogram features of environmental sounds to extract further effective features. It then uses SE-Res2Net modules to give the shallow input features a fine-grained, global receptive field and to compute channel attention scores. Attentive statistics pooling is added before the fully connected layer. It applies an attention mechanism to the frame-level features of the channel-weighted input, producing an importance score for the frame-level features at each time step. After weighting, the per-channel mean μ and variance σ are concatenated as the output. Evaluated on the UrbanSound8K dataset, the model reaches 94.5% accuracy on the test set, indicating that it learns the key features of different sound classes and classifies them correctly. Ablation experiments were conducted to further verify the model's performance. The results show that applying channel and frame-level feature attention to the sound features yields a relative reduction of 43.8% in the model's classification error rate.
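The abstract does not say how the one-dimensional convolution is laid over the two-dimensional log-mel features. A common arrangement, assumed here, treats each mel bin as an input channel and slides the kernel along the time axis, as in TDNN-style speech models. The minimal pure-Python sketch below shows that operation. The function names, kernel width, and toy values are illustrative assumptions, not the authors' configuration.

```python
from typing import List

Matrix = List[List[float]]  # indexed as [channel][frame]


def conv1d_same(x: Matrix, weights: List[Matrix], bias: List[float]) -> Matrix:
    """1-D convolution along time with zero 'same' padding.

    x:       log-mel features, [n_mels][n_frames]; each mel bin acts as an input channel
    weights: [out_channels][n_mels][kernel_size], kernel_size assumed odd
    bias:    [out_channels]
    Returns  [out_channels][n_frames].
    Like most deep-learning libraries, this computes cross-correlation (no kernel flip).
    """
    n_in, n_frames = len(x), len(x[0])
    k = len(weights[0][0])
    pad = k // 2
    out: Matrix = []
    for w_o, b_o in zip(weights, bias):
        row = []
        for t in range(n_frames):
            acc = b_o
            for f in range(n_in):
                for j in range(k):
                    src = t + j - pad
                    if 0 <= src < n_frames:  # frames outside the clip count as zero
                        acc += w_o[f][j] * x[f][src]
            row.append(acc)
        out.append(row)
    return out


def relu(m: Matrix) -> Matrix:
    """Element-wise ReLU, commonly applied after a front-end convolution."""
    return [[max(0.0, v) for v in row] for row in m]


# Toy example (hypothetical values): 3 mel bins x 5 frames, 2 output channels, kernel width 3.
x = [[1.0, 2.0, 3.0, 4.0, 5.0],
     [0.0, 0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0, 0.0]]
weights = [
    [[0.0, 1.0, 0.0], [0.0] * 3, [0.0] * 3],  # copies mel bin 0
    [[1.0, 1.0, 1.0], [0.0] * 3, [0.0] * 3],  # 3-frame moving sum of mel bin 0
]
features = relu(conv1d_same(x, weights, [0.0, 0.0]))
print(features)  # [[1.0, 2.0, 3.0, 4.0, 5.0], [3.0, 6.0, 9.0, 12.0, 9.0]]
```

With "same" padding the output keeps the clip's frame count, so later attention stages still see one feature vector per frame.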
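The SE-Res2Net module combines two ideas. The first is a Res2Net split, which divides the channels into groups and feeds each group's output into the next, so later groups see progressively larger receptive fields. The second is a squeeze-and-excitation (SE) step, which averages each channel over time and maps the result to a per-channel attention score in (0, 1). The sketch below is a simplified, hypothetical version. Each group convolution is replaced by a per-channel width-3 kernel, and the 1x1 convolutions and batch normalization that full SE-Res2Net blocks (for example in ECAPA-TDNN) usually include are omitted.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # indexed as [channel][frame]


def depthwise_conv3(x: Matrix, kernels: List[List[float]]) -> Matrix:
    """Per-channel width-3 convolution along time with zero padding.
    Simplified stand-in for the group convolution K_i inside a Res2Net block."""
    out: Matrix = []
    for row, (a, b, c) in zip(x, kernels):
        n = len(row)
        out.append([
            a * (row[t - 1] if t > 0 else 0.0)
            + b * row[t]
            + c * (row[t + 1] if t + 1 < n else 0.0)
            for t in range(n)
        ])
    return out


def add(m1: Matrix, m2: Matrix) -> Matrix:
    """Element-wise sum of two equally shaped matrices."""
    return [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def res2net_split(x: Matrix, scale: int, kernels: List[List[float]]) -> Matrix:
    """Hierarchical residual split: y1 = x1, y2 = K2(x2), yi = Ki(xi + y_{i-1}) for i > 2."""
    width = len(x) // scale  # channels per group; assumes len(x) is divisible by scale
    groups = [x[i * width:(i + 1) * width] for i in range(scale)]
    ys: List[Matrix] = [groups[0]]  # the first group passes through unchanged
    for i in range(1, scale):
        inp = groups[i] if i == 1 else add(groups[i], ys[-1])
        ys.append(depthwise_conv3(inp, kernels[i * width:(i + 1) * width]))
    return [row for y in ys for row in y]  # concatenate the groups back along channels


def squeeze_excite(x: Matrix, w1: Matrix, w2: Matrix) -> Tuple[Matrix, List[float]]:
    """SE channel attention: z_c = mean_t x[c][t], s = sigmoid(W2 relu(W1 z)), x[c] *= s_c."""
    z = [sum(row) / len(row) for row in x]                                # squeeze over time
    hidden = [max(0.0, sum(w * v for w, v in zip(wr, z))) for wr in w1]   # bottleneck + ReLU
    s = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(wr, hidden)))) for wr in w2]
    return [[s_c * v for v in row] for s_c, row in zip(s, x)], s


def se_res2net_block(x: Matrix, scale: int, kernels: List[List[float]],
                     w1: Matrix, w2: Matrix) -> Tuple[Matrix, List[float]]:
    """Res2Net split, then SE rescaling, then a residual connection.
    Returns the block output and the channel attention scores."""
    y, scores = squeeze_excite(res2net_split(x, scale, kernels), w1, w2)
    return add(x, y), scores


# Toy example (hypothetical values): 4 channels x 3 frames, scale 4, identity kernels.
x = [[1.0] * 3, [2.0] * 3, [3.0] * 3, [4.0] * 3]
identity = [[0.0, 1.0, 0.0] for _ in range(4)]
w1 = [[0.0] * 4 for _ in range(2)]  # bottleneck of 2 units (hypothetical size)
w2 = [[0.0] * 2 for _ in range(4)]  # zero weights give every channel a score of 0.5
out, scores = se_res2net_block(x, 4, identity, w1, w2)
```

In the toy run the split turns the channel rows [1, 2, 3, 4] into [1, 2, 5, 9], because each later group accumulates the previous group's output. With zero SE weights every channel gets a score of 0.5. Trained weights would instead raise or lower individual channels.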
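Attentive statistics pooling turns a variable number of frames into a fixed-length vector. The sketch below assumes the channel-dependent formulation used in ECAPA-TDNN, which may differ in detail from the authors' layer. For every frame t and channel c it computes a score e[t][c] = v_c · tanh(W h_t + b) + k_c, normalizes the scores with a softmax over time, and forms a weighted mean and a weighted standard deviation per channel. The abstract calls the second statistic "variance σ". This sketch returns the standard deviation, as that formulation does, so square it if variance is wanted. W, b, V, and k are hypothetical learned parameters.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]  # indexed as [channel][frame]


def attentive_stats_pool(h: Matrix, W: Matrix, b: Sequence[float],
                         V: Matrix, k: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Channel-dependent attentive statistics pooling.

    h: [C][T] channel-weighted frame-level features
    W: [R][C], b: [R]  -> bottleneck projection of each frame vector h_t
    V: [C][R], k: [C]  -> one attention score per channel and frame
    Returns [mu_1..mu_C, sigma_1..sigma_C], i.e. a vector of length 2C.
    """
    C, T = len(h), len(h[0])
    frames = [[h[c][t] for c in range(C)] for t in range(T)]  # the vectors h_t
    e = []  # e[t][c] = v_c . tanh(W h_t + b) + k_c
    for h_t in frames:
        u = [math.tanh(sum(w * v for w, v in zip(w_r, h_t)) + b_r) for w_r, b_r in zip(W, b)]
        e.append([sum(v * x for v, x in zip(v_c, u)) + k_c for v_c, k_c in zip(V, k)])
    mus, sigmas = [], []
    for c in range(C):
        m = max(e[t][c] for t in range(T))          # subtract the max for a stable softmax
        ex = [math.exp(e[t][c] - m) for t in range(T)]
        z = sum(ex)
        alpha = [v / z for v in ex]                 # importance of each frame for channel c
        mu = sum(a * h[c][t] for t, a in enumerate(alpha))
        second = sum(a * h[c][t] ** 2 for t, a in enumerate(alpha))
        mus.append(mu)
        sigmas.append(math.sqrt(max(second - mu * mu, eps)))  # eps keeps sqrt well defined
    return mus + sigmas                             # concatenation [mu; sigma]


h = [[1.0, 2.0, 3.0],
     [4.0, 4.0, 4.0]]  # toy input: C = 2 channels, T = 3 frames
# Zero weights give every frame equal attention (plain mean and standard deviation).
pooled_uniform = attentive_stats_pool(h, W=[[0.0, 0.0]], b=[0.0], V=[[0.0], [0.0]], k=[0.0, 0.0])
# A score that grows with channel 0 shifts the attention toward later frames.
pooled_focused = attentive_stats_pool(h, W=[[1.0, 0.0]], b=[0.0], V=[[10.0], [10.0]], k=[0.0, 0.0])
```

Because the softmax runs over time separately for each channel, different channels can attend to different frames. This matches the abstract's per-channel, frame-level weighting.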
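The 43.8% figure reads as a relative reduction. It is the drop in error rate divided by the error rate of the model without the attention mechanisms. The helpers below show that arithmetic. Only the 94.5% accuracy comes from the abstract. The baseline values in the example are hypothetical, because the abstract does not report the ablated model's error rate.

```python
def error_rate(accuracy: float) -> float:
    """Classification error rate from accuracy, both as fractions in [0, 1]."""
    return 1.0 - accuracy


def relative_error_reduction(baseline_error: float, model_error: float) -> float:
    """Relative drop in error rate: (E_baseline - E_model) / E_baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline_error - model_error) / baseline_error


full_model_error = error_rate(0.945)                 # about 0.055 for the reported accuracy
example_reduction = relative_error_reduction(0.10, 0.05)  # hypothetical 10% -> 5%: 0.5
```

A relative measure is used here because it stays comparable across models with very different absolute error rates.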

Cite this article:

Su Ruixuan, Ge Dongyuan, Yao Xifan. Environmental sound recognition based on channel and frame-level feature attention model[J]. Science Technology and Engineering, 2024, 24(16): 6792-6798.

History
  • Received: 2023-05-05
  • Last revised: 2024-03-06
  • Accepted: 2023-10-10
  • Published online: 2024-06-13