Environmental Sound Classification Model Combining Swin Transformer and CNN

CLC number: TP391.7

Fund project: National Natural Science Foundation of China (General Program, Key Program, Major Program)

    Abstract:

    Environmental sound classification has become an important task in computer audition. It can complement computer vision by helping devices better understand their surroundings and user needs, has broad application prospects, and is expected to have a positive impact on daily life. In recent years, Transformer models with self-attention mechanisms have been adopted for environmental sound classification. However, existing models require large amounts of memory and rely on pre-trained vision models, so they do not extract audio features well. To address these problems and improve classification accuracy, a new dual-branch environmental sound classification model, Swin Conformer, is proposed. The model combines a convolutional neural network with a Swin Transformer that uses window self-attention, fuses the features of the two branches interactively, and introduces a token semantic module. In validation on the public ESC-50 and UrbanSound8K datasets, Swin Conformer achieved classification accuracies of 98.1% and 96.8%, respectively. These accuracies are higher than those of existing models, which demonstrates the feasibility and superiority of the model for environmental sound classification.
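
    The window self-attention named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 8 x 8 token grid, the window size of 4, and the helper names `window_partition` and `attention_pairs` are all assumptions chosen for illustration. The sketch splits a grid of spectrogram patch tokens into non-overlapping windows and counts how many query-key pairs one attention head would score with and without windowing.

```python
def window_partition(grid, window):
    """Split a 2D grid (list of rows) into non-overlapping window x window blocks.

    Returns the windows in row-major order; each window is a flat list of tokens.
    Both grid dimensions must be divisible by `window`.
    """
    height, width = len(grid), len(grid[0])
    if height % window or width % window:
        raise ValueError("grid dimensions must be divisible by the window size")
    windows = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            windows.append([
                grid[r][c]
                for r in range(top, top + window)
                for c in range(left, left + window)
            ])
    return windows


def attention_pairs(height, width, window=None):
    """Count the query-key pairs scored by one attention head.

    window=None means global attention over all height * width tokens.
    Otherwise each token attends only to tokens inside its own window.
    """
    tokens = height * width
    if window is None:
        return tokens * tokens
    per_window = window * window
    return (tokens // per_window) * per_window * per_window


# Hypothetical 8 x 8 grid of patch tokens (time x frequency), labeled 0..63.
grid = [[r * 8 + c for c in range(8)] for r in range(8)]
windows = window_partition(grid, window=4)

print(len(windows), windows[0][:5])          # 4 [0, 1, 2, 3, 8]
print(attention_pairs(8, 8))                 # 4096 pairs with global attention
print(attention_pairs(8, 8, window=4))       # 1024 pairs with 4 x 4 windows
```

    In this toy setting, restricting attention to 4 x 4 windows reduces the number of scored pairs by a factor of four, and the gap widens as the grid grows. The full Swin Transformer design also shifts the windows between layers so that information can cross window boundaries, which this sketch does not model.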

Cite this article

Zhu Zhenfei, Ge Dongyuan, Yao Xifan, et al. Environmental Sound Classification Model Combining Swin Transformer and CNN[J]. Science Technology and Engineering, 2024, 24(28): 12259-12267.

History
  • Received: 2024-01-20
  • Last revised: 2024-08-04
  • Accepted: 2024-03-21
  • Published online: 2024-11-05