Speech Logical Attack Detection Based on Convolutional Recurrent Neural Networks

CLC number:

TN912.3; TP391.4

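Fund project:

National Key R&D Program of China (2017YFC0821000); Guangzhou Science and Technology Program (2019030004); Open Fund of the Ministry of Justice Key Laboratory of Forensic Science (Academy of Forensic Science)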


Research on Speech Logic Attack Detection Based on CNN-RNN-DNN Network

    Abstract:

    Speech synthesis and voice conversion are becoming the mainstream methods for generating synthetic speech, which poses potential risks to social stability and national security. To further improve the detection accuracy for synthesized and converted spoofed speech, this paper addresses both hybrid network design and feature selection, and proposes three hybrid network models based on the CNN-RNN-DNN architecture: CNN-LSTM-DNN, CNN-GRU-DNN, and CNN-BiLSTM-DNN. In each model, the CNN part performs downsampling, the RNN part models the temporal structure of speech, and the DNN part performs classification. Each hybrid network model contains 20 network layers. Experiments were conducted on six extracted acoustic features; the CNN-LSTM-DNN + MFCC combination performed best, with an equal error rate of 5.79%, which is 28.43% lower than that of the B02 baseline system provided by ASVspoof 2019. The performance of the three hybrid networks combined with the six features was compared, and control experiments against four standalone networks were added. The results show that the proposed hybrid network models offer stable performance and high accuracy, and that the MFCC feature and the MFCC+LFCC combined feature are better suited to these models.
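The abstract reports results as an equal error rate (EER) and as a percentage reduction against a baseline. The following is a minimal sketch of how these two quantities are commonly computed. It is not the authors' evaluation code. The function names, the toy scores, and the convention that a higher score means "more likely bona fide" are assumptions made for illustration.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Return (eer, threshold), assuming higher scores mean 'more likely bona fide'."""
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, best_eer, best_thr = float("inf"), None, None
    for thr in thresholds:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < thr for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed trials scored at or above the threshold.
        far = sum(s >= thr for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest to each other.
        if gap < best_gap:
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr


def relative_reduction(baseline, new):
    """Fraction by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline


# Toy scores, invented for illustration only.
bonafide = [0.9, 0.8, 0.7, 0.35]
spoof = [0.1, 0.2, 0.3, 0.6]
eer, thr = equal_error_rate(bonafide, spoof)
print(f"EER = {eer:.2%} at threshold {thr}")

# ASSUMPTION: 8.09% is taken here as the B02 baseline EER; the abstract
# itself does not state the baseline value.
print(f"relative reduction = {relative_reduction(8.09, 5.79):.2%}")
```

Under the assumed baseline value, the computed reduction matches the 28.43% quoted in the abstract, which suggests the figure is a relative rather than an absolute difference.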

Cite this article

Yang Haitao, Wang Huapeng, Chu Xianteng, et al. Research on Speech Logic Attack Detection Based on CNN-RNN-DNN Network[J]. Science Technology and Engineering, 2022, 22(18): 7937-7944.

History
  • Received: 2021-10-12
  • Last revised: 2022-06-08
  • Accepted: 2022-01-22
  • Published online: 2022-07-14