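Forensic Automatic Speaker Recognition based on Enhanced ECAPA-TDNN

Affiliation:

School of Public Security Information Technology and Intelligence, Criminal Investigation Police University of China

CLC number:

TP391.4 D918.9

Funding:

National Key Research and Development Program of China, 2017 (2017YFC0821000); Key Laboratory of Forensic Science, Ministry of Justice (Academy of Forensic Science, KF202117).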

Abstract:

To improve the reliability and accuracy of forensic speaker recognition, and to promote the shift toward a scientific evaluation paradigm for forensic voice comparison methods and procedures, this paper proposes a forensic automatic speaker recognition method based on an improved ECAPA-TDNN architecture. To raise the model's accuracy and generalization ability, the method combines spatial attention, channel attention, and multi-head attention. First, the fused spectrogram and GFCC feature, which gave the best training results, is selected as the network input, and the trained network serves as a deep feature extractor. Then the strength of the speech evidence is assessed within a likelihood-ratio framework for the quantitative evaluation of forensic evidence. Experiments show that on the VoxCeleb1 dataset the reported value is 0.156, better than the results of forensic automatic speaker recognition systems in previously published work. On the Chinese zhaishell dataset, the false positive rate and the miss rate are both zero, the minimum likelihood ratio supporting the same-source hypothesis is 3.97e+6, and the maximum likelihood ratio supporting the different-source hypothesis is 1.52e-31. The method thus further improves the reliability and accuracy of the recognition system and can provide strong support for conclusions in the evaluation of forensic speech evidence.
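The likelihood-ratio step summarized in the abstract can be illustrated with a minimal sketch. It assumes that comparison scores between speaker embeddings are already available and that same-source and different-source scores are each modeled with a single Gaussian; this modeling choice, the function names, and the toy scores are all illustrative assumptions and are not taken from the paper.

```python
import math
from statistics import mean, stdev

# Toy calibration scores (hypothetical, e.g. cosine similarities between
# speaker embeddings); real scores would come from the trained extractor.
SAME_SOURCE_SCORES = [0.82, 0.78, 0.91, 0.85, 0.74]
DIFF_SOURCE_SCORES = [0.12, 0.05, 0.20, -0.03, 0.15]


def fit_gaussian(scores):
    """Return (mean, sample standard deviation) of a list of scores."""
    return mean(scores), stdev(scores)


def log_gaussian_pdf(x, mu, sigma):
    """Natural log of the Gaussian density, computed without underflow."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def log10_lr(score, same_model, diff_model):
    """log10 of p(score | same source) / p(score | different source)."""
    log_lr = log_gaussian_pdf(score, *same_model) - log_gaussian_pdf(score, *diff_model)
    return log_lr / math.log(10)


def error_rates(same_scores, diff_scores, same_model, diff_model):
    """False positive and miss rates when deciding at LR = 1 (log10 LR = 0)."""
    false_pos = sum(log10_lr(s, same_model, diff_model) > 0 for s in diff_scores)
    misses = sum(log10_lr(s, same_model, diff_model) <= 0 for s in same_scores)
    return false_pos / len(diff_scores), misses / len(same_scores)


same_model = fit_gaussian(SAME_SOURCE_SCORES)
diff_model = fit_gaussian(DIFF_SOURCE_SCORES)

# Hypothetical test comparisons, kept separate from the calibration scores.
fpr, miss_rate = error_rates([0.79, 0.88], [0.02, 0.18], same_model, diff_model)
print("log10 LR for score 0.80:", log10_lr(0.80, same_model, diff_model))
print("log10 LR for score 0.10:", log10_lr(0.10, same_model, diff_model))
print("false positive rate:", fpr, "miss rate:", miss_rate)
```

A likelihood ratio above 1 (log10 LR above 0) supports the same-source hypothesis, and a value below 1 supports the different-source hypothesis. Working in the log domain avoids numerical underflow when the ratio becomes as extreme as the values quoted in the abstract.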

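Cite this article

WAN Meixi (万玫汐), WANG Huapeng (王华朋), YAN Daoshen (闫道申), et al. Forensic Automatic Speaker Recognition based on Enhanced ECAPA-TDNN[J]. Science Technology and Engineering (科学技术与工程).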

History
  • Received: 2023-10-12
  • Last revised: 2024-05-22
  • Accepted: 2024-05-29