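Sub-syllable Representation-based Hmong Language Text-to-Speech Method

CLC number:

TP391

Funding:

National Natural Science Foundation of China (62162012); Guizhou Provincial Science and Technology Program (黔科合基础-ZK[2022]一般195, 黔科合基础-ZK[2023]一般143, 黔科合平台人才-ZCKJ[2021]007); Natural Science Research Project of the Guizhou Provincial Department of Education (黔教技[2023]061号, 黔教技[2023]012号, 黔教技[2022]015号); Guizhou Provincial Young Science and Technology Talents Growth Project (黔教合KY字[2021]115, 黔教合KY字[2021]110); Open Project of the Guizhou Key Laboratory of Pattern Recognition and Intelligent Systems (GZMUKL[2022]KF01, GZMUKL[2022]KF05); Guizhou Provincial High-level Innovative Talents Project (黔科合平台人才-GCC[2023]027); Ministry of Education Industry-University Cooperative Education Project (221001766110209)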

    Abstract:

    Speech synthesis for minority languages supports the transmission, protection, and development of ethnic cultures, yet research in this area remains limited. Synthesis errors readily occur when the same word carrying different tones is pronounced similarly. To address this problem, this paper proposes a sub-syllable representation-based text-to-speech method for the Hmong language. The method uses sub-syllables as training units to represent Hmong pronunciation information, so that similar pronunciations across different syllables are learned distinctly. Because the alignment between a text sequence and its Mel-spectrogram is monotonic, a monotonic alignment loss is introduced to guide the attention module toward more accurate alignments, reducing the word skipping and repetition caused by the autoregressive nature of the attention mechanism. To verify the effectiveness of the proposed method, the self-built Hmong speech synthesis corpus HmongSpeech (download link: http://sxjxsf.gzmu.edu.cn/info/1728/1214.htm) is used as the benchmark dataset for comparative experiments against typical speech synthesis methods. The experimental results show that the proposed method reduces the synthesis error rate caused by similar pronunciations of the same word with different tones: the word error rate is only 0.96%, an improvement of 6.25% over the baseline method.
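
The word error rate reported in the abstract is conventionally defined as the word-level edit distance between a reference transcript and a hypothesis transcript, divided by the number of reference words. The sketch below illustrates that standard definition only. It is not the authors' evaluation script: the function name `word_error_rate`, the whitespace tokenization, and the placeholder tokens in the usage example are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace. The paper's actual
    tokenization and transcription procedure are not given in the abstract.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens (not real Hmong words): one of four words differs.
example = word_error_rate("w1 w2 w3 w4", "w1 w2x w3 w4")
```

Under this definition, a word error rate of 0.96% corresponds to roughly one erroneous word per hundred reference words. How the transcripts of synthesized speech were obtained in the experiments (for example, by listeners or by a recognizer) is not stated in the abstract.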

Cite this article

Cai Shan, Wang Lin, Tan Mian, et al. Sub-syllable Representation-based Hmong Language Text-to-Speech Method[J]. Science Technology and Engineering, 2024, 24(19): 8176-8185.

History
  • Received: 2023-08-04
  • Last revised: 2024-06-13
  • Accepted: 2023-11-11
  • Published online: 2024-07-18