Abstract: To capture a speaker's pronunciation characteristics, we propose a bionics-inspired method based on spectrogram statistics: a characteristic spectrogram, formed as a linear superposition of short-time spectrograms, gives a stable representation of the speaker's pronunciation. To address the slow training and recognition speeds of speaker recognition systems on resource-constrained devices, we propose an adaptive-clustering self-organizing feature map (AC-SOM) algorithm based on the traditional SOM neural network. It automatically adjusts the number of neurons in the competition layer according to the number of speakers to be recognized, until the number of clusters matches the number of speakers. We also built a 100-speaker database of characteristic spectrogram samples and applied our AC-SOM model to it, yielding a maximum training time of only 304 s and a maximum per-sample recognition time of under 28 ms. Compared with other approaches applied to the same number of speakers, our method offers greatly improved training and recognition speeds, so it can more easily satisfy the real-time data processing and execution requirements of edge intelligence systems than previous speaker recognition methods.
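The characteristic-spectrogram idea can be illustrated with a short sketch. This is our reading of the abstract, not the authors' implementation: the frame length, hop size, Hann window, and truncation-to-shortest-utterance step are all assumptions made so the spectrograms can be superposed element-wise.

```python
import numpy as np

def short_time_spectrogram(signal, n_fft=256, hop=128):
    """Magnitude spectrogram from framed, Hann-windowed FFTs.
    Frame length and hop are illustrative choices, not the paper's."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Shape: (n_frames, n_fft // 2 + 1)
    return np.abs(np.fft.rfft(frames, axis=1))

def characteristic_spectrogram(utterances, n_fft=256, hop=128):
    """Linear superposition (here, a mean) of the short-time
    spectrograms of several utterances from one speaker.
    Spectrograms are truncated to the shortest frame count so
    they can be added element-wise -- an assumption on our part."""
    specs = [short_time_spectrogram(u, n_fft, hop) for u in utterances]
    min_frames = min(s.shape[0] for s in specs)
    return np.mean([s[:min_frames] for s in specs], axis=0)
```

Averaging many short-time spectrograms suppresses utterance-specific variation, leaving a statistic that is more stable across recordings of the same speaker, which is the property the abstract attributes to the characteristic spectrogram.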