An improved K-mean clustering algorithm based on spatial distribution to optimize the initial clustering center
CLC number: TP301.6
Fund Project: Natural Science Foundation of Shanxi Province

    Abstract:

    When clustering massive data, the classical K-means algorithm is highly sensitive to the choice of its K initial cluster centers and to noise in the data set. To address this, a clustering algorithm for massive data that explicitly selects the initial cluster centers is proposed. First, the algorithm sorts the data set with bubble sort and takes the center value of each dimension to form the first initial cluster center. Second, it computes the Euclidean distance from each candidate point to this first center and uses these distances to select the remaining initial cluster centers, so that all centers are evenly distributed over the high-density regions of the data set; this reduces the number of iterations in the clustering process and improves the efficiency of the algorithm. Finally, comparison experiments on several data sets from UCI show that, without degrading clustering quality, the proposed algorithm reduces the number of iterations to 50% on average and reduces the running time by 10% on average. The results also indicate that the advantage of the algorithm becomes more pronounced as the number of points increases.
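
The two-stage initialization described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the "center value of each dimension" is read here as the per-dimension median, Python's built-in sorting stands in for bubble sort, and the rule for the remaining K - 1 centers (evenly spaced ranks in the distance-to-first-center ordering) is a hypothetical stand-in, since the abstract does not give the exact selection criterion. The names `first_center`, `select_initial_centers`, `kmeans`, and `sample` are all introduced for this example.

```python
import math
from statistics import median


def first_center(points):
    """Center value of each dimension.

    Assumption: "center value" is taken to be the median of each sorted
    dimension; statistics.median sorts internally (not bubble sort).
    """
    return [median(values) for values in zip(*points)]


def select_initial_centers(points, k):
    """Return k initial centers: the first center plus k - 1 data points.

    Assumption: the remaining centers sit at evenly spaced ranks of the
    distance-to-first-center ordering. This is a stand-in rule, not the
    paper's actual criterion.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    c0 = first_center(points)
    # Rank every point by its Euclidean distance to the first center.
    ranked = sorted(points, key=lambda p: math.dist(p, c0))
    centers = [c0]
    n = len(ranked)
    for j in range(1, k):
        centers.append(list(ranked[j * n // k]))
    return centers


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations; returns (final centers, iterations used)."""
    centers = [list(c) for c in centers]
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            [sum(col) / len(cluster) for col in zip(*cluster)]
            if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
        if new_centers == centers:  # nothing moved: converged
            return centers, iteration
        centers = new_centers
    return centers, max_iter


# Made-up toy data (three loose groups), for illustration only.
sample = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
          (9.0, 1.0), (9.1, 1.2), (8.9, 0.9)]
init = select_initial_centers(sample, 3)
final, iterations = kmeans(sample, init)
print("initial centers:", init)
print("final centers:", final, "after", iterations, "iterations")
```

Returning the iteration count from `kmeans` makes it straightforward to compare this initialization against random seeding on the same data, which is the kind of comparison the abstract reports.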

Cite this article

Song Renwang, Su Xiaojie, Shi Hui. An improved K-mean clustering algorithm based on spatial distribution to optimize the initial clustering center[J]. Science Technology and Engineering, 2021, 21(19): 8094-8100.

History
  • Received: 2020-11-08
  • Last revised: 2021-04-19
  • Accepted: 2021-03-14
  • Published online: 2021-07-22