Artificial Intelligence Applications
Publications
Visual analytics for the clustering capability of data
Posted: 2019-03-09
Paper type: Journal article
First author: Lu ZhiMao
Corresponding author: Lu, ZM (reprint author), Harbin Engn Univ, Pattern Recognit & Nat Computat Lab, Harbin 150001, Peoples R China.
Co-authors: Liu Chen, Zhang Qi, Zhang ChunXiang, Fan DongMei, Yang Peng
Publication date: 2013-05-01
Journal: SCIENCE CHINA-INFORMATION SCIENCES
Indexed in: SCIE, EI
Document type: J
Volume: 56
Issue: 5
Pages: 1-14
ISSN: 1674-733X
Keywords: data mining; clustering analysis; visual analysis; minimum distance spectrum; nearest neighbor spectrum; outliers
Abstract: Clustering analysis is an unsupervised method to find hidden structures in datasets and has been widely used in various fields. However, it is always difficult for users to understand, evaluate, and explain the clustering results in the spaces with dimension greater than three. Although high-dimensional visualization of clustering technology can express clustering results well, it still has significant limitations. In this paper, a visualization cluster analysis method based on the minimum distance spectrum (MinDS) is proposed, aimed at reducing the problems of clustering multidimensional datasets. First, the concept of MinDS is defined based on the distance between high-dimensional data. MinDS can map any dataset from high-dimensional space to a lower dimension to determine whether the data set is separable. Next, a clustering method which can automatically determine the number of categories is designed based on MinDS. This method is not only able to cluster a dataset with clear boundaries, but can also cluster the dataset with fuzzy boundaries through the edge corrosion strategy based on the energy of each data point. In addition, strategies for removing noise and identifying outliers are designed to clean datasets according to the characteristics of MinDS. The experimental results presented validate the feasibility and effectiveness of the proposed schemes and show that the proposed approach is simple, stable, and efficient, and can achieve multidimensional visualization cluster analysis of complex datasets.
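Illustrative note: the abstract does not give the formal definition of MinDS, so the following Python sketch is not the paper's method. It only illustrates the general idea, suggested by the keywords "nearest neighbor spectrum" and "outliers", of reducing a high-dimensional dataset to one distance value per point and then inspecting those values. The function names, the median-based rule, and the `factor` threshold are all assumptions introduced here for illustration.

```python
import math
import statistics


def nearest_neighbor_distances(points):
    """For each point, return the Euclidean distance to its nearest other point.

    This is a generic nearest-neighbor distance profile, NOT the paper's
    MinDS definition (which the abstract does not spell out).
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    distances = []
    for i, p in enumerate(points):
        nearest = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        distances.append(nearest)
    return distances


def flag_outliers(points, factor=3.0):
    """Hypothetical rule: flag points whose nearest-neighbor distance
    exceeds `factor` times the median nearest-neighbor distance."""
    distances = nearest_neighbor_distances(points)
    median = statistics.median(distances)
    return [i for i, d in enumerate(distances) if d > factor * median]


# Four tightly grouped 4-dimensional points and one distant point.
data = [
    (0.0, 0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0, 0.0),
    (0.0, 1.0, 0.0, 0.0),
    (1.0, 1.0, 0.0, 0.0),
    (10.0, 10.0, 10.0, 10.0),
]
profile = nearest_neighbor_distances(data)
outlier_indices = flag_outliers(data)
```

In this toy dataset the first four points each have a neighbor at distance 1, while the fifth point's nearest neighbor is far away, so the one-dimensional profile exposes it as a candidate outlier even though the data cannot be plotted directly.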