A Text Spectral Clustering Algorithm Based on Maximum Common Subgraph

Original title: 一种基于最大公共子图的文本谱聚类算法
Authors: FENG Renqunshan (冯仁群山); CHEN Xiaorong (陈笑蓉)
Affiliation: College of Computer Science and Technology, Guizhou University
Keywords: text clustering; spectral clustering; maximum common subgraph
Chinese journal name: GZDI
English journal name: Journal of Guizhou University (Natural Sciences)
Publication date: 2018-04-15
Publisher: Journal of Guizhou University (Natural Sciences)
Year: 2018
Issue: v.35
Funding: Supported by the National Natural Science Foundation of China (61363028)
Language: Chinese
Page: GZDI201802017
Number of pages: 6
CN: 02
ISSN: 52-5002/N
Classification number: 87-92

Abstract

Traditional text spectral clustering methods based on the vector space model tend to ignore the semantic relations between the contexts of a text. Representing texts as graph structures addresses this problem well. On this basis, this paper proposes a spectral clustering algorithm based on the maximum common subgraph, called SC-MCS. The algorithm computes the similarity between texts by solving for their maximum common subgraph and then clusters the texts. Experimental results show that, compared with the traditional vector-space-based text spectral clustering method, the algorithm achieves some improvement in both accuracy and recall.
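This record gives no implementation details, so the following Python sketch is only an illustration of the pipeline the abstract outlines, not the SC-MCS algorithm itself. It rests on several assumptions that do not come from the source: each distinct word is a uniquely labelled node and adjacent words are linked by an edge (under unique labels the maximum common subgraph reduces to the shared nodes and edges); similarity is the MCS size divided by the size of the larger graph; and the spectral step is a simple two-way split on the second eigenvector of the normalized affinity matrix, found by power iteration. The function names `text_to_graph`, `mcs_similarity`, `affinity_matrix` and `spectral_bipartition` are hypothetical, and whitespace tokenization stands in for the word segmentation that Chinese text would need.

```python
import math


def text_to_graph(tokens, span=1):
    """Assumed representation: nodes are distinct tokens, and an undirected
    edge links each token to the `span` tokens that follow it."""
    nodes = set(tokens)
    edges = set()
    for i, tok in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + span]:
            if other != tok:
                edges.add(frozenset((tok, other)))
    return nodes, edges


def mcs_similarity(g1, g2):
    """With unique node labels, the maximum common subgraph is simply the
    shared nodes plus the shared edges; normalise by the larger graph."""
    (n1, e1), (n2, e2) = g1, g2
    mcs_size = len(n1 & n2) + len(e1 & e2)
    denom = max(len(n1) + len(e1), len(n2) + len(e2))
    return mcs_size / denom if denom else 0.0


def affinity_matrix(docs):
    """Pairwise MCS similarities for whitespace-tokenised documents."""
    graphs = [text_to_graph(d.split()) for d in docs]
    return [[mcs_similarity(a, b) for b in graphs] for a in graphs]


def spectral_bipartition(W, iters=200):
    """Two-way split from the second eigenvector of D^{-1/2} W D^{-1/2}.
    Assumes at least two non-empty documents, so every degree is positive."""
    n = len(W)
    deg = [sum(row) for row in W]
    inv_sqrt = [1.0 / math.sqrt(d) for d in deg]
    # Adding the identity shifts all eigenvalues to be non-negative.
    M = [[inv_sqrt[i] * W[i][j] * inv_sqrt[j] + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # The leading eigenvector is proportional to sqrt(degree); deflate it.
    top = [math.sqrt(d) for d in deg]
    top_norm = math.sqrt(sum(t * t for t in top))
    top = [t / top_norm for t in top]
    x = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        proj = sum(a * b for a, b in zip(x, top))
        x = [a - proj * b for a, b in zip(x, top)]
        x = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(a * a for a in x))
        x = [a / norm for a in x]
    # The sign of the rescaled eigenvector entry decides the cluster.
    return [0 if inv_sqrt[i] * x[i] < 0 else 1 for i in range(n)]


docs = [
    "graph spectral clustering of text",
    "spectral clustering of text data",
    "stock market prices fall",
    "stock market prices rise",
]
W = affinity_matrix(docs)
labels = spectral_bipartition(W)
print(labels)
```

For more than two clusters, a fuller version would typically take the leading k eigenvectors and run k-means on their rows; the two-way split is used here only to keep the sketch dependency-free.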

References

[1] VON LUXBURG U. A tutorial on spectral clustering[J]. Statistics and Computing, 2007, 17(4): 395-416.
[2] SALTON G, WONG A, YANG C S. A vector space model for automatic indexing[J]. Communications of the ACM, 1975, 18(11): 613-620.
[3] SCHENKER A, LAST M, BUNKE H, et al. Comparison of distance measures for graph-based clustering of documents[C]//IAPR International Conference on Graph Based Representations in Pattern Recognition. York, UK: Springer-Verlag, 2003: 202-213.
[4] BUNKE H, FOGGIA P, GUIDOBALDI C, et al. A comparison of algorithms for maximum common subgraph on randomly connected graphs[C]//Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition. Italy: Springer-Verlag, 2002: 123-132.
[5] SHI J, MALIK J. Normalized cuts and image segmentation[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2000, 22(8): 888-905.
[6] CAI Xiaoyan, DAI Guanzhong, YANG Libin. A survey of spectral clustering algorithms (谱聚类算法综述)[J]. Computer Science (计算机科学), 2008, 35(7): 14-18.
[7] ZHOU Zhaotao, BU Dongbo, CHENG Xueqi. A preliminary study of the graph representation of text (文本的图表示初探)[J]. Journal of Chinese Information Processing (中文信息学报), 2005, 19(2): 36-43.
[8] LIU Qiaofeng. Research on graph-structure-based Chinese text clustering methods (基于图结构的中文文本聚类方法研究)[D]. Dalian: Dalian University of Technology, 2009.
