Abstract
English-Chinese bilingual document clustering is a valuable research topic. Using a monolingual text clustering algorithm on a corpus of English-Chinese bilingual news documents, this paper presents a comparative study of clustering based on Chinese text alone, English text alone, and mixed English-Chinese text. The experimental results show that the mixed English-Chinese method can achieve relatively good clustering results.