Research on English-Chinese Bilingual News Clustering Based on Mixed Strategy (基于混合策略的英汉双语新闻聚类研究)
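  • English title: Research on English-Chinese Bilingual News Clustering Based on Mixed Strategy
  • Authors: Han Pu (韩普); Wan Jiexi (万接喜); Wang Dongbo (王东波)
  • Authors (English byline): HAN Pu, WAN Jie-xi, WANG Dong-bo (Department of Information Management, Nanjing University, Nanjing 210093, China)
  • Keywords (translated from Chinese): bilingual clustering; multilingual clustering; mixed-strategy method
  • Keywords (English): bilingual clustering; multilingual clustering; mixed strategy
  • Journal code (Chinese title abbreviation): QBKX
  • Journal title (English): Information Science
  • Institution: Department of Information Management, Nanjing University
  • Publication date: 2013-01-05
  • Published in: 情报科学 (Information Science)
  • Year: 2013
  • Volume and serial number: v.31; No.257
  • Funding: Major Project of Key Research Bases in Humanities and Social Sciences, Ministry of Education (08JJD870225); 2011 Nanjing University Graduate Research and Innovation Fund (2011CW12)
  • Language: Chinese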
  • Record ID: QBKX201301022
  • Page count: 5
  • Issue: 01
  • CN: 22-1264/G2
  • Pages: 121-125
Abstract
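English-Chinese bilingual text clustering is a valuable research topic. Using a monolingual text clustering algorithm on an English-Chinese bilingual news corpus, this paper compares text clustering based on Chinese alone, on English alone, and on a mixture of English and Chinese. The experimental results show that the mixed English-Chinese method can achieve relatively good clustering results.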
        English-Chinese bilingual document clustering is a valuable research topic. Using a monolingual clustering algorithm, the paper presents a comparative study of monolingual clustering and mixed-language clustering on a corpus of English-Chinese bilingual news documents. The experimental results show that the mixed-language method achieves better clustering performance.
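The comparison described in the abstract can be illustrated with a small sketch. The abstract does not specify the clustering algorithm, the features, or how the two languages are mixed, so everything below is an assumption made for illustration: each news item is assumed to carry pre-tokenized English and Chinese text, the "mixed" view is assumed to be a simple union of the two vocabularies, and a basic cosine k-means stands in for the monolingual algorithm. The toy documents and the names `features`, `kmeans`, and `cluster` are hypothetical.

```python
import math

def features(doc, view):
    """Term-frequency vector for one document under a view: 'en', 'zh', or 'mixed'."""
    langs = ("en", "zh") if view == "mixed" else (view,)
    vec = {}
    for lang in langs:
        for tok in doc[lang]:
            key = f"{lang}:{tok}"  # language prefix keeps the vocabularies distinct
            vec[key] = vec.get(key, 0.0) + 1.0
    return vec

def normalize(vec):
    """Scale a sparse vector to unit length (empty vectors stay empty)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def cosine(a, b):
    """Dot product of two unit-length sparse vectors, i.e. their cosine similarity."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def kmeans(vectors, k, iterations=20):
    """Cosine k-means, naively seeded with the first k documents (an assumption)."""
    centroids = [vectors[i] for i in range(k)]
    labels = None
    for _ in range(iterations):
        new_labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        for c in range(k):
            total = {}
            for v, lab in zip(vectors, labels):
                if lab == c:
                    for t, w in v.items():
                        total[t] = total.get(t, 0.0) + w
            if total:  # keep the previous centroid if a cluster is empty
                centroids[c] = normalize(total)
    return labels

def cluster(docs, view, k):
    """Run the same clustering algorithm on one of the three feature views."""
    return kmeans([normalize(features(d, view)) for d in docs], k)

# Hypothetical toy corpus: two sports items and two finance items.
docs = [
    {"en": ["football", "match", "goal"], "zh": ["足球", "比赛", "进球"]},
    {"en": ["stock", "market", "shares"], "zh": ["股票", "市场", "股份"]},
    {"en": ["team", "wins", "match"], "zh": ["球队", "赢得", "比赛"]},
    {"en": ["market", "falls", "stock"], "zh": ["股市", "下跌", "股票"]},
]

for view in ("en", "zh", "mixed"):
    print(view, cluster(docs, view, k=2))
```

On this toy corpus all three views produce the same grouping, so the sketch only shows how one algorithm can be run over English-only, Chinese-only, and mixed features. It does not reproduce the paper's experimental comparison.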