Research on Key Technologies for Network Hot Topic Discovery
Abstract
As Internet and computing technologies have advanced, the Internet has evolved from a basic communication network into the Web 2.0 model. In the Web 2.0 era, new kinds of network resources and web applications keep multiplying, which gives rise to "information islands" and "information overload". Discovering and analyzing hot topics in this mass of online information has therefore become an important and pressing problem. Although techniques from machine learning, natural language processing, and related fields are already widely applied to network hot topic discovery, existing algorithms remain limited: their performance still falls short of what users expect, and many questions call for further study. To address the diversity and redundancy of information resources, this study proposes a resource aggregation method for integrating network information resources. To address latent keywords, the curse of dimensionality, and time delay, it proposes keyword phrase extraction based on Mantaras distance optimization, an iterative adaptive clustering algorithm based on ant colony optimization, and a hot topic filtering algorithm based on feature optimization. Experiments confirm the accuracy and efficiency of the proposed algorithms.
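The abstract names the Mantaras distance but does not say how it enters the keyword phrase extraction step. As a point of orientation only, the sketch below computes the standard normalized López de Mántaras distance between two partitions of the same items, d_N = (H(A|B) + H(B|A)) / H(A,B). The function names and the toy label lists are illustrative assumptions, not taken from the thesis.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (base 2) of a distribution given as raw counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def mantaras_distance(labels_a, labels_b):
    """Normalized Lopez de Mantaras distance between two partitions.

    Each argument assigns one block label to every item, in the same item
    order. The result lies in [0, 1]: 0 when the partitions coincide up to
    relabeling, 1 when they are statistically independent.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same items")
    n = len(labels_a)
    if n == 0:
        raise ValueError("partitions must not be empty")

    h_a = entropy(Counter(labels_a).values(), n)
    h_b = entropy(Counter(labels_b).values(), n)
    h_ab = entropy(Counter(zip(labels_a, labels_b)).values(), n)

    if h_ab == 0:
        # Both partitions put every item in a single block.
        return 0.0
    # H(A|B) + H(B|A) = 2 * H(A,B) - H(A) - H(B)
    return (2 * h_ab - h_a - h_b) / h_ab


# Hypothetical toy data: the same four items split in two different ways.
same = mantaras_distance([0, 0, 1, 1], ["x", "x", "y", "y"])
independent = mantaras_distance([0, 0, 1, 1], [0, 1, 0, 1])
```

In a keyword setting one partition could, for instance, record whether a candidate term occurs in each document and the other could record the document's topic label; whether the thesis uses the distance this way is an assumption.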
