Abstract
Internet documents are vast in number and cover a broad range of topics and categories, so extracting useful information from them requires big data processing techniques. To address this problem, an enhanced Internet document classification model is proposed. It combines text mining with an evolutionary fuzzy algorithm and a fuzzy-rule-based classifier to assign Internet documents to different categories (domains). The evolutionary fuzzy algorithm updates the document classification dynamically and in real time as document content changes. Comparison with other classical classification algorithms shows that the proposed algorithm achieves better results.
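The abstract does not give the rule structure or the update procedure, so the following is only a minimal sketch of how a fuzzy-rule-based classifier might be updated incrementally as documents arrive. Everything in it is an assumption made for illustration and not the authors' method: the class name `EvolvingFuzzyClassifier`, whitespace tokenization, term-frequency vectors, cosine similarity as the membership degree, winner-takes-all rule firing, and the `new_rule_threshold` parameter.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector; lowercasing and whitespace splitting are assumptions."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity of two sparse vectors, used here as a membership degree in [0, 1]."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class EvolvingFuzzyClassifier:
    """Hypothetical sketch. Each rule reads: IF doc is close to prototype THEN label."""

    def __init__(self, new_rule_threshold=0.3):
        # Below this membership, a labeled document spawns a new rule (assumed value).
        self.threshold = new_rule_threshold
        self.rules = []  # each rule: {"proto": dict, "label": str, "count": int}

    def learn(self, text, label):
        """Update the rule base with one labeled document."""
        vec = tf_vector(text)
        same_label = [r for r in self.rules if r["label"] == label]
        best = max(same_label, key=lambda r: cosine(vec, r["proto"]), default=None)
        if best is None or cosine(vec, best["proto"]) < self.threshold:
            # No existing rule of this class covers the document: add a rule.
            self.rules.append({"proto": dict(vec), "label": label, "count": 1})
            return
        # Otherwise move the closest prototype toward the document (running mean).
        best["count"] += 1
        n = best["count"]
        for term in set(best["proto"]) | set(vec):
            old = best["proto"].get(term, 0.0)
            best["proto"][term] = old + (vec.get(term, 0) - old) / n

    def classify(self, text):
        """Return the label of the rule with the highest membership, or None if no rules exist."""
        if not self.rules:
            return None
        vec = tf_vector(text)
        return max(self.rules, key=lambda r: cosine(vec, r["proto"]))["label"]
```

In this sketch the rule base grows or shifts with every call to `learn`, which is one simple way to picture classification that follows changes in document content. A real system would need proper text preprocessing and a principled membership function.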