Abstract
The information explosion in less commonly used languages has made readers' time scarcer and their attention more scattered. This paper addresses automatic summarization of Korean texts and presents a novel Korean summarization (KKS) method based on key-noun extraction. We argue that Korean nouns (substantives) mainly carry semantic information, while Korean predicates serve mostly to build the syntactic frame. Experimental results show that summarization based on key-noun extraction outperforms summarization based on predicates or on all words, and that the KKS algorithm achieves the best performance on the Korean summarization task, which also supports our claim about the semantic function of Korean nouns.
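The abstract does not spell out the scoring procedure, so the following is only a minimal sketch of the general idea of keyword-based extractive summarization restricted to one word class, not the authors' KKS algorithm. It assumes the text has already been split into sentences and morphologically analyzed into (morpheme, tag) pairs; the Sejong-style tag sets, the frequency-based keyword weighting, and the names `summarize`, `NOUN_TAGS`, and `PREDICATE_TAGS` are all illustrative assumptions.

```python
from collections import Counter

# Assumed Sejong-style POS tags; the tagset used in the paper is not stated here.
NOUN_TAGS = {"NNG", "NNP", "NP", "NR"}  # substantives: nouns, pronouns, numerals
PREDICATE_TAGS = {"VV", "VA"}           # verbs and adjectives


def summarize(sentences, keep_tags=None, top_keywords=5, n_sentences=1):
    """Return indices of the selected sentences, in document order.

    sentences: list of sentences, each a list of (morpheme, tag) pairs.
    keep_tags: set of tags eligible as keywords; None means all words.
    """
    def kept(sentence):
        return [m for m, tag in sentence if keep_tags is None or tag in keep_tags]

    # Keyword weight = document frequency count of the morpheme (an assumption).
    freq = Counter(m for s in sentences for m in kept(s))
    keywords = dict(freq.most_common(top_keywords))

    # Sentence score = sum of the weights of the distinct keywords it contains.
    scores = [sum(keywords.get(m, 0) for m in set(kept(s))) for s in sentences]

    # Highest score first; ties go to the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:n_sentences])


# Toy, hand-tagged document (hypothetical data for illustration only).
doc = [
    [("경제", "NNG"), ("좋", "VA")],
    [("날씨", "NNG"), ("좋", "VA")],
    [("경제", "NNG"), ("성장", "NNG"), ("이끌", "VV")],
    [("성장", "NNG"), ("좋", "VA")],
]

noun_based = summarize(doc, NOUN_TAGS, top_keywords=2)
predicate_based = summarize(doc, PREDICATE_TAGS, top_keywords=2)
```

On this toy input the noun-based variant selects the sentence that mentions both frequent nouns, while the predicate-based variant is driven by a frequent but uninformative adjective. This only illustrates the kind of comparison the abstract describes and says nothing about the reported results.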