Abstract
This paper proposes a summary-sentence selection algorithm based on the cumulative contribution rate of sentence importance, together with an improved two-layer TextRank algorithm for single-document summary extraction. The extraction algorithm has a hierarchical structure. The cumulative-contribution-rate selection algorithm is integrated at different layers, and the algorithm combines two sentence-splitting granularities: long sentences and short sentences. Performance is evaluated on a manually compiled Chinese single-document summarization dataset, and the results show that the extracted summaries are of high quality.
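The abstract names two ingredients but does not give their details. The following is therefore a minimal sketch of those ingredients, not the authors' improved two-layer method. The first ingredient is standard TextRank sentence ranking (Mihalcea and Tarau, 2004). The second is a cutoff that keeps the top-ranked sentences until their share of the total importance score reaches a threshold. That share is one plausible reading of "cumulative contribution rate."

Several choices are assumptions made only for illustration: single characters serve as tokens (no word segmenter), the long/short delimiter sets, the damping factor 0.85, and the default threshold 0.5. The function names are also hypothetical. The `short` flag lets you try both splitting granularities, but the sketch does not reproduce how the paper combines them across layers.

```python
import math
import re
from typing import List

# Hypothetical delimiter sets: "long" sentences end at 。！？,
# "short" sentences additionally break at ，；.
LONG_DELIMS = r"[。！？!?]"
SHORT_DELIMS = r"[。！？!?，,；;]"


def split_sentences(text: str, short: bool = False) -> List[str]:
    """Split Chinese text into long or short sentences, dropping empty pieces."""
    pattern = SHORT_DELIMS if short else LONG_DELIMS
    return [s.strip() for s in re.split(pattern, text) if s.strip()]


def similarity(a: str, b: str) -> float:
    """TextRank overlap: |common tokens| / (log|a| + log|b|).

    Assumption: single characters stand in for words.
    """
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom <= 0:  # both sentences are one character long
        return 0.0
    return len(set(a) & set(b)) / denom


def textrank(sentences: List[str], d: float = 0.85,
             max_iter: int = 100, tol: float = 1e-6) -> List[float]:
    """Iterate WS(i) = (1 - d) + d * sum_j [w_ji / sum_k w_jk] * WS(j)."""
    n = len(sentences)
    w = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(max_iter):
        # w[j][i] > 0 implies out_sum[j] > 0, so the division is safe.
        new = [(1 - d) + d * sum(w[j][i] / out_sum[j] * scores[j]
                                 for j in range(n) if w[j][i] > 0)
               for i in range(n)]
        converged = all(abs(x - y) < tol for x, y in zip(new, scores))
        scores = new
        if converged:
            break
    return scores


def select_by_cumulative_rate(scores: List[float],
                              threshold: float = 0.5) -> List[int]:
    """Take the highest-scoring sentences until their share of the total
    score reaches `threshold`; return their indices in document order."""
    total = sum(scores)
    if total <= 0:
        return []
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen, acc = [], 0.0
    for i in order:
        chosen.append(i)
        acc += scores[i]
        if acc / total >= threshold:
            break
    return sorted(chosen)


def summarize(text: str, threshold: float = 0.5,
              short: bool = False) -> List[str]:
    """Rank sentences with TextRank, then apply the cumulative-rate cutoff."""
    sentences = split_sentences(text, short=short)
    keep = select_by_cumulative_rate(textrank(sentences), threshold)
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    doc = "机器学习很有用。机器学习可以用于文本摘要。天气很好。"
    print(summarize(doc, threshold=0.4))  # prints ['机器学习很有用']
```

A threshold-based cutoff adapts the summary length to how concentrated the importance scores are. A fixed top-k would instead return the same number of sentences for every document. In the example, the first sentence alone holds about 49% of the total score, so a threshold of 0.4 keeps one sentence and 0.5 keeps two.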