Automatic Extraction of Diverse Keyphrases Using Integer Linear Programming

English title: Automatic Extraction of Diversity Keyphrase by Utilizing Integer Linear Programming
Authors: Li Shanshan; Chen Li; Tang Yuting; Wang Yilin; Yu Zhonghua
Authors (English): LI Shan-shan; CHEN Li; TANG Yu-ting; WANG Yi-lin; YU Zhong-hua (College of Computer Science, Sichuan University)
Keywords: automatic keyphrase extraction; integer linear programming; semantic over-generation; diversity
Keywords (English): Automatic keyphrase extraction; Integer linear programming; Semantic over-generation; Diversity
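Chinese journal title: JSJA
English journal title: Computer Science
Affiliation: College of Computer Science, Sichuan University
Publication date: 2019-06-15
Publisher: Computer Science
Year: 2019
Issue: v.46
Funding: Supported by the Sichuan Province Science and Technology Support Program (2014GZ0063) and the Sichuan Province Key Research and Development Program (2018GZ0182)
Language: Chinese
Pages: JSJA2019S1011
Page count: 5
CN: S1
ISSN: 50-1075/TP
Classification code: 66-69+80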

Abstract

Keyphrases concisely summarize a text and represent its main topics and core ideas, and automatic keyphrase extraction is an important task in natural language processing and information retrieval. Current unsupervised extraction methods suffer from semantic over-generation among candidate phrases. To address this, the paper proposes an extraction algorithm that combines integer linear programming (ILP) with the semantic similarity of candidate phrases: the objective function is maximized while candidate phrases with high semantic similarity are penalized, which yields a diverse set of keyphrases. In the experiments, TextRank and TFIDF generate candidate phrases on two different corpora, and the proposed optimization algorithm then optimizes the candidates' weight scores. The optimized results are compared with those of several existing algorithms. The experimental results show that adding the similarity penalty effectively mitigates semantic over-generation and yields more diverse keyphrases, and that the precision (P), recall (R), and F values of the optimized results are higher than those of the other algorithms.
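The selection idea in the abstract can be illustrated with a minimal sketch. This record does not give the paper's actual formulation, so everything below is an assumption: the function name `select_diverse`, the `penalty` weight, the fixed selection size `k`, and the toy scores and similarities are all hypothetical. The sketch maximizes the total candidate score minus a penalty on pairwise similarity among the selected phrases, and it uses exhaustive search over subsets in place of a real ILP solver, which is only practical for a handful of candidates.

```python
from itertools import combinations


def select_diverse(scores, similarity, k, penalty=1.0):
    """Pick k phrases maximizing total score minus a pairwise similarity penalty.

    scores:     dict mapping phrase -> weight (e.g. from TextRank or TFIDF)
    similarity: dict mapping frozenset({a, b}) -> similarity in [0, 1];
                missing pairs are treated as 0 (an assumption of this sketch)
    """
    phrases = sorted(scores)
    best_value, best_subset = float("-inf"), ()
    # Exhaustive search stands in for an ILP solver on this tiny instance.
    for subset in combinations(phrases, k):
        value = sum(scores[p] for p in subset)
        value -= penalty * sum(
            similarity.get(frozenset(pair), 0.0)
            for pair in combinations(subset, 2)
        )
        if value > best_value:
            best_value, best_subset = value, subset
    return list(best_subset), best_value


# Hypothetical candidate scores and one near-duplicate pair.
scores = {
    "neural network": 0.9,
    "neural networks": 0.85,
    "keyphrase extraction": 0.7,
    "linear programming": 0.6,
}
similarity = {frozenset(("neural network", "neural networks")): 0.95}

selected, value = select_diverse(scores, similarity, k=2)
print(selected)
```

With the penalty switched off (`penalty=0.0`), the two highest-scoring candidates are the near-duplicates "neural network" and "neural networks". With the penalty on, the second slot goes to a dissimilar phrase instead, which is the diversity effect the abstract describes.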

