Review Authorship Attribution in a Similarity Space
  • Authors: Tie-Yun Qian (1)
    Bing Liu (2)
    Qing Li (3)
    Jianfeng Si (4)

    1. State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, China
    2. Department of Computer Science, University of Illinois at Chicago, Chicago 60607, U.S.A.
    3. Multimedia Software Engineering Research Centre and Department of Computer Science, City University of Hong Kong, Hong Kong, China
    4. Data Analytics Department, Institute for Infocomm Research, Singapore 138632, Singapore
  • Keywords: authorship attribution; supervised learning; similarity space
  • Journal: Journal of Computer Science and Technology
  • Publication year: 2015
  • Publication date: January 2015
  • Volume: 30
  • Issue: 1
  • Pages: 200-213
  • Full-text size: 708 KB
  • Journal category: Computer Science
  • Journal subjects: Computer Science, general
    Software Engineering
    Theory of Computation
    Data Structures, Cryptology and Information Theory
    Artificial Intelligence and Robotics
    Information Systems Applications and The Internet
    Chinese Library of Science
  • Publisher: Springer Boston
  • ISSN: 1860-4749
Abstract
Authorship attribution, also known as authorship classification, is the problem of identifying the authors (reviewers) of a set of documents (reviews). The common approach is to build a classifier using supervised learning. This approach has several issues that hurt its applicability. First, supervised learning needs a large set of documents from each author to serve as training data, which can be difficult to obtain in practice. For example, in the online review domain, most reviewers (authors) write only a few reviews, which are not enough to serve as training data. Second, the learned classifier cannot be applied to authors whose documents were not used in training. In this article, we propose a novel solution to both problems. The core idea is that instead of learning in the original document space, we transform it to a similarity space, where learning can naturally tackle these issues. Our experimental results on online reviews and reviewers show that the proposed method significantly outperforms state-of-the-art supervised and unsupervised baseline methods.
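The abstract does not say which features, similarity measures, or learner the authors use, so the following is only a rough sketch of what "transforming to a similarity space" could look like, not the paper's method. It assumes character trigram counts as document features and cosine and Jaccard as the similarity measures; all function names are hypothetical. Each pair of documents becomes a short vector of similarity scores labeled "same author" or "different author".

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (an assumed, illustrative feature)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard overlap between the sets of n-grams seen in each document."""
    union = a.keys() | b.keys()
    return len(a.keys() & b.keys()) / len(union) if union else 0.0


def similarity_vector(doc_a, doc_b):
    """Map a pair of documents to a point in a (here 2-D) similarity space."""
    fa, fb = char_ngrams(doc_a), char_ngrams(doc_b)
    return [cosine(fa, fb), jaccard(fa, fb)]


def build_pairs(docs):
    """docs: list of (author, text) tuples.

    Returns one (similarity_vector, label) example per unordered pair,
    with label 1 if both documents share an author and 0 otherwise.
    """
    examples = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            (author_i, text_i), (author_j, text_j) = docs[i], docs[j]
            label = 1 if author_i == author_j else 0
            examples.append((similarity_vector(text_i, text_j), label))
    return examples
```

Under these assumptions, any binary classifier could be trained on the output of `build_pairs`. Because the examples describe pairs rather than individual authors, a handful of reviews per author already yields many training pairs, and the resulting model can in principle score documents from authors never seen in training, which matches the two issues the abstract sets out to address.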
