Text Classification in the Domain of Applied Linguistics as Part of a Pre-editing Module for Machine Translation Systems
  • Keywords: Machine translation (MT); Automatic pre-editing; Domain adaptation; Document classification; TF-IDF term weighting; Vector space model; Cosine similarity
  • Publication: Lecture Notes in Computer Science
  • Publication year: 2016
  • Volume: 9811
  • Issue: 1
  • Pages: 691-698
  • Full-text size: 947 KB
  • References: 1. Albitar, S., Fournier, S., Espinasse, B.: An effective TF/IDF-based text-to-text semantic similarity measure for text classification. In: Benatallah, B., Bestavros, A., Manolopoulos, Y., Vakali, A., Zhang, Y. (eds.) WISE 2014, Part I. LNCS, vol. 8786, pp. 105–114. Springer, Heidelberg (2014)
    2. Arhivy foruma “Govorim po-russki”. http://www.speakrus.ru/dict/
    3. Manning, C., Raghavan, P., Schütze, H.: An Introduction to Information Retrieval, pp. 109–134. Cambridge University Press, New York (2009)
    4. “Computational Linguistics and Intellectual Technologies” journal. http://www.dialog-21.ru/digest/
    5. Google Translate. https://translate.google.ru/?hl=ru
    6. Kim, H.K., Kim, M.: Model-induced term-weighting schemes for text classification. Appl. Intell. 6, 1–14 (2016). Springer, New York
    7. Lenta.ru. https://lenta.ru/
    8. Potapova, R.K.: Rech: kommunikatsiya, informatsiya, kibernetika. Knizhnyiy dom “Librokom”, Moskva (2010). (in Russ.)
    9. Potapova, R., Oskina, K.: Semantic multilingual differences of terminological definitions regarding the concept “Artificial Intelligence”. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds.) SPECOM 2015. LNCS, vol. 9319, pp. 356–363. Springer, Heidelberg (2015)
    10. “Speech Technology” journal. http://speechtechnology.ru/
    11. The Perl Programming Language. https://www.perl.org/
    12. Yoo, J.Y., Yang, D.: Classification scheme of unstructured text document using TF-IDF and naive Bayes classifier. In: COMCOMS 2015. ASTL, vol. 111, pp. 263–266. SERSC, Tasmania (2015)
    13. Yun-tao, Z., Ling, G., Yong-cheng, W.: An improved TF-IDF approach for text classification. J. Zhejiang Univ. Sci. 6(1), 49–55 (2005). Springer, Zhejiang
  • Author affiliation: Ksenia Oskina (16)

    16. Institute of Applied and Mathematical Linguistics, Moscow State Linguistic University, Moscow, Russia
  • Series title: Speech and Computer
  • ISBN: 978-3-319-43958-7
  • Publication category: Computer Science
  • Publication subjects: Artificial Intelligence and Robotics
    Computer Communication Networks
    Software Engineering
    Data Encryption
    Database Management
    Computation by Abstract Devices
    Algorithm Analysis and Problem Complexity
  • Publisher: Springer Berlin / Heidelberg
  • ISSN: 1611-3349
  • Volume sort order: 9811
Abstract
This article describes a method of document classification based on a vector space model, applied to the domain of Applied Linguistics for Russian. The method classifies input texts into two categories: applied linguistics texts (AL) and non-applied linguistics texts (nonAL). It is implemented using TF-IDF term weighting as the statistical measure and cosine similarity as the measure for comparing document vectors. The study gives promising results and opens up prospects for applying this approach to text classification in other languages.
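
The general technique named in the abstract (TF-IDF weighting in a vector space model, with cosine similarity deciding between AL and nonAL) can be illustrated with a minimal Python sketch. This is not the author's implementation. The whitespace tokenizer, the smoothed IDF formula, the single AL "profile" vector, the 0.3 threshold, and the toy English sentences are all assumptions made for illustration; the study itself works with Russian texts.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization (no lemmatization).
    return text.lower().split()


def build_idf(docs):
    # Smoothed inverse document frequency over a reference corpus.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(text, idf):
    # Sparse TF-IDF vector; terms absent from the reference corpus are ignored.
    tf = Counter(tokenize(text))
    total = sum(tf.values()) or 1
    return {t: (c / total) * idf[t] for t, c in tf.items() if t in idf}


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(text, al_profile, idf, threshold=0.3):
    # Hypothetical decision rule: AL if similarity to the AL profile
    # reaches the (assumed) threshold, nonAL otherwise.
    score = cosine(tfidf_vector(text, idf), al_profile)
    return ("AL" if score >= threshold else "nonAL"), score


# Toy corpora (illustrative only, not data from the study).
al_docs = [
    "machine translation system uses speech recognition",
    "speech synthesis and automatic text processing in linguistics",
]
other_docs = [
    "the football team won the match yesterday",
    "the government announced new economic measures",
]

idf = build_idf(al_docs + other_docs)
al_profile = tfidf_vector(" ".join(al_docs), idf)

label_al, score_al = classify("speech recognition for machine translation", al_profile, idf)
label_other, score_other = classify("the team won the football match", al_profile, idf)
print(label_al, label_other)
```

In a sketch like this, the threshold and the way the AL profile is built are the main design choices; both would need tuning on real domain corpora before the result meant anything.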
