A Frame-Based Approach for Reference Metadata Extraction
  • Authors: Yu-Lun Hsieh (21)
    Shih-Hung Liu (21)
    Ting-Hao Yang (21)
    Yu-Hsuan Chen (21)
    Yung-Chun Chang (21)
    Gladys Hsieh (21)
    Cheng-Wei Shih (21)
    Chun-Hung Lu (22)
    Wen-Lian Hsu (21)
  • Keywords: Reference metadata extraction; Knowledge representation; Frame-based approach
  • Publication: Lecture Notes in Computer Science
  • Year: 2014
  • Volume: 8916
  • Issue: 1
  • Pages: 154-163
  • Full-text size: 197 KB
  • Author affiliations:
    21. Institute of Information Science, Academia Sinica, Taipei, Taiwan
    22. Innovative Digitech-Enabled Applications & Services Institute, III, Taiwan
  • ISSN: 1611-3349
Abstract
In this paper, we propose a novel frame-based approach (FBA) and use reference metadata extraction as a case study to demonstrate its advantages. The main contributions of this research are threefold. First, the new frame matching algorithm, which is based on sequence alignment, compensates for the shortcomings of traditional rule-based approaches, in which rule matching lacks flexibility and generality. Second, approximate matching is adopted to capture reasonable abbreviations or errors in the input reference string, further increasing the coverage of the frames. Third, experiments conducted on extensive datasets show that the same knowledge framework performs equally well on various untrained domains. Compared with a widely used machine learning method, Conditional Random Fields (CRFs), the FBA reduces the average field error rate across all four independent test sets by about 70% in relative terms (2.24% vs. 7.54%).
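
The abstract does not give the details of the frame matching algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It aligns a frame (an ordered list of slots) with the tokens of a reference string using a Needleman-Wunsch-style global alignment, and scores keyword slots with approximate matching so that abbreviations and typos can still fill a slot. The slot names (`YEAR`, `KW:...`), the gap penalty, the 0.6 similarity threshold, and the abbreviation rule are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

GAP = -0.4  # assumed penalty for skipping a slot or a token


def approx_match(token: str, keyword: str) -> float:
    """Similarity in [0, 1] that tolerates abbreviations and spelling errors."""
    t, k = token.lower(), keyword.lower()
    # Assumed abbreviation rule: "Proc." counts as a prefix of "proceedings".
    if t.endswith(".") and len(t) > 1 and k.startswith(t[:-1]):
        return 0.9
    return SequenceMatcher(None, t, k).ratio()


def slot_score(slot: str, token: str) -> float:
    """Score how well a token fills a slot (hypothetical slot types)."""
    if slot == "YEAR":
        return 1.0 if token.isdigit() and len(token) == 4 else -1.0
    if slot.startswith("KW:"):
        sim = approx_match(token, slot[3:])
        return sim if sim >= 0.6 else -1.0
    return -1.0


def align(frame: list[str], tokens: list[str]) -> tuple[float, dict[str, str]]:
    """Globally align frame slots with tokens; return (score, filled slots)."""
    n, m = len(frame), len(tokens)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], back[i][0] = score[i - 1][0] + GAP, "skip_slot"
    for j in range(1, m + 1):
        score[0][j], back[0][j] = score[0][j - 1] + GAP, "skip_token"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (score[i - 1][j - 1] + slot_score(frame[i - 1], tokens[j - 1]), "match"),
                (score[i - 1][j] + GAP, "skip_slot"),
                (score[i][j - 1] + GAP, "skip_token"),
            ]
            score[i][j], back[i][j] = max(options, key=lambda o: o[0])
    # Trace back from the bottom-right cell to recover which token fills each slot.
    filled: dict[str, str] = {}
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "match":
            filled[frame[i - 1]] = tokens[j - 1]
            i, j = i - 1, j - 1
        elif move == "skip_slot":
            i -= 1
        else:
            j -= 1
    return score[n][m], filled


# Example: an abbreviated keyword and two unmatched tokens between the slots.
total, filled = align(["KW:proceedings", "YEAR"], ["Proc.", "of", "KDD", "2004"])
print(filled)
```

In this sketch the gap moves stand in for the flexibility that rigid rule matching lacks: tokens with no matching slot are skipped at a small cost instead of causing the whole match to fail.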
