<tiger2/>: serialising the ISO SynAF syntactic object model
  • Authors: Laurent Romary (1) (2)
    Amir Zeldes (3)
    Florian Zipser (1) (2)

    1. Inria, Paris, France
    2. Institut für Deutsche Sprache und Linguistik, Humboldt-Universität zu Berlin, Berlin, Germany
    3. Department of Linguistics, Georgetown University, Washington, DC, USA
  • Keywords: syntactic annotation ; XML format ; corpus ; corpora ; Treebank ; Tiger XML
  • Journal: Language Resources and Evaluation
  • Publication year: 2015
  • Publication date: March 2015
  • Volume: 49
  • Issue: 1
  • Pages: 1-18
  • Full-text size: 2,093 KB
  • Journal category: Humanities, Social Sciences and Law
  • Journal subjects: Linguistics
    Computational Linguistics
    Computer Science, general
    Languages and Literature
  • Publisher: Springer Netherlands
  • ISSN: 1574-0218
Abstract
This paper introduces <tiger2/>, an XML format developed to serialise the object model defined by the ISO Syntactic Annotation Framework (SynAF). Based on widespread best practices, we adapt a popular XML format for syntactic annotation, TigerXML, with additional features to support a variety of syntactic phenomena, including constituent and dependency structures, binding, and different node types such as compounds or empty elements. We also define interfaces to other formats and standards, including the Morpho-syntactic Annotation Framework (MAF) and the ISOcat Data Category Registry. Finally, a case study of the German Treebank TüBa-D/Z is presented, showcasing the handling of constituent structures, topological fields and coreference annotation in tandem.
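
To give a rough idea of the kind of serialisation the abstract refers to, the sketch below reads a small hand-written fragment in the style of the original TigerXML format (terminals, nonterminals and labelled edges) and prints it as a bracketed tree. It is an illustration only: the fragment follows classic TigerXML conventions rather than the extended <tiger2/> schema, and the sample sentence and the function name `bracket` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment in classic TigerXML style (assumed for illustration;
# it does not use any of the additional <tiger2/> features).
SAMPLE = """
<s id="s1">
  <graph root="s1_500">
    <terminals>
      <t id="s1_1" word="Peter" pos="NE"/>
      <t id="s1_2" word="lacht" pos="VVFIN"/>
    </terminals>
    <nonterminals>
      <nt id="s1_500" cat="S">
        <edge label="SB" idref="s1_1"/>
        <edge label="HD" idref="s1_2"/>
      </nt>
    </nonterminals>
  </graph>
</s>
"""


def bracket(sentence_xml: str) -> str:
    """Render one <s> element as a labelled bracket string."""
    sentence = ET.fromstring(sentence_xml.strip())
    graph = sentence.find("graph")

    # Index terminal and nonterminal nodes by their id attribute.
    terminals = {t.get("id"): t for t in graph.iter("t")}
    nonterminals = {nt.get("id"): nt for nt in graph.iter("nt")}

    def render(node_id: str) -> str:
        if node_id in terminals:
            t = terminals[node_id]
            return f"({t.get('pos')} {t.get('word')})"
        nt = nonterminals[node_id]
        # Each <edge> points to a child node and carries a function label.
        children = " ".join(
            f"{e.get('label')}:{render(e.get('idref'))}"
            for e in nt.findall("edge")
        )
        return f"({nt.get('cat')} {children})"

    # The graph's root attribute names the top node of the tree.
    return render(graph.get("root"))


print(bracket(SAMPLE))
```

Resolving edges by `idref` rather than by XML nesting is what lets this style of format represent crossing or discontinuous structures; the sketch assumes a plain tree and does not handle secondary edges.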
