Information structure in African languages: corpora and tools
  • Authors: Christian Chiarcos (1) chiarcos@uni-potsdam.de
    Ines Fiedler (2) ines.fiedler@rz.hu-berlin.de
    Mira Grubic (1) grubic@uni-potsdam.de
    Katharina Hartmann (2) k.hartmann@rz.hu-berlin.de
    Julia Ritz (1) jritz@uni-potsdam.de
    Anne Schwarz (3) anne.schwarz@jcu.edu.au
    Amir Zeldes (2) amir.zeldes@rz.hu-berlin.de
    Malte Zimmermann (1) mazimmer@uni-potsdam.de
  • Keywords: African language resources; Pragmatics; Corpus search infrastructure
  • Journal: Language Resources and Evaluation
  • Publication year: 2011
  • Publication date: September 2011
  • Volume: 45
  • Issue: 3
  • Pages: 361-374
  • Full-text size: 359.1 KB
  • Affiliations: 1. Universität Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam, Germany
    2. Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany
    3. The Cairns Institute / James Cook University, PO Box 6811, Cairns, QLD 4870, Australia
  • Journal category: Humanities, Social Sciences and Law
  • Journal subjects: Linguistics
    Computational Linguistics
    Computer Science, general
    Languages and Literature
  • Publisher: Springer Netherlands
  • ISSN: 1574-0218
Abstract
In this paper, we describe tools and resources for the study of African languages developed at the Collaborative Research Centre 632 “Information Structure”. These include deeply annotated data collections of 25 sub-Saharan languages that are described together with their annotation scheme, as well as the corpus tool ANNIS, which provides unified access to a broad variety of annotations created with a range of different tools. With the application of ANNIS to several African data collections, we illustrate its suitability for the purpose of language documentation, distributed access, and the creation of data archives.
