CityU corpus of essay drafts of English language learners: a corpus of textual revision in second language writing
  • Authors: John Lee; Chak Yan Yeung; Amir Zeldes; Marc Reznicek…
  • Keywords: Learner corpus; Textual revision; Feedback; English as a second language; Multi-layer corpus annotation; Corpus search and visualization
  • Journal: Language Resources and Evaluation
  • Year of publication: 2015
  • Publication date: September 2015
  • Volume: 49
  • Issue: 3
  • Pages: 659-683
  • Author affiliations: John Lee (1)
    Chak Yan Yeung (1)
    Amir Zeldes (2)
    Marc Reznicek (3)
    Anke Lüdeling (4)
    Jonathan Webster (1)

    1. City University of Hong Kong, Kowloon, Hong Kong
    2. Georgetown University, Washington, DC, USA
    3. Universidad Complutense de Madrid, Madrid, Spain
    4. Humboldt-Universität zu Berlin, Berlin, Germany
  • Journal category: Humanities, Social Sciences and Law
  • Journal subjects: Linguistics
    Computational Linguistics
    Computer Science, general
    Languages and Literature
  • Publisher: Springer Netherlands
  • ISSN: 1574-0218
Abstract
Learner corpora consist of texts produced by non-native speakers. In addition to these texts, some learner corpora also contain error annotations, which can reveal common errors made by language learners, and provide training material for automatic error correction. We present a novel type of error-annotated learner corpus containing sequences of revised essay drafts written by non-native speakers of English. Sentences in these drafts are annotated with comments by language tutors, and are aligned to sentences in subsequent drafts. We describe the compilation process of our corpus, present its encoding in TEI XML, and report agreement levels on the error annotations. Further, we demonstrate the potential of the corpus to facilitate research on textual revision in L2 writing, by conducting a case study on verb tenses using ANNIS, a corpus search and visualization platform.
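The abstract states that sentences in each draft are aligned to sentences in the subsequent draft. As a rough illustration only, the sketch below aligns two drafts using Python's standard `difflib.SequenceMatcher`, first at the sentence level and then by character similarity inside changed regions. The naive sentence splitter, the helper names `split_sentences`, `similarity` and `align_drafts`, and the 0.5 threshold are all hypothetical choices made for this example; they are not the procedure used to compile the corpus.

```python
import difflib
import re


def split_sentences(text: str) -> list[str]:
    # Hypothetical naive splitter: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def similarity(a: str, b: str) -> float:
    # Character-level similarity ratio in [0, 1] computed by difflib.
    return difflib.SequenceMatcher(None, a, b).ratio()


def align_drafts(draft1: str, draft2: str, threshold: float = 0.5):
    """Align sentences of draft1 to sentences of draft2.

    Returns (old, new, label) triples, where label is one of
    'unchanged', 'revised', 'deleted' or 'added'.
    """
    old, new = split_sentences(draft1), split_sentences(draft2)
    result = []
    # Sentence-level diff: identical sentences form 'equal' runs.
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.extend((s, s, "unchanged") for s in old[i1:i2])
            continue
        # In a changed region, pair each old sentence with the most similar
        # unused new sentence, provided the similarity clears the threshold.
        candidates = list(new[j1:j2])
        for s in old[i1:i2]:
            best = max(candidates, key=lambda t: similarity(s, t), default=None)
            if best is not None and similarity(s, best) >= threshold:
                result.append((s, best, "revised"))
                candidates.remove(best)
            else:
                result.append((s, None, "deleted"))
        # New sentences left unpaired are treated as additions.
        result.extend((None, t, "added") for t in candidates)
    return result


d1 = "He go to school yesterday. It was raining. I like it."
d2 = "He went to school yesterday. It was raining. Overall, the trip was fun."
for row in align_drafts(d1, d2):
    print(row)
```

Here the unchanged sentence anchors the alignment, the corrected verb form is paired as a revision, and the rewritten final sentence surfaces as one deletion plus one addition. The sketch only produces one-to-one pairings, so a sentence split into two, or two sentences merged into one, would not be recognised.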
