A Validity Analysis of the Computer-based College English Placement Test
Abstract
To meet the needs of building an international research university and of social development, Hunan University introduced placement-based teaching reform to improve teaching quality, and its College English placement testing project team developed the Computer-based College English Placement Test (CCEPT) of Hunan University. To ensure the quality of the test, this study conducts a fairly comprehensive validation of the newly developed test and offers suggestions for the future development of CCEPT.
     Guided by modern language testing theory, the study examines whether CCEPT can effectively reflect students' English language ability. Drawing on the three validity types of content, criterion-related and construct validity proposed in the 1985 American Standards for Educational and Psychological Testing, the validation framework concentrates on construct validity, content validity and face validity. The validity of CCEPT is examined through analyses of the test takers' item responses, their scores, the test paper itself and the results of a questionnaire survey, with all data analyzed in the statistical package SPSS. Content validity is judged by computing item difficulty and discrimination and by analyzing how fully the paper covers the content prescribed in the test syllabus; construct validity by correlation analysis and factor analysis; and face validity through the questionnaire administered to the test takers.
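For reference, these are the classical test theory definitions behind the difficulty (facility) and discrimination statistics reported below; the top-and-bottom 27% grouping is the common convention, and the thesis itself reports only the resulting values:

$$\mathrm{FV} = \frac{R}{N}, \qquad D = \frac{R_U - R_L}{n}$$

where $R$ is the number of test takers answering an item correctly, $N$ the total number of test takers, and $R_U$, $R_L$ the numbers answering the item correctly among the top and bottom $n$ scorers on the whole test.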
     The results show that CCEPT performs reasonably in construct, content and face validity, although much remains to be improved. The correlation coefficient between Reading Comprehension and Writing is only +0.169, largely because of the low scoring validity of the writing section. Factor analysis by principal component analysis indicates that the items measure two abilities, "listening and writing ability" and "reading ability"; yet for a communicative language ability test, only candidates with a total score of 50 or above may take the speaking test, which will have a negative backwash effect on candidates.
     Overall, the content validity of the test is also fairly satisfactory. The facility value of the listening section is 0.464, close to the ideal value of 0.5, but most listening items concern specific details rather than the function of utterances, which does not accord well with the test syllabus. The reading items spread reasonably across the micro-skills prescribed by the syllabus; the facility value of this section is 0.569, ranging from 0.465 to 0.708, though Part D is rather difficult. The discrimination index of the section is 0.27, close to the ideal value of 0.3. The writing topics are close to students' lives, but the scoring reliability of this section is low.
     The questionnaire survey covered the test takers' familiarity with computers, their general impression of the test, the fairness of the test, its possible backwash effects and the skills it measures; the results show that most test takers approved of the test.
     The thesis also puts forward several suggestions: CCEPT should enlarge its item bank to ensure item quality and the stability of the test papers, and should minimize the effect of the testing mode. CCEPT should also further improve its reliability, especially the marking reliability of the writing section, by strengthening rater training. The findings help to improve computer-based English placement testing and also offer guidance for placement testing in other subjects.
The Computer-based College English Placement Test (CCEPT) of Hunan University is a new test that aims to provide a scientific placement system tailored to the university, to facilitate the academic success of its students, and to meet the requirements of establishing an international research university. The study is a comprehensive validation of CCEPT, and several suggestions are given for the improvement of the test.
     The study is firmly rooted in the modern paradigm of test validation. The author employs the Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association & National Council on Measurement in Education, 1985) to build the validity arguments. Based on the theory of language testing, particularly that of communicative language testing, the study focuses on content validity, construct validity and face validity to evaluate whether the test reflects the English language performance of the test takers in the light of the test specifications. Quantitative and qualitative data are gathered from the test takers, and all statistical analyses are performed with SPSS 13.0. Logical analysis is used to evaluate how adequately the test content represents the content domain, and the discrimination index and facility value of the items are calculated. A correlation study and factor analysis are conducted to test the construct validity. The students' responses to a questionnaire are analyzed to assess the face validity of the test.
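As a minimal sketch of how the facility values and discrimination indices could be computed outside SPSS, the following Python fragment assumes a dichotomously scored response matrix; the 27% grouping convention, the function name and the simulated data are illustrative assumptions, not details taken from the thesis:

```python
import numpy as np

def item_analysis(responses, tail=0.27):
    """Facility value and discrimination index for a 0/1-scored
    response matrix of shape (test takers, items)."""
    responses = np.asarray(responses, dtype=float)
    n_takers = responses.shape[0]

    # Facility value: proportion of test takers answering each item
    # correctly (0.5 is the conventional ideal difficulty).
    facility = responses.mean(axis=0)

    # Rank candidates by total score; compare the top and bottom
    # 27% groups, the usual convention in classical item analysis.
    order = np.argsort(responses.sum(axis=1))
    k = max(1, round(tail * n_takers))
    lower, upper = responses[order[:k]], responses[order[-k:]]

    # Discrimination index: proportion correct in the high group
    # minus proportion correct in the low group (0.3+ is desirable).
    discrimination = upper.mean(axis=0) - lower.mean(axis=0)
    return facility, discrimination

# Tiny demonstration with simulated answers.
rng = np.random.default_rng(1)
demo = (rng.random((200, 40)) < 0.55).astype(int)
fv, di = item_analysis(demo)
print(fv.mean(), di.mean())
```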
     Results show that the validity of CCEPT is convincing. CCEPT has relatively high construct validity: the correlations between each subtest and the other subtests, and between each subtest and the total test, are generally satisfactory. However, the correlation between Reading Comprehension and Writing is as low as +0.169, which could be due to the low marking reliability of the writing test. The factor analysis shows that the two extracted factors can be named "ability of listening and writing" and "ability of reading", and that they explain the general test format of CCEPT. However, the speaking test is available only to candidates whose final written score is 50 or above, so as a communicative language test CCEPT has not tested all aspects of English language competence.
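The correlation and principal-component steps can be sketched in the same spirit. The thesis ran these analyses in SPSS on real CCEPT scores; the synthetic data, the column names and the omission of varimax rotation below are assumptions made purely for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical subtest scores sharing one common ability factor;
# the scales and subtest names are assumptions, not CCEPT data.
rng = np.random.default_rng(0)
base = rng.normal(0, 1, 300)
scores = pd.DataFrame({
    "listening": 15 + 4 * (base + rng.normal(0, 1, 300)),
    "reading":   20 + 5 * (base + rng.normal(0, 1, 300)),
    "writing":   10 + 3 * (base + rng.normal(0, 1, 300)),
})

# Pearson correlations between subtests; an unexpectedly low value
# (such as the reported +0.169 between reading and writing) points
# to problems such as unreliable scoring.
print(scores.corr().round(3))

# Principal-component factor analysis: eigendecompose the
# correlation matrix and keep components with eigenvalue > 1
# (Kaiser's criterion, the SPSS default).
eigvals, eigvecs = np.linalg.eigh(scores.corr().to_numpy())
idx = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[idx], eigvecs[:, idx]
keep = eigvals > 1.0
loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
print(pd.DataFrame(loadings, index=scores.columns))
```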
     Analyses of the content validity also show generally satisfactory results. The facility value of Listening Comprehension is 0.464, close to the ideal value of 0.5. However, the listening component focuses mainly on specific details rather than on the function of utterances, and so fails to meet the specifications adequately. The items in Reading Comprehension reflect a harmonious balance among the different micro-level skills; the overall facility value is 0.569, with a minimum of 0.465 and a maximum of 0.708, and the discrimination index of this component is 0.27, close to the standard value of 0.3. The writing topics are close enough to everyday life to encourage students to write. However, an evaluation of inter-rater consistency shows that scorer reliability is rather low in the writing test.
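A minimal sketch of one way to quantify the inter-rater consistency mentioned above, assuming two raters mark the same set of essays; the scale and the scores are invented for illustration:

```python
import numpy as np

def scorer_reliability(rater_a, rater_b):
    """Pearson correlation between two raters' scores for the same
    essays: a simple index of inter-rater consistency."""
    return np.corrcoef(np.asarray(rater_a, float),
                       np.asarray(rater_b, float))[0, 1]

# Illustrative scores on an assumed 0-15 writing scale; a low
# coefficient signals the kind of scorer unreliability the study
# found in the writing component.
a = [10, 8, 12, 7, 11, 9, 13, 6]
b = [12, 6, 9, 10, 8, 11, 10, 9]
print(round(scorer_reliability(a, b), 3))
```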
     Responses to the questionnaires, which concern computer familiarity and computer anxiety, general impression of the test, fairness of the test, possible backwash effects and the skills tested, show that most test takers find CCEPT satisfactory and took it seriously.
     Finally, the thesis puts forward several suggestions: CCEPT should enlarge its item bank to ensure the quality and stability of the test, and the mode effect should be reduced as far as possible. In addition to the analysis of validity, the thesis points out that CCEPT should improve its reliability, especially scorer reliability in the marking of writing.
     The findings have important practical implications for implementing a computer-based English placement test, and they will also be applicable in other academic contexts.
References
[1] Bayliss A & Ingram D E. IELTS as a Predictor of Academic Language Performance [R]. IELTS Research Reports, 2006, (3): 153-205
    [2] American Educational Research Association, American Psychological Association & National Council on Measurement in Education. Standards for Educational and Psychological Testing [M]. Washington DC: APA, 1985: 28
    [3] American Educational Research Association, American Psychological Association & National Council on Measurement in Education. Standards for Educational and Psychological Testing [M]. Washington DC: AERA, 1999: 35
    [4] Alderson J C, Clapham C & Wall D. Language Test Construction and Evaluation [M]. Beijing: Foreign Language Teaching and Research Press, 2000: 183-185
    [5] Bachman L F. Fundamental Considerations in Language Testing [M]. London: Oxford University Press, 1990: 190-223
    [6] Bachman L F. Statistical Analyses for Language Assessment [M]. Cambridge: Cambridge University Press, 2004: 153-190
    [7] Bachman L F & Palmer A S. Language Testing in Practice [M]. London: Oxford University Press, 1999: 47, 90, 25
    [8] Child D. The Essentials of Factor Analysis [M]. London: Holt & Rinehart & William, 1970: 1-90
    [9] Cotton F & Conrow F. An Investigation of the Predictive Validity of IELTS among a Group of International Students at the University of Tasmania [R]. IELTS Research Reports, 1998, (1): 72-115
    [10] Davies A. Principles of Language Testing [M]. Oxford: Basil Blackwell, 1990: 167
    [11] Fiocco M. English Proficiency Levels of Students from a Non-English Speaking Background: a Study of IELTS as an Indicator of Tertiary Success [R]. IELTS Research Reports, 1992, (8): 36-72
    [12] Freedle R & Kostin I. Does the Text Matter in a Multiple-choice Test of Comprehension: the Case for the Construct Validity of TOEFL’s Mini-talks [J]. Language Testing, 1999, (16): 2-32
    [13] Henning G. A Guide to Language Testing: Development, Evaluation and Research [M]. Beijing: Foreign Language Teaching and Research Press, 2001: 170-181
    [14] Guion R M. On Trinitarian Doctrines of Validity [J]. Professional Psychology, 1980, (11): 385-398
    [15] Heaton J B. Writing English Language Tests [M]. Beijing: Foreign Language Teaching and Research Press, 2000: 160-161, 137
    [16] Henning G. A Guide to Language Testing [M]. Cambridge, Massachusetts: Newbury House, 1987: 91-93, 49, 53
    [17] Hills J R, Hirsch T M & Subhiyah R G. Issues in Placement [J]. Washington DC: ERIC Clearinghouse on Tests, Measurement and Evaluation/American Institutes for Research, 1990, (5): 28-39
    [18] Hughes A. Testing for Language Teachers [M]. Cambridge: Cambridge University Press, 1989: 26, 22-23
    [19] Jennifer Greene & Young-Ju Lee. The Predictive Validity of an ESL Placement Test [J]. Journal of Mixed Methods Research, 2007, 1 (4): 366-389
    [20] Kaiser H F. The Varimax Criterion for Analytic Rotation in Factor Analysis [J]. Psychometrika, 1958, (23): 187-200
    [21] Kane M T. An Argument-based Approach to Validity [J]. Psychological Bulletin, 1992, (112): 271-350
    [22] Kane M T. Current Concerns in Validity Theory [J]. Journal of Educational Measurement, 2001, 38 (4): 319-342
    [23] Kim J O & Mueller C W. Introduction to Factor Analysis: What It Is and How to Do It [M]. Beverly Hills, California: Sage, 1978: 9
    [24] Larson J. Considerations for Testing Reading Proficiency via Computer-adaptive Testing [C]. Cambridge: Cambridge University Press, 1999, (10): 71-90
    [25] Mead A D & Drasgow F. Equivalence of Computerized and Paper-and-pencil Cognitive Ability Tests: a Meta-analysis [J]. Psychological Bulletin, 1993, (114): 449–580
    [26] Merrylees B & McDowell C. IELTS Research Reports [R]. IELTS Research Reports, 1999, (2): 1-35
    [27] Messick S. Validity [J]. Educational Measurement, 1989, (3): 13-104
    [28] Moss P A. Shifting Conceptions of Validity in Educational Measurement: Implications for Performance Assessment [J]. Review of Educational Research, 1992, (62): 229-258
    [29] Powers D E, Schedl M A, Butler F A, et al. Validating the Revised Test of Spoken English against a Criterion of Communicative Success [R]. New Jersey: Educational Testing Service, 1999: 59-90
    [30] Richards J C & Rodgers T S. Approaches and Methods in Language Teaching [M]. Cambridge: Cambridge University Press, 1991: 180
    [31] Rosenfeld M, Leung S & Oltman P K. The Reading, Writing, Speaking and Listening Tasks are Important for Academic Success at the Undergraduate and Graduate Levels [R]. TOEFL Monograph Series Reports, 2001, (21): 81-95
    [32] Rosenfeld M, Oltman P K & Sheppard K. Investigating the Validity of TOEFL: A Feasibility Study Using Content and Criterion-Related Strategies [EB/OL]. http://www.ets.org/Media/Research/pdf/RM-04-03.pdf, 2004, (8): 6-18
    [33] Sands W A, Waters B K & McBride J R. Computerized Adaptive Testing: from Inquiry to Operation [R]. Washington, DC: American Psychological Association, 1997, (10): 9
    [34] Shepard L A. Evaluating Test Validity [J]. Review of Research in Education, 1993, (19): 405-450
    [35] Shepard L A. The Centrality of Test Use and Consequences for Test Validity [J]. Educational Assessment: Issues and Practice, 1997, 16 (2): 5-13
    [36] Stephen B D, Daniel M K & Hoover H D. Quality Control in the Development and Use of Performance Assessments [J]. Applied Measurement in Education, 1991, 4 (4): 289-303
    [37] Stevenson D K. Authenticity, Validity and a Tea Party [J]. Language Testing, 1985, (2): 41-47
    [38] Weir C J. Communicative Language Testing [M]. Hertfordshire: Prentice Hall International (UK) Ltd, 1990: 67, 71
    [39] Weir C J. Construct Validity [C]. British Council and Cambridge Local Examinations Syndicate: Hughes A & Porter D & Weir C J, 1988: 15-25
    [40] Weir C J. Language Testing and Validation [M]. Palgrave: Macmillan, 2005: 89-107
    [41] Weir C J. Understanding and Developing Language Tests [M]. Hertfordshire: Prentice Hall International (UK) Ltd, 1993: 144
    [42] Wu Yi'an. What do Tests of Listening Comprehension Test? A Retrospection Study of EFL Test-takers Performing a Multiple-choice Task [J]. Language Testing, 1998, 15 (1): 21-24
    [43] Young-Ju Lee. Construct Validation of an Integrated, Process-oriented and Computerized English for Academic Purposes Placement Test: a Mixed Method Approach [D]. Urbana, Illinois: University of Illinois at Urbana-Champaign, 2005: 16-21
    [44] Guo Shujun. A Study of the Internal Structure Validity of HSK Items [C]. Collected Research Papers on HSK. Beijing: Modern Press, 1995: 9-15
    [45] He Fang. A Report on the Reliability and Validity of HSK [C]. Proceedings of the First International Symposium on Chinese Proficiency Testing. Beijing: Beijing Language and Culture University Press, 1995: 36-47
    [46] Jin Yan. The Development and Validation of an Advanced English Reading Test [M]. Shanghai: Shanghai Jiao Tong University Press, 2002: 9-12
    [47] Liu Runqing & Han Baocheng. Language Testing and Its Methods [M]. Beijing: Foreign Language Teaching and Research Press, 2003: 2
    [48] Wang Xiaoling. A Validity Report on HSK (Elementary and Intermediate) [J]. Language Teaching and Linguistic Studies, 2006, (6): 49-56
    [49] Yang Huizhong & Weir C J. Validation Study of the National College English Test [M]. Shanghai: Shanghai Foreign Language Education Press, 1998: 82-93
    [50] Zhang Kai. A Preliminary Study of the Construct Validity of HSK [C]. Proceedings of the First International Symposium on Chinese Proficiency Testing. Beijing: Beijing Language and Culture University Press, 1995: 24-35
    [51] Zou Shen. A Validity Study of TEM [M]. Shanghai: Shanghai Foreign Language Education Press, 1997: 1-86
    [52] Zou Shen, Zhang Yanli & Zhou Yuemei. The Relationship between Item Types, Strategies and Scores in Reading Tests [J]. Foreign Languages and Their Teaching, 2002, (5): 19-22
