Computer-Assisted Language Testing: An Analysis of Validity
Abstract
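Computers are now widely used in every area of education, including
    teaching, learning, and testing. In language testing, computer use is
    becoming increasingly widespread and is gradually tending to replace
    traditional paper-and-pencil tests. With the rapid development of
    computer technology, the kinds of tests that can be administered by
    computer have grown from the objective items of the early days to tests
    of every language skill: listening, speaking, reading, and writing (see
    Chapter 1).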
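     Computer-assisted testing (CAT) refers to any test that uses computer
    technology at any stage of the testing process. As with anything new,
    questions remain about the reliability and validity of computer-assisted
    testing. Besides giving a comprehensive account of the progress made so
    far in computer-assisted language testing and of its future direction,
    this thesis focuses on several questions of test validity. In discussing
    the validity of computer-assisted language testing, it devotes two
    separate chapters to the validity of item-bank-based computer-adaptive
    language testing (CALT) and of computerized authentic assessment.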
     The thesis consists of six parts: an introduction, Chapters 1 through
    4, and a conclusion.
     The introduction presents the research focus, aims, and structure of
    the thesis.
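     Chapter 1 describes the progress made to date in computer-assisted
    language testing and its range of use. The discussion covers four topics:
    item bank construction; the types of test to which computer-assisted
    language testing can be applied; computer-adaptive language testing; and
    the effects of using computers in language testing.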
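     Item bank construction covers the whole process of generating,
    handling, analyzing, storing, managing, and selecting test items. A
    large, scientifically managed item bank is what guarantees well-formed
    test papers. A small item bank can be managed by traditional means such
    as index cards, but a very large one can only be built and analyzed by
    computer. Much software for this purpose is already available, which
    makes item bank processing simpler and more effective. Administering a
    language test by computer does not restrict the type of test. The second
    part of Chapter 1 summarizes the types of computer-assisted language test
    and analyzes the advantages of testing by computer: it breaks with the
    group administration and fixed time limits of traditional testing and
    makes individualized, untimed testing possible, which amounts to a
    revolution in testing. Chapter 1 then introduces a distinctive and most
    widely used branch of computer-assisted testing: computer-adaptive
    language testing. Compared with other language tests, it has the
    following features: the items adjust automatically to the individual test
    taker; the test stops automatically once the test taker's actual level
    has been determined; and it generally uses fewer items than other tests.
    Computer-adaptive language testing is based on Item Response Theory
    (IRT). The Graduate Record Examination (GRE), the Graduate Management
    Admission Test (GMAT), and the U.S. national nurse licensure examination
    (NNCL), among others, have already adopted the CAT format. It is not hard
    to see that CAT represents the direction and focus of future development
    in educational and psychological testing. The fourth part of Chapter 1
    discusses the advantages of computer-assisted language testing from two
    angles, the test itself and human factors, and then its shortcomings in
    terms of physical considerations and test-taker performance.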
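The three features of adaptive testing listed above (items adjust to the test taker, the test stops once ability is determined, and fewer items are needed) can be illustrated with a short sketch. It is not the procedure of any examination named in the abstract. It assumes a one-parameter (Rasch) IRT model, an expected a posteriori ability estimate on a fixed grid with a standard normal prior, and a stopping rule based on the posterior standard deviation. The names `run_cat`, `estimate_theta`, `se_target`, and the item difficulties are hypothetical.

```python
import math


def p_correct(theta, b):
    """Rasch model: probability that ability theta answers difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_theta(responses, lo=-4.0, hi=4.0, steps=81):
    """Grid-based posterior mean and standard deviation of ability.

    responses is a list of (difficulty, correct) pairs; the prior is
    standard normal (an assumption of this sketch).
    """
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalized standard normal prior
        for b, correct in responses:
            p = p_correct(t, b)
            w *= p if correct else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


def run_cat(item_bank, answer_fn, se_target=0.4, max_items=30):
    """Administer items one at a time until the estimate is precise enough."""
    remaining = list(item_bank)
    responses = []
    theta, se = 0.0, float("inf")
    while remaining and len(responses) < max_items and se > se_target:
        # Under the Rasch model the most informative item is the one whose
        # difficulty is closest to the current ability estimate.
        b = min(remaining, key=lambda d: abs(d - theta))
        remaining.remove(b)
        responses.append((b, answer_fn(b)))
        theta, se = estimate_theta(responses)
    return theta, se, len(responses)


# Hypothetical bank of 61 item difficulties from -3.0 to 3.0.
bank = [round(-3.0 + 0.1 * i, 1) for i in range(61)]
# Hypothetical test taker who answers correctly whenever difficulty is below 1.0.
theta, se, n_items = run_cat(bank, lambda b: b < 1.0)
print(f"ability={theta:.2f}, sd={se:.2f}, items used={n_items} of {len(bank)}")
```

Raising `se_target` shortens the test at the cost of precision, and lowering it does the reverse. An operational test would also have to control item exposure and content balance, which this sketch ignores.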
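     Chapter 2 analyzes the validity of item-bank-based computer-adaptive
    language testing. It first defines terms such as "item bank
    construction," "item response theory," and "validity." It then discusses
    several major factors that affect computer-adaptive language testing and
    proposes some solutions. These factors include: change in test mode,
    differences in test takers' familiarity with computers, anxiety induced
    by computer testing, the effect of speed, changes in item order,
    differences in test length, test takers' prior training in the item
    format, and the dimensionality of the item bank.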
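     Besides item-bank-based tests, computer-assisted language testing also
    includes tests of authentic language use, chiefly tests of writing
    ability. Chapter 3 analyzes the validity of this type of test through an
    experiment. Many writing tests still require test takers to answer with
    pen and paper, yet many students are already used to writing on a
    computer, and the reverse situation also occurs. Does such a mismatch
    between test mode and practice mode affect the validity of the test? This
    question has long been debated. The experiment reported in this chapter
    used undergraduate English majors of the class of 1998 at the PLA Foreign
    Languages Institute as subjects. The hypothesis was that a change of test
    mode has a significant effect on the results of a writing test, whereas
    its effect on objective items (here, multiple-choice reading
    comprehension questions) is negligible. Based on their TEM-4 (Test for
    English Majors, Band 4) scores and their familiarity with computers
    (measured by a questionnaire and a typing-speed test), the test takers
    were divided into an experimental group and a control group, which took
    the same test on paper and on computer respectively. The test consisted
    of twenty multiple-choice questions based on short reading passages and
    one writing task. The computer version was written in Authorware and was
    made to resemble the paper version as closely as possible. The essays of
    both groups were scored in text form to avoid rater bias. A detailed
    statistical analysis of the results confirmed the hypothesis. In
    addition, a textual analysis of the essays found that answering on a
    computer produced longer texts and relatively more paragraphs, and an
    analysis by gender led to the conclusion that computer-based writing
    tests affect female students more than male students. The experiment
    shows that, when assessing students' writing ability, we need to take
    into account the effect on test results of a mismatch between test mode
    and practice mode, especially when students are used to pen-and-paper
    tests.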
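The abstract does not say which statistics were used to compare the two groups. As one hedged illustration of how a test-mode effect might be quantified separately for the reading and writing sections, the sketch below computes a standardized mean difference (Cohen's d with a pooled standard deviation). The choice of Cohen's d is an assumption, the function name `cohens_d` is hypothetical, and the numbers are arbitrary toy values, not data from the experiment.

```python
import math
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * stdev(group_a) ** 2
                  + (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Arbitrary toy values, used only to show the call.
print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
```

Under the hypothesis stated above, one would expect the value computed on writing scores to be clearly nonzero and the value computed on multiple-choice scores to be close to zero.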
