Effects of Item Difficulty Distribution and Sample Size on the Results of Two CTT Equating Methods
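Abstract
In the field of testing research, equating is the statistical technique for finding the score conversion relationship between two forms of a test that measure the same psychological trait. Equating arises from practical needs, and its purpose is to make scores on two different test forms comparable.
     To date, researchers have proposed many equating methods. Among them, the methods based on classical test theory (CTT) are mainly linear equating and equipercentile equating. Different equating methods yield different equating results, so which method gives the more accurate results has become a question of interest to researchers. Many studies in China and abroad have addressed this question, but because each study used a different research setting, their conclusions also differ.
     This study used Monte Carlo simulation with a single-group design without an anchor test, took true-score equating as the criterion, and comprehensively compared the results of the two CTT equating methods under various item difficulty distributions and various sample sizes.
     The results show that, in the settings examined in this study:
     (1) The error of linear equating is strongly affected by the item difficulty distribution, whereas the error of equipercentile equating is almost unaffected by it.
     (2) The error of linear equating is almost unaffected by sample size, whereas the error of equipercentile equating is strongly affected by it.
     (3) Regardless of the item difficulty distribution, equipercentile equating performs better than linear equating as long as the sample size is large enough.
     The conclusions of this study differ in some respects from those of previous studies, and the paper discusses these differences.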
The statistical technique used to convert scores on two different test forms onto a comparable scale is called equating, provided that the forms measure the same psychological trait. Equating arises from practical needs, and its purpose is to make the scores on two different test forms comparable.
     To date, researchers have developed many equating methods. Among those based on Classical Test Theory (CTT), linear equating and equipercentile equating are the two most common. Different equating methods lead to different equating results, so researchers have been concerned with which method produces the more accurate results. Many studies have been conducted in China and abroad, but because their research contexts differ, their conclusions are not the same.
     Using Monte Carlo simulation with a single-group design without an anchor test, and taking true-score equating as the criterion, this study comprehensively compared the two CTT equating methods under different item difficulty distributions and different sample sizes.
     The simulation results were as follows:
     (1) The error of linear equating was strongly affected by the item difficulty distribution, while the error of equipercentile equating was hardly affected.
     (2) The error of linear equating was hardly affected by sample size, while the error of equipercentile equating was strongly affected.
     (3) Regardless of the item difficulty distribution, equipercentile equating was better than linear equating as long as the sample size was large enough.
     The conclusions differ somewhat from those of previous studies, and these differences are discussed in the paper.
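The kind of comparison summarized above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not say how responses were generated, so the Rasch response model, the 20-item forms, the difficulty values, the unsmoothed equipercentile method, and all function names below are assumptions made for illustration only.

```python
import numpy as np

def p_correct(theta, b):
    # Assumed Rasch model: P(correct) = 1 / (1 + exp(-(theta - b))).
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

def simulate_scores(theta, b, rng):
    # Number-correct score of each examinee on a form with difficulties b.
    return (rng.random((theta.size, b.size)) < p_correct(theta, b)).sum(axis=1)

def linear_equate(x_scores, y_scores):
    # Linear equating: match the mean and standard deviation of X to Y.
    mx, sx = np.mean(x_scores), np.std(x_scores)
    my, sy = np.mean(y_scores), np.std(y_scores)
    return lambda x: my + (sy / sx) * (np.asarray(x, dtype=float) - mx)

def equipercentile_equate(x_scores, y_scores):
    # Unsmoothed equipercentile equating: send x to the Y score that has
    # the same percentile rank (midpoint convention for discrete scores).
    xs = np.sort(np.asarray(x_scores))
    ys = np.asarray(y_scores)
    def convert(x):
        x = np.asarray(x, dtype=float)
        below = np.searchsorted(xs, x, side="left")
        at_or_below = np.searchsorted(xs, x, side="right")
        rank = (below + at_or_below) / (2.0 * xs.size)
        return np.quantile(ys, rank)
    return convert

def true_equivalents(x, b_x, b_y):
    # True-score criterion: find theta whose expected X score equals x,
    # then return the expected Y score at that theta.
    grid = np.linspace(-8.0, 8.0, 4001)
    tau_x = p_correct(grid, b_x).sum(axis=1)
    tau_y = p_correct(grid, b_y).sum(axis=1)
    theta_at_x = np.interp(x, tau_x, grid)
    return np.interp(theta_at_x, grid, tau_y)

def equating_rmse(n_examinees, b_x, b_y, seed=0):
    # Single-group design: the same simulated examinees take both forms.
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n_examinees)
    x_scores = simulate_scores(theta, b_x, rng)
    y_scores = simulate_scores(theta, b_y, rng)
    x = np.arange(1, b_x.size)  # interior scores; 0 and the maximum have no finite theta
    truth = true_equivalents(x, b_x, b_y)
    def rmse(est):
        return float(np.sqrt(np.mean((est - truth) ** 2)))
    return (rmse(linear_equate(x_scores, y_scores)(x)),
            rmse(equipercentile_equate(x_scores, y_scores)(x)))

if __name__ == "__main__":
    # Hypothetical forms: X has evenly spread difficulties, Y is mostly easy items.
    b_x = np.linspace(-2.0, 2.0, 20)
    b_y = np.concatenate([np.linspace(-2.5, -1.0, 15), np.linspace(0.5, 2.0, 5)])
    for n in (100, 1000, 10000):
        print(n, equating_rmse(n, b_x, b_y, seed=n))
```

A fuller study would repeat each condition over many replications and average the error. A single run per sample size, as here, only indicates how such a comparison could be set up and says nothing reliable about which method is better.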
