Joint Analysis of Response Time and Response Accuracy Data in Computer-based Multidimensional Assessments
  • English title: Joint Modeling for Response Times and Response Accuracy in Computer-based Multidimensional Assessments
  • Author: Zhan Peida (詹沛达)
  • Affiliations: Department of Psychology, College of Teacher Education, Zhejiang Normal University; Collaborative Innovation Center of Assessment toward Basic Education Quality, Beijing Normal University
  • Keywords: item response theory; multidimensional item response theory; item response times; computer-based assessment; joint modeling; Rasch model; Programme for International Student Assessment (PISA)
  • Journal code: XLKX
  • Journal: Journal of Psychological Science (心理科学)
  • Publication date: 2019-01-20
  • Publisher: Journal of Psychological Science (心理科学)
  • Year: 2019
  • Volume/issue: v.42; No.237
  • Language: Chinese
  • Record ID: XLKX201901026
  • Page count: 9
  • Issue number: 01
  • CN: 31-1582/B
  • Pages: 172-180
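Abstract
With advances in psychological and educational measurement research and in technology, computer-based (large-scale) assessments have attracted growing attention. This study explores how response time data can be used to support the estimation of multidimensional latent abilities in computer-based multidimensional assessments, and aims to provide methodological support for data analysis in China's monitoring of compulsory education quality. Using the 2012 and 2015 Programme for International Student Assessment (PISA) computer-based mathematics data as examples, the study proposes a joint responses and times multidimensional Rasch model that analyzes response time and response accuracy data simultaneously. The analysis of the PISA data with the new model shows that incorporating response time data not only improves the precision of the parameter estimates of the multidimensional Rasch model, but also allows data analysts to use respondents' response time information for further decisions and interventions (for example, diagnosing aberrant response behavior or item preknowledge).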
With advances in computer-based testing, collecting item response times (RTs) has become routine in many large-scale tests. As a result, test developers and data analysts now have a source of information beyond the traditional item response accuracy (RA) data. Recorded RTs may help to improve test design, the detection of aberrant response behavior, and item selection in computerized adaptive tests. For example, respondents who are not motivated in a low-stakes test may respond to items in a speeded manner, and such behavior is not easily identified from RA alone. Among the RT modeling approaches that have been proposed, the hierarchical modeling framework (van der Linden, 2007) is one of the most flexible tools for explaining the relationship between response speed and accuracy. The framework is general enough to integrate any available measurement models for RA and RT. Currently, however, almost all RT studies use unidimensional item response theory (IRT) models as the measurement model for RA. Unidimensional IRT models provide only a single overall ability score, which may not meet the need for multidimensional analysis and assessment results.
To provide multidimensional results that draw on the collateral information in RT, this study proposed a joint responses and times multidimensional Rasch model (JRT-MRM) that fits RT and RA data simultaneously. In the JRT-MRM, the multidimensional Rasch model (Adams, Wilson, & Wang, 1997) served as the measurement model for RA, and the lognormal RT model (van der Linden, 2006) served as the measurement model for RT. Model parameters were estimated with a Bayesian MCMC method in JAGS (Version 4.2.0; Plummer, 2015). The PISA 2012 and 2015 computer-based mathematics data were analyzed. For simplicity, only the PISA 2012 analysis is described here. That dataset contained the dichotomous RA and log RT data of 1582 participants on 10 items.
According to the PISA 2012 mathematics assessment framework (OECD, 2013) and the log-file databases for the released computer-based mathematics items, four mathematical content knowledge dimensions were assessed: (θ1) change and relationships, (θ2) quantity, (θ3) space and shape, and (θ4) uncertainty and data. The test had a between-item multidimensional structure (Adams et al., 1997). To evaluate the benefit of including RT information in the analysis (or the consequences of ignoring it), both the JRT-MRM and the multidimensional Rasch model (MRM) were fitted to the data.
For item parameters, the correlation between the item intercept/easiness estimates of the two models was 0.9997. In the JRT-MRM, the estimated item time-intensity parameters ranged from 3.740 to 4.779. More importantly, the standard errors (the standard deviations of the posterior distributions) of the item intercept/easiness estimates were generally smaller under the JRT-MRM than under the MRM, which means that including RT in the analysis led to more precise estimation of the item parameters. In the JRT-MRM, the estimated correlation between the item intercept/easiness parameters and the time-intensity parameters was –.422, consistent with previous findings that more difficult items take more time to solve (e.g., Fox & Marianti, 2016; van der Linden, 2006, 2007). For person parameters, the correlations between the two models' estimates of the four latent abilities were .989, .997, .985, and .953, respectively. In the JRT-MRM, the estimated person speed parameters ranged from –.913 to 2.910. The estimated correlation with person speed was –.351 for θ1, –.245 for θ2, –.365 for θ3, and –.487 for θ4, which indicates moderate negative correlations between the multidimensional abilities and the person speed parameter.
Although this result runs against the common expectation that more able respondents work faster, other studies have also reported negative correlations between ability and speed parameters (e.g., Klein Entink, Fox, et al., 2009; van der Linden & Fox, 2016). As a low-stakes test, PISA has limited consequences for individual respondents (Huff & Goodman, 2007). A reasonable explanation is therefore that low-ability respondents lacked motivation when taking the test (Wise & Kong, 2005), which led to shorter RTs and more incorrect responses than among high-ability respondents.
Overall, the proposed JRT-MRM performed well in the real-data analysis and made it possible to analyze RT data alongside RA data. The results indicate that incorporating RT into the multidimensional Rasch model yields more precise estimates of the model parameters and gives data analysts the opportunity to use RT information for further decisions and interventions.
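The two model components named in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's JAGS implementation: it assumes a between-item Rasch model in which the probability of a correct response is the logistic function of (ability on the item's dimension + item easiness), a lognormal RT model in which log T is normal with mean (time intensity − person speed), and conditional independence of accuracy and time given the person parameters. The function names, the `precision` parameter, and all numeric values are hypothetical, and the population-level covariance structure that links abilities and speed is left out.

```python
import math


def rasch_prob(theta, easiness):
    """P(correct response) under a Rasch model with an intercept/easiness term."""
    return 1.0 / (1.0 + math.exp(-(theta + easiness)))


def log_rt_logpdf(log_t, time_intensity, speed, precision):
    """Log density of a log response time under the lognormal RT model.

    Assumed form: log T ~ Normal(time_intensity - speed, (1 / precision) ** 2).
    """
    sd = 1.0 / precision
    z = (log_t - (time_intensity - speed)) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def joint_loglik(responses, log_times, thetas, speed, items):
    """Joint log-likelihood of one respondent's accuracy and log-RT data.

    Each item loads on exactly one ability dimension (between-item structure),
    and accuracy and time are treated as independent given thetas and speed.
    """
    total = 0.0
    for y, log_t, item in zip(responses, log_times, items):
        p = rasch_prob(thetas[item["dim"]], item["easiness"])
        total += math.log(p if y == 1 else 1.0 - p)
        total += log_rt_logpdf(
            log_t, item["time_intensity"], speed, item["precision"]
        )
    return total


# Hypothetical toy values: two items measuring two different dimensions.
items = [
    {"dim": 0, "easiness": 0.5, "time_intensity": 4.0, "precision": 2.0},
    {"dim": 1, "easiness": -0.3, "time_intensity": 4.5, "precision": 1.5},
]
thetas = [0.2, -0.1]  # one ability per dimension
speed = 0.3           # a faster respondent has a lower expected log RT
ll = joint_loglik([1, 0], [3.8, 4.4], thetas, speed, items)
print(round(ll, 3))
```

In a full Bayesian analysis, a likelihood of this kind would be combined with a joint prior over the four abilities and the speed parameter, which is how the estimated ability-speed correlations reported above would arise.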
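References
    Guo, L., Shang, P. L., & Xia, L. X. (2017). Advantages and illustrations of the application of response time models in psychological and educational testing [in Chinese]. Advances in Psychological Science, 25, 701-712.
    Meng, X. B. (2016). A log-skew-normal model for item response times [in Chinese]. Journal of Psychological Science, 39, 727-734.
    Zhan, P. D. (2017). The multidimensional lognormal response time model: An exploration of the multidimensionality of latent speed [in Chinese]. Paper presented at the Third Annual Academic Conference and Doctoral Forum on Monitoring and Evaluation of Basic Education Quality in China, Beijing.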
    Adams, R. J., Wilson, M., & Wang, W. C. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1-23.
    Bolsinova, M., & Maris, G. (2016). A test for conditional independence between response time and accuracy. British Journal of Mathematical and Statistical Psychology, 69, 62-79.
    Brooks, S. P., & Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7, 434-455.
    Dennis, I., & Evans, J. St. B. T. (1996). The speed-error trade-off problem in psychometric testing. British Journal of Psychology, 87, 105-129.
    Fox, J. P., & Marianti, S. (2016). Joint modeling of ability and differential speed using responses and response times. Multivariate Behavioral Research, 51, 540-553.
    Klein Entink, R. H., Fox, J. P., & van der Linden, W. J. (2009). A multivariate multilevel approach to the modeling of accuracy and speed of test takers. Psychometrika, 74, 21-48.
    Klein Entink, R. H., van der Linden, W. J., & Fox, J. P. (2009). A Box-Cox normal model for response times. British Journal of Mathematical and Statistical Psychology, 62, 621-640.
    Logan, S., Medford, E., & Hughes, N. (2011). The importance of intrinsic motivation for high and low ability readers' reading comprehension performance. Learning and Individual Differences, 21, 124-128.
    Luce, R. D. (1986). Response times: Their role in inferring elementary mental organization. New York: Oxford University Press.
    Man, K. W., Jiao, H., Zhan, P. D., & Huang, C. Y. (2017). A conditional joint modeling approach for compensatory multidimensional item response model and response times. Paper presented at the annual meeting of the Modern Modeling Methods Conference, Storrs.
    Meng, X. B., Tao, J., & Chang, H. H. (2015). A conditional joint modeling approach for locally dependent item responses and response times. Journal of Educational Measurement, 52, 1-27.
    Molenaar, D., Tuerlinckx, F., & van der Maas, H. L. J. (2015). A generalized linear factor model approach to the hierarchical framework for responses and response times. British Journal of Mathematical and Statistical Psychology, 68, 197-219.
    OECD. (2013). PISA 2012 assessment and analytical framework: Mathematics, reading, science, problem solving and financial literacy. Paris: OECD Publishing.
    OECD. (2014). PISA 2012 technical report. Paris: OECD Publishing.
    OECD. (2016). PISA 2015 assessment and analytical framework: Science, reading, mathematic and financial literacy. Paris: OECD Publishing.
    Qian, H., Staniewska, D., Reckase, M., & Woo, A. (2016). Using response time to detect item preknowledge in computer-based licensure examinations. Educational Measurement: Issues and Practice, 35, 38-47.
    Reckase, M. D. (2009). Multidimensional item response theory. New York: Springer.
    Tatsuoka, K. K. (1983). Rule space: An approach for dealing with misconceptions based on item response theory. Journal of Educational Measurement, 20, 345-354.
    van der Linden, W. J. (2006). A lognormal model for response times on test items. Journal of Educational and Behavioral Statistics, 31, 181-204.
    van der Linden, W. J. (2007). A hierarchical framework for modeling speed and accuracy on test items. Psychometrika, 72, 287-308.
    van der Linden, W. J. (2009). Conceptual issues in response-time modeling. Journal of Educational Measurement, 46, 247-272.
    van der Linden, W. J., & Fox, J. P. (2016). Joint hierarchical modeling of responses and response times. In W. J. van der Linden (Ed.), Handbook of item response theory: Models (pp. 481-502). Boca Raton, FL: Chapman & Hall/CRC.
    Wang, C., Chang, H. H., & Douglas, J. A. (2013). The linear transformation model with frailties for the analysis of item response times. British Journal of Mathematical and Statistical Psychology, 66, 144-168.
    Wang, C., & Xu, G. J. (2015). A mixture hierarchical model for response times and response accuracy. British Journal of Mathematical and Statistical Psychology, 68, 456-477.
    Wang, T. Y., & Hanson, B. A. (2005). Development and calibration of an item response model that incorporates response time. Applied Psychological Measurement, 29, 323-339.
    Wise, S. L., & Kong, X. J. (2005). Response time effort: A new measure of examinee motivation in computer-based tests. Applied Measurement in Education, 18, 163-183.
    Zhan, P. D., Jiao, H., & Liao, D. D. (2018). Cognitive diagnosis modelling incorporating item response times. British Journal of Mathematical and Statistical Psychology, 71, 262-286.
    Zhan, P., Jiao, H., Wang, W. C., & Man, K. (2018). A multidimensional hierarchical framework for modeling speed and ability in computer-based multidimensional tests. URL https://arxiv.org/abs/1807.04003
    Zhan, P., Liao, M., & Bian, Y. (2018). Joint testlet cognitive diagnosis modeling for paired local item dependence in response times and response accuracy. Frontiers in Psychology, 9, 607.