Research on an Argument-Based Evaluation Model for the Development Quality of China's College Entrance Examination
Abstract
Experience-based item writing is the root of the quality problems of China's College Entrance Examination (CEEC). In recent years the testing community has tended to replace experience with quantitative techniques, and item-bank construction has moved from theoretical research into practice, in keeping with this trend.
     This dissertation argues that it is the lack of guidance from educational measurement theory that is the real crux affecting the quality of CEEC development. For a long time, CEEC development has suffered from test design that stresses administrative procedure, item writing that neglects the subject-specific cognitive construct, test evaluation centered on subject knowledge, and a view of development that reduces "development" to "item writing". These problems arise not because development lacks quality-control techniques such as field testing and data analysis, but because test developers rely excessively on personal subjective judgment and lack a grasp of the principles of educational measurement.
     At present, praising or condemning experience is not the key to the problem. The key is to build a corps of test developers versed in educational measurement and to establish standardized development procedures, industry standards for educational testing, and a test evaluation system grounded in educational measurement. This dissertation aims to construct a test evaluation model suited to the realities of the CEEC, with the idea of using evaluation to reflect on experience, improve the quality of test development, and realize the shift from experience-based item writing to test development founded on educational measurement.
     The dissertation consists of three parts: an introduction, the main text, and concluding remarks.
     The introduction compares test development in China and abroad and summarizes the problems of CEEC development under four headings; on that basis it proposes that a test quality evaluation model should meet requirements of completeness, unity, and reflectiveness. Further study of the literature shows that Kane's theory of argument-based validation fits the theoretical requirements the model seeks.
     On this basis a research program is formed: first, propose a theoretical framework for argument-based evaluation of test development quality; second, apply the framework to the development of the Politics test (the Essential Social Science Test, ESST) of the 2010 Shanghai CEEC, carrying out a validity evaluation and building the model in the course of it; third, establish the argument-based evaluation model of CEEC development quality.
     Chapter 1 of the main text presents the interpretive argument. The interpretive argument sets out the chain of inferences, assumptions, and evidence required for the interpretation of test scores to agree with the intended interpretation. Starting from the meaning of the test score, it first argues for the intended interpretation of the 2010 Shanghai CEEC Politics scores and sketches the outline of the interpretive argument. Then, asking why each piece of evidence is needed and following the logic of test development, it argues for the 13 assumptions and 30 pieces of evidence that must be satisfied for six inferences to hold: test design, scoring, generalization, extrapolation, implication, and the plausibility of the interpretive argument itself. Under the theoretical framework, the interpretive argument converges through a cycle of proposal, appraisal of plausibility, and revision, until its plausibility is established.
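     To make the shape of such an argument concrete, the sketch below models, in Python and as a purely illustrative construction rather than the dissertation's own formalism, a chain of inferences in which each inference rests on assumptions and holds only once every assumption is backed by evidence; the six inference names follow the chapter, while the assumption and evidence entries are placeholders.

```python
from dataclasses import dataclass, field

@dataclass
class Inference:
    name: str
    assumptions: list[str]                    # assumptions the inference rests on
    evidence: dict[str, list[str]] = field(default_factory=dict)  # assumption -> evidence

    def tenable(self) -> bool:
        # An inference holds only if every assumption has supporting evidence.
        return all(self.evidence.get(a) for a in self.assumptions)

# The six inferences named in Chapter 1; the assumptions are placeholders.
chain = [
    Inference("test design", ["blueprint matches the intended construct"]),
    Inference("scoring", ["rubrics are applied consistently"]),
    Inference("generalization", ["sampled items represent the universe"]),
    Inference("extrapolation", ["scores relate to the target domain"]),
    Inference("implication", ["score levels carry the intended meaning"]),
    Inference("argument plausibility", ["the argument is coherent and complete"]),
]

chain[1].evidence["rubrics are applied consistently"] = ["rater-agreement study"]
# The interpretive argument as a whole converges only when every link is tenable.
print(all(inf.tenable() for inf in chain))   # False until every assumption is evidenced
```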
     Chapters 2 and 3 of the main text discuss the validity argument. The validity argument is the process of collecting evidence; it answers how the evidence should be collected and whether the evidence proves the assumptions. Under the framework, the argument proceeds in two stages: the first collects evidence for the plausibility of the interpretive argument, the second evidence that the interpretive argument holds. Evidence was gathered by a combination of qualitative and quantitative methods. The qualitative methods include expert consultation, comparative text analysis, questionnaire surveys, and document analysis. The quantitative methods include option-functioning analysis, score-level analysis of rubrics for constructed-response items, rater-error studies, reliability and standard-error-of-measurement studies under CTT, generalizability (G) theory, and IRT, DIF and DBF detection, correlation analysis, and factor analysis. The statistical software used includes SPSS, PARSCALE, Winsteps, Multilog, mGENOVA, and DIF PACK.
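     The dissertation reports these analyses as run in the packages named above; purely for orientation, the following minimal sketch shows how one of the listed CTT quantities, coefficient alpha together with the standard error of measurement, can be computed from an examinees-by-items score matrix (the data here are simulated, not from the 2010 test).

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Coefficient alpha for an examinees x items score matrix."""
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = scores.sum(axis=1).var(ddof=1)    # variance of total scores
    return k / (k - 1) * (1.0 - item_var / total_var)

def sem(scores: np.ndarray) -> float:
    """CTT standard error of measurement: SD_X * sqrt(1 - reliability)."""
    sd_total = scores.sum(axis=1).std(ddof=1)
    return sd_total * np.sqrt(1.0 - cronbach_alpha(scores))

rng = np.random.default_rng(0)                    # fake 500 examinees x 20 items
fake = rng.integers(0, 2, size=(500, 20)).astype(float)
print(cronbach_alpha(fake), sem(fake))
```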
     Chapter 4 of the main text draws the evaluative conclusions, which cover both the quality of the test development and reflection on it. The quality conclusion is the logical outcome of the interpretive argument: centered on what the evidence finally proves, it judges summatively and diagnostically whether, and to what degree, actual test scores can be read as the intended interpretation, and where score error comes from. The evidence shows that the interpretive argument for the 2010 Shanghai CEEC Politics test holds: 80% of score variance can be interpreted as examinees' level of subject-specific cognitive construct, while the remaining 20% of error stems mainly from items that are too hard or too easy, the quality of multiple-choice options, the design of score levels for constructed-response items, and the layout of items across the paper. Overall, the scores of low-ability examinees are overestimated. The reflective conclusions result from reflecting on these error sources. The dissertation reflects on test design and item writing: it proposes describing the subject-specific cognitive construct in terms of attribute relations, stresses the bearing that the organic composition of the cognitive structures of the content domains has on test design, and suggests improvements to item-writing technique concerning item stimulus, question framing, options, scoring methods, and scoring rubrics.
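     Read in classical-test-theory terms — an interpretation of the reported figures, not a formula quoted from the dissertation — the 80/20 finding amounts to a decomposition of observed score variance into a construct component and an error component:

```latex
\sigma^2_X = \sigma^2_{\text{construct}} + \sigma^2_{\text{error}},
\qquad
\frac{\sigma^2_{\text{construct}}}{\sigma^2_X} \approx 0.80,
\quad
\frac{\sigma^2_{\text{error}}}{\sigma^2_X} \approx 0.20 .
```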
     Chapter 5 of the main text establishes the model as the outcome of the research. The evaluation shows that the proposed theoretical framework for argument-based evaluation of test development quality suits CEEC practice, captures the main links in CEEC quality control and the factors that influence them, and meets the requirements of completeness, unity, and reflectiveness. The model comprises three modules: interpretive argument, validity argument, and evaluative conclusion. The interpretive argument takes the test development process as its logical thread; the interpretive and validity arguments interact; recursive relations hold among the inferences, among the assumptions, and among the pieces of evidence; the interpretive argument is the logical backbone of the evaluative conclusion; and the evaluative conclusion has the dual character of quality evaluation and developmental reflection.
     The concluding remarks, centered on the model and on the concept of the subject-specific cognitive construct, propose two directions for future research: using the model to control the quality of the test development process, and using a mathematical expression of the subject-specific cognitive construct model to design item-writing blueprints.
     The innovations of this dissertation are:
     (1) It breaks beyond the traditional validation framework, constructs an argument-based evaluation model of CEEC development quality, and provides a complete empirical research text;
     (2) It goes beyond Kane's four-inference framework for interpreting test scores, arguing from CEEC practice for stating the intended score interpretation within the interpretive argument and for adding a test design inference;
     (3) From the perspective of educational measurement it details the assumptions and evidence each inference requires, and systematically argues the connection between the evidence and the interpretation of test scores;
     (4) It integrates qualitative and quantitative validation methods from China and abroad into a validity argument methodology suited to CEEC evaluation;
     (5) Drawing on the cognitive-science notion of "domain-specific cognitive structure", it proposes the concept of the subject-specific cognitive construct and notes the effect that different ways of describing it have on test development and evaluation.
The reality facing the College Entrance Examination in China (CEEC) is that test constructors rely mainly on accumulated experience to develop the tests, and this is to blame for the CEEC's quality problems. In recent years it has been suggested that test development shift from an experience-based approach to quantitative techniques; the recent practice of some testing agencies in building item banks is seen as a response to this tendency.
     The author of the dissertation, however, believes that a lack of guidance from educational measurement theory is the true crux of the CEEC development quality problem. For a long time, CEEC development has focused on administrative procedures, and test evaluation has centered on subject knowledge while neglecting the subject-specific cognitive construct. These problems originate not mainly in a lack of quality-control means such as trial testing and data analysis, but in neglect of the principles of educational measurement and excessive reliance on test developers' individual experience.
     In this situation, the key to the problems of test development is to cultivate test developers with knowledge of educational measurement; to establish standards for educational tests and standard development procedures grounded in educational measurement; and to build an evaluative system for tests. The purpose of this dissertation is to build an evaluative model of educational tests adaptable to CEEC development practice, so as to rethink experience through evaluation and improve test development. The dissertation is divided into three parts: the introduction, the main text, and the concluding remarks.
     The introduction summarizes four main problems with CEEC development by comparing test development procedures in China and abroad. The author holds that a test quality evaluative model should be integrated, unified, and reflective, and finds that Kane's theory of argument-based validation matches the ideas and requirements of the proposed model. On this basis a research program is formed. First, a research framework for argument-based test evaluation is proposed. Second, the framework is applied to evaluate the Essential Social Science Test (ESST) of the 2010 CEEC in Shanghai. Third, after the studies under the framework are summarized, the argument-based evaluative model of CEEC development quality is established.
     The first chapter of the main text concerns the interpretive argument. It explains the assumptions, and the evidence corresponding to them, required for the interpretation of actual test results to agree with the intended interpretation. The chapter first discusses the intended score interpretation of the Essential Social Science Test of the 2010 Shanghai CEEC in light of the meaning of the test score, and outlines the interpretive argument. It then explains the six inferences about the test development process, together with the 13 assumptions and 30 pieces of corresponding evidence that support them. Building the interpretive argument is an iterative process: proposal, appraisal, revision, and appraisal again, until the evaluators consider the argument plausible.
     The next two chapters concern the validity argument. The validity argument is the process of collecting the evidence that supports the interpretation of test scores; it discusses how to collect the evidence and whether the evidence supports the corresponding assumptions. Under the research framework, evidence collection is divided into two phases: in the first, evidence for the plausibility of the interpretive argument is collected; in the second, evidence for its tenability. Both qualitative and quantitative methods are used. The qualitative methods include expert consultation, document comparison and analysis, questionnaire surveys, and data analysis. The quantitative methods include option-functioning analysis for objective items, rank analysis of scoring rubrics for subjective items, classical true-score theory, multivariate generalizability theory, item response theory, differential item and bundle functioning, correlation analysis, and factor analysis. Statistical software such as SPSS, PARSCALE, Winsteps, Multilog, mGENOVA, and DIF PACK is used in the process.
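     For orientation, two of the quantitative models named above can be written in their standard textbook forms (assumed here, not quoted from the dissertation): the two-parameter IRT model for the probability that an examinee of ability θ answers item i correctly, and the one-facet generalizability-theory decomposition of an observed score X_{pi} into person, item, and residual variance components.

```latex
P_i(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}},
\qquad
\sigma^2(X_{pi}) = \sigma^2_p + \sigma^2_i + \sigma^2_{pi,e}.
```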
     The fourth chapter concerns the evaluative conclusion, giving both a quality judgment of and reflections on the test development. The conclusion on development quality is the logical result of the interpretive argument: it states whether the evidence supports the interpretive argument and gives a diagnostic account of the origins of score error. The evidence shows that the interpretive argument for the Essential Social Science Test of the 2010 CEEC is tenable; that 80 percent of test score variance can be attributed to test takers' subject-specific cognitive construct; and that the remaining 20 percent results mainly from deviant item difficulties, low-quality options in some objective items, unreasonable score levels designed for some subjective items, and the distribution of items across content domains and cognitive types. Overall, the scores of some low-ability test takers are slightly overestimated. As the result of rethinking these error sources, the reflective conclusions indicate that the test syllabus should describe the subject-specific cognitive construct in accordance with the content domains of the subject, combining subject knowledge and cognitive skills rather than describing them separately, and that CEEC item writing should improve in choosing item stimuli, framing questions, designing options for objective items, and constructing scoring rubrics.
     The fifth chapter concerns the building of the evaluative model. The studies show that the suggested argument-based evaluative framework suits CEEC development practice, focuses on the main quality-control procedures of test development, and has the required integrated, unified, and reflective features. On the basis of the framework and the evaluation of the 2010 Essential Social Science Test, the argument-based evaluative model of CEEC development quality can be built. The model consists of three modules: the interpretive argument, the validity argument, and the evaluative conclusion. The interpretive argument explains the relations among the inferences, assumptions, and evidence of test development; it and the validity argument interact and depend on each other; recursive relations hold among the inferences, assumptions, and evidence; the interpretive argument makes up the logical framework of the evaluative conclusion; and the evaluative conclusion has the dual nature of quality evaluation and reflection on test development.
     In the concluding remarks, with respect to the evaluative model and the concept of the subject-specific cognitive construct, the author proposes two fields for further exploration: applying the model to quality control of CEEC test development, and designing test development blueprints with a mathematical expression of the subject-specific cognitive construct model.
     This dissertation makes innovative contributions to educational evaluation in five respects.
     (1) It breaks beyond the currently prevailing validation framework and establishes the argument-based evaluative model of CEEC development quality, providing a first complete set of empirical research results.
     (2) Going beyond Kane's framework of inferences for the interpretation of test scores, it proposes, on the basis of CEEC development practice, that a test design inference be included in the interpretation of educational test scores.
     (3) It details, from the perspective of educational measurement, all the assumptions and corresponding evidence needed to prove the inferences tenable, and systematically presents the connections between the evidence and the interpretation of test scores.
     (4) It establishes a methodology of validity argument adaptable to CEEC evaluation, integrating qualitative and quantitative validation methods from home and abroad.
     (5) It defines the concept of the "subject-specific cognitive construct" with reference to the cognitive-science concept of "domain-specific cognitive structure", and points out the impact that different descriptions of the construct have on test development and evaluation.
