Can student self-ratings be compared with peer ratings? A study of measurement invariance of multisource feedback
  • Authors: Keng-Lin Lee; Shih-Li Tsai; Yu-Ting Chiu; Ming-Jung Ho
  • Keywords: Professionalism; Multisource feedback; Self-assessment; Peer assessment; Measurement invariance
  • Journal: Advances in Health Sciences Education
  • Published: May 2016
  • Volume: 21
  • Issue: 2
  • Pages: 401-413
  • Full text size: 420 KB
  • Author affiliations: Keng-Lin Lee (1)
    Shih-Li Tsai (1)
    Yu-Ting Chiu (1)
    Ming-Jung Ho (1)

    1. Department of Medical Education and Bioethics, National Taiwan University College of Medicine, No. 1, Ren-Ai Road, Section 1, Taipei, Taiwan
  • Journal category: Humanities, Social Sciences and Law
  • Journal subjects: Education; Medical Education
  • Publisher: Springer Netherlands
  • ISSN: 1573-1677
Abstract
Measurement invariance is a prerequisite for comparing measurement scores across different groups. In medical education, multisource feedback (MSF) is used to assess core competencies, including professionalism. However, little attention has been paid to the measurement invariance of assessment instruments, that is, whether an instrument holds the same meaning across different rater groups. This study examined the measurement invariance of the National Taiwan University professionalism MSF (NTU P-MSF) to determine whether medical students' self-ratings can be compared with their peers' ratings. An eight-factor model was specified for confirmatory factor analysis to examine the construct validity of the NTU P-MSF. Cronbach's alpha was computed for the items of each domain to evaluate internal consistency reliability. The same eight-factor model was used for multi-group confirmatory factor analyses. Four hierarchical models were specified to test configural (i.e., identical factor-item relationships), metric (i.e., identical factor loadings), scalar (i.e., identical intercepts), and error variance invariance across the self-rating and peer-rating groups. One hundred and twenty second-year medical students from weekly discussion groups, conducted as part of a medical professionalism course, agreed to use the NTU P-MSF to assess themselves or their discussion group peers. NTU P-MSF scores fit the eight-factor model well in both the self-rating and peer-rating groups. Cronbach's alpha coefficients for students' self-rated and peer-rated NTU P-MSF scores ranged from 0.76 to 0.89 and from 0.84 to 0.91, respectively, indicating that the NTU P-MSF scores also have good internal consistency reliability in both groups. In addition, the same factor structure and similar factor loadings and intercepts across both groups indicate that NTU P-MSF scores showed configural, metric, and scalar invariance. Thus, students' self-assessments and peer assessments can be compared in terms of the constructs measured by the NTU P-MSF, changes in NTU P-MSF scores, and its factor scores. This study demonstrates how to investigate the measurement invariance of a professionalism MSF instrument and contributes to the discussion on self- and peer assessment in medical education.
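
The abstract describes the standard multi-group CFA invariance hierarchy without giving equations. The following is a minimal sketch of how the four nested models are usually parameterized; the symbols (x, tau, Lambda, xi, delta, Theta) and the self/peer group subscripts are notational assumptions, not taken from the paper. Each successive model adds equality constraints across the two rater groups, and the change in model fit is used to judge whether invariance holds.

    \[
    \mathbf{x}_{g} = \boldsymbol{\tau}_{g} + \boldsymbol{\Lambda}_{g}\,\boldsymbol{\xi}_{g} + \boldsymbol{\delta}_{g},
    \qquad g \in \{\text{self},\ \text{peer}\}
    \]
    \[
    \begin{aligned}
    \text{Configural:} &\quad \text{same pattern of free and fixed loadings in } \boldsymbol{\Lambda}_{\text{self}},\ \boldsymbol{\Lambda}_{\text{peer}}\\
    \text{Metric:} &\quad \boldsymbol{\Lambda}_{\text{self}} = \boldsymbol{\Lambda}_{\text{peer}}\\
    \text{Scalar:} &\quad \boldsymbol{\Lambda}_{\text{self}} = \boldsymbol{\Lambda}_{\text{peer}},\ \ \boldsymbol{\tau}_{\text{self}} = \boldsymbol{\tau}_{\text{peer}}\\
    \text{Error variance:} &\quad \text{additionally } \boldsymbol{\Theta}_{\delta,\text{self}} = \boldsymbol{\Theta}_{\delta,\text{peer}}
    \end{aligned}
    \]

The per-domain Cronbach's alpha reported above follows the usual formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the domain total). A small Python sketch of that formula (not the authors' code; the example data are hypothetical):

    import numpy as np

    def cronbach_alpha(item_scores):
        """Cronbach's alpha for one domain; rows are raters, columns are items."""
        X = np.asarray(item_scores, dtype=float)
        k = X.shape[1]                              # number of items in the domain
        item_var_sum = X.var(axis=0, ddof=1).sum()  # sum of per-item variances
        total_var = X.sum(axis=1).var(ddof=1)       # variance of the domain total score
        return k / (k - 1) * (1.0 - item_var_sum / total_var)

    # Hypothetical example: 5 raters scoring a 4-item domain on a 1-5 scale.
    scores = [[4, 5, 4, 5],
              [3, 3, 4, 3],
              [5, 5, 5, 4],
              [2, 3, 2, 3],
              [4, 4, 5, 4]]
    print(round(cronbach_alpha(scores), 2))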
