Using a sample in which 10 rater pairs had assessed the presence or absence of 188 environmental barriers with a systematic rating form, a raters × items data set was generated (N = 1,880). In addition to common agreement indices, relative shares of agreement variation were calculated. Multilevel regression analysis was carried out, using rater and item characteristics as predictors of agreement variation.
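A minimal sketch of the data layout and model class described above, assuming a binary agreement indicator per rater pair × item; the column names (`pair`, `item`, `agree`, `familiarity`, `prevalence`) and the simulated data are placeholders, not the study's variables, and the linear mixed model is one plausible implementation rather than the authors' exact specification:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_pairs, n_items = 10, 188

# Long format: one row per rater pair x item (N = 1,880).
df = pd.DataFrame({
    "pair": np.repeat(np.arange(n_pairs), n_items),
    "item": np.tile(np.arange(n_items), n_pairs),
})
df["agree"] = rng.integers(0, 2, size=len(df))  # placeholder agreement outcome

# Hypothetical stand-ins for rater and item characteristics.
df["familiarity"] = rng.normal(size=len(df))   # rater-level predictor
df["prevalence"] = rng.uniform(size=len(df))   # item-level predictor

# Crossed random effects for rater pairs and items, expressed as variance
# components over a single constant group (a standard statsmodels device).
df["const_group"] = 1
vc = {"pair": "0 + C(pair)", "item": "0 + C(item)"}
model = smf.mixedlm("agree ~ familiarity + prevalence", df,
                    groups="const_group", vc_formula=vc)
result = model.fit()
print(result.summary())
```

Treating the binary outcome with a linear model is a simplification; a multilevel logistic model would respect the 0/1 scale at the cost of a less direct variance decomposition.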
Following a conceptual decomposition, the agreement variation was statistically disentangled into relative shares: raters accounted for 6-11%, items for 32-33%, and the residual for 57-60% of the variation. Multilevel regression analysis showed barrier prevalence and raters' familiarity with using standardized instruments to have the strongest impact on agreement.
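To make the rater/item/residual split concrete, here is a sketch of one way to estimate such relative shares, assuming the long-format `df` with an `agree` indicator from the sketch above; it uses a method-of-moments two-way ANOVA decomposition, which is an illustrative estimator and not necessarily the one used in the study:

```python
import numpy as np

# Wide matrix: rows = rater pairs, columns = items.
wide = df.pivot(index="pair", columns="item", values="agree").to_numpy()
a, b = wide.shape
grand = wide.mean()

# Sums of squares and mean squares for the two-way layout.
ss_pair = b * ((wide.mean(axis=1) - grand) ** 2).sum()
ss_item = a * ((wide.mean(axis=0) - grand) ** 2).sum()
ss_total = ((wide - grand) ** 2).sum()
ms_pair = ss_pair / (a - 1)
ms_item = ss_item / (b - 1)
ms_err = (ss_total - ss_pair - ss_item) / ((a - 1) * (b - 1))

# Method-of-moments variance components, clipped at zero.
var_pair = max((ms_pair - ms_err) / b, 0.0)
var_item = max((ms_item - ms_err) / a, 0.0)
var_err = ms_err
total = var_pair + var_item + var_err

for name, v in [("raters", var_pair), ("items", var_item), ("residual", var_err)]:
    print(f"{name}: {v / total:.1%} of agreement variation")
```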
Supported by a conceptual analysis, we propose in-depth examination of agreement variation as a strategy for increasing interrater agreement. By identifying and limiting the most important sources of disagreement, instrument reliability can ultimately be improved.