All patients admitted from July 2008 to December 2009 were evaluated for inclusion in the study. Standardized mortality ratios were calculated for all models. Calibration was assessed with the Hosmer-Lemeshow goodness-of-fit test, and discrimination with the area under the receiver operating characteristic curve.
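The three performance measures named above can be illustrated with a short Python sketch. It computes the standardized mortality ratio (observed deaths divided by the sum of predicted probabilities), the Hosmer-Lemeshow statistic over groups of patients sorted by predicted risk, and the area under the receiver operating characteristic curve as the probability that a randomly chosen nonsurvivor receives a higher predicted risk than a randomly chosen survivor. This is not the authors' analysis code: the function names, the ten equal-sized risk groups, the g - 2 degrees of freedom, and the synthetic cohort are assumptions made for illustration only.

```python
import math
import random


def smr(outcomes, probs):
    """Standardized mortality ratio: observed deaths / expected deaths (sum of predicted risks)."""
    return sum(outcomes) / sum(probs)


def hosmer_lemeshow(outcomes, probs, groups=10):
    """Hosmer-Lemeshow statistic over `groups` equal-sized groups sorted by predicted risk.

    Assumes 0 < p < 1 for every prediction and at least `groups` patients.
    """
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        # (O - E)^2 / (n * pbar * (1 - pbar)), with E = n * pbar
        stat += (observed - expected) ** 2 / (expected * (1 - expected / size))
    return stat


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is valid only for even df."""
    half = x / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def auc(outcomes, probs):
    """AUC as the Mann-Whitney probability that a death outranks a survivor (ties count 1/2)."""
    deaths = [p for p, y in zip(probs, outcomes) if y == 1]
    survivors = [p for p, y in zip(probs, outcomes) if y == 0]
    wins = 0.0
    for a in deaths:
        for b in survivors:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(deaths) * len(survivors))


if __name__ == "__main__":
    # Hypothetical cohort: the "model" predicts p, but the true risk is only 0.7 * p,
    # so the model overestimates mortality and the SMR should fall below 1.
    rng = random.Random(42)
    probs = [rng.uniform(0.01, 0.6) for _ in range(1000)]
    outcomes = [1 if rng.random() < 0.7 * p else 0 for p in probs]
    hl = hosmer_lemeshow(outcomes, probs)
    print(f"SMR = {smr(outcomes, probs):.2f}")
    print(f"HL = {hl:.1f}, P = {chi2_sf_even_df(hl, 10 - 2):.4f}")
    print(f"AUC = {auc(outcomes, probs):.3f}")
```

An SMR below 1 means fewer deaths were observed than predicted, which is how overestimation of hospital mortality appears. The closed-form chi-square tail is used only to keep the sketch dependency-free; a statistics library would normally supply the P value for any degrees of freedom.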
A total of 5780 patients were included. In-hospital mortality was 9.1%. Discrimination was very good for all models (area under the receiver operating characteristic curve for APACHE IV, SAPS 3, and MPM0-III was 0.883, 0.855, and 0.840, respectively), and APACHE IV discriminated better than SAPS 3 and MPM0-III (P < .001 for both comparisons). All models were poorly calibrated and overestimated in-hospital mortality (Hosmer-Lemeshow statistic was 53.7, 134.2, and 226.6 for APACHE IV, MPM0-III, and SAPS 3, respectively; P < .001 for all).
In this study, all models showed poor calibration but very good discrimination. Because this pattern is a common finding in validation studies, caution is warranted when using prognostic models for benchmarking.