Recalibration of Genomic Risk Prediction Models in Prostate Cancer to Improve Individual-Level Predictions.
Year of Publication
Davicioni, E.; Choeurng, V.; Luo, B.; Yousefi, K.; Shin, H.; Haddad, Z.; Ross, A.; Schaeffer, E.M.; Den, R.B.; Dicker, A.; Karnes, J.; Thompson, D.J.S.
American Society of Clinical Oncology Annual Meeting
Background: Despite its obvious clinical importance, a test’s calibration is rarely described in clinical validation studies. Full descriptions of how qualitative risk categories are developed from a test’s continuous score are just as uncommon. This investigation demonstrates the recalibration of a validated genomic test, Decipher, which is used to predict post-surgical metastatic progression, and the construction of clinically meaningful cut-points for it. The performance of the recalibrated test is evaluated in an external validation set. Methods: Decipher is recalibrated on a case-cohort set (n = 216) using a flexible proportional-hazards model capable of accommodating typical departures from the proportionality assumption. Performance is assessed through several metrics: calibration-in-the-large, calibration slope, goodness-of-fit, and a modified Hosmer-Lemeshow test. Cut-points of the recalibrated score are developed using a resampling method to optimize the partial likelihood of a Cox model. An independent set of 139 patients is used to validate the recalibrated scores and the developed cut-points. Results: The described methods provide acceptable calibration in both the training set (p-value = 0.696) and the validation set (p-value = 0.487). Based on the Cox model, the optimized cut-points for Decipher were 0.45 and 0.60, stratifying patients into three meaningful risk categories (p < 0.001). In the external validation set, patients with a Decipher score of 0.45-0.60 had a 2.7-fold increase in risk, and those with scores > 0.60 had a 5.8-fold increase in risk, compared with patients whose Decipher scores were < 0.45. Conclusions: Prognostic genomic tests can be recalibrated on time-to-event data to make the test’s individual-level predictions as accurate as possible and to provide a statistical basis for clinically interpretable cut-points.
The methods are also shown to apply to a series of prognostic tests, regardless of their method of discovery or underlying model.
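
As an illustration of how the reported cut-points partition the continuous score, the following minimal Python sketch maps a recalibrated Decipher score to one of three risk categories and looks up the fold increase in risk reported for the external validation set. The category labels ("low", "intermediate", "high"), the function name, and the handling of scores exactly equal to a cut-point are assumptions made for this example; the abstract does not specify them. The sketch does not reproduce the recalibration or the resampling-based cut-point search.

```python
# Cut-points reported in the abstract for the recalibrated Decipher score.
LOWER_CUT = 0.45
UPPER_CUT = 0.60

# Fold increase in risk relative to scores < 0.45, as reported for the
# external validation set. The category labels are hypothetical.
FOLD_RISK = {"low": 1.0, "intermediate": 2.7, "high": 5.8}


def risk_category(score: float) -> str:
    """Assign a risk category from a recalibrated score.

    Assumption: scores exactly equal to 0.45 or 0.60 fall in the
    intermediate category, following the abstract's "0.45-0.60" range.
    """
    if score < LOWER_CUT:
        return "low"
    if score <= UPPER_CUT:
        return "intermediate"
    return "high"


if __name__ == "__main__":
    for s in (0.30, 0.50, 0.75):
        cat = risk_category(s)
        print(f"score={s:.2f} category={cat} fold_risk={FOLD_RISK[cat]}")
```

The fold-risk values are relative to the lowest category and are not absolute probabilities of metastatic progression.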