The package checks whether there are considerable non-linear and interaction terms in the data, and quantifies their contributions to the models’ performance. Using repeated nested cross-validation (that is, repeated random splits of the data into training and testing sets to evaluate prediction accuracy; see the conceptual sketch below), the package:
1. Validates the Cox Proportional Hazards model or Cox Lasso, depending on the user’s input. This step employs survival::coxph() (Therneau, 2023) or glmnet::glmnet(..., family = "cox") (Simon, Friedman, Hastie & Tibshirani, 2011).
2. Validates the Survival Random Forest (SRF) ensembled with the baseline Cox model. To fit the ensemble, CoxPH’s out-of-sample predictions are added to the list of original predictors, which are then used to fit the SRF via randomForestSRC::rfsrc() (Ishwaran & Kogalur, 2023).
3. Performs statistical testing of whether the Survival Random Forest ensemble outperforms the Cox model.
It does NOT handle missing data at the moment.
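For intuition, the repeated random train/test splits behind this procedure can be pictured in a few lines of base R. This is a conceptual sketch only: survcompare performs the splitting, inner-loop tuning, and model fitting internally, and the counts below simply mirror the “2 repeat(s), 3 outer, 3 inner loops” messages printed in the examples further down.

# Conceptual sketch of repeated outer cross-validation splits
# (survcompare does this internally; shown here only for intuition).
set.seed(42)
n <- 100                                      # number of observations
for (r in 1:2) {                              # repeat_cv repetitions
  folds <- sample(rep(1:3, length.out = n))   # 3 outer folds
  for (k in 1:3) {
    train_idx <- which(folds != k)            # fit models on these rows
    test_idx  <- which(folds == k)            # evaluate on these rows
    # ... fit CoxPH and the SRF ensemble on train_idx,
    # ... compute C-index, AUCROC, Brier score, etc. on test_idx
  }
}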
Predictive performance is evaluated by averaging different measures across all train-test splits. The following measures are computed:
Discrimination measures: Harrell’s concordance index, time-dependent AUCROC.
Calibration measures: calibration slope, calibration alpha.
Overall fit: Brier score, Scaled Brier score.
Descriptions and definitions of these performance metrics can be found, for example, in Steyerberg & Vergouwe (2014).
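As a standalone illustration (not survcompare’s internal code), the first of these metrics, Harrell’s concordance index, can be computed for any fitted Cox model with survival::concordance():

# Illustration only: Harrell's C-index for a Cox model fitted to the
# survival package's lung data (not part of survcompare).
library(survival)
fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung)
concordance(fit)  # prints the concordance (C-index) and its standard error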
Therneau T (2023). A Package for Survival Analysis in R. R package version 3.5-7, https://CRAN.R-project.org/package=survival.
Simon N, Friedman J, Tibshirani R, Hastie T (2011). “Regularization Paths for Cox’s Proportional Hazards Model via Coordinate Descent.” Journal of Statistical Software, 39(5), 1–13. doi:10.18637/jss.v039.i05.
Ishwaran H, Kogalur U (2023). Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC). R package version 3.2.2, https://cran.r-project.org/package=randomForestSRC.
Steyerberg EW, Vergouwe Y (2014). Towards better clinical prediction models: seven steps for development and an ABCD for validation. European Heart Journal, 35(29), 1925–1931. https://doi.org/10.1093/eurheartj/ehu207.
How to interpret survcompare results?
There are two possible outcomes: 1) “Survival Random Forest ensemble has NOT outperformed CoxPH”, or 2) “Survival Random Forest ensemble has outperformed CoxPH by … in C-index”.
If there is no outperformance, the test justifies the use of the CoxPH model and indicates that a more flexible model such as the Survival Random Forest offers only a negligible advantage.
If the ensemble does outperform the Cox model, this points to non-linear or interaction effects in the data, and a researcher may consider employing a more flexible model.
Before interpreting the results, ensure that a sufficient number of repetitions (repeat_cv) has been used. repeat_cv should be at least 5, but for heterogeneous data 20-50 repetitions may be needed; otherwise, the results may be unreliable and subject to chance.
The ensemble of CoxPH and SRF takes CoxPH’s predictions and adds them to the list of predictors used to train the SRF. This ensures that the ensembled SRF captures linearity at least as well as CoxPH, and hence its outperformance can be fully attributed to the qualities that SRF has and CoxPH does not, that is, the ability to handle non-linearity and interactions. A minimal conceptual sketch of this construction is shown below.
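The sketch below uses the survival package’s lung data and is an illustration only: survcompare uses out-of-sample CoxPH predictions inside nested cross-validation, whereas this in-sample version just shows the idea.

# Conceptual sketch of the CoxPH -> SRF ensemble (illustration only).
library(survival)
library(randomForestSRC)

df <- na.omit(lung[c("time", "status", "age", "sex", "ph.ecog")])
df$status <- as.numeric(df$status == 2)      # recode event indicator to 0/1

# Step 1: fit the baseline Cox model and extract its linear predictor.
cox_fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = df)
df$cox_lp <- predict(cox_fit, type = "lp")

# Step 2: fit a Survival Random Forest on the original predictors plus
# the CoxPH prediction, so linear effects are captured at least as well
# as in the Cox model.
srf_fit <- rfsrc(Surv(time, status) ~ age + sex + ph.ecog + cox_lp,
                 data = df, ntree = 300)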
The first example uses simulated data that does not contain any complex terms, so CoxPH is expected to perform as well as the Survival Random Forest.
mydata <- simulate_linear()
mypredictors <- names(mydata)[1:4]
compare_models <- survcompare(mydata, mypredictors)
#> [1] "Cross-validating Cox PH model using 2 repeat(s), 3 outer, 3 inner loops).For SRF inner CV is not used if oob = TRUE (default)."
|======================================================================| 100%
#> Time difference of 1.191734 secs
#> [1] "Cross-validating CoxPH and SRF Ensemble using 2 repeat(s), 3 outer, 3 inner loops).For SRF inner CV is not used if oob = TRUE (default)."
|======================================================================| 100%
#> Time difference of 7.5005 secs
#>
#> Internally validated test performance of CoxPH and Survival Random Forest ensemble:
#> T C_score AUCROC BS BS_scaled Calib_slope Calib_alpha
#> CoxPH 8.5678 0.7240 0.7638 0.1475 0.1497 1.1592 0.1614
#> SRF_Ensemble 8.5678 0.6972 0.7290 0.1488 0.1403 1.1451 0.0798
#> Diff 0.0000 -0.0269 -0.0348 0.0013 -0.0094 -0.0141 -0.0816
#> pvalue NaN 0.8910 0.9292 0.5304 0.6168 0.5289 0.9675
#> sec
#> CoxPH 1.19
#> SRF_Ensemble 7.50
#> Diff 6.31
#> pvalue NaN
#>
#> Survival Random Forest ensemble has NOT outperformed CoxPH with mean c-index difference of-0.0269.
#> The difference is not statistically significant with the p-value = 0.891.
#> The data may NOT contain considerable non-linear or cross-term dependencies,
#> that could be captured by the Survival Random Forest.
#> C-score:
#> CoxPH 0.724(95CI=0.685-0.7697;SD=0.0369)
#> SRF_Ensemble 0.6972(95CI=0.6527-0.7312;SD=0.0292)
#> AUCROC:
#> CoxPH 0.7638(95CI=0.7137-0.805;SD=0.038)
#> SRF_Ensemble 0.729(95CI=0.6728-0.7642;SD=0.037)
summary(compare_models)
#>
#> Internally validated test performance of CoxPH and Survival Random Forest ensemble:
#> T C_score AUCROC BS BS_scaled Calib_slope Calib_alpha
#> CoxPH 8.5678 0.7240 0.7638 0.1475 0.1497 1.1592 0.1614
#> SRF_Ensemble 8.5678 0.6972 0.7290 0.1488 0.1403 1.1451 0.0798
#> Diff 0.0000 -0.0269 -0.0348 0.0013 -0.0094 -0.0141 -0.0816
#> pvalue NaN 0.8910 0.9292 0.5304 0.6168 0.5289 0.9675
#> sec
#> CoxPH 1.19
#> SRF_Ensemble 7.50
#> Diff 6.31
#> pvalue NaN
#>
#> Survival Random Forest ensemble has NOT outperformed CoxPH with mean c-index difference of-0.0269.
#> The difference is not statistically significant with the p-value = 0.891.
#> The data may NOT contain considerable non-linear or cross-term dependencies,
#> that could be captured by the Survival Random Forest.
#> C-score:
#> CoxPH 0.724(95CI=0.685-0.7697;SD=0.0369)
#> SRF_Ensemble 0.6972(95CI=0.6527-0.7312;SD=0.0292)
#> AUCROC:
#> CoxPH 0.7638(95CI=0.7137-0.805;SD=0.038)
#> SRF_Ensemble 0.729(95CI=0.6728-0.7642;SD=0.037)
# Other elements of the output object
names(compare_models)
#> [1] "results_mean" "results_mean_train" "return_models"
#> [4] "test" "train" "difftest"
#> [7] "main_stats" "randomseed" "useCoxLasso"
The second example uses simulated data that contains non-linear and cross-terms, so the tree-based model is expected to outperform. In practice, you can increase the number of cross-validation repetitions (repeat_cv) to get a more robust estimate and set useCoxLasso = TRUE to take Cox Lasso as the baseline linear model (see the sketch after this example); here we keep the defaults.
mydata2 <- simulate_crossterms()
mypredictors2 <- names(mydata2)[1:4]
compare_models2 <- survcompare(mydata2,mypredictors2)
#> [1] "Cross-validating Cox PH model using 2 repeat(s), 3 outer, 3 inner loops).For SRF inner CV is not used if oob = TRUE (default)."
|======================================================================| 100%
#> Time difference of 0.2443171 secs
#> [1] "Cross-validating CoxPH and SRF Ensemble using 2 repeat(s), 3 outer, 3 inner loops).For SRF inner CV is not used if oob = TRUE (default)."
|======================================================================| 100%
#> Time difference of 8.165306 secs
#>
#> Internally validated test performance of CoxPH and Survival Random Forest ensemble:
#> T C_score AUCROC BS BS_scaled Calib_slope Calib_alpha
#> CoxPH 8.0774 0.6103 0.6303 0.1667 0.0105 0.8752 0.2008
#> SRF_Ensemble 8.0774 0.6687 0.6717 0.1576 0.0640 0.7725 0.2045
#> Diff 0.0000 0.0584 0.0415 -0.0091 0.0535 -0.1026 0.0037
#> pvalue NaN 0.0072 0.0107 0.3152 0.0109 0.8000 0.4706
#> sec
#> CoxPH 0.24
#> SRF_Ensemble 8.17
#> Diff 7.93
#> pvalue NaN
#>
#> Survival Random Forest ensemble has outperformed CoxPH by 0.0584 in C-index.
#> The difference is statistically significant with the p-value 0.0072**.
#> The supplied data may contain non-linear or cross-term dependencies,
#> better captured by the Survival Random Forest.
#> C-score:
#> CoxPH 0.6103(95CI=0.5584-0.6862;SD=0.0495)
#> SRF_Ensemble 0.6687(95CI=0.6094-0.7177;SD=0.0436)
#> AUCROC:
#> CoxPH 0.6303(95CI=0.5669-0.7032;SD=0.054)
#> SRF_Ensemble 0.6717(95CI=0.6006-0.7422;SD=0.0522)
summary(compare_models2)
#>
#> Internally validated test performance of CoxPH and Survival Random Forest ensemble:
#> T C_score AUCROC BS BS_scaled Calib_slope Calib_alpha
#> CoxPH 8.0774 0.6103 0.6303 0.1667 0.0105 0.8752 0.2008
#> SRF_Ensemble 8.0774 0.6687 0.6717 0.1576 0.0640 0.7725 0.2045
#> Diff 0.0000 0.0584 0.0415 -0.0091 0.0535 -0.1026 0.0037
#> pvalue NaN 0.0072 0.0107 0.3152 0.0109 0.8000 0.4706
#> sec
#> CoxPH 0.24
#> SRF_Ensemble 8.17
#> Diff 7.93
#> pvalue NaN
#>
#> Survival Random Forest ensemble has outperformed CoxPH by 0.0584 in C-index.
#> The difference is statistically significant with the p-value 0.0072**.
#> The supplied data may contain non-linear or cross-term dependencies,
#> better captured by the Survival Random Forest.
#> C-score:
#> CoxPH 0.6103(95CI=0.5584-0.6862;SD=0.0495)
#> SRF_Ensemble 0.6687(95CI=0.6094-0.7177;SD=0.0436)
#> AUCROC:
#> CoxPH 0.6303(95CI=0.5669-0.7032;SD=0.054)
#> SRF_Ensemble 0.6717(95CI=0.6006-0.7422;SD=0.0522)
Detailed results can be extracted from the output object. For example, test performance for each data split (across all cross-validations and repetitions).
# Test performance of the ensemble:
print(round(compare_models2$test$SRF_ensemble,4))
#> T AUCROC BS BS_scaled C_score Calib_slope Calib_alpha test
#> 1 8.0774 0.5931 0.1860 0.0007 0.6053 0.3219 0.3983 1
#> 2 8.0774 0.7473 0.1575 0.1583 0.7188 1.1482 0.4151 1
#> 3 8.0774 0.7065 0.1213 0.0818 0.7100 1.2563 -0.2601 1
#> 4 8.0774 0.6526 0.1390 0.0618 0.6831 0.6430 0.1879 1
#> 5 8.0774 0.6592 0.1847 0.0346 0.6386 0.6200 0.5382 1
#> 6 8.0774 0.6716 0.1570 0.0467 0.6562 0.6458 -0.0524 1
# Mean test performance of the Cox-PH and SRF ensemble:
print(round(compare_models2$results_mean,4))
#> T C_score AUCROC BS BS_scaled Calib_slope Calib_alpha
#> CoxPH 8.0774 0.6103 0.6303 0.1667 0.0105 0.8752 0.2008
#> SRF_Ensemble 8.0774 0.6687 0.6717 0.1576 0.0640 0.7725 0.2045
#> Diff 0.0000 0.0584 0.0415 -0.0091 0.0535 -0.1026 0.0037
#> pvalue NaN 0.0072 0.0107 0.3152 0.0109 0.8000 0.4706
#> sec
#> CoxPH 0.24
#> SRF_Ensemble 8.17
#> Diff 7.93
#> pvalue NaN
# Main stats
round(compare_models2$main_stats,4)
#> mean sd 95CILow 95CIHigh
#> C_score_CoxPH 0.6103 0.0495 0.5584 0.6862
#> C_score_SRF_Ensemble 0.6687 0.0436 0.6094 0.7177
#> AUCROC_CoxPH 0.6303 0.0540 0.5669 0.7032
#> AUCROC_SRF_Ensemble 0.6717 0.0522 0.6006 0.7422
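For a more robust estimate, or to use a penalized baseline, the same comparison can be re-run with more repetitions and Cox Lasso as the baseline linear model. This is a sketch using the repeat_cv, useCoxLasso and randomseed arguments that appear elsewhere in this document; note that runtime grows with repeat_cv.

# Sketch: more CV repetitions and Cox Lasso as the baseline linear model.
compare_models2_lasso <- survcompare(mydata2, mypredictors2,
                                     repeat_cv = 20,
                                     useCoxLasso = TRUE,
                                     randomseed = 42)
summary(compare_models2_lasso)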
Applying survcompare to GBSG data
Now, let’s apply the package to real-life data. We will use the GBSG data from the survival package (https://rdrr.io/cran/survival/man/gbsg.html).
library(survival)
#prepare the data
mygbsg <- gbsg
mygbsg$time <- gbsg$rfstime / 365
mygbsg$event <- gbsg$status
myfactors <-
c("age", "meno", "size", "grade", "nodes", "pgr", "er", "hormon")
mygbsg <- mygbsg[c("time", "event", myfactors)]
sum(is.na(mygbsg[myfactors])) #check if any missing data
# run survcompare
survcompare(mygbsg, myfactors, randomseed = 42, repeat_cv = 5)
# [1] "Cross-validating Cox PH model using 5 repeat(s), 3 outer, 3 inner loops)"
# |===============================================================| 100%
# Time difference of 1.142473 secs
# [1] "Cross-validating CoxPH and SRF Ensemble using 5 repeat(s), 3 outer, 3 inner loops)"
# |===============================================================| 100%
# Time difference of 49.34405 secs
#
# Internally validated test performance of CoxPH and Survival Random Forest ensemble:
# T C_score AUCROC BS BS_scaled Calib_slope
# CoxPH 4.1797 0.6751 0.7203 0.2182 0.1489 1.2529
# SRF_Ensemble 4.1797 0.6938 0.7279 0.2162 0.1567 1.3961
# Diff 0.0000 0.0187 0.0076 -0.0020 0.0078 0.1432
# pvalue NaN 0.0000 0.0744 0.4534 0.0919 0.0229
# Calib_alpha sec
# CoxPH 0.4193 1.14
# SRF_Ensemble 0.4186 49.34
# Diff -0.0007 48.20
# pvalue 0.5410 NaN
We got the following results (they may differ slightly depending on the random seed):
Survival Random Forest ensemble has outperformed CoxPH by 0.0187 in C-index. The difference is statistically significant with the p-value 1.347e-05***. The supplied data may contain non-linear or cross-term dependencies, better captured by the Survival Random Forest.
C-score:
CoxPH 0.6751(95CI=0.6417-0.7137;SD=0.0231)
SRF_Ensemble 0.6938(95CI=0.6596-0.7346;SD=0.0212)
AUCROC:
CoxPH 0.7203(95CI=0.6693-0.7708;SD=0.0356)
SRF_Ensemble 0.7279(95CI=0.6797-0.7781;SD=0.0311)
See other items as x$item. Items available: results_mean, results_mean_train, return_models, test, train, difftest, main_stats, randomseed, useCoxLasso
This example illustrates a situation in which the outperformance of the non-linear model is statistically significant but small in absolute terms. The ultimate model choice will depend on how important such an improvement is for the model’s stakeholders.