Developing the optimal model for identifying colorectal cancer stage in claims
A simple model may be superior when it comes to interpretability and performance.
Sponsor: Carelon Research, Inc., a subsidiary of Elevance Health.
Domain(s): Oncology, quality of care
Methods
Researchers developed seven statistical models to predict whether a patient had stage IV colorectal cancer (CRC) rather than stages I-III. One model was based on metastatic cancer diagnosis codes alone. The others added patient age and clinical variables such as other health conditions, symptoms, tests performed, other diagnoses, doctor visits, and treatments received. These analyses estimated how much each factor contributed to predicting a patient's cancer stage. They used both a simpler statistical method (logistic regression) and more advanced machine learning methods (elastic net, random forest, gradient boosting, and a SuperLearner ensemble).
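For readers less familiar with the two families of methods, the generic forms are shown below. This is the textbook formulation, not the study's fitted model. Y = 1 denotes stage IV, the x_j are predictors (for example, an indicator for a metastatic cancer diagnosis code, or patient age), and the coefficients are not reported in this summary. The elastic net penalty is written in the common glmnet-style parameterization. The study's exact tuning choices are not stated here.

```latex
% Multivariable logistic regression: log-odds of stage IV as a linear function of p predictors
\log\frac{P(Y=1 \mid x)}{1 - P(Y=1 \mid x)} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j

% Elastic net: same model, fitted by minimizing the negative log-likelihood plus a
% penalty that mixes L1 (lasso) and L2 (ridge) terms; \lambda sets strength, \alpha the mix
\hat{\beta} = \arg\min_{\beta_0,\,\beta} \; -\frac{1}{n}\ell(\beta_0, \beta)
  + \lambda\left[\alpha \lVert \beta \rVert_1 + \frac{1-\alpha}{2}\lVert \beta \rVert_2^2\right]
```

Setting \lambda = 0 recovers ordinary logistic regression, which helps explain why the simpler model can perform comparably when the key predictors are few and strong.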
Models were developed on a random sample of 75% of patients (the "training" dataset) and then tested on the remaining 25% (the "test" dataset). Model performance was assessed using three measures. Sensitivity is the proportion of true stage IV cases that the model identified. Specificity is the proportion of stage I-III patients that the model correctly classified as not stage IV. The third measure is a receiver operating characteristic (ROC) curve, which plots sensitivity against 1 - specificity across all classification thresholds. The area under the ROC curve (AUC) summarizes performance on a scale from 0 to 1. Higher values indicate better discrimination between stage IV and stage I-III cases, and 0.5 corresponds to chance.
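To make these evaluation steps concrete, here is a minimal Python sketch. It is not the study's code. It assumes binary labels (1 = stage IV, 0 = stage I-III), a hypothetical 0.5 classification threshold, and made-up toy scores rather than study data. The function names and the fixed random seed are illustrative. AUC is computed with the rank-based (Mann-Whitney) formulation: the probability that a randomly chosen stage IV patient receives a higher score than a randomly chosen stage I-III patient, with ties counted as one half.

```python
import random


def split_train_test(ids, train_frac=0.75, seed=42):
    """Randomly assign patient IDs to a training set (75%) and a test set (25%).

    The seed is a hypothetical choice that makes the split reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def sensitivity_specificity(y_true, y_pred):
    """Labels and predictions use 1 = stage IV, 0 = stage I-III."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # stage IV correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # stage IV missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # stage I-III correctly excluded
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # stage I-III wrongly flagged
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


# Hypothetical toy test set: three stage IV and five stage I-III patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc(y_true, scores):.3f}")
# prints: sensitivity=0.667 specificity=0.800 AUC=0.933
```

Sensitivity and specificity depend on the chosen threshold, while AUC summarizes ranking quality across all thresholds. This is why an AUC can be reported for each model without fixing a cutoff. The pairwise loop scales with the product of the two group sizes, which is fine for illustration. For claims-scale data, a sort-based rank computation or a library routine such as scikit-learn's `roc_auc_score` would be more practical.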
Results
Within the claims data, the most important predictor of stage IV cancer was a diagnosis of metastatic cancer at distant sites. The model based on this predictor alone achieved an AUC of 0.87.
Adding information on symptoms, diagnostic tests, treatments, and survival improved discrimination between stage IV and earlier cancer stages. However, a multivariable logistic regression model performed as well as the machine learning models and achieved the highest AUC of all models (0.96).
Key Takeaways
Publications
Carelon Research project team: Valerie Haley, Maria I. Van Rompay, Joseph L. Smith*, Shiva Chaudhary, Kevin Schott, Michael Mack, Shiva K. Vojjala, Lauren E. Parlett
*Carelon Research associate at the time the study was conducted.
For more information on a specific study or to connect with the Actionable Insights Committee, contact us at [email protected].