Limits in Generalizability Plague Lung Cancer Predictive Models

Study highlights challenges and strategies for improving lung nodule prediction


Thomas Z. Li, PhD
Xiaonan Shao, MD

Predictive models for lung cancer often fall short when applied beyond the clinical settings in which they were developed, especially when evaluating biopsied lung nodules.

A recent study in Radiology: Artificial Intelligence highlights this challenge and offers guidance for improving models’ generalizability across institutions and clinical settings with strategies such as image harmonization and fine-tuning models on local patient populations.

Each year, more than 1.5 million Americans have at least one pulmonary nodule detected, either incidentally at routine chest CT or during lung cancer screening. Nodule biopsy carries risks, costs and anxiety for patients. With 95% of indeterminate pulmonary nodules found to be benign, clinical guidelines recommend risk-stratifying nodules before resorting to invasive percutaneous or surgical interventions.

“We want to diagnose these pulmonary nodules earlier and noninvasively, and we want to avoid performing a biopsy on benign nodules,” said study lead author Thomas Z. Li, PhD, from the Medical-image Analysis and Statistical Interpretation (MASI) Lab at Vanderbilt University in Nashville, TN. “Better noninvasive diagnostic tools can help us do that.”

Statistical models for predicting lung cancer have the potential to improve risk stratification, aiding in earlier diagnosis of malignancy as well as reducing the risk of morbidity, costs and unnecessary anxiety associated with the workup of benign disease. Several models have been validated, but a systematic analysis of their performance is lacking.

To learn more, Dr. Li and colleagues evaluated eight validated predictive models developed to stratify pulmonary nodules. The models consisted of clinical prediction models, cross-sectional or longitudinal AI models, and multimodal approaches. The researchers evaluated the models on nine patient cohorts in three clinical settings: nodules detected during screening, incidentally detected nodules and pulmonary nodules deemed suspicious enough to warrant a biopsy.

“We wanted to know, in these three clinical settings, how do the models that have been developed so far perform?” Dr. Li said.

Analysis revealed that the eight lung cancer prediction models failed to generalize well across clinical settings and sites outside of their training distributions.

The findings show that a single external validation set is not enough to guarantee generalization performance, Dr. Li noted.

“You’re training the model on one group, which is a healthy screening population, and then you’re trying to apply it into a different group, and what we see is that it doesn’t work,” he said. “We need the model to be evaluated across multiple different institutions, and we need it to be evaluated in different clinical settings.”
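To make the idea of evaluating a model separately in each clinical setting concrete, here is a minimal sketch that computes the area under the ROC curve (AUC) per cohort instead of one pooled score. The cohort names, labels and risk scores are made-up toy values, and the helper names `auc` and `auc_by_cohort` are hypothetical; none of this comes from the study itself.

```python
from collections import defaultdict


def auc(labels, scores):
    """AUC as the fraction of (malignant, benign) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_cohort(records):
    """records: iterable of (cohort_name, label, score). Returns {cohort_name: AUC}."""
    grouped = defaultdict(lambda: ([], []))
    for cohort, label, score in records:
        grouped[cohort][0].append(label)
        grouped[cohort][1].append(score)
    return {cohort: auc(ys, ss) for cohort, (ys, ss) in grouped.items()}


# Toy data (assumption): label 1 = malignant, 0 = benign; score = predicted risk.
records = [
    ("screening", 1, 0.9), ("screening", 0, 0.2),
    ("screening", 1, 0.7), ("screening", 0, 0.4),
    ("biopsied", 1, 0.6), ("biopsied", 0, 0.7),
    ("biopsied", 1, 0.5), ("biopsied", 0, 0.4),
]
results = auc_by_cohort(records)
for cohort, value in results.items():
    print(f"{cohort}: AUC = {value:.2f}")
```

In this toy example the same scoring model separates the screening cases perfectly but does no better than chance on the biopsied cases, a gap that a single pooled AUC would hide.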

AI Model Fit Depends on Case Type

While no single predictive model emerged as the highest-performing model across all cohorts, certain models performed better in specific clinical contexts. Single-time-point chest CT AI performed well for nodules detected during screening but did not generalize well to other clinical settings.

Longitudinal imaging models performed better than single-time-point chest CT models in incidental nodule settings. The multimodal models also demonstrated comparatively good performance on incidentally detected nodules. When applied to biopsied nodules, all models showed low performance, likely because nodules that warrant biopsy are inherently difficult to diagnose, the researchers said.

Within the AI approaches, Dr. Li noted that longitudinal imaging, when available, leads to performance gains across most of the cohorts by allowing the model to consider how imaging features change over time. The use of data from multiple modalities also appears to be effective, as imaging findings are often interpreted in the context of the patient’s clinical risk factors. The improved performance of longitudinal AI and multimodal AI in the study suggests that combining the two approaches is a promising direction.

Other promising methods include image harmonization and fine-tuning, the process of taking a pre-trained model and adjusting it to better fit data from the local patient population.

“Those involved in model deployment should consider fine-tuning models within the cohort that matches the site of the clinical studies, and then using image harmonization to mitigate variations across different scanners and imaging protocols,” Dr. Li said.
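As a rough illustration of what fine-tuning means in practice, the sketch below starts from the parameters of an already trained logistic regression risk model and continues training for a few gradient steps on a small local cohort. Everything here is an assumption made for illustration: the pre-trained weights, the two toy features and the local data are invented, and the models in the study are far more complex than this.

```python
import math


def predict(weights, bias, x):
    """Predicted malignancy probability from a logistic regression model."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights, bias, X, y):
    """Mean binary cross-entropy of the model on (X, y)."""
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict(weights, bias, xi), eps), 1 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)


def fine_tune(weights, bias, X, y, lr=0.1, steps=200):
    """Continue full-batch gradient descent from pre-trained parameters on local data."""
    w, b = list(weights), bias
    n = len(X)
    for _ in range(steps):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi  # gradient of the log loss w.r.t. the logit
            grad_b += err / n
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical pre-trained parameters (assumption): learned elsewhere, rely on feature 0.
pretrained_w, pretrained_b = [2.0, 0.5], -1.0

# Toy local cohort (assumption): here feature 1 is the more informative one.
local_X = [[0.2, 0.9], [0.3, 0.8], [0.8, 0.1], [0.7, 0.2], [0.4, 0.7], [0.9, 0.3]]
local_y = [1, 1, 0, 0, 1, 0]

loss_before = log_loss(pretrained_w, pretrained_b, local_X, local_y)
tuned_w, tuned_b = fine_tune(pretrained_w, pretrained_b, local_X, local_y)
loss_after = log_loss(tuned_w, tuned_b, local_X, local_y)
print(f"local loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

The key design point is that training resumes from the existing parameters rather than from scratch, so the model keeps what it learned elsewhere while adapting to the local population. In a real deployment the improvement would need to be checked on held-out local cases, not on the data used for tuning.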

Enhancing Model Accuracy

Combining AI models with traditional statistical methods will boost the accuracy and interpretability of predictive models, according to Xiaonan Shao, MD, chief physician in the Department of Nuclear Medicine at the Third Affiliated Hospital of Soochow University in Changzhou, China.

In a commentary accompanying the study, Dr. Shao and his colleague, Rong Niu, MD, suggested transfer and few-shot learning as ways to enhance model accuracy.

Transfer learning involves adapting AI models pre-trained on large-scale datasets to new clinical tasks or different imaging environments, thereby improving the model’s generalizability across diverse institutions and scanning protocols. Few-shot learning enables AI models to quickly achieve robust performance even when only a small amount of labeled data is available.

“Recent research has demonstrated that both methods substantially enhance the performance and stability of lung nodule prediction models across multiple clinical scenarios,” Dr. Shao said.

The future of AI-assisted lung nodule assessment will emphasize human-AI interaction, Dr. Shao predicted, integrating clinical expertise with explainable AI rather than relying on either human judgment or machine automation alone.

“Combining clinician experience with interpretable AI that doesn’t just give answers but also explains how it got there, will significantly boost diagnostic accuracy and confidence,” he said. “This collaborative approach holds great promise for becoming part of standard clinical practice.”

For More Information

Access the Radiology: Artificial Intelligence study, “Performance of Lung Cancer Prediction Models for Screening-detected, Incidental, and Biopsied Pulmonary Nodules,” and the related commentary, “Bridging Artificial Intelligence Models to Clinical Practice: Challenges in Lung Cancer Prediction.”

Read previous RSNA News stories on medical imaging AI: