NEAUA - A Machine Learning Approach for Predicting Prostate Cancer Gleason Grade Group at Radical Prostatectomy

Lampros Pantazis, MD¹, Hersh H. Bendre, MD¹, Madhur Nayan, MD, PhD², Constantine Velmahos, MD¹, Clemens An, BS¹, Alexandra Hunter, BA¹, Douglas M. Dahl, MD¹, Chin-Lee Wu, MD, PhD¹, Matthew Wszolek, MD¹, Keyan Salari, MD, PhD¹, Adam S. Feldman, MD, MPH¹.
¹Massachusetts General Hospital, Boston, MA, USA, ²New York University Medical Center, New York, NY, USA.

BACKGROUND: Prostate cancer grade misclassification on biopsy remains common and can significantly impact risk stratification (RS) and management. The objective of this study was to develop a machine learning (ML) model to predict prostate cancer grade group (GG) at radical prostatectomy (RP).
METHODS: We identified patients in our institutional TRUS fusion biopsy database who underwent RP. We defined a 3-class ordinal outcome measure: GG1, GG2-3, or GG4-5 on RP. The cohort was split into training and test sets (70%/30%). Important predictors were selected using the Boruta algorithm. Three ML models were trained and tuned with 5-fold cross-validation: Ordinal Forest, Ordinal Logistic Regression, and Ordinal CART decision tree. Model performance was evaluated on the test set and compared with biopsy GG and NCCN RS, where low, intermediate, and high risk were taken to predict GG1, GG2-3, and GG4-5, respectively. Performance was assessed with the quadratic weighted kappa (QWK), macro-averaged area under the receiver operating characteristic curve (ROCAUC), accuracy, and macro- and micro-averaged F1 scores.
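As an illustration of the primary metric, the following is a minimal sketch of the quadratic weighted kappa for the 3-class ordinal outcome, applied to the NCCN baseline mapping described above. It is not the study's code: the function name, the label strings, and the toy data are assumptions made for this example, and the formula is the standard Cohen's kappa with quadratic disagreement weights.

```python
# Ordinal outcome classes, in order (label strings are illustrative).
CLASSES = ["GG1", "GG2-3", "GG4-5"]

# Baseline mapping from the text: NCCN risk group -> predicted class.
NCCN_TO_CLASS = {"low": "GG1", "intermediate": "GG2-3", "high": "GG4-5"}


def quadratic_weighted_kappa(y_true, y_pred, classes=CLASSES):
    """Cohen's kappa with quadratic disagreement weights for ordinal labels."""
    k = len(classes)
    index = {label: i for i, label in enumerate(classes)}
    n = len(y_true)

    # Confusion matrix: rows are true classes, columns are predicted classes.
    observed = [[0.0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        observed[index[t]][index[p]] += 1

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2
            num += w * observed[i][j]
            den += w * row[i] * col[j] / n

    if den == 0:  # all labels fall in a single class; kappa is undefined
        return float("nan")
    return 1.0 - num / den


# Toy data (hypothetical, not from the study).
rp_grade = ["GG1", "GG2-3", "GG4-5", "GG2-3"]
nccn_risk = ["low", "intermediate", "intermediate", "high"]
nccn_pred = [NCCN_TO_CLASS[r] for r in nccn_risk]
toy_qwk = quadratic_weighted_kappa(rp_grade, nccn_pred)
```

Quadratic weights penalize a GG1 versus GG4-5 disagreement four times as heavily as a disagreement between adjacent classes, which is why the metric suits an ordinal outcome better than plain accuracy.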
RESULTS: A total of 559 patients were identified, of whom 85 (15.2%) had GG1, 391 (69.9%) had GG2-3, and 83 (14.8%) had GG4-5 prostate cancer at RP, with biopsy misclassification in 25% (140/559). Important predictors identified by Boruta included highest biopsy GG, NCCN risk group, primary biopsy target GG, PSA, and a weighted score calculated as the sum of biopsy regions per GG, each multiplied by its GG value. The best-performing model was the Ordinal Forest, with a QWK of 0.588 (95% CI: 0.359-0.664) and a ROCAUC of 0.826 (0.774-0.891). NCCN RS had a QWK of 0.512 (0.367-0.633) (Table).
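The weighted score is described only in words, so the sketch below shows one possible reading of it: count the biopsy regions at each GG, multiply each count by its GG value, and sum. The function name, the input format (one GG per sampled region, with None or 0 for benign regions), and the example values are assumptions made for this example, not the authors' implementation.

```python
from collections import Counter


def weighted_gg_score(region_grade_groups):
    """Sum over grade groups of (number of regions with that GG) * (GG value).

    Benign regions, encoded here as None or 0, contribute nothing.
    """
    counts = Counter(gg for gg in region_grade_groups if gg)
    return sum(gg * n for gg, n in counts.items())


# Hypothetical patient: one GG1 region, two GG2 regions, one GG4 region,
# and one benign region.
example_score = weighted_gg_score([1, 2, 2, 4, None])  # 1*1 + 2*2 + 4*1 = 9
```

Under this reading, the score increases with both the number of positive regions and their grade, so it summarizes extent and severity of disease in a single number.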
CONCLUSIONS: Our models outperformed biopsy GG and NCCN RS in predicting prostate cancer GG at RP, with the best-performing model, the Ordinal Forest, achieving a QWK of 0.588 and a ROCAUC of 0.826. These findings suggest that ML algorithms can enhance RS and help guide prostate cancer management.

Table: Performance metrics for predicting prostate cancer Gleason grade group at radical prostatectomy
