Deep Learning of Videourodynamics to Classify Bladder Dysfunction Severity
John Weaver, MD1, Madalayne Martin-olenski, BS2, Joseph Logan, MS2, Reiley Broms, BS2, Maria Antony, BS2, Jason Van Batavia, MD2, Dana Weiss, MD2, Christopher Long, MD2, Ariana Smith, MD3, Stephen Zderic, MD2, Yong Fan, PhD3, Gregory Tasian, MD2.
1Rainbow Babies and Children's Hospital/Case Western Reserve University, Cleveland, OH, USA, 2Children's Hospital of Philadelphia, Philadelphia, PA, USA, 3University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA.
Introduction: A videourodynamics (VUDS) study is rich in detail but laborious to interpret, and its interpretation has high interobserver variability. We applied deep learning models to VUDS pressure tracings and fluoroscopic images to identify informative features associated with the severity of bladder decompensation. We hypothesized that these deep learning features would correctly distinguish children with preserved bladder function from those with decompensated bladders.
Methods: We performed a cross-sectional study of 306 VUDS studies of children with spina bifida (SB) evaluated at our institution from 2019 to 2021. We excluded children with a history of bladder augmentation or imperforate anus. The outcome was the degree of bladder decompensation, defined as an ordinal variable: mild, moderate, or severe. This degree, which served as the ground truth for our models, was determined by a panel of five expert reviewers (four pediatric urologists who regularly care for SB patients and one adult urologist with fellowship training in female pelvic medicine and reconstructive surgery). Factors considered in determining bladder dysfunction severity were those that increase the risk of upper tract injury, such as low compliance, detrusor overactivity, high detrusor leak point pressure, or detrusor sphincter dyssynergia (DSD). We built a random forest model to predict the severity of bladder decompensation from prospectively collected clinical data (e.g., presence of leak, reflux, or DSD; pressure at expected bladder capacity (EBC); bladder shape; and percent of EBC achieved). We also built a one-dimensional convolutional neural network trained on the raw volume-pressure recordings and a deep learning imaging model of fluoroscopic images, each predicting the severity of bladder decompensation. An ensemble model was generated by averaging the predicted risk probabilities of these two deep learning models.
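The abstract states only that the ensemble averaged the two deep learning models' risk probabilities. As a minimal sketch of that averaging for a single study, the code below uses made-up probability vectors; the function names `average_probabilities` and `predict_class` are hypothetical, and taking the class with the highest averaged probability as the prediction is an assumption, since the abstract does not state the decision rule.

```python
# Minimal sketch of probability averaging ("soft voting") for a three-class
# ordinal outcome. All probability values are hypothetical placeholders,
# not outputs of the study's models.
CLASSES = ("mild", "moderate", "severe")


def average_probabilities(prob_a, prob_b):
    """Average two per-class probability vectors element-wise."""
    if len(prob_a) != len(prob_b):
        raise ValueError("probability vectors must have the same length")
    return [(a + b) / 2 for a, b in zip(prob_a, prob_b)]


def predict_class(probs):
    """Assumed decision rule: pick the class with the highest probability."""
    best_index = max(range(len(probs)), key=lambda i: probs[i])
    return CLASSES[best_index]


# Hypothetical outputs for one study from the tracing and imaging models.
tracing_probs = [0.20, 0.50, 0.30]
imaging_probs = [0.10, 0.30, 0.60]

ensemble_probs = average_probabilities(tracing_probs, imaging_probs)
print([round(p, 2) for p in ensemble_probs])  # [0.15, 0.4, 0.45]
print(predict_class(tracing_probs))           # moderate
print(predict_class(ensemble_probs))          # severe
```

In this made-up case the tracing model alone would predict moderate, but the imaging model's stronger severe signal tips the averaged probabilities toward severe, which is the kind of complementary evidence an averaging ensemble can exploit.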
Results: One hundred ten (35.9%) studies were classified as mild, 152 (49.7%) as moderate, and 44 (14.4%) as severe. The clinical data model had an accuracy of 61% (95% CI 55%, 66%) and a kappa score of 0.32, indicating fair agreement (Figure 1). Two hundred forty-four studies reached a filling volume of at least 75% of the EBC; only these were included in the urodynamic tracing model. The urodynamic tracing model had an accuracy of 68% (95% CI 64%, 73%) and a kappa score of 0.46, indicating moderate agreement. The imaging model had an accuracy of 60% (95% CI 56%, 65%) and a kappa score of 0.35, indicating fair agreement. The ensemble model (combining the urodynamic tracing model and the imaging model) had an accuracy of 70% (95% CI 66%, 76%) and a kappa score of 0.49, indicating moderate agreement (Figure 2).
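The abstract does not say how the accuracy confidence intervals were computed. One common approach is a percentile bootstrap over studies; the sketch below applies it to the Figure 1 counts (64 + 65 + 9 = 138 correct out of 226 rated studies). The function name and the resampling settings are assumptions for illustration, and the resulting interval need not match the reported 55% to 66%.

```python
import random


def bootstrap_accuracy_ci(correct, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for classification accuracy.

    `correct` holds one 0/1 flag per study (1 = the model's class matched the
    expert panel's consensus rating). Returns (accuracy, lower, upper).
    """
    rng = random.Random(seed)  # fixed seed for a reproducible illustration
    n = len(correct)
    point = sum(correct) / n
    stats = []
    for _ in range(n_resamples):
        # Resample studies with replacement and recompute accuracy.
        resample = rng.choices(correct, k=n)
        stats.append(sum(resample) / n)
    stats.sort()
    lo_idx = round((alpha / 2) * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return point, stats[lo_idx], stats[hi_idx]


# Figure 1 diagonal: 138 correct predictions out of 226 studies.
flags = [1] * 138 + [0] * (226 - 138)
acc, lower, upper = bootstrap_accuracy_ci(flags)
print(f"accuracy {acc:.0%} (95% CI {lower:.0%}, {upper:.0%})")
```

Resampling whole studies keeps the interval tied to the unit of analysis; an exact binomial or Wilson interval would be a reasonable closed-form alternative.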
Conclusion: Deep learning models built from urodynamic tracings and fluoroscopic images automatically classified bladder dysfunction severity; their ensemble achieved moderate agreement with the expert panel.
| Expert panel \ VUDS Flowsheet Model | Mild | Moderate | Severe |
|---|---|---|---|
| Mild | 64 | 34 | 1 |
| Moderate | 35 | 65 | 4 |
| Severe | 2 | 12 | 9 |
Figure 1. VUDS Flowsheet Model predictions compared with expert panel ratings. The flowsheet (clinical data) model agreed with the reviewers' consensus ratings (ground truth) with an accuracy of 61% (95% CI 55%, 66%) and a kappa score of 0.32 (fair agreement).
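The reported accuracy and kappa can be recomputed from the Figure 1 counts. The sketch below assumes the reported kappa is the unweighted Cohen's kappa (the abstract does not name the variant); the function name `accuracy_and_kappa` is hypothetical. Kappa compares observed agreement with the agreement expected by chance from the row and column totals.

```python
def accuracy_and_kappa(matrix):
    """Accuracy and unweighted Cohen's kappa from a square confusion matrix.

    Rows are the reference ratings (expert panel) and columns are the model
    predictions, both in the same class order.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[r][c] for r in range(k)) for c in range(k)]
    # Chance agreement: products of matching marginal totals, over total^2.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total**2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Figure 1 counts (rows = expert panel, columns = flowsheet model),
# ordered mild, moderate, severe.
figure1 = [
    [64, 34, 1],
    [35, 65, 4],
    [2, 12, 9],
]
acc, kappa = accuracy_and_kappa(figure1)
print(f"accuracy {acc:.2f}, kappa {kappa:.2f}")  # accuracy 0.61, kappa 0.32
```

Here 138 of 226 studies fall on the diagonal, while the marginal totals alone would yield about 43% agreement by chance, giving a kappa of about 0.32, consistent with the caption.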
| Expert panel \ Ensemble Model | Mild | Moderate | Severe |
|---|---|---|---|
| Mild | 69 | 30 | 2 |
| Moderate | 23 | 85 | 3 |
| Severe | 0 | 12 | 15 |
Figure 2. Ensemble Model predictions compared with expert panel ratings. The ensemble model, restricted to studies filled to at least 75% of EBC, agreed with the reviewers' consensus ratings (ground truth) with an accuracy of 70% (95% CI 66%, 76%) and a kappa score of 0.49 (moderate agreement).
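Overall accuracy does not show how each severity level fares, and the severe class is the smallest. A minimal sketch of per-class recall (correct predictions divided by the expert-rated total for that class) and precision (correct predictions divided by the model-predicted total) from the Figure 2 counts follows; the function name is hypothetical.

```python
CLASSES = ("mild", "moderate", "severe")

# Figure 2 counts (rows = expert panel, columns = ensemble model).
figure2 = [
    [69, 30, 2],
    [23, 85, 3],
    [0, 12, 15],
]


def per_class_recall_precision(matrix):
    """Return {class: (recall, precision)} from a square confusion matrix.

    Recall divides the diagonal count by the row total (expert-rated cases);
    precision divides it by the column total (model-predicted cases).
    """
    k = len(matrix)
    result = {}
    for i, name in enumerate(CLASSES[:k]):
        row_total = sum(matrix[i])
        col_total = sum(matrix[r][i] for r in range(k))
        recall = matrix[i][i] / row_total if row_total else float("nan")
        precision = matrix[i][i] / col_total if col_total else float("nan")
        result[name] = (recall, precision)
    return result


for name, (recall, precision) in per_class_recall_precision(figure2).items():
    print(f"{name}: recall {recall:.2f}, precision {precision:.2f}")
```

With these counts, recall for the severe class is 15/27 (about 0.56): 12 of the 27 studies the panel rated severe were predicted as moderate, and none as mild.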
Back to 2022 Abstracts