How Robust are the Results of Urologic Studies? Applying the P-value Fragility Index to Urologic Studies
Kristian Stensland, MD, MPH1; Navneet Ramesh, BA2; Mark Broadwin, BA2; Jonathan Batty, MBChB, MPH3; David Canes, MD1
1Lahey Hospital and Medical Center, Burlington, MA; 2Tufts University School of Medicine, Boston, MA; 3University Hospitals Coventry and Warwickshire, Coventry, United Kingdom
BACKGROUND: In the urologic literature, the publishability of a study and the weight attached to its findings too often hinge on attaining the "statistically significant" threshold of a p value less than 0.05. Unfortunately, this reliance on, and frequent misapplication of, frequentist statistical principles promotes overconfidence in study results. Often, the occurrence or non-occurrence of only a few events can switch a finding from "statistically significant" to not, or vice versa. The "fragility index" (FI) measures this number and is a valuable aid in interpreting the robustness of p values and, by extension, the findings of studies. It is defined as the hypothetical number of events that would need to be changed to non-events for a p value to become non-significant (p>0.05). Herein, we applied the fragility index to studies published in the urologic literature.
METHODS: All issues of five major urologic journals from the last 3 years were examined for studies comparing two groups using count data. Information from each study, including manuscript descriptors, statistical methodology, and the numbers of events and non-events in each of the two groups, was extracted. Using a custom Python script, the fragility index was then calculated for each study comparison.
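The authors' script is not reproduced in this abstract. As an illustration only, the calculation described above might be sketched as follows. The sketch assumes a two-sided Fisher exact test and assumes that events are changed to non-events one at a time in the group with the higher event proportion; the abstract specifies neither choice, and other conventions exist (for example, changing non-events to events in the group with fewer events). The function names fisher_exact_p and fragility_index are hypothetical.

```python
from math import comb


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups; columns are events and non-events.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in group 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def fragility_index(events_1, n_1, events_2, n_2, alpha=0.05):
    """Number of event-to-non-event changes needed to reach p >= alpha.

    Assumption: changes are made in the group with the higher event
    proportion. Returns None if the comparison is not significant.
    """
    e1, e2 = events_1, events_2
    if fisher_exact_p(e1, n_1 - e1, e2, n_2 - e2) >= alpha:
        return None
    flip_group_1 = e1 / n_1 > e2 / n_2
    flips = 0
    while fisher_exact_p(e1, n_1 - e1, e2, n_2 - e2) < alpha:
        if flip_group_1:
            e1 -= 1
        else:
            e2 -= 1
        flips += 1
    return flips


# Hypothetical example data: 10/20 events in one group, 1/20 in the other.
print(fragility_index(10, 20, 1, 20))  # 4
```

In this made-up example, the comparison stays significant at 9, 8, and 7 events in the first group and first reaches p >= 0.05 at 6 events, giving an FI of 4.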
RESULTS: A total of 4,086 unique studies comprising 4,364 unique comparisons were extracted from the literature, of which 768 had count data. Of these, 732 comparisons reported significant p values (p < 0.05), and the fragility index could be calculated for 715. Of these comparisons, 89 (12.4%) had an FI of 1; 193 (27.0%) had an FI of 2-5; 78 (10.9%) had an FI of 6-10; 81 (11.3%) had an FI of 11-20; and 274 (38.3%) had an FI greater than 20. The median number of patients in the included comparisons was 347 (IQR 139-1,081). In the lower half of enrollment (fewer than 347 patients), 77% of comparisons (n=275/357) had FI < 10. In the upper half of enrollment (at least 347 patients), 24% of comparisons (n=85/358) had FI < 10.
CONCLUSIONS: Despite a high median number of enrolled patients, roughly 40% of the urologic studies examined would not have had conventionally statistically significant results if only a few events (5 or fewer) had not occurred. Interpretation of statistical results, and of their "significance", should be tempered by an understanding of the limits of the p value and frequentist statistics, and by the recognition that these studies may be difficult to replicate. Care must be taken in interpreting the results of studies in urology, particularly before altering clinical practice. The fragility index can aid in assessing the robustness of urologic studies and the confidence their results deserve.