Development and validation of prediction model for incident overactive bladder: The Nagahama study

Objectives: We aimed to develop models to predict new-onset overactive bladder in 5 years using a large prospective cohort of the general population.
Methods: This is a secondary analysis of a longitudinal cohort study in Japan. The baseline characteristics were measured between 2008 and 2010, with follow-ups every 5 years. We included subjects without overactive bladder at baseline and with follow-up data 5 years later. Overactive bladder was assessed using the overactive bladder symptom score. Baseline characteristics (demographics, health behaviors, comorbidities, and overactive bladder symptom scores) and blood test data were included as predictors. We developed two competing prediction models for each sex based on logistic regression with penalized likelihood (LASSO). We chose the best model separately for men and women after evaluating models' performance in terms of discrimination and calibration using an internal validation via 200 bootstrap resamples and a temporal validation.
Results: We analyzed 7218 participants (male: 2238, female: 4980). The median age was 60 and 55 years, and the number of new-onset overactive bladder was 223 (10.0%) and 288 (5.8%) per 5 years in males and females, respectively. The in-sample estimates for C-statistic, calibration intercept, and slope for the best performing models were 0.77 (95% confidence interval 0.74–0.80), 0.28 and 1.15 for males, and 0.77 (95% confidence interval 0.74–0.80), 0.20 and 1.08 for females. Internal and temporal validation gave broadly similar estimates of performance, indicating low optimism.
Conclusion: We developed risk prediction models for new-onset overactive bladder among men and women with good predictive ability.


Introduction
OAB is defined as "a symptom characterized by urinary urgency, with or without urgency incontinence, usually with urinary frequency and nocturia in the absence of infection or other obvious pathology." 1,2 It is a common condition in the general population: the prevalence of OAB is estimated to be around 10% to 20% and increases with age. 3–5 Though not a life-threatening disorder, OAB symptoms reduce HRQOL and lead to higher healthcare costs. 6,7 All over the world, and especially in aging societies, the prevalence of OAB is expected to increase further, worsening the associated loss of HRQOL and raising health care costs.
Population-based prediction models are expected to support population health planning and policy decision-making. 8 With regard to OAB, some behaviors, such as healthy eating habits, keeping a healthy weight, quitting smoking, and performing pelvic floor muscle exercises, are recommended to keep the bladder as healthy as possible. 9 If a reliable predictive model were developed, high-risk subjects could be identified and encouraged to adopt such habits early on, which may prevent incident OAB and save health care costs associated with drug therapies. Making such a model accessible online could further facilitate shared clinical decision-making by health-care providers and potential patients.
However, to the best of our knowledge, no model has been reported to predict new-onset OAB. This may be due to a lack of sufficient data: developing such a model requires a large dataset collected using a prospective design. We have recently reported a longitudinal analysis of voiding dysfunction using data from a large prospective cohort of the general population. 10,11 These data can be used to develop appropriate models for new-onset OAB in the general population.
In this study, we aimed to develop and validate models to predict incident OAB in 5 years using a large prospective cohort from the general population in Japan. In addition, because the mechanism of OAB onset differs between males and females owing to factors such as the prostate, menopause, and delivery, we decided a priori to develop a separate model for each sex. To make the models easier to use, we also aimed to build a web-based application that presents the predicted results interactively.

Methods
This study followed the TRIPOD statement (Fig. S1). 8 The study protocol has been published elsewhere. 12

Study design and source of data
This is a secondary analysis of the Nagahama study, a prospective population-based cohort study in Japan. This cohort project is conducted by Kyoto University, the Nagahama City Office, and the non-profit organization Zeroji Club, and the details of the Nagahama study are reported elsewhere. 10,11 Recruitment took place between November 28, 2008 and November 28, 2010, and the participants were followed up once every 5 years after baseline assessment. The follow-up assessment was conducted from July 28, 2013, to February 10, 2016. The Nagahama City Office managed the personal information, and each participant was given a research ID and anonymized. The cohort study was approved by the ethics committee of the Kyoto University Graduate School of Medicine (no. G278) and by the Nagahama Municipal Review Board. Written informed consent was obtained from all participants.

Study population
Participants were recruited from the general community residents of Nagahama city in central Japan. Inclusion criteria were as follows: age 30 to 74 years, ability to independently participate in health examinations, no difficulties in communicating in Japanese, no serious diseases or other health issues, and voluntary participation. From all participants, we excluded those who were classified as having OAB at baseline according to the OABSS definition. 13

Study outcome
The outcome was new-onset OAB at the 5-year follow-up assessment. We used the OABSS, a self-report measure assessing urinary urgency during the past week (Appendix S1). The questionnaire consists of the following items: (i) daytime frequency, (ii) nighttime frequency, (iii) urgency, and (iv) urgency incontinence. OAB was defined as a total OABSS ≥3 together with an urgency score (item iii) ≥2. 13
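The outcome definition above can be expressed as a simple rule. The following Python sketch is illustrative only: the function and argument names are hypothetical and are not taken from the study's code, which was written in R.

```python
def has_oab(daytime_frequency: int, nighttime_frequency: int,
            urgency: int, urgency_incontinence: int) -> bool:
    """Apply the OABSS-based definition: total score >= 3 and urgency item >= 2."""
    total = (daytime_frequency + nighttime_frequency
             + urgency + urgency_incontinence)
    return total >= 3 and urgency >= 2


# An urgency item of 2 with a total score of 4 meets the definition.
print(has_oab(1, 1, 2, 0))  # True
# High frequency scores alone do not, because the urgency item is below 2.
print(has_oab(2, 3, 1, 0))  # False
```

Both conditions must hold, so a participant with a high total score but an urgency score below 2 is not classified as having OAB.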

Candidate predictor variables
Based on the literature, expert opinions, and the permissible number of variables estimated from the sample size calculation (Appendix S2), we pre-selected predictor variables and specified two models for each sex in the protocol. 11 Model 1 and Model 2 included 21 and 25 parameters, respectively, for males, and 21 and 24 parameters, respectively, for females. Appendix S3 shows the details of the predictors.

Statistical analysis
Models 1 and 2 were developed separately for men and women using logistic regression with penalized likelihood (LASSO penalty) to avoid overfitting and to reduce the number of predictors. 14 Penalization is desirable to avoid extreme predictions. Ridge, LASSO, and elastic net regression are all valid and popular penalization approaches. We selected LASSO in this study because it performs variable selection and thereby reduces the number of predictors, which can make a model easier to apply in clinical practice. To find the optimal hyperparameter (λ) required for LASSO, we used 10-fold cross-validation.
We evaluated the models' performance in terms of both discrimination and calibration. 15 Model discrimination, i.e., the ability to distinguish participants at high risk from those at low risk, was evaluated using the area under the ROC curve (AUC, equivalent to the C-statistic). Model calibration, which measures the agreement between the predictions and the observed outcomes, was evaluated with calibration intercepts and slopes and was visualized with calibration plots. Good calibration is indicated by a calibration intercept near 0 and a calibration slope near 1. 16 To evaluate and compare the net benefit between Models 1 and 2, DCA was performed. 17
When model performance is evaluated with the same data used to develop the model, there is a risk of optimism, which is closely related to overfitting. 18 To decide between the two models while avoiding optimism, we performed an internal validation via 200 bootstrap resamples to calculate the optimism-corrected C-statistic, calibration intercept, and slope. In addition, we performed a temporal validation by splitting the sample into 3 sets according to the year of baseline assessment (i.e., 2008, 2009, and 2010).
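As an illustration of the discrimination and net-benefit measures described above, the following is a minimal Python sketch. It is not the study's code (the analysis was done in R with "glmnet"); the function names and the toy data are hypothetical. The C-statistic is computed as the proportion of concordant event/non-event pairs, and the net benefit uses the standard decision-curve formula TP/n − (FP/n) × pt/(1 − pt) for a threshold probability pt.

```python
from itertools import product


def c_statistic(y_true, y_prob):
    """Share of (event, non-event) pairs in which the event has the higher
    predicted risk; tied predictions count as half."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event, p_non_event in product(events, non_events):
        if p_event > p_non_event:
            concordant += 1.0
        elif p_event == p_non_event:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating everyone with predicted risk >= threshold."""
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


# Toy data (hypothetical): observed outcomes and predicted risks.
outcomes = [0, 0, 1, 0, 1, 1]
risks = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60]
auc = c_statistic(outcomes, risks)         # 8 of 9 pairs are concordant
nb = net_benefit(outcomes, risks, 0.25)    # 2/6 - (1/6) * (0.25/0.75)
```

A decision curve is obtained by repeating the net-benefit calculation over a range of thresholds and comparing the result with the "treat all" and "treat none" strategies.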
For the temporal validation, we used the first 2 sets (2008 and 2009) as the training set and the 2010 set as the testing set to evaluate discrimination and calibration. We selected the final model after comparing cross-validation performance. If performance was deemed to be similar across the models, we adopted the simpler one. We used the "glmnet" package in R (version 4.1.2) for our analyses. 19 All code used for our analysis is provided at https://github.com/SatoshiFunada/2021OAB_prediction_model. After deciding on the final model, we programmed a Shiny application in R to present the prediction results interactively. 20 There was a minor change with respect to the study's protocol. We did not use multiple imputation to address missing data and instead performed a complete case analysis, as the missing data were less than 5% for all variables. 21 Otherwise, we adhered to the study protocol in data cleaning, model performance evaluation, and model validation. 12

Results

Baseline characteristics
Figure 1 shows the study flow chart. From the total of 9764 participants (male: 3208, female: 6556) at baseline, we excluded 1475 participants who did not attend the follow-up assessment, 912 participants with OAB at baseline, and two with missing data for OAB at baseline (Table S1). We also excluded those with missing predictors (51 males [2.2%] and 106 females [2.1%]) and included 7218 participants (male: 2238, female: 4980) as a complete case data set. Table 1 shows the baseline characteristics of participants without OAB at baseline; the median ages were 60 and 55 years in males and females, respectively. The number of new-onset OAB cases at the follow-up assessment was 223 (10.0%) and 288 (5.8%) per 5 years in males and females, respectively (Table S2). As noted above, the data were divided into three sets according to the year of baseline assessment for a temporal validation. There were no apparent differences between the 2008 and 2009 cohort and the 2010 cohort (Table S3).
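The participant flow and the reported percentages can be checked with a few lines of arithmetic. The Python sketch below uses only the counts stated in the text; the variable names are hypothetical, and the denominator used for the missing-predictor percentage (analysed plus excluded participants of each sex) is an assumption that happens to reproduce the reported 2.2% and 2.1%.

```python
# Counts reported in the text.
baseline = {"male": 3208, "female": 6556}
lost_to_follow_up = 1475
oab_at_baseline = 912
missing_oab_status = 2
missing_predictors = {"male": 51, "female": 106}
analysed = {"male": 2238, "female": 4980}
new_onset = {"male": 223, "female": 288}

# Apply the exclusions in the order described.
remaining = (sum(baseline.values()) - lost_to_follow_up - oab_at_baseline
             - missing_oab_status - sum(missing_predictors.values()))
assert remaining == sum(analysed.values()) == 7218

for sex in ("male", "female"):
    incidence = 100 * new_onset[sex] / analysed[sex]
    # Assumed denominator: analysed participants plus those excluded for
    # missing predictors, within each sex.
    missing = 100 * missing_predictors[sex] / (analysed[sex] + missing_predictors[sex])
    print(f"{sex}: 5-year incidence {incidence:.1f}%, missing predictors {missing:.1f}%")
# male: 5-year incidence 10.0%, missing predictors 2.2%
# female: 5-year incidence 5.8%, missing predictors 2.1%
```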

Model development
We did not detect problematic multicollinearity between predictor variables when checking the scatter plot matrix and calculating the variance inflation factor (Table S4). Table 2 shows the covariates selected by LASSO from the whole sample and the corresponding coefficient estimates, for all models. For both males and females, age, OABSS questions 1 to 4, HbA1c, eGFR, and BNP were selected as predictors. Smoking and diabetes were selected as predictors for males but not for females. On the other hand, BMI, alcohol habit, ischemic heart disease, sleep disturbance, and OSA were selected for females but not for males. Prostate disease and PSA were selected for males, and delivery was selected for females. Figure S2 and Table 3 show the ROC curves and the apparent C-statistic, i.e., the C-statistic calculated using the whole dataset for both training and testing of the LASSO models. Models 1 and 2 demonstrated similarly good discrimination for males and females, with an apparent C-statistic ranging from 0.76 to 0.78 in all instances. Figure S3 and Table 3 provide the calibration plots and the calibration intercepts and slopes, respectively. For males, Models 1 and 2 demonstrated similar and relatively good calibration, both visually and as judged by the calibration intercept and slope. For females, Model 2 showed better calibration than Model 1 (intercepts were 0.20 vs 0.06 and slopes were 1.08 vs 1.03, for Model 1 vs 2, respectively). Figure S4 shows the DCA results; there were no apparent differences between Models 1 and 2 in either males or females.
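For readers unfamiliar with the calibration measures reported above, the following Python sketch shows one common way to estimate them: regress the observed outcome on the logit of the predicted risk with a logistic model, and read off the intercept and slope. This is an assumption about the formulation, since the text does not state how the intercept was estimated (another convention fixes the slope at 1 and estimates the intercept alone). The function names and toy data are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def calibration_intercept_slope(y_true, y_prob, n_iter=25):
    """Fit logit(P(y = 1)) = a + b * logit(predicted risk) by Newton-Raphson
    and return (a, b). Assumes predicted risks lie strictly between 0 and 1
    and that the outcomes are not perfectly separated."""
    x = [logit(p) for p in y_prob]
    a, b = 0.0, 1.0  # start at perfect calibration
    for _ in range(n_iter):
        mu = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y_true, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y_true, mu, x))
        # Information matrix (negative Hessian), 2 x 2.
        w = [mi * (1.0 - mi) for mi in mu]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system explicitly.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Toy data (hypothetical): predicted risks and observed outcomes.
risks = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
outcomes = [0, 0, 1, 0, 1, 0, 1, 1]
intercept, slope = calibration_intercept_slope(outcomes, risks)
```

A slope above 1 suggests that predictions are too narrow (high risks underestimated, low risks overestimated), whereas a slope below 1 is the typical signature of overfitting.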

Model validation
We performed an internal validation using 200 bootstrap resamples (Table 3). Models 1 and 2 demonstrated good discrimination, and the optimism-corrected C-statistic ranged from 0.75 to 0.76 in males and females, only slightly worse than the apparent C-statistic in most cases. In males, Models 1 and 2 showed similar performance in the optimism-corrected calibration (intercepts were 0.21 vs 0.18 and slopes were 1.10 vs 1.09, for Model 1 vs 2, respectively). On the other hand, Model 2 showed better calibration than Model 1 in females (the optimism-corrected calibration intercepts were 0.20 vs 0.05 and slopes 1.08 vs 1.02, for Model 1 vs 2, respectively). Next, we performed a temporal validation of Models 1 and 2 in both males and females (Fig. S5; Table 3). Models 1 and 2 demonstrated good discrimination in both males and females, with a C-statistic from 0.77 to 0.78. In males, Model 2 showed much better calibration than Model 1 (calibration intercepts 1.12 vs 0.31 and slopes 1.48 vs 1.11, for Model 1 vs 2, respectively). In females, Models 1 and 2 showed similar calibration (intercept 0.27 for both, slopes 1.16 vs 1.17, respectively). Based on the results of the internal and temporal validation, we selected Model 2 as our final model for males. For females, Model 2 performed slightly better than Model 1. However, given that the differences were small and that Model 1 was the simpler model, we selected Model 1 as our final model for females. We created an interactive web-based application in which baseline characteristics can be entered as input and the corresponding predicted probability of new-onset OAB 5 years later is generated (Fig. 2a,b and https://hxrfnn-satoshi-funada.shinyapps.io/OAB_prediction_model/).
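The way a fitted logistic model turns baseline characteristics into a predicted probability, as the web application does, can be sketched as follows. This Python example is illustrative only: the coefficient values, the intercept, and the predictor names are hypothetical and are NOT the estimates reported in Table 2, and the application itself is implemented in R with Shiny.

```python
import math


def predicted_risk(intercept, coefficients, values):
    """Inverse-logit of the linear predictor: intercept + sum(coef * value)."""
    linear_predictor = intercept + sum(
        coefficients[name] * values[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical coefficients for illustration only (not from Table 2).
example_intercept = -5.0
example_coefficients = {"age_years": 0.03, "oabss_urgency": 0.50}

# Hypothetical participant: 60 years old with an urgency item score of 1.
person = {"age_years": 60, "oabss_urgency": 1}
risk = predicted_risk(example_intercept, example_coefficients, person)
print(f"Predicted 5-year risk: {risk:.1%}")
```

Predictors that LASSO shrinks to exactly zero drop out of the sum, which is why the final models need fewer inputs than the candidate list.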

Discussion
We developed risk prediction models of new-onset OAB in 5 years for males and females and performed internal and temporal validation using a large prospective cohort of the general population in Japan. The selected best-performing prediction model for males included questionnaire assessment and blood test results as predictors, reflecting the anatomical complexity of males compared with females. On the other hand, the model for females included only questionnaire assessment and no blood tests, which makes it easier to use in daily practice. Based on internally and temporally validated estimates of model performance, we deemed that both models, for men and women, had good predictive abilities.
In the model development stage, age, OABSS questions, HbA1c, eGFR, and BNP were selected for both males and females; however, the other selected predictors differed entirely. This is probably because the etiology of OAB is different between males and females; 4 therefore, it was reasonable to develop prediction models separately for males and females. In terms of sex-specific predictors, prostate disease and PSA increased the risk of incident OAB in males, which is consistent with previous reports. 22,23 On the other hand, our study indicated that delivery reduced the risk of incident OAB in females, which differs from previous reports. 24,25 This study was performed in a rural area, and most female participants (92%) had experienced delivery at baseline. When we compared females with and without delivery experience, females without delivery were younger but had a higher percentage of smokers and more comorbid cancer and depression than those with delivery. Therefore, females without delivery (8.5%) were a minority and may represent a less healthy subgroup among the younger general population in the Nagahama cohort. This could explain why the lack of delivery experience at baseline, in turn, increased the risk of incident OAB.
We found that for males, the prediction model including blood test results among the predictors (i.e., Model 2) was better than that with only the questionnaire data. In general, among models with equal performance, the simpler the prediction model, the better ("Occam's razor"). Considering ease of use, the model not including blood tests (Model 1) might instead be recommended for males. However, in a clinical setting, serum PSA is often tested to screen for prostate cancer and is useful to predict prostate volume and lower urinary tract symptoms in males. 26 Given this situation, we consider the current model acceptable for use in clinical practice for males. To accommodate settings in which blood tests cannot be measured, we created web-based applications for Models 1 and 2 for both males and females (Fig. 2a,b).
Our prediction models have some implications for clinicians and policy makers. Our models can help identify populations at high risk of incident OAB in 5 years. This may help clinicians and policy makers deliver early interventions to such people to prevent new-onset OAB, including encouraging them to keep healthy eating habits, maintain a healthy weight, and perform pelvic floor muscle exercises. 9 Since there is no established prevention strategy yet, future studies are needed to investigate the benefit of potential interventions in preventing OAB among high-risk subjects.
This study has several strengths. First, to the best of our knowledge, these are the first published prediction models for incident OAB. Second, we used data from a large prospective cohort with a high follow-up rate (85%) and few missing data (2.1%) compared with previous follow-up studies of urinary symptoms. 27,28 Third, we developed and validated prediction models according to the TRIPOD guidelines and followed a study protocol. Following the prespecified analysis plan reduces the risk of selective reporting bias. 29 Fourth, we developed risk prediction models with good predictive ability and a web-based application to make them accessible to a wide range of people. Our models are expected to support population health planning and policy decision-making regarding the prevention of OAB and, hopefully, to help reduce its incidence.
There were several limitations in our study. First, as the study participants were healthy volunteers instead of a random sample, there may be some concerns about lack of representativeness. Second, our models did not include several possible predictors, such as prostate volume, use of some drugs such as anticholinergics, frailty, neurological disorders, and pelvic organ prolapse, which could have an influence on OAB symptoms. However, it is not pragmatic to measure all these clinical/biological markers in a large epidemiological study. Moreover, had we measured them, it would not have served the purpose of prediction in the general population either: widely informative and applicable prediction models should use easily measurable characteristics. Third, we defined new-onset OAB only according to the OABSS criteria at the follow-up assessment, without a frequency-volume chart. As OAB symptoms may be influenced by treatment and fluctuate over time, we may have misclassified some new-onset OAB patients during the 5 years. Information about treatment and further follow-up are expected to improve model accuracy. Fourth, our data were not sufficient to evaluate possible interactions and non-linear relationships that might improve model performance. Fifth, the participants were between the ages of 35 and 70, and our models may not be extrapolated to other age groups. Sixth, although we examined temporal validity, the Nagahama cohort is a single cohort, and we performed neither geographic validation in Japan nor global external validation with a fully independent external cohort outside Japan. To evaluate the general applicability of the models, future studies are needed to demonstrate their external validity with other cohort data.
In conclusion, we have developed risk prediction models for new-onset OAB in the general population with good performance. Future studies are necessary to evaluate the generalizability of the models and develop new models with better performance, possibly including some additional strong predictors. We expect that our models will help identify high-risk populations for incident OAB, so that prevention can start earlier.

for other works not related to this study. SA has a research grant from Astellas, grants from Astra Zeneca, grants from Tosoh. SA receives honoraria from Janssen, Astra Zeneca, Astellas, and Sanofi outside of the submitted work. TAF reports grants and personal fees from Mitsubishi-Tanabe, personal fees from MSD, personal fees from Shionogi, personal fees from Sony, outside the submitted work. In addition, TAF has a patent 2018-177 688 concerning smartphone CBT apps pending, and intellectual properties for Kokoro-app licensed to Mitsubishi-Tanabe. OE was supported by the Swiss National Science Foundation (Ambizione grant number 180083). None of the contributing authors have any conflict of interest, including specific financial interests or relationships and affiliations relevant to the subject matter or materials discussed in the manuscript.

Approval of the research protocol by an Institutional Reviewer Board
Approval of the research protocol by an Institutional Reviewer Board: G278.

Informed consent
Informed consent was obtained from all participants.

Registry and the Registration No. of the study/trial
Not applicable.

Supporting information
Additional Supporting Information may be found in the online version of this article at the publisher's web-site: Figure S1. TRIPOD checklist. Table S1. Clinical characteristics of the participants at baseline. Table S2. OABSS at baseline and follow-up. Table S3. Clinical characteristics of the participants excluding OAB at baseline. Table S4. VIF of predictor variables. Appendix S1. Overactive bladder symptom score. Appendix S2. Sample size calculation. Appendix S3. Candidate predictor variables.