
A validated multivariable machine learning model to predict cardio-kidney risk in diabetic kidney disease

Abstract

Background

Individuals with diabetic kidney disease (DKD) often suffer cardiac and kidney events. We sought to develop an accurate means by which to stratify risk in DKD.

Methods

Clinical variables and biomarkers were evaluated for their ability to predict the adjudicated primary composite endpoint of CREDENCE (Canagliflozin and Renal Events in Diabetes with Established Nephropathy Clinical Evaluation) by 3 years. Using machine learning techniques, a parsimonious risk algorithm was developed.

Results

The final model included age, body-mass index, systolic blood pressure, and concentrations of N-terminal pro-B type natriuretic peptide, high sensitivity cardiac troponin T, insulin-like growth factor binding protein-7 and growth differentiation factor-15. The model had an in-sample C-statistic of 0.80 (95% CI = 0.77–0.83; P < 0.001). Dividing results into low, medium and high risk categories, for each increase in level the hazard ratio increased by 3.43 (95% CI = 2.72–4.32; P < 0.001). Low risk scores had negative predictive value of 94%, while high risk scores had positive predictive value of 58%. Higher values were associated with shorter time to event (log rank P < 0.001). Rising values at 1 year predicted higher risk for subsequent DKD events. Canagliflozin treatment reduced score results by 1 year with consistent event reduction across risk levels. Accuracy of the risk model was validated in separate cohorts from CREDENCE and the generally lower risk Canagliflozin Cardiovascular Assessment Study.

Conclusions

We describe a validated risk algorithm that accurately predicts cardio-kidney outcomes across a broad range of baseline risk.

Trial registration

CREDENCE (Canagliflozin and Renal Events in Diabetes with Established Nephropathy Clinical Evaluation; NCT02065791) and CANVAS (Canagliflozin Cardiovascular Assessment Study; NCT01032629/NCT01989754).

Graphical abstract

Persons with diabetic kidney disease (DKD) are at risk for progressive kidney failure and cardiovascular (CV) events. Using data from the CREDENCE trial of patients with type 2 diabetes and DKD, machine learning techniques were applied to create a highly accurate algorithm to predict progressive DKD and adverse CV outcomes. The algorithm was validated both within an internal CREDENCE cohort and externally in the CANVAS trial.

Research insights

What is currently known about this topic?

  • Persons with diabetic kidney disease are at high risk for cardio-kidney events. Predicting such events would be expected to allow for more targeted intervention and efficient risk reduction.

What is the research question?

  • Can an inductive statistical approach using clinical and biomarker variables be leveraged to produce a more accurate and validated approach for risk stratification of diabetic kidney disease in the CREDENCE trial?

What is new?

  • Machine learning was used to create a model that included clinical and biomarker variables. The model was highly accurate for predicting a broad range of cardio-kidney outcomes and was validated both internally in CREDENCE and externally in the CANVAS program.

How might this study influence clinical practice?

  • These findings could lead to a more personalized approach for reducing risk in diabetic kidney disease.

Individuals with diabetes mellitus (DM) complicated by nephropathy represent a high-risk population, prone to progressive kidney failure and major cardiovascular events [1,2,3,4]. Despite this well-defined risk, a broad range of hazard exists within this population and ability of current tools to accurately predict cardio-kidney outcomes is limited. Because of this, an emphasis has been placed by major societies on the development of more granular approaches for the evaluation and management of diabetic kidney disease (DKD) [5, 6]. An improved ability to discriminate presence and severity of cardio-kidney syndrome in DKD would allow for more refined application of the growing list of treatments proven to reduce progression of disease and lower incident cardiac events. Effective treatments for DKD now include renin-angiotensin inhibitors, sodium/glucose cotransporter-2 inhibitors, glucagon-like peptide-1 receptor agonists and non-steroidal mineralocorticoid receptor antagonists, all of which are under-utilized and often applied in lower risk populations [4].

Recent work within the Canagliflozin and Renal Events in Diabetes with Established Nephropathy Clinical Evaluation (CREDENCE) trial dataset along with similar efforts in the Canagliflozin Cardiovascular Assessment Study (CANVAS; a trial of patients with type 2 DM and cardio-kidney risk) has clarified means by which to evaluate baseline risk for various cardio-kidney events among those in each trial. Such approaches have included the creation of risk prediction equations with clinical variables and circulating biomarkers to independently predict risk for outcomes such as progressive kidney failure or cardiovascular (CV) death [7,8,9,10,11,12,13]. Despite great interest in producing accurate tools to better predict outcomes, shortcomings of risk prediction models include lack of parsimony for variable inclusion and an inability to emphasize relative variable significance to the model. Overcoming these limitations would be expected to provide greater discrimination and calibration for risk prediction.

In other CV and kidney disease states, machine learning approaches have been used to develop accurate, validated risk models for diagnosis and/or prognosis [14,15,16,17]. Tools developed in these prior efforts used multiple data inputs including biomarker results and clinical variables with output that allowed for categorization of affected individuals as low, medium, or higher risk for the outcomes in question. Employing a similar methodology, the present work sought to develop a novel and comprehensive algorithm for the prediction of events in DKD. To do so, baseline clinical and all available biomarker variables from CREDENCE [18] were utilized to derive and internally validate a new panel; these results were then externally validated among study participants in CANVAS [19].

Methods

The design and results of the CREDENCE (NCT02065791) trial and CANVAS (NCT01032629 and NCT01989754) program have been previously published [18,19,20]. All study procedures for CREDENCE and CANVAS and subsequent analyses were approved by local ethics committees. Written informed consent was obtained for participation in both studies, including analyses of biomarkers. Study investigators had full access to all the data in the study and take responsibility for its integrity and the data analysis. The data sharing policy of Janssen Pharmaceutical Companies of Johnson & Johnson is available at https://www.janssen.com/clinical-trials/transparency. As noted on this site, requests for access to the study data can be submitted through Yale Open Data Access (YODA) Project site at http://yoda.yale.edu.

Study design and participant population

CREDENCE was a placebo-controlled trial of canagliflozin 100 mg versus placebo in 4401 participants with DKD at a high risk of progression: study participants with type 2 DM and DKD (according to the presence of estimated glomerular filtration rate [eGFR] between 30 and 90 mL/min/1.73 m² and urinary albumin:creatinine ratio [UACR] of > 300 mg/g) were enrolled. In this analysis, the 2711 study participants with available baseline plasma for analysis of biomarkers were included. The study was performed in 34 countries with significant European representation.

Plasma samples were collected at baseline and stored at −80 °C. The biomarkers in this analysis were planned and analyzed across both programs allowing evaluation across a wide range of canagliflozin-treated study participants. All biomarkers were measured via standard analytical methods in an independent laboratory by personnel. Markers evaluated included N-terminal pro–B-type natriuretic peptide (NT-proBNP), high-sensitivity cardiac troponin T (hs-cTnT), growth differentiation factor-15 (GDF-15), insulin-like growth factor binding proteins 1, 3 and 7 (IGFBP1, IGFBP3, and IGFBP7), placental growth factor (PlGF), soluble FMS-like tyrosine-kinase-1 (sFLT1), angiopoietin-2, and vascular endothelial growth factor-A (VEGF-A).

Goals of the present analysis

The goal of the present analysis was to develop a multivariable algorithm using baseline clinical variables and biomarker data to predict the primary composite endpoint of the CREDENCE trial by the 3-year time horizon of the trial. This primary composite endpoint included adjudicated endpoints of end-stage kidney disease (dialysis, transplantation, or a sustained eGFR of < 15 mL/min/1.73 m²), doubling of the serum creatinine level, renal death or CV death. Given the importance of albuminuria as a predictor of the primary composite endpoint, we also evaluated performance of the developed algorithm among study participants as a function of baseline UACR. In addition, as progression of CKD in type 2 diabetes is frequently accompanied by heightened cardiovascular risk, we sought to evaluate the association between the algorithm (developed to predict the primary composite endpoint of CREDENCE) and major cardio-kidney outcomes including end-stage kidney disease, doubling of serum creatinine, CV death, all-cause death, heart failure (HF) hospitalization and the composite of CV death/HF hospitalization.

Lastly, we sought to validate the performance of the algorithm among study participants in CANVAS, a population with type 2 diabetes, heightened cardio-kidney risk (but much lower baseline prevalence of DKD) and with availability of the same baseline features required by the developed risk algorithm as well as the same adjudicated composite endpoints. Much as with CREDENCE, the CANVAS program enrolled participants in 30 countries including significant European representation.

Statistical analyses

To directly identify predictors of a primary composite endpoint event by 3 years, only those CREDENCE trial participants with available follow up data and no missingness were included; this resulted in a sample size with complete data of 1117 study participants. These study participants in CREDENCE were then randomly split into training (60%; N = 671) and internal validation sets (40%; N = 446). Baseline characteristics were compared between those with and without the primary composite outcome of the CREDENCE trial; dichotomous variables were compared by using 2-sided Fisher exact tests, and continuous clinical variables were compared by using 2-sided 2-sample Student t tests. The biomarkers compared were tested with the Wilcoxon rank sum test.
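As a rough illustration of the 60/40 random split described above, the following Python sketch partitions 1117 participant indices. The function name `split_train_validation`, the seed, and the choice to round the training size up are assumptions made for illustration; the source reports only the resulting group sizes (671 and 446).

```python
import math
import random

def split_train_validation(n_participants, train_fraction=0.6, seed=0):
    """Randomly partition participant indices into training and validation sets.

    Hypothetical helper: the seed and the ceiling rounding are assumptions,
    chosen so that 1117 participants give the reported 671/446 split.
    """
    indices = list(range(n_participants))
    rng = random.Random(seed)
    rng.shuffle(indices)
    n_train = math.ceil(n_participants * train_fraction)
    return sorted(indices[:n_train]), sorted(indices[n_train:])

train_ids, validation_ids = split_train_validation(1117)
print(len(train_ids), len(validation_ids))  # 671 446
```

Fixing the seed keeps the partition reproducible, which matters because all later variable selection is restricted to the training set.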

All studies for biomarker selection and the development of the prognostic algorithm were conducted exclusively on the training set. To facilitate the predictive analysis, the concentration values for all proteins were transformed as follows: 1) they were log-transformed to achieve a normal distribution; 2) outliers were clipped at the value of 3 times the median absolute deviation; and 3) the values were rescaled to a distribution with a zero mean and unit variance. The starting sets of variables consisted of all clinical factors in the CREDENCE dataset along with concentrations of all available biomarkers. Candidate panels of proteins and clinical features were selected via least angle regression [21]. In this method, factors are selected one at a time and evaluated for predictive performance and goodness of fit at each step in that if the new variable improves the score performance, that variable is retained in the panel, and another variable is added. If the new variable does not improve the performance, it is removed from the panel, and the next-best variable is selected. This method is repeated until there are no options left that satisfy the algorithm’s goodness-of-fit requirements.
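The three-step biomarker transformation described above might be sketched as follows. Several details are assumptions, since the source does not specify them: the natural logarithm, clipping limits centred on the median (median ± 3 × MAD), and the population standard deviation for rescaling. The function name and the input concentrations are made up for illustration.

```python
import math
import statistics

def transform_concentrations(values, n_mad=3.0):
    """Log-transform, clip at median +/- n_mad * MAD, then rescale to z-scores.

    Assumptions (not stated in the source): natural log, clipping limits
    centred on the median, and population standard deviation for scaling.
    """
    logged = [math.log(v) for v in values]
    median = statistics.median(logged)
    mad = statistics.median([abs(x - median) for x in logged])
    lower, upper = median - n_mad * mad, median + n_mad * mad
    clipped = [min(max(x, lower), upper) for x in logged]
    mean = statistics.fmean(clipped)
    sd = statistics.pstdev(clipped)
    return [(x - mean) / sd for x in clipped]

# Made-up concentrations with one extreme value.
z_scores = transform_concentrations([81, 163, 175, 380, 396, 975, 20000])
# Making the extreme value even larger does not change the result,
# because it is clipped to the same upper limit.
z_scores_extreme = transform_concentrations([81, 163, 175, 380, 396, 975, 50000])
```

Because the median and MAD are insensitive to a single extreme value, the clipping limits stay stable and the outlier no longer dominates the mean and variance used for rescaling.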

With this panel of interest, predictive analyses were run on the training set by using least absolute shrinkage and selection operator (LASSO) with logistic regression [22]; LASSO is a statistical and machine learning technique that improves accuracy of models by reducing overfitting. The present goal was creation of a model to predict the primary composite endpoint of CREDENCE, using only the variables in the panel of interest. From the analysis results, the LASSO’s shrinkage performance was used to determine when a given variable was not contributing significantly to the model; in those cases, we removed the variable and repeated the analysis.
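To illustrate how LASSO shrinkage can flag a variable that is not contributing, here is a minimal, self-contained sketch of L1-penalised logistic regression fitted by proximal gradient descent. It is not the authors' implementation: the solver, step size, penalty values, and toy data are all assumptions, and the function names are hypothetical.

```python
import math

def soft_threshold(value, threshold):
    """Shrink a value toward zero; values within the threshold become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_lasso_logistic(rows, labels, penalty, step=0.1, n_iter=2000):
    """L1-penalised logistic regression fitted by proximal gradient descent."""
    n, p = len(rows), len(rows[0])
    intercept, coefs = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_intercept, grad_coefs = 0.0, [0.0] * p
        for row, label in zip(rows, labels):
            linear = intercept + sum(w * x for w, x in zip(coefs, row))
            error = 1.0 / (1.0 + math.exp(-linear)) - label
            grad_intercept += error
            for j in range(p):
                grad_coefs[j] += error * row[j]
        intercept -= step * grad_intercept / n  # the intercept is not penalised
        coefs = [soft_threshold(w - step * g / n, step * penalty)
                 for w, g in zip(coefs, grad_coefs)]
    return intercept, coefs

# Toy standardised data (made up): column 0 tracks the outcome, column 1 does not.
rows = [[-2.0, 1.0], [-1.5, -1.0], [-1.0, -1.0], [-0.5, 1.0],
        [0.5, 1.0], [1.0, -1.0], [1.5, -1.0], [2.0, 1.0]]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

_, light = fit_lasso_logistic(rows, labels, penalty=0.05)
_, heavy = fit_lasso_logistic(rows, labels, penalty=1.0)
print(light)  # column 0 keeps a positive weight; column 1 is shrunk to 0.0
print(heavy)  # with a heavy penalty every coefficient is shrunk to 0.0
```

A coefficient that is driven to exactly zero at a modest penalty is a natural candidate for removal before refitting, which mirrors the remove-and-repeat procedure described above.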

Candidates were then subjected to assessment of improvement in calibration from their addition through minimization of the Akaike or Bayesian information criteria and goodness of fit in Hosmer–Lemeshow testing. For each variable, the fraction of new information was calculated using logistic regression with McFadden’s pseudo-R2 to determine the likelihood ratios.
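For reference, the information criteria and McFadden's pseudo-R² named above follow standard definitions, sketched here in Python. The log-likelihood values and parameter count in the example are made up purely to show the arithmetic; only the sample size of 671 (the training set) comes from the text.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

def mcfadden_pseudo_r2(ll_model, ll_null):
    """McFadden's pseudo-R2: 1 - ln(L_model) / ln(L_null)."""
    return 1.0 - ll_model / ll_null

# Hypothetical log-likelihoods for a null model and an 8-parameter model.
ll_null, ll_model, n_params, n_obs = -400.0, -320.0, 8, 671

print(aic(ll_model, n_params))               # 656.0
print(bic(ll_model, n_params, n_obs))
print(mcfadden_pseudo_r2(ll_model, ll_null))  # approximately 0.2
```

A candidate variable improves the model under these criteria when adding it lowers the AIC or BIC, that is, when the gain in log-likelihood outweighs the penalty for the extra parameter.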

Subsequently, the final algorithm was evaluated with the internal validation set: to do so, we generated the score distribution within the validation cohort, followed by C-statistic generation. Operating characteristics of the algorithm result were calculated, including sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). An optimum binary prognostic cut-off was determined using Youden’s index. Following this, the model results were rescaled to a range from 0 to 10 using min–max normalization. The range of the prognostic algorithm (a continuous variable) was then partitioned into 3 risk levels. The partitions were determined according to PPV and NPV > 90% in the training set, and the validation set was evaluated against these partitions. Participants were accordingly categorized as low risk (< 3), medium risk (3 to < 5) and higher risk (≥ 5).
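The scoring steps in this paragraph can be sketched in a few lines of Python. The tier boundaries (3 and 5) and the 0 to 10 range come from the text; the function names, the convention that a score at or above the threshold counts as positive, and the toy data are assumptions for illustration only.

```python
def operating_characteristics(scores, events, threshold):
    """Sensitivity, specificity, PPV and NPV, treating score >= threshold as positive."""
    tp = sum(1 for s, e in zip(scores, events) if s >= threshold and e)
    fp = sum(1 for s, e in zip(scores, events) if s >= threshold and not e)
    fn = sum(1 for s, e in zip(scores, events) if s < threshold and e)
    tn = sum(1 for s, e in zip(scores, events) if s < threshold and not e)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }

def youden_cutoff(scores, events):
    """Threshold that maximises Youden's J = sensitivity + specificity - 1."""
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(scores)):
        stats = operating_characteristics(scores, events, threshold)
        j = stats["sensitivity"] + stats["specificity"] - 1.0
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j

def rescale_0_to_10(raw_scores):
    """Min-max normalisation of raw model output onto a 0-10 scale."""
    low, high = min(raw_scores), max(raw_scores)
    return [10 * (r - low) / (high - low) for r in raw_scores]

def risk_level(score):
    """Three tiers from the text: low (< 3), medium (3 to < 5), higher (>= 5)."""
    if score < 3:
        return "low"
    if score < 5:
        return "medium"
    return "high"

# Toy data (made up): six scores and whether an event occurred.
toy_scores = [1, 2, 3, 4, 5, 6]
toy_events = [0, 0, 0, 1, 0, 1]
cutoff, j_statistic = youden_cutoff(toy_scores, toy_events)
print(cutoff, j_statistic)                          # 4 0.75
print(rescale_0_to_10([2.0, 4.0, 7.0]))             # [0.0, 4.0, 10.0]
print([risk_level(s) for s in [0.0, 4.0, 10.0]])    # ['low', 'medium', 'high']
```

In the study the cut-off and partitions were fixed in the training set and then applied unchanged to the validation set; in this sketch that would mean calling `youden_cutoff` on training data only.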

To evaluate prognostic meaning of results from the risk algorithm, Cox proportional hazards analyses adjusted for algorithm result were performed to evaluate predictive value for the primary composite endpoint; hazard ratios (HRs) were estimated. To do so, we evaluated HR at the optimal binary threshold, as well as per-unit score increases; additionally, a three-tiered approach of low-, medium-, and high-risk score was also explored. HRs were provided with 95% confidence intervals (CI). Time to first primary composite endpoint as a function of elevated DKD score was calculated, displayed as Kaplan–Meier survival curves, and compared by using log-rank testing.
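For readers less familiar with the Kaplan–Meier estimator used for the survival curves, a minimal product-limit sketch follows. The function name and the five-participant example are made up; the study itself used R, and this is not the authors' code.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate at each distinct event time.

    times  : follow-up time for each participant
    events : 1 if the endpoint occurred at that time, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for ti, e in zip(times, events) if ti == t and e)
        n_leaving = sum(1 for ti in times if ti == t)
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving  # events and censored participants both leave the risk set
    return curve

# Made-up follow-up times (arbitrary units); 0 marks a censored participant.
km_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 0, 1])
print(km_curve)  # survival steps down at times 1, 2 and 4
```

Computing one such curve per risk tier and comparing them with a log-rank test corresponds to the analysis described in this paragraph.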

Given the importance of UACR as a predictor of the primary composite endpoint of CREDENCE, the performance of the final panel was compared to UACR for predicting the endpoint in question. Furthermore, risk for the primary composite endpoint across UACR levels was evaluated.

Lastly, performance of the DKD risk algorithm was further validated among study participants in CANVAS with available baseline clinical variables and biomarker information.

All statistics were performed by using R software, version 4.3.1 (R Foundation for Statistical Computing, Vienna, Austria). P values are 2-sided, with a value < 0.05 considered significant.

Results

A study flow diagram for the CREDENCE study participants included in the present analysis is detailed in Supplemental Fig. 1. From a starting population of 2711 in the CREDENCE trial, there were 1117 study participants with sufficient follow up and complete data from which analyses were performed. Supplemental Table 1 details comparisons between baseline characteristics of those included versus those not included from the CREDENCE trial dataset. This does show some differences between those included and those not included, with a pattern suggesting those included in the derivation and validation had higher concentrations of UACR and cardio-kidney stress biomarkers.

Characteristics of the CREDENCE trial derivation and validation sets

The baseline characteristics of the included study participants from the CREDENCE training set are detailed in Table 1, broken down as a function of incident primary composite endpoint during the trial. This demonstrates several noteworthy differences associated with the development of the primary endpoint, including worse kidney function, higher blood pressure, and a more elevated UACR (1867 vs. 740 mg/g; P = 0.001) among those suffering a DKD complication. Furthermore, those destined to suffer the primary endpoint had higher median (Q1, Q3) concentrations of NT-proBNP (380 [163, 975] vs. 175 [81, 396]; P = 0.001), hs-cTnT (27.8 [17.7, 42.9] vs. 18.5 [12.2, 26.7]; P = 0.001), IGFBP7 (140.7 [123.4, 161.1] vs. 119.4 [104.4, 137.1]; P < 0.001), and GDF-15 (3269 [2340, 4930] vs. 2456 [1840, 3421]; P < 0.001). Generally similar patterns were seen in the CREDENCE trial internal validation set (Supplemental Table 2).

Table 1 Baseline characteristics of the study population in the CREDENCE derivation cohort

Predictors of 3-year DKD events in the CREDENCE trial: Derivation Cohort

Over a median follow up of 1165 days, study participants in the derivation cohort experienced 187 primary endpoint events. Variables predictive of the primary composite endpoint of the CREDENCE trial were identified using machine learning. In order of predictive importance in the final model, these included: age (lower age associated; P < 0.001), IGFBP-7 (higher concentrations associated; P < 0.001), NT-proBNP (higher concentrations associated; P < 0.001), BMI (lower BMI associated; P < 0.001), hs-cTnT (higher concentrations associated; P < 0.001), GDF-15 (higher concentrations associated; P = 0.001), and systolic blood pressure (higher pressures associated; P = 0.01). This final model was then fitted into a proprietary risk algorithm, weighting the individual components based on a) their relative importance to the model and b) their numerical value.

The distribution of the DKD risk algorithm results among individuals in the derivation cohort with and without the primary composite endpoint outcome of adjudicated end-stage kidney disease, doubling of the serum creatinine level, or renal death or CV death is demonstrated in Supplemental Fig. 2, which shows excellent separation of those who did or did not experience the primary endpoint.

In analyses of discrimination for the primary endpoint by 3 years following enrollment, the DKD risk algorithm had an in-sample C-statistic of 0.80 (95% CI = 0.77–0.83; P < 0.001) (Fig. 1). In Cox proportional hazards analysis, the continuous result from the algorithm was associated with a HR of 2.03 (95% CI = 1.83–2.26; P < 0.001). Using a single optimal cut-point selected using the Youden approach yielded a HR for the primary composite endpoint of 5.97 (95% CI = 4.25–8.37; P < 0.001). The optimal cutoff had sensitivity of 76%, specificity of 72%, PPV of 51% and NPV of 89%. Comparing the DKD risk algorithm (which included different weights to the constituent variables) to an unweighted Cox model with each of the constituent variables (the “null” model), the C-statistics were considerably different; the DKD risk algorithm C-statistic of 0.80 was significantly higher than that of the null model (0.68; 95% CI = 0.63–0.72; P < 0.001 for difference).
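For a binary outcome at a fixed horizon, the C-statistic can be read as the probability that a randomly chosen participant with the event has a higher score than a randomly chosen participant without it. A minimal pairwise sketch follows; the function name and toy data are made up, and the confidence intervals reported above would require additional methods not shown here.

```python
def c_statistic(scores, events):
    """Concordance for a binary outcome: share of event/non-event pairs in which
    the participant with the event has the higher score (ties count as one half)."""
    concordant = 0.0
    n_pairs = 0
    for score_event, had_event in zip(scores, events):
        if not had_event:
            continue
        for score_other, other_event in zip(scores, events):
            if other_event:
                continue
            n_pairs += 1
            if score_event > score_other:
                concordant += 1.0
            elif score_event == score_other:
                concordant += 0.5
    return concordant / n_pairs

# Toy data (made up): 2 participants with an event, 4 without.
example_c = c_statistic([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 0, 1])
print(example_c)  # 0.875, since 7 of 8 pairs are concordant
```

On this scale 0.5 corresponds to chance and 1.0 to perfect ranking, which gives context for the reported values of 0.80 for the algorithm and 0.68 for the unweighted model.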

Fig. 1

Receiver operator characteristic curve analysis showing accuracy of the DKD risk algorithm at baseline to predict the primary composite endpoint 3 years from enrollment. The risk model had excellent discrimination in the A internal derivation and B internal validation cohort from CREDENCE as evidenced by high area under the curve (AUC)

Expanding the algorithm into 3 tiers of risk (< 3, 3 to < 5, ≥ 5), in the derivation cohort, there were 176 study participants at risk level 1 (corresponding to a numerical value of 0 to < 3), 321 in risk level 2 (corresponding to a numerical value of 3 to < 5), and 174 in risk level 3 (corresponding to a numerical value of 5–10). For each increase in risk level, the HR was 3.43 (95% CI = 2.72–4.32; P < 0.001). The operating characteristics in the derivation cohort demonstrated that those with risk level 1 (corresponding to a DKD algorithm result of < 3) had NPV of 94%, while those with risk level 3 (corresponding to a DKD algorithm result of ≥ 5) had PPV of 58%.

In Supplemental Table 3, results demonstrate the algorithm was well-calibrated in the derivation cohort with each individual variable showing minimization of the AIC/BIC and with non-significant Hosmer–Lemeshow P values. The fraction of new information added by each variable is shown in Supplemental Table 4.

As shown in the Kaplan–Meier curves detailed in Fig. 2, the three-tiered DKD risk algorithm had considerable ability to discriminate different times to onset of the primary composite endpoint of end-stage kidney disease, doubling of the serum creatinine level, renal death or CV death (Log-rank P value < 0.001). Notably, risk curves diverged prior to the first year after study entry and continued to separate over the duration of time in the study.

Fig. 2

Kaplan–Meier curves detailing time to first primary composite outcome event in the A derivation and B internal validation cohorts from CREDENCE. Those with higher DKD risk algorithm scores had shorter time to first events compared to lower scores (Log-rank P value < 0.001)

Internal validation in CREDENCE

The DKD risk algorithm was then evaluated within an internal validation set of 446 CREDENCE trial participants who experienced 123 events during a median follow up of 1163 days. The distribution of the risk model results is shown in Supplemental Fig. 2, again showing good separation of those who did or did not experience the primary endpoint.

Also shown in Fig. 1, the DKD risk algorithm had an internally validated C-statistic of 0.80 (95% CI = 0.76–0.84; P < 0.001); this result is nearly identical to the derivation cohort. The sensitivity, specificity, PPV, and NPV of the optimal cutoff in the internal validation cohort were 76%, 74%, 52%, and 89%; all are also similar to the derivation cohort. A comparison of the DKD risk algorithm to an unweighted Cox model containing the same variables showed a lower C-statistic in the Cox model (0.66, 95% CI = 0.61–0.72; P < 0.001 for difference).

There were 128, 205, and 113 study participants in validation risk levels 1, 2, and 3 respectively. At risk level 1, the NPV for the primary composite endpoint was 95%, while risk level 3 had a PPV of 58%. The continuous DKD risk algorithm had a HR of 2.07 per score increment (95% CI = 1.81–2.36; P < 0.001) while the optimal cut-point selected using the Youden method had a HR for the primary composite endpoint of 6.04 (95% CI = 4.00–9.12; P < 0.001). Considered in groupings of low, medium, and high risk, the HR for the primary composite endpoint was 3.64 for each risk level increment (95% CI = 2.74–4.84; P < 0.001). As shown in Supplemental Table 3, the model was well-calibrated in the internal validation cohort with each individual variable showing minimization of the AIC/BIC and with non-significant Hosmer–Lemeshow P values.

In the internal validation cohort, Kaplan–Meier analyses demonstrate excellent discrimination for time to first event of end-stage kidney disease, doubling of the serum creatinine level, or renal death or CV death (Fig. 2) comparable to the derivation cohort.

Canagliflozin treatment

The CREDENCE derivation and validation cohorts were pooled to evaluate impact of canagliflozin on the primary composite endpoint, and to examine change in the DKD risk algorithm score from baseline to year 1 as a function of treatment assignment.

Across DKD algorithm results at baseline, the impact of canagliflozin to reduce the primary composite endpoint was generally consistent, without heterogeneity. Impact of canagliflozin on the rate difference per 100 patient-years in the primary composite endpoint was similar across all scores whether examined as a continuous variable (Supplemental Fig. 3; P for interaction = 0.85) or in score tertiles (P for interaction = 0.71; Supplemental Fig. 4).

Fig. 3

Time to first primary composite endpoint in the external validation set of CANVAS. Those with higher DKD risk algorithm scores had shorter time to first event compared to lower scores (Log rank P value < 0.001)

From baseline to 1 year, the overall change in DKD algorithm result was a median (Q1, Q3) change of − 0.20% (− 11.35%, + 11.56%). In those treated with canagliflozin, the median change in the algorithm result was a decrease of 2.7% (Q1, Q3: − 12.1%, + 9.5%); this was significantly different from those treated with placebo, who experienced an increase of 1.8% (Q1, Q3: − 9.5%, + 14.2%; P value for difference = 0.001).

From baseline to 1 year, study participants’ changes in DKD algorithm result were categorized into quartiles; compared to the lowest quartile of 1-year change as referent, the HR for subsequently suffering the primary composite endpoint was 0.83 (P = 0.83), 1.60 (P = 0.03), and 4.23 (P < 0.001) for DKD algorithm change quartiles 2, 3, and 4, respectively.

The associations between a DKD risk algorithm score above the optimal threshold value and various cardio-kidney outcomes in the CREDENCE internal validation cohort are detailed in Supplemental Table 5. This shows that elevated scores were strongly and statistically significantly predictive of end-stage kidney disease (N = 88 events), doubling of serum creatinine (N = 103 events), CV death (N = 76 events), all-cause death (N = 85 events), HF hospitalization (N = 44 events) and the composite of CV death/HF hospitalization (N = 110 events).

External validation in CANVAS

Following derivation, characterization, and internal validation of the DKD risk algorithm in CREDENCE, the score was then applied to study participants in CANVAS. In this external validation cohort, the risk for the same primary endpoint of end-stage kidney disease, doubling of the serum creatinine level, renal death or CV death was lower; of 3265 study participants at baseline, there were 115 adjudicated primary composite endpoint events during an average 2217 days of follow up. The C-statistic for the DKD risk algorithm in CANVAS was 0.72 (95% CI 0.68–0.76; P < 0.001). Despite the lower C-statistic in CANVAS, the HR for the continuous score in CANVAS was 2.17 (95% CI = 1.87–2.51; P < 0.001) per score increment for the primary composite endpoint, which was similar to that in CREDENCE (HR = 2.03). Similarly, the optimal single cutoff performance showed a HR of 6.57 (95% CI = 4.42–9.78; P < 0.001) for the primary endpoint in CANVAS. The 3-level risk model was associated with a HR of 3.41 (95% CI = 2.66–4.38; P < 0.001) per risk level increase (from low to medium or medium to high) for the primary composite endpoint. In CANVAS, as with the CREDENCE study participants, Kaplan–Meier analyses show significant separation of low, medium, and high risk patients with respect to time to first primary composite endpoint event (Fig. 3).

Because study participants in CANVAS differed from those in CREDENCE in that nephropathy was not required for entry to the former program, we then examined performance of the DKD risk algorithm for predicting the primary composite endpoint in those with UACR < 30, 30–300, and > 300 mg/g (the latter being the threshold for inclusion to CREDENCE). In these UACR groupings the C-statistics for the DKD risk algorithm in CANVAS were 0.63 (95% CI 0.57–0.70; P < 0.001), 0.74 (95% CI 0.66–0.81; P < 0.001), and 0.70 (95% CI 0.61–0.80; P = 0.02), respectively.

Urinary albumin to creatinine ratio

Although UACR is a known risk factor for progression of DKD, this variable was not selected by the inductive methodology to predict the primary endpoint of CREDENCE in the presence of the other variables that were selected. Nonetheless, UACR represents both a diagnostic and prognostic variable for DKD. Given established use of UACR to prognosticate cardio-kidney outcomes, how the DKD risk algorithm predicts the primary endpoint versus UACR was therefore of interest.

In the CREDENCE derivation and validation cohorts, the C-statistic for UACR was 0.75 (95% CI = 0.71–0.78) and 0.72 (95% CI = 0.67–0.77) respectively, while in CANVAS it was 0.66 (95% CI = 0.61–0.70); in each case, the C-statistic for UACR was significantly lower than that of the DKD algorithm. To further understand model importance relative to UACR, we next examined the implication of a low numerical DKD risk algorithm result (< 3) across baseline tertiles of UACR; as shown in Table 2, in the setting of a low DKD risk result the HR across UACR was < 1, implying a lower risk for the primary endpoint regardless of UACR, while an elevated DKD risk algorithm score was consistently associated with higher risk for the primary endpoint across all strata of UACR.

Table 2 Implications of the DKD risk algorithm across baseline tertiles of UACR. The HR for the primary composite endpoint is expressed as a function of low or high DKD risk within tertiles of UACR. In the setting of a low-risk DKD algorithm result, the HR for the primary composite endpoint was low and without difference across UACR tertiles. Conversely, in the setting of high risk DKD algorithm result, risk for the primary composite endpoint was present even in lower UACR tertiles

Discussion

In this analysis from the CREDENCE trial, we report several findings (Graphical Abstract). First, using LASSO, a form of machine learning, available data from the trial program was fitted into a novel algorithm to predict or exclude risk for the development of the carefully adjudicated primary composite endpoint of the trial. This algorithm includes only those variables that provide the most parsimonious and accurate discrimination for the event, but then weights the variables based on their relative importance. The variables identified in this model included clinical variables as well as results of highly refined automatic immunoassays to yield a prognostic model for DKD that has considerable accuracy. When subjected to an internal validation within a hold-out group in the CREDENCE trial dataset, the risk algorithm performed consistently. Across scores at baseline, there was no heterogeneity for response to canagliflozin with respect to benefit on future DKD events, although arguably the relative benefits appeared more obvious in those with intermediate or higher scores. Additionally, increase in the score over time was associated with higher likelihood for DKD events; less increase was seen in those treated with canagliflozin. Lastly, the algorithm was externally validated in the generally lower-risk CANVAS program, remaining prognostic for DKD events even among those without prevalent DKD at baseline.

Although characterized by a progressive decline in eGFR, DKD has a variable prognosis and unpredictable trajectory [3]. Although clinical variables may be useful to predict the course of DKD, accuracy of such variables may vary. Nonetheless, the detection of risk in this population is identified by major kidney and diabetes societies as a top priority [5, 6], particularly as numerous therapies to reduce events in higher risk individuals with DKD are now available. To this extent, a risk-based approach to accelerated application of therapies with proven benefit in the diagnosis has been proposed [4], but a great need exists for tools that accurately estimate likelihood of cardio-kidney events; such knowledge would be expected to assist in more cost-effective therapeutic decision-making. For example, given the staged nature of treatment for DKD, elevated risk scores would be expected to facilitate more precise administration of therapies such as renin-angiotensin inhibitors, SGLT2 inhibitors, non-steroidal mineralocorticoid receptor antagonists or glucagon-like peptide-1 receptor agonists.

Prior studies from our group have utilized machine learning to create algorithms for use in other disease states that combine key variables to improve upon standard modeling [14,15,16,17, 23]. Using completely new datasets and focused on an outcome (DKD) not previously considered with this methodology, the novel risk model described in this work has considerably higher discrimination than standard modeling and retains calibration for predicting the primary composite endpoint of CREDENCE. A major limitation to most statistical models, generally speaking, is the application of non-inductive approaches and inclusion of all equally weighted variables into models that are often non-parsimonious. The advantage of the risk algorithm described in the present report is the use of methodology that leverages machine learning to select the most parsimonious model without bias, while also weighting each variable based on its relative importance to the outcome measure examined. In this analysis, the new DKD risk algorithm had substantially better discrimination than using the same selected variables in a traditional Cox model.
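The way an L1 penalty yields a parsimonious, weighted model can be illustrated with a minimal sketch. It assumes a simple linear outcome and synthetic data rather than the time-to-event endpoint and trial variables analyzed here, and the function names, penalty value and data are hypothetical; it is not the pipeline used in this study.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 inside the penalty band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(rows, targets, penalty, sweeps=100):
    """Minimize (1/2n)*||y - Xb||^2 + penalty*||b||_1 by cyclic coordinate descent."""
    n, p = len(rows), len(rows[0])
    coefs = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the partial residual
            z = 0.0    # sum of squares of feature j
            for x, y in zip(rows, targets):
                pred_without_j = sum(coefs[k] * x[k] for k in range(p) if k != j)
                rho += x[j] * (y - pred_without_j)
                z += x[j] ** 2
            coefs[j] = soft_threshold(rho / n, penalty) / (z / n)
    return coefs


# Synthetic data (not trial data): features 0 and 1 drive the outcome,
# features 2 and 3 are pure noise.
random.seed(0)
rows = [[random.gauss(0, 1) for _ in range(4)] for _ in range(200)]
targets = [2.0 * x[0] - 1.0 * x[1] + random.gauss(0, 0.1) for x in rows]

coefs = lasso_coordinate_descent(rows, targets, penalty=0.2)
print([round(c, 3) for c in coefs])
```

In this toy setting the penalty drives the coefficients of the two noise features to exactly zero, while the retained features keep coefficients whose magnitudes reflect their relative contribution, which is the selection-plus-weighting behavior described above.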

Among individuals with DKD in CREDENCE, depending on the cut-point applied, the DKD risk algorithm had high NPV and PPV, showing that the score provides potential utility to either exclude or identify risk for progression of DKD. Furthermore, the derived model performed comparably in an internal hold-out validation cohort in CREDENCE and was also validated separately in the CANVAS program, where the same variables were available. The C-statistic of the model for DKD events was lower in CANVAS; in that population, however, only 17.5% had significant proteinuria. Despite the difference in baseline characteristics and risk between the trials, the DKD risk algorithm predicted hazard for major cardio-kidney events in both trials.
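For readers less familiar with these metrics, the following sketch shows how NPV, PPV and a C-statistic can be computed from scores and observed outcomes. It assumes a binary outcome (event by a fixed horizon), for which the C-statistic equals the proportion of concordant event/non-event pairs; the scores, outcomes and cut-points are toy values chosen for illustration and are not trial data.

```python
def c_statistic(scores, events):
    """Share of (event, non-event) pairs where the event case scores higher; ties count 0.5."""
    concordant = 0.0
    pairs = 0
    for score_i, event_i in zip(scores, events):
        if not event_i:
            continue
        for score_j, event_j in zip(scores, events):
            if event_j:
                continue
            pairs += 1
            if score_i > score_j:
                concordant += 1.0
            elif score_i == score_j:
                concordant += 0.5
    return concordant / pairs


def predictive_values(scores, events, low_cut, high_cut):
    """NPV among scores below low_cut; PPV among scores at or above high_cut."""
    low_group = [e for s, e in zip(scores, events) if s < low_cut]
    high_group = [e for s, e in zip(scores, events) if s >= high_cut]
    npv = low_group.count(0) / len(low_group)
    ppv = high_group.count(1) / len(high_group)
    return npv, ppv


# Toy values for illustration only (1 = event occurred, 0 = no event).
toy_scores = [0.05, 0.10, 0.15, 0.20, 0.40, 0.45, 0.60, 0.70, 0.80, 0.90]
toy_events = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

toy_c = c_statistic(toy_scores, toy_events)
toy_npv, toy_ppv = predictive_values(toy_scores, toy_events, low_cut=0.3, high_cut=0.65)
print(toy_c, toy_npv, toy_ppv)
```

Because NPV is computed only in the low-score group and PPV only in the high-score group, both depend on the cut-points chosen and on the event rate of the population, which is one reason performance can differ between cohorts with different baseline risk.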

Results from the DKD risk algorithm appeared to change over time, with rising values associated with higher risk for incident events. Although baseline results for the score did not predict subsequent response to canagliflozin, randomized allocation to the drug did result in lower DKD risk algorithm results at 1 year. Lower score results at 1 year were associated with lower risk. This implies that sodium-glucose cotransporter-2 inhibitor treatment might be expected to attenuate the progressive increase of the score, which in turn is associated with lower risk for DKD events. More data are needed regarding the utility of the risk model described for longitudinal monitoring of DKD risk and interaction with its treatment.

The constituent variables in the developed risk algorithm include clinical characteristics and circulating biomarkers, each contributing to the discrimination and calibration of the final risk model. Concentrations of NT-proBNP and hs-cTnT predict risk through the identification of individuals with cardiomyocyte stress and necrosis; both predict HF in this setting, a common and severe complication among individuals with DKD. In a similar manner, both IGFBP-7 and GDF-15 have been associated with progressive cardiac and kidney dysfunction, and both are independently linked to adverse cardiovascular outcomes. It would be expected that the various components of the risk model could be fit into an automated algorithm to rapidly calculate a score for the patient. Notably, assays for NT-proBNP and hs-cTnT are already commercially available, while highly refined, high-throughput automated immunoassays are available for research use of IGFBP-7 and GDF-15; theoretically, if these latter two biomarkers were eventually brought to clinical use, all 4 biomarkers could be rapidly run from the same blood sample and on the same instrument. Thus, together with easily ascertained clinical features and advances in data management, it is reasonable to expect rapid generation of a DKD risk result.
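As a rough illustration of what such an automated calculation might look like, the sketch below combines the seven model inputs into a score and assigns a risk level. Everything numeric in it is a hypothetical placeholder: the weights, intercept, cut-points, the use of standardized (z-score) inputs and the logistic transform are assumptions made for illustration and do not reproduce the published algorithm.

```python
import math

# Hypothetical placeholders for illustration only. These are NOT the
# coefficients or cut-points of the published risk algorithm.
PLACEHOLDER_WEIGHTS = {
    "age": 0.1,
    "bmi": 0.1,
    "systolic_bp": 0.1,
    "nt_probnp": 0.1,
    "hs_ctnt": 0.1,
    "igfbp7": 0.1,
    "gdf15": 0.1,
}
PLACEHOLDER_INTERCEPT = 0.0
PLACEHOLDER_CUTS = (0.33, 0.66)  # assumed low/medium and medium/high boundaries


def risk_score(standardized_inputs, weights=PLACEHOLDER_WEIGHTS,
               intercept=PLACEHOLDER_INTERCEPT):
    """Weighted sum of standardized inputs mapped to (0, 1) by a logistic transform."""
    missing = set(weights) - set(standardized_inputs)
    if missing:
        raise ValueError(f"missing inputs: {sorted(missing)}")
    linear = intercept + sum(w * standardized_inputs[name] for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-linear))


def risk_category(score, cuts=PLACEHOLDER_CUTS):
    """Map a score to 'low', 'medium' or 'high' using two cut-points."""
    low_cut, high_cut = cuts
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "medium"
    return "high"


# Example: a hypothetical patient whose inputs all sit at the cohort mean (z = 0).
example_inputs = {name: 0.0 for name in PLACEHOLDER_WEIGHTS}
example_score = risk_score(example_inputs)
example_category = risk_category(example_score)
print(example_score, example_category)
```

Rejecting incomplete inputs, rather than silently substituting values, mirrors the complete-case approach used for model derivation in this analysis.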

Although UACR is a validated biomarker in DKD, with prior studies illustrating prognostic meaning in those with elevated UACR results, it is noteworthy that UACR was not selected by the machine learning methodology. Furthermore, in post hoc comparisons, the DKD risk algorithm was clearly superior to UACR for predicting future DKD events, with higher primary composite endpoint event rates in those with a high DKD algorithm result despite low UACR. One argument might be that the discrimination of UACR is sufficient and that the low cost of this biomarker makes the difference in prognostic accuracy acceptable. Several observations argue against this view. First, the results imply the risk model can prognosticate across a wide range of cardio-kidney risk, identifying risk in those with or without abnormal UACR. Second, the difference in discrimination in this study was not small, substantially favoring the model without UACR. Third, despite the potential added costs of biomarkers other than UACR, arguably the more refined ability to predict risk and intervene in a more targeted manner would be expected to be highly cost-effective.

This study illustrates the applicability of the methods used in this analysis for other disease states where pre-existing datasets might be available with already-developed and/or commercialized biomarkers.

Limitations

Although this study demonstrates the feasibility of novel techniques to translate results from prognostic modeling into a clinically useful prognostic tool for the care of patients at risk for DKD events, there are limitations. First, both the derivation and validation cohorts in this analysis involve participants of clinical trials who may differ from the real-world population; this has both advantages and limitations. On the one hand, it is an advantage to have a clearly defined clinical profile that avoids unexpected confounding from excluded comorbidities. On the other hand, it may limit applicability of the findings of this analysis to larger populations of those with DKD. Thus, although validated both internally in CREDENCE and externally in CANVAS (considerably more validation than many risk algorithms are subjected to), the DKD risk algorithm should be further evaluated in other datasets. Second, we constrained the model derivation and internal validation only to those study participants in CREDENCE with complete variables and follow-up. Although this allowed for confidence in the development of the risk algorithm without imputation of missing variables, the study participants who were included in the analysis had more advanced DKD than those who were not included. Use of imputation might address this issue; however, imputation of outcome data (the main cause of missingness) is argued to add noise to statistical estimates. Prior analyses have focused on similar biomarkers for prognosticating in CREDENCE [9]. The present analysis not only uses completely different methodology to develop a prognostic algorithm, but also demonstrates superiority of this model compared to standard Cox modeling and further validates the results in CANVAS, something that was not previously done. Despite the development of highly precise automated immunoassays for GDF-15 and IGFBP-7, these tests remain research use only at present, although they are far more advanced in their development than most biomarkers identified in translational analyses such as those identified using proteomics. Given how advanced these two research use assays are, theoretically, the panel identified in the present analysis could be commercialized in a rapid fashion, either as a lab-developed test or through regulatory clearance. Lastly, the endpoints examined in this analysis included a mix of cardiac and kidney outcomes. We view this as a strength as the focus on cardio-kidney-metabolic outcomes continues to grow; furthermore, these results were carefully adjudicated by a blinded endpoints committee in each of the two trials, which is also a great strength.

In conclusion, we developed a highly accurate and validated risk algorithm to predict risk for cardio-kidney events including progressive kidney disease, renal death or CV death, the primary composite endpoint of the CREDENCE trial. The DKD risk algorithm was assembled using inductive machine learning, allowing it to be considerably more accurate for predicting risk than a standard Cox proportional hazards model. The DKD risk algorithm provided utility not only to exclude risk but also to predict it, and results from the algorithm appeared to change over time in parallel with risk, suggesting potential value for monitoring patients serially. When validated in lower-risk study participants in CANVAS, the DKD risk algorithm not only performed similarly in those patients who were more like those in CREDENCE (with established DKD at baseline) but also predicted incident DKD events in those without the diagnosis at baseline. Despite being eligible for treatment, only a minority of individuals with cardio-metabolic-kidney disease receive SGLT2 inhibitors [24]. Accordingly, the development of this risk tool responds to the challenge articulated for newer and more refined tools to judge risk in DKD [5, 6] and suggests that novel techniques such as the one utilized to develop this risk algorithm might be more widely applied to improve cardio-metabolic diagnostics and prognostics.

Data availability

The data sharing policy of Janssen Pharmaceutical Companies of Johnson & Johnson is available at https://www.janssen.com/clinical-trials/transparency. As noted on this site, requests for access to the study data can be submitted through Yale Open Data Access (YODA) Project site at http://yoda.yale.edu.

References

  1. Naaman SC, Bakris GL. Diabetic nephropathy: update on pillars of therapy slowing progression. Diabetes Care. 2023;46(9):1574–86.

  2. Oshima M, Shimizu M, Yamanouchi M, Toyama T, Hara A, Furuichi K, et al. Trajectories of kidney function in diabetes: a clinicopathological update. Nat Rev Nephrol. 2021;17(11):740–50.

  3. Selby NM, Taal MW. An updated overview of diabetic nephropathy: diagnosis, prognosis, treatment goals and latest guidelines. Diabetes Obes Metab. 2020;22(Suppl 1):3–15.

  4. Neuen BL, Tuttle KR, Vaduganathan M. Accelerated risk-based implementation of guideline-directed medical therapy for type 2 diabetes and chronic kidney disease. Circulation. 2024;149(16):1238–40.

  5. Kidney Disease: Improving Global Outcomes (KDIGO) Diabetes Work Group. KDIGO 2022 clinical practice guideline for diabetes management in chronic kidney disease. Kidney Int. 2022;102(5S):S1–S127.

  6. de Boer IH, Khunti K, Sadusky T, Tuttle KR, Neumiller JJ, Rhee CM, et al. Diabetes management in chronic kidney disease: a consensus report by the American diabetes association (ADA) and kidney disease: improving global outcomes (KDIGO). Diabetes Care. 2022;45(12):3075–90.

  7. Januzzi JL Jr, Butler J, Sattar N, Xu J, Shaw W, Rosenthal N, et al. Insulin-like growth factor binding protein 7 predicts renal and cardiovascular outcomes in the canagliflozin cardiovascular assessment study. Diabetes Care. 2021;44(1):210–6.

  8. Januzzi JL Jr, Liu Y, Sattar N, Yavin Y, Pollock CA, Butler J, et al. Vascular endothelial growth factors and risk of cardio-renal events: Results from the CREDENCE trial. Am Heart J. 2024;271:38–47.

  9. Januzzi JL, Mohebi R, Liu Y, Sattar N, Heerspink HJL, Tefera E, et al. Cardiorenal biomarkers, canagliflozin, and outcomes in diabetic kidney disease: the CREDENCE trial. Circulation. 2023;148(8):651–60.

  10. Januzzi JL Jr, Xu J, Li J, Shaw W, Oh R, Pfeifer M, et al. Effects of canagliflozin on amino-terminal pro-B-type natriuretic peptide: implications for cardiovascular risk reduction. J Am Coll Cardiol. 2020;76(18):2076–85.

  11. Mohebi R, Liu Y, Hansen MK, Yavin Y, Sattar N, Pollock CA, et al. Insulin growth factor axis and cardio-renal risk in diabetic kidney disease: an analysis from the CREDENCE trial. Cardiovasc Diabetol. 2023;22(1):176.

  12. Mohebi R, Liu Y, Hansen MK, Yavin Y, Sattar N, Pollock CA, et al. Associations of angiopoietin 2 and vascular endothelial growth factor-A concentrations with clinical end points. Clin J Am Soc Nephrol. 2024;19(4):429–37.

  13. Tangri N, Ferguson TW, Bamforth RJ, Leon SJ, Arnott C, Mahaffey KW, et al. Machine learning for prediction of chronic kidney disease progression: validation of the Klinrisk model in the CANVAS Program and CREDENCE trial. Diabetes Obes Metab. 2024;26(8):3371–80.

  14. Ibrahim NE, Januzzi JL Jr, Magaret CA, Gaggin HK, Rhyne RF, Gandhi PU, et al. A clinical and biomarker scoring system to predict the presence of obstructive coronary artery disease. J Am Coll Cardiol. 2017;69(9):1147–56.

  15. Ibrahim NE, McCarthy CP, Shrestha S, Gaggin HK, Mukai R, Magaret CA, et al. A clinical, proteomics, and artificial intelligence-driven model to predict acute kidney injury in patients undergoing coronary angiography. Clin Cardiol. 2019;42(2):292–8.

  16. McCarthy CP, Ibrahim NE, van Kimmenade RRJ, Gaggin HK, Simon ML, Gandhi P, et al. A clinical and proteomics approach to predict the presence of obstructive peripheral arterial disease: from the catheter sampled blood archive in cardiovascular diseases (CASABLANCA) study. Clin Cardiol. 2018;41(7):903–9.

  17. Mohebi R, van Kimmenade R, McCarthy CP, Magaret CA, Barnes G, Rhyne RF, et al. Performance of a multi-biomarker panel for prediction of cardiovascular event in patients with chronic kidney disease. Int J Cardiol. 2023;371:402–5.

  18. Perkovic V, Jardine MJ, Neal B, Bompoint S, Heerspink HJL, Charytan DM, et al. Canagliflozin and renal outcomes in type 2 diabetes and nephropathy. N Engl J Med. 2019;380(24):2295–306.

  19. Neal B, Perkovic V, Mahaffey KW, de Zeeuw D, Fulcher G, Erondu N, et al. Canagliflozin and cardiovascular and renal events in type 2 diabetes. N Engl J Med. 2017;377(7):644–57.

  20. Jardine MJ, Mahaffey KW, Neal B, Agarwal R, Bakris GL, Brenner BM, et al. The canagliflozin and renal endpoints in diabetes with established nephropathy clinical evaluation (CREDENCE) study rationale, design, and baseline characteristics. Am J Nephrol. 2017;46(6):462–72.

  21. Efron B, Hastie T, Johnstone I, Tibshirani R. Least angle regression. Ann Statist. 2004;32(2):407–99.

  22. Tibshirani R. Regression shrinkage and selection via the lasso. J Roy Stat Soc: Ser B (Methodol). 1996;58(1):267–88.

  23. McCarthy CP, Neumann JT, Michelhaugh SA, Ibrahim NE, Gaggin HK, Sorensen NA, et al. Derivation and external validation of a high-sensitivity cardiac troponin-based proteomic model to predict the presence of obstructive coronary artery disease. J Am Heart Assoc. 2020;9(16):e017221.

  24. Shin JI, Xu Y, Chang AR, Carrero JJ, Flaherty CM, Mukhopadhyay A, et al. Prescription patterns for sodium-glucose cotransporter 2 inhibitors in U.S. health systems. J Am Coll Cardiol. 2024;84(8):683–93.

Funding

Dr. Januzzi is supported in part by the Adolph Hutter Professorship.

Author information

Contributions

JJ, CM, and RR contributed to the conception or design of the work. MH contributed to the acquisition of data while JJ, CM, and YL contributed to the analysis. All authors contributed to interpretation of data for the work. JJ drafted the manuscript and all authors reviewed and critically revised the manuscript. All gave final approval and agree to be accountable for all aspects of work ensuring integrity and accuracy.

Corresponding author

Correspondence to James L. Januzzi Jr.

Ethics declarations

Ethics approval and consent to participate

All study procedures for CREDENCE and CANVAS and subsequent analyses were approved by local ethics committees. Written informed consent was obtained for participation in both studies, including analyses of biomarkers.

Competing interests

Dr. Januzzi reports equity holdings in Imbria Pharma, Jana Care, and Fibrosys, current/recent grant support from Abbott, Applied Therapeutics, AstraZeneca, BMS, Novartis Pharmaceuticals, consulting income from Abbott Diagnostics, Beckman-Coulter, Jana Care, Janssen, Novartis, Prevencio, Quidel, and Roche Diagnostics and serves on clinical endpoint committees/data safety monitoring boards for Abbott, AbbVie, Amgen, CVRx, Medtronic, Pfizer, and Roche Diagnostics; Dr. Sattar has consulted for and/or received speaker honoraria from Abbott Laboratories, AbbVie, Amgen, AstraZeneca, Boehringer Ingelheim, Eli Lilly, Hanmi Pharmaceuticals, Janssen, Menarini-Ricerche, Novartis, Novo Nordisk, Pfizer, Roche Diagnostics, and Sanofi; he has received grant support paid to his university from AstraZeneca, Boehringer Ingelheim, Novartis, and Roche Diagnostics outside the submitted work; Dr. Vaduganathan has received research grant support, served on advisory boards, or had speaker engagements with American Regent, Amgen, AstraZeneca, Bayer AG, Baxter Healthcare, BMS, Boehringer Ingelheim, Chiesi, Cytokinetics, Fresenius Medical Care, Idorsia Pharmaceuticals, Lexicon Pharmaceuticals, Merck, Milestone Pharmaceuticals, Novartis, Novo Nordisk, Pharmacosmos, Relypsa, Roche Diagnostics, Sanofi, and Tricog Health, and participates on clinical trial committees for studies sponsored by AstraZeneca, Galmed, Novartis, Bayer AG, Occlutech, and Impulse Dynamics; Mr. Magaret and Ms. Rhyne are employees of Prevencio, Inc; Dr. Masson is an employee of Roche Diagnostics, Inc; Dr. Butler is a consultant to Abbott, American Regent, Amgen, Applied Therapeutic, AskBio, Astellas, AstraZeneca, Bayer, Boehringer Ingelheim, Boston Scientific, Bristol Myers Squibb, Cardiac Dimension, Cardiocell, Cardior, CSL Bearing, CVRx, Cytokinetics, Daxor, Edwards, Element Science, Faraday, Foundry, G3P, Innolife, Impulse Dynamics, Imbria, Inventiva, Ionis, Levator, Lexicon, Lilly, LivaNova, Janssen, Medtronics, Merck, Occlutech, Owkin, Novartis, Novo Nordisk, Pfizer, Pharmacosmos, Pharmain, Prolaio, Pulnovo, Regeneron, Renibus, Roche, Salamandra, Salubris, Sanofi, SC Pharma, Secretome, Sequana, SQ Innovation, Tenex, Tricog, Ultromics, Vifor, and Zoll. Dr. Hansen is an employee of Janssen Research & Development. Ms. Liu has no disclosures.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Januzzi, J.L.J., Sattar, N., Vaduganathan, M. et al. A validated multivariable machine learning model to predict cardio-kidney risk in diabetic kidney disease. Cardiovasc Diabetol 24, 213 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12933-025-02779-5

  • DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s12933-025-02779-5

Keywords