Abstract
Type 1 diabetes (T1D) is a chronic and currently incurable multifactorial disease caused by immune-mediated destruction of the insulin-producing pancreatic β-cells. Despite lifelong insulin treatment, it leads to devastating and costly acute and chronic complications. Abrupt clinical onset is preceded by an asymptomatic disease phase of highly variable duration, marked by the sequential appearance of various types of β-cell autoantibodies (AAbs). Better predictions of time to clinical onset facilitate early diagnosis, which is key both to reducing the incidence of inaugural, life-threatening diabetic ketoacidosis and to planning novel prevention trials in the asymptomatic stage. Research in first-degree relatives of known T1D patients has shown that disease progression can be predicted from genetic and immune biomarkers, but these predictions have been limited by their reliance on traditional statistical approaches such as Cox regression models. This exploratory study examines the potential of random forest machine learning algorithms as survival models in the biomedical context of T1D. Two random forest survival models were constructed in R. The first predicts how long it takes an individual to progress from single to multiple AAb positivity (AAb+), a crucial step in T1D development. The second predicts the progression from multiple AAb+ to the onset of T1D. This paper demonstrates that our random forest survival models outperform traditional Cox regression methods; conducts a detailed analysis of variable importance to uncover novel biomarker interactions; and establishes a refined framework for precise measurement and risk stratification of T1D, paving the way for earlier and more targeted intervention.
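The abstract does not say which metric underlies the comparison between the random forest survival models and Cox regression. A common choice for time-to-event models is Harrell's concordance index (C-index), and the sketch below assumes that choice purely for illustration. It is written in Python rather than the R used in the study, it skips pairs with tied times for simplicity, and all names and toy values are hypothetical rather than taken from the paper.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C-index for right-censored data.

    times       -- observed follow-up time per individual
    events      -- 1 if the transition was observed, 0 if censored
    risk_scores -- higher score means the model expects an earlier transition
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the shorter
        # time is an observed event (not a censored follow-up).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # model ranks the earlier event as higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: years of follow-up and whether progression was seen.
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0]
scores_model_a = [0.9, 0.7, 0.5, 0.3, 0.1]  # ranks every comparable pair correctly
scores_model_b = [0.1, 0.7, 0.5, 0.9, 0.3]  # ranks several pairs incorrectly

c_index_a = concordance_index(times, events, scores_model_a)
c_index_b = concordance_index(times, events, scores_model_b)
print(c_index_a, c_index_b)
```

Under this metric, a value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a model with the higher C-index on held-out data would be said to discriminate better between earlier and later progressors.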
Original language | English |
---|---|
Article number | 1000211 |
Number of pages | 11 |
Journal | IEEE Open Journal of Instrumentation and Measurement |
Volume | 4 |
Publication status | Published - 17 Mar 2025 |
Bibliographical note
Publisher Copyright: © 2022 IEEE.
Keywords
- Biomarkers
- biomedical computing
- biostatistics
- computer-assisted diagnosis
- decision trees
- diabetes
- ensemble learning
- random forests