RT Journal Article
SR Electronic
T1 Machine Learning Approach for the Prediction of Age-Specific Probability of SCA3 and DRPLA by Survival Curve Analysis
JF Neurology Genetics
JO Neurol Genet
FD Lippincott Williams & Wilkins
SP e200075
DO 10.1212/NXG.0000000000200075
VO 9
IS 3
A1 Yuya Hatano
A1 Tomohiko Ishihara
A1 Sachiko Hirokawa
A1 Osamu Onodera
YR 2023
UL http://ng.neurology.org/content/9/3/e200075.abstract
AB Background and Objectives: As the number of repeats in the expansion increases, polyglutamine diseases tend to show at a younger age. From this relationship, attempts have been made to predict age at onset by parametric survival analysis. However, a method for a more accurate prediction has been desirable. In this study, we examined 2 methods for survival analysis using machine learning and 6 conventional methods for parametric survival analysis of spinocerebellar ataxia (SCA)3 and dentatorubral-pallidoluysian atrophy (DRPLA). Methods: We compared the performance of 2 machine learning methods of survival analysis (random survival forest [RSF] and DeepSurv) and 6 methods of parametric survival analysis (Weibull, exponential, Gaussian, logistic, loglogistic, and log Gaussian). Training and evaluation were performed using the leave-one-out cross-validation method, and evaluation criteria included root mean squared error (RMSE), mean absolute error (MAE), and the integrated Brier score. The latter was used as the primary end point, and the survival analysis model yielding the best result was used to predict the asymptomatic probability. Results: Among the models examined, the RSF and DeepSurv machine learning methods had a higher prediction accuracy than the parametric methods of survival analysis. For both SCA3 and DRPLA, RSF had a higher accuracy than DeepSurv for the assessment of RMSE (SCA3: 7.37, DRPLA: 10.78), MAE (SCA3: 5.52, DRPLA: 8.17), and the integrated Brier score (SCA3: 0.05, DRPLA: 0.077). Using RSF, we determined the age-specific probability distribution of age at onset based on CAG repeat size and current age. Discussion: In this study, we have demonstrated the superiority of machine learning methods for predicting age at onset of SCA3 and DRPLA using survival analysis. Such accurate prediction of onset will be useful for genetic counseling of carriers and for devising methods to verify the effects of interventions for unaffected individuals. Abbreviations: DRPLA = dentatorubral-pallidoluysian atrophy; MAE = mean absolute error; RMSE = root mean squared error; RSF = random survival forest; SCA = spinocerebellar ataxia.