Cumulative Genetic Score and C9orf72 Repeat Status Independently Contribute to Amyotrophic Lateral Sclerosis Risk in 2 Case-Control Studies
This article has a correction.

Abstract
Background and Objectives Most patients with amyotrophic lateral sclerosis (ALS) lack a monogenic mutation. This study evaluates cumulative ALS genetic risk using polygenic scores in an independent Michigan cohort and a Spanish replication cohort.
Methods Participant samples from the University of Michigan were genotyped and assayed for the chromosome 9 open reading frame 72 (C9orf72) hexanucleotide repeat expansion. After genotyping and participant filtering, the final cohort comprised 219 ALS cases and 223 healthy controls. Polygenic scores excluding the C9orf72 region were generated using weights from an independent ALS genome-wide association study (20,806 cases, 59,804 controls). Adjusted logistic regression evaluated the association between polygenic scores and ALS status, and receiver operating characteristic curves evaluated classification performance. Population attributable fractions and pathway analyses were also conducted. An independent Spanish study sample (548 cases, 2,756 controls) was used for replication.
Results Polygenic scores constructed from 275 single nucleotide variations (SNVs) had the best model fit in the Michigan cohort. A 1 SD increase in ALS polygenic score was associated with 1.28 (95% CI 1.04–1.57) times higher odds of ALS, and the model including the score had an area under the curve of 0.663 vs a model without the ALS polygenic score (p value = 1 × 10⁻⁶). The population attributable fraction for the top 20% of ALS polygenic scores, relative to the remaining 80%, was 4.1% of ALS cases. Genes annotated to this polygenic score were enriched for important ALS pathomechanisms. Meta-analysis with the Spanish study, using a harmonized 132-SNV polygenic score, yielded similar logistic regression findings (odds ratio 1.13, 95% CI 1.04–1.23).
Discussion ALS polygenic scores can account for cumulative genetic risk in populations and reflect disease-relevant pathways. If further validated, this polygenic score will inform future ALS risk models.
Glossary
- ALS = amyotrophic lateral sclerosis;
- AUC = area under the curve;
- C9orf72 = chromosome 9 open reading frame 72;
- GO = Gene Ontology;
- GWAS = genome-wide association study;
- SNV = single nucleotide variation
Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease characterized by rapidly progressive muscle weakness and death within 2–4 years of symptom onset1,2 with 50% of patients manifesting cognitive or behavioral dysfunction.1,2 Although ALS is traditionally divided into familial and sporadic forms, with familial ALS indicating patients with an ALS family history, ALS genetic risk factors are present in both familial and sporadic patients.3 Under a monogenic model, a single risk gene is associated with a greater likelihood of developing ALS4 or contributes to a distinct phenotypic outcome, such as earlier age of disease onset.4,5 Since 1994, over 40 genes have been associated with ALS.6 The noncoding chromosome 9 open reading frame 72 (C9orf72) hexanucleotide expansion is the most common genetic form of ALS and is observed in 40% of familial and 10% of sporadic ALS in mixed European populations.7,8 Superoxide dismutase 1, TAR DNA binding protein 43, and fused in sarcoma are the next most common genes, with variation frequencies of around 1% or less in sporadic cases.9 Importantly, most patients with ALS do not carry a single causative ALS risk gene mutation. This highlights the notion of heritability, which captures the genetic and shared familial factors that contribute to disease risk.10 Heritability estimates are as high as 38%–85% from twin data,11 36.9%–52.3% for parent-offspring pairs,10 43% for all first-degree relatives,12 and 7.2%–9.5% for common single nucleotide variations (SNVs; formerly termed single nucleotide polymorphisms [SNPs]).13–16 It is increasingly clear that many common SNVs may each contribute a small amount of disease risk.17 Because most patients with ALS do not have a monogenic cause, it is crucial to understand the genetic contribution to ALS beyond single highly penetrant mutations to stratify population risk.
We hypothesize that polygenic scores will improve ALS risk prediction. The utility of an ALS polygenic score for risk prediction, independent of C9orf72 status, has not been tested. The goals of this study were to develop a genome-wide ALS polygenic score using an independent ALS cohort of participants not previously included in any genome-wide association study (GWAS) and to test the score's contribution to ALS risk models independently of C9orf72 status.
Methods
Michigan Study Participants and Sample Collection
All patients seen at the University of Michigan Pranger ALS Clinic are invited to participate, although the present case/control analysis is limited to those with ALS, thereby excluding participants with other forms of motor neuron disease. Healthy controls, without a personal history of neurodegenerative disease or a family history of neurodegenerative disease in a first-degree or second-degree relative, are identified using a recruitment database available through the Michigan Institute for Clinical & Health Research and through population outreach via random address mailings. Participant demographics, including sex (male, female), race/ethnicity (White or Caucasian; Black or African American; Asian; Hispanic or Latino), and age (years), were obtained at study enrollment. ALS neurologists (S.A.G., E.L.F.) confirmed ALS diagnoses based on the Gold Coast criteria and recorded onset age (years), diagnosis age (years), onset segment (bulbar, cervical, lumbar, respiratory, thoracic), and the presence of an ALS family history (yes or no) in the medical record. A family history of ALS in a first-degree or second-degree relative was considered positive. All participants provided venous blood, collected in an EDTA tube and frozen at −80°C for later DNA extraction.
Standard Protocol Approvals and Patient Consents
Study procedures of this Institutional Review Board (HUM28826)–approved longitudinal case/control study have been published.18–20 All participants provided informed consent.
DNA Analysis
DNA was extracted using the QIAamp DNA Kit (Qiagen, Venlo, the Netherlands). Genome-wide genotypes at 1,748,250 positions were measured for 512 samples using the Infinium Multi-Ethnic Global-8 v1.0 array kit (Illumina, San Diego, CA) by the University of Michigan Advanced Genomics Core. All available clinical samples, including intentional duplicates (n = 6) and non-ALS diseased samples (6 primary lateral sclerosis, 12 other motor neuron disease), were included at this step to improve imputation quality. DNA samples were also analyzed for the presence of the C9orf72 repeat expansion per published protocols.7
The PLINK program (version 1.9) was used to perform quality control checks on the genetic microarray data.21 Participants and SNVs were filtered using recommended thresholds.22,23 Participants were excluded for missing data at >1% of SNVs, discrepancies between reported sex and sex predicted from genetic data, or heterozygosity more than 3 SDs from the mean. For intentional technical duplicates and unintentionally related samples, the sample in each pair with the higher missingness was excluded. Participant inclusion based on genetic data quality control is shown in a flow diagram; 488 unique motor neuron disease and control participants passed genetic quality filtering (eFigure 1A, links.lww.com/NXG/A610).
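The participant-level filters above can be illustrated with a small Python sketch. It assumes genotypes coded as alternate-allele counts (0, 1, 2) with `None` for a missing call, and it measures heterozygosity as the fraction of observed calls that are heterozygous; PLINK's own `--het` report uses an inbreeding coefficient instead, so this is a simplified stand-in rather than the study's pipeline. The function name `sample_qc` is hypothetical.

```python
from statistics import mean, stdev

def sample_qc(genotypes, max_missing=0.01, het_sd=3.0):
    """Return indices of samples that pass call-rate and heterozygosity filters.

    genotypes: one list per sample, coded 0/1/2 (alternate-allele count) or None.
    Hypothetical helper for illustration; not the study's PLINK workflow.
    """
    missing_rate, het_rate = [], []
    for calls in genotypes:
        observed = [g for g in calls if g is not None]
        missing_rate.append(1 - len(observed) / len(calls))
        het_rate.append(sum(1 for g in observed if g == 1) / len(observed))

    mu, sd = mean(het_rate), stdev(het_rate)
    keep = []
    for i, (miss, het) in enumerate(zip(missing_rate, het_rate)):
        if miss > max_missing:
            continue  # missing at more than 1% of SNVs
        if abs(het - mu) > het_sd * sd:
            continue  # heterozygosity more than 3 SDs from the cohort mean
        keep.append(i)
    return keep
```

Sex checks and duplicate or relatedness checks, which the study also applied, need X-chromosome and identity-by-descent estimates and are not shown.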
SNVs were excluded if they lacked genomic location data or were missing in more than 1% of samples. SNVs on the autosomes and in the pseudoautosomal regions of the sex chromosomes were handled separately from those in the nonpseudoautosomal regions of the sex chromosomes. Autosomal and pseudoautosomal SNVs were further excluded for minor allele frequency <5% or for violating Hardy-Weinberg equilibrium (p value <10⁻⁶). SNV exclusion is shown in a flow diagram; 610,350 measured autosomal SNVs remained (eFigure 2, links.lww.com/NXG/A610).
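A per-SNV version of these filters is sketched below. For simplicity it uses the 1-degree-of-freedom chi-square test of Hardy-Weinberg equilibrium; PLINK 1.9's `--hwe` filter applies an exact test by default, so p values near the 10⁻⁶ cutoff could differ. Genotypes are assumed to be coded as alternate-allele counts (0/1/2) with `None` for a missing call, and the helper names are hypothetical.

```python
import math

def hwe_chisq_p(n_hom_ref, n_het, n_hom_alt):
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium from genotype counts."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(Z^2 > stat) = erfc(sqrt(stat / 2))
    return math.erfc(math.sqrt(stat / 2))

def snv_passes_qc(calls, max_missing=0.01, min_maf=0.05, hwe_p=1e-6):
    """Apply call-rate, minor allele frequency, and HWE filters to one SNV."""
    observed = [g for g in calls if g is not None]
    if 1 - len(observed) / len(calls) > max_missing:
        return False  # missing in more than 1% of samples
    counts = [observed.count(k) for k in (0, 1, 2)]
    alt_freq = (counts[1] + 2 * counts[2]) / (2 * len(observed))
    if min(alt_freq, 1 - alt_freq) < min_maf:
        return False  # minor allele frequency below 5% (also catches monomorphic SNVs)
    return hwe_chisq_p(*counts) >= hwe_p
```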
Because population stratification by genetic ancestry can confound genetic analyses,24 principal components were computed to identify genetic ancestry groups after merging the cohort with the 1000 Genomes (version 5)25 reference panel. Individuals of all genetic ancestries were included in the main analysis, which adjusted for the first 5 multiancestry principal components. A sensitivity analysis was limited to participants of European ancestry, defined as those who clustered with known 1000 Genomes European ancestry samples (principal component 1 < 0.02, principal component 2 < 0.08). Principal components were recomputed within the European ancestry sample for use as adjustment covariates.
To harmonize with the ALS GWAS,17 the measured and cleaned genetic data were imputed to the 1000 Genomes (version 5)25 reference panel using the Minimac4 program.26 After imputation, SNVs were filtered out if they had an imputation quality R² < 0.5 or a minor allele frequency <1% in the study sample; this filtering is shown in a flow diagram (eFigure 3, links.lww.com/NXG/A610).
Polygenic Score Development
The imputed and cleaned SNV data were used to construct ALS polygenic scores for the cohort. ALS risk SNV weights were derived from a GWAS of 20,806 ALS cases and 59,804 controls.17 Eligible SNVs were those present in both the ALS GWAS and this study's cleaned and imputed data. PRSice 2.0 generated polygenic scores,27 using default pruning and clumping parameters (250 kb window, R² threshold 0.1) to account for linkage disequilibrium. Each polygenic score was defined as the sum of an individual's variant allele counts weighted by the corresponding GWAS effect sizes. SNVs were included in the polygenic scores at a series of p value thresholds from the parent GWAS, ranging from low p values (only the most significant SNVs) to a p value threshold of 1.0 (all SNVs). The polygenic score with the highest R² in relation to ALS case-control status was selected for the primary analyses. Following the Polygenic Score Reporting Standards,28 the identifier, chromosome, position, weight, and p value of association with ALS are provided for each SNV in the polygenic score (eTable 1, links.lww.com/NXG/A610). To capture cumulative ALS genetic risk from SNVs outside the C9orf72 genomic region, the primary polygenic score excluded chromosome 9 SNVs between base pair positions 27,400,000 and 27,700,000. A sensitivity analysis that retained SNVs in this C9orf72 region was also performed. A LocusZoom plot29 (Figure 1) visualized SNVs in the C9orf72 region, and associations between SNVs in this region and C9orf72 expansion status were tested using the Fisher exact test.
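A minimal Python sketch of the scoring step follows, assuming each base-GWAS SNV carries a log-odds weight and p value and that each individual has imputed effect-allele dosages keyed by (chromosome, position). It shows p value thresholding, exclusion of the chromosome 9 window (27.4–27.7 Mb) around C9orf72, and the weighted sum; linkage disequilibrium clumping, which PRSice performs before thresholding, is omitted, and the names `Snv`, `select_snvs`, and `polygenic_score` are hypothetical.

```python
from dataclasses import dataclass

# Chromosome 9 window treated as the C9orf72 region (27.4-27.7 Mb), per the Methods
C9ORF72_REGION = (9, 27_400_000, 27_700_000)

@dataclass
class Snv:
    chrom: int
    pos: int
    weight: float  # effect size (log odds ratio) from the base ALS GWAS
    gwas_p: float  # association p value in the base GWAS

def select_snvs(snvs, p_threshold, exclude_c9orf72=True):
    """Keep SNVs at or below the p value threshold, optionally dropping the C9orf72 window."""
    chrom, start, end = C9ORF72_REGION
    selected = []
    for s in snvs:
        if s.gwas_p > p_threshold:
            continue
        if exclude_c9orf72 and s.chrom == chrom and start <= s.pos <= end:
            continue
        selected.append(s)
    return selected

def polygenic_score(dosages, snvs):
    """Weighted sum of effect-allele dosages (0-2) for one individual.

    dosages: dict mapping (chrom, pos) -> imputed dosage of the effect allele.
    """
    return sum(s.weight * dosages[(s.chrom, s.pos)] for s in snvs)
```

Scanning `p_threshold` over a grid (for example, from 5 × 10⁻⁸ up to 1.0) and keeping the best-fitting score mirrors the threshold selection described above, although PRSice additionally handles allele alignment and clumping, which this sketch does not.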
Single nucleotide variations (SNVs) are plotted by genomic position. The y-axis corresponds to −log10(p values) from the ALS genome-wide association study (Nicolas et al.17). The authors considered the C9orf72 region to span from 27.4 Mb to 27.7 Mb on chromosome 9, as illustrated by the blue dashed box. In an independent sample, our primary polygenic score excluded the C9orf72 region, and a sensitivity polygenic score included these SNVs. The SNV highlighted by the green diamond (rs3849943, located at chr9:27,543,382) was associated with C9orf72 repeat status (Fisher p value = 0.00001). Below the plot, the positions of C9orf72 and other genes in the region are shown. ALS = amyotrophic lateral sclerosis.
Statistical Analyses
Statistical analyses were performed in R statistical software (version 4.1). Samples were excluded from analysis if they were duplicates or came from participants with non-ALS motor neuron disease (n = 17) or from at-risk controls (n = 5). Next, participants were excluded (n = 24) for missing data on key covariates (sex, family history, age, C9orf72 expansion status). A total of 442 participants met study inclusion criteria (eFigure 1B, links.lww.com/NXG/A610). The distributions of continuous covariates were described using mean and SD, and those of categorical covariates using number and sample percentage. Covariate distributions are provided for both included and excluded samples. Differences in covariate distributions between ALS and control participants were tested using the Wilcoxon rank-sum test for continuous covariates and the χ² or Fisher exact test for categorical covariates.
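For 2 × 2 comparisons with sparse cells, such as family history by case/control group, the Fisher exact test can be computed directly from hypergeometric probabilities. The sketch below uses the common two-sided definition that sums the probabilities of all tables no more likely than the observed one (with a small relative tolerance for floating-point ties); `fisher_exact_2x2` is a hypothetical name, and in practice R's `fisher.test` would be used.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact test p value for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum every table at least as extreme (no more probable) than the observed one
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```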
All regression models were adjusted for sex, age, family history of ALS, and 5 genetic principal components. The first model used multivariable logistic regression to assess the association between the ALS polygenic score and ALS case/control status. The second model tested for an association with C9orf72 expansion status. The third model included both genetic components (ALS polygenic score and C9orf72 expansion status) as predictors. Because family history and C9orf72 expansion status had zero cell counts in controls, Firth penalized likelihood regression was used to avoid unstable effect estimates.
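Firth's correction adds a Jeffreys-prior penalty to the logistic likelihood, which keeps estimates finite when a predictor has a zero cell, as C9orf72 expansion and family history do among controls here. The sketch below is restricted to an intercept plus a single covariate so that the 2 × 2 information matrix can be inverted by hand; the study's models included several covariates, which would require general matrix algebra or a package such as R's `logistf`. `firth_logistic` is a hypothetical illustration, not the authors' code.

```python
import math

def _sigmoid(t):
    # Numerically stable logistic function
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def firth_logistic(x, y, max_iter=200, tol=1e-10, max_step=5.0):
    """Firth-penalized logistic regression of binary y on an intercept and one covariate x.

    Returns (intercept, slope) on the log-odds scale.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        p = [_sigmoid(b0 + b1 * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information X'WX for the design columns (1, x)
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        v00, v01, v11 = i11 / det, -i01 / det, i00 / det  # inverse of the 2 x 2 matrix
        # Hat-matrix diagonal: h_i = w_i * (1, x_i) I^{-1} (1, x_i)'
        h = [wi * (v00 + 2.0 * v01 * xi + v11 * xi * xi) for wi, xi in zip(w, x)]
        # Firth-modified score: X'(y - p + h * (1/2 - p))
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))
        # Fisher-scoring step, capped in size for stability
        d0 = v00 * u0 + v01 * u1
        d1 = v01 * u0 + v11 * u1
        size = max(abs(d0), abs(d1))
        if size > max_step:
            d0, d1 = d0 * max_step / size, d1 * max_step / size
        b0, b1 = b0 + d0, b1 + d1
        if size < tol:
            break
    return b0, b1
```

With a single binary covariate, the penalized slope equals the log odds ratio after adding 0.5 to each cell of the 2 × 2 table, so a table with no exposed controls still yields a finite estimate where ordinary maximum likelihood would diverge.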
Additional statistical analyses, including classification testing, attributable fraction calculation, sensitivity analyses, and gene pathway analyses, are presented in eMethods (links.lww.com/NXG/A610).
Replication: Spanish Neurological Consortium
Participants were recruited across several sites in Spain30 or as part of the ALS Genetic Spanish Consortium,31 as previously published (eMethods, links.lww.com/NXG/A610). All participants provided informed consent, and the study received local ethics board approval. The coordination and use of samples for this publication were approved by the Institutional Review Board of the National Institute on Aging. DNA extraction, genome-wide genotyping, the C9orf72 repeat expansion assay, and data processing followed published protocols and are described in eMethods. Statistical methods for assessing replication are also described in eMethods.
Data Availability
Data may be shared with qualified investigators upon reasonable request to the corresponding author. Each data request proposal is reviewed by a review panel, and approved requests then require a signed data sharing agreement.
Results
Study Participants
The primary analysis included 442 participants (223 controls and 219 ALS cases) (Table 1). A family history of ALS was present in 7.8% of ALS cases and 0% of controls. The C9orf72 repeat expansion was present in 5.9% of ALS cases and 0% of controls. Age did not differ between ALS and control participants, although the proportion of male participants was higher in the ALS group (59.0%) than in the control group (48%; p value = 0.027). The 24 participants excluded for missing genetic, demographic, or ALS assessment data (eFigure 1B, links.lww.com/NXG/A610) had characteristics similar to those of the analysis cohort (eTable 2).
Included Study Sample Characteristics by ALS Case and Control Status for Shared Ancestry Cohort
Genetic Data Characteristics and Polygenic Score Optimization
SNVs were measured at 1,748,250 positions. SNVs missing genomic location data, with missingness in >1% of samples, with minor allele frequency <5%, or violating Hardy-Weinberg equilibrium (p value <10⁻⁶) were removed, leaving 601,350 measured autosomal SNVs (eFigure 2, links.lww.com/NXG/A610). Imputation yielded 47,109,465 SNVs. Imputed SNVs with an imputation quality R² <0.5 or a minor allele frequency <1% were filtered out, and the final data set had 8,179,459 imputed SNVs (eFigure 3).
The C9orf72 region on chromosome 9 spanned 27.4 Mb to 27.7 Mb. After pruning, 5 SNVs were present in this region (Figure 1). Of these, 1 SNV (rs3849943, at position 27,543,382) was associated with C9orf72 expansion status (Fisher p value = 0.00001). Because our goal was to estimate cumulative genetic risk for ALS beyond the C9orf72 expansion, the primary polygenic score excluded this entire region as a precaution. Polygenic score construction included SNVs and weights based on their association with ALS in an independent GWAS.17 Polygenic score performance was highest at a p value threshold of approximately 10⁻⁴, which included 275 SNVs (eFigure 4, links.lww.com/NXG/A610). At this threshold, the incremental R² for the polygenic score was approximately 1.2%.
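The incremental R² used to compare p value thresholds can be sketched as the difference in Nagelkerke R² between a model with the polygenic score plus covariates and a covariates-only model, each compared with an intercept-only model. Nagelkerke R² is a common choice for binary outcomes in polygenic score software, but the exact definition used here is an assumption; the log-likelihoods would come from the fitted logistic models, and the function names are hypothetical.

```python
import math

def intercept_only_loglik(n_cases, n_total):
    """Log-likelihood of a logistic model containing only an intercept."""
    p = n_cases / n_total
    return n_cases * math.log(p) + (n_total - n_cases) * math.log(1 - p)

def nagelkerke_r2(ll_model, ll_null, n):
    """Nagelkerke R^2: Cox-Snell R^2 rescaled by its maximum attainable value."""
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    max_cox_snell = 1 - math.exp(2 * ll_null / n)
    return cox_snell / max_cox_snell

def incremental_r2(ll_full, ll_covariates, n_cases, n_total):
    """Nagelkerke R^2 gained by adding the polygenic score to the covariate model."""
    ll_null = intercept_only_loglik(n_cases, n_total)
    return (nagelkerke_r2(ll_full, ll_null, n_total)
            - nagelkerke_r2(ll_covariates, ll_null, n_total))
```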
For sensitivity analyses, a polygenic score using all available SNVs after pruning (n = 254,307 SNVs) had an incremental R² of approximately 0.4%. Another sensitivity analysis added back the 5 SNVs in the C9orf72 region; polygenic score performance was again highest at a p value threshold of approximately 10⁻⁴ (n = 280 SNVs) (eFigure 5, links.lww.com/NXG/A610).
Associations Between Genetic Predictors and ALS Case Status
In unadjusted analyses, ALS cases had a higher mean ALS polygenic score than controls (average standardized score 0.03 vs −0.08), although the difference was not statistically significant (p value = 0.11) (eFigure 6, links.lww.com/NXG/A610). We examined the roles of genetic variables and family history in analyses adjusted for age, sex, and 5 genetic principal components. In the full study sample (n = 442 participants), after also adjusting for C9orf72 repeat expansion status and family history of ALS, a 1 SD increase in ALS polygenic score was associated with 1.28 times higher odds of ALS (95% CI 1.04–1.57) (Table 2). Findings were consistent when the sample was limited to participants without a C9orf72 repeat expansion or family history of ALS (n = 416 participants), where a 1 SD increase in ALS polygenic score was again associated with 1.28 times higher odds of ALS (95% CI 1.04–1.57).
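Because the score is standardized, its logistic regression coefficient is directly the log odds ratio per 1 SD, and the reported odds ratio and 95% CI follow by exponentiation. A minimal sketch, assuming standardization across the whole analytic sample (the reference group for standardization is not stated here) and a normal-approximation Wald interval; the helper names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def standardize(scores):
    """Convert raw polygenic scores to z scores (mean 0, SD 1) within the sample."""
    mu, sd = mean(scores), stdev(scores)
    return [(s - mu) / sd for s in scores]

def odds_ratio_ci(beta, se, level=0.95):
    """Odds ratio and Wald CI from a log-odds coefficient (per 1 SD of a z-scored predictor)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```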
Regression Results in the Full Sample Used in Sensitivity Analyses (n = 223 Controls, n = 219 ALS Cases)
ALS Case Classification Performance
Beyond association testing, we evaluated the performance of genetic factors in ALS case classification (Figure 2). The base classification model, adjusted for sex, age, and 5 genetic principal components, had an area under the curve (AUC) of 0.591. Adding family history of ALS to the base model increased the AUC to 0.631, a marginal improvement in classification over the base model (likelihood ratio test p = 0.06). Adding C9orf72 repeat status to the base model and family history increased the AUC to 0.647 and improved classification (likelihood ratio test p value <0.001). Adding the ALS polygenic score after family history and C9orf72 repeat status further raised the AUC to 0.663 and improved classification (likelihood ratio test p value <0.001). To assess prediction accuracy, we used 5-fold cross-validation with separate training and testing sets. The cross-validated AUCs were 0.539 for the base model, 0.588 after adding family history, 0.603 after adding C9orf72 repeat status, and 0.620 after adding the ALS polygenic score (eFigure 7, links.lww.com/NXG/A610). Although the AUCs were attenuated by the cross-validation procedure, the same stepwise gains remained, highlighting the models' predictive capability.
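Two quantities in this comparison can be sketched briefly: the AUC, computed here through its rank interpretation (the probability that a randomly chosen case has a higher predicted risk than a randomly chosen control, with ties counted as one half), and the likelihood ratio test for one added term, whose statistic is referred to a χ² distribution with 1 degree of freedom. Predicted risks and log-likelihoods are assumed to come from the fitted nested models; the function names are hypothetical, and the quadratic-time AUC loop is chosen for clarity rather than speed.

```python
import math

def auc(case_scores, control_scores):
    """AUC as P(case score > control score), counting ties as 1/2 (Mann-Whitney form)."""
    wins = 0.0
    for s_case in case_scores:
        for s_ctrl in control_scores:
            if s_case > s_ctrl:
                wins += 1.0
            elif s_case == s_ctrl:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

def lrt_p_value_1df(loglik_reduced, loglik_full):
    """Likelihood ratio test p value for nested models that differ by one parameter."""
    stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    # Upper tail of a chi-square with 1 df: P(Z^2 > stat) = erfc(sqrt(stat / 2))
    return math.erfc(math.sqrt(stat / 2.0))
```

Each step in the sequence above (family history, then C9orf72 status, then the polygenic score) adds a single term, which is why a 1-degree-of-freedom test suffices in this sketch.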
Receiver operating characteristic (ROC) curves for classification of ALS and control participants (n = 442). The base model includes sex, age, and 5 genetic principal components and has an area under the curve (AUC) of 0.591. Adding family history to the base model increases the AUC to 0.631 (likelihood ratio test p value = 0.06). Adding C9orf72 expansion in addition to family history increases the AUC to 0.647 (likelihood ratio test p value <0.001). Adding the polygenic score (PGS; region around C9orf72 removed) in addition to family history and C9orf72 expansion increases the AUC to 0.663 (likelihood ratio test p value <0.001). ALS = amyotrophic lateral sclerosis.
Attributable Fraction
To benchmark the fraction of ALS cases attributable to genetic factors, we compared participants in the top 20% of ALS polygenic scores with the rest of the sample. Under this comparison, 4.1% (95% CI −9.1 to 17.3) of ALS cases would be prevented if individuals in the top 20% of ALS polygenic scores had the same risk as the rest of the population. For the C9orf72 expansion, 6.3% (95% CI −2.7 to 15.3) of ALS cases would be avoided if carriers lacked the expansion.
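One common way to obtain an attributable fraction from case-control data is Miettinen's case-based formula, PAF = p_c × (OR − 1)/OR, where p_c is the proportion of cases who are exposed (here, in the top 20% of polygenic scores) and the odds ratio approximates the relative risk. The detailed method is in eMethods, so this formula, the unadjusted 2 × 2 inputs, and the function name are assumptions for illustration; covariate-adjusted estimates and their confidence intervals would come from the fitted regression model.

```python
def attributable_fraction(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Population attributable fraction via Miettinen's formula: p_c * (OR - 1) / OR."""
    odds_ratio = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    p_cases_exposed = exposed_cases / (exposed_cases + unexposed_cases)
    return p_cases_exposed * (odds_ratio - 1) / odds_ratio
```

An odds ratio below 1 makes the fraction negative, which is why the confidence intervals reported above can extend below zero.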
Sensitivity Analyses
Sensitivity analyses (eResults, links.lww.com/NXG/A610; Table 2), including an analysis that retained the C9orf72 region and an analysis restricted to participants of European ancestry (eTable 3, eTable 4, eFigure 8), showed findings consistent overall with the main analysis.
Gene Pathway Analysis
Among the genes annotated to the 275 SNVs included in the polygenic score, richR identified 65 highly enriched Gene Ontology (GO) biological process terms, including several related to the neuronal system, such as “neuron differentiation”, “generation of neurons”, “neuron projection morphogenesis”, “neurogenesis”, and “neuron development” (Figure 3, eTable 5, links.lww.com/NXG/A610). Nine Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were enriched at a nominal p value <0.05, including “Glycosphingolipid biosynthesis-ganglio series”, “Fatty acid degradation”, and “Pancreatic secretion” (Figure 4, eTable 6).
The 50 most significantly enriched biological functions using GO are illustrated in dot plots. Rich factor refers to the proportion of single nucleotide variation (SNV)–associated genes belonging to a specific term. The color indicates the level of significance (−log10 adjusted p value). The numbers correspond to the number of SNV-associated genes belonging to the term. GO = Gene Ontology.
The significantly enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways are illustrated in dot plots. Rich factor refers to the proportion of single nucleotide variation (SNV)–associated genes belonging to a specific term. Node size (gene number) refers to the number of SNV-associated genes within each term, and node color indicates the level of significance (−log10 p value).
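The enrichment statistics in these plots can be sketched with the hypergeometric (one-sided Fisher) over-representation test, the usual basis for GO and KEGG enrichment. The sketch assumes, without confirming richR's internals, that adjusted p values come from Benjamini-Hochberg correction, and it reads the rich factor as the number of SNV-associated genes in a term divided by the term's total size. All function names are hypothetical.

```python
from math import comb

def enrichment_p(overlap, n_list, term_size, n_universe):
    """One-sided hypergeometric p value P(X >= overlap) for over-representation.

    n_list genes (e.g., SNV-associated genes) are drawn from n_universe annotated genes,
    of which term_size belong to the GO term or KEGG pathway.
    """
    total = comb(n_universe, n_list)
    upper = min(n_list, term_size)
    hits = sum(comb(term_size, x) * comb(n_universe - term_size, n_list - x)
               for x in range(overlap, upper + 1))
    return hits / total

def rich_factor(overlap, term_size):
    """Share of the term's genes that are in the SNV-associated gene list."""
    return overlap / term_size

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```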
Replication Results
The Spanish cohort had 548 ALS cases and 2,756 controls after removing 232 participants with missing age or C9orf72 information. Family history, C9orf72 expansion, and sex were associated with ALS case status (eTable 7, links.lww.com/NXG/A610). Because genotyping arrays and allele frequencies differed between the Michigan and Spanish cohorts, the available SNVs also differed. To harmonize analyses, SNVs were restricted to those available in both cohorts; among these overlapping SNVs, the best-performing polygenic score in the Michigan cohort consisted of 132 SNVs (p value threshold = 5 × 10⁻⁵). In the Spanish cohort, a 1 SD increase in the harmonized ALS polygenic score was associated with 1.11 times higher odds (95% CI 1.01–1.22) of ALS case status (p value = 0.028), adjusted for sex, age, C9orf72 expansion, family history, and 5 genetic principal components. In the Michigan cohort including all ancestries, a 1 SD increase in the harmonized ALS polygenic score was associated with 1.22 times higher odds (95% CI 1.00–1.50) of ALS case status (p = 0.04), mirroring the results above with the 275-SNV polygenic score. When the Michigan cohort was limited to participants of European genetic ancestry, the harmonized 132-SNV polygenic score had a stronger association: a 1 SD increase was associated with 1.27 times higher odds (95% CI 1.03–1.57) of ALS case status (p value = 0.02). Meta-analysis of the Spanish and Michigan (all ancestry) cohorts estimated that a 1 SD increase in ALS polygenic score was associated with 1.13 times higher odds (95% CI 1.04–1.23) of ALS case status (p value = 0.004) (eFigure 9, links.lww.com/NXG/A610).
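The pooled estimate can be approximated with an inverse-variance fixed-effect meta-analysis on the log odds ratio scale, recovering each standard error from the width of the reported 95% CI. The study's meta-analytic model is described in eMethods, so the fixed-effect choice and the helper names here are assumptions.

```python
import math
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96

def log_or_and_se(odds_ratio, ci_lower, ci_upper):
    """Recover log(OR) and its standard error from an OR and its 95% CI."""
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * Z_975)
    return math.log(odds_ratio), se

def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effect pooling of (log_or, se) pairs.

    Returns the pooled odds ratio and its 95% CI.
    """
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return (math.exp(pooled),
            math.exp(pooled - Z_975 * pooled_se),
            math.exp(pooled + Z_975 * pooled_se))
```

Applied to the rounded cohort estimates above, `fixed_effect_meta([log_or_and_se(1.11, 1.01, 1.22), log_or_and_se(1.22, 1.00, 1.50)])` returns approximately 1.13 (1.04–1.23), in line with the reported meta-analytic estimate.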
Discussion
ALS risk factors are incompletely understood. Models that predict the steps involved in developing ALS32 are necessary to generate ALS risk profiles. Representing genetic risk33 in such models is critical because most individuals with ALS lack a monogenic ALS risk gene. Because genetic risk may be distributed throughout the genome, identifying polygenic risk facilitates an understanding of the multiple ALS pathologic pathways. Here, we developed a weighted polygenic score using a large ALS case-control GWAS.17 In an independent Michigan cohort, this score was significantly associated with ALS case status in adjusted models. Furthermore, this polygenic score captures genes and biological functions important in the pathophysiology of ALS.
In this study, the ALS polygenic score with the best model fit and lowest p value comprised 275 SNVs when excluding the region around C9orf72 and 280 SNVs when including it. We tested other SNV combinations determined by the default PRSice-2 p value thresholds, as well as a model including all SNVs. In each case, the model with fewer SNVs outperformed the larger models, suggesting that the genetic contributions to ALS are concentrated in a smaller subset of genes rather than spread across a wide-ranging set of genomic regions. Next, we showed that a 1 SD increase in the ALS polygenic score was associated with 1.28 times higher odds of ALS in models both without and with the C9orf72 region. Of interest, risk increased when the C9orf72 region was included, even after adjusting for the C9orf72 expansion, suggesting a possible role for alleles around the C9orf72 region in disease status, even in the absence of the repeat. Unsurprisingly, in these models, ALS risk was disproportionately high for individuals with a family history or the C9orf72 expansion. Removing individuals with an ALS family history or a C9orf72 expansion did not change the association between the polygenic score and ALS risk, indicating that the polygenic score contributes to the overall ALS risk profile independently of these factors. In addition, findings persisted when restricting to a European genetic ancestry population.
Polygenic scores summarize the combined effects that common and low-frequency alleles have on disease risk, thereby summarizing the genetic architecture of that disease.34 Multiple fields use polygenic scores to explain risks such as cardiovascular disease, cancers, neurodegenerative diseases,34,35 and other phenotypic traits.13 While polygenic scores are gaining traction for ALS,36 few studies propose an ALS-specific polygenic score that can stratify populations at risk for ALS.
In contrast to our methods, an Australian group used a list of 853 amino acid–altering genetic variants compiled from a comprehensive literature search.4 After screening their population, they retained 43 genetic variants from 18 genes in the model, affecting 35.4% of their ALS population; however, the authors did not further develop polygenic scores.4 Another group36 identified individuals in the Arivale Scientific Wellness cohort at elevated genetic risk for ALS using polygenic risk scores developed from the literature and sought links to proteomic, metabolomic, and other clinical laboratory data. This group found that increased omega-3 and decreased omega-6 fatty acid levels and higher IL-13 levels correlated with ALS genetic risk. In KEGG analysis of the polygenic score developed herein, we found enrichment of the fatty acid degradation pathway,37 which is consistent with ALS pathophysiology and suggests that the genes included in the polygenic score are biologically plausible.
Another study used sparse canonical correlation analysis to identify a polygenic score of cognitive dysfunction in an ALS population.38 As in our study, the authors drew on SNVs from the Nicolas study,17 focusing on those achieving genome-wide significance and with risk loci in ALS and frontotemporal dementia. Of the 45 SNVs used in their models, 27 were associated with cognitive performance in their ALS population, involving the genes MOBP, NSF, ATXN3, ERGIC1, and UNC13A. Our polygenic score also included SNVs in MOBP, ATXN3, and UNC13A, supporting its validity. Additional uses of polygenic scores in ALS include examining polygenic traits for other diseases that overlap with ALS.13,39 Although this was not our approach, such studies have linked ALS with schizophrenia, cognitive performance, and educational attainment.13,39 Our findings are consistent with an Australian case-control study that observed an association between an ALS polygenic score and case status.39
Polygenic scores have shown utility in other neurodegenerative conditions, such as Alzheimer disease, for identifying individuals at high and low genetic risk.40 For example, a polygenic score derived from the International Genomics of Alzheimer's Project GWAS predicted which participants would progress from mild cognitive impairment to late-onset Alzheimer disease.41 Similarly, a polygenic score created from an Alzheimer disease GWAS data set was associated with incident dementia in a large Swedish birth cohort.42
Our disease classification model further supports the utility of our polygenic score, which improved model performance even when the model already included the most prevalent ALS genetic risk factor, the C9orf72 expansion. Similar findings have been reported in Alzheimer disease, where a polygenic risk score classified Alzheimer cases vs controls with an AUC of 0.83, even when APOE4 carriers were excluded.43 This indicates that such genetic models are beneficial in case classification even in the presence of strong genetic risk factors, which are superimposed on polygenic risk. Another analysis similarly showed that polygenic scores in Alzheimer disease could classify patients accurately and that prediction improved when additional variables such as sex and age were incorporated.44 In other disorders with large-effect mutations, a polygenic score has also provided additional classification information.45
Because polygenic score distributions often overlap between persons with and without a disease of interest, focusing on individuals in the tails of the distribution may offer better predictive power.46 To add further perspective on this polygenic risk, we estimated that 4.1% of ALS cases could be avoided if an intervention could lower the risk of individuals in the top 20% of polygenic scores to that of the rest of the population. Although the population attributable risk approach is usually used to estimate the fraction of disease caused by an exposure, the same idea can be applied to genetic data.47,48 For example, a study of polygenic scores in cutaneous squamous cell carcinoma showed that removing all risk alleles from a population would decrease the risk of this cancer by 62%.49 The authors argue that identifying those at the highest genetic risk could inform skin cancer screening programs, with the caveat that interactions of SNVs with environmental factors50 are not included in the model. A parallel approach has been proposed for breast cancer to help identify populations that would benefit from targeted risk reduction strategies.51 A similar analysis has estimated changes in the prevalence of type 2 diabetes, breast cancer, hypertension, and myocardial infarction if a proportion of polygenic risk were removed or enhanced in the population.52 Currently, no biomarker or tool can definitively predict who will develop ALS later in life. Therefore, even if the polygenic score identifies only a small number of at-risk individuals, it could be an important screening method for risk reduction.
Replication of these findings is important to determine the generalizability of the results. We used genotype and ALS phenotype data from an independent Spanish cohort for replication. Although the SNVs included in the polygenic score were restricted to those available in both data sets, the magnitude of the polygenic score effect was consistent, further supporting our proposed polygenic score. Replication of polygenic scores is critical to ascertain that the methods and population background used to develop a score are generalizable.46 Furthermore, replication cohorts can determine which risk variants are applicable across diverse populations.53 Future work may incorporate a recently updated ALS GWAS, although we selected the earlier GWAS here to maintain consistency with the existing literature.54 Replicating polygenic scores in ALS remains important, although it requires large numbers of samples from participants not included in the GWAS used to derive SNV weights.55
We next examined how this set of SNVs relates to disease pathobiology. Gene enrichment and pathway analyses showed that genes annotated to this polygenic score are enriched for multiple pathways relevant to ALS biology, including synaptic signaling, regulation of protein metabolic process, neuron projection, and axon guidance. KEGG pathway analysis also identified biological functions important in ALS, including glycosphingolipid biosynthesis and fatty acid degradation.20,56 A study of 78,500 individuals constructed polygenic scores for biological pathways and cell types to determine their involvement in ALS.57 Significant pathways included those involved in neuronal development and differentiation, with an emphasis on the cytoskeleton.57 Of these, the cytoskeleton pathway was significant in individuals both with and without the C9orf72 repeat expansion, whereas the autophagosome pathway was significant only in C9orf72 carriers. Enriched GO pathways shared between our polygenic score and that study57 included neuron projection morphogenesis, cell morphogenesis involved in differentiation, neuron development, cellular component morphogenesis, cell development, and cell projection organization. This overlap suggests that these 2 different methods for developing a polygenic score select similar pathways. Other studies of gene expression in ALS have also identified dysregulated metabolic and cytoskeletal pathways.58
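Over-representation of a pathway among the genes annotated to a polygenic score is commonly assessed with a one-sided hypergeometric test. This section does not name the enrichment tool or the gene background used, so the following is a minimal sketch assuming that test; the function name and the counts in the usage line are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background universe
    K: background genes annotated to the pathway
    n: genes annotated to the polygenic score (the query list)
    k: query genes that fall in the pathway
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("require 0 <= k <= n <= N and 0 <= K <= N")
    total = comb(N, n)  # ways to draw n genes from the background
    # Sum the probabilities of observing k or more pathway genes in the draw;
    # math.comb returns 0 when the draw cannot occur, so impossible terms vanish.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# Hypothetical counts: 8 of 150 score genes in a 200-gene pathway, 20,000-gene background.
p_value = hypergeom_enrichment_p(k=8, n=150, K=200, N=20000)
print(p_value)
```

Because many GO and KEGG terms are tested at once, p-values from a test like this would also need multiple-testing correction (for example, Benjamini–Hochberg), which is not shown here.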
This study has limitations. Owing to cost and our research interest in common genetic variants, we performed genome-wide genotyping rather than whole genome sequencing. Although whole genome sequencing would better account for genetic background, the genotyping approach we used has been validated across many studies. The study sample is also small relative to the number of individuals affected by ALS. However, the sample was limited to participants not included in prior GWASs, which is a strength, because developing polygenic scores in participants already included in the reference GWAS may bias results. Moreover, because we used a lower-cost genotyping strategy with imputation to maximize overlap with the ALS GWAS used for weights, these methods could be useful for population screening where whole genome sequencing is not economically feasible. Finally, participants were mainly of European genetic ancestry. Supporting the generalizability of these findings will require improved enrollment of participants from diverse backgrounds and study of their genotypes.
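A weighted polygenic score is typically computed as the sum, over SNVs, of each person's effect-allele dosage multiplied by the GWAS effect size, restricted to SNVs present in both the (imputed) genotype data and the weighting GWAS. The sketch below is not the pipeline used in this study; it assumes that alleles have already been aligned to the GWAS effect allele, that weights are log odds ratios, and that scores are scaled to SD units against a reference group such as controls. The function names, SNV identifiers, and toy data are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def polygenic_scores(
    dosages: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Weighted sum of effect-allele dosages over SNVs shared with the GWAS.

    dosages: SNV id -> effect-allele dosage (0 to 2, possibly imputed) per person
    weights: SNV id -> GWAS effect size (assumed log odds ratio) for that allele
    """
    shared = sorted(set(dosages) & set(weights))  # keep only overlapping SNVs
    if not shared:
        raise ValueError("no SNVs overlap between genotypes and GWAS weights")
    n_people = len(dosages[shared[0]])
    if any(len(dosages[snv]) != n_people for snv in shared):
        raise ValueError("every SNV needs one dosage per person")
    scores = [0.0] * n_people
    for snv in shared:
        beta = weights[snv]
        for i, dose in enumerate(dosages[snv]):
            scores[i] += beta * dose
    return scores


def standardize(scores: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Express scores in SD units of a reference group (e.g., controls)."""
    mean = statistics.fmean(reference)
    sd = statistics.stdev(reference)
    return [(s - mean) / sd for s in scores]


# Toy example: rs_c has no GWAS weight and rs_d is not genotyped,
# so only rs_a and rs_b contribute to the score.
toy_dosages = {"rs_a": [0, 1, 2], "rs_b": [2, 2, 0], "rs_c": [1, 1, 1]}
toy_weights = {"rs_a": 0.1, "rs_b": -0.2, "rs_d": 0.5}
raw = polygenic_scores(toy_dosages, toy_weights)
z = standardize(raw, reference=raw)
print(raw, z)
```

Restricting to the shared SNV set mirrors the harmonization step described for the replication cohort; an odds ratio per SD increase in score is then estimated by logistic regression on the standardized values.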
In conclusion, we find that a polygenic score for ALS can account for cumulative genetic risk in the population and reflect cellular processes that are relevant to ALS. If further validated, this polygenic score can be a valuable tool for ALS risk models and the design of ALS prevention studies.
Study Funding
National ALS Registry/CDC/ATSDR (1R01TS000289); National ALS Registry/CDC/ATSDR CDCP-DHHS-US (CDC/ATSDR 200-2013-56856); NIEHS K23ES027221; NIEHS R01ES030049; NINDS R01NS127188; ALS Association (20-IIA-532); the Dr. Randall W. Whitcomb Fund for ALS Genetics; the Peter R. Clark Fund for ALS Research; the Scott L. Pranger ALS Clinic Fund; and the NeuroNetwork for Emerging Therapies at the University of Michigan. This work was supported in part by the Intramural Research Program of the NIH, National Institute on Aging (Z01-AG000949-02). The project “ALS Genetic study in Madrid Autonomous Community” was funded by “ESTRATEGIAS FRENTE A ENFERMEDADES NEURODEGENERATIVAS” (Strategies Against Neurodegenerative Diseases) from the Spanish Ministry of Health.
Disclosure
J.F. Vázquez-Costa receives payment for lectures and presentations from Biogen. P. Mir receives payments for honoraria or lectures from Abbvie, Abbott, and Zambon. L. Galán-Dávila receives consulting fees, payment, or honoraria from Akcea, Alnylam, Genzyme, Sobi, and Pfizer, and equipment donation from Pfizer. J.I. Ceberio receives payment for lectures and presentations from Abbvie, Bial, and Zambon. B.J. Traynor holds a patent for “Diagnostic and therapeutic implications for the C9orf72 repeat expansion” and has collaborative research agreements with Ionis Pharmaceuticals, Roche, and Optimeos. E.L. Feldman receives consulting fees from Novartis and is an inventor on a patent held by University of Michigan titled “Methods for Treating Amyotrophic Lateral Sclerosis.” S.A. Goutman is an inventor on a patent held by University of Michigan titled “Methods for Treating Amyotrophic Lateral Sclerosis.” The other authors declare no competing interests. Go to Neurology.org/NG for full disclosures.
Acknowledgment
The authors thank Crystal Pacut, Caroline Piecuch, and Stacey Sakowski, PhD, from the University of Michigan (Ann Arbor, MI) for study support, Masha Savelieff, PhD, and Emily Koubek, PhD, from the University of Michigan (Ann Arbor, MI) for editorial assistance, and the following members of the Spanish Neurological Consortium for study assistance: Mónica Povedano Panadés, Antonio Guerrero-Sola, Tania García-Sobrino, Marilina Puente Hernández, María Jesús Sobrido Gómez, Ivonne Jericó-Pascual, López, Adriano Jimenez-Escrig, Mario Ezquerra, and Ana Gorostidi Pagola. This study utilized the high‐performance computational capabilities of the Biowulf Linux cluster at the NIH.
Appendix 1 Authors

Appendix 2 Coinvestigators

Footnotes
Go to Neurology.org/NG for full disclosures. Funding information is provided at the end of the article.
The Article Processing Charge was funded by the authors.
Coinvestigators are listed in Appendix 2 at the end of the article.
Previously published at medRxiv (MEDRXIV/2022/281377) on October 31, 2022.
Submitted and externally peer reviewed. The handling editor was Associate Editor Raymond P. Roos, MD, FAAN.
- Received March 1, 2023.
- Accepted in final form April 6, 2023.
- Written work prepared by employees of the Federal Government as part of their official duties is, under the U.S. Copyright Act, a “work of the United States Government” for which copyright protection under Title 17 of the United States Code is not available. As such, copyright does not extend to the contributions of employees of the Federal Government.
References
- 1.
- 2.
- 3.
- 4.
- 5. Shepheard SR, Parker MD, Cooper-Knock J, et al
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12. Trabjerg BB, Garton FC, van Rheenen W, et al
- 13.
- 14. Nakamura R, Misawa K, Tohnai G, et al
- 15.
- 16.
- 17.
- 18.
- 19. Goutman SA, Boss J, Patterson A, Mukherjee B, Batterman S, Feldman EL
- 20. Goutman SA, Boss J, Guo K, et al
- 21.
- 22.
- 23.
- 24.
- 25. Abecasis GR, Auton A, et al
- 26.
- 27.
- 28.
- 29.
- 30.
- 31.
- 32.
- 33.
- 34.
- 35. Bakulski KM, Vadari HS, Faul JD, et al
- 36. Wainberg M, Magis AT, Earls JC, et al
- 37.
- 38.
- 39.
- 40.
- 41. Chaudhury S, Brookes KJ, Patel T, et al
- 42.
- 43.
- 44.
- 45.
- 46.
- 47.
- 48.
- 49.
- 50.
- 51.
- 52. Lello L, Raben TG, Hsu SDH
- 53. Bogumil D, Conti DV, Sheng X, et al
- 54. van Rheenen W, van der Spek RAA, Bakker MK, et al
- 55.
- 56.
- 57. Saez-Atienzar S, Bandres-Ciga S, Langston RG, et al
- 58. Maniatis S, Aijo T, Vickovic S, et al