Introduction

The pandemic of coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) represented a major health threat to the entire world1. Over three years, there were almost 700 million confirmed cases of COVID-19 worldwide, with more than 7 million reported deaths. After adjustment for demography and under-ascertainment, the best estimate of the overall case fatality ratio in the initial outbreak in China was 1.38% (95% confidence interval 1.23–1.53), significantly higher in older people (6.4% in those aged ≥60 and 13.4% in those aged ≥80) and in males2. Worldwide data on the age-stratified case fatality ratio and infection fatality ratio show a similar pattern, with a marked sex bias that increases with age: 60% of all reported deaths occurred in men (estimated hazard ratio 1.59, 95% confidence interval 1.53–1.65)3. Interestingly, sex-dependent differences in disease outcomes were also found during the earlier SARS-CoV and MERS-CoV epidemics4,5 and in infected mice6.

Understanding the basis of this sex- and age-dependent vulnerability is crucial because older men and women are likely to respond fundamentally differently to SARS-CoV-2 infection, treatments and vaccines. Proposed causes include differences in case definition, environmental and social factors (such as lifestyle, smoking history or work environment) and sex-specific immune-defense factors. The X chromosome harbors multiple genes important for immunity and there are many X-linked immunodeficiencies, so males have greater susceptibility to infections starting at birth6. More specifically, SARS viruses use angiotensin-converting enzyme 2 (ACE2), encoded by an X-linked gene, as a receptor to enter and infect ACE2-expressing cells1. Sex differences in the expression of this gene, with paradoxically higher expression and higher circulating levels in men than in women, have also been proposed as a candidate mechanism7. However, ascertainment bias and environmental factors are unlikely to operate similarly across different populations, while sex-specific immune factors or ACE2 variation would not fully explain the increased risk and the sex divergence with aging. An analysis of previously untreated patients with moderate COVID-19 revealed that male patients have higher plasma levels of innate immune cytokines (IL-8 and IL-18) and more robust induction of non-classical monocytes, whereas female patients have more robust T cell activation, which is sustained in old age8. A B cell autoimmune disorder, characterized by neutralizing immunoglobulin G autoantibodies against type I interferons, has been reported in about 10% of individuals with life-threatening COVID-19 pneumonia and is 5 times more common in males than in females9.
Finally, a meta-analysis of genome-wide association studies of host genetic factors has revealed 13 loci significantly associated with SARS-CoV-2 infection or severe COVID-19, but these loci do not fully explain the sex differences10.

Mosaic chromosomal alterations (mCAs) detectable in blood, including deletions, gains and copy-neutral changes, are age-related somatic alterations that indicate clonal hematopoiesis and have been associated with increased risk of cancer, cardiovascular disease and overall mortality11,12,13,14,15. Expanded mCAs have also recently been associated with increased risk of incident infections, including COVID-19 hospitalization16. Multiple germline alleles conferring susceptibility to clonally expanded mCAs have been identified, with enrichment at regulatory sites for the immune system16. In men, mosaic X chromosome monosomy (XCM), acquired by somatic loss of the Y chromosome (LOY), is the most common copy number alteration in leukocytes; it is estimated to occur in <2% of men under 60 years of age but increases exponentially with age, to 15–40% of men aged 70–85 years and >50% at 93 years17. LOY has also been associated with a wide spectrum of human diseases, including cancer, Alzheimer's disease and cardiovascular disease, and with reduced life expectancy in men18,19,20,21. Genetic variation at multiple loci contributes to inherited susceptibility to LOY, which can also be driven by smoking and other environmental exposures17. Marked down-regulation of chromosome Y gene expression, driven mainly by genes with X chromosome homologs that escape X inactivation, appears to mediate the reported associations between LOY and disease22,23.

In women, XCM detectable in leukocytes may be developmental (causing Turner syndrome) or late-onset, usually with loss of the inactive X chromosome; it is less frequent than in men but also increases with age (0.05% at 50 years; 0.25% at 75 years)24. Females with XCM have an increased risk of autoimmune disease, recurrent viral infections and earlier cardiovascular mortality25, associated with excessive production of pro-inflammatory cytokines (IL-6), decreased anti-inflammatory cytokines (IL-10, TGF-β) and a lower CD4:CD8 ratio26.

Here we tested the hypothesis that mCAs and XCM/LOY could be underlying factors for the increased severity and mortality of COVID-19 in older people, particularly men. Overall, clonal mosaicism was associated with an approximately 50% increase in the odds of COVID-19 lethality. We also correlated LOY in aging males with multiple parameters of cardiovascular dysfunction and defined the transcriptomic deregulation that underlies these disease risks, including signatures of immune system dysfunction and increased coagulation activity. Finally, we studied how some of the genes deregulated by LOY are involved in the response to SARS-CoV-2 infection.

Results

Higher COVID-19 severity and mortality in males, a sex bias that increases with aging

Accumulated data on the age-stratified case fatality ratio and infection fatality ratio in a large sample from Spain show a marked sex bias that increases with advancing age (Fig. 1). Available reports, mostly based on hospital records, show the same tendency in other countries. COVID-19 lethality, mCA prevalence and LOY prevalence in men all appear to increase exponentially with age (Fig. 1), as previously reported for mosaicism in multiple studies, including the UK Biobank18,19,20,21.

Fig. 1: COVID-19 lethality and mCA and LOY prevalence as functions of age.

Increasing sex-specific hospitalization (orange) and mortality (blue) rates for COVID-19 in Spain in the different age intervals (updated July 2022). Estimated prevalence by age in the general population of detectable mCA (black) or LOY in men (grey) in blood is also shown13,14.

COVID-19 severity variables and their association with age

We first studied the SCOURGE clinical data. Phenotype data were available for all 9578 individuals (5134 females and 4444 males) diagnosed with COVID-19 and recruited to the SCOURGE study (Table 1). By disease severity, 607 cases were asymptomatic (A; 6.8%), 2727 had mild symptoms (L; 30%), 2141 had moderate disease (M; 23.6%), 2449 had severe manifestations (G; 27%) and 1157 were critical (C; 12.7%). We visually inspected the severity contrasts in relation to patient age. Mean age was 62.58 years (64.34 for males and 61.06 for females), and the age difference between sexes was statistically significant (P = 4.1 × 10⁻¹⁹). All clinical categories and variables correlated with age except "critical" and "history of pulmonary thromboembolism".

Table 1 Number, proportion and mean age of patients in the different clinical categories of COVID-19 severity in the SCOURGE study, with and without detectable mCAs or LOY (males)

Association between mCA and COVID-19 severity

The detection algorithm, followed by manual curation, identified 133 individuals (1.42%; 61 males and 72 females) carrying mCAs in blood affecting the autosomes and/or the X chromosome (Table 1, Supplementary Data 1, Fig. 2a, b and Figure S1). In total, 95 individuals had a single mCA and 38 had more than one, for a total of 213 mCAs, including 88 deletions, 5 whole-chromosome monosomies, 20 segmental gains, 21 whole-chromosome trisomies and 78 copy-neutral changes (somatic segmental uniparental disomies), along with a few complex rearrangements. Mean age of individuals with mCAs was 75.04 ± 12.7 years. We then tested for associations between the presence of mosaicism and the different outcome variables related to COVID-19 severity. We first confirmed the strong association between mosaicism and age (OR per year = 1.051, P = 1.05 × 10⁻¹⁶), as previously reported. We then observed a significant association between the presence of mCAs and COVID-19 lethality (1 − survival; OR = 1.75, P = 0.015) after adjusting for sex and age (Fig. 3). The association was stronger in males (OR 2.16; 95% CI: 1.19–3.93). In females the estimate was in the same direction but did not reach statistical significance, as the stratified sample was too small (OR 1.32; 95% CI: 0.66–2.67).
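
The odds ratios and 95% confidence intervals above come from logistic regression models. As a rough consistency check, if each interval is a Wald interval computed as exp(β ± 1.96·SE) on the log-odds scale, the coefficient β, its standard error and an approximate two-sided P value can be recovered from the published numbers. The sketch below assumes exactly that; because the published values are rounded, the recovered quantities are approximate, and the function names are illustrative rather than part of the study's pipeline.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal quantile

def wald_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=Z_95):
    """Recover beta = log(OR), its SE, the Wald z and a two-sided P value,
    assuming the CI was computed as exp(beta +/- z_crit * SE)."""
    beta = math.log(odds_ratio)
    # On the log scale a Wald interval is symmetric with width 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    # Two-sided normal tail probability: P = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p

def or_ci_from_beta(beta, se, z_crit=Z_95):
    """Forward direction: a logistic-regression coefficient and SE give OR and 95% CI."""
    return math.exp(beta), math.exp(beta - z_crit * se), math.exp(beta + z_crit * se)

# Male stratum reported in the text: OR 2.16 (95% CI 1.19-3.93).
beta, se, z, p = wald_from_or_ci(2.16, 1.19, 3.93)
print(f"beta = {beta:.3f}, SE = {se:.3f}, z = {z:.2f}, approximate P = {p:.4f}")
```

The same two functions can be applied to the female stratum; an interval that spans 1 on the OR scale corresponds to a Wald P value above 0.05.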

Fig. 2: mCA and LOY detection in the SCOURGE study.

a Whole-genome molecular karyotype obtained by SNP array of blood DNA from an individual with several mCAs. Grey dots are LRR values (the average per window is shown by a green line), while colored dots are BAF values of homozygous and heterozygous SNPs, red for odd-numbered and orange for even-numbered chromosomes. Abnormal BAF and average LRR values in three regions (blue segments interrupting the black line in the upper part) correspond to mosaicism for trisomy 12, a small interstitial deletion in 13q and X chromosome monosomy. The blue segments interrupting the green line at LRR = 0 correspond to small regions of homozygosity. b Circos plots showing all mCAs detected in the SCOURGE dataset: deletions in red, gains in blue and copy-neutral events in green. c Analysis of LOY in males in the SCOURGE study based on the mean LRR of chromosome Y (mLRRY: relative amount of chromosome Y DNA with respect to the autosomes). Blue dots correspond to males with LOY/XCM in more than 65% of cells, green dots to males with LOY/XCM in 25–65% of cells, and red dots to males with LOY/XCM in less than 25% of cells. The three individuals with the highest mLRRY values have apparently non-mosaic gains of chromosome Y (47,XYY).

Fig. 3: Associations of detectable mCAs and LOY with COVID-19-related mortality.

Mortality was reported at most 90 days after infection. The analyses are stratified or adjusted by sex, as indicated. All analyses are adjusted for age and 10 principal components of ancestry. Individuals with prevalent hematologic cancer were excluded from the analysis. Error bars correspond to the 95% CI for the OR estimates.

Association between LOY and COVID-19 severity

Among all male cases, we detected 226 individuals with LOY (mean age 82.0 ± 7.9 years), a prevalence of 5.08% in this cohort (Table 1, Supplementary Data 2). This is within the range reported in previous publications17,18,19,20, albeit significantly below the prevalence reported by others in the UK Biobank using the same threshold21. The reasons for this discrepancy are unclear, but our LOY calling method has shown robust performance compared with other methods in both simulations and real data27. According to the estimated proportion of cells with XCM/LOY, 162 individuals had mild LOY (<25% of cells), 43 moderate LOY (25–65% of cells) and 21 extreme LOY (>65% of cells) (Fig. 2c). We also identified three women with detectable chromosome Y in a proportion of cells, likely corresponding to 45,X0/46,XY mosaicism and a possible diagnosis of Turner syndrome, as well as three individuals with non-mosaic XYY (Fig. 2c). We observed 6 men with both LOY and mCA, 220 with LOY and no mCA, and 55 with mCA and no LOY, with no significant correlation between the presence of LOY and mCA.
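
The three LOY categories above are defined by cut-offs on the estimated fraction of cells lacking chromosome Y. A minimal sketch of that classification follows. How a value falling exactly on 25% or 65% is assigned is an assumption, since the text does not say, and the example fractions are hypothetical; the prevalence line uses only the counts reported above.

```python
from collections import Counter

def loy_category(fraction):
    """Map the estimated fraction (0-1) of cells lacking chromosome Y to a LOY category.

    Cut-offs follow the text (<25% mild, 25-65% moderate, >65% extreme);
    assigning exactly 0.25 and 0.65 to 'moderate' is an assumption."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    if fraction < 0.25:
        return "mild"
    if fraction <= 0.65:
        return "moderate"
    return "extreme"

def prevalence_pct(n_cases, n_total):
    """Prevalence expressed as a percentage."""
    return 100.0 * n_cases / n_total

# Hypothetical per-individual cell fractions.
print(Counter(loy_category(f) for f in [0.12, 0.18, 0.40, 0.70, 0.22]))

# Reported counts: 162 mild + 43 moderate + 21 extreme LOY among 4444 males.
print(f"LOY prevalence: {prevalence_pct(162 + 43 + 21, 4444):.1f}%")
```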

We first confirmed a strong association between LOY in males and age (OR = 1.11, P = 5.65 × 10⁻⁵¹). We then fitted a series of models between LOY and the contrast of critical or severe versus moderate, mild or asymptomatic disease (C or G > M or L or A), which we had observed to be strongly associated with age. The association between this contrast and LOY was significant but driven primarily by age, and it was not significant after adjustment for age (OR = 1.25, P = 0.15). We also performed association tests for all contrasts and clinical variables adjusting only for age and observed some significant associations. LOY was associated with reduced survival (OR = 0.713, P = 0.045) and with clinical history of vascular disease (OR = 0.627, P = 0.001) and lung thromboembolism (OR = 0.271, P = 0.042). Although associations with severity were not significant, the risk estimates for LOY were consistent in direction across contrasts.
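
The severity contrasts recode the five ordinal SCOURGE levels (A, L, M, G, C) into case/control indicators. The sketch below encodes the contrast discussed here, C or G versus M, L or A. Individuals whose level belongs to neither side of a contrast are treated as excluded (returned as None); that convention and the helper names are hypothetical.

```python
# Five ordinal severity levels used in SCOURGE:
# asymptomatic, mild (light), moderate, severe, critical.
SEVERITY_ORDER = ["A", "L", "M", "G", "C"]

def binary_contrast(level, cases, controls):
    """Return 1 if `level` is on the case side, 0 if on the control side,
    or None if the individual is excluded from this contrast."""
    if level not in SEVERITY_ORDER:
        raise ValueError(f"unknown severity level: {level!r}")
    if level in cases:
        return 1
    if level in controls:
        return 0
    return None

# The contrast discussed in the text: C or G versus M, L or A.
CG_vs_MLA = ({"C", "G"}, {"M", "L", "A"})

patients = ["A", "G", "M", "C", "L"]
print([binary_contrast(level, *CG_vs_MLA) for level in patients])  # [0, 1, 0, 1, 0]
```

Each resulting 0/1 vector can then serve as the outcome of an age-adjusted logistic regression with LOY as the exposure.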

We then tested the association of the continuous mLRRY value with all severity contrasts and clinical variables. Higher relative chromosome Y content (that is, less LOY) was significantly associated with survival (β = 0.86, P = 0.0054).

We then performed a joint analysis of all mosaicisms (mCAs and LOY), confirming their strong association with age (OR = 1.08, P = 1.95 × 10⁻⁶²) and with COVID-19 lethality (OR = 1.53, P = 0.004) after correction, including adjustment for other clinical variables (Fig. 3). Associations of any type of mosaicism with the severity contrasts were not significant but were consistent across all contrasts.
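
The lethality estimates are adjusted for age and sex (and, according to the Fig. 3 legend, for 10 ancestry principal components). The sketch below uses simulated data with entirely hypothetical coefficients and fits a logistic regression by Newton-Raphson to show why adjustment matters: because mosaicism and death both become more common with age, the crude odds ratio overstates the age- and sex-adjusted one. Ancestry components are omitted for brevity, and this is a generic illustration, not the authors' analysis code.

```python
import math
import random

def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic(X, y, n_iter=25, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson; X rows include an intercept."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        score = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, xi))))
            w = mu * (1.0 - mu)
            for j in range(p):
                score[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = solve(info, score)  # Newton step: information^-1 * score
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def crude_odds_ratio(X, y, col):
    """Unadjusted odds ratio from the 2x2 table of a binary column against the outcome."""
    n = {(e, o): 0 for e in (0, 1) for o in (0, 1)}
    for xi, yi in zip(X, y):
        n[(int(xi[col]), int(yi))] += 1
    return (n[(1, 1)] * n[(0, 0)]) / (n[(1, 0)] * n[(0, 1)])

# Simulated cohort: every coefficient below is hypothetical.
random.seed(1)
X, y = [], []
for _ in range(20000):
    age = random.uniform(20, 95)
    male = random.random() < 0.46
    mosaic = random.random() < 0.005 * math.exp(0.05 * (age - 20))  # more common with age
    eta = -9.0 + 0.08 * age + 0.4 * male + math.log(1.5) * mosaic    # true mosaic OR = 1.5
    death = random.random() < 1.0 / (1.0 + math.exp(-eta))
    X.append([1.0, (age - 60) / 10, float(male), float(mosaic)])      # age centred, in decades
    y.append(float(death))

beta = fit_logistic(X, y)
print(f"crude OR = {crude_odds_ratio(X, y, 3):.2f}, adjusted OR = {math.exp(beta[3]):.2f}")
```

A real analysis would normally rely on an established implementation (for example statsmodels in Python or glm in R), which also reports standard errors and confidence intervals.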

Germline aneuploidies and COVID-19

In addition to 6 individuals with XCM and likely Turner syndrome, the 3 cases with 45,X0/46,XY mosaicism mentioned above, 2 more cases with 45,X0/46,XX mosaicism and 1 with likely 45,X0/46,XY/46,XX mosaicism, the algorithm detected 25 individuals with germline (non-mosaic) aneuploidies: 7 with Down syndrome (trisomy 21) and 18 with gonosomal aneuploidies, including 9 with Klinefelter syndrome (47,XXY), 6 with triple X syndrome (47,XXX) and 3 with XYY syndrome (47,XYY) (Supplementary Data 3). Aneuploidies were associated with the presence of mCAs (OR = 9.90, P = 0.0047).

We then tested associations of phenotypic features with all aneuploidies after removing individuals with mCAs. We found no significant association between COVID-19 severity parameters and any type of aneuploidy, possibly because of the small sample size, although previous history of cardiopathy was significantly associated, as expected (OR = 4.02, P = 0.004).

Correlation of LOY with cellular and biochemical phenotypes in EGCUT individuals

We used MADloy to analyze SNP microarray data from a selected sample of 530 apparently healthy adult men from the Estonian Genome Center of the University of Tartu cohort (EGCUT) and classified them as having (n = 28) or not having LOY (n = 502). We then correlated LOY status with several clinical parameters. After adjustment for age, individuals with LOY had significantly lower red cell counts, lower mean corpuscular hemoglobin concentration, higher red cell distribution width, low basophil counts and borderline-low lymphocyte proportions. Biochemical parameters revealed low albumin and triglyceride levels and elevated homocysteine and urea levels (Table S1).

Blood transcriptome in individuals with LOY reveals immune defects and cardiovascular risk

We also compared the blood transcriptome of 11 men with LOY (median age 69, range 58–84) with that of 32 age-matched men without LOY (median age 68, range 60–87) as controls. We found multiple differentially expressed genes, both autosomal and gonosomal (Tables S2, S3 and Figures S2, S3), which provide insight into the mechanisms of disease susceptibility caused by LOY, with implications for COVID-19. CSF2RA, located in the X-Y pseudoautosomal region 1 (PAR1), is one of the most significantly down-regulated genes in LOY (Fig. 4a), along with multiple other Y chromosome genes that have X chromosome homologs escaping X inactivation and known functions in immunity (Supplementary Data 4 and 5).
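
The text does not state which statistical test or multiple-testing correction was used to call differentially expressed genes. Purely as an illustration of one common choice, the sketch below applies the Benjamini-Hochberg false discovery rate procedure to a list of per-gene P values; the gene names and P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values (q values) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest to largest P
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk down from the largest P value
        i = order[rank - 1]
        # q_(rank) = min over ranks >= rank of p * m / rank, capped at 1
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-gene P values from some two-group comparison (LOY vs no LOY).
genes = {"GENE_A": 0.0004, "GENE_B": 0.012, "GENE_C": 0.03, "GENE_D": 0.2, "GENE_E": 0.7}
q = benjamini_hochberg(list(genes.values()))
print([gene for gene, qv in zip(genes, q) if qv < 0.05])
```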

Fig. 4: Transcriptomic signatures of LOY.

a Decreased expression of CSF2RA mRNA and increased expression of MYL9 and VWF in individuals with LOY compared with controls without LOY (red dots indicate mean gene expression). b Differences in predicted cell counts underlying the transcriptomic differences between cases with LOY and controls (no-LOY). c Gene Ontology (GO) enrichment of top differentially expressed genes. Boxplots show the interquartile range, while error bars represent the spread of data around the median.

Top autosomal genes overexpressed in LOY, such as VWF and MYL9 (Fig. 4a), are associated with cardiovascular risk. VWF encodes von Willebrand factor (vWF), a pro-coagulant protein that promotes platelet adhesion and smooth muscle cell proliferation, while MYL9 encodes myosin light chain 9 (regulatory), which is important in inflammatory immune responses.

Because changes in gene expression may reflect differences in cell-type composition and function, we estimated the average cell-type composition of samples from individuals with and without LOY using the bulk transcriptome data (Table S4). The results were consistent with individuals with LOY having significantly fewer granulocyte-macrophage (GM) progenitors and naïve B cells, along with more endothelial cells (Fig. 4b). Gene set enrichment analysis of the differentially expressed genes revealed a few significantly over-represented categories, most notably coagulation, cellular detoxification, leukocyte migration and neutrophil activation (Fig. 4c, Tables S5 and S6). Overall, the gene expression profile of individuals with LOY corresponds to a down-regulated immune score.

Blood transcriptome in individuals with mCAs

For the transcriptome analysis, we selected only individuals with copy-neutral mCAs to minimize variability arising from the individual rearrangements. We then compared the blood transcriptome of 9 individuals with mCAs with that of 90 age-matched individuals without mCAs as controls. We found 83 differentially expressed genes, but enrichment analysis did not reveal any significantly deregulated pathway (Supplementary Data 6).

Down-regulated genes in LOY involved in response to SARS-CoV-2 infection

We tested whether genes involved in the primary response to SARS-CoV-2 infection were significantly deregulated in blood cells of individuals with LOY. We obtained 249 genes deregulated by SARS-CoV-2 infection in primary human lung epithelium (NHBE) and 130 in transformed lung alveolar cells (A549), for 339 unique genes across the two cell types. This gene set is highly over-represented in several pathways, including defense response to virus, IL-17, type I interferon and NF-κB signaling (Table S7). Of the genes deregulated in SARS-CoV-2-infected cells, 13 were also deregulated in individuals with LOY (Fig. 5a and Table S8), a significant over-representation (enrichment OR = 7.23, P = 1.5 × 10⁻⁷). Most of these are interferon response genes (IFIT3, IFI44L, IFIT1, IFI6), which are down-regulated in individuals with LOY (Fig. 5b–d).
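
The over-representation of infection-responsive genes among LOY-deregulated genes can be framed as a 2×2 table (genes in both sets, LOY only, infection only, neither) and assessed with a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The sketch below computes the sample odds ratio and the upper-tail P value. The 339 infection-responsive genes and the 13 overlapping genes come from the text, but the number of LOY-deregulated genes and the size of the gene universe are hypothetical placeholders, so the output is not expected to reproduce the reported OR of 7.23; the exact test used in the study is not specified here.

```python
from math import comb

def overlap_enrichment(n_universe, n_set_a, n_set_b, n_overlap):
    """Sample odds ratio and one-sided hypergeometric P value for the overlap of two gene sets.

    P(X >= n_overlap) under the hypergeometric distribution equals a
    one-sided Fisher's exact test on the corresponding 2x2 table."""
    a = n_overlap                                   # genes in both sets
    b = n_set_a - n_overlap                         # set A only
    c = n_set_b - n_overlap                         # set B only
    d = n_universe - n_set_a - n_set_b + n_overlap  # neither set
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    total = comb(n_universe, n_set_b)
    p_upper = sum(
        comb(n_set_a, k) * comb(n_universe - n_set_a, n_set_b - k)
        for k in range(n_overlap, min(n_set_a, n_set_b) + 1)
    ) / total
    return odds_ratio, p_upper

# 339 infection-responsive genes and 13 overlapping genes are from the text;
# 200 LOY-deregulated genes and a 15,000-gene universe are hypothetical.
or_est, p = overlap_enrichment(n_universe=15000, n_set_a=200, n_set_b=339, n_overlap=13)
print(f"OR = {or_est:.2f}, one-sided P = {p:.2e}")
```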

Fig. 5: Transcriptomic overlap between LOY and SARS-CoV-2.

a Overlap between top differentially expressed genes in individuals with LOY and genes deregulated in SARS-CoV-2-infected cells. b–d Expression patterns of some of these overlapping genes, showing their down-regulation in individuals with LOY (b) and their overexpression in NHBE (c) and A549 (d) cells infected with SARS-CoV-2. Boxplots show the interquartile range, while error bars represent the spread of data around the median.

Discussion

We have shown in the SCOURGE study that detectable clonal mCAs, including XCM, are relatively common in the blood of aging individuals, as previously reported14, with much higher frequency in males due to somatic LOY18. Clonal hematopoiesis with mCAs and/or XCM due to LOY, already known as a risk factor for cancer, cardiovascular complications, incident infections and all-cause early mortality12,16,18,28, is also a risk factor for COVID-19 lethality, with a combined odds ratio of 1.53. Despite some limitations of our study, including a relatively small sample size and possible uncontrolled confounding, similar results have recently been reported in the UK Biobank, revealing increased risk of diverse incident infections and COVID-19 hospitalization in people with clonal hematopoiesis with chromosomal mosaicism16,28. Our data indicate that these two types of chromosomal mosaicism underlie at least part of the aging-related and sex-biased severity and mortality of COVID-19. Therefore, identification of mCAs and LOY in blood cells is likely to have immediate clinical relevance in the management of older patients with COVID-19.

Clonal hematopoiesis of indeterminate potential, caused by expansion of peripheral blood cells carrying acquired point mutations in a specific set of genes, is a similar condition that also increases with age and is associated with increased risk of cancer, cardiovascular disease and decreased overall survival29. A significant overlap between detectable mCAs or LOY and mosaic gene mutations has been demonstrated after adjusting for age, suggesting a possible synergistic relationship between clonal hematopoiesis with gene mutations and acquired chromosomal rearrangements30. Interestingly, a relationship between clonal hematopoiesis with gene mutations and risk of severe infections, including severe COVID-19, has also been documented recently31. However, although the increased severity and mortality of COVID-19 could be explained in part by coexisting clonal hematopoiesis, the underlying mechanisms in individuals with LOY are likely different from those in individuals with mCAs or mosaic gene mutations.

The predisposing factors for autosomal events and LOY seem to be mostly unrelated: no significant association was found between the two types of events in our cohort, their blood transcriptomic signatures do not overlap, and the germline loci reported to predispose to autosomal mCAs and to LOY are different16,19,21. Whereas only 10% of autosomal mCAs correspond to whole-chromosome aneuploidies (mainly trisomies 8, 12 and 15 and monosomy 7), likely mediated by mitotic non-disjunction, this is the main mechanism for XCM and LOY. Mitotic non-disjunction of the sister chromatids of the Y chromosome may be facilitated by the higher rate of cellular turnover in aging men. In mice, although the Y chromosome is stably transmitted during meiosis, non-disjunction is frequent in mitosis, mainly in the earliest cleavage divisions32.

A possible pathogenetic mechanism common to clonal mCAs and XCM is immunosenescence, which involves modifications of humoral and cellular immunity. One aspect of immunosenescence is a decline in the absolute number of peripheral blood lymphocytes with locus-dependent reduction of HLA class I cell surface expression, which is related to increased risk of subsequent mortality33. T lymphocytes also play a central role in the effector and regulatory mechanisms of the adaptive immune response34.

Many of the biochemical and transcriptomic alterations found in individuals with LOY have already been associated with poor prognosis of SARS-CoV-2 infection35,36. Several genes on the Y chromosome with relevant functions in the immune system have functional homologs on the X chromosome that escape X inactivation in females (Supplementary Data 4). Cells with XCM are likely haploinsufficient for many of these genes, which are down-regulated in individuals with mosaic XCM due to LOY. In this regard, we observed low expression of CSF2RA in individuals with LOY, who also have fewer GM progenitors. CSF2RA encodes the alpha subunit of the heterodimeric receptor for colony stimulating factor 2 (granulocyte-macrophage colony-stimulating factor, GM-CSF), a cytokine that regulates the production, differentiation and function of granulocytes and macrophages, key cells for antigen presentation in infections, and that is also critical for T cell function. GM-CSF increases IL-2R and IL-2 signaling, which can increase lymphocyte expansion and the IFN-γ production that is important for the antiviral response; GM-CSF therefore enhances protective responses37. Loss or inactivation of both copies of CSF2RA is associated with surfactant metabolism dysfunction-4 and pulmonary alveolar proteinosis, a primary immunodeficiency (OMIM 300770)38. Leukine® (sargramostim, rhu-GM-CSF) has been assessed in the SARPAC trial, with inconclusive results, because of its potential to enhance antiviral immunity and help restore immune homeostasis in the lungs (https://clinicaltrials.gov/ct2/show/NCT04326920). Our data suggest that LOY might predict a poor response to this treatment, because patients with LOY have low expression of one of the receptor subunits for GM-CSF (CSF2RA)39.

Patients severely affected by COVID-19 have lower lymphocyte counts, especially T cells, higher leukocyte counts and neutrophil-to-lymphocyte ratio, lower percentages of monocytes, eosinophils and basophils, and generally elevated levels of infection-related biomarkers and inflammatory cytokines, including IL-6. Helper, suppressor and regulatory T cells were all below normal levels in the severe group, with increased naïve helper T cells and decreased memory helper T cells1,40. We observed a significant overlap between genes deregulated in individuals with LOY and genes involved in the immediate immune response elicited by SARS-CoV-2 infection. Some of these genes, which are clearly activated in both infected cell types, are markedly underexpressed in individuals with LOY (SLPI, IFI6, IFIT1, IFIT3 and IFI44L) (Fig. 5b–d). Secretory leukocyte protease inhibitor (SLPI) is a regulator of innate and adaptive immunity that protects the host from excessive inflammation in infectious disease, while the other four genes encode interferon-induced proteins of the innate immune system that participate in the immediate host response to viral infections41. Dysfunction of adaptive immunity and of the interferon-mediated immediate host response in individuals with LOY is consistent with the observed sexual dimorphism in human immune system aging and might underlie a poor immune response to SARS-CoV-2 infection42. This pattern, together with the increased severity in older males, suggests that XCM due to LOY may be one underlying factor for susceptibility to COVID-19 in a proportion of patients.

In addition to depleted hematopoietic progenitor cells and possible immunodeficiency, individuals with LOY may have increased levels of circulating endothelial cells, which are known biomarkers of endothelial dysfunction and cardiovascular disease43. In a mouse model of LOY, macrophages recruited to the heart showed aberrant profibrotic differentiation, leading to cardiac fibrosis during aging44. We observed up-regulation of VWF and MYL9 in LOY. Pro-coagulant vWF promotes platelet adhesion and smooth muscle cell proliferation, and elevated vWF levels have been associated with higher risk of thrombosis and cardiovascular disease45. MYL9, a ligand for CD69, forms a net-like structure inside blood vessels in inflamed lungs; it is also a cardiovascular risk factor and is overexpressed in aged versus young injured arteries46. Through these mechanisms and the associated cardiovascular risk, LOY seems to contribute to COVID-19 lethality.

In summary, detectable clonal mCAs and XCM are relatively common in aging individuals, with much higher frequency in males due to somatic LOY. As revealed by biochemical and gene expression data, LOY is associated with decreased progenitor and stem cells, along with immune system dysfunction, increased coagulation and cardiovascular risk. Our data indicate that this type of chromosomal mosaicism underlies at least part of the sex-biased severity and mortality of COVID-19 in aging patients. Given its potential relevance for prognosis, therapeutic intervention and immunization responses, we propose that mCA/LOY be evaluated by currently established methods both in retrospective studies and in all prospective and ongoing clinical trials of medications and vaccines for COVID-19. Large-scale testing for mCA/LOY in older people may also help to evaluate vaccination response and to identify as-yet unexposed people who may be especially vulnerable to severe COVID-19.

Methods

COVID-19 infection, mortality data, mCA and LOY prevalence estimates

Accumulated data were obtained from the Spanish National Epidemiological Registry (https://www.isciii.es/QueHacemos/Servicios/VigilanciaSaludPublicaRENAVE/EnfermedadesTransmisibles/Paginas/InformesCOVID-19.aspx). Hospitalization rates, intensive care admission rates and mortality, stratified by age and sex, were obtained from this report. Age-specific prevalence estimates of mCA and LOY in the general population were obtained from previous studies14,21.

EGCUT subjects, phenotype and genotype data

LOY and mCA were assessed in a total of 882 adult individuals belonging to the Estonian Gene Expression Cohort (EGCUT, www.biobank.ee), drawn from the cohort of 53,000 samples of the Estonian Genome Center Biobank, University of Tartu47. Detailed phenotypic information was available for all individuals studied, including clinical analyses (blood cell counts and general biochemistry) and follow-up until June 2020, with diagnoses coded in ICD-10. The selected individuals were genotyped using the OmniX array, and all had a genotyping success rate above 95%. All studies were performed in accordance with the ethical standards of the responsible committee on human experimentation, and with proper informed consent from all individuals tested.

SCOURGE subjects, phenotype and genotype data

A total of 9578 patients (5134 females and 4444 males) diagnosed with COVID-19 and recruited to the SCOURGE study were included48. Mean age was 62.58 years (61.06 for females and 64.34 for males). Available phenotype data included age, sex, selected variables of past medical history, several defined measures of COVID-19 severity and vital status (alive or dead) 90 days after diagnosis. The severity variable classified individuals into five levels: asymptomatic (A), mild (light, L), moderate (M), severe (G) and critical (C). Additional information on pre-existing conditions was available as categorical variables for most cases, including history of vascular disorders, cardiac problems, neurologic conditions, gastrointestinal disorders, onco-hematologic conditions, respiratory issues and pulmonary thromboembolism. Blood DNA was genotyped using a customized Affymetrix SNP microarray48, and genotype data passed quality control for GWAS analysis. The SCOURGE project was approved by the Galician Ethical Committee (Ref. 2020/197) and by the Ethics and Scientific Committees of all participating centers.

Detection of mCA and LOY

The genotype CEL files of all individuals were used to extract the log R ratio (LRR) and B-allele frequency (BAF) of SNP probes. We used the Affymetrix Power Tools (apt) software for quality control (QC) and extraction of array intensity signals. With QC filters axiom-dishqc-DQC > 0.82 and call rate > 0.97, all individuals passed QC. Signals were obtained from the CNV calling pipeline with default parameters mapd-max = 0.35 and waviness-sd-max = 0.1. We called mosaicism in the autosomes and chromosome X with the MAD algorithm49, which uses the absolute deviation of BAF from its expected value of 0.5 at heterozygous SNPs (B-deviation, Bdev) to call allelic imbalances through a segmentation procedure. Segmentation was performed with the following MAD parameters: T > 8, aAlpha = 0.8 and minSegLength > 100. Because some false-positive alterations were detected in low-quality arrays, calls were curated by visual inspection by two independent investigators, considering the variability of mean LRR and BAF values in each segment. Each mosaic alteration was classified as copy loss, copy gain or copy neutral. The percentage of abnormal cells was estimated from the B-deviation as previously reported10.
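
The final step above derives the percentage of abnormal cells from the B-deviation, but the formulas are not given here. Under an idealized model (diploid baseline, no probe noise, event present in a fraction f of cells), the expected minor-allele BAF at heterozygous SNPs gives a closed-form relation between Bdev and f for each event type, and the sketch below inverts those relations. The BAF window used to pick heterozygous probes and the example BAF values are illustrative assumptions, and the sketch is not necessarily the exact procedure of the cited method.

```python
import statistics

def b_deviation(baf, het_low=0.1, het_high=0.9):
    """Median |BAF - 0.5| over probes treated as heterozygous within a segment.

    The 0.1-0.9 window used to select heterozygous probes is an
    illustrative assumption, not a MAD parameter."""
    het = [b for b in baf if het_low < b < het_high]
    if not het:
        raise ValueError("no heterozygous probes in segment")
    return statistics.median(abs(b - 0.5) for b in het)

def cell_fraction(bdev, event):
    """Fraction f of cells carrying the event, under ideal diploid mixture models."""
    if event == "loss":     # minor-allele BAF = (1-f)/(2-f), so Bdev = f / (2(2-f))
        return 4 * bdev / (1 + 2 * bdev)
    if event == "gain":     # minor-allele BAF = 1/(2+f), so Bdev = f / (2(2+f))
        return 4 * bdev / (1 - 2 * bdev)
    if event == "neutral":  # copy-neutral LOH: minor-allele BAF = (1-f)/2, so Bdev = f/2
        return 2 * bdev
    raise ValueError(f"unknown event type: {event!r}")

segment_baf = [0.38, 0.62, 0.40, 0.01, 0.99, 0.61, 0.39, 0.97]  # hypothetical probes
bdev = b_deviation(segment_baf)
for event in ("loss", "gain", "neutral"):
    print(event, round(cell_fraction(bdev, event), 2))
```

The same Bdev therefore implies different cell fractions depending on whether the segment is a loss, a gain or a copy-neutral change, which is why each alteration is classified by copy number state (from LRR) before its cellularity is estimated.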

Mosaic LOY was detected and quantified with the MADloy tool, which calls LOY from the mean LRR of chromosome Y (mLRRY) and from B-deviation-derived measures across subjects50. For each sample, MADloy first normalizes mLRRY by its ratio with the trimmed mean of LRR values in the autosomes; trimming discards regions with copy-number alterations. B-deviation is calculated for the pseudoautosomal regions 1 and 2 (PAR1, 0–2.5 Mb on both Xp and Yp; PAR2, the distal 300 kb of Xq and Yq, at approximately 155 Mb and 59 Mb, respectively) and for the XY transposed region (88–92 Mb on X and 2.5–6.5 Mb on Y). The method is calibrated to detect mosaicism when more than 10% of cells are affected. We then plotted the mLRRY values of males and females. A chromosome Y signal is observed in females because of array background noise and some cross-hybridization. Although the mLRRY signal was variable, numerous males had extremely low mLRRY values, suggesting loss of chromosome Y. We categorized LOY status into three groups according to the magnitude of the decrease in mLRRY, which is expected to reflect XCM/LOY cellularity.
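A simplified Python sketch of the mLRRY normalization and a three-level grouping, not MADloy's actual code. Because LRR is already a log2 ratio, we assume the intensity ratio described above corresponds to a difference of log2 values. The conversion to a cell fraction, 1 - 2^mLRRY, is an idealized relation that ignores the signal compression typical of SNP arrays, and the trimming proportion and category cutoffs are hypothetical (the lower cutoff only mirrors the 10% detection calibration).

```python
def trimmed_mean(values, proportion=0.1):
    """Mean after dropping `proportion` of the values from each tail."""
    xs = sorted(values)
    k = int(len(xs) * proportion)
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)


def normalized_mlrry(lrr_y, lrr_autosomes, proportion=0.1):
    """mLRRY relative to an autosomal reference, on the log2 scale.

    Subtracting log2 values is equivalent to a ratio of intensities
    (assumption about how the normalization maps onto LRR units).
    """
    m_lrr_y = sum(lrr_y) / len(lrr_y)
    reference = trimmed_mean(lrr_autosomes, proportion)
    return m_lrr_y - reference


def loy_fraction(mlrry_norm):
    """Idealized fraction of cells lacking chrY, ignoring LRR compression."""
    return max(0.0, 1.0 - 2.0 ** mlrry_norm)


def loy_category(fraction, cutoffs=(0.10, 0.30)):
    """Hypothetical three-level grouping; the cutoffs are illustrative only."""
    if fraction < cutoffs[0]:
        return "no LOY"
    if fraction < cutoffs[1]:
        return "low LOY"
    return "high LOY"


# Autosomal outliers (e.g. a CNA at +/-5) are removed by the trimmed mean.
norm = normalized_mlrry([-1.0] * 5, [0.0] * 18 + [5.0, -5.0])
print(norm, loy_fraction(norm), loy_category(loy_fraction(norm)))
```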

Bulk transcriptome data

Gene expression was measured with Illumina whole-genome expression BeadChips (HT12v3) in peripheral blood RNA from the EGCUT cohort. Low-quality samples were excluded, as were all probes with primer polymorphisms, leaving 34,282 probes. The expression dataset is publicly available at GEO (Gene Expression Omnibus) under accession number GSE48348 (ref. 45). In this dataset, 11 individuals with LOY and 9 individuals with copy-neutral mCAs were identified. To account for the effect of age on LOY/mCA detection and to maximize statistical power, 32 age- and sex-matched normal samples without LOY or mCA (approximately three controls per case) were selected for the transcriptomic analyses.
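The text does not state how matched controls were chosen. One common approach, shown here only as a hypothetical Python sketch, is greedy nearest-age matching within sex, without replacement; the `Sample` type, field names and tie-breaking by sample ID are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    sample_id: str
    age: float
    sex: str


def match_controls(cases, pool, per_case=3):
    """Greedy nearest-age matching within sex, without replacement.

    Returns {case_id: [controls]}; a case may receive fewer than
    `per_case` controls if the same-sex pool runs out.
    """
    available = list(pool)
    matched = {}
    for case in cases:
        same_sex = [c for c in available if c.sex == case.sex]
        # Closest age first; ties broken deterministically by sample ID.
        same_sex.sort(key=lambda c: (abs(c.age - case.age), c.sample_id))
        chosen = same_sex[:per_case]
        for c in chosen:
            available.remove(c)  # each control is used only once
        matched[case.sample_id] = chosen
    return matched


cases = [Sample("L1", 70, "M")]
pool = [Sample("C1", 69, "M"), Sample("C2", 71, "M"), Sample("C3", 80, "M"),
        Sample("C4", 70, "F"), Sample("C5", 60, "M")]
print([c.sample_id for c in match_controls(cases, pool)["L1"]])
```

Greedy matching is order-dependent (earlier cases get the closest controls); an optimal assignment method would avoid this at the cost of more code.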

The effect of SARS-CoV-2 infection on gene expression was assessed in independent biological triplicates of two cell lines that were either mock-treated or infected with SARS-CoV-2 (USA-WA1/2020): primary human lung epithelium (NHBE) and transformed lung alveolar cells (A549). These data are available at GEO under accession number GSE147507.

Statistical data analyses

Gene expression data were quantile-normalized to the median. To correct for possible unwanted variability, we analyzed the residuals of linear regressions of gene expression on forty multidimensional scaling (MDS) components. Array quality was assessed with the arrayQualityMetrics Bioconductor package, and the genefilter Bioconductor package was used to remove features without annotation and/or with little variation and low signal across samples, leaving 15,592 of the 34,282 probes. Differential expression (DE) analysis between individuals with and without LOY was then performed with the limma Bioconductor package. Genes were considered significantly DE at a false discovery rate (FDR) < 0.05. DE genes at p < 0.001 were selected for Gene Ontology (GO) and KEGG (Kyoto Encyclopedia of Genes and Genomes) enrichment analysis with the clusterProfiler Bioconductor package. Over-representation of DE genes in the gene set obtained from the analysis of SARS-CoV-2-infected cell lines (p < 0.001 and log fold change > 0.5) was tested with Fisher's exact test. Cell-type composition of the 43 individuals with bulk transcriptomic data (11 LOY, 32 normal) was estimated using the ‘xcell’ method implemented in the immunedeconv R package51.
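The two multiple-testing and enrichment steps above can be sketched in Python as follows. We assume the FDR is the Benjamini-Hochberg procedure (limma's default adjustment in `topTable`) and that over-representation was tested one-sided, as with R's `fisher.test(..., alternative = "greater")`; the one-sided upper tail of Fisher's exact test on a 2x2 table equals the hypergeometric tail computed here. Function names are illustrative.

```python
from math import comb


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fisher_enrichment(overlap, de_size, set_size, universe):
    """One-sided Fisher's exact test for over-representation.

    overlap:  DE genes that are also in the infection gene set
    de_size:  number of DE genes
    set_size: number of genes in the infection gene set
    universe: number of genes tested (background)
    Returns P(X >= overlap) under the hypergeometric null.
    """
    total = comb(universe, de_size)
    max_k = min(de_size, set_size)
    tail = sum(comb(set_size, k) * comb(universe - set_size, de_size - k)
               for k in range(overlap, max_k + 1))
    return tail / total


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
print(fisher_enrichment(2, 2, 2, 4))
```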

Associations between mCA or LOY status and clinical data, including blood cell counts and biochemical parameters, were assessed using age-adjusted linear models. All statistical analyses were performed with R version 3.6.3 (http://www.r-project.org).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.