Clinical implications of copy number variations in autoimmune disorders
Abstract
Human genetic variation is represented by the genetic differences both within and among populations, and most genetic variants do not cause overt diseases but contribute to disease susceptibility and influence drug response. During the last century, various genetic variants, such as copy number variations (CNVs), have been associated with diverse human disorders. Here, we review studies on the associations between CNVs and autoimmune diseases and draw two main insights. First, some CNV loci are commonly implicated in various autoimmune diseases, such as Fcγ receptors in patients with systemic lupus erythematosus or idiopathic thrombocytopenic purpura and β-defensin genes in patients with psoriasis or Crohn's disease. This means that when a CNV locus is associated with a particular autoimmune disease, we should examine its potential associations with other diseases. Second, interpopulation or interethnic differences exist in the effects of CNVs on phenotypes, including disease susceptibility, and evidence suggests that CNVs are important for understanding the susceptibility to and pathogenesis of autoimmune diseases. However, many findings need to be replicated in independent populations and different ethnic groups. The validity and reliability of detecting CNVs will improve quickly as genotyping technology advances, which will support the required replication.
INTRODUCTION
Autoimmune diseases are mostly chronic and complex and arise when the immune system turns its antiforeign antigen defenses on normal healthy body components, such as pancreatic β-cells in patients with type 1 diabetes or myelin basic proteins in patients with multiple sclerosis [1,2,3]. Researchers have identified many genes responsible for autoimmune disorders during the last century. For example, since the first evidence supporting involvement of human leukocyte antigens (HLAs) in the pathogenesis of rheumatoid arthritis (RA) [4], the implications of HLA-DR4 subtypes have become central knowledge, and HLA-DR genotyping is used widely to diagnose and treat RA. The association of HLA-B27 with ankylosing spondylitis (AS) remains the strongest among those reported for common complex diseases; people homozygous for HLA-B27 have an odds ratio (OR) of ~100 and those who are heterozygous have an OR of ~50 [5]. Many tolerance checkpoints exist in the immune system to prevent self-components from activating self-reactive lymphocytes. Therefore, multiple defects in genes and molecular pathways are required for an autoimmune disease to emerge by bypassing these tolerance checkpoints. However, the mechanisms that contribute to disease pathogenesis are largely unknown.
Genetic variations, such as single nucleotide polymorphisms (SNPs), make each individual unique and determine our unique disease susceptibilities [6]. In addition, copy number variations (CNVs), which are genetic variations resulting in a coding gene dosage variation, have been recently discovered. CNVs influence interindividual differences in the risk for a disease via several mechanisms that affect gene expression, such as gene disruption or rearrangement [7,8]. CNVs are measured by whole-genome microarray-based comparative genomic hybridization (array-CGH), whole-genome SNP arrays, or whole-genome sequencing. Genome-wide approaches to study genetic variations help discover new genes that influence susceptibilities to various diseases [9,10]. Indeed, genetic variants have been associated with autoimmune disorders [11,12,13,14], which is useful for understanding the pathogenesis and discovering new drug targets [15].
In this review, we focus on CNVs newly identified to be associated with autoimmune diseases through genome-wide association studies (GWASs) as well as target gene studies and discuss their clinical implications.
COPY NUMBER VARIATIONS
What is a copy number variation?
Human genetic variations are the genetic sequence and/or structural differences within and among populations. Many genetic variants do not cause overt diseases but influence disease susceptibility and drug response. Genomic structural variants were long considered background noise in conventional genetic studies because they are difficult to distinguish from experimental errors or biases. Therefore, unverified genomic structural variants detected in the past were removed from the analysis. However, these variants drew the attention of some scientists and have since been recognized as genuine, novel genetic variations called CNVs [16,17,18].
CNVs are inherited or de novo structural variations, including all kinds of genomic variations > 1 kb, such as insertions, deletions, inversions, and translocations (Fig. 1) [19,20]. In total, 353,126 CNVs have been reported in the Database of Genomic Variants (http://dgv.tcag.ca/dgv/app/home) as of October 2014. A study has estimated that ~68% of the human genome is covered by at least one CNV [21]. As 75.6% of exons and 91.2% of transcripts are overlapped by at least one CNV, CNVs could have significant biological implications [21]. The functional consequences of CNVs include changes in protein expression levels and truncation or alterations in protein sequences.
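As an illustration of how a coverage figure such as "covered by at least one CNV" can be computed, the following minimal sketch merges overlapping CNV intervals before summing their lengths, so that no base is counted twice. The interval coordinates, the region length, and the function names are hypothetical and are not taken from the Database of Genomic Variants or from reference [21].

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in base pairs


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so each base is counted once."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(cnvs: List[Interval], region_length: int) -> float:
    """Fraction of a region of the given length overlapped by at least one CNV."""
    covered = sum(end - start for start, end in merge_intervals(cnvs))
    return covered / region_length


# Hypothetical toy data: three CNVs on a 1,000-bp region, two of them overlapping.
toy_cnvs = [(100, 400), (300, 600), (800, 900)]
toy_fraction = covered_fraction(toy_cnvs, 1000)
```

Merging first matters because reported CNVs frequently overlap; summing raw lengths would overestimate the covered fraction.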
Studies using next-generation sequencing (NGS) technology show that nonallelic homologous recombination and nonhomologous end joining are the major mechanisms forming CNVs, whereas retrotransposition contributes only partially [22,23]. More than 99% of CNVs are inherited, whereas the others are generated de novo during meiosis. Van Ommen [24] measured the frequency of de novo large-segment CNVs in human newborns and estimated that the frequency of segmental deletions is 1 in 8 and that of segmental duplications is 1 in 50.
Identifying copy number variations
CNVs are detected using blood- or tissue-extracted DNA. CNV detection methods are categorized into targeted and genome-wide detection approaches [19,25]. Genome-wide approaches, in which the entire genome is scanned for CNVs, include microarray-based and NGS-based analyses (Fig. 2). Microarray-based analyses are divided into two approaches: array-CGH and SNP array analyses. In array-CGH, DNA samples from two individuals (reference and test subjects) are labeled with different dyes and co-hybridized onto DNA array spots representing the entire genome (Fig. 2A) [26]. Relative copy number differences (gain or loss) are calculated using relative signal intensities (test/reference). Most array-CGH DNA arrays are oligonucleotide arrays composed of ~70 oligomers specific for certain genomic locations across the entire genome. Identifying CNVs using SNP arrays is based on comparing the signal intensities from whole-genome SNP genotyping data of a test individual with those of a reference group [27]. Whole-genome sequencing has facilitated the discovery of CNVs (Fig. 2B) [28,29]. Although NGS technology is used primarily to detect single nucleotide variants (or small indels), NGS data can also be an important resource to identify CNVs. Moreover, NGS data can be used to detect much smaller CNVs and define CNV breakpoints more precisely. In particular, inversions and translocations, which cannot be identified precisely by either conventional cytogenetics or array-CGH approaches, are definable by NGS [30].
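The test/reference intensity comparison used in array-CGH can be sketched as follows. This is a minimal illustration, not a published calling algorithm: the log2 transformation is a common convention, and the cutoffs of +0.3 and -0.3 as well as the probe intensities are hypothetical values chosen only for the example.

```python
import math
from typing import List


def log2_ratios(test: List[float], reference: List[float]) -> List[float]:
    """Per-probe log2(test/reference) signal ratio; 0 means equal copy number."""
    return [math.log2(t / r) for t, r in zip(test, reference)]


def call_probe(log2_ratio: float,
               gain_cutoff: float = 0.3,    # hypothetical threshold
               loss_cutoff: float = -0.3) -> str:  # hypothetical threshold
    """Classify one probe as gain, loss, or neutral from its log2 ratio."""
    if log2_ratio >= gain_cutoff:
        return "gain"
    if log2_ratio <= loss_cutoff:
        return "loss"
    return "neutral"


# Hypothetical intensities for four probes (test subject vs. reference subject).
toy_test = [1000.0, 1500.0, 500.0, 980.0]
toy_reference = [1000.0, 1000.0, 1000.0, 1000.0]
toy_calls = [call_probe(x) for x in log2_ratios(toy_test, toy_reference)]
```

With a diploid reference, three copies in the test sample ideally give log2(3/2) ≈ 0.58 and one copy gives log2(1/2) = -1; real data are noisier, which is why practical pipelines segment neighboring probes rather than calling each probe independently.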
However, whole-genome CNV analyses may not be efficient for the validation or clinical translation of a relatively small set of known CNVs. Targeted approaches are more efficient for that purpose. Approaches targeting CNVs include quantitative polymerase chain reaction (qPCR) or Southern hybridization for single target screening and multiplex ligation-dependent probe amplification (MLPA), multiplex amplifiable probe hybridization, multiplex qPCR of short fluorescent fragments, or microsatellite genotyping for concurrent multitarget analyses [19]. The multiplexing ability of MLPA makes it attractive for DNA copy number analyses [31]. However, MLPA has its own shortcomings. For example, as MLPA depends on the length-based discrimination of the ligation products provided by capillary electrophoresis, "stuffer" sequence elements of different sizes are required. Long stuffer elements have potential problems, such as nonspecific hybridization and nonuniform amplification, which hinder widespread use of this technology. Shin et al. [32] developed a stuffer-free multiplex CNV detection method that combines the advantages of MLPA and capillary electrophoresis-strand conformation polymorphism to overcome these limitations (Fig. 3). The emergence of user-friendly and reliable multiplex CNV identification technologies will facilitate CNV-based clinical research and clinical application of the CNVs.
COPY NUMBER VARIATIONS IN AUTOIMMUNE DISORDERS
Genetic variation and human autoimmune disease
Autoimmune disorders can be caused by several different mechanisms, including continuous stimulation of autoreactive lymphocytes by diverse self-antigens or inhibition of the negative regulatory mechanisms for autoimmune lymphocyte growth. More than half of all antigen receptors generated by random V(D)J recombination recognize self-antigens [33]. The proliferation of autoimmune lymphocytes induced by self-antigens is normally blocked by mechanisms that suppress antigen receptor signaling to activate transcription factors and signaling pathways. However, autoimmune disorders emerge if these mechanisms fail, and many studies have reported the genetic diversity in human autoimmune disorders. Selection and regulation of lymphocytes are controlled by cell-signaling events that may vary among individuals, which is likely due to genetic or epigenetic diversity [34].
Owing to technological advances, especially in genome-wide approaches, more than 200 genetic loci or variants have been suggested to be associated with various autoimmune disorders [34]. Although the causal genes have not been identified, the existence of some genes associated with multiple autoimmune disorders suggests that common pathways may exist among them. Genes within the major histocompatibility complex (MHC) are most strongly associated with autoimmune diseases, and other genes likely have smaller, regulatory effects [34].
Associations between CNVs and autoimmune diseases
Several studies have reported associations between CNVs and various autoimmune diseases (Table 1). We review the major CNVs that are consistently associated with autoimmune disorders, although some inconsistencies appear among reports due to technical limitations, and not all of the suggested CNVs have been validated clearly through replicated studies.
Complement component 4
Complement component 4 (C4) is a protein of the complement system, which consists of > 30 proteins in plasma or on cell surfaces. The C4 gene resides in the MHC region as the C4A and C4B isotypes, both with copy numbers of 2 to 6 [35]. Yang et al. [36] studied C4 gene CNVs in 1,241 European and American patients with systemic lupus erythematosus (SLE), their first-degree relatives, and unrelated healthy controls and found that a low copy number of total C4 and C4A is associated with susceptibility to SLE. This association was successfully replicated in a study using 924 Han Chinese patients and 1,007 controls [37]. Kim et al. [14] studied SLE-associated CNVs in East Asians using genome-wide analysis and reported a significant correlation between low C4 copy number and SLE in Korean women.
However, the correlation between low C4 copy number and the risk of SLE is not always significant. Boteva et al. [38] investigated the C4 CNV in two large cohorts consisting of 2,207 subjects of northern and southern European ancestry (1,028 SLE cases and 1,179 controls) to determine whether a partial C4 deficiency is an independent risk factor for SLE. Multiple logistic regression was performed to control for the effects of well-known SLE-associated SNPs and risk alleles. They concluded that a genetically determined partial C4 deficiency is not an independent risk factor for SLE in either population and that the association seems to be caused by an unknown causal variant located somewhere else in the high linkage disequilibrium region with C4 [38]. A low C4B copy number, however, has been associated with RA [39].
Fcγ receptor locus
Human Fcγ receptors (FCGRs) are glycoproteins that bind to the immunoglobulin G (IgG) Fc region. The FCGR genes that encode the FcγRs are located on chromosome 1q23-24. Six classes of Fcγ receptors (FcγRIA, FcγRIIB, FcγRIIA, FcγRIIC, FcγRIIIA, and FcγRIIIB) have been identified in humans [40]. Among them, FcγRIIIB is a functional modulator of neutrophil activation that controls IgG binding [41]. Aitman et al. [42] detected a correlation between the FCGR3B CNV and SLE glomerulonephritis in United Kingdom (UK) families. A study of 161 SLE cases and 312 controls from the UK reported that individuals with fewer than two copies of the FCGR3B gene had a higher risk for SLE (OR, 2.43; p = 0.001) [43]. Subsequent studies have consistently reported a correlation between low FCGR3B copy number (< 2) and SLE [44,45].
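For readers less familiar with how an OR such as those cited above is derived, the following minimal sketch computes an odds ratio and an approximate 95% confidence interval (Woolf's log method) from a 2 × 2 case-control table, taking low copy number (< 2) as the exposure. The counts are hypothetical and do not reproduce the data of any cited study.

```python
import math
from typing import Tuple


def odds_ratio_ci(exposed_cases: int, unexposed_cases: int,
                  exposed_controls: int, unexposed_controls: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio with an approximate 95% CI from a 2 x 2 table (Woolf method).

    All four cells must be non-zero for the standard error to be defined.
    """
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 20 of 100 cases and 10 of 100 controls carry < 2 copies.
toy_or, toy_lower, toy_upper = odds_ratio_ci(20, 80, 10, 90)
```

An interval whose lower bound exceeds 1 indicates that carriers of the low-copy genotype are overrepresented among cases at roughly the 5% significance level.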
However, this association is not consistent in patients with RA. One group reported a correlation between a FCGR3B CNV and RA in Caucasians [46]. Another study of 1,115 patients with RA and 654 controls also reported a correlation between low FCGR3B copy number and RA in Caucasians [47]. However, a third study on a population with Spanish ancestry showed that the FCGR3B CNV is correlated with SLE but not with RA [44]. A recent meta-analysis also showed that low FCGR3B copy number (< 2) is correlated with SLE, but not with RA [48]. These inconsistent findings could be due to different CNV analytical techniques or to the complex nature of RA.
Chemokine CC motif ligand 3 like-1
The chemokine CC motif ligand 3 like-1 (CCL3L1) protein binds to several proinflammatory cytokine receptors such as chemokine receptor 5. The CCL3L1 CNV has been well studied in many immune diseases, including RA [49], human immunodeficiency virus infection [50], SLE [51], and Kawasaki disease [52], but the results are conflicting. McKinney et al. [49] assessed the association of the CCL3L1 CNV with RA and type 1 diabetes mellitus (T1DM) susceptibility in Caucasians consisting of 1,136 RA cases and 1,470 controls from New Zealand (NZ) and the UK. They found that a copy number greater than two was a risk factor for RA in the NZ cohort (OR, 1.34) but not in the smaller UK cohort. Evidence for an association was found in the T1DM cohort (OR, 1.46) and in the combined RA/T1DM cohort (OR, 1.30). However, the Wellcome Trust Case Control Consortium study found no evidence of an association between the CCL3L1 CNV and RA or T1DM in 2,000 cases of each trait and 3,000 shared controls [10]. Thus, no firm conclusion can be drawn about the role of the CCL3L1 CNV in autoimmunity, because the reliability of CNV detection assays remains an issue and the CCL3L1 locus is very complex.
β-Defensin
β-Defensins are small peptides with antimicrobial activities against bacteria, fungi, and viruses; they are encoded by DEFB genes in three main clusters at 8p23.1, 20p13, and 20q11.1 [53]. They also have proinflammatory properties as chemotactic agents for dendritic cells, T cells, and neutrophils [54,55]. Hollox et al. [56] found that an increase in DEFB copy number was significantly correlated with psoriasis in 190 Dutch cases and 303 controls. Impaired induction of the epithelial β-defensins is related to Crohn's disease (CD), and defensins are underexpressed in patients with CD [57]. Fellermann et al. [58] hypothesized in 2006 that this result could be due to a low DEFB copy number and performed genome-wide DNA copy number profiling of the human β-defensin 2 (HBD-2) gene. They found that the copy number distribution in patients with CD was shifted to a lower number compared with the control group (median of four copies for healthy individuals and three copies for patients with colonic CD). They also found that a lower HBD-2 gene copy number was correlated with reduced expression of mucosal HBD-2 mRNA (p = 0.033) [58]. In 2010, Bentley et al. [59] tried to validate, in a larger case-control cohort of Europeans, the findings of Fellermann et al. [58], which was the only report on the association between the DEFB CNV and CD. However, the results were opposite to the previous report. As both studies used the same quantitative PCR assay, the discrepancy is unlikely to have been caused by differences in methodology. It is also unlikely that population stratification caused the discrepancy, as both groups studied similar Caucasian subjects. Bentley et al. [59] suggested that the false-positive findings in the study of Fellermann et al. [58] may have been due to the relatively small, phenotypically ill-defined study population. About 1 year later, Aldhous et al. [60] tried to verify the relationship between the DEFB CNV and CD using a more accurate and reliable paralog ratio test in a larger study population consisting of 1,000 UK CD cases and 500 controls. However, neither of the associations between high and low DEFB4 copy numbers and CD [58,59] was replicated [60]. Nevertheless, the possibility may still exist that the DEFB CNV is generally related to autoimmune disorders. A large study in two Chinese cohorts detected associations of increased DEFB4 copy number with SLE and antineutrophil cytoplasmic antibody-associated small vasculitis [61]. Thus, DEFB genes should be studied for their involvement in diverse autoimmune diseases in various populations.
Immunity-related GTPase family M
The human immunity-related GTPase family M (IRGM) gene induces autophagy as a mechanism to remove intracellular mycobacteria [62], and an association between SNPs near IRGM and CD was detected by a GWAS [63]. McCarroll et al. [64] examined the experimental data of 270 HapMap samples to assess the existence of common CNVs near IRGM and identified a common 20-kbp deletion located 2.7 kbp upstream of IRGM. They genotyped this deletion-CNV in 685 North American case-control samples and found an increased frequency of the CNV (allele frequency, 15%; OR, 1.6; p < 0.01), which has been proposed to be a likely causal variant [65]. However, a study in a Japanese population found no association between IRGM variants and CD, suggesting an influence of population stratification on the pathogenic effects [66]. One alternative explanation is that the causal variant at this locus arose after the European-Asian split.
VPREB1
B-lymphocytes are key players in innate and adaptive immunity, and their impaired function can lead to autoimmune diseases [67]. The pre-B cell receptor (pre-BCR), which is composed of VpreB and λ5, is involved in positive and negative selection of autoreactivity and in shaping the B-cell repertoire [68]. Perturbation of the pre-BCR-mediated checkpoints can contribute to the development of autoimmune disorders. Yim et al. [69] reported that the VPREB1 CNV is associated with susceptibility to RA, observing that the proportion of individuals with fewer than two copies of the VPREB1 gene was significantly higher in the patient group than in the controls. Conversely, the proportion of individuals with more than two copies was significantly lower in the patient group than in the controls [69]. Considering the biological importance of the pre-BCR in immunity and the association between the VPREB1 CNV and RA, more studies based on larger samples may facilitate clinical translation of this CNV.
Combined effects of CNVs associated with autoimmune diseases
Although many CNVs have been associated with various autoimmune diseases, the impact of a single CNV seems to be relatively small. However, a combination of multiple CNVs is logically expected to exert stronger effects on phenotypes. Kim et al. [14] reported that deletion variants at C4, RABGAP1L, and 10q21.3 are all associated with the risk for SLE, with ORs of 1.0 to 1.3. Notably, individuals with deletion CNVs at all three loci had a 5.5 times higher risk for SLE than those who were diploid at all three loci, and individuals with deletion CNVs at two loci had a 1.78 times higher OR than those diploid at both loci [14]. This synergistic effect has also been observed in patients with AS. Jung et al. [13] reported that five deletion CNVs of HHAT (1q32.2), HLA-DPB1 (6p21.3), PRKRA (2q31.2), EEF1DP3 (13q13.1), and 16p13.3 are associated with the risk for AS, with ORs of 1.5 to 2.1. However, individuals with deletion CNVs at four or more loci had an 18 times higher risk for AS than those who were diploid at all five loci, and individuals with deletion CNVs at three and two loci had 12.2 and 7.3 times higher ORs, respectively, than those of diploid subjects [13]. Combining CNV markers with multiplex CNV detection technology will facilitate CNV-based clinical research and application of the CNVs.
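The kind of combined analysis described above can be sketched as follows: each individual is scored by the number of loci carrying a deletion (copy number < 2), and the odds of disease in each score group are compared with those in the group that is diploid at all loci. This is a simplified illustration under stated assumptions; the locus names, genotypes, and counts are hypothetical, and the cited studies may have used different statistical models.

```python
from collections import Counter
from typing import Dict, List

Genotype = Dict[str, int]  # locus name -> copy number

LOCI = ("locus_A", "locus_B", "locus_C")  # hypothetical locus names


def count_deleted_loci(genotype: Genotype) -> int:
    """Number of loci at which the individual carries fewer than two copies."""
    return sum(1 for copies in genotype.values() if copies < 2)


def stratified_odds_ratios(cases: List[Genotype],
                           controls: List[Genotype]) -> Dict[int, float]:
    """OR for each deletion count, relative to individuals diploid at all loci.

    Assumes the reference group (zero deleted loci) contains both cases and controls.
    """
    case_counts = Counter(count_deleted_loci(g) for g in cases)
    control_counts = Counter(count_deleted_loci(g) for g in controls)
    reference_odds = case_counts[0] / control_counts[0]
    return {
        k: (case_counts[k] / control_counts[k]) / reference_odds
        for k in sorted(case_counts)
        if k > 0 and control_counts[k] > 0
    }


def toy_person(n_deleted: int) -> Genotype:
    """Hypothetical genotype with deletions at the first n_deleted loci."""
    return {locus: (1 if i < n_deleted else 2) for i, locus in enumerate(LOCI)}


# Hypothetical cohort, not data from the cited studies.
toy_cases = [toy_person(0)] * 10 + [toy_person(1)] * 10 + [toy_person(3)] * 10
toy_controls = [toy_person(0)] * 20 + [toy_person(1)] * 10 + [toy_person(3)] * 4
toy_ors = stratified_odds_ratios(toy_cases, toy_controls)
```

Using the all-diploid group as a common reference makes the ORs for different deletion counts directly comparable, which is how a dose-dependent or synergistic pattern becomes visible.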
CONCLUSIONS
Results from genomics studies can be used to categorize diseases of previously unknown origin as autoimmune diseases and help identify new pathogenic mechanisms for known diseases. We provide some insight into the associations between CNVs and autoimmune diseases. First, we found that some CNV loci are commonly implicated in various autoimmune diseases, such as Fcγ receptors in SLE and idiopathic thrombocytopenic purpura (ITP) and β-defensin genes in psoriasis and CD. This observation suggests that some autoimmune diseases may share CNVs as common risk factors. Thus, when a CNV locus is associated with a particular autoimmune disease, its potential associations with other autoimmune diseases should be examined. Second, interpopulation or interethnic differences exist in the effects of CNVs on phenotypes, including disease susceptibility. Thus, associations found in one population should be replicated in another population of interest to set up CNV databases containing information on significant CNVs in each population. Meaningful functional studies will follow only after reliable data on population-specific CNVs are produced, which can then be translated into clinical applications.
Although identifying the environmental components that interact with host genetic factors is very important to properly understand autoimmunity and develop preventive and therapeutic measures, evidence suggests that CNVs are important clues for understanding the susceptibility to and pathogenesis of autoimmune diseases. However, many of the findings need to be replicated in independent populations or in different ethnic groups. The validity and reliability of detecting CNVs will improve quickly as genotyping technologies advance, which will support the required replication. In addition, combined interpretation of CNVs and other types of genetic variants may help us understand disease susceptibility and pathogenesis.
Acknowledgments
This study was supported by a grant from the Korean Health Technology R&D Project, Ministry for Health and Welfare, Republic of Korea (HI14C3417).
Notes
Conflict of interest: No potential conflict of interest relevant to this article was reported.