Differential gene expression analysis of ankylosing spondylitis shows deregulation of the HLA-DRB, HLA-DQB, ITM2A, and CTLA4 genes

Ankylosing spondylitis (AS) is a rare inflammatory disorder affecting the spinal joints. Although we know some of the genetic factors that are associated with the disease, the molecular basis of this illness has not yet been fully elucidated, and the genes involved in AS pathogenesis have not been entirely identified. The current study aimed at constructing a gene network that may serve as an AS gene signature and biomarker, both of which will help in disease diagnosis and the identification of therapeutic targets. Previously published gene expression profiles of 16 AS patients and 16 gender- and age-matched controls that were profiled on the Illumina HumanHT-12 V3.0 Expression BeadChip platform were mined. Patients were Portuguese, 21 to 64 years old, were diagnosed based on the modified New York criteria, and had Bath Ankylosing Spondylitis Disease Activity Index scores > 4 and Bath Ankylosing Spondylitis Functional Index scores > 4. All patients were receiving only NSAIDs and/or sulphasalazine. Functional enrichment and pathway analysis were performed to create an interaction network of differentially expressed genes. ITM2A, ICOS, VSIG10L, CD59, TRAC, and CTLA-4 were among the significantly differentially expressed genes in AS, but the most significantly downregulated genes were the HLA-DRB6, HLA-DRB5, HLA-DRB4, HLA-DRB3, HLA-DRB1, HLA-DQB1, ITM2A, and CTLA-4 genes. The genes in this study were mostly associated with the regulation of the immune system processes, parts of cell membrane, and signaling related to T cell receptor and antigen receptor, in addition to some overlaps related to the IL2 STAT signaling, as well as the androgen response. The most significantly over-represented pathways in the data set were associated with the “RUNX1 and FOXP3 which control the development of regulatory T lymphocytes (Tregs)” and the “GABA receptor activation” pathways. Comprehensive gene analysis of differentially expressed genes in AS reveals a significant gene network that is involved in a multitude of important immune and inflammatory pathways. These pathways and networks might serve as biomarkers for AS and can potentially help in diagnosing the disease and identifying future targets for treatment.


Background
Ankylosing spondylitis (AS) is a rheumatic disorder of the axial skeleton that is characterized by inflammation and osteoproliferation, leading to a gradual loss of spinal mobility [1]. First identified in the late 1600s, AS is a type of spondyloarthropathy that predominantly affects young men, with a higher prevalence among white Europeans and a lower prevalence among sub-Saharan Africans [2][3][4].
AS is a highly heritable condition, and familial history has been strongly correlated with its pathogenesis in a number of twin studies [5][6][7]. Since 1973, the HLA-B27 gene, a major histocompatibility complex (MHC) class I allele, has been recognized as a major genetic risk factor for AS [8]. In fact, HLA-B27 frequency in a certain population is correlated with the prevalence of spondyloarthropathies, including AS, among its individuals [9,10]. However, the disease is far from being monogenic in nature. In fact, the more than 100 genetic variants associated with AS account for no more than 30% of its heritability, indicating that much research needs to be carried out in this regard [11].
Despite being associated with several environmental and genetic factors, AS genetics and pathogenesis are poorly understood. The main aim of the current study was to improve the understanding of AS on the genetic level. Thus, in the present study, publicly available gene expression profiles of AS patients were analyzed using several bioinformatics techniques. A gene network was constructed that may serve as an AS gene signature for disease diagnosis and therapeutic target identification.

Data acquisition
The dataset used in the current study was obtained from The National Center for Biotechnology Information's (NCBI) Gene Expression Omnibus (GEO) repository (accession number GSE25101) [12].

Regulation of immune system process
Any process that modulates the frequency, rate, or extent of an immune system process.

Data analysis and pre-processing
Over 10,000 genes were identified as differentially expressed (DE) genes after using the GEO2R software that is available on NCBI's website (https://www.ncbi. nlm.nih.gov/geo/geo2r/). GEO2R facilitates the R-based analysis of GEO data and presents it as a list of genes that is sorted by significance [13]. Further case-control comparison analysis of this gene list was carried out using Microsoft Excel in order to identify the most DE genes in AS, yielding a total of 114 genes. After applying a strict p-value cut-off (p < 0.05), the final gene list consisted of 97 genes, 5 genes were removed for lack of identifier, and hence the final cured list of genes was 92. One of the genes was the HLA gene locus that contained 6 genes, and these 6 genes were each entered separately instead of a one locus. The final gene list contained 97 genes which was used for the final analysis and network construction.

Functional enrichment analysis
Following that, the 97 genes were entered into three databases for gene-gene network and pathway analysis: the Molecular Signatures Database (MSigDB), the geneMA-NIA database, and the Reactome Pathway database.
The MSigDB is a collection of annotated gene sets where a predefined gene set can be examined for overlap with the previously annotated genes in this database https://www.gsea-msigdb.org/gsea/msigdb/index.jsp. It is used with the Gene set enrichment analysis software (GSEA). GSEA is an analytical software that extracts insights from RNA expression analysis using gene sets, which are groups of genes with common function, location, or regulation (https://www.gsea-msigdb.org/gsea/  index.jsp). For this first tool, the 97 genes were compared for overlap with three main gene set categories in MSigDB, the Gene Ontology (GO), Hallmark collection, and the canonical pathways (CP). Similarly, GeneMANIA is an online tool that is used to predict the function of poorly understood genes as well as the genes that are most likely to be functionally similar (http://genemania.org/). For this second tool, the top 20 DE genes were input, in addition to the genes that were most downregulated (fold change < 0.6 and pvalue < 0.05) and the genes that were most upregulated (fold change > 1.24 and p-value < 0.05).
Lastly, the Search Tool for the Retrieval of Interacting Genes (STRING) database, v. 11.0, was used to analyze high-confidence (> 0.7) protein-protein network interactions among the 97 DE genes [19]. The STRING database scores protein-protein interaction information from public sources and complements this data with computational predictions (https://string-db.org/).

Identification of differentially expressed (DE) genes
Scrutinization of the 10,000 DE genes yielded 97 genes with a significant level of DE. All genes were deemed significant based on the parametric p-value only, as none were significant in terms of the adjusted p-value. Genes with no identifier were not included in the analysis.  Pathway analysis (MSigDB overlaps) Figure 1 illustrates the number of genes in each of the results of the overlap with the MSigDB genesets. Tables 1, 2, and 3 specify the genes in the overlaps. Namely, overlaps were most common for the CD3E, IL2RA, CD27, CTLA-4, and ICOS genes. It is interesting to note here that the top overlapping genes were associated with regulation of the immune system processes, parts of cell membrane, and signaling related to T cell receptor and antigen receptor, in addition to some overlaps related to the IL2 STAT signaling, as well as the androgen response.

Gene network analysis (GeneMania)
Three gene networks were constructed: one for the top 20 DE genes (Fig. 2), one for the most significantly downregulated genes (fold change < 0.6 and p-value < 0.05) (Fig. 3), and one for the genes that were most upregulated (fold change > 1.24 and p-value < 0.05) (Fig. 4)

REACTOME pathway investigation
The most significantly over-represented pathways in the AS samples were the "RUNX1 and FOXP3 which control the development of regulatory T lymphocytes (Tregs)" and the "GABA receptor activation" pathways (Fig. 5).

Protein-protein interaction (PPI) network
The STRING database, v. 11.0, was utilized to identify high-confidence protein-protein interactions (PPI) of the most DE genes. With a significantly low PPI enrichment value (4.83e −6 ), it can be ascertained that the submitted query items are at least partially biologically connected as a group (Fig. 6).

Discussion
HLA-B*27 is a major susceptibility allele for ankylosing spondylitis (AS), with over 95% of AS patients expressing this genotype. However, AS heritability is not fully understood, as the vast majority of individuals with the HLA-B*27 genotype do not develop AS [20]. An interplay between genetics and the environment is believed to be involved in AS, and its environmental trigger is thought to be ubiquitous [5]. In addition, the HLA-B*27 accounts for only around 20% of AS heritability, indicating the need for further scientific elucidation [20,21]. In this study, we identified several DE genes, upregulated genes, and downregulated genes in AS. The significantly downregulated genes (p < 0.05; FC < 0.6) in AS were namely several HLA class II genes, the ITM2A gene, and the CTLA-4 gene. The most downregulated genes in AS in the present study were HLA-DQ B1 and HLA-DR (B1, B3, B4, B5) genes as well as the HLA-DRB6 pseudogene. These genes were previously related to the inheritance in multiple sclerosis. For instance, epistasis between the HLA-DRB1 and HLA-DQB1 alleles was found to determine the susceptibility to multiple sclerosis [22]. The HLA-DQB1 gene is responsible for exogenous peptide presentation and encodes one of the two components of the HLA-DQ molecule, which regulates T cell response to islet cell autoantigens [23,24]. In fact, HLA-DQB1 alleles have been identified as susceptibility loci for type I diabetes, celiac disease, multiple sclerosis, and narcolepsy [22,[25][26][27][28]. With regard to AS, HLA-DQB1 was previously associated with the age of onset and severity of AS in a Han Chinese population [29]. However, in Tunisian AS patients, HLA-DRB1 and HLA-DQB1 were not directly linked to the disease [30]. Among HLA-B27-negative AS patients, negative associations were observed for the HLA-DRB1*15:01, HLA-DQB1*02:01, and HLA-DQB1*06:02 alleles, but a positive association was seen for the HLA-DRB1*11 allele [31]. In one large previous study, there was an independent association with variants in the HLA-A, HLA-DPB1, and HLA-DRB1 loci with AS [32]. It is likely that the role of these genes in AS is more complex and population related, and we must keep in mind that our dataset is derived from a Portuguese population. The second most downregulated gene was the integral membrane protein 2A (ITM2A) gene. ITM2A is primarily involved in the cellular differentiation of chondrocytes, myocytes, and osteoblasts as well as cellular autophagy [33][34][35]. In fact, enhanced ITM2A expression inhibits the chondrogenic differentiation of mesenchymal stem cells [36]. Also, ITM2A downregulation has been associated with decreased autophagy of breast and ovarian cancer cells [37,38]. ITM2A was reported to be consistently downregulated in a meta-analysis of four AS case-control datasets [39]. Our results confirm this.
The cytotoxic T-lymphocyte-associated protein 4 (CTLA-4) gene, also known as CD152, was the third most significantly downregulated gene in AS patients. This gene encodes the CTLA-4 surface glycoprotein, an inhibitory molecule that is constitutively expressed by T cells and which maintains tolerance to self-antigens [40]. Although the precise mechanisms are not fully understood, interactions of ligands with CTLA-4 serve to inhibit T cell responses [41]. Klocke et al. created a murine model for CTLA-4 deficiency by breeding mice in which CTLA-4 can be conditionally deleted in adulthood using tamoxifen treatment. The mice developed abnormal immune activation, multiorgan lymphocyte infiltration, and autoantibodies in a short period of time [42]. CTLA-4 polymorphisms have been significantly associated with autoimmune disorders, namely AS, Graves' disease, and rheumatoid arthritis, in British Caucasian, Irish Caucasian, Iranian, West Algerian, and West Mexican populations but not in a Turkish population [43][44][45][46][47]. It is interesting that the HLA genes and the CTLA-4 proteins were connected through the CD3E protein (Fig. 6). In the present study, the genes with the most overlaps in AS patients were the CD3E, IL2RA, CD27, CTLA-4, ICOS, ITGA6, SERPINE2, and CD2 genes. The CD3E gene encodes a part of the T cell receptor-CD3 (TCR/ CD3) complex, which is essential in linking antigen recognition to signal transduction. CD3E has been previously associated with type I diabetes in women, but it did not show a significant association with susceptibility to coeliac disease [48,49]. Moreover, a reduced number of immunoreceptor tyrosine-based activation motifs (ITAMs) in CD3E, among the other TCR/CD3 complex subunits, was associated with a breakdown in central tolerance in a murine model [50]. To our knowledge, CD3E was not previously related to AS.
Similarly, the interleukin 2 receptor subunit alpha (IL2RA) gene, also known as CD25, encodes a part of the IL-2 receptor, a cytokine signaling molecule that is important for microbial defense and self-recognition. In one study, IL2RA polymorphisms did not confer the risk of acute anterior uveitis with AS, while another study found that the IL2RA rs2104286 polymorphism was associated with intermediate uveitis but not HLA-B27associated acute anterior uveitis [51,52]. Transcriptome analysis identified IL2RA as a potential candidate gene for AS, but genotyping assays found no association between IL2RA polymorphisms and non-anterior uveitis susceptibility [53].
The CD27 gene encodes a tumor necrosis factor (TNF) receptor that is thought to act as a co-stimulatory immune checkpoint for T cell generation and maintenance [54]. The CD70 molecule belongs to the tumor necrosis factor ligand superfamily, and its transient availability on lymphocytes and dendritic cells activates B or T cells that express CD27 [55,56]. As part of a surface ligand-receptor pair with CD70, CD27 dysregulation has been linked to inflammatory and autoimmune disease [57]. Clonally expanded T cells belonged to the CD27−/CD28− populations in 2 AS patients, while 18 newly diagnosed AS patients were found to exhibit a decreased frequency of CD27+ B cells compared to controls [58,59].
The inducible T cell co-stimulator (ICOS) gene, also known as CD278, is part of the CD28 superfamily receptors, along with CD28 and CTLA-4. ICOS is essential for T cell function and activation, and it is expressed on activated CD4 and CD8 T cells [60,61]. It was discovered to have important roles in T cell proliferation and cytokine secretion after it was first identified as a marker of T cell activation [62]. ICOS superinduces IL10 and coinduces a number of other cytokines [63][64][65]. ICOS dysregulation has been previously implicated in a murine model of spondyloarthritis, where it was found to be essential to maintaining the balance between IL10 and IL17 expression [66]. B cells express the ICOS ligand (ICOSL) constitutively, which is inducible in monocytes, dendritic cells, fibroblasts, and endothelial cells. Blocking of the ICOS-ICOSL interactions inhibits both T-helper (Th) type 2 and Th1-dependent responses [60]. In an AS context, conflicting findings were reported with regard to CD4+/CXCR5+/ICOS+ T cell percentages in the peripheral blood of AS patients [67][68][69][70].
The integrin alpha-6 (ITGA6) gene, which encodes a transmembrane receptor, mediates adhesion on the cell-to-cell and cell-to-matrix levels, and it is commonly upregulated in a wide range of tumors [71]. In fact, inhibition of the ITGA6 gene led to the suppression of tumor angiogenesis and gallbladder cancer growth [72]. ITGA6 has been previously identified as part of the differential expression profile of rheumatoid arthritis, and it was among the most differentially expressed genes in a murine model of AS [73,74]. Similarly, the CD2 gene encodes a cell adhesion molecule that is expressed on the surfaces of natural killer cells and T cells as well as some subpopulations of normal B cells [75]. In rheumatoid arthritis, high levels of CD2 activation epitope T11-3 were reported on blood and synovial fluid T cells [76].
The serpin family E member 2 (SERPINE2) gene encodes a glycoprotein that is expressed on a diverse range of cell types, but it has a key role in the pathophysiology of cardiac fibrosis [77]. Little information is available on the role of the SERPINE2 gene in AS, but, in osteoarthritis, this gene has been reported to prevent cartilage catabolism via the inhibition of MMP-13 expression [78].
There are some limitations to the present study. Genes were only found to be significantly expressed when considering the parametric p-value and not the adjusted pvalue. Moreover, the analyzed dataset was obtained from a Portuguese population, which might limit the extrapolation of these findings to other populations. We could assume that these patients are ethnically homogenous, as the study by Pimentel-Santos did, but we also do not have information about other diseases that these patients may have had that may have interfered with the results. Thus, the results of our analysis need to be further validated in a different set of patients and their matched controls. Since the research was retrospective in nature, we were unable to collect any additional information about the patients or the microarray characteristics. This research discovered many new genes and networks. However, further studies may be needed to validate the effect of these genes/networks in a clinical setting.

Conclusion
We have described the differential expression profile of AS patients from a previously published dataset using recent analysis techniques. Our data pointed to the involvement of immune system processes and inflammatory processes, as well as other processes and pathways that were not described in the original article. These pathways and networks once validated might serve as biomarkers for AS and can potentially help in diagnosing the disease and identifying future targets for treatment.
Additional file 1: Table 1. List of submitted genes converted into NCBI (Entrez) genes through the MSigDB.
Additional file 2. List of DE genes downloaded from GEO2R.