Pathway analysis of smoking-induced changes in buccal mucosal gene expression

Background Cigarette smoking is the leading preventable cause of death worldwide, and it is the most common cause of oral cancers. This study aims to provide a deeper understanding of the molecular pathways in the oral cavity that are altered by exposure to cigarette smoke. Methods The gene expression dataset (accession number GSE8987, GPL96) of buccal mucosa samples from smokers (n = 5) and never smokers (n = 5) was downloaded from the National Center for Biotechnology Information's (NCBI) Gene Expression Omnibus (GEO) repository. Differential expression was ascertained via NCBI’s GEO2R software, and Ingenuity Pathway Analysis (IPA) software was used to perform a pathway analysis. Results A total of 459 genes were found to be significantly differentially expressed in smoker buccal mucosa (p < 0.05). A total of 261 genes were over-expressed while 198 genes were under-expressed. The top canonical pathways predicted by IPA were nitric oxide and reactive oxygen production at macrophages, macrophages/fibroblasts and endothelial cells in rheumatoid arthritis, and thyroid cancer pathways. The IPA upstream analysis predicted that the TP53, APP, SMAD3, and TNF proteins as well as the drug dexamethasone would be top transcriptional regulators. Conclusions IPA highlighted critical pathways of carcinogenesis, mainly nitric oxide and reactive oxygen production at macrophages, and confirmed widespread injury in the buccal mucosa due to exposure to cigarette smoke. Our findings suggest that cigarette smoking significantly impacts gene pathways in the buccal mucosa and may highlight potential targets for treating the effects of cigarette smoking. Supplementary Information The online version contains supplementary material available at 10.1186/s43042-022-00268-y.


Background
Tobacco smoking is responsible for one in six of all deaths from non-communicable diseases, leading experts to identify tobacco control as the highest-priority public health intervention [1,2]. The prevalence of smoking has fallen around the world over the past three decades, but the absolute number of people who smoke has increased [3]. Despite a coordinated worldwide effort against smoking, there are around 1.1 billion current smokers, and this number is expected to reach 1.9 billion by 2025 if current smoking patterns are maintained [4].
Cigarette smoke contains over 5000 chemicals, of which 98 have been identified as carcinogenic or probably carcinogenic to humans [5]. The plethora of carcinogens in cigarette smoke perturbs biological pathways related to cellular proliferation, inflammation, and tissue injury, with strong links to various types of cancer [6,7]. In cancer patients, cigarette smoking has been associated with an increased symptom burden as well as a reduced efficacy of chemotherapy [6,8].
Smoking-induced differential gene expression has been well-documented in previous studies. In fact, smoking has a characteristic impact on the transcriptome, as it activates inflammatory and oxidative responses, changes airway structures, and alters gene expression across tissue types [9]. Previous studies have shown that cigarette smoking significantly alters the gene expression profiles of adipose tissue, buccal cells, nasal epithelial cells, lung tissue, and whole blood [10,11,12,13,14].
The aim of the current study is to broaden the understanding of the molecular pathways that are altered in buccal mucosa after exposure to cigarette smoke. Gene expression data from smokers and never smokers were analyzed via Ingenuity Pathway Analysis (IPA), which is a web-based software application that identifies new targets within the context of biological systems.

Data acquisition
The microarray dataset investigated in the present study was obtained from the National Center for Biotechnology Information's (NCBI) Gene Expression Omnibus (GEO) repository (accession number GSE8987). This dataset included gene expression data of buccal mucosa samples from smokers (n = 5) and never smokers (n = 5) [15]. Smokers were classified as those who had smoked at least 10 cigarettes per day and who had a cumulative smoking history of at least 10 pack-years [15]. Table 1 shows the gene expression data samples included in the current study.
As per the original study by Sridhar et al., buccal mucosa samples were collected from the study participants by scraping the inside of their mouths with a concave plastic tool with serrated edges. Total RNA was extracted from buccal mucosa samples using TRIzol reagent (Invitrogen, Carlsbad, CA), and RNA integrity was assessed using a denaturing agarose gel. The Affymetrix Human Genome U133A (HG-U133A) Array (Affymetrix, Santa Clara, CA) was then used to profile the gene expression of the extracted total RNA samples [15].
The demographics of the 10 subjects varied with regard to sex, age, and race. Among the 5 smokers, the mean age was 36 years (± 8 years), with 1 male and 4 females. Similarly, the mean age of the 5 never smokers was 31 years (± 9 years), with 2 males and 3 females. In terms of race, the smoker group comprised 3 Caucasians and 2 African Americans, while the never-smoker group consisted of 2 Caucasians and 3 African Americans. Demographic data for individual subjects were not provided in the dataset, but statistical comparisons of the smoker and never-smoker groups yielded non-significant p values for sex (p = 0.42), age (p = 0.36), and race (p = 0.40) [15].

Identification of differentially expressed (DE) genes
The GEO2R software, which is available on the NCBI website, was used to generate differential expression statistics for 15,000 genes in a comparison of smoker and never-smoker buccal samples.
The 15,000 genes were entered into a Microsoft Excel spreadsheet and sorted by significance (Additional file 1: Table S1). After applying strict cut-off criteria (p < 0.05 and absolute fold change between −0.5 and 1.5), the list of DE genes was narrowed down to 459 genes.
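As an illustration of this filtering step, the following Python sketch applies a p-value cut-off and an optional fold-change cut-off to a table of per-gene results and then splits the retained genes by direction of change. It is not the workflow used in this study, which relied on GEO2R and Microsoft Excel. The names GeneResult, filter_de_genes, and split_by_direction, the min_abs_log_fc parameter, and the toy rows are all assumptions made for the example, and the fold-change threshold is deliberately left as a parameter rather than fixed to a value.

```python
from typing import List, NamedTuple, Tuple


class GeneResult(NamedTuple):
    """One row of a differential expression table (hypothetical layout)."""
    symbol: str
    log_fc: float   # log2 fold change, smoker vs. never smoker
    p_value: float


def filter_de_genes(results: List[GeneResult],
                    p_cutoff: float = 0.05,
                    min_abs_log_fc: float = 0.0) -> List[GeneResult]:
    """Keep genes with p < p_cutoff and |log_fc| >= min_abs_log_fc."""
    return [g for g in results
            if g.p_value < p_cutoff and abs(g.log_fc) >= min_abs_log_fc]


def split_by_direction(genes: List[GeneResult]
                       ) -> Tuple[List[GeneResult], List[GeneResult]]:
    """Return (over-expressed, under-expressed) genes by sign of log_fc."""
    over = [g for g in genes if g.log_fc > 0]
    under = [g for g in genes if g.log_fc < 0]
    return over, under


# Toy rows with made-up values; these are NOT taken from GSE8987.
toy = [
    GeneResult("GENE_A", 1.2, 0.001),
    GeneResult("GENE_B", -0.9, 0.020),
    GeneResult("GENE_C", 0.3, 0.400),
    GeneResult("GENE_D", -1.5, 0.049),
]

de = filter_de_genes(toy)
over, under = split_by_direction(de)
print(len(de), len(over), len(under))
```

With the default arguments only the significance criterion is applied; passing a value for min_abs_log_fc adds a magnitude criterion on top of it.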
The Bioconductor package EnhancedVolcano was used to visualize the 459 DE genes in the form of a labelled volcano plot [16].
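The coordinates behind a volcano plot can be sketched in Python as follows: each gene is placed at its log2 fold change on the x-axis and at the negative base-10 logarithm of its p value on the y-axis, so that more significant genes sit higher. This is not the EnhancedVolcano implementation, which is an R package; volcano_point and classify are hypothetical helper names, and the 0.05 default mirrors the significance level stated above.

```python
import math
from typing import Tuple


def volcano_point(log_fc: float, p_value: float) -> Tuple[float, float]:
    """Return (x, y) with x = log2 fold change and y = -log10(p value)."""
    return log_fc, -math.log10(p_value)


def classify(log_fc: float, p_value: float, p_cutoff: float = 0.05) -> str:
    """Label a gene for colouring on the plot (illustrative rule only)."""
    if p_value >= p_cutoff:
        return "not significant"
    return "over-expressed" if log_fc > 0 else "under-expressed"


# Made-up example values, not taken from the dataset.
for log_fc, p in [(1.0, 0.01), (-0.8, 0.03), (2.0, 0.20)]:
    x, y = volcano_point(log_fc, p)
    print(f"x={x:+.2f}  y={y:.2f}  {classify(log_fc, p)}")
```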

Ingenuity pathway analysis (IPA)
The list of DE genes was inputted into IPA software (QIAGEN, Hilden, Germany), where the 'core analysis' function of the software was used to interpret the data in terms of canonical pathways and upstream regulators.

Pathway and functional enrichment analysis
The Bioconductor package clusterProfiler was used to carry out an over-representation analysis of the DE genes [17,18]. Similarly, the SIGnaling Network Open Resource 2.0 (SIGNOR 2.0) was used to explore the signaling networks that exist between the DE genes [19].
Figure 1 displays a volcano plot of the full list of DE genes. However, only 459 genes exhibited significant differential expression, with 261 genes found to be overexpressed and 198 found to be under-expressed. Figure 2 illustrates the chromosomal location, molecular class, and cellular location of the 459 DE genes. Chromosome 1 had the highest number of significantly DE genes (n = 63), followed by chromosome 6 (n = 30), chromosome 2 (n = 29), and chromosome 19 (n = 27). Similarly, the most represented molecular classes among the significantly DE genes were enzymes (19.6%) and transcription regulators (12%). Lastly, the majority of the significantly DE genes were located either in the cytoplasm (40.5%) or the nucleus (25.7%).
Table 2 lists the most significantly DE genes between smoker and never smoker buccal mucosa samples, showing that protein-coding genes occupy the top ranks in terms of significance.
Figure 3A demonstrates the interplay between the DE oncological pathways, cytokines, and genes in smoker buccal mucosa, namely the IL2, EGFR, and ESR2 genes. Other than TIMP3, all the proteins in the pathway were predicted to be inhibited in smoker buccal mucosa. Figure 3B illustrates the results of an interaction network analysis of the DE genes in smoker buccal mucosa. Interestingly, the RPA1 gene was shown to have the highest number of interactions with the other DE genes in smoker buccal mucosa, but it did not have a significant level of differential expression (p > 0.05).

Upstream regulators
The top 20 regulators predicted by IPA included the TP53, APP, SMAD3, and TNF proteins as well as the drug dexamethasone, among other molecules (Table 3). Figure 4 illustrates the data in Table 3 and emphasizes the predicted activation status of the top upstream regulators as revealed by IPA. As can be seen from Fig. 4, the most inhibited upstream regulator in smoker buccal mucosa is predicted to be the TP63 protein.
Dexamethasone was predicted to be a top upstream regulator and affected a total of 78 genes via indirect interactions (Fig. 5A). Likewise, microRNA-8 (miR-8) was found by IPA to be among the top upstream regulators to be activated, as miR-8 targeted 7 of the DE genes between smokers and never smokers (Fig. 5B). Of those genes, 5 (CCND2, ITGAV, QKI, RPS6KB1, and SMAD2) were under-expressed and 2 (BMP2 and CLDN3) were over-expressed.
Further analysis of the top upstream regulator proteins resulted in the construction of gene-gene (Fig. 6) and protein-protein (Fig. 7) interaction networks. Figure 6 shows that 36.04% of the top upstream regulator proteins were predicted to have interactions with one another, 26.19% shared protein domains, and 22.85% were co-expressed. Similarly, Fig. 7 shows that the TP53 and TNF proteins had the highest number of interactions with the other top upstream regulator proteins.

Enriched biological pathways
The most significant canonical pathway was identified as the nitric oxide and reactive oxygen production at macrophages (Table 4).

Correlation of smoker buccal mucosa with other diseases
The DE genes in smoker buccal mucosa are significantly associated with cancer and organismal injury, among other diseases (Table 5).

Pathway and functional enrichment analysis
Figure 8 illustrates the most over-represented biological processes in smoker buccal mucosa. Interestingly, craniosynostosis and fibroid tumors were revealed to be the topmost significantly over-represented biological processes. Figure 9 shows the results of signaling network analysis of the 459 significantly DE genes, with the SMAD2 gene having the most interactions. SMAD2 is directly downregulated by the CTDSPL and SKIL genes and indirectly upregulated by the BMP2 gene.
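The idea behind an over-representation analysis such as the one summarized in Fig. 8 can be illustrated with a one-sided hypergeometric test, which asks how likely it is to see at least the observed overlap between a DE gene list and an annotated gene set by chance. The Python sketch below is a generic illustration and not the clusterProfiler code path used in this study. The function name ora_p_value is hypothetical, and the gene-set size and overlap in the usage example are invented numbers; only the totals of 15,000 genes and 459 DE genes come from the text.

```python
from math import comb


def ora_p_value(universe_size: int, term_size: int,
                de_size: int, overlap: int) -> float:
    """One-sided hypergeometric p value, P(X >= overlap).

    universe_size: number of genes tested in total
    term_size:     number of those genes annotated to the term
    de_size:       number of DE genes drawn from the universe
    overlap:       number of DE genes annotated to the term
    """
    max_overlap = min(term_size, de_size)
    total = comb(universe_size, de_size)
    tail = sum(
        comb(term_size, k) * comb(universe_size - term_size, de_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical term: 100 annotated genes, 10 of which are among the DE genes.
p = ora_p_value(universe_size=15000, term_size=100, de_size=459, overlap=10)
print(f"p = {p:.3g}")
```

In practice, p values from many terms would also need a multiple-testing correction before terms are ranked.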

Discussion
The most significantly differentially expressed (DE) protein-coding genes in smoker buccal mucosa were the CHD5, QKI, BATF3, and IL6R genes, which have previously reported associations with smoking and related diseases.
The CHD5 gene, which is a tumor suppressor gene that is preferentially expressed in the nervous system and testis, was significantly upregulated in smoker buccal mucosa [20,21]. CHD5 is believed to serve as a master regulator in tumor-suppressive networks, and CHD5 expression levels are strongly associated with the prognosis of several cancers, including hepatocellular carcinoma and non-small cell lung cancer [20,22,23,24]. One study found that a rare CHD5 variant, rs12564469-rs9434711, contributed to the risk of hepatocellular carcinoma, a risk effect which was statistically significant in alcohol drinkers but not smokers [25].
The QKI gene contributes to a number of human diseases, including cancers, myelin disorders, and schizophrenia, and it is a critical regulator of alternative splicing in cardiac myofibrillogenesis and contractile function [26]. QKI has also been identified as a master regulator of alternative splicing in human lung cancer cell lines, but no significant statistical association was found between QKI expression and smoking status in lung tumors [27,28]. Moreover, QKI was identified as a significantly altered gene in the ciliated epithelial cells of lungs affected by chronic obstructive pulmonary disease (COPD), a disease that is primarily caused by tobacco smoking [29].
The BATF3 gene belongs to the AP-1 transcription factor family, whose members respond to a range of pathological and physiological stimuli by mediating gene expression [30]. BATF3 controls the differentiation of dendritic cells, inhibits the differentiation of regulatory T cells, and critically regulates the development of memory T cells [31,32]. BATF3 expression in the lungs was necessary in order to induce protection against allergic airway inflammation through tolerization with Helicobacter pylori extract [33]. Moreover, the acute inhalation of electronic cigarette smoke by healthy never smokers led to the significant upregulation of BATF3, among other genes that play a role in promoting tumorigenesis [34].
The IL6R gene is a pleiotropic regulator of both acquired and innate immune responses, and it is believed to be expressed in the lungs [35]. There have been conflicting findings regarding the benefits of anti-IL-6R therapy for COVID-19-induced acute respiratory distress syndrome [36,37]. In the context of smoking, exposure to cigarette smoke led to increased IL6R mRNA levels in primary bronchial epithelial cell lines [38]. Moreover, a certain IL6R haplotype (rs6684439-rs7549250-rs4129267-rs10752641-rs407239) has been associated with a lower COPD risk in a Mexican Mestizo population, while the IL6R variant Asp358Ala did not show any association with COPD [39,40].
Pseudogene expression was also altered in smoker buccal mucosa, most notably in the upregulation of FMO6P, ZNF259P1, and ZNF702P and the downregulation of ALDOAP2 and PNLIPRP2. FMO6P has significant sequence homology with the FMO3 gene, the latter of which functions to metabolize a small amount of nicotine [41]. A single nucleotide variation in the FMO6P pseudogene, rs6608453, was associated with nicotine dependence in African Americans [42]. Likewise, ALDOAP2 was over-expressed in both healthy and non-healthy smokers compared to non-smokers, while exposure to cigarette smoke resulted in the upregulation of the PNLIPRP2 polymorphic pseudogene in a murine model [43,44]. In contrast, ZNF259P1 and ZNF702P did not have previously reported associations with smoking. ZNF259P1 was significantly correlated with the tumor size of primary lung adenocarcinomas, while ZNF702P was found to be upregulated after BCL2L10 knockdown in two ovarian cell lines [45,46].
Analysis of upstream regulators revealed that the tumor protein 53 (TP53) gene was the most significant upstream regulator in smoker buccal mucosa. TP53 restrains cellular proliferation by guarding against genomic mutation, and TP53 mutations are among the most common genetic alterations in human cancers [47]. Tobacco smoking is known to influence TP53 mutation patterns and frequencies in lung cancer and urothelial cell carcinoma patients [48,49]. In fact, a large proportion of TP53 mutations in the lung cancers of smokers were G → T transversions, a primary mutagenic signature that is caused by DNA damage from tobacco smoke [50].
The most significant canonical pathway identified by IPA was the "nitric oxide and reactive oxygen production at macrophages". Nitric oxide and reactive oxygen species are essential for maintaining redox balance, but they also act in pathological processes [51]. Tobacco smoke contains large numbers of free radicals, including nitric oxide and reactive oxygen species (ROS), that cause oxidative stress on the cellular and sub-cellular levels [52,53]. In turn, smoking-induced oxidative stress activates inflammatory response pathways that produce endogenous ROS at the site of oxidative stress, potentially causing further oxidative damage to that site [53]. Smoking also reduces the production of nitric oxide while also elevating the production of ROS in endothelial cells [54,55]. Smoking-induced ROS production is especially concerning as it may contribute to the progression of endometrial adenocarcinoma [56].
Among the DE genes, those associated with craniosynostosis and fibroid tumors were over-represented in smoker buccal mucosa.
Craniosynostosis, which is caused by the premature fusion of cranial sutures, is the second-most common cranio-facial anomaly [57]. Smoking during pregnancy was associated with an increased risk of craniosynostosis, while exposure to secondhand smoke modestly increased the risk of this birth defect [58]. Maternal smoking impacts cranio-facial development by acting upon variant alleles of the transforming growth factor alpha (TGF-α) gene, and genetic variation of the TGF-α gene is associated with increased risk of cranio-facial defects [59,60].
Fibroid tumors are non-cancerous growths that develop inside or on the uterus and are the most common type of pelvic tumor detected in women [61]. Previous studies that investigated the impact of smoking on fibroid tumors yielded conflicting results. Earlier studies suggested that smoking had a protective effect against fibroid tumors, but subsequent studies have shown either a negative effect or no relationship at all [61,62]. It is worthwhile to note that smoking has been shown to have an anti-estrogenic effect in women, resulting in an earlier natural menopause as well as protective associations with the risk of estrogen-related cancers [63,64].
Pathway network analysis revealed that the SMAD2 gene had the highest number of interactions with other DE genes, and it was also a target of miR-8. SMAD3 was predicted by IPA to be an inhibited upstream regulator. The SMAD Family Member 2 (SMAD2) gene encodes a protein that is vital for early development, and SMAD2 mutations were associated with complex cranio-facial defects in a murine model [65]. SMAD2, SMAD3, and SMAD4 mediate the signal transduction of transforming growth factor-β (TGF-β) superfamily members, the latter of which induce a range of effects that involve cellular differentiation, proliferation, migration, and apoptosis [66].
The present study is affected by a few limitations. The sample size was relatively small, and the patient samples differed in terms of sex and race, which could confound the interpretation of the genetic variation. Additionally, several differentially expressed genes in smoker buccal mucosa were uncharacterized or unmapped to pathways, meaning that their effects are not considered in the current analysis.

Conclusion
The current findings underscore the importance of the inflammatory response and oxidative stress as major components of smoking-induced tissue injury. Most notably, nitric oxide-related inflammation is one of the canonical pathways underlying the genetic and molecular changes associated with exposure to cigarette smoke. Future lines of research should focus on validating the results of the current study in a larger population to ascertain potential therapeutic targets in the context of smoking-induced damage.