Comprehensive analysis of an mRNA co-expression network and a ceRNA network reveals potential prognostic biomarkers in oral squamous cell carcinoma

Background Oral squamous cell carcinoma (OSCC) is a prevalent and aggressive oral cancer with a poor prognosis. Its polygenic risk is likely influenced by complex transcriptional disorders involving networks of co-expressed and functionally related genes, though such investigations are limited in OSCC. Methods We analyzed the GSE37991 dataset, comprising 40 OSCC and 40 normal oral tissue samples from the Gene Expression Omnibus. Tumor-specific modules were identified using weighted correlation network analysis (WGCNA), leading to the selection of hub mRNAs and lncRNAs. These lncRNAs were used to construct lncRNA–mRNA and competing endogenous RNA networks. We further examined the expression profiles and survival data of these genes from the Cancer Genome Atlas. Prognostic markers were identified and validated through 5-year survival analysis and Cox proportional hazards modeling. RT-qPCR was used to validate the expression levels in clinical OSCC tissues. Results We identified 1847 differentially expressed genes in OSCC tissues. WGCNA revealed four OSCC-specific modules, screening 120 hub mRNAs and five hub lncRNAs. Two prognostic markers (AQP5, IL-26) from hub mRNAs and three (FRMD5, INHBB, GUCY1A3) from the lncRNA–mRNA network were associated with survival. Validation showed lower expression of AQP5 and GUCY1A3, and higher expression of FRMD5 and INHBB in OSCC compared to normal tissues.


Introduction
Oral squamous cell carcinoma (OSCC) is one of the most aggressive malignancies, characterized by a high incidence rate, locoregional metastases, and resistance to existing treatments.These factors contribute to a poor prognosis for patients with OSCC [1][2][3].OSCC is highly prevalent and primarily linked to tobacco and excessive alcohol use and infection with the human papillomavirus (HPV) [4][5][6][7][8].The American cancer society estimates for oral cavity and oropharyngeal cancers in the United States for 2024 are about 58,450 new cases and 12,230 deaths.Squamous cell carcinoma (SCC) accounts for 90% of head and neck cancers, originating from the epithelial lining of the oral cavity, pharynx, and larynx.OSCC is more prevalent in men and adults over 50, with the highest incidence in South and Southeast Asia due to area nut consumption.The global incidence of OSCC is rising, especially among younger populations, with a projected 30% annual increase by 2030, driven by lifestyle changes and the growing prevalence of HPV-related oropharyngeal cancer.Various factors, from unhealthy behaviors to viral infections, contribute to the complexity and heterogeneity of OSCC pathogenesis.While, large-scale genomics [9], transcriptomics [10], and epigenomics [11] have identified potential molecular targets, further investigation is warranted to explore the networks of coexpressed genes associated with OSCC prognosis, given its nature as a multi-step chronic disease.
In the era of big data, high-throughput data analysis and prioritized information network screening are emerging trends in cancer research.Unlike previous studies focusing on individual molecules and pathways, high-throughput data analysis emphasizes networks of co-expressed and functionally related genes.Various tools and algorithms, such as weighted correlation network analysis (WGCNA), have been developed to identify key targets.WGCNA, a systemic biological algorithm, constructs coexpression gene networks using high-throughput gene expression profiles to identify highly synergistic gene sets [12].This tool effectively compares differentially expressed genes (DEGs) and identifies specific genes in cancer tissues, facilitating the exploration of gene interactions in functional networks throughout cancer progression.WGCNA has been widely applied to various cancers [13][14][15][16], including OSCC [17].
With the vast amounts of RNA data uncovered through contemporary biological research, new terms and concepts, such as competing endogenous RNAs (ceRNAs), continually emerge.CeRNAs represent a regulatory mechanism describing the interactions between RNAs, including pseudogene transcripts, long noncoding RNAs (lncRNAs), circular RNAs (circRNAs), and mRNAs.They regulate gene expression by competing with microRNA response elements (MREs) [18].Dysregulation of the ceRNA network is intricately linked to the initiation and progression of various cancers [19][20][21].Investigating ceRNA interactions offers novel insights into cancer pathogenesis and potential therapeutic approaches.
In this study, we used a comprehensive and stepwise approach to uncover the potential molecular mechanisms of OSCC progression.Initially, we utilized GSE37991 microarray data to screen DEGs between tumor tissues and normal controls, identifying their biological functions and related pathways through gene ontology (GO) and Kyoto Encyclopedia of genes and genomes (KEGG) analyses.We then applied WGCNA to identify tumor-specific modules and their core genes.Within these modules, we focused on lncRNAs to construct a competing ceRNA co-expression network.We meticulously analyzed the interactions between key lncR-NAs and mRNAs, shedding light on the roles of lncRNAs in OSCC oncogenesis.To evaluate, clinical relevance, we conducted survival analysis and univariate and multivariate Cox proportional hazard regression analyses on the screened mRNAs and lncRNAs.Finally, we validated the expression levels of hub genes in clinical OSCC patient tissues.The workflow of this study is shown in Figure S1.This study may led a light on the five hub genes in the pathogenesis of OSCC, and might be beneficial to further molecular mechanisms investigation and development of targeting therapeutics for OSCC.

Data acquisition and preprocessing
The expression profile GSE37991 was downloaded from the GEO database (www.ncbi.nlm.nih.gov/ geo/), and was measured using Affymetrix Human Genome U133 Plus 2.0 Array (Affymetrix, Santa Clara, CA, USA).GSE37991 contains 40 OSCC samples and 40 normal oral tissues samples.Background adjustment, quantile normalization, and summarization of the raw data were conducted using the R software Affy package [22].The R software limma package [23] was used to identify DEGs using the empirical Bayes method.Genes with p < 0.05 and |log (fold change, FC)|> 1 were considered statistically significant DEGs.

Functional enrichment analysis of DEGs
To understand the biological significance of the upregulated and down-regulated genes, we used the database for annotation, visualization and integrated discovery (DAVID) for gene ontology (GO) analyses, covering biological process (BP), cellular component (CC), and molecular function (MF), as well as KEGG enrichment analyses [24].Enrichment was considered significant when p < 0.05.

WGCNA module construction and visualization
WGCNA is a systems biology algorithm that identifies correlation patterns among genes in high-throughput data, effectively revealing modules and key hub genes [12].In this study, we selected genes with p < 0.05 and FC > 1.5 or FC < 0.667 for WGCNA to reduce computational complexity and enhance data signals without losing important information [25].After screening for outliers in cancerous and normal tissues using WGCNA, we constructed co-expression networks using soft thresholds (scale-free network structure R 2 = 0.85) for both cancerous and normal samples.Cancer set-specific modules were selected for further analysis.The top 30 genes with the highest intramodular connectivity within these modules were considered hub genes, and weighted gene coexpression networks was constructed by Cytoscape [26].

Construction of lncRNA-mRNA co-expression network
Human lncRNA data were sourced from the HUGO Gene Nomenclature Committee (HGNC).LncRNAs within each module were filtered by Entrez ID.Coexpression relationships between lncRNAs and mRNAs were used to construct lncRNA-mRNA co-expression network, which was visualized with Cytoscape.

Survival analysis and univariate analysis
Fragments per kilobase million (FPKM) data for head and neck tumors were obtained from TCGA (https:// portal.gdc.cancer.gov/) database.A 5-year survival analysis was performed after log2 conversion including 120 hub genes (top 30 genes from each tumor-specific module) and genes from the lncRNA-mRNA co-expression networks.Cases with overall survival (OS) > 1 month were analyzed to avoid including patients with advanced tumors at diagnosis, which may skew results.Patients were classified into high and low expression groups based on median gene expression levels, and genes detected in < 50% of samples were excluded.Univariate Cox regression analysis, conducted with the survival package [29], identified genes with prognostic value.

Cox proportion hazard model multivariate analysis
To refine the prognostic gene signature, Cox Proportional Hazards Model Multivariate Analysis was performed using the survival R package.Both hub genes and genes from the lncRNA-mRNA co-expression networks were analyzed separately.The final model, containing only significant prognostic genes (p < 0.05), was visualized using the survminer R package [30].The analysis started with all included genes and sequentially eliminated those not statistically significant.

Identification of DEGs and gene functional enrichment analysis
A total of 1847 DEGs were identified between OSCC and normal tissue samples, with 697 up-regulated and 1150 down-regulated (Figure S2).GO and KEGG enrichment analyses of these genes were performed using DAVID, with the top ten terms (p < 0.05) were shown in Fig. 1.GO analyses revealed that up-regulated genes were mainly associated with processes crucial to development, such as collagen and extracellular matrix (ECM) tissue decomposition, cell adhesion, and the inflammation (Fig. 1A-C).Down-regulated genes were primarily related to muscle contraction and exogenous metabolic processes (Fig. 1E-G).KEGG pathway analysis showed that up-regulated genes were enriched in pathways related to ECM receptor interaction, PI3K-AKT signaling, cytokinesreceptor interactions, amoebiasis, rheumatoid arthritis and systemic lupus erythematosus (Fig. 1D).Down-regulated genes were enriched in pathways such as salivary secretion, tight junctions, calcium signaling, PPAR signaling, metabolic pathways, and cytochrome P450 enzymes drug metabolism, retinol and tyrosine metabolism (Fig. 1H).

Construction and analysis of WGCNA cluster tree
WGCNA analysis was performed using genes with |log(FC)|> 0.585 and p < 0.05.For the scale-free network structure, soft thresholds were chosen when the initial R 2 value exceeded 0.9 (Fig. 2A for normal and Fig. 2B for tumor).Co-expression modules were calculated, with different colors representing gene modules and gray indicating unmerged genes.Seventeen gene modules were identified in normal tissue (Fig. 2A) using a soft threshold of 11 and a minimum module size of 30.Twenty gene modules were identified in OSCC tissue (Fig. 2B) using a soft threshold of six and a minimum module size of 30.
Additionally, we conducted a repeated analysis within each module of the two cluster trees (Fig. 3A).The OSCC modules in light cyan, green, cyan, and salmon did not overlap with any normal modules, suggesting that these modules are specific to OSCC.Using intramodular connectivity to identify key genes, we selected the top 30 as hub genes and visualized them with Cytoscape (Fig. 3B,C).

Selection of lncRNAs from tumor-specific modules
We selected lncRNAs from the hub sub-network to construct a lncRNA-mRNA co-expression network (Fig. 4A-D).The following lncRNAs were chosen for further analysis: LOC441426 (cyan module), TTTY2 (light cyan module), PART1 (salmon module), MEG3 and C10orf85 (green module).These lncRNAs were used to build the lncRNA-mRNA co-expression network.

Construction of a ceRNA network
CeRNAs regulate microRNA-induced gene silencing by binding to microRNAs through microRNA response elements (MREs), thus serving as a communication network among RNA transcripts.We predicted interactions between lncRNAs and microRNAs, as well as between microRNAs and mRNAs.Overlapping predictions within these interactions and the lncRNA-mRNA co-expression network were identified, and an lncRNA-microRNA-mRNA ceRNA network was constructed (Fig. 5).

Expression of five hub genes in clinical OSCC tissues
We analyzed the expression levels of five hub genes in three normal and four OSCC tissue samples.The results are shown in Fig. 6B.AQP5 and GUCY1A3 were found to be expressed at lower levels in OSCC tissues as compared to the normal samples, while FRMD5 and INHBB showed higher expression levels in OSCC samples.IL-26 did not exhibit a significant difference between OSCC and normal tissues.These findings suggest potential new marker genes for OSCC tumorigenesis.

Discussion
OSCC is a prevalent and aggressive malignancy with a 5-year survival rate of 50% [31][32][33][34].Despite advancements in surgical methods, radiotherapy, and chemotherapy improving patient quality of life, the prognosis for OSCC remains poor [2].Thus, exploring its molecular mechanisms is crucial for developing effective prevention and treatment strategies.In this study, WGCNA was employed to identify OSCC-specific modules, revealing 120 hub mRNAs and five hub lncRNAs.We constructed lncRNA-mRNA and ceRNA co-expression networks using these hub lncRNAs.By analyzing FPKM data from the TCGA database and applying survival analysis with the Cox proportional hazard model, we validated these hub mRNAs and lncRNA-mRNA network genes.Notably, AQP5 and IL-26 (from hub mRNAs), and FRMD5,  INHBB, and GUCY1A3 (from the lncRNA-mRNA network) were significantly associated with patient prognosis, suggesting their potential as prognostic biomarkers for OSCC.This study enhances our understanding of OSCC's molecular mechanisms and could lead to more precise diagnostic methods and targeted therapies.
AQP5, a member of the aquaporin (AQP) family of membrane proteins, facilitates water and glycerol transport and is primarily expressed in secretory glands and certain epithelial tissues [35][36][37].Its overexpression is associated with increased cancer cell proliferation, metastasis, and migration in various cancers [38][39][40][41].In OSCC, AQP5 is down-regulated compared to normal tissues and exhibits high intramodular connectivity in the OSCC regulatory network.Cox proportional hazard model analysis identifies AQP5 as an independent prognostic factor for OSCC, suggesting it plays a significant role in OSCC progression.Furthermore, AQP5 may  dendritic cells (pDCs) to secrete type I interferon [44][45][46].Its role in cancer is debated: it can inhibit colorectal cancer cell proliferation [44] or enhance gastric cancer cell survival [47].In this study, IL-26 was down-regulated in OSCC and associated with poor prognosis.While, IL-26 can stimulate type I interferon secretion and potentially activate dormant cancer stem cells, its diminished expression might reduce drug efficacy [48] and worsen survival outcomes.However, the lack of a significant expression difference between OSCC and normal tissues may be due to the small sample size, and further research with additional samples is ongoing.
Cox proportional hazards model multivariate analysis revealed that FRMD5, INHBB, and GUCY1A3, identified from the lncRNA-mRNA network, are significant prognostic factors for OSCC.FRMD5, a member of the FERM protein family, was first identified as a downstream target of the mutant p53R273H protein [49].It is highly expressed in colorectal cancer and predicts poor prognosis [50], a finding consistent with our OSCC results.INHBB is a cytokine in the TGF-β family [51].Overexpression of INHBB is associated with poor survival in OSCC.INHBB forms activin B, which promotes OSCC metastasis by regulating EMT-related genes [51].GUCY1A3, which encodes the α1 subunit of soluble guanylate cyclase [52], has limited research linking it to cancers.One study [53] showed that GUCY1A3 overexpression increases cyclic GMP production, promoting angiogenesis in gliomas.Our study found GUCY1A3 down-regulated in OSCC, with low expression predicting poor prognosis.Additionally, the lncRNA C10orf85 regulates FRMD5, INHBB, and GUCY1A3.Although the precise role of C10orf85 and its regulatory mechanisms with these genes in OSCC are not fully understood, these findings suggest it is a key player.
We also constructed a ceRNA network with five hub lncRNAs (LOC441426, TTTY2, PART1, MEG3, and C10orf85).PART1, MEG3, and C10orf85 regulate top hub mRNAs through various miRNAs.For instance, PART1 positively regulates SLC38A3 and PANK1 in the salmon module, while ITGA11 in the green module is co-expressed with lncRNA C10orf85 and MEG3.These insights provide new strategies for exploring OSCC pathogenesis.
We acknowledge several limitations in our study that should be considered when interpreting the results.Although we performed extensive data collection and computational analysis, the findings are primarily based on data mining techniques and may require further validation.Specifically, while we validated the expression of five hub genes in clinical samples compared to normal tissues, these results have not been confirmed through experimental studies such as in vivo animal models or functional assays.Additionally, the number of patient samples tested is limited.To robustly establish the significance of our findings, it is essential to conduct comprehensive experimental research, including cellular and animal model studies.Such validation is crucial for corroborating our results and understanding their biological relevance in the context of OSCC.

Conclusions
In summary, we used WGCNA to identify tumorspecific modules and hub genes in OSCC.We constructed a functional competing endogenous RNA (ceRNA) network based on these modules and identified key genes-AQP5, IL-26, FRMD5, INHBB, and GUCY1A3-that are associated with OSCC patient survival and validated the differential expression levels of these genes in patient and normal tissues.The findings may suggest that these genes could serve as novel biomarkers for prognostic assessment in OSCC and may offer new targets for therapeutic intervention.

Fig. 1
Fig. 1 Gene enrichment analysis of up-regulated and down-regulated DEGs.GO enrichment analysis of up-regulated DEGs: A Biological Process (BP); B Cellular Component (CC); C Molecular Function (MF).KEGG analysis of up-regulated DEGs: D. GO enrichment analysis of down-regulated DEGs: E BP; F CC; G MF. KEGG analysis of down-regulated DEGs: H.The x-axis shows the GeneRatio of each term and the y-axis shows the GO and KEGG pathway terms

Fig. 2
Fig. 2 WGCNA analysis.Analysis of the scale-free fit index for various soft-thresholding powers for A Normal and B Tumor.C, D Clustering dendrograms of genes for tumor and normal.Different colors are used to represent gene modules, where gray represents genes that cannot be merged

Fig. 3 Fig. 4
Fig. 3 Identification of hub gens.Module comparisons: A Comparison of tumor and normal modules.The vertical axis represents tumor modules, while the horizontal axis represents normal tissue modules.The depth of red indicates the degree of overlap between modules, and the numbers in each rectangle show the number of overlapping genes.Top 30 hub genes: B, C The top 30 hub genes in tumor-specific modules (Cyan, Green, Lightcyan, and Salmon) are shown.Nodes represent hub genes, with node size reflecting intramodular connectivity, edge thickness indicating weight, and node color representing log(Fold Change).The top five hub genes are centrally located in the network

Fig. 5
Fig. 5 Construction of ceRNA co-expression network.A LncRNA C10orf85; B LncRNA MEG3; C LncRNA PART1; The rhombus represents lncRNA, the round represents RNA, and the V-shaped represents microRNAs.Dark blue indicates down-regulation and red indicates up-regulation.The color of the microRNA node is meaningless

Fig. 6
Fig. 6 Expression of five hub genes in clinical OSCC versus normal tissues.A Overall survival analyses of five hub genes in the OSCC TCGA dataset.B Expression levels of AQP5, IL-26, FRMD5, INHBB and GUCYLA3 in OSCC tissues were detected by RT-qPCR.Statistically significant genes are highlighted in the figure, with red lines indicating high expression and blue lines indicating low expression