In silico analysis of non-synonymous single nucleotide polymorphisms of human DEFB1 gene

Single nucleotide polymorphisms (SNPs) contribute substantially to differences in individuals' susceptibility to disease, so it is important to distinguish potentially harmful SNPs from neutral ones. Defensins are small cationic peptides with antimicrobial and immunomodulatory activity, and SNPs in the β-defensin 1 gene (DEFB1) have been associated with several diseases. In this study, we used several computational methods to identify deleterious SNPs of the DEFB1 gene that may affect disease susceptibility. Non-synonymous SNPs (nsSNPs) of DEFB1 that could affect protein structure and function were identified with several in silico tools: SIFT, PolyPhen v2, PROVEAN, SNAP, PhD-SNP, and SNPs&GO. The nsSNPs predicted to be potentially deleterious were then analyzed further with I-Mutant and ConSurf. Post-translational modifications affected by the nsSNPs were predicted with ModPred, and gene-gene interactions were studied with GeneMANIA. Finally, the nsSNPs were submitted to Project HOPE. Ten nsSNPs of the DEFB1 gene were found to be potentially deleterious: rs1800968, rs55874920, rs56270143, rs140503947, rs145468425, rs146603349, rs199581284, rs201260899, rs371897938, and rs376876621. I-Mutant predicted that rs140503947 and rs146603349 decrease protein stability, and ConSurf analysis showed that the SNPs are located in conserved regions. The physicochemical properties of the polymorphic amino acid residues and their effects on structure were determined with Project HOPE. This study identifies high-risk deleterious nsSNPs of β-defensin 1 and may improve understanding of how these mutations affect the structure and function of the β-defensin 1 protein.


Background
Single nucleotide polymorphisms (SNPs) are among the most common types of genomic sequence variation and can alter disease outcomes. Over a million SNPs are known; they occur in coding regions as well as in introns and intergenic sequences that are not directly translated into amino acids. Missense non-synonymous SNPs (nsSNPs) are of particular interest because a single nucleotide change results in a different amino acid, which may affect the function of the encoded protein and the disease outcome. Growing interest in the relationship between genetic sequence variation and protein structure and function has led to the development of several in silico methods. Experimental techniques for assessing the effects of multiple nsSNPs are costly, laborious, and time-consuming; in silico approaches can therefore serve as preliminary tools for analyzing the effects of nsSNPs.
Antimicrobial peptides (AMPs) are important host defense peptides that act at the front line of innate immunity, displaying broad-spectrum activity against microorganisms including bacteria, viruses, fungi, and protozoa [1]. Human β-defensins are AMPs that not only protect mucosal membranes from microbial attack but also link the two arms of the immune system, innate and adaptive immunity, by performing numerous immune-related functions: activating immune cells, promoting their proliferation and migration to the site of infection, balancing cytokine production, and regulating repair and wound healing [2]. Defensins therefore play a pivotal role in maintaining physiological homeostasis and guard the tissues of the oral cavity, respiratory tract, skin, and digestive and genitourinary tracts against pathogens [3]. The antimicrobial action of defensins depends on their cationic charge and amphipathic structure, which enable them to bind to negatively charged bacterial membranes and permeabilize them, forming pores that kill the bacteria [4]. One of the most important defensins is β-defensin 1 (encoded by the DEFB1 gene), and polymorphisms in DEFB1 have been associated with several diseases, including allergies, cystic fibrosis, and cancer [5]. Multiple SNPs have been identified in DEFB1; such SNPs may modify the expression or activity of β-defensin 1 and could affect individual susceptibility to infection [6]. In this study, we systematically collected missense nsSNPs of the DEFB1 gene and screened them with multiple bioinformatics tools to assess their potential to be damaging.

Dataset collection
The NCBI SNP database (https://www.ncbi.nlm.nih.gov/SNP/) was used to retrieve the SNPs of the DEFB1 gene (accessed 16 May 2020). The primary sequence of the protein encoded by DEFB1 (UniProt accession number: P60022) was obtained from the UniProt database (accessed 16 May 2020). Only missense nsSNPs were selected from the NCBI SNP database, as they alter the amino acid sequence of the encoded protein and can therefore disturb its structure and function.

Prediction of deleterious nsSNPs by various bioinformatics tools
Several online tools were used to identify deleterious missense nsSNPs of the DEFB1 gene. First, the missense nsSNPs selected from the NCBI SNP database were submitted to Sorting Intolerant From Tolerant (SIFT; http://sift.bii.a-star.edu.sg/) and Polymorphism Phenotyping v2 (PolyPhen v2; http://genetics.bwh.harvard.edu/pph2/). SIFT predicts whether an amino acid substitution damages protein function based on sequence alignment and homology [7]. SIFT assigns a probability score for observing the new amino acid at that position: a score less than or equal to the threshold of 0.05 is predicted deleterious, and a score above the threshold is predicted tolerated [7][8][9]. PolyPhen v2 predicts the possible impact of an nsSNP from the sequence, structural, and phylogenetic features of the substitution; its output classifies the substitution as probably damaging, possibly damaging, or benign, together with a numerical score ranging from 0.0 (benign) to 1.0 (damaging) [10].
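The SIFT rule above, combined with the PolyPhen v2 cutoff applied later in this study (SIFT score ≤ 0.05 and PolyPhen v2 score > 0.90), amounts to a simple two-tool filter. The sketch below illustrates that filter under those thresholds; the helper names and the variant scores are hypothetical placeholders, not values from this study or part of either tool's interface.

```python
# Minimal sketch of the SIFT + PolyPhen v2 pre-filter.
# Thresholds come from this study: SIFT score <= 0.05 and PolyPhen v2 score > 0.90.
# The helper names, variant names, and scores below are hypothetical.

SIFT_THRESHOLD = 0.05
POLYPHEN_THRESHOLD = 0.90


def sift_deleterious(score: float) -> bool:
    """SIFT: score <= 0.05 is predicted deleterious, > 0.05 tolerated."""
    return score <= SIFT_THRESHOLD


def passes_prefilter(sift_score: float, polyphen_score: float) -> bool:
    """Retain a variant only if SIFT and PolyPhen v2 both call it damaging."""
    return sift_deleterious(sift_score) and polyphen_score > POLYPHEN_THRESHOLD


def prefilter(variants: dict) -> list:
    """Return the IDs whose (SIFT, PolyPhen v2) score pairs pass both cutoffs."""
    return [vid for vid, (sift, pph) in variants.items() if passes_prefilter(sift, pph)]


example = {
    "variant_A": (0.00, 1.00),  # passes both cutoffs
    "variant_B": (0.03, 0.75),  # SIFT deleterious, PolyPhen v2 below 0.90
    "variant_C": (0.20, 0.99),  # SIFT tolerated
}
print(prefilter(example))  # ['variant_A']
```

Requiring agreement from both tools trades sensitivity for specificity: a variant flagged by only one predictor is dropped before the more expensive downstream analyses.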
The nsSNPs predicted to be damaging by both SIFT and PolyPhen v2 were then submitted to four further web-based tools: Protein Variation Effect Analyzer (PROVEAN; http://provean.jcvi.org), Screening for Non-Acceptable Polymorphisms (SNAP; https://rostlab.org/services/snap/), Predictor of human Deleterious Single Nucleotide Polymorphisms (PhD-SNP; http://snps.biofold.org/phd-snp/phd-snp.html), and SNPs&GO (GO: Gene Ontology; http://snps-and-go.biocomp.unibo.it/snps-and-go/). PROVEAN predicts the effect of a sequence variation on protein function [11]; a PROVEAN score ≤ −2.5 indicates a damaging variant, whereas a score > −2.5 indicates a neutral one. SNAP is a neural network-based method that uses in silico-derived, sequence-based information to classify nsSNPs as damaging or neutral [12]. PhD-SNP uses support vector machines (SVMs) to predict whether a point mutation is a neutral polymorphism or is associated with a human genetic disorder [13]. SNPs&GO predicts whether a variation is disease-associated by combining information derived from the protein sequence, phylogenetic relationships, and the protein's function [14].
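Because each predictor returns its own binary call, the variants can be ranked by how many tools agree. The sketch below shows the PROVEAN cutoff and a per-variant count of deleterious calls across the six tools; the helper names are hypothetical, and the example calls for rs1800968, rs55874920, and rs56270143 are those reported in the Results of this study.

```python
# Minimal sketch: PROVEAN cutoff plus a per-variant count of "deleterious"
# calls across the six predictors used in this study (helper names hypothetical).

PROVEAN_CUTOFF = -2.5
TOOLS = ("SIFT", "PolyPhen v2", "PROVEAN", "SNAP", "PhD-SNP", "SNPs&GO")


def provean_deleterious(score: float) -> bool:
    """PROVEAN: score <= -2.5 is damaging, score > -2.5 is neutral."""
    return score <= PROVEAN_CUTOFF


def consensus(calls: dict) -> dict:
    """Count how many of the six tools call each variant deleterious."""
    return {snp: len(set(tools) & set(TOOLS)) for snp, tools in calls.items()}


# Deleterious calls for three of the ten candidates, as reported in the Results.
calls = {
    "rs1800968": set(TOOLS),
    "rs55874920": {"SIFT", "PolyPhen v2", "PROVEAN", "SNAP"},
    "rs56270143": {"SIFT", "PolyPhen v2", "PROVEAN", "SNAP", "PhD-SNP"},
}
counts = consensus(calls)
unanimous = [snp for snp, n in counts.items() if n == len(TOOLS)]
print(counts)     # {'rs1800968': 6, 'rs55874920': 4, 'rs56270143': 5}
print(unanimous)  # ['rs1800968']
```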

Prediction of the effect of nsSNPs on protein stability by I-Mutant 3.0
The I-Mutant 3.0 tool (http://gpcr.biocomp.unibo.it/cgi/predictors/I-Mutant3.0/I-Mutant3.0.cgi) was used to determine the impact of the nsSNPs on protein stability. I-Mutant 3.0 is trained to predict the effect of an SNP on the free energy change (ΔΔG, abbreviated DDG) from the sequence or tertiary structure of the protein [13,15]. DDG is the change in the Gibbs free energy of folding upon mutation, which can be predicted from the difference between the free energies of the folded wild-type and mutant structures [16]. I-Mutant 3.0 reports a DDG value that falls into one of three categories: largely unstable (DDG < −0.5 kcal/mol), largely stable (DDG > 0.5 kcal/mol), or neutral (−0.5 ≤ DDG ≤ 0.5 kcal/mol).

Phylogenetic conservation analysis of nsSNPs by ConSurf
The evolutionary conservation of amino acid positions in β-defensin 1 was analyzed using the ConSurf web server (http://consurf.tau.ac.il/). ConSurf calculates a conservation score for each amino acid position based on the phylogenetic relations between homologous sequences [17,18]. Scores are reported as grades from 1 to 9, with grade 9 being the most conserved.
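ConSurf also marks highly conserved residues as predicted functional ("f", highly conserved and exposed) or predicted structural ("s", highly conserved and buried), labels that appear in the ConSurf results of this study. The sketch below illustrates that rule. The grade cutoff for "highly conserved" is an assumption (taken here as grade ≥ 8) and should be checked against the ConSurf documentation; the function name is hypothetical.

```python
from typing import Optional

# ASSUMPTION: "highly conserved" is taken to mean grade >= 8; this cutoff is
# illustrative and should be checked against the ConSurf documentation.
HIGH_GRADE = 8


def consurf_label(grade: int, buried: bool, high_grade: int = HIGH_GRADE) -> Optional[str]:
    """Return 's' (structural: conserved and buried), 'f' (functional:
    conserved and exposed), or None for residues below the conservation cutoff."""
    if not 1 <= grade <= 9:
        raise ValueError("ConSurf grades range from 1 (variable) to 9 (conserved)")
    if grade < high_grade:
        return None
    return "s" if buried else "f"


print(consurf_label(9, buried=True))   # s
print(consurf_label(9, buried=False))  # f
print(consurf_label(5, buried=True))   # None
```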

Gene-gene interaction
Following the identification of many disease-associated polymorphisms through genome-wide association studies, there is increasing interest in detecting effects of polymorphisms that arise from interactions with other genetic factors [20]. GeneMANIA (http://www.genemania.org) predicts gene function and provides information such as gene co-expression, co-localization, shared protein domains, and shared pathways [21]. The gene-gene interaction network of the DEFB1 gene was predicted with GeneMANIA.

Prediction of protein structure and mutant analysis by the Project HOPE software tool
Have (y)Our Protein Explained (HOPE) is a web-based service for the automated analysis of point mutations and their effects on protein structure and function [22]. HOPE collects information from various sources, including annotated sequence data from the UniProt database, and generates a detailed report on the effect of each mutation with text, figures, and animations.

Retrieval of nsSNPs from the NCBI SNP database
The nsSNPs of the DEFB1 gene examined in this study were retrieved from the NCBI SNP database. A total of 4024 SNPs were reported for the human DEFB1 gene; of these, 86 were missense, 32 were synonymous, 45 were in the 5′ untranslated region (UTR), 49 were in the 3′ UTR, 2866 were intronic, and the rest were in upstream and downstream transcript regions. Missense nsSNPs were selected for this investigation because deleterious nsSNPs can have structural and functional effects on the protein.
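The number of SNPs in the upstream and downstream category is not stated explicitly, but it follows from the other counts. The short tally below works it out, assuming each SNP is counted in exactly one functional class (database annotations can overlap, so this is an approximation).

```python
# SNP counts for human DEFB1 as reported above (NCBI SNP database, accessed 16 May 2020).
# ASSUMPTION: each SNP is counted in exactly one functional class.
TOTAL = 4024
by_class = {
    "missense": 86,
    "synonymous": 32,
    "5' UTR": 45,
    "3' UTR": 49,
    "intronic": 2866,
}
classified = sum(by_class.values())
upstream_downstream = TOTAL - classified  # "the rest"
missense_share = by_class["missense"] / TOTAL
print(classified, upstream_downstream)  # 3078 946
print(f"{missense_share:.1%}")          # 2.1%
```

Under this assumption, roughly 2% of the catalogued DEFB1 SNPs are missense variants, which is why the screening funnel starts from only 86 candidates.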

Prediction of deleterious nsSNPs
First, the 86 missense SNPs were submitted to SIFT, which predicted 18 nsSNPs to be deleterious (score ≤ 0.05) and the remainder to be tolerated. The nsSNPs were next analyzed with PolyPhen v2. To increase prediction accuracy, the results of SIFT and PolyPhen v2 were combined, and only SNPs with a SIFT score ≤ 0.05 and a PolyPhen score > 0.90 were retained; ten nsSNPs were identified as deleterious by both tools. These nsSNPs were then submitted to PROVEAN, SNAP, PhD-SNP, and SNPs&GO, and the results are shown in Table 1. Eight nsSNPs were predicted to be deleterious by all of the above tools: rs1800968, rs140503947, rs145468425, rs146603349, rs199581284, rs201260899, rs371897938, and rs376876621. rs55874920 was predicted to be deleterious by SIFT, PolyPhen v2, PROVEAN, and SNAP, and rs56270143 by SIFT, PolyPhen v2, PROVEAN, SNAP, and PhD-SNP. These potentially deleterious SNPs were analyzed further with additional tools.

Prediction of effects of nsSNPs on protein stability by I-Mutant 3.0
The I-Mutant server was used to determine whether the selected missense nsSNPs increase or decrease the stability of the β-defensin 1 protein. rs140503947 and rs146603349 had DDG values below −0.5 kcal/mol, indicating that they are largely unstable and decrease protein stability. The I-Mutant results are shown in Table 2.
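The stability calls above follow the three-way DDG classification defined in the Methods, where negative values indicate decreased stability. A minimal sketch of that mapping is shown below; the function name is hypothetical, and the DDG values are illustrative rather than the predictions in Table 2.

```python
# Minimal sketch of the three-way DDG classification used with I-Mutant 3.0.
# Negative DDG = decreased stability, per the thresholds stated in the Methods.

def classify_ddg(ddg_kcal_per_mol: float) -> str:
    """Map a predicted DDG (kcal/mol) to the categories used in this study."""
    if ddg_kcal_per_mol < -0.5:
        return "largely unstable"
    if ddg_kcal_per_mol > 0.5:
        return "largely stable"
    return "neutral"  # -0.5 <= DDG <= 0.5


# Illustrative DDG values only (not the predictions reported in Table 2).
for ddg in (-1.2, -0.5, 0.0, 0.7):
    print(ddg, classify_ddg(ddg))
```

Note that the boundaries are inclusive on the neutral side: a DDG of exactly −0.5 or 0.5 kcal/mol is classed as neutral.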

Evolutionary conservation analysis by ConSurf
Amino acid positions that are essential for correct protein function change more slowly than other residues (i.e., they are evolutionarily conserved), and deleterious mutations occur more frequently at conserved positions [23]. ConSurf analysis showed that the selected nsSNPs are located in conserved regions, and C67S, G57R, P50Q, C44R, C44Y, and C59Y were predicted to lie at highly conserved positions. ConSurf also indicated that the selected nsSNPs may have structural and functional effects on the β-defensin 1 protein: C67S, T58S, and P50Q affect predicted functional residues, and G57R, C44R, C44Y, and C59Y affect predicted structural residues. The results are shown in Table 3.

Prediction of post-translational modification by ModPred
Post-translational modifications (PTMs) play a significant role in protein folding and degradation, regulation of gene expression, and several biological pathways. Because SNPs may affect PTMs, a comprehensive analysis linking SNPs, PTMs, and diseases is needed, and identifying disease-related nsSNPs at PTM sites can improve the interpretation of disease mechanisms [24]. PTMs affected by the nsSNPs were analyzed with the ModPred server. ModPred predicted disulfide linkage sites at the residues affected by C67S, C44R, and C59Y; the results are shown in Table 3.

Gene-gene interaction
It has become important to identify genes carrying specific DNA sequence polymorphisms, each with combinations of wild-type and variant alleles and genotypes, that affect disease susceptibility mainly through interactions with other genetic and environmental factors [25]. GeneMANIA constructs a composite gene-gene functional interaction network; the interaction network of the DEFB1 gene predicted by GeneMANIA is shown in Fig. 1.

Prediction of the effects of amino acid changes on β-defensin 1 protein from Project HOPE
Project HOPE showed how the wild-type and variant amino acids differ in physicochemical properties such as hydrophobicity, charge, and size. Its predictions of these differences, and of the effects of the polymorphic amino acids on the domain and on conservation, are given in Table 4. The Project HOPE 3D predictions of the β-defensin 1 protein (wild-type and variant) are shown in Table 5.
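Project HOPE bases its reports on structural and annotation data; the sketch below is not HOPE's method, only a simple illustration of comparing a wild-type and a variant residue on the three properties named above. It assumes the Kyte-Doolittle hydropathy scale for hydrophobicity, average residue masses as a rough proxy for size, and integer side-chain charges near physiological pH (histidine treated as neutral). The function names are hypothetical.

```python
import re

# Kyte-Doolittle hydropathy index (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Average residue masses (Da), used here only as a rough proxy for size.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13, "T": 101.10,
    "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10, "D": 115.09,
    "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19, "H": 137.14,
    "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
# Approximate side-chain charge near physiological pH (His treated as neutral).
CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}


def parse_change(change: str) -> tuple:
    """Split a one-letter substitution such as 'C44R' into ('C', 44, 'R')."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", change)
    if match is None or match.group(1) not in RESIDUE_MASS or match.group(3) not in RESIDUE_MASS:
        raise ValueError(f"unrecognized substitution: {change!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def compare(change: str) -> dict:
    """Variant-minus-wild-type differences in hydropathy, mass, and charge."""
    wt, pos, mut = parse_change(change)
    return {
        "position": pos,
        "d_hydropathy": round(KYTE_DOOLITTLE[mut] - KYTE_DOOLITTLE[wt], 2),
        "d_mass": round(RESIDUE_MASS[mut] - RESIDUE_MASS[wt], 2),
        "d_charge": CHARGE.get(mut, 0) - CHARGE.get(wt, 0),
    }


for change in ("C44R", "C44Y", "G21R", "C59Y"):
    print(change, compare(change))
```

For C44R, for example, the variant is heavier, carries a positive charge, and is far less hydrophobic than cysteine, which matches the qualitative direction of the HOPE findings for that substitution.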

Discussion
SNPs can act as markers in pharmacogenomics, predicting responsiveness to therapy, and can play a role in disease prognosis. They can also facilitate more tailored, personalized treatment and improve medication strategies. By studying the effects of functional coding SNPs in disease-associated proteins, new compounds can be designed to correct the effects of those mutations in the population [26]. Several in silico studies have predicted nsSNPs as deleterious or neutral in genes such as tumor necrosis factor-α (TNF-α) [27], ATP-binding cassette sub-family B member 1 (ABCB1) [28], B cell lymphoma/leukemia 11A (BCL11A) [29], and interleukin 27 [30]. In this study, various computational approaches were used to predict the deleterious nsSNPs of β-defensin 1. We first screened the missense nsSNPs of the DEFB1 gene with SIFT and PolyPhen v2. The nsSNPs predicted to be deleterious by both tools were then analyzed with PROVEAN, SNAP, PhD-SNP, and SNPs&GO. By comparing the results of these methods, ten nsSNPs were found to be deleterious: rs1800968 (C67S), rs55874920 (T58S), rs56270143 (G62W), rs140503947 (Y35C), rs145468425 (G57R), rs146603349 (P50Q), rs199581284 (C44R), rs201260899 (C44Y), rs371897938 (G21R), and rs376876621 (C59Y). I-Mutant predicted that rs140503947 and rs146603349 decrease protein stability, and ConSurf analysis showed that the SNPs are located in conserved regions. Defensins are amphipathic, carry a net positive charge, and contain disulfide bonds that are important for their molecular function [31]. ModPred results indicated that the SNPs may affect PTMs such as disulfide linkage, amidation, and glycosylation.
When disulfide linkages are affected, native and misfolded defensins may exhibit altered biological functions, affecting their antimicrobial and chemotactic activity [32]. PTMs disrupted by SNPs may affect protein structure and function and could be considered biomarker candidates and therapeutic drug targets [33]. The gene-gene interaction network of DEFB1 predicted by GeneMANIA showed that it interacts with CCR6, a chemokine receptor expressed by immune cells such as dendritic cells and memory T cells. β-defensins recruit these immune cells to the site of microbial invasion by interacting with CCR6 [34]. Because the DEFB1 gene is constitutively expressed by epithelial cells in many tissues, loss of its activity can increase the risk of microbial infection. Another study speculated that the downregulation of the DEFB1 gene in various tumors, such as renal cell carcinoma, prostate cancer, and nonfunctioning pituitary adenoma, is mediated by activation of the phosphatidylinositol 3-kinase (PI3K)/protein kinase B (Akt)/mammalian target of rapamycin (mTOR) (PI3K-AKT-mTOR) pathway through transcriptional regulation of DEFB1 [35]. PI3K signaling alters PAX2 expression, which in turn affects DEFB1 expression, as PAX2 acts as a transcriptional repressor of DEFB1 [35,36]. Deleterious SNPs of the DEFB1 gene may therefore affect the interactions and functions of other genes in the gene-gene interaction network. The physicochemical and molecular properties of variant residues can differ from those of the original residues, and conservation of structure is important for preserving and executing specific functions [37]. This motivated us to use Project HOPE to study how the molecular properties of the wild-type and variant amino acids differ. Project HOPE determined the possible effects of the nsSNPs on residue properties, including size, charge, and hydrophobicity, and predicted their impact on the domain and structure of the β-defensin 1 protein. Because these deleterious SNPs were predicted with computational tools, well-designed experimental and clinical studies are required to examine the effects of these nsSNPs on the structure and function of the β-defensin 1 protein.

Note to Table 3: According to the ConSurf server, "f" denotes a functional residue and "s" a structural residue.

Table 4 (excerpt): Project HOPE predictions for selected nsSNPs

rs146603349 P50Q
Proline is rigid and produces a specific backbone conformation; this variation has the potential to alter that conformation.
Residues closely related to the variant are found at this position in other similar proteins, so this polymorphism is probably not damaging.

rs199581284 C44R
The variant residue is charged, which can affect protein folding. The wild-type residue is more hydrophobic than the variant, so hydrophobic interactions in the protein interior are disrupted.
The residue is buried, and the variant may alter the core organization of this domain.
This variation is likely damaging.

rs201260899 C44Y
The variant residue is bigger than the wild-type residue and might not fit in the core of the protein.
The residue is buried, and the variant might disrupt the core structure of the domain.
This variation is likely damaging.

rs371897938 G21R
The variant residue is charged, which can cause repulsion with charged ligands or other residues.
The new residue is introduced in the signal peptide and may disturb recognition of the signal peptide. Glycine is more flexible than other residues, and this flexibility might be important for the protein's activity; the variant could abolish this function.
Other residues have been observed at this position, so this variation might occur without being deleterious to the protein.

rs376876621 C59Y
The variant residue is bigger than the wild-type residue and might not fit in the core. The wild-type residue is more hydrophobic than the variant, so hydrophobic interactions in the interior are disturbed.
The wild-type residue is buried, and the variant might alter the core organization of this domain.
Only this residue type is found at this position, and this variation is likely damaging.

Conclusion
Several web-based algorithmic tools based on sequence and structural conservation were used to identify deleterious nsSNPs of the DEFB1 gene. Of the 86 missense nsSNPs screened, ten were predicted to be potentially deleterious. Deleterious variants in DEFB1 may impair the protein's role in protecting against microbial attack as well as its immunomodulatory function. Using multiple bioinformatics tools to predict pathogenic nsSNPs can reduce cost and time, but the roles of these nsSNPs must be confirmed experimentally. This in silico study could form the basis for targeting pathogenic sites of the β-defensin 1 protein.