Availability of data and materials
URL links of supplementary files are available in Additional file 1.
Egyptian Journal of Medical Human Genetics volume 21, Article number: 35 (2020)
The 2019-nCoV is a novel SARS coronavirus that was first isolated from three individuals with pneumonia linked to the Wuhan outbreak of severe respiratory illness. The 2019-nCoV is closely related to the original SARS-CoV and is believed to be zoonotic. Genomic analysis has shown that the virus clusters within the Betacoronavirus genus, alongside two other strains derived from bats. At the whole-genome level, it shares 96% identity with another bat coronavirus sample (BatCoV RaTG13). In February 2020, Chinese researchers reported amino acid differences in specific parts of the human and pangolin virus genome sequences; however, whole-genome comparison between the pangolin coronavirus and the 2019-nCoV found at most 92% identity, which has so far not been sufficient to confirm pangolins as the intermediate host.
Vaccines have been produced against several animal coronavirus diseases, including canine coronavirus, infectious bronchitis virus of birds, and feline coronavirus. Previous efforts to develop vaccines against members of the Coronaviridae family that mainly affect humans have targeted the Middle East respiratory syndrome (MERS) and severe acute respiratory syndrome (SARS) coronaviruses. The MERS and SARS vaccines have been tested in animal models, but as of February 2020 no cure or protective vaccine had demonstrated safety and efficacy in humans.
The historical consensus in immunotherapy has been to target only easily accessible extracellular antigens that antibodies can bind. The reason is that antibodies, being of high molecular weight, cannot cross the cell membrane to reach intracellular targets. Consistent with this line of thought, approved therapeutic antibodies mostly target extracellular antigens. More recently, three broad approaches have been used to target intracellular antigens. First, normally intracellular antigens that become externalized in a disease state can be targeted by antibodies or their derivatives. Second, cell-penetrating antibodies or antibody fragments can be engineered, as can antibodies that are expressed intracellularly with the aid of gene therapy. Finally, antibodies that bind major histocompatibility complex class I (MHC-I) complexes on the cell surface can be generated.
With reference to previous virus-related in silico vaccine design studies [6, 7], we designed a new potential vaccine candidate using the main proteinase of the 2019-nCoV as the target protein. The coding sequence of the viral main proteinase was mapped from the full genome, which is publicly available in GenBank (https://www.ncbi.nlm.nih.gov/nuccore/MN908947.3?report=fasta) under the accession number “MN908947.3” (Additional file 1). The sequence, which spans nucleotides 10055 to 10972 of the viral genome, was translated, and the resulting amino acid sequence was used for 3D structural homology modelling. A total of 120 templates were found, and an initial HHblits profile was built following the procedure outlined in Remmert et al.
For the vaccine design itself, we used BCEPred, which predicts the antigenic regions of a protein from individual physico-chemical properties or combinations of them (flexibility/mobility, polarity, hydrophilicity, turns, accessibility, and exposed surface). Combining two or more of these properties has been observed to give better accuracy than any single property. Previous studies have shown that the combination of flexibility, hydrophilicity, exposed surface, and polarity outperforms any other combination at a threshold of 2.38. We therefore selected these properties for B-cell epitope prediction. The peptide with the best epitope properties is a sequence of 15 amino acids (92-DTANPKTPKYKFVRI-106), which gave the highest epitope score of 3.053 (Fig. 1).
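The extraction and translation step can be illustrated with a minimal sketch. It assumes the genome is already available as a plain nucleotide string and that the region is read in frame from its first base with the standard genetic code; the function names and the toy sequence are hypothetical and are not taken from the study's pipeline.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG codon order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def extract_region(genome, start, end):
    """Return nucleotides start..end using 1-based, inclusive coordinates."""
    return genome[start - 1:end]


def translate(nucleotides):
    """Translate an in-frame coding sequence; ambiguous bases raise KeyError."""
    seq = nucleotides.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


# Coordinates quoted in the text: 918 nucleotides, i.e. 306 codons.
START, END = 10055, 10972
region_length = END - START + 1
codon_count = region_length // 3

# Toy genome (hypothetical): positions 3..14 encode the residues D-T-A-N.
toy_genome = "CCGATACCGCTAACGG"
toy_peptide = translate(extract_region(toy_genome, 3, 14))
```

With the real MN908947.3 sequence loaded into a string, `translate(extract_region(genome, START, END))` would give the amino acid sequence passed on to homology modelling, provided the assumptions above hold.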
We further assessed the potential of the predicted B-cell epitope to generate high-affinity antibodies through T-cell epitope prediction. This was done using the SYFPEITHI prediction server (database for MHC ligands and peptide motifs). The tool estimates how strongly an amino acid sequence binds to a defined HLA type. Its algorithms are based on the book “MHC Ligands and Peptide Motifs.” The probability of a peptide being processed and presented is used to predict T-cell epitopes. The predicted T-cell epitope with the highest score is a nonamer spanning the second to the tenth amino acid of the B-cell epitope (TANPKTPKY). This prediction was validated using the IEDB analysis resource consensus tool, another T-cell epitope prediction server. The HLA class II binding regions of the antigenic sequence were predicted using the HLAPred server, which identifies peptides in an antigenic sequence that can bind HLA class I and class II molecules. The HLAPred output shows the HLA class II prediction for four selected alleles in an HTML mapping display format (Fig. 2). The 104-VRI-106 segment of the B-cell epitope was predicted to be a promiscuous binder (Fig. 2), that is, a region that binds many HLA alleles.
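The residue numbering used for the nonamer and for the VRI segment can be checked with a short sketch. It only enumerates contiguous windows of the 15-residue epitope and does not reproduce SYFPEITHI or HLAPred scoring; the helper name `windows` is hypothetical.

```python
EPITOPE = "DTANPKTPKYKFVRI"  # predicted B-cell epitope, residues 92-106


def windows(peptide, size, first_residue=1):
    """Return (start, end, fragment) for every contiguous window of `size`.

    Positions are 1-based and numbered from `first_residue`.
    """
    return [
        (first_residue + i, first_residue + i + size - 1, peptide[i:i + size])
        for i in range(len(peptide) - size + 1)
    ]


# All nonamer candidates, numbered within the epitope (1..15).
nonamers = windows(EPITOPE, 9)

# Tripeptides numbered on the protein scale (epitope starts at residue 92).
tripeptides = windows(EPITOPE, 3, first_residue=92)
```

A 15-residue peptide yields seven nonamers; the second one, spanning positions 2 to 10, is TANPKTPKY, and the last tripeptide on the protein scale is VRI at 104 to 106. A prediction server would score each such window against allele-specific motifs, which this sketch leaves out.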
Viral internalization depends strongly on the glycosylation sites present on viral proteins. N-glycosylation sites on the 2019-nCoV main proteinase were therefore predicted using the NetNGlyc 1.0 prediction tool (Fig. 3). The graph shows the predicted N-glycosylation sites along the protein chain; the x-axis represents sequence position from the amino terminus to the carboxyl terminus. A position whose potential (green vertical line) crosses the threshold (red horizontal line) is predicted to be glycosylated.
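As a rough illustration of what an N-glycosylation candidate site looks like, the sketch below scans a sequence for the consensus sequon N-X-S/T, where X is any residue except proline. This is a plain motif search and not the NetNGlyc method, which scores such sequons with trained neural networks and a threshold; the function name and the toy sequence are hypothetical.

```python
import re

# Lookahead so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """Return (1-based position of N, tripeptide) for each N-X-S/T sequon."""
    seq = protein.upper()
    return [
        (m.start() + 1, seq[m.start():m.start() + 3])
        for m in SEQUON.finditer(seq)
    ]


# Toy sequence (hypothetical): NGS and NKT are sequons, NPT is excluded.
toy_hits = find_sequons("AANGSNPTNKT")

# The 15-residue epitope has a single asparagine, followed by proline.
epitope_hits = find_sequons("DTANPKTPKYKFVRI")
```

A sequon is necessary but not sufficient for glycosylation, which is why a predictor such as NetNGlyc assigns each candidate a potential and compares it with a threshold.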
Sequences and structural motifs in polypeptide chains that are classified and determined by comparative analysis make up a protein’s conserved domains. In molecular evolution, these domains act as building blocks that can be rearranged and recombined to produce proteins with varying functions. Given the importance of conserved domains as evolutionary elements, we determined the level of conservation of the epitope region of the 2019-nCoV main proteinase using the conserved domain database (CDD). The conserved region of the protein extends from the 29th amino acid of the sequence to the last and includes the predicted epitope sequence (Fig. 4).
The physicochemical properties of the final peptide were predicted with the Expasy ProtParam server, which gave a molecular weight of 1778.08 Da and a theoretical pI of 10, indicating an alkaline peptide. The half-life was predicted to be 30 h in vitro in mammalian reticulocytes, > 20 h in yeast, and > 10 h in vivo in E. coli. The aliphatic index was 52.00, which indicates thermostability. The predicted GRAVY score was −0.407, indicating a hydrophilic protein, which is consistent with the 3D structural view of the protein (Fig. 5).
The secondary structure of the antigenic region, viewed in the PyMOL molecular visualizer, is a loop-dominated peptide with no helices (Fig. 5). Our in silico study predicts this region of the 2019-nCoV main proteinase as a potential B-cell epitope for the design of a potent vaccine against the virus, and we therefore recommend the peptide for further in vitro and in vivo studies.
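Two of these quantities can be recomputed directly from the 15-residue sequence. The sketch below assumes average residue masses for the molecular weight and the usual aliphatic index definition (mole percent of Ala, plus 2.9 times Val, plus 3.9 times the sum of Ile and Leu); it is an independent illustration with hypothetical function names, not the ProtParam implementation, and it does not cover pI, half-life, or GRAVY.

```python
# Average residue masses in daltons (amino acid minus one water molecule).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01528  # one water is added back for the free termini


def molecular_weight(peptide):
    """Average molecular weight of a linear, unmodified peptide in Da."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER_MASS


def aliphatic_index(peptide):
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    n = len(peptide)

    def mole_percent(aa):
        return 100.0 * peptide.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


EPITOPE = "DTANPKTPKYKFVRI"
mw = molecular_weight(EPITOPE)
ai = aliphatic_index(EPITOPE)
```

Under these assumptions the sketch returns a molecular weight of about 1778.08 Da and an aliphatic index of 52.00 for the epitope, in line with the values reported above.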
2019-nCoV: 2019 novel coronavirus
SARS-CoV: Severe acute respiratory syndrome coronavirus
MERS: Middle East respiratory syndrome
SARS: Severe acute respiratory syndrome
MHC: Major histocompatibility complex
HLA: Human leukocyte antigen
Wang C, Horby PW, Hayden FG, Gao GF (2020) A novel coronavirus outbreak of global health concern. Lancet 395(10223):470–473. https://doi.org/10.1016/S0140-6736(20)30185-9. PMID 31986257
Chen WH, Strych U, Hotez PJ, Bottazzi ME (2020) The SARS-CoV-2 vaccine pipeline: an overview. Current Tropical Medicine Reports. https://doi.org/10.1007/s40475-020-00201-6
Abramson RG (2017) Overview of targeted therapies for cancer. My Cancer Genome. Available from: https://www.mycancergenome.org/content/molecular-medicine/overview-of-targeted-therapies-for-cancer/
Trenevska I, Li D, Banham AH (2017) Therapeutic antibodies against intracellular tumor antigens. Front Immunol 8:1001. https://doi.org/10.3389/fimmu.2017.01001. PMCID: PMC5563323
Dash R, Das R, Junaid M, Akash MFC, Islam A, Hosen SZ (2017) In silico-based vaccine design against Ebola virus glycoprotein. Advances and Applications in Bioinformatics and Chemistry 10:11–28. https://doi.org/10.2147/aabc.s115859
Shey RA, Ghogomu SM, Esoh KK et al (2019) In-silico design of a multi-epitope vaccine candidate against onchocerciasis and related filarial diseases. Scientific Reports 9:4409. https://doi.org/10.1038/s41598-019-40833-x
Remmert M, Biegert A, Hauser A, Söding J (2012) HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nature Methods 9:173–175
Saha S, Raghava GPS (2004) BcePred: prediction of continuous B-cell epitopes in antigenic sequences using physico-chemical properties. In: Nicosia G, Cutello V, Bentley PJ, Timmis J (eds) ICARIS 2004, LNCS 3239. Springer, pp 197–204
Rammensee H, Bachmann J, Emmerich N et al (1999) SYFPEITHI: database for MHC ligands and peptide motifs. Immunogenetics 50:213–219
Kim Y, Ponomarenko J, Zhu Z, Tamang D, Wang P, Greenbaum J, Lundegaard C, Sette A, Lund O, Bourne PE, Nielsen M, Peters B (2012) Immune epitope database analysis resource. Nucleic Acids Res.
Kobayashi H, Wood M, Song Y, Appella E, Celis E (2000) Defining promiscuous MHC class II helper T-cell epitopes for the HER2/neu tumor antigen. Cancer Res. 60:5228–5236
Larsen MV, Lundegaard C, Lamberth K, Buus S, Lund O, Nielsen M (2007) Large-scale validation of methods for cytotoxic T-lymphocyte epitope prediction. BMC Bioinformatics. 8:424
Marchler-Bauer A et al (2017) CDD/SPARCLE: functional classification of proteins via subfamily domain architectures. Nucleic Acids Res. 45(D):200–203
Gasteiger E, Hoogland C, Gattiker A, Duvaud S, Wilkins MR, Appel RD, Bairoch A (2005) Protein identification and analysis tools on the ExPASy server. In: Walker JM (ed) The Proteomics Protocols Handbook. Humana Press, pp 571–607
We appreciate the leadership of the Laboratory of Cellular Dynamics (LCD), University of Science and Technology of China for the all-round support and academic advisory role. We also acknowledge the strong support from the USTC Office of International Cooperation all through the challenging period of the coronavirus epidemic.
Authors received no funding for this project from any organization.
Authors declare no competing interest.
Durojaye, O.A., Mushiana, T., Cosmas, S. et al. An in silico epitope-based peptide vaccine design against the 2019-nCoV. Egypt J Med Hum Genet 21, 35 (2020). https://doi.org/10.1186/s43042-020-00071-7