In silico prediction of HBD gene variants in the Iranian population

The quantification of hemoglobin A2 (Hb A2; α2δ2) is a valuable test for differentiating α- and β-thal carriers in clinical laboratories. Because HBD (δ-globin) gene variants can reduce Hb A2 levels, they have implications for thalassemia screening programs. The aim of the present study was to predict the consequences of the HBD gene variants identified in the Iranome project. The highest number of variants was found in the Persian Gulf Islanders. The variants p.Gln132Glu (HBD: c.394C>G), p.Gly17Arg (HBD: c.49G>C), p.Thr5Ile (HBD: c.14C>T), and p.Ala28Ser (HBD: c.82G>T) were predicted to be damaging by three or more prediction tools. In addition, it seems that p.Gly30= (HBD: c.90C>T) decreases the use of the authentic splice site and, instead, creates a new donor splice site (DSS) or leads to the use of a cryptic DSS. Most of these variants have been associated with a decrease in Hb A2 levels. Given the high mutational diversity of the HBB gene in the Iranian population and the use of Hb A2 quantification to differentiate α- and β-thal carriers in Iranian clinical laboratories, attention should be paid to possible co-inheritance of HBD gene variants to avoid the misdiagnosis of β-thal carriers.

As a member of the β-globin gene family, the HBD or δ-globin gene is located on chromosome 11. This gene is positioned on the 5′ side of the HBB gene and encodes a 147-amino acid protein that differs from β-globin in only 10 amino acids [3]. The HbVar database (http://globin.bx.psu.edu/hbvar/) collects information about Hb variants and mutations that cause thalassemia. Although more than 120 variants have been identified in the HBD gene so far, this is far fewer than the number of variants reported in the HBB gene [4]. This is because the HBD gene variants are clinically "silent". Most of the HBD gene variants are missense and result in reduced levels of Hb A2 [5].
Hb A2 accounts for only a small fraction of total hemoglobin and has no known physiological role [3]. However, Hb A2 quantification is used as a valuable test to differentiate α- and β-thal carriers in clinical laboratories; its level is normal or slightly reduced in α-thal carriers and is increased to more than 4% in β-thal carriers. Therefore, in populations such as that of Iran, in which β-thal is a serious problem for the health system [6][7][8], any factor affecting the level of Hb A2 could have implications for thalassemia screening programs.
The Iranome database (http://www.iranome.ir/) records the genomic variants found in 800 healthy individuals from eight major ethnic groups in Iran (Arabs, Azeris, Balochs, Kurds, Lurs, Persians, Persian Gulf Islanders, and Turkmen), with 100 individuals per ethnic group [9]. In this study, we used 14 in silico prediction tools to assess the potential deleterious effects of the HBD variants reported in the Iranome database.

Methods
All studies from recent years that identified HBD gene variants in the Iranian population, as well as all HBD gene variants identified in the Iranome project, were retrieved. To screen for rare variants with frequencies around or under 1%, the allele frequency of each variant in the 1000 Genomes Project and the Genome Aggregation Database (gnomAD) (available at http://grch37.ensembl.org/Homo_sapiens/Info/Index) was used as the reference. In addition, we checked whether the variants had previously been reported in the Single Nucleotide Polymorphism database (dbSNP) (https://www.ncbi.nlm.nih.gov/snp/), the ITHANET web portal (https://www.ithanet.eu/), the HbVar database, or the literature. The ClinVar database (https://www.ncbi.nlm.nih.gov/clinvar/) was used to search for known variants along with their clinical significance.
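The frequency screen described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: the `Variant` class, the `is_rare` helper, the handling of the 1% boundary, the treatment of variants absent from both resources, and all frequencies shown are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# "Around or under 1%" in the text; treating exactly 1% as rare is an assumption.
RARE_AF_CUTOFF = 0.01


@dataclass
class Variant:
    name: str
    af_1000g: Optional[float]   # allele frequency in 1000 Genomes, None if absent
    af_gnomad: Optional[float]  # allele frequency in gnomAD, None if absent


def is_rare(v: Variant, cutoff: float = RARE_AF_CUTOFF) -> bool:
    """Keep a variant if every available reference frequency is <= cutoff.

    Variants absent from both resources are treated as rare (assumption).
    """
    freqs = [f for f in (v.af_1000g, v.af_gnomad) if f is not None]
    return all(f <= cutoff for f in freqs)


# Hypothetical records for illustration only; these are not values from the study.
variants = [
    Variant("example_variant_1", 0.0004, 0.0007),
    Variant("example_variant_2", 0.12, 0.15),
    Variant("example_variant_3", None, None),
]

rare = [v.name for v in variants if is_rare(v)]
print(rare)
```

Requiring every available frequency to fall under the cutoff is the conservative choice; a screen that accepts a variant when either resource reports it as rare would replace `all` with `any`.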
We used the VarSome database (http://varsome.com) for the interpretation of sequence variants [19]. This database applies the ACMG standards and guidelines for interpretation [20]. The gene reference sequence was NG_063112.2, and the transcript NM_000519.4 was used to determine variant positions. The positions of the variants in the protein were determined based on UniProtKB/Swiss-Prot P02042.

Results
Among the 800 healthy individuals studied in the Iranome project, HBD gene variants were reported in 46 individuals from different ethnicities. The highest number of variants was found in the Persian Gulf Islanders, followed by Balochs, Lurs, Kurds, Persians, Turkmen, Azeris, and Arabs (Table 1). According to the NM_000519.4 reference transcript, a total of 16 different single nucleotide variations (SNVs), including seven exonic and nine intronic variants, were identified in the HBD gene. All variants were single nucleotide substitutions, and no insertion or deletion variants were observed. In addition, all variants were detected in the heterozygous state (Table 1).
Except for HBD: c.315+55G>T and HBD: c.92+43A>G, the intronic variants were previously recorded in dbSNP. However, none of the intronic variants were reported in the ITHANET, HbVar, or ClinVar databases (Table 1). Analysis with the MutationTaster, FATHMM-XF, PhD-SNPg (except for HBD: c.315+199A>G), and CADD tools showed that all intronic variants fell into the benign/polymorphism category. In addition, based on the VarSEAK, MaxEntScan, NetGene2, and NNSplice tools, none of these variants had an effect on splicing events (Table 2).
Exonic variants were divided into two groups, synonymous SNVs and nonsynonymous SNVs, which accounted for two and five variants, respectively (Fig. 1). The nonsynonymous variants were p.Gln132Glu (HBD: c.394C>G), p.Asp53Glu (HBD: c.159T>G), p.Gly17Arg (HBD: c.49G>C), p.Thr5Ile (HBD: c.14C>T), and p.Ala28Ser (HBD: c.82G>T). Although p.Asp53Glu (HBD: c.159T>G) showed neutral results in all 14 tools, the other four nonsynonymous SNVs were predicted to be damaging by three or more prediction methods (Table 3). Analysis of the HBD: c.82G>T variant with the splice site prediction tools revealed that the replacement of guanine with thymine at position c.82 activates a cryptic donor splice site (DSS) at c.78. This new splice site was much stronger than the authentic splice site at position c.92+1 (Fig. 2a).
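The "three or more prediction methods" criterion amounts to a simple tally across tools. The sketch below illustrates it with the per-tool calls that the Discussion lists for HBD: c.394C>G; the dictionary layout and the function names are assumptions made for the example, and only the tools explicitly named for this variant in the text are included.

```python
DAMAGE_THRESHOLD = 3  # "three or more prediction methods", as stated in the text

# Calls for HBD: c.394C>G as listed in the Discussion. CADD is entered as
# "neutral" because its score (18.06) is below the cutoff of 20.
calls_c394 = {
    "MutationTaster": "damaging",
    "FATHMM-XF": "damaging",
    "I-Mutant": "damaging",
    "PhD-SNPg": "damaging",
    "SIFT": "neutral",
    "PROVEAN": "neutral",
    "PolyPhen-2": "neutral",
    "SNPs&GO": "neutral",
    "VarSEAK": "neutral",
    "PMut": "neutral",
    "CADD": "neutral",
}


def count_damaging(calls: dict) -> int:
    """Number of tools that call the variant damaging."""
    return sum(1 for call in calls.values() if call == "damaging")


def is_flagged(calls: dict, threshold: int = DAMAGE_THRESHOLD) -> bool:
    """True when at least `threshold` tools call the variant damaging."""
    return count_damaging(calls) >= threshold


print(count_damaging(calls_c394), is_flagged(calls_c394))  # 4 True
```

A count of this kind weights every tool equally, so it should be read as a screening heuristic rather than as a classification.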
Neither of the two synonymous SNVs, p.Gly30= (HBD: c.90C>T) and p.His98= (HBD: c.294C>T), was found in the ITHANET or HbVar databases (Table 1). Unlike the HBD: c.294C>T variant, HBD: c.90C>T had deleterious results in the MutationTaster, CADD, and PhD-SNPg web tools (Table 3). In addition, based on the splice site prediction tools, the replacement of cytosine with thymine at position 90 decreases the score for the use of the authentic DSS at c.92+1 and, instead, creates a new DSS at position c.89 or leads to the use of a cryptic splice site located 16 nt upstream of the authentic DSS (Fig. 2b).
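The variant names used in this section follow HGVS coding-DNA notation, in which an intronic position carries an offset from the nearest exonic base (for example, c.315+55) and the codon number of an exonic position follows from integer division by three. The sketch below parses the variants named in the text and checks the codon numbers against the reported protein-level names. The regular expression and function names are assumptions made for the example; the pattern handles only simple substitutions and does not cover UTR positions such as c.-N or c.*N.

```python
import re

# Simple coding-DNA substitutions such as "c.394C>G" or "c.315+55G>T".
HGVS_SUB = re.compile(r"^c\.(\d+)([+-]\d+)?([ACGT])>([ACGT])$")


def parse(hgvs: str):
    """Return (position, intronic_offset, ref, alt); the offset is 0 for exonic variants."""
    m = HGVS_SUB.match(hgvs)
    if m is None:
        raise ValueError(f"unsupported HGVS string: {hgvs}")
    pos, offset, ref, alt = m.groups()
    return int(pos), int(offset) if offset else 0, ref, alt


def is_intronic(hgvs: str) -> bool:
    return parse(hgvs)[1] != 0


def codon_number(hgvs: str) -> int:
    """1-based codon number of an exonic coding position."""
    pos, offset, _, _ = parse(hgvs)
    if offset:
        raise ValueError("codon number is undefined for intronic variants")
    return (pos - 1) // 3 + 1


# Exonic variants and the residue numbers given in the text.
reported = {
    "c.394C>G": 132, "c.159T>G": 53, "c.49G>C": 17, "c.14C>T": 5,
    "c.82G>T": 28, "c.90C>T": 30, "c.294C>T": 98,
}
for name, residue in reported.items():
    print(name, codon_number(name), codon_number(name) == residue)

# Intronic variants named in the text.
for name in ("c.315+55G>T", "c.92+43A>G", "c.315+199A>G"):
    print(name, is_intronic(name))
```

The codon numbers computed this way agree with the residue numbers in the p. names quoted above, which count the initiator methionine as residue 1.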

Discussion
More than 1.5 million variants have been identified in the genomes of individuals studied in the Iranome project [9]. Using 14 prediction tools, we evaluated the 16 HBD gene variants reported in the Iranome database (Table 1). Based on the ACMG guidelines, none of these variants were categorized as pathogenic or likely pathogenic (Tables 2 and 3).
Zhang et al. [21] reported p.Gln132Glu (HBD: c.394C>G) as a novel δ-globin variant in a healthy 35-year-old Chinese man in 2019 and named it Hb A2-Puer. The hematological and electrophoretic data for this Hb variant in the heterozygous state were as follows: Hb (g/dL) 16.1, MCV (fL) 85.2, MCH (pg) 29.0, Hb A (%) 97.4, Hb A2 (%) 1.3, and Hb X (%) 1.4. Our analysis showed that HBD: c.394C>G is predicted to be deleterious by the MutationTaster, FATHMM-XF, I-Mutant, and PhD-SNPg tools, and neutral/benign by the SIFT, PROVEAN, PolyPhen-2, SNPs&GO, VarSEAK, and PMut tools. In addition, with a score of 18.06 in the CADD web tool, this variant did not reach the PHRED-like score cutoff of 20 that would place it among the top 1% of variants most likely to be deleterious. On the other hand, the same variant has been reported in the HBB gene (Hb Camden: HBB: c.394C>G) with conflicting interpretations of pathogenicity, ranging from silent in the literature [22] and likely benign/uncertain significance in the ClinVar database to causative in the ITHANET database. Finally, based on the ACMG guidelines, the HBD: c.394C>G variant was classified as a variant of uncertain significance (VUS) [20]. According to the Iranome database, the HBD: c.394C>G variant was observed in a healthy Kurdish individual in the heterozygous state. Since the only report of this variant in the literature is that of Zhang et al. [21], it can be assumed that the present study is the second to report this variant worldwide and the first to annotate it in Iran.
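The relation between a PHRED-like score and a "top percentage" can be made explicit. On a PHRED-like scale, a score S corresponds to a rank fraction of 10^(-S/10), so that 10, 20, and 30 mark the top 10%, 1%, and 0.1%, respectively. The sketch below applies this conversion to the score of 18.06 quoted above; the function names are assumptions made for the example.

```python
import math


def phred_to_top_fraction(score: float) -> float:
    """Fraction of ranked variants at or above a PHRED-like score."""
    return 10 ** (-score / 10)


def top_fraction_to_phred(fraction: float) -> float:
    """Inverse conversion: PHRED-like score for a given top fraction."""
    return -10 * math.log10(fraction)


CUTOFF = 20          # top 1% cutoff mentioned in the text
score_c394 = 18.06   # CADD score reported for HBD: c.394C>G

print(f"cutoff {CUTOFF} -> top {phred_to_top_fraction(CUTOFF):.2%}")
print(f"score {score_c394} -> top {phred_to_top_fraction(score_c394):.2%}")
print("passes cutoff:", score_c394 >= CUTOFF)
```

Under this conversion, a score of 18.06 places the variant within roughly the top 1.6% of ranked variants, which is close to, but outside, the top 1% marked by the cutoff of 20.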
The HBD: c.49G>C variant, also known as Hb A2′ or Hb B2, is one of the most common δ-globin gene variants; it is found mainly in Black families and occurs in nearly 1% of African-Americans [1]. The same substitution has been found in the HBB and HBG2 genes [Hb D-Bushman (HBB: c.49G>C) and Hb F-Melbourne (HBG2: c.49G>C), respectively]. Hb A2-Yialousa (HBD: c.82G>T) and HBD: c.14C>T were also reported in the Iranome database. Hb A2-Yialousa has been shown to be the most common HBD gene variant in the Mediterranean area [1] as well as in Iran [23], and a rare variant in China [24]. It seems that the replacement of guanine with thymine at position c.82 activates an upstream cryptic DSS (Fig. 2a). The HBD: c.14C>T variant was first identified in Greek Cypriots [25]. It is a low-frequency variant that has also been reported in other populations, such as those of Oman [26] and China [24]. All of these variants have been associated with a decrease in Hb A2 levels [23][24][25][26]. Based on our analyses, these variants were predicted to be damaging by three or more prediction methods (Table 3). According to the ACMG guidelines, however, they were classified as likely benign. Given these conflicting results, their pathogenicity or neutrality remains unknown.
Another missense variant, HBD: c.159T>G, was previously recorded in dbSNP (rs757106601). However, no information on this variant was available in the literature or in the ClinVar, ITHANET, and HbVar databases (Table 1). In addition, the same variant has not been reported in the HBB gene. Our analysis showed neutral results for this variant in all 14 prediction tools used in the present study, and it was classified as a VUS (Table 3).
Two synonymous HBD gene variants were reported in the Iranome database (Tables 1 and 3). The p.Gly30= (HBD: c.90C>T) variant had deleterious results in the MutationTaster, CADD, and PhD-SNPg web tools (Table 3). It seems that the replacement of cytosine with thymine at position 90 decreases the score for the use of the authentic DSS and, instead, creates a new DSS or leads to the use of a cryptic DSS (Fig. 2b). The same variant has been found in the HBB gene, p.Gly30= (HBB: c.90C>T), where it has been categorized as pathogenic in the ClinVar database. According to the ACMG guidelines, HBD: c.90C>T was classified as a VUS (Table 3). This variant was reported for the first time in the Iranome database, in a healthy individual of Arab ethnicity. To the best of our knowledge, there is no report of this variant in the literature so far.
Thalassemia is a serious health problem in the Iranian population. Numerous studies conducted in Iran have shown a high mutational diversity for α- and β-thal diseases [27][28][29][30][31][32]. However, to our knowledge, only a limited number of studies have been performed in Iran to identify HBD gene variants. In fact, with the exception of the study by Kordafshari et al. [23], which reported the spectrum of HBD gene variants in 21 individuals, the other studies were case reports that identified a specific variant in one or a limited number of individuals [33][34][35]. Therefore, at least six different types of variants

Conclusions
Given the small number of studies performed on the HBD gene in Iran and the fact that HBD gene variants are clinically "silent," it can be assumed that the spectrum of variants of this gene in our population is much wider. This claim is supported by the HBD gene variants identified in individuals who participated in the Iranome project. Of the 16 HBD gene variants reported in the Iranome database, five (HBD: c.394C>G, HBD: c.49G>C, HBD: c.82G>T, HBD: c.14C>T, and HBD: c.90C>T) showed potential deleterious effects in the present study. All of these variants, except for the novel variant HBD: c.90C>T, have been associated with a decrease in Hb A2 levels. Given the high mutational diversity of the HBB gene in the Iranian population and the use of Hb A2 quantification to differentiate α- and β-thal carriers in Iranian clinical laboratories, attention should be paid to possible co-inheritance of HBD gene variants to avoid the misdiagnosis of β-thal carriers.