Selected amino acid mutations in HIV-1 B subtype gp41 are associated with specific gp120 V3 signatures in the regulation of co-receptor usage

Background The third variable loop (V3) of the HIV-1 gp120 surface protein is a major determinant of cellular co-receptor binding. However, HIV-1 can also modulate its tropism through other regions in gp120, such as the V1, V2 and C4 regions, as well as through the gp41 protein. Moreover, specific changes in gp41 are likely to be responsible for impairing gp120-CCR5 interactions, resulting in potential resistance to CCR5 inhibitors. In order to genetically characterize the two viral envelope proteins in terms of co-receptor usage, we analyzed 526 full-length env sequences derived from HIV-1 subtype-B infected individuals, from our own and public (Los Alamos) databases. The co-receptor usage was predicted by analyzing V3 sequences with the Geno2Pheno (G2P) algorithm. The binomial correlation phi coefficient was used to assess covariation among gp120 V3 and gp41 mutations; subsequently, average linkage hierarchical agglomerative clustering was performed. Results According to G2P false positive rate (FPR) values, among the 526 env sequences analyzed, we further characterized 196 sequences: 105 with FPR <5% and 91 with FPR >70%, for X4-using and R5-using viruses, respectively. Beyond the classical signatures at V3 positions 11 and 25 (S11S and E25D, R5-tropic viruses; S11KR and E25KRQ, X4-tropic viruses), other specific V3 and gp41 mutations were found to be statistically associated with co-receptor usage. Almost all of these specific gp41 positions are exposed on the surface of the glycoprotein. By covariation analysis, we found several statistically significant associations between V3 and gp41 mutations, especially in the context of CXCR4 viruses. The topology of the dendrogram showed the existence of a cluster associated with R5-usage involving the E25D V3, S11S V3, T22A V3, S129DQ gp41 and A96N gp41 signatures (bootstrap = 0.88).
Conversely, a large cluster was found to be associated with X4-usage, involving the T8I V3, S11KR V3, F20IVY V3, G24EKR V3, E25KR V3, Q32KR V3, A30T gp41, A189S gp41, N195K gp41 and L210P gp41 mutations (bootstrap = 0.84). Conclusions Our results show that gp120 V3 and several specific amino acid changes in gp41 are jointly associated with CXCR4 and/or CCR5 usage. These findings extend previous observations that determinants of tropism may reside outside the V3 loop, even in gp41. Further studies will be needed to confirm the degree to which these gp41 mutations contribute directly to co-receptor use.


Background
Human immunodeficiency virus type 1 (HIV-1) entry into the host cell is mediated by the mature viral envelope (env) glycoproteins, gp120 and gp41, which constitute a trimeric complex anchored on the virion surface by the membrane-spanning segments of gp41 [1][2][3][4]. The gp120 exterior glycoprotein is retained on the trimer via labile, noncovalent interactions with the gp41 ectodomain [5], and it must be flexible to allow correct conformational modifications. Indeed, the initial binding of gp120 to the cellular CD4 receptor triggers conformational changes in gp120 that promote its subsequent interaction with one of the chemokine co-receptors, usually CCR5 or CXCR4 [6][7][8][9][10][11][12][13]. This binding also induces the arrest of the transmembrane gp41 transitions at a prehairpin intermediate stage that leads to the insertion of the fusion peptide into the target cell membrane and ultimately to virus-cell fusion activity [14,15]. Multiple intermolecular contacts are required to maintain trimer integrity: the C1 and C5 regions of gp120 are thought to contribute to the gp120/gp41 interface and to the interaction with the disulfide bond loop region of gp41, respectively [5][16][17][18].
Because the main genetic determinants of co-receptor usage are located in V3, genotypic approaches to tropism determination have so far relied on sequencing the V3 loop of gp120 and analyzing it with different algorithms available online [46,47].
Therefore, the present investigation aims to genetically characterize HIV-1 B-subtype env sequences in terms of co-receptor usage and to define the association of mutations within the gp120 V3 region and the gp41 protein according to CCR5 and/or CXCR4 usage. For this purpose, we analyzed 526 HIV-1 subtype-B env sequences (one viral isolate per patient), mostly retrieved from the Los Alamos database.

Sequence analysis
The analysis included 526 HIV-1 subtype-B full-length env sequences, partly retrieved from our database (from 33 HIV-positive patients receiving highly active antiretroviral therapy) and mostly from the Los Alamos database [58] (from 493 infected individuals at all stages of infection, with one isolate per patient). Sequences with pure phenotype and/or co-receptor determinations available were considered, while molecular clones and dual-mix viruses were not used. Published env consensus sequences of pure HIV-1 subtypes (A, B, C, D, F1, F2, G, H, J, and K) were used as references for each subtype [58], and multiple sequence alignments of V3 and gp41 segments were performed using ClustalX [59] and manually edited with the BioEdit software [60].

V3 and gp41 sequencing
The sequencing of the V3 gp120 region and the entire gp41 was performed on 33 plasma samples, as described elsewhere [61,62]. In brief, for gp41 sequencing, RNA was extracted, retrotranscribed, and amplified using 2 different sequence-specific primers. Gp41-amplified products were full-length sequenced in sense and antisense orientations using 8 different overlapping sequence-specific primers on an automated sequencer (ABI 3100; Applied Biosystems). Sequences with a mixture of wild-type and mutant residues at single positions were considered to have the mutant(s) at that position. Nucleotide sequences were previously submitted to GenBank [63].
Subtypes were assessed by the construction of phylogenetic trees generated with the Kimura 2-parameter model. The statistical robustness within each phylogenetic tree was confirmed with a bootstrap analysis using 1000 replicates.

Tropism prediction
For all 526 gp160 sequences, the V3 region was extracted and submitted to the Geno2Pheno algorithm for tropism prediction. Geno2Pheno [46] is a bioinformatics tool based on support vector machines. Beyond tropism prediction, it assigns to each V3 sequence a score, called the false positive rate (FPR), ranging from 0% to 100%, which represents the probability that a sequence belongs to an R5-virus. According to FPR values, we selected sequences with FPR < 5% (indicating a strong X4 prediction) and sequences with FPR > 70% (indicating a strong R5 prediction) for X4-tropic and R5-tropic viruses, respectively. These sequences, together with the related gp41 sequences, were then used for the entire study.
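The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `V3Prediction` record, the sequence identifiers and the FPR values in the example are hypothetical, and only the two thresholds (FPR < 5% and FPR > 70%) come from the text.

```python
from dataclasses import dataclass

# Thresholds taken from the text: FPR < 5% -> strong X4, FPR > 70% -> strong R5.
X4_MAX_FPR = 5.0
R5_MIN_FPR = 70.0


@dataclass
class V3Prediction:
    seq_id: str  # hypothetical sequence identifier
    fpr: float   # Geno2Pheno false positive rate, in percent


def classify_by_fpr(pred):
    """Return 'X4', 'R5', or None for sequences in the intermediate FPR range."""
    if pred.fpr < X4_MAX_FPR:
        return "X4"
    if pred.fpr > R5_MIN_FPR:
        return "R5"
    return None


def select_sequences(predictions):
    """Group sequence identifiers by strong tropism prediction, dropping the rest."""
    groups = {"X4": [], "R5": []}
    for pred in predictions:
        label = classify_by_fpr(pred)
        if label is not None:
            groups[label].append(pred.seq_id)
    return groups


# Made-up example values, for illustration only.
example = [
    V3Prediction("seqA", 1.7),
    V3Prediction("seqB", 35.0),
    V3Prediction("seqC", 88.2),
    V3Prediction("seqD", 5.0),
]
selected = select_sequences(example)
```

Sequences with intermediate FPR values (here `seqB` and `seqD`) fall into neither group, which mirrors the choice to retain only strongly predicted X4 and R5 sequences.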

Statistical analysis
To analyze gp41 and V3 mutations, we calculated the frequency of all mutations in the 345 gp41 amino acids and 35 V3 amino acids, using the selected env sequences. Fisher exact tests were used to determine whether the differences in frequency between the 2 groups (sequences with strong R5 and strong X4 prediction, respectively) were statistically significant.
The Benjamini-Hochberg method has been used to identify results that were statistically significant in the presence of multiple-hypothesis testing [64]. A false discovery rate of 0.05 was used to determine statistical significance.
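As a rough sketch of this testing procedure, the following Python code computes a two-sided Fisher exact test for a single mutation and applies the Benjamini-Hochberg step-up rule to a list of P values. It uses only the standard library; the function names, the definition of the two-sided P value (sum of all tables no more probable than the observed one) and the counts in the example are assumptions made for illustration, not the authors' implementation.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups (e.g. X4 and R5 sequences); columns are
    mutation present / mutation absent.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x mutated sequences in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def benjamini_hochberg(pvalues, fdr=0.05):
    """Return a list of booleans: True where a test is significant at the given FDR."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            max_rank = rank  # largest rank whose P value passes its threshold
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant


# Illustrative counts: a mutation seen in 10 of 105 X4 sequences and 0 of 91 R5 sequences.
p_example = fisher_exact_two_sided(10, 95, 0, 91)
flags = benjamini_hochberg([0.001, 0.04, 0.03, 0.5], fdr=0.05)
```

In practice one P value would be computed per mutation and the whole list passed to `benjamini_hochberg`, so that only mutations surviving the false discovery rate of 0.05 are reported.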
To identify significant patterns of pairwise associations between V3 and gp41 mutations, we calculated the phi (φ) coefficient and its statistical significance for each pair of mutations. A positive and statistically significant correlation between mutations at two specific positions (0 < φ < 1; P ≤ 0.05) indicates that the two positions mutate in a correlated manner in order to confer an advantage in terms of co-receptor selection and that the co-occurrence of these mutations is not due to chance. Moreover, to analyze the covariation structure of mutations in more detail, we performed average linkage hierarchical agglomerative clustering, as described elsewhere [63,65]. Mann-Whitney U tests were used to assess statistically significant differences among all the associated pairs of mutations. Statistical tests were corrected for multiple-hypothesis testing by using the Benjamini-Hochberg method at a false discovery rate of 0.05 [64].
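The covariation analysis can be illustrated with a minimal Python sketch that computes the φ coefficient for two binary presence/absence vectors and then runs a naive average linkage agglomerative clustering. Several points are assumptions rather than details from the text: the P value uses a chi-square approximation with one degree of freedom (χ² = nφ²), the distance between mutations is taken as 1 − φ, and the mutation names and profiles are toy data. Bootstrap support for the clusters is not covered.

```python
from math import erfc, sqrt


def phi_coefficient(x, y):
    """Phi coefficient for two equal-length binary (0/1) vectors."""
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n10 = sum(1 for a, b in zip(x, y) if a and not b)
    n01 = sum(1 for a, b in zip(x, y) if not a and b)
    n00 = sum(1 for a, b in zip(x, y) if not a and not b)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0  # phi is undefined for a constant vector; treated as 0 here
    return (n11 * n00 - n10 * n01) / denom


def phi_p_value(phi, n):
    """Assumed approximation: chi2 = n * phi^2 with 1 degree of freedom."""
    return erfc(sqrt(n * phi * phi / 2.0))


def average_linkage(names, dist):
    """Naive average linkage clustering; dist maps frozenset({a, b}) to a distance."""
    clusters = [(name,) for name in names]
    merges = []

    def cluster_distance(c1, c2):
        # Mean of all pairwise distances between members of the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Toy presence/absence profiles of three hypothetical mutations in 8 sequences.
profiles = {
    "mutA": [1, 1, 1, 1, 0, 0, 0, 0],
    "mutB": [1, 1, 1, 0, 0, 0, 0, 0],
    "mutC": [0, 0, 0, 0, 1, 1, 1, 0],
}
names = list(profiles)
dist = {
    frozenset((a, b)): 1.0 - phi_coefficient(profiles[a], profiles[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}
merges = average_linkage(names, dist)
```

In this toy example the two positively correlated mutations are merged first, and the negatively correlated one joins last at a much larger distance, which is the kind of structure a dendrogram of V3 and gp41 mutations would display.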

Prevalence of mutations
The study included 526 HIV-1 subtype-B env sequences, with the majority retrieved from the Los Alamos database. The V3 region was extracted from these gp160 sequences and submitted to the Geno2Pheno algorithm for tropism prediction.
Based on the FPR values, we selected 105 V3 sequences with FPR < 5% and 91 sequences with FPR > 70% as X4-using and R5-using viruses, respectively. These 196 sequences, together with the related gp41 sequences, were then used for the rest of the study.
As a first analysis, we confirmed in our dataset the role of the classical V3 positions 11 and 25, consistent with previous observations [66][67][68]: the wild-type amino acid at position 11 (S11S) and the E25D mutation were significantly associated with R5-tropic viruses, while the mutations S11KR and E25KRQ were significantly associated with CXCR4 co-receptor usage (Figure 1a).
Since networks of V3 mutations are variable and complex, positions 11 and 25 are not sufficient to provide a full understanding of the mechanisms underlying different co-receptor usage. For example, it has been demonstrated that CCR5 interacts with the conserved V3 region encompassing residues 4 to 7 (P4-N5-N6-N7) and that the binding of this co-receptor is blocked when N7 is replaced by a charged amino acid [30]. In our dataset, the mutation N7K was found only in X4-predicted viruses (prevalence 9.5%; P = 0.002) (Figure 1a).
Interestingly, the majority of these V3 mutations found to be associated with co-receptor usage were also recently found by our group to be involved in the mechanisms underlying different co-receptor usage, using a completely different approach and dataset of isolates [68].
Figure 1. Frequencies of HIV-1 gp120 V3 and gp41 mutations. Frequencies of gp120 V3 (panel "a") and gp41 (panel "b") mutations in HIV-1 R5-tropic isolates with FPR > 70% by Geno2Pheno algorithm prediction (dark grey) and HIV-1 X4-tropic isolates with FPR < 5% by Geno2Pheno algorithm prediction (light grey). Statistically significant differences were assessed by chi-square tests of independence. P values were significant at a false-discovery rate of 0.05 following correction for multiple tests. *, P < 0.05; **, P ≤ 0.01; ***, P ≤ 0.001.
In addition, since the selected dataset of sequences used in this study is small compared to the total number of sequences available in the Los Alamos database, we also analyzed a different dataset of sequences with known phenotypic determination, composed of 326 and 91 V3 sequences (one HIV-1 B-subtype sequence per patient) with non-syncytium-inducing (NSI) and syncytium-inducing (SI) information, respectively.
Almost all statistically significant associations among V3 mutations and tropism found previously in the study were confirmed with this new analysis. The classical R5-tropic determinants S11S and E25D were found with high prevalence in NSI-sequences (73.6% and 64.1%, respectively, versus 34% and 11%, respectively, in SI-sequences; P < 0.05), while the classical X4-tropic mutations S11KR and E25KRQ were found with high prevalence in SI-sequences (40.6% and 51.6%, respectively, versus 2% and 11%, respectively, in NSI-sequences; P < 0.05). Moreover, the novel identified V3 mutations T22A in the R5-predicted viruses, and I12V, A19V, Y21H and H34Y in the X4-predicted viruses were also confirmed (P < 0.05).
The high variability of the V3 loop found in our study should not be surprising, since positive selection has been implicated in the maintenance of such diversity, in individuals as well as at the population level and in coreceptor selection [68][69][70][71][72]. It is likely that the principal driving force in the evolution of the V3 region of HIV-1 is the cell receptor usage, the escape from host immune response, or a combination of the two [73,74].
Conversely, we identified 13 mutations whose prevalence was significantly higher in X4- than in R5-viruses, suggesting their association with CXCR4 usage. Among them, 5 mutations had a prevalence > 10% in X4-predicted viruses (V69I, A96T, S129N, D163N and A189S) (Figure 1b). Several gp41 residues associated with different co-receptor usage reside within the Heptad Repeat 1 and 2 (HR1 and HR2) (A30, L34, Q52, D125, N126, S129, L134, N140, N141 and Q142), in the cluster I epitope transiently exposed during fusion (V69), and in the tryptophan-rich membrane-proximal external region (MPER) (D153 and D163). All these positions are localized in the gp41 ectodomain, which is known to be immunodominant and to induce high-titer antibodies in the majority of HIV-1-infected individuals [75][76][77][78][79][80][81]. The fact that all these mutations are localized in the extracellular domain of gp41 is consistent with the idea that gp41 may act as a scaffold that maintains the stability of the gp120/gp41 complex, thereby ultimately influencing viral tropism as well, directly or indirectly.

Association among mutations
By analyzing associations between mutations, we found, for the first time, specific and statistically significant correlations between V3 and gp41 mutations. In particular, several associations among mutations were associated with the CXCR4 prediction. An exception was represented by the A96N gp41 mutation, which was positively correlated with T22A V3 (φ = 0.22; P = 0.030; both associated with CCR5-usage) and negatively correlated with the known S11KR mutations (φ = -0.17; P = 0.018). The A96N gp41 mutation is localized in the gp41 ectodomain, specifically within cluster I, a gp41 immunodominant loop involved in the interactions with gp120 [16,18,[82][83][84][85]].
Regarding the positive correlations between V3 and gp41 mutations associated with CXCR4-usage, several were localized in the gp41 ectodomain (Table 1). In particular, a strong correlation was observed for A30T gp41 with either F20IVY V3 (φ = 0.38; P = 0.001) or E25KRQ V3 (φ = 0.29; P = 0.006) (Table 1). Of note, F20IVY V3 and E25KRQ V3 were found in 80% and 90% of patients with A30T gp41, respectively, further supporting that these mutations are highly correlated with each other. Another positive correlation was observed for L34M gp41 with N7KTY V3 (Table 1).
Interestingly, both A30T gp41 and L34M gp41 were also recently found to be phenotypically associated with CXCR4 usage [54,56,57]. Specifically, evaluating the available gp41 sequence data from samples submitted for co-receptor tropism testing by Trofile™, a CLIA-validated cell-based recombinant virus assay, Stawiski and colleagues observed 26 gp41 mutations associated with CXCR4-use (Dual Mix/CXCR4), with the majority being in the extracellular region [56].
A30T gp41 and L34M gp41 are located in a specific region of HR1 involved in a direct interaction with gp120 [88]. In addition, the presence of A30T gp41 and L34M gp41 was observed in CXCR4-using isolates characterized by a high infectivity and/or replication capacity in CXCR4-expressing cells, thus supporting their involvement in the mechanism underlying CXCR4 usage [56,89,90]. Overall, this supports a role for these two mutations in the stabilization of the non-covalent gp120/gp41 complex, and/or in viral receptor attachment and membrane fusion.
Of note, we also found positive correlations between V3 mutations and gp41 mutations localized in the transmembrane domain or in the cytoplasmic tail of gp41. This is the case for A189S gp41, localized in the gp41 transmembrane domain, which correlated with Q32KR V3 (φ = 0.27; P = 0.021). Both mutations were found to be positively associated with the CXCR4 prediction. Moreover, it has already been noted that Q32KR V3 could reduce the binding affinity of gp120 for the CCR5 N-terminus, and that this reduction is even stronger than that observed when positive charges are present at the classical V3 positions 11 and 25 [68].
Similarly, L210P gp41, localized before the Kennedy sequence (a loop of the C-terminal tail of gp41 that is supposed to be exposed on the viral surface [91]), showed a strong correlation with G24EKR V3 (φ = 0.31; P = 0.019).
Overall, our results suggest that specific additional gp41 mutations could be taken into account in order to improve the genotypic prediction algorithms currently in common use, as already demonstrated by Thielen and colleagues, who observed an improvement (albeit marginal) of CXCR4 co-receptor usage prediction [57]. In that work, mutations at the N-terminus of gp41, such as A30T and L34M, were shown to be strongly associated with co-receptor phenotype in two independent datasets (444 and 1916 patients screened, respectively). The authors state that this region could theoretically be used to predict co-receptor use, alone or in combination with the V3 region. In our study, these 2 mutations, A30T and L34M, were both 100% associated with CXCR4-tropic viruses (Table 1). It is conceivable that mutations in gp41 may also modulate co-receptor specificity and facilitate efficient CXCR4-mediated entry. This is consistent with other observations showing that determinants of CXCR4 use in a set of dual-tropic env sequences, with V3 sequences identical to those of R5-tropic clones, mapped to the gp41 glycoprotein. Indeed, Huang and colleagues have shown that mutations in the fusion peptide and cytoplasmic tail of gp41 contribute to CXCR4 use by a dual-tropic clone, while a single G515V mutation (according to HXB2 gp140 numbering) in the gp41 fusion peptide of another dual-tropic clone was sufficient to confer CXCR4 use on the original R5-tropic clone [48]. Similarly, the same authors previously reported that for HIV-1 subtype D the V3 loop sequence of dual-tropic clones was identical to those of co-circulating R5-tropic clones, indicating the presence of CXCR4 tropism determinants also in domains other than V3 [41].
Interestingly, the threonine at position 96, which we found in 72.5% of our X4-tropic B-subtype viral sequences (A96T gp41) and which was negatively correlated with the R5-determinant T22A V3, is the wild-type amino acid of gp41 in the HIV-1 consensus sequence of subtype D viruses.
Based on the crystal structures of HIV-1 gp41 available so far [92][93][94][95], the positions A30, L34, A96, S129 and N140 are all exposed on the surface of the glycoprotein (in the HR1 or HR2 domains). Similarly, position L210, being near the epitopes for neutralizing antibodies, is presumably also exposed on the surface of the glycoprotein [91]. In contrast, gp41 position N195 seems to be located at the end of the classical single membrane-spanning domain (amino acids 172-198), recently proposed to shuttle between two different conformations during the fusion process [96]. The same residue, based on another work [91], is part of an external loop of gp41 in an alternative membrane-spanning model, suggesting its alternating intra- and extra-membrane localization.
Consequently, we could speculate that the gp41 A30T, L34M, A96NT, S129DQN, N140IT, N195K and L210P mutations may act together (directly or indirectly) with specific V3 signatures, via allosteric effects on the gp120/gp41 complex. This may allow the optimal conformational plasticity of gp41 and gp120 for their appropriate and specific binding to the cellular receptors and co-receptors. In support of this hypothesis, the X-ray crystal structures of CD4-bound HIV-1 gp120 have revealed that the gp120 "core" consists of a gp41-interactive inner domain, a surface-exposed and heavily glycosylated outer domain and a conformationally flexible bridging sheet [14,30,97]. In addition, recent studies showed that in the CD4-bound state two potentially flexible topological layers in the gp120 inner domain apparently contribute to the noncovalent association of gp120 with gp41 [98], and that insertions in V3 or polar substitutions in a conserved hydrophobic patch near the V3 of gp120 result in decreased gp120/gp41 association and decreased chemokine receptor binding [99].
With regard to gp120-CD4 binding, it was found that the resulting conformational modifications cause the flexible V3 loop to protrude and interact with the cellular co-receptor [29,97]. Interestingly, monoclonal antibodies directed against the D19 epitope within the V3 region neutralized X4-tropic viruses regardless of the presence of sCD4, but neutralized R5 isolates only upon addition of sCD4 [100]. Consequently, the inability of this antibody to access R5-tropic viruses in the absence of sCD4 might indicate not only that there are significant V3 loop conformational differences between these two viral variants [101], but also that specific interactions occurring in the gp120/gp41 complex may participate in HIV-1 co-receptor usage and neutralization sensitivity.
Finally, we should mention that Anastassopoulou and colleagues have shown that resistance to the small-molecule CCR5 inhibitor vicriviroc can be caused by 3 conservative changes in the fusion peptide of HIV-1 gp41 [102], and similarly Pfaff et al. very recently found that gp120 and gp41 mutations are involved in modulating the magnitude of drug resistance to another small-molecule CCR5 antagonist, aplaviroc [103]. Overall, these studies, which focus on changes leading to resistance without assessing the issue of tropism switch, are complementary to our results.

Conclusions
In this study, we found that specific gp41 mutations are significantly associated with different co-receptor usage and with specific V3 mutations, thus providing new information that could be taken into account for improving co-receptor usage prediction. These findings extend previous observations that determinants of tropism may reside outside the V3 loop, even in the gp41 transmembrane protein. It is possible that the gp120/gp41 complex may become structurally or functionally involved at different stages during virus-cell entry and fusion. The associations among V3 and gp41 mutations may also have an impact on HIV pathogenesis, since the CXCR4 phenotype has been associated with progression and increased severity of HIV disease, and several gp41 mutations are associated with viral fitness and cytopathic effects. Additional studies are needed to confirm the degree to which these gp41 mutations contribute directly to co-receptor use and to establish the specific and precise utility of this information.