HIV-1 Gag C-terminal amino acid substitutions emerging under selective pressure of protease inhibitors in patient populations infected with different HIV-1 subtypes

HIV-1 Gag amino acid substitutions associated with protease inhibitor (PI) treatment have mainly been reported in subtype B, while information on other subtypes is scarce. Using sequences from 11613 patients infected with different HIV-1 subtypes, we evaluated the prevalence of 93 Gag amino acid substitutions and their association with genotypic PI resistance. A significant association was found for 13 Gag substitutions, including A431V in both subtype B and CRF01_AE. K415R in subtype C and S451G in subtype B were newly identified. Most PI-associated Gag substitutions are located in the flexible C-terminal domain, revealing the key role this region plays in PI resistance.

Electronic supplementary material: The online version of this article (doi:10.1186/s12977-014-0079-7) contains supplementary material, which is available to authorized users.


Findings
An amino acid substitution is commonly defined as an amino acid change between two consecutive sequences based on longitudinal data [1,2]. Amino acid substitutions in HIV-1 protease, commonly called resistance mutations if they confer HIV-1 drug resistance, are known to emerge under selective pressure of protease inhibitors (PIs) [3]. As an alternative mechanism, HIV-1 can escape PI selective pressure by the selection of substitutions in the protease substrate Gag [1,4-7]. Such Gag substitutions arising during PI-based treatment have mostly been characterized in HIV-1 subtype B (Additional file 1: Table S1), while only a few studies have focused on non-B subtypes using small cohorts of patients (Table 1). Gag variability has been shown to impact PI susceptibility in a subtype-dependent manner [4,6], warranting a comprehensive analysis of PI-associated Gag substitutions across different subtypes. Here, we identified novel Gag substitutions in HIV-1 non-B subtypes using longitudinal data from patients failing PI-based therapy. Moreover, we evaluated the prevalence of the newly identified and the previously reported Gag substitutions in different HIV-1 subtypes and investigated their association with genotypic PI resistance using a large sequence dataset.
We first investigated the emergence of non-B Gag substitutions during PI-based treatment in a cohort of 1068 patients followed at the University Hospital of Leuven, for whom virological outcome and treatment information were available [12]. Our protocol and quality control of viral sequencing and viral load tests have been described previously [13,14]. For 69 patients infected with HIV-1 non-B subtypes and receiving PI-based treatment for at least three months, sequence information for Gag, protease and reverse transcriptase (RT) was available at baseline and at treatment failure, which was defined according to the guidelines of the European AIDS Clinical Society (EACS) (http://www.eacsociety.org/). Under drug selective pressure, 21 different substitutions at 18 Gag positions were identified among 12 patients, of whom 11 harbored Gag substitutions in the presence of (pre-existing or simultaneously acquired) drug resistance mutations in protease or RT (Figure 1, Additional file 1: Table S2). Gag substitution P453Ins (insertion: EPTAPP) emerged in patient 343 in the absence of PI and RTI resistance mutations. Some substitutions, such as M138L, changed a less common amino acid to a more common one. Specifically, patients failing LPV/r-based regimens developed one of the following Gag substitution patterns: L363W + E477Q, F363L + N389T + P422Q + P455L, K411Q, P472S + P474L, K415R + I469T, M138L, A374T or G420A. Patients failing DRV/r-based regimens developed Gag substitution patterns P453Ins or T427P + R452G. Patients failing an ATV/r-based regimen developed Gag substitution patterns P453L or V374A + R387K + S451G + P453Ins. A patient failing a regimen containing FPV/r and SQV/r developed L363W. Longitudinal data from 34 PI-naïve patients infected with non-B subtypes revealed the emergence of one Gag substitution (V370A) in a single patient.
Overall, when analyzing all subtypes, the proportion of PI-treated patients with Gag substitutions was much higher than that of PI-naïve patients (17.4% (12/69) vs 2.9% (1/34), p-value = 0.037).
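The comparison of the two proportions can be sketched as a test on a 2x2 table. The text does not name the test used for this particular comparison, so the choice of Fisher's exact test below is an assumption, and the function name `fisher_exact` is illustrative; the resulting p-values need not reproduce the reported value exactly.

```python
from math import comb


def fisher_exact(a, b, c, d):
    """Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (one_sided_p, two_sided_p). The one-sided value is the
    probability of a count >= a in the top-left cell under the
    hypergeometric null; the two-sided value sums the probabilities of
    all tables that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    one_sided = sum(prob(x) for x in range(a, hi + 1))
    two_sided = sum(p for p in map(prob, range(lo, hi + 1))
                    if p <= p_obs * (1 + 1e-9))
    return one_sided, two_sided


# Counts from the text: 12 of 69 PI-treated and 1 of 34 PI-naive patients
# developed Gag substitutions.
one_sided, two_sided = fisher_exact(12, 69 - 12, 1, 34 - 1)
print(f"treated: {12 / 69:.1%}, naive: {1 / 34:.1%}")
print(f"one-sided p = {one_sided:.3f}, two-sided p = {two_sided:.3f}")
```

The implementation uses only the standard library so that the hypergeometric sums are explicit; a statistics package would normally be used in practice.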
For our second analysis, we compiled a comprehensive list of 93 Gag substitutions at 55 positions in B and non-B subtypes observed in PI-treated patients, based on literature results or our first analysis as described above (Table 1, Additional file 1: Table S1). Next, we systematically evaluated the prevalence of these variants in major HIV-1 subtypes using 10865 full-length Gag sequences retrieved from the HIV Los Alamos database (one sequence per patient) (Table 2). Sequence alignment and quality control have been described previously [15]. We found that the prevalence of 62 (66.7%) Gag variants at 39 positions was above 1% in at least one subtype or CRF (A1, B, C, D, F1, G, CRF01_AE, CRF02_AG). Among the 55 Gag positions, only 363 and 455 were highly conserved with less than 1% overall amino acid variation in every subtype and CRF in our dataset (Figure 2A). Moreover, 77 of these 93 variants (82.8%) were found at 42 positions located in the Gag C-terminal domain (positions: 362-500).
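The per-subtype prevalence calculation described above can be illustrated with a minimal sketch. It assumes aligned amino acid sequences with one sequence per patient; the record layout, the function name `prevalence_by_subtype`, the treatment of gaps and ambiguous residues as "not covered", and the toy sequences are all hypothetical and are not taken from the study.

```python
from collections import defaultdict


def prevalence_by_subtype(records, position, residue):
    """Fraction of sequences per subtype carrying `residue` at `position`.

    records  : iterable of (subtype, aligned_sequence), one per patient
    position : 1-based position in the alignment
    Sequences that are too short, gapped ('-') or ambiguous ('X') at the
    position are treated as not covered and left out of the denominator
    (an assumption of this sketch).
    """
    covered = defaultdict(int)
    hits = defaultdict(int)
    for subtype, seq in records:
        if position > len(seq):
            continue
        aa = seq[position - 1]
        if aa in "-X":
            continue
        covered[subtype] += 1
        if aa == residue:
            hits[subtype] += 1
    return {s: hits[s] / covered[s] for s in covered}


# Toy alignment (hypothetical, five residues long) to show the call.
toy_records = [
    ("B", "MGARA"),
    ("B", "MGVRA"),
    ("B", "MG-RA"),   # gap at position 3: not covered
    ("C", "MGVRA"),
    ("C", "MGVRA"),
]
prevalence = prevalence_by_subtype(toy_records, position=3, residue="V")
# Subtypes in which the variant exceeds the 1% threshold used in the text.
above_threshold = sorted(s for s, p in prevalence.items() if p > 0.01)
print(prevalence, above_threshold)
```

Running the same function over every listed variant and subtype gives a prevalence table from which the 1% threshold can be applied.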
Because treatment information was largely lacking for the 10865 full-length gag nucleotide sequences, our third analysis aimed to evaluate whether these 93 Gag variants were significantly associated with genotypic PI resistance. Among the 11613 sequences pooled from the Leuven and the Los Alamos datasets (Table 2), 6645 spanned both the gag and the full-length protease regions, and were translated into amino acid sequences for our analysis. The drug resistance interpretation algorithms HIVdb V7.0 [16] and Rega V9.1 [17] concordantly estimated 660 sequences to be partially or fully resistant to at least one PI, and 5657 sequences to be fully susceptible to all PIs (Additional file 1: Table S3). Sequences with discordant estimates of PI susceptibility were excluded from our analysis. Fisher's exact tests were then used to compare the amino acid prevalence between these PI-susceptible and PI-resistant datasets. Of the 93 Gag variants, 16 at 13 amino acid positions were associated with (partial or full) PI resistance in at least one HIV-1 subtype (p-value < 0.05, Additional file 1: Table S4). After multiple testing correction using the false discovery rate approach described in [18], 13 Gag variants at 10 positions remained significantly PI-associated within individual subtypes (adjusted p-value < 0.05), including 11 variants located in the Gag C-terminal domain (Figure 2B, Table 3). Our analysis successfully identified the known PI-associated Gag substitution A431V, strengthening the validity of our approach. As the only PI-associated Gag substitution found in more than one subtype, A431V had a high prevalence in the PI-resistant strains of subtype B (13.5%) and CRF01_AE (18.2%) (Table 3).
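The multiple testing step can be sketched as follows. The text cites [18] for the false discovery rate approach without naming the procedure; the Benjamini-Hochberg step-up adjustment shown here is assumed for illustration. The variant labels and raw p-values are hypothetical placeholders, not results from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each raw p-value is scaled by m / rank (rank 1 = smallest), and a
    running minimum taken from the largest rank downwards keeps the
    adjusted values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four placeholder variants within one subtype.
raw = {"variant_1": 0.001, "variant_2": 0.02, "variant_3": 0.03, "variant_4": 0.5}
adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
significant = [name for name, p in adjusted.items() if p < 0.05]
print(adjusted, significant)
```

In this sketch the adjustment is applied to the set of tests within a single subtype, mirroring the statement that variants remained significant "within individual subtypes"; how the tests were actually grouped for correction is not specified in the text.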
Interestingly, of the 21 Gag substitutions observed in our first analysis, K415R and S451G were newly identified to be significantly associated with genotypic PI resistance in subtypes C and B respectively, suggesting a possible involvement in PI-resistance.

Table 1 (fragment). Non-B Gag substitutions reported during PI-based treatment*
Gag substitutions | Subtype (n) | Reference
K436R, N451S | C (n = 1) |
V135I, I376V, L486F | 01_AE (n = 1) |
P453L/T/I | F# (n = 61) | [11]
M138L, F363L, L363W, A374T, V374A, R387K, N389T, K411Q, K415R, G420A, P422Q, T427P, P445L, S451G, R452G, P453L, P453Ins, I469T, P472S, P474L, E477Q | A1 (n = 1), C (n = 6), D (n = 1), F1 (n = 1), J (n = 1), 01_AE (n = 1), 02_AG (n = 1) | Our study
*Non-B Gag substitutions reported during PI-based treatment. The substitutions are summarized based on the original publications, and for the substitutions in our study, it is given relative to the baseline sequences sampled from individual patients (see Figure 1). The substitutions also identified in subtype B are indicated in bold. Additional file 1: Table S1 summarizes the information of Gag substitutions in HIV-1 subtype B.
#Information of HIV-1 subtype or sub-subtype was ambiguous or not available.

Figure 1 (caption fragment). Table S2 provides the full list of Gag, protease and RT substitutions in these 12 patients.

To our knowledge, this study presents the first large-scale sequence analysis to establish statistical significance of PI-associated Gag substitutions in HIV-1 non-B subtypes. Our longitudinal analysis of a clinical cohort of patients failing PI-based therapy confirmed that PI-treated patients developed more Gag substitutions than PI-naïve patients. The majority of these Gag substitutions emerged in the context of pre-existing or simultaneously acquired PI or RTI resistance mutations, confirming the important role of the known resistance mutations, while in some patients Gag substitutions emerged in the absence of resistance mutations (Figure 1, Additional file 1: Table S2). Such Gag substitutions may therefore contribute to the virological failure of PI-based treatments. Based on two widely used genotypic interpretation algorithms, our comparative analysis found that only 13 (14.0%) of the 93 Gag substitutions emerging under PI selective pressure were significantly associated with genotypic PI resistance (Table 3). In particular, the novel Gag substitutions K415R and S451G were identified in both our longitudinal and cross-sectional sequence analyses. This suggests that they may play a role in viral escape from PI selective pressure, partially contributing to the observed virological failure. Because virological outcome and treatment information are lacking for most sequences extracted from the HIV Los Alamos database, we could not assess the clinical impact of the newly identified substitutions using large-scale data. Using small cohorts, previous studies suggested that different subtypes may develop different Gag substitutions [6,19,20]. Our data support this hypothesis: only 9 of the 58 Gag substitutions reported in non-B subtypes (Table 1) were also observed in subtype B (Additional file 1: Table S1).
Among non-B Gag substitutions, 4 were significantly associated with genotypic PI resistance, of which only A431V was PI-associated in subtype B as well (Table 3). However, because our study was restricted to particular subtypes, further evaluation of subtypes A2, D, F2, J, K and other CRFs is still needed. Interestingly, the predominance of PI-associated Gag substitutions in the flexible C-terminal domain of Gag (Figure 2B) leads us to hypothesize that PI-associated Gag substitutions tend to emerge in structurally flexible regions. These Gag substitutions can emerge along with protease drug resistance mutations, as shown in our longitudinal sequence analysis (Figure 1, Additional file 1: Table S2) and in previous studies [21,22]. Future studies are needed to investigate the significance of coevolution between Gag substitutions and protease resistance mutations.
Overall, our findings showed different PI-associated substitutions in the Gag C-terminal domain across

Table 3 footnotes:
*A list of Gag substitutions whose prevalence differs significantly between sequences estimated to be (fully or partially) PI-resistant and sequences estimated to be PI-susceptible (see full reports in Additional file 1: Table S4). The substitutions are indicated relative to the consensus amino acids from individual subtypes [15]. One-tailed Fisher's exact tests were performed, and p-values were adjusted using multiple testing correction via the false discovery rate (FDR) approach [18].
#Statistical analyses were only performed on individual subtype (B, C, G, 01_AE) datasets, which contained more than 10 (partially or fully) PI-resistant sequences. Additional file 1: Table S3 summarizes the subtype distribution of PI-resistant and PI-susceptible sequence datasets.
&The numerator indicates the number of sequences for which the corresponding Gag position is covered; the denominator indicates the number of sequences displaying the respective amino acid substitutions.