Next-generation sequencing analyses of the emergence and maintenance of mutations in CTL epitopes in HIV controllers with differential viremia control

Background Despite the low level of viral replication in HIV controllers (HICs), studies have reported viral mutations related to escape from cytotoxic T-lymphocyte (CTL) response in HIV-1 plasma sequences. Thus, evaluating the dynamics of the emergence of CTL-escape mutants in HICs reservoirs is important for understanding viremia control. To analyze the HIV-1 mutational profile and dynamics of CTL-escape mutants in HICs, we selected 11 long-term non-progressor individuals and divided them into the following groups: (1) viremic controllers (VCs; n = 5) and (2) elite controllers (ECs; n = 6). For each individual, we used HIV-1 proviral DNA from PBMCs related to earliest (VE) and latest (VL) visits to obtain gag and nef sequences using the Illumina HiSeq system. The consensus of each mapped gene was used to assess viral divergence, and next-generation sequencing data were employed to identify SNPs and variations within and flanking CTL epitopes. Results Divergence analysis showed higher values for nef compared to gag among the HICs. EC and VC groups showed similar divergence rates for both genes. Analysis of the number of SNPs showed that VCs present more variability in both genes. Synonymous/non-synonymous mutation ratios were < 1 for gag among ECs and for nef among ECs and VCs, exhibiting a predominance of non-synonymous mutations. Such mutations were observed in regions encoding CTL-restricted epitopes in all individuals. All ECs presented non-synonymous mutations in CTL epitopes but generally at low frequency (< 1%); all VCs showed a high number of mutations, with significant frequency changes between VE and VL visits. A higher frequency of internal mutations was observed for gag epitopes, with significant changes across visits compared to Nef epitopes, indicating a pattern associated with differential genetic pressure. Conclusions The high genetic conservation of HIV-1 gag and nef among ECs indicates that the higher level of viremia control restricts the evolution of both genes. Although viral replication levels in HICs are low or undetectable, all individuals exhibited CTL epitope mutations in proviral gag and nef variants, indicating that potential CTL escape mutants are present in HIC reservoirs and that situations leading to a disequilibrium of the host-virus relationship can result in the spread of CTL-escape variants. Electronic supplementary material The online version of this article (10.1186/s12977-018-0444-z) contains supplementary material, which is available to authorized users.


Background
One of the main characteristics of HIV-1 infection is the occurrence of clinical, but not virological, latency between the acute and AIDS phases over time, with variable duration among infected individuals that generates distinct progression profiles. Although the majority of individuals present high viral loads during the chronic infection phase, evolving to AIDS after 8-10 years of infection, a small fraction remains clinically asymptomatic for a long period. These individuals have normal CD4 + T cell counts in the absence of antiretroviral treatment (ART) and are termed longterm non-progressors (LTNPs) [1]. Moreover, some individuals, called HIV controllers (HICs), exhibit spontaneous control of viral replication at different levels, maintaining low or undetectable viremia during infection [2].
A crucial characteristic of HIV-1 is the high genetic variability and elevated rate of intra-host viral evolution [3]. This higher rate of viral evolution favors the emergence of viral variants that are more cytopathic [4][5][6], resistant to ART [7] and/or constitute escape variants from the host immune response [8][9][10][11]. In addition to variants that show escape from neutralizing antibodies [12], CTL-escape mutations are important to HIV-1 pathogenesis, as much evidence points to the pivotal role of the CD8 + T cell response in viral control [13][14][15][16] and as a continuous force driving viral selection [17]. These escape mutations have been characterized as amino acid changes occurring in central (impairing TCR recognition) and terminal (modifying anchor residues) regions of CTL epitopes or in their flanking regions, impairing epitope processing [18]. CTL-escape mutations begin to arise during the acute phase of infection [19][20][21][22] and can be identified even at human population levels based on HLA profiles [23][24][25]. Especially in individuals with protective HLA alleles, like HLA B*57 and B*27, those mutations arise faster in higher numbers [26] and preferentially, at anchor residues or multisite [27].
Although different studies have identified low HIV-1 evolutionary rates in LTNPs and HICs [28][29][30], immuneescape variants continue to arise in patients with these profiles, mainly when harboring protective HLA-B alleles [17,31,32]. Moreover, HICs present an efficient CD8 + T cell response [33,34] that can favor high selective pressure and the emergence of immune-escape variants [35]. In HICs, this phenomenon can generate new effective CD8 + T cell responses after viral escape and maintain viremia control [36,37] or can result in a loss of viremia control and disease progression [38][39][40]. Regardless, studies have rarely detected CTL-escape mutants in proviral sequences from HICs, even when emerging in plasma viral sequences.
Next-generation sequencing (NGS) is shown a useful tool for studying the dynamics of minority HIV-1 variations in infected individuals. NGS has been successfully applied to reveal hidden mutations conferring resistance to antiretroviral drugs in treated patients [41][42][43], to monitor viral tropism change dynamics [43,44], and to assess the dynamics of immune-escape mutations in HIV-1-infected individuals [16,19,20]. However, this approach has not yet been employed to evaluate CTLescape mutations in HICs.
Thus, the present study aimed to apply the NGS strategy to evaluate the overall genetic variability of gag and nef genes and to identify the emergence of potential CTL-escape mutations in proviral DNA sequences from 11 LTNP/HICs during long-term follow-up.

Study subjects
A cohort of 11 HICs with an LTNP profile, defined as subjects infected with HIV-1 for at least eight years and maintaining RNA viral loads lower than 2000 copies/ml and CD4 + T cells counts higher than 500 cells/mm 3 without ART, were followed-up at the Instituto Nacional de Infectologia Evandro Chagas (INI), Rio de Janeiro, Brazil. These subjects were classified into the following groups according to plasma viral load (VL): (1) elite controllers (ECs) if most (≥ 70%) plasma viral load determinations were below the limit of detection for clinically available assays (< 50 or < 80 copies/ml) (n = 6) and (2) viremic controllers (VCs) if most (≥ 70%) VL determinations were between 80 and 2000 copies/ml (n = 5). Patients were seen at least once every 6-12 months to perform clinical monitoring tests, such as RNA viral load quantification and CD4 + T cell count. At each visit, PBMCs were obtained as previously described [45] and stored in liquid nitrogen until use. The present work was approved by the Brazilian National Committee for Research Ethics, and all patients provided written informed consent.

Genomic DNA extraction and PCR
For each patient, PBMCs samples from the earliest (V E ) and latest (V L ) visits were used for DNA extraction. Thawed PBMCs (≅ 1 × 10 7 cells for VCs and ≅ 2 × 10 7 cells for ECs) were suspended in 1 ml of DNAZOL (Invitrogen, Wisconsin, USA) and incubated for 72 h at 4 °C. Genomic DNA was further extracted as previously described [46] and used to amplify fragments related to regions 408-1844 and 8697-9639 of the HIV-1 Genome (in relation to HXB2), encompassing gag and nef, respectively, as described elsewhere [45]. The primer sets are described in Additional file 1: Table S1. The PCR reactions were carried out by using Platinum ® Taq DNA Polymerase High Fidelity (Invitrogen) according to the manufacturer's protocol. To avoid using samples near or at the limit of detection, all samples were previously tested in triplicate via nested PCR, and only samples with at least 2 positive reactions of 3 total were included in the study (data not shown). The PCR products were purified using an Illustra GFX PCR DNA purification kit (GE Healthcare, Pennsylvania, USA) and quantified with Qubit dsDNA BR Assay Kit (Invitrogen) using a Qubit ® 2.0 fluorometer (Invitrogen).

Library preparation and NGS
Purified gag and nef amplicons obtained for each patient visit were multiplexed in equimolar pools and used to construct an NGS genomic library with Nextera XT DNA Library kits (Illumina, California, USA) and Nextera XT Index Kit (Illumina), according to the manufacturer's instructions. The generated libraries were normalized and clustered using HiSeq SR Rapid Cluster V2 (Illumina). NGS was performed using HiSeq Rapid SBS v2 of 200 cycles (Illumina) with an Illumina HISEQ 2500 sequencer (Illumina).

Data analysis
The raw NGS data obtained were compiled into FastQ files, and the quality of the reads was assessed by using the software FastQC [47]. The tool Trimmomatic v0.32 [48] was used to trim adapter sequences, and the first 100 bp of the reads were obtained. Furthermore, Sickle [49] was employed to select only sequences with size > 150 bp and Q > 30. Reads were mapped separately to gag and nef sequences from HXB2 (GenBank accession K03455) with Geneious 9.0.5 [50] by using medium sensitivity and 5 iterations with the Geneious mapping algorithm. Consensus sequences with a 90% base threshold and a minimum coverage of 20X for each mapping were obtained through Geneious. Reads were then remapped using obtained consensus sequences as a reference to filter differences between HXB2 and individual HIV-1 quasispecies. Samtools [51] was utilized to obtain mapping coverage statistics. Genetic divergence calculations between the consensus for V E and V L and Neighbor-Joining (NJ) trees with 1000 bootstrap replicates were generated with Mega v.6 and by using the Tamura Nei substitution model, as recommended by jModel test [52]. For each mapping, variant call analysis was performed in Geneious software to assess single-nucleotide polymorphisms (SNPs); variation at Q > 30, coverage > 100X and frequency > 0.5% were used as NGS sequencing quality parameters. Relevant CTL epitopes, restricted by the HLA-B alleles carried by the individuals included in the study group, were selected from the epitope database of Los Alamos HIV Immunology Database [53] available September 2017 (Additional file 1: Table S2 and S3) and used to identify variations within or adjacent (3 amino acids flanking the sequence) to epitopes that may be related to the immune response.

Statistical analysis
GraphPad Prism 6 was used to plot graphs and to estimate the median values of divergence per year, number of variants, ratio of variable positions/total of positions and synonymous/non-synonymous mutation ratio. The Mann-Whitney U test was performed using R v3.4 to compare gag and nef genes from the HIV-1 EC and VC groups. p values < 0.05 were considered statistically significant. Table 1 describes the clinical and epidemiologic characteristics of a group of HIV-infected individuals from the INI cohort classified as LTNPs and HICs. The median age of the individuals was 48 years (IQR 45-51); most of them (67%) were women, and the heterosexual category of exposure was prevalent (58%). The median time of HIV-1 suppression was 15 years (IQR 14-18). The medians of CD4 + and CD8 + T cells counts were compatible with the values currently described for HIV-1-uninfected individuals [54]. Among the 11 individuals analyzed, all were infected with HIV-1 variants genotyped as subtype B, except for VC14, who was infected with subtype F1. Six of 9 patients carried HLA-B allele B*57 or B*52, associated with slow clinical progression [55][56][57]. Two individuals were heterozygotes for CCR5Δ32 deletion, also described as a protective factor [56]. EC17, VC14, and EC42 did not carry an HLA-B allele or exhibit a CCR5 genetic profile associated with the control of viral replication and/or non-progression to AIDS.

NGS data yield and gene mapping
From the NGS, a general median number of 4,667,122 reads (IQR 5,491,122-3,688,230) of 35-200 bp were obtained per visit from each individual. Approximately 70% of reads (IQR 68-76%) were retained after selection by size and quality controls. Additional file 1: Table S4 shows the mapping coverage for both genes of each patient per visit. For gag, the median coverage was 212,674 reads/bp (IQR 173,132-261,207) per individual/ per visit. The data generated allowed reconstruction of the consensus sequence for at least the first 1050 bp of gag. However, gag mapping was not successful for EC17 and EC42 samples at the V E visit. With regard to the nef gene, the median coverage was 286,463 reads/bp (IQR 179,882-426,834) per individual/per visit. The generated data allowed reconstruction of the consensus sequence of 821 bp, covering full-length nef. Correct nef mapping was not successful for VC06 V E and EC17 V L visit samples.

Genetic diversity of gag and nef regions among HICs
To estimate evolution of the studied viral genes, we calculated the genetic divergence between the V E and V L  (Fig. 1a).
For nef sample VC06 V E and gag samples EC17V E /EC42 V E , which were not successfully mapped by NGS, bulk sequences available from previous studies using conventional Sanger sequencing [58] were employed. Among the HICs, the median of viral divergence per year was significantly higher for nef compared to gag (0.6 vs 0.1%; p < 0.03). Similar divergence rates were observed comparing EC and VC groups for both genes. Compared with gag from the VC (p < 0.008) and EC (p < 0.04) groups, nef from the VC group showed significantly higher divergence rates. We used variant call analysis to quantify and qualify SNPs and variations identified in the NGS data for each individual and visit. For both genes, VCs showed a higher number of variations (Fig. 1b) than did ECs (p = 0.05 for gag and p < 0.003 for nef). Comparison of the variable position ratio per total analyzed positions showed the same pattern ( Fig. 1c; gag VCs vs ECs p < 0.04; nef VCs vs ECs p < 0.03). Otherwise, gag in VCs had a higher number of variants than did nef in ECs (p < 0.009), whereas nef from VCs showed higher ratios of variable positions than did gag from ECs (p < 0.002). Synonymous/nonsynonymous mutation ratios were not significantly different between the HIC groups or the studied genes, even though the median ratio for VC gag was greater than one and the ratios for gag in ECs and nef in both groups were less than one (Fig. 1d).

Variability in CTL restricted epitopes in HICs
To assess the occurrence of potential Gag and Nef protein immune-escape mutations in the HICs included in this study, we identified for each individual the epitopes  We further analyzed all non-synonymous SNPs associated with those epitope regions for each patient by comparing their frequencies between V E and V L . Tables 2  and 3 show HICs carrying epitope mutations with a frequency change > 10% between visits, as distributed according to the HLA-B allele. Full lists of Gag and Nef epitope mutations for all the individuals included in this study are available as Additional file 1: Tables S5 and S6. Among HICs, EC individuals presented two main patterns of epitope mutation changes. For those individuals with none (EC02, EC52) or very few (EC17) viral blips during clinical follow-up, the majority of the epitope mutations detected were rare (< 2%) in both Gag and Nef, except for Gag epitope positions A83S (adjacent to EV9) and S278C (RI9), which each corresponded roughly to 23% in EC02 V L .
In addition, we observed mutations with major frequency changes between visits for all ECs who had more frequent viral blips (< 30%) during clinical follow-up (Tables 2 and 3). For EC11, we detected main changes in the frequency of Nef epitope mutations V133P (from 70.7 to 99.4%) and F191 V (from 85.5 to 60.2%) and Gag epitope mutation K26R (from 0 to 94.1%) between V E and V L . For EC18, Nef epitope mutation K92R appeared only at V L (68%); for Gag, epitope mutations K28R, I34L, R39 K and T280S became the majority at V L , whereas R43Q and R76K mutations decreased from V E (approximately 20%) to undetectable levels at V L . EC13 showed the highest number of Nef changes in mutation frequency, such as SNPs V10 K, G12R, M79I, and G140R, which became predominant at V L ; in contrast, reversion to subtype B consensus Y and V residues were observed for Y135F and V148I. For Gag, NGS data for EC13 at V L revealed E40Q and N126R mutations in approximately 20%, but with no significant changes in comparison to the corresponding bulk sequence available for this individual.
As expected, VCs had more mutations with frequencies above 1% than did ECs. In addition to the major frequency changes, some patients also showed a high number of mutations with equivalent frequency throughout the visits. VC06 presented dominant changes in Nef mutations V85I, L87I, R105K, and I114V, despite no significant change in Gag. VC10 also presented major changes in Nef mutations T15A and E182L; for Gag, main alterations were observed in N126S and in a Consensus amino acid from the available bulk sequence shown instead of frequency; Location A-adjacent to epitope; Location I-within epitope co-circulation of A146P/A146S, T280S/T280I, S173T/ S173, and I223V/I223 variants at both visits. VC14 showed reversion of Nef T71R and H116N and the emergence of I101V, H102Y, P129Q, and V133I at V L ; for Gag, major changes were approximately 20% of the reversions for V82I, I147L, and S310T. Furthermore, VC15 presented main changes in Gag for V82I, D121A, T122A, and T280V, with reversion observed for H124N, whereas Nef V85L and H102Y increased by approximately 30%. VC16 presented a large number of amino acid changes, mainly in Gag epitopes, with reversion of approximately 30% for I34L, V35I, V46I, V82L, A118P, A119T, G123 K, and S125R mutations and an increase from 20 to 30% for H124S, Q1227H, I138L, A146P, V168T, T242 N and A340G from V E to V L . Major changes from approximately 40-70% were also found for T122A, S173T, and V191I. Co-circulation of subtype B consensus amino acids with mutations was found for K335R/K335. For Nef, we observed co-circulation of wild-type and Y81F and H116N mutations, reversion of H102Y and major

Discussion
HICs are a rare population of HIV-1-infected individuals who represent the best existing model of spontaneous viral control [2,59,60]. Although the mechanisms responsible for this control are not fully understood, studies show that these individuals have a differentiated and more effective CTL response [61], which should be then reflected in greater genetic pressure on their viral quasispecies, mainly in immunodominant regions. Moreover, the presence of protective HLA alleles related to clinical non-progression and/or viremia control is also associated with stronger selective pressure for virus diversity [17,22,26,27,31,32,35,36,[62][63][64][65][66].
Although most HICs have plasma viral loads below the limit of detection of commercial assays, basal viral replication levels can be detected by ultrasensitive methods [59,67,68] and should favor viral evolution to some degree due to the characteristic high-genetic variability from HIV. Several studies have reported lower levels of viral diversity in HICs compared with the levels in typical progressors [28,29,32,59,69]. Gijsberg et al. estimated divergence rates of 0.9-1.9% over ten months for gag in HIV-1 samples from typical progressors [70]. In our study, a median of viral divergence of 0.1% per year was observed for gag gene from HICs, corroborating the low level of viral evolution found in HIV-1 samples from those individuals. A higher median of divergence was observed for nef (0.6% per year), suggesting that nef has a greater potential for viral diversity than does gag, in agreement with previous studies showing greater conservation of gag [71,72]. Previous observations from our group of lower quasispecies diversity for the env gene from EC samples in comparison to VCs [73] and the low, but similar, median values of divergence in gag and nef for viral samples from ECs and VCs in the present study indicate that lower levels of viral replication restrict evolution.
To our knowledge, this is the first study employing NGS to analyze HIV-1 diversity in major CTL epitopes of HICs. Indeed, most similar studies were performed with SIV-infected primates with a viremia control profile [74,75]. The use of the NGS platform allowed the estimation of variability in terms of SNP quantity and nature. Previously, Cale et al. [76] using 454 sequencing, showed that full coverage of 50,000 reads/bp was sufficient to detect variants with frequencies of 0.006%. In our work, we used a medium coverage of > 200,000 reads/bp to assess SNPs with frequencies higher than 0.5%, which should identify the most representative escape mutation variants while preventing analysis of data related to sequencing artifacts. With this approach, we showed higher levels of variants and variable positions in the gag and nef genes of viral samples from VCs compared to ECs, correlating with the higher variability expected for the first group. Although differences between the synonymous/non-synonymous mutation ratios of both groups and genes were not statistically significant, gag of ECs and nef of ECs and VCs displayed a predominance of non-synonymous mutations, in contrast with previous studies reporting that synonymous mutations are more significant to evolution of gag and nef [32,77].
To characterize possible CTL-escape mutants, we performed analysis of variations in the epitopes restricted by each patient's HLA-B allele. Similar molecular analysis has been able to identify that most mutations arising in the first weeks of the acute phase are the result of CTL response selective pressure [13,[19][20][21][22]. Concerning HICs, Migueles et al. [78] showed a low frequency of CTL-escape mutations for the KF11 epitope, despite its high level of CTL recognition in individuals carrying the B57 allele. Additionally, a more in-depth description of gag and nef gene evolution in B57 + elite suppressors showed that despite the predominance of immune-escape mutations in gag and nef quasispecies obtained from plasma viral RNA, these mutations are rare in proviral sequences [32,65,77,79].
In our study, non-synonymous mutations were found in Gag and Nef CTL epitope regions in all HIVcontrollers regardless of their rarity in the proviral compartment. Due to the low HIV proviral load inherent to ECs, a higher number of PBMCs was used for DNA extraction than for VCs (≅ 2 × 10 7 cells for ECs vs ≅ 1 × 10 7 cells for VCs) in order to assure a proviral input in the nested-PCR sufficient to assess the viral variability in each sample. Moreover, all samples were tested in triplicate and only those with at least 2 out of three positive nested-PCR amplification were used to prepare the NGS amplicons. These strategies were employed to prevent low input of viral copies on PCR that could lead to template resampling. Moreover, the high number of sequences generated from each sample and the higher sensitivity of NGS to access minority viral variants, in contrast to techniques such as single-genome amplification (SGA) or cloning [19,20], allowed the detection of those mutations. The occurrence of unique low-frequency mutations for both early and late visit samples from the same individual showed that in contrast to the results of Bailey et al. [65], possible escape mutants from the CTL response replicating in the plasma compartment can successfully integrate into host cells. Although this low frequency might appear to be insignificant, new CTL-escape mutants do not often arise in massive frequencies but can expand from very low to predominant conditions, as previously exemplified for SIV [80] and observed in our work for the following mutations: Gag-N126S (0.7 -> 99.8%) for VC10 and Gag-T280V (1.8 -> 98.8%) for VC15.
Comparing Gag and Nef mutations with significant frequency changes revealed that mutations within the analyzed epitopes predominantly occurred in Gag, whereas no pattern of mutation was present in Nef epitopes, within or in adjacent regions. Although both types of mutations can generate CTL escape through different mechanisms, mutations within epitope are more easily associated with the escape profile, as it directly affects epitope anchoring and TCR recognition [18,81]. This observation may be related to the predominance of the CTL response associated with Gag during the chronic phase of infection [13,15,22,34], generating higher selective pressure in this gene and resulting in greater diversity of the epitopes.
In general, the low number of patients in each group is a limitation to identifying statistically significant associations between the level of viremia control and the emergence dynamics of mutations related to CTL epitopes. The low viral load observed for ECs was also a limitation to assessing escape mutations in the plasma compartment, which reflects the variants that are effectively replicating in the host. However, by evaluating the proviral reservoir, which represents the pool of viruses that can be a source of plasma viral particles, we were able to assess a greater number of mutations with significant frequency changes either in VCs or ECs. Although some ECs, such as EC08, did not present any significant change across visits, all VCs showed mutations that characterized variant replacement with regard to both Nef and Gag. In those patients, we were able to observe co-circulation of more than one mutation in the same position, which is indicative of greater dynamic quasispecies turnover. Reversion to wild-type amino acids, as based on the reference subtype, was also observed for VCs, and reversion of escape mutations has been extensively described in the literature as a common viral mechanism of evolution related to the CTL response [81][82][83][84][85][86].

Conclusion
Although none of the observed mutations could be confirmed as a CTL-escape mutation due to the lack of CD8 + T cell functional analyses, the present study shows that despite low or undetectable levels of viral replication among HICs, genetic variability occurs in viral quasispecies in the proviral compartment. Amino acid substitutions across visits and the existence of low-frequency mutants, even in ECs, indicate that potential CTL-escape mutants exist and are present in those individual reservoirs. This fact implies that situations leading to a disequilibrium of the host-virus relationship can result in the spread of CTL-escape variants with pathological consequences. More studies are necessary to address why those adapted variants do not achieve replicative success in ECs.

Additional file
Additional file 1. Table S1. Primer set used in the present study. Table S2. Gag Epitopes selected for study according to the HLA-B alleles carried by the HICs. Table S3. Nef Epitopes selected for study according to the HLA-B alleles carried by the HICs. Table S4. NGS mapping and coverage statistics of gag and nef distributed according to the patients. Table S5. Full list of Gag mutations and associated epitopes recognized by HLA alleles carried by the HICs. Table S6. Full list of Nef mutations and associated epitopes recognized by HLA alleles carried by the HICs