HIV-1 subtype C envelope characteristics associated with divergent rates of chronic disease progression

Background: HIV-1 envelope diversity remains a significant challenge for the development of an efficacious vaccine. The evolutionary forces that shape the diversity of envelope are incompletely understood. HIV-1 subtype C envelope in particular shows significant differences and unique characteristics compared to its subtype B counterpart. Here we applied single genome sequencing to plasma-derived virus from a cohort of therapy-naïve, chronically infected individuals in order to study diversity, divergence patterns and envelope characteristics across the entire HIV-1 subtype C gp160 in 4 slow progressors and 4 progressors over an average of 19.5 months.
Results: Sequence analysis indicated that intra-patient nucleotide diversity within the entire envelope was higher in slow progressors, but did not reach statistical significance (p = 0.07). However, intra-patient nucleotide diversity was significantly higher in slow progressors compared to progressors in the C2 (p = 0.0006), V3 (p = 0.01) and C3 (p = 0.005) regions. Increased amino acid length and fewer potential N-linked glycosylation sites (PNGs) were observed in the V1-V4 in slow progressors compared to progressors (p = 0.009 and p = 0.02 respectively). Similarly, gp41 in the progressors was significantly longer and had fewer PNGs compared to slow progressors (p = 0.02 and p = 0.02 respectively). Positive selection hotspots mapped mainly to V1, C3, V4, C4 and gp41 in slow progressors, whereas hotspots mapped mainly to gp41 in progressors. Signature consensus sequence differences between the groups occurred mainly in gp41.
Conclusions: These data suggest that separate regions of envelope are under differential selective forces, and that envelope evolution differs based on disease course. Differences between slow progressors and progressors may reflect differences in immunological pressure and immune evasion mechanisms. These data also indicate that the pattern of envelope evolution is an important correlate of disease progression in chronic HIV-1 subtype C infection.

Background
The rate of disease progression in HIV-1 infected individuals is determined by a complex interplay of viral characteristics, host genetic factors, immune responses and environmental factors. The high viral replication rate, the lack of proof-reading mechanism by the HIV reverse transcriptase enzyme, and high recombination rate are characteristics that ensure that the virus continuously mutates and evolves, resulting in both HIV diversification and viral escape from host immune responses [1,2]. Viral diversity and the constant generation of new viral quasispecies that may not be recognized or eliminated by the host immune mechanisms, particularly contemporaneous virus-specific cytotoxic CD8+ T-cells or neutralizing antibodies, are major impediments for the development of an efficacious HIV-1 vaccine [3,4].
The HIV-1 envelope (Env) subunits gp120 and gp41 are the only viral proteins that are exposed on the virus surface, and they are under continuous host selective pressure, as they are key determinants of the target host cell range and are important targets of neutralizing antibodies and CD8 T cell responses. Specific Env sequence characteristics such as the overall amino acid diversity, the number of putative N-linked glycosylation sites (PNGs), and the length of variable loops have been shown to influence or correlate with antibody neutralization sensitivity, cell tropism, co-receptor utilization and virus transmission [5][6][7]. Studies of Env diversity can also provide important clues for selective forces that may significantly influence the rate of disease progression or alternatively identify specific regions of the Env protein that comprise important targets of effective immune pressure which may be important considerations in rational HIV-1 vaccine design.
In HIV-1 subtype B, the relationship between HIV-1 Env diversity and disease progression is complex, as illustrated by a series of studies. In one early study, HIV-1 Env hypervariable region 3 (V3 loop) diversity was shown to increase with time [8]. A subsequent study showed that Env hypervariable regions 3 to 5 (V3 to V5) diversity was directly associated with duration of patient survival, positive selection for change, and inversely correlated with the rate of disease progression as measured by the slope of CD4+ T cell loss [9]. Another study that examined Env C2-V5 sequences in men followed for 6 to 12 years following seroconversion demonstrated a complex pattern of viral diversity characterized by an early phase of linear increases in divergence and diversity, followed by an intermediate phase with increase in divergence but stabilization or decline of diversity, and a final phase showing stabilization or reduction in divergence and continued stability or decline in diversity [10]. In another study, analysis of C2-V5 Env sequences among typical progressors versus slow progressors showed that the typical progressors exhibited higher diversity, lower intra- and inter-sample divergence, evidence of lower host selective pressure and increases in both synonymous and non-synonymous substitutions over time, while only non-synonymous substitutions increased in slow progressors [11].
The aforementioned studies and a comprehensive body of similar studies on HIV-1 diversity, divergence, and host selective forces that may impact on disease progression have been performed on HIV-1 subtype B [10,12-18]. Furthermore, these studies clearly demonstrate that patterns of Env diversity, divergence, and associated selective pressures identified can differ according to the stage of disease, the sampling methodology, the region of Env analyzed, the founder virus, and the host genetic background.
HIV-1 subtype C is the most rapidly spreading subtype worldwide [19,20], and an effective global vaccine will have to show efficacy against this subtype. A number of studies have explored Env diversity and diversification within HIV-1 subtype C [21,22] but data on this subtype remain relatively limited, despite accumulating evidence that this subtype may differ significantly from HIV-1 subtype B in certain biological properties mediated by the Env gene [21][22][23][24][25]. In particular, possible differences in Env diversity, divergence, and selective pressures between HIV-1 subtype C-infected individuals with divergent rates of disease progression remain understudied.
In this study, we used single genome amplification and sequencing to explore the evolution of the Env gp160 protein. Specifically, we investigated differences in diversity and divergence in 4 slow progressors and 4 progressors of black African descent infected with HIV-1 subtype C. Further, we investigated differences in Env features such as the extent of putative N-linked glycosylation, lengths of the variable and constant regions of gp160, and positive selection in slow-progressors and progressors in order to assess the correlation of these variables with rates of disease progression.

Participants
Participant samples were retrospectively identified from the Sinikithemba cohort, which is a prospective natural history study of HIV-1 infected individuals based at McCord Hospital, Durban, South Africa as previously reported [26]. Ethics approval was obtained from the University of KwaZulu-Natal Biomedical Research Ethics Committee and all participants gave written informed consent to participate in the study. CD4 counts were performed at three month intervals whereas viral loads were done at six month intervals.
For this substudy, CD4 count was chosen as the primary determinant of disease progression for stratification into slow progressor and progressor categories. Both slow progressors and progressors were selected on the basis of a CD4 cell count >500 cells/μl at the study entry time point. At study exit, slow progressors maintained a CD4 count above 500 cells/μl or a viral load less than 10,000 viral RNA copies/ml. In contrast, progressors declined in CD4 count to below 500 cells/μl and had a viral load above 10,000 copies/ml. The overall average follow-up time was 19.5 months. All individuals were antiretroviral therapy naive before and during the window of evaluation. When virological and immunological data became available beyond the study window (average follow-up of 39.8 months for slow progressors and 36.8 months for progressors), we analyzed these parameters relative to study entry, and they remained statistically different for the progressors only (p = 0.03 for both CD4 and viral load).
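The study exit criteria above can be written as a small decision rule. The following is a minimal sketch, assuming the thresholds are applied exactly as worded in the text; the function name and the "unclassified" fallback are illustrative and not part of the study protocol.

```python
def classify_progression(exit_cd4, exit_viral_load):
    """Classify a participant from study exit values.

    exit_cd4 is in cells/ul and exit_viral_load in RNA copies/ml.
    All participants are assumed to have entered with CD4 > 500 cells/ul.
    """
    # Slow progressors: CD4 stays above 500 OR viral load stays below 10,000.
    if exit_cd4 > 500 or exit_viral_load < 10_000:
        return "slow progressor"
    # Progressors: CD4 falls below 500 AND viral load is above 10,000.
    if exit_cd4 < 500 and exit_viral_load > 10_000:
        return "progressor"
    # Boundary values are not covered by the stated criteria.
    return "unclassified"
```

Under this reading, a participant whose CD4 count falls below 500 cells/μl but whose viral load remains under 10,000 copies/ml is still grouped with the slow progressors.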

Sample Collection, CD4 T cell counts and Plasma Viral Load
Blood was drawn from each subject into EDTA tubes and plasma was separated by centrifugation and stored at −80°C until use. Viral load was measured using the Amplicor Version 1.5 assay (Roche, Alameda CA, USA). CD4+ T-cell counts were enumerated by Trucount technology on a four colour FACS Calibur flow cytometer (Becton Dickinson, Franklin Lakes, New Jersey, USA).

Sequencing analysis of gp160
The full-length envelopes were sequenced in the forward and reverse directions using the ABI Prism Big Dye Terminator Version 3.1 cycle sequencing kit (Applied Biosystems, Foster City, CA), utilizing primers spanning the entire envelope and approximately 300 bp apart. Sequences were then resolved on the ABI 3130 XL genetic analyzer. Contigs were assembled and edited using the Sequencher v 4.8 software (Genecodes, Ann Arbor, MI). The sequences were aligned using Clustal W [28] and manually edited in the Genetic Data Environment (GDE 2.2). For phylogenetic analysis, subtype reference strains were obtained from the Los Alamos HIV sequence database (http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html). Phylogenetic trees were generated in PAUP*4.0b10 using the TVM + I + G model of substitution as determined by MODELTEST 3.7 [29]. Trees were rooted with a homologous region of Group O reference (O.CM.96). Maximum likelihood (ML) trees of sequences from individual patients were also drawn using the appropriate evolutionary model (as determined by MODELTEST 3.7) and rooted with the "Best-fit root" as determined by Path-O-Gen v1.2 [31]. BEAUti was used to generate the .xml input file for BEAST. The GTR substitution model with estimated base frequencies and a site heterogeneity model of gamma + invariant sites were used. A relaxed, uncorrelated lognormal molecular clock model was chosen. The Markov chain Monte Carlo (MCMC) chain length was set at 30,000,000 to give an effective sample size (ESS) > 170. The number and location of putative N-linked glycosylation sites (PNGs) were estimated using N-GlycoSite (http://www.hiv.lanl.gov/content/sequence/GLYCOSITE/glycosite.html) from the Los Alamos National Laboratory database. Sequence diversity was calculated using the Maximum Composite Likelihood option in MEGA 4.0 [32].
Characteristic differences between progressors and slow progressors including corresponding study entry and exit time-points were identified using VESPA (Viral Epidemiology Signature Pattern Analysis) [33]. Nucleotide substitution rates were calculated using baseml from the PAML software package [34]. Sites under positive selection were identified using the SLAC option in HyPhy [35] and CODEML as implemented in the PAML software package.
Positively selected sites and signature mutations were mapped onto the X-ray structure of a clade C HIV-1 gp120 (3LQA.pdb) [36] using the BIOPREDICTA module in the VLifeMDS software package (VLife Science Technologies, 2007). Gp41 was modeled in SWISS-MODEL [37] using 1ENV.pdb [38] as a template. Structures were rendered and annotated in PyMol [39].
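For readers unfamiliar with how putative N-linked glycosylation sites are counted, the following is a minimal sketch of a sequon search. It assumes the standard N-X-S/T motif, where X is any residue except proline, and is not the N-GlycoSite implementation; the function name is illustrative.

```python
import re

# N-X-[S/T] sequon where X is not proline. The lookahead lets
# overlapping sequons (e.g. "NNTS") each be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def find_pngs(protein):
    """Return 0-based start positions of putative N-linked glycosylation sites.

    Alignment gaps ('-') are removed first, so positions refer to the
    ungapped sequence rather than to alignment columns.
    """
    ungapped = protein.replace("-", "").upper()
    return [match.start() for match in SEQUON.finditer(ungapped)]
```

The number of PNGs in a region is then `len(find_pngs(region_sequence))`, which can be compared between groups in the same way as region length.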

Statistical analyses
Pairwise comparisons of different parameters, including genetic diversity, PNGs, and length polymorphism, between subjects in the two groups were performed with the Mann-Whitney non-parametric test using the GraphPad Prism 5 software programme unless otherwise stated. Differences were regarded as statistically significant at a p value < 0.05. All reported p values are for two-sided tests.
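To illustrate how the Mann-Whitney test behaves with groups of four, the sketch below computes an exact two-sided p value by enumerating every possible split of the pooled values. This is an illustrative stand-in, not the GraphPad Prism routine, and whether Prism reported exact or approximate p values is not stated in the text.

```python
from itertools import combinations

def u_statistic(a, b):
    """Count pairs (x in a, y in b) with x > y; ties count one half."""
    return sum((x > y) + 0.5 * (x == y) for x in a for y in b)

def exact_mann_whitney(a, b):
    """Exact two-sided p value by enumerating all regroupings of the pooled data."""
    pooled = list(a) + list(b)
    n_a = len(a)
    mean_u = n_a * len(b) / 2  # expected U under the null hypothesis
    observed = abs(u_statistic(a, b) - mean_u)
    hits = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count regroupings at least as extreme as the observed one.
        if abs(u_statistic(group_a, group_b) - mean_u) >= observed - 1e-12:
            hits += 1
    return hits / total
```

With four values per group and no ties there are 70 possible splits, so the smallest attainable two-sided exact p value is 2/70 (about 0.03), reached only when the two groups do not overlap at all.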

GenBank accession numbers
Sequences have been assigned the following GenBank accession numbers: GU216702-GU216737 and GU216739-GU216847.

Study participant characteristics
There were eight participants in this study, seven female and one male. The average age of the participants was 34 years (range: 22-59 years). At study entry, progressors and slow progressors did not differ in their CD4 T cell counts (medians of 621 cells/μl versus 571 cells/μl; p = 0.39), as shown in Figure 1. However, at study exit the median CD4 count of slow progressors was 506 cells/μl, which is not significantly different from the CD4 count at study entry (p = 0.7), while the progressors' median CD4 count had significantly declined to 283 cells/μl (p = 0.03). Slow progressors also had no significant difference in viral load (p = 1.0, data not shown) between study entry and exit time-points, whereas progressors had significantly lower viral load (p = 0.03, data not shown) at study entry compared to the exit time-point. In addition, CD4 (Figure 1) and viral load (data not shown) were statistically different for progressors only at the latest available time-point compared to study entry (p = 0.03 for both parameters). Furthermore, we used BEAST to estimate the approximate time of infection in both groups of participants. Slow progressors were estimated to have been infected for a mean period of 8.2 years (range 4.75-15 years) compared with 2 years (range 0.75-3.75 years) for progressors.

Phylogenetic relationships
To analyze phylogenetic relationships and changes in envelope sequences in slow progressors and progressors over the 19.5-month follow-up period, a mean of 9 single genome full-length gp160 amplicons per participant per time-point (range 4-11 amplicons) for the study entry and exit time-points were analyzed, for a total of 146 sequences. One of the slow progressors (SK312) had fewer putative functional Env amplicons included in the final analysis than the other study participants. This was due to the low number of SGA-derived clones, which was limited by the low viral load and plasma sample availability. All participants' consensus sequences bootstrapped confidently with subtype C reference strains, as determined by a Maximum Likelihood tree for each patient at each time point (Figure 2A). As expected, consensus sequences from the study entry and study exit for each patient formed monophyletic groups.
Overall, there were no distinguishing phylogenetic patterns noted between sequences from the slow progressors and progressors (Figure 2A). Slow progressors showed a more diverse pattern characterized by either separate (sub)clusters at study entry and exit (Figure 2B, SK035) or intermingling of sequences from early and exit time points (Figure 2E, SK312). Additionally, phylogenetic clusters at study exit typically showed similar (Figure 2C, SK036) or longer branch length (Figure 2D, example subject SK169) compared with that of the study entry sequences. However, individual participant sequence trees for the progressors tended to show segregation between entry and exit time-point sequences (Figures 2F-I).

Intra-patient diversity analysis
Intra-patient diversity, defined as the mean pair-wise nucleotide distance, was calculated by measuring distances between all sequences from a single individual at a single time-point, and is shown alongside the phylogenetic trees (Figures 2B-I). Mean overall intra-patient diversity was 2.75% for the four slow progressors and 2.21% for the four progressors (p = 0.07). The mean baseline intra-patient nucleotide diversity was 2.63% (range 1.8-3.3%) for the slow progressors and 1.42% (range 1.0-2.0%) for the progressors, but this did not reach statistical significance (p = 0.08). Study exit time point mean intra-patient diversity was 2.88% (range 1.9-4.2%) and 3.0% (range 1.0-7.4%) for slow progressors and progressors, respectively, which was not a significant difference (p-value = 0.56).
Collectively, these data show that in this cohort, slow progressors trended toward higher intra-patient sequence diversity compared to progressors, although the differences did not reach statistical significance.
Figure 1 CD4 of study entry, study exit and latest available time-point data for slow progressors and progressors. The red circles depict the data points for the slow progressors. The blue squares depict data points for the progressors. Red bars and blue bars represent the p values for the slow progressors and progressors respectively. Black bars represent p values for inter-group comparison for the different time-points. NS = not significant. All comparisons between the study entry, study exit and latest available time-point parameters were performed using the unpaired Mann-Whitney test, and p values are shown. Differences were regarded as statistically significant with a p value < 0.05. When slow progressors were compared to progressors, the analysis yielded significant differences when the CD4 at study exit and last available time-points were compared, as shown above (p = 0.04 and p = 0.02 respectively). Likewise, viral load was significantly different between the groups at study exit and the latest available time-point (p = 0.03 and p = 0.02 respectively, data not shown).
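The mean pairwise distance used as the diversity measure in this section can be sketched as follows. Note the substitution: the study used the Maximum Composite Likelihood distance in MEGA 4.0, whereas this example uses the simpler uncorrected p-distance (proportion of differing sites), so its values would generally be somewhat lower than model-corrected distances. Function names are illustrative.

```python
from itertools import combinations

def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped.
    """
    compared = diffs = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        diffs += a != b
    return diffs / compared if compared else 0.0

def mean_pairwise_diversity(aligned_seqs):
    """Mean p-distance over all pairs of sequences from one time-point."""
    pairs = list(combinations(aligned_seqs, 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)
```

Applied to the single genome sequences from one participant at one time-point, the result multiplied by 100 gives a percentage comparable in form to the diversity values reported above.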

Nucleotide substitution rates in study entry and exit in slow progressors and progressors
To examine the evolution of the envelope gene over the study period, we calculated the rate of nucleotide divergence for each patient's env sequences. On average the nucleotide substitution rate was higher in the progressors (1.2 × 10⁻² nucleotide substitutions/site/year; range 6-17 × 10⁻³) compared to the slow progressors (3 × 10⁻³ nucleotide substitutions/site/year; range 0.1-7 × 10⁻³), but did not differ significantly (p = 0.12). The nucleotide substitution rate appeared to follow the viral load pattern, such that there was a positive but non-significant linear correlation between divergence (nucleotide substitution rate) and the log10 viral load (p = 0.12; data not shown).

Heterogeneity of diversity in Env in slow progressors and progressors for the variable and constant regions
To assess whether there were overall differences in diversity between regions of env at study entry and exit, we analyzed distinct regions of the env gene separately and compared diversity scores between the slow progressors and progressors for the five variable loops, three constant regions and gp41 over time, as seen in Figure 3A. Significant diversity differences between slow progressors and progressors were noted for C2 (p = 0.004), V3 (p = 0.01) and C3 (p = 0.005), with differences remaining significant for C2 and C3 even after applying Bonferroni correction for multiple comparisons (p ≤ 0.006). There was no significant difference in overall inter-patient percentage diversity between slow progressors and progressors for V1 (p = 0.12), V2 (p = 0.09), V4 (p = 0.29), C4 (p = 0.13), V5 (p = 0.08) and gp41 (p = 0.40).
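The Bonferroni threshold quoted above can be reproduced with a short calculation. This sketch assumes, as an inference from the nine regions listed, that the correction divided α = 0.05 by nine tests; the p values are those reported in the paragraph.

```python
# Per-region p values as reported in the text (slow progressors vs progressors).
P_VALUES = {
    "C2": 0.004, "V3": 0.01, "C3": 0.005,
    "V1": 0.12, "V2": 0.09, "V4": 0.29,
    "C4": 0.13, "V5": 0.08, "gp41": 0.40,
}

def bonferroni_significant(p_values, alpha=0.05):
    """Return the corrected threshold and the regions that pass it."""
    threshold = alpha / len(p_values)  # 0.05 / 9 tests (assumed count)
    passing = [name for name, p in p_values.items() if p <= threshold]
    return threshold, passing

threshold, passing = bonferroni_significant(P_VALUES)
print(f"threshold = {threshold:.4f}; significant regions: {passing}")
```

Under this assumption the threshold is 0.05/9, which rounds to 0.006, and only C2 and C3 fall below it, in agreement with the text.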
Next, we assessed the differences in inter-individual env diversity patterns across env for study entry and exit time-points. The results of this analysis are summarized in Figure 3B for slow progressors and Figure 3C for progressors. There were no significant differences between the early and exit time-point intra-patient diversity for either of the groups in any of the regions.

Length polymorphisms and glycosylation patterns for the variable and constant regions
Overall length of certain regions and changes in the number of N-linked glycosylation sites (PNGs) in Env have been shown to influence the sensitivity or resistance of the virus to antibody neutralization and may also influence efficiency of interactions with receptors on the cell surface [7,40]. However, these characteristics have not been comprehensively analyzed for HIV-1 subtype C and most studies have focused on the V3 loop, which is an important but not exclusive determinant of viral tropism and cell entry [41]. We sought to determine whether Env sequence characteristics are associated with disease progression in HIV-1 subtype C. Table 1 depicts Env region length polymorphisms and numbers of PNGs in slow progressors and progressors over time. Mean V1-V2 length for progressors and slow progressors was 66 amino acids and 69 amino acids respectively (Table 1), but this difference was not statistically significant (p = 0.32). Similarly, we observed no differences in C4-V5 amino acid length (p = 0.29) or PNGs (p = 0.15), and length polymorphism for C2-V3 showed no significant difference between the groups. However, a significant difference was noted in the overall number of PNGs in C2-V3 between slow progressors and progressors (p = 0.009), a result that remained significant after Bonferroni correction (p < 0.01). For C3-V4, slow progressors had a significantly higher mean length of 85 amino acids (range 81-90) compared to 82 amino acids (range 76-88) in progressors (p = 0.02); however, analysis of PNGs indicated no difference between the groups (p = 0.96). Interestingly, for C3 alone there was a significant overall difference in the numbers of PNGs between progressors and slow progressors (p = 0.0006) (data not shown). V1-V4 length overall was significantly different, with slow progressors displaying a longer V1-V4 length of 286 amino acids (range 282-294) compared to 281 amino acids (range 276-292) in progressors (p = 0.009).
In contrast, we found that the number of PNGs for V1-V4 overall was significantly higher in progressors, with a mean of 22 (range 20-23), compared to a mean of 20 (range 19-21) in slow progressors (p = 0.02). gp41 was significantly longer in progressors (range 245-252) than in slow progressors (range 239-252; p = 0.02) (Table 1). However, the number of PNGs in gp41 in slow progressors (range 3-5) was significantly different from that in progressors (range 2-4 PNGs; p = 0.02).

Positive selection pressure
The dN/dS (ω) ratio reflects non-synonymous (dN) substitutions to synonymous (dS) substitutions per codon site, with a value of >1 at any site indicating positive selection pressure [42]. The ω values for the whole of gp160, as well as the variable and constant regions within envelope, were calculated using the M1a and M2a models implemented in CODEML. The settings for the M1a (neutral) model were: model = 0, NSsites = 1, and for the M2a (selection) model were: model = 0, NSsites = 2. A Likelihood Ratio Test (2ΔlnL) was performed between the likelihood scores of the M1a (null) vs. M2a (alternative) models. A χ² test was performed using two degrees of freedom [34]. For V1, the M2a (selection) model was supported only in the slow progressors (p < 0.005). For V2 and V3, the null hypothesis (M1a) could not be rejected for either slow progressors or progressors (p = 0.25), while the M2a model was supported for all remaining envelope regions (p < 0.005) for both groups.
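The likelihood ratio test between M1a and M2a can be sketched in a few lines. With two degrees of freedom the χ² upper-tail probability has the closed form exp(-x/2), so no statistics library is needed. The log-likelihood values in the usage note are hypothetical and are not taken from the study's CODEML output.

```python
import math

def lrt_m1a_vs_m2a(lnl_m1a, lnl_m2a):
    """Likelihood ratio test of M2a (selection) against M1a (neutral).

    Returns the statistic 2*(lnL_M2a - lnL_M1a) and its p value under
    a chi-square distribution with two degrees of freedom.
    """
    statistic = 2.0 * (lnl_m2a - lnl_m1a)
    # Numerical noise can make the nested comparison slightly negative.
    statistic = max(statistic, 0.0)
    # For df = 2 the chi-square survival function is exactly exp(-x / 2).
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value
```

For example, hypothetical log-likelihoods of -1000.0 (M1a) and -994.0 (M2a) give a statistic of 12 and a p value of about 0.0025, which would support the selection model at the p < 0.005 level used above.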
Analysis of the entire Env gp160 in the two groups using CODEML and the SLAC option in HyPhy identified 9 common sites under positive selection in slow progressors and 5 sites in progressors. In slow progressors ( Figures 4A and 4B ; it has been previously reported that changes within this region may confer autologous antibody neutralization resistance [19]. For progressors (Figures 4C and 4D), 4 of 5 positively selected sites were located in gp41 (codons 607, 612, 641 and 821), while the remaining site, codon 350, was located in the α-2-helix of C3 immediately downstream of V3. Two of the sites under positive selection in the progressors were either adjacent to (codon 612) or located at (codon 641) a putative N-linked glycosylation site.
One additional site identified using CODEML, codon 671, is located at a linear epitope NWFNIT, which is within the membrane proximal external region (MPER) of gp41, an epitope that is well recognized by a broadly neutralizing antibody (4E10) [43].
Figure 3B Box and whisker plots of intra-patient diversity analysis for slow progressors for different regions of the Env gene for study entry and study exit.
Figure 3C Box and whisker plots of intra-patient diversity analysis for progressors for different regions of the Env gene for study entry and study exit.

Signature sequence differences between slow progressors and progressors
To identify key differences between the groups, consensus sequences of slow progressors and progressors study entry and exit were generated in VESPA using an 80% threshold (i.e. sequence differences were in >80% of the sequences). Signature differences were noted at 6 amino acid positions between the progressors and slow progressors consensus sequences. Four of six of these differences occurred in gp41 (codons 607, 727, 770 and 837), and the remaining two were at codons 80 and 133. No signature differences were noted between the entry and exit time points within each group.
Except for an N to S/D mutation in the progressors at codon 80, which resulted in the gain of a casein-kinase-2 (CK2) phosphorylation site at codons 77-80, most of the signature changes were not at putative functional sites. Other changes, although not in the signature, but resulting in a change in putative functional sites in the progressors, are: a V to T mutation at codon 455 resulting in the gain of a myristoylation site at codons 451-456, a Q to K mutation at codon 665 (within the ALDSQWN epitope) resulting in the gain of a tyrosine kinase phosphorylation (TKP) site at codons 665-667, and an N to S mutation at codon 671 resulting in the gain of a CK2 phosphorylation site at codons 671-674 within the NWFDIT epitope. Interestingly, the loss of a putative N-linked glycosylation site in the progressors in the V4 region was compensated for by a gain of an N-linked glycosylation site in the C3 region (codons 362-365). When these signature patterns were compared with the subtype B reference strain, it was noted that an L to V mutation at codon 800 in the subtype C signature sequences resulted in a loss of a putative leucine zipper (codons 793-814). Whether the gain or loss of putative functional sites influences viral pathogenesis needs to be confirmed with functional assays.

Discussion
In this study we aimed to identify env sequence characteristics that may distinguish progressors from slow progressors in a chronically HIV infected anti-retroviral naïve subtype C-infected cohort. We used a single genome amplification approach in order to accurately and comprehensively represent the diversity of viral quasi-species. Several indicators of evolutionary forces were used to elucidate putative differences between the groups including heterogeneity of envelope sequence diversity, Env length polymorphisms, numbers of PNGs, positive selection, and signature sequence characteristics.
Figure 4 Three dimensional structural illustrations of positions associated with positive, negative and neutral selection. Locations were mapped onto a model of gp120 based on the X-ray structure of the gp120 core in complex with sCD4 and 21c Fab (3LQA.pdb) for slow progressors (Figure 4A) and for progressors (Figure 4C). V1V2 and V3 loops were drawn onto the core for completeness. In the orientation shown, the cellular and viral membranes would be located above and below the protein respectively. Figure Figure 4D. Blue indicates strongly negatively selected positions (<-3). Purple and purple arrows denote changes in putative functional sites as shown in Figures 4B, 4C and 4D. Spheres indicate signature sequence differences. It should be noted that the gp120 core crystal structures, which were modeled on the 3LQA.pdb structure, include amino acid residues from HXB2 position 86-491. The gp41 structure based on 1ENV.pdb includes amino acid residues from HXB2 position 541-662. Therefore not all of the positively and negatively selected sites are indicated on the gp120 and gp41 structures.
Our study suggests that regions of Env are shaped by different evolutionary forces which may in turn leave viral sequence footprints that may distinguish slow progressors from progressors in chronic HIV-1 subtype C infection. It has previously been shown that in subtype B infection there may be Env region-specific differences in evolutionary forces between those with high versus low viral loads [9]. Our study demonstrated a non-significant trend towards increased intra-patient diversity in slow progressors, a finding consistent with other studies on HIV disease progression [44][45][46]. In contrast, a study of primary HIV-1 subtype C infection has found that increased envelope diversity is inversely correlated with CD4 T cell counts and is associated with rapid disease progression [47]. Together, these results may imply that evolutionary forces that drive HIV-1 subtype C diversification differ according to the phase of infection. On close examination of the envelope regions we found that diversity in C2, V3 and C3 was higher in slow progressors compared to progressors suggesting co-evolution of these regions. These findings are consistent with findings from other studies [48,49]. From a functionality standpoint it appears that, because the V3 loop is very important for viral entry, increased diversity in this region is a correlate of viral attenuation [24].
Length polymorphisms in the constant and variable envelope regions may also contribute to structural diversity in terms of glycan packing and protein folding of the virion structure. An unusual finding was that the longer V1-V4 in slow progressors had fewer PNGs whereas the longer gp41 domain contained fewer PNGs in progressors. Several studies have shown the association between neutralization sensitivity and shorter V1-V4 length [50,51]. In contrast, other studies have shown that longer V1-V4 with extensive glycosylation masks neutralizing antibody sensitive epitopes in subtype C [6]; however, in subtype B no such association was found [52]. Our observations may imply that longer length regions may be masking neutralization sensitive epitopes as suggested by Gray et al. [47]. Additionally in progressors, a loss of a glycan in V4 was compensated for by a gain in a PNG within C3, implying a shifting glycan shield as suggested previously [7].
High dN/dS ratios, indicative of strong diversifying selection due to humoral immune pressure [42], occurred mainly within gp41 in progressors, while slow progressors had a number of regions targeted. This suggests that the nature of antibody targets may differ between the groups. Interestingly, both groups had positive selection in the α-2-helix within C3. It has been suggested that, because the V4 loop is shorter in subtype C than in subtype B, the α-2 helix is more exposed and more antigenic [49,53,54]. Interestingly, position 607 of gp41 was positively selected in progressors and was also a signature sequence difference between progressors and slow progressors, indicating that there may be putative humoral immune pressure driving escape at that position. Additionally, gp41 in progressors showed differences at two putative antibody sites. Firstly, ELDKWAS is recognized by neutralizing antibody (nAb) 2F5, where DKW are the sentinel amino acids that determine sensitivity to 2F5 [43]. This motif appears in the majority of the slow progressors' sequences; however, it is substituted by DSW in all the progressors, indicating a loss of a putative antibody recognition site. In addition there is a sequence change from Q at position 665 to K, making the overall progressor sequence ALDSWKN. Secondly, an N to S change at codon 671, which is within a linear epitope (NWFNIT) that is recognized by nAb 4E10, may result in a loss of this recognition site. In addition, this codon was positively selected for in the progressors. The effect of the loss of these putative recognition sites during chronic disease progression is unknown. We propose that the high antigenic stimulation in progressors may elicit antibodies whose antiviral effectiveness may be limited.
Together, these results may imply that the virus uses multiple strategies to evade the immune system, including increased V1-V4 amino acid length, increased numbers of PNGs, and specific mutations that give the virus selective advantages. Essentially, the cat-and-mouse game that persists during chronic infection, arising from the dichotomy between antigenic stimulation and immunological response and in turn influencing viral characteristics, needs further investigation.
This study has several limitations. First, we do not know the exact time of infection for these subjects. Stratification of study subjects as progressors or slow progressors therefore relied on short-term (19.5 months) follow-up immunological data, which may be an unrepresentative snapshot of the entire natural history of disease progression for these participants. However, this concern was somewhat allayed by bioinformatic analysis of the study sequences, which showed that, consistent with the stratification, progressors in this cohort were more likely to have been infected for a shorter period of time than slow progressors. Second, the sample size of the study cohort was relatively small, which may have limited our statistical power to identify differences. Third, we had a limited number of SGA-generated amplicons for one of the study participants in particular, owing to low viral load and limited sample volume. In addition, many more env amplicons were generated than were included in the final analyses, as some of the amplicons had sequences with stop codons. Fourth, although the slow progressors and progressors differed in markers of disease progression at study exit, more stringent selection criteria could potentially identify additional significant differences. Overall, therefore, the findings reported here will require replication in larger cohorts with longer periods of follow-up and more pronounced differences in immunological and virological outcomes.

Conclusions
The dynamics of HIV-1 env evolution differ between chronic slow progressors and progressors. Single genome sequence analysis of circulating viruses in slow progressors and progressors indicates that diversity, Env length polymorphisms, sites under positive selection pressure, and PNGs consistently map to specific regions in slow progressors or progressors. The varied diversity across the env genome and the relationship between amino acid length, number of PNGs and sites under positive selection may provide further insight into the intrinsic differences between the viruses from both groups and into the influence of the host's selective pressures, which may be used to inform more effective vaccine design.