HIV-1 envelope glycoprotein signatures that correlate with the development of cross-reactive neutralizing activity

Background Current HIV-1 envelope glycoprotein (Env) vaccines are unable to induce cross-reactive neutralizing antibodies. However, such antibodies are elicited in 10-30% of HIV-1 infected individuals, but it is unknown why these antibodies are induced in some individuals and not in others. We hypothesized that the Envs of early HIV-1 variants in individuals who develop cross-reactive neutralizing activity (CrNA) might have unique characteristics that support the induction of CrNA. Results We retrospectively generated and analyzed env sequences of early HIV-1 clonal variants from 31 individuals with diverse levels of CrNA 2–4 years post-seroconversion. These sequences revealed a number of Env signatures that coincided with CrNA development. These included a statistically shorter variable region 1 and a lower probability of glycosylation as implied by a high ratio of NXS versus NXT glycosylation motifs. Furthermore, lower probability of glycosylation at position 332, which is involved in the epitopes of many broadly reactive neutralizing antibodies, was associated with the induction of CrNA. Finally, Sequence Harmony identified a number of amino acid changes associated with the development of CrNA. These residues mapped to various Env subdomains, but in particular to the first and fourth variable region as well as the underlying α2 helix of the third constant region. Conclusions These findings imply that the development of CrNA might depend on specific characteristics of early Env. Env signatures that correlate with the induction of CrNA might be relevant for the design of effective HIV-1 vaccines.


Background
The identification of the human immunodeficiency virus type-1 (HIV-1), as the causative agent of the acquired immunodeficiency syndrome (AIDS), initiated a longterm but as of yet unsuccessful search for an effective and safe HIV-1 vaccine. Ideally a vaccine should elicit both humoral and cellular immunity [1]. In particular the induction of cross-reactive neutralizing activity (CrNA) that is capable of neutralizing HIV-1 variants from different subtypes is appealing. Unfortunately, this has shown to be a difficult hurdle, and none of the vaccine candidates tested to date have been able to induce this type of humoral immunity [2][3][4]. The moderate efficacy against acquisition of infection in the RV144 vaccine trial [5], in which V1V2 IgG antibodies correlated inversely with the rate of infection [6], has lead to optimism in the HIV-1 vaccine field. In this trial, only very low titer, tier 1 neutralizing antibodies (NAbs) were detected and efficacy may improve with a vaccine that is capable of eliciting a crossreactive neutralizing humoral immune response.
The HIV-1 envelope glycoprotein complex (Env) mediates entry into host cells and is the sole target for NAbs [7,8]. A functional Env consists of heterotrimers of three gp120 subunits, non-covalently linked to three gp41 molecules that anchor the Env spike in the viral membrane. Gp120 has five conserved (C1-C5) and five variable regions (V1-V5) [9,10]. The conserved regions form the core of the protein and are crucial for binding to the CD4-receptor on target cells [11][12][13]. The variable regions are highly diverse in sequence as a consequence of high replication rates, recombination, deletions, insertions and mutations [14]. Gp120 and gp41 contain 20-35 and 3-5 potential N-linked glycosylation sites (PNGS), respectively. N-linked glycans compose approximately half of the molecular mass of gp120 [15] and are required for correct protein folding, binding to lectin receptors on immune cells, as well as for immune evasion [16][17][18][19][20].
Within weeks to months after primary infection, HIV-1 Env specific antibodies appear [21] that are generally limited in their neutralizing activity and restricted to early autologous viruses [22][23][24][25]. The first antibodies are directed against gp41, predominantly to the immunodominant epitope, followed by non-NAbs against gp120, which in clade B-infected individuals often target the V3 region [26]. Subsequently, weakly neutralizing V3 antibodies capable of neutralizing heterologous tier 1 HIV-1 isolates appear [27] followed by NAbs with other epitope specificities from which the virus rapidly escapes through sequence changes in the variable loops and an increasing number of PNGS. Specifically, an increase in length and number of PNGS of the V1V2 region plays a role in HIV-1 resistance to NAbs [22,[28][29][30][31][32].
The extensive glycosylation of gp120, amounting tõ 50% of its molecular weight, was long thought to contribute to an immunologically "silent face" and serve as a "glycan shield" [8,73]. The recent identification of many glycan-dependent bNAbs shows that the silent face may not be so immunologically silent after all, and that this shield can be penetrated and/or used by bNAbs. The epitope cluster on gp120 which is targeted by the bNAbs PGT121-130, PGT135 and 2G12 [60][61][62]67,74], requires a specific glycan at position 332, although there is a difference in how the different bNAbs approach this glycan. In addition, neutralization activity in serum of infected humans and macaques, which developed CrNA, is dependent on recognition of the epitope involving this glycan at position 332 [36,75,76]. Interestingly, Moore et al. recently showed that the 332 glycan was absent on the Env of early viruses from two clade C infected individuals but emerged as a means of escape from autologous neutralizing responses, thereby creating 332 glycan-dependent bNAb epitopes [77]. For these reasons it seems that the glycosylation at position 332 plays a substantial role in the development of CrNA.
Although the existence of bNAbs in natural infection is testimony that the native Env spike can elicit bNAbs, it remains unknown why this occurs in only 10-30% of HIV-1 infected individuals. Here, we hypothesized that specific properties of early Envs could contribute to the development of CrNA. Information on such properties would obviously be valuable for vaccine design aimed at generating similar CrNA [78]. To test our hypothesis, we retrospectively examined env sequences from early HIV-1 clonal variants in clade B infected individuals that developed diverse levels of CrNA later on during infection. We found that CrNA development correlated with early HIV-1 variants with shorter V1 regions, lower probability of glycosylation, and specific amino acid usage. These properties might open up avenues for vaccine design.

Results
Short V1 sequences correlate with the development of cross-reactive neutralizing activity To study Env determinants that may influence the induction of CrNA, we retrospectively generated env sequences from early HIV-1 variants in 31 individuals who at 2-4 years post-SC had diverse levels of CrNA in their serum ( Figure 1) [35,79]. We chose this experimental setup because contemporaneous viruses usually already have escaped from CrNA [25,34] and early viruses are proposed to be a major determinant for the induction of CrNA [80]. The individuals were matched for time between SC and CrNA measurement, time between SC and clonal virus isolation, CD4 + T cells/μl blood at set-point and viral load at set-point (Additional file 1: Figure S1). Data on HIV-1 neutralizing activity in serum were available from previous studies (n = 292) [35,40,79]. In short; sera were tested by Monogram Biosciences [81] for cross-reactive neutralizing activity in a pseudovirus assay involving six tier two viruses with Envs from primary isolates of HIV-1 subtypes A (94UG103), B (92BR020 and JRCSF), C (93IN905 and MGRM-C-026) and CRF01_AE (92TH021). This six viral panel covered 93% of the variation in neutralization of a larger pseudovirus panel (n = 15) [39]. It has been shown that classification of CrNA in sera, as determined on this six virus panel, was highly correlated with CrNA determination on a larger 23 virus panel [35]. The geometric mean IC 50 titers in the sera of the selected individuals against a panel of 6 HIV-1 variants varied from 20 to 297; with an average of 98 (see Table 1 for details). Phylogenetic analysis of all sequences, using either neighbour-joining or maximum-likelihood methods, revealed clustering of sequences per individual, excluding contamination, but, clustering of HIV-1 Env sequences from individuals with similar geometric mean IC 50 titer was not observed (Additional file 2: Figure S2). We observed that the geometric mean IC 50 titer in serum correlated weakly with the mean length of V1 (( Figure 2B, r = −0.36; p = 0.049) but not with overall mean gp160 length (Figure 2A), nor with the total length of either the variable or the conserved regions (data not shown, r = −0.21; p = 0.25 and r = −0.052; p = 0.80, respectively).
Lower probability of overall glycosylation correlates with the induction of cross-reactive neutralizing activity We did not observe a correlation between the geometric mean IC 50 titer in serum, and the mean total number of PNGS in gp160 ( Figure 2C) or the mean number of PNGS in the individuals variable or constant regions (data not shown). It has been shown that NXT motifs have a two to three times higher probability of becoming glycosylated than NXS motifs [82,83]. The total number of PNGS may therefore not entirely reflect the actual extent of glycosylation of a given Env molecule. Interestingly, the mean number of NXS acceptor motifs relative to the mean total number of PNGS correlated positively with the geometric mean IC 50 titer in serum ( Figure 2D, r = 0.41; p = 0.037), while the opposite was true for the mean number of NXT acceptor motifs relative to the mean total number of PNGS. Thus, a higher relative number of NXS over NXT motifs, i.e. less probability of glycosylation was associated with the development of CrNA. However, we note that we cannot make statements on the actual glycosylation of individual NXS and NXT motifs.
Lower probability of glycosylation at position 332 correlates with the induction of cross-reactive neutralizing activity We noted that the glycosylation motif at position 332, which is important for a number of bNAb epitopes [52,[59][60][61][62]74], is always of the NXS type, and may thus not always be occupied by a glycan. In particular when the acceptor motif is NXS, the amino acid at position X is also important for determining the probability of glycosylation [84]. In all our sequences, the second position was occupied by either a leucine or an isoleucine. We found that individuals who developed CrNA had significantly more often an NLS motif at position 332, while individuals who did not develop CrNA had more frequently an NIS motif ( Figure 3A, p = 0.032). Studies with rabies virus glycoprotein showed that an NLS motif has a two times lower probability of becoming glycosylated compared to an NIS motif [84], which might imply that a lower probability of N-glycan attachment to N332 is associated with the development of CrNA. There is no straightforward way to directly assess glycan occupancy of position 332 on virus isolates, but we can study this indirectly by investigating the neutralization sensitivity to bNAbs that are dependent on the presence of a glycan at position 332, such as 2G12 and PGT126.
Thus, we tested the sensitivity of the viruses from our infected individuals against 2G12 and PGT126, and divided them into two groups, for the presence of NLS or NIS at position 332. For 2G12 we excluded viruses that lacked one or more of the essential 2G12 glycans 295, 332, The number of viruses neutralized, at least 3-fold higher than the IC 50 of the same serum with MLV, from the panel of 6 heterologous viruses from different subtypes.  [61,62,85] (i.e. the selected viruses should in theory all be sensitive to 2G12). We defined resistance as >50% infectivity at the highest concentration of bNAb (25 μg/ml and 5 μg/ml for 2G12 and PGT126, respectively; Figure 3B&C). We found that 7 out of 21 viruses (33%) with an NLS motif were resistant to 2G12, while only 2 of 14 viruses (14%) with an NIS motif were resistant ( Figure 3B). Furthermore, we found that 4 out of 30 viruses (13%) with an NLS motif were resistant to PGT126, while none of the viruses with an NIS motif were resistant ( Figure 3C). All NLS containing viruses that were resistant to PGT126 were also resistant to 2G12. Although these differences did not reach statistical significance they are consistent with a lower probability of glycosylation of NLS motifs, of which the presence is associated with the development with CrNA.
Sequence Harmony identifies specific amino acids that correlate with the induction of cross-reactive neutralizing activity The Sequence Harmony (SH) method ( [86] and www.ibi. vu.nl/programs/seqharmwww) was used to identify amino acid differences in Env between individuals who developed CrNA and who did not (Table 2). Low SH-scores indicate positions where the amino acid composition is different between the two groups; a score of 0 indicates that the amino acids at a given position are completely different between both groups, while the maximum score of 1 indicates that the amino acid compositions are indistinguishable. In addition, an empirical Z-score is calculated, reflecting the significance of the SH score obtained based on 100 random shuffling events of the sequences between the two groups. The SH analysis yielded 39 sites with a SH score below the defined cut-off values, indicating differences in amino-acids present at these sites between the CrNA and non-CrNA groups, all with high (negative) Z-scores indicating high significance ( Table 2). A phylogenetic analysis showed the absence of sequence group clustering (Additional file 2: Figure S2), excluding spurious findings based on phylogeny. Individual sequences were weighted in the calculation of the SH-scores, such that each individual had equal weight (see Methods for details). For comparison, we repeated the analysis with a single consensus sequence per individual. This yielded essentially identical scores (the correlation between SHscores and Z-scores is both r = 0.98, and p < 0.0001), but less detailed information on the specific amino acid changes (Additional file 3: Table S1 and Additional file 4: Figure S3). For the variable domains we used a cutoff of SH <0.7 and for the conserved domains a more relaxed cutoff of SH <0.85. This extended (higher) cut-off is necessary to find small differences within the conserved regions Structural mapping of the Sequence Harmony results reveals clustering of amino acids that are associated with the induction of CrNA To better understand the impact of the results gained with SH, we mapped the positions that differed significantly on the three-dimensional structure of Env ( Figure 4). A number of gp120 crystal structures were combined, as described in detail in the Methods section, to generate a structure of the Env spike that contains all gp120 residues ( Figure 4A and B). Gp41 was not included as it contains only one position that was identified by SH (at position 621). Although this model may not be an accurate representation of the Env spike with correct positions and conformations of the variable domains, it is useful to visualize the residues identified by the SH analysis. The carbon trace is indicated in red, while the residues selected by the SH analysis are shown in yellow space filling. Mapping of the residues identified by SH on the 3D structure of Env reveals that the residues that differ between the Envs, which are derived from individuals that did develop and those that did not develop CrNA, cluster together in specific domains. First, a number of residues cluster in the V1 ( Figure 4C and D). Second, a large cluster includes a major portion of the V4 as well as part of the underlying C3. Looking into these positions in more detail it is evident that a number of C3-residues located on the side of the α2 helix that faces the V4 are different between the two sequence groups and those residues that differ in the V4 are all close to the α2 helix ( Figure 4E and F). A third cluster of residues can be observed in the gp41-interactive domain, including residues in C1 and C2, although these residues are more scattered throughout the domain,  compared to those in the first two clusters. It is also obvious that some large regions in the protein do not contain any changes, most notably the CD4BS and the V2. Only one change is present in the bridging sheet, and also only one in the V3. The residues identified by SH can be further subdivided into thirteen subclusters based on their proximity to each other (<9 Å; Additional file 5: Table S2).

The induction of cross-reactive neutralizing activity does not correlate with sensitivity to bNAbs
We tested whether the early HIV-1 variants from individuals who developed CrNA were more sensitive to neutralization by bNAbs covering all known epitope clusters (gp120 outer domain: 447-52D, 2G12, PGT121 and PGT126; CD4BS: b12 and VRC01; quaternary epitopes: PG9, PG16 and PGT145; MPER: 4E10 and 2F5). We did not observe that the viral variants of individuals who developed CrNA were more sensitive to neutralization by different bNAbs compared to viruses from individuals who did not develop CrNA ( Figure 5). We also analyzed the neutralization data for each bNAb per individual and we observed that the neutralization pattern was not dependent on the level of CrNA (Additional file 6: Table S3). We next tested the sensitivity of these viruses to three different polyclonal HIVIg pools in which multiple epitope specificities should be present. Interestingly, early HIV-1 variants of individuals who developed CrNA showed a trend towards being more sensitive to neutralization by polyclonal HIVIg compared to early viruses from individuals who did not develop CrNA ( Figure 5), which could be consistent with them displaying a more open Env structure. However, this was only statistically significant for one of the three HIVIg pools (p=0.037).

Discussion
In the Amsterdam Cohort Studies on HIV infection and AIDS (ACS) of men who have sex with men (MSM) infected with HIV-1 subtype B, the prevalence of CrNA at 2-4 years post-SC is around 33% [40,79]. This roughly corresponds to what has been found in other cohorts [33,[37][38][39]. In this study we aimed to identify differences in the properties of early Env proteins in 31 HIV-1 infected individuals in regard to the development of CrNA. We observed that a short V1 region and a lower probability of N-linked glycosylation at PNGS correlated with the development of CrNA later in infection. More specifically, a lower probability of glycosylation at position 332 was associated with the development of CrNA. However, we note that the studies on the glycosylation efficiency of NXS/T motifs have been performed with a non-HIV protein [84] and the probability of glycosylation is likely to be context dependent. We can, therefore, not make definitive statements on the actual glycosylation of individual viruses or groups. Although some of the observed associations were marginally significant, and require confirmation in independent studies, they are in line with the hypothesis that HIV-1 Envs from individuals who develop CrNA have a more open structure, which allows for increased accessibility of bNAb epitopes, and possibly more efficient induction of bNAbs. To confirm this hypothesis we conducted a neutralization assay with early HIV-1 clonal variants from only the individuals who did or did not develop CrNA and found that early viruses of individuals who developed CrNA were indeed more sensitive to neutralization by one polyclonal HIVIg sample (but not by two other HIVIg samples), although we were not able to map this to specific bNAb epitopes. We next used SH for a more detailed analysis of amino acid differences at specific positions in Env and their role in the induction of CrNA. This resulted in the identification of 39 Env residues which differed significantly between the two groups, based on defined SH-scores and z-scores. These residues showed a remarkable clustering in V1, C3-V4, and somewhat more diffusely in C1-C2. In addition, a few individual changes were observed: one in V3, one in C4, three in the V5, and one in the gp41 ectodomain. Strikingly, we did not find any residues in the V2 domain, in the CD4BS, or in the triple layered structure that translates CD4 induced conformational changes to gp41 [88]. We note that the The z-scores display the accuracy of a given result; the more negative the z-score the less likely it is that the results was found by chance. d The 'consensus string' shows the residues found in each group on that specific site ordered by frequency were a lower case letter is present less than half the amount of the most abundant residue (upper case differences we found might be specific for individuals who are infected with HIV-1 subtype B, and should be reproduced in other non-subtype B cohorts. Our data revealed an inverse correlation between V1length and the development of CrNA. Two previous studies have looked for Env signatures associated with the presence of CrNA, but these studies used contemporaneous samples for Env analysis and assessment of neutralization breadth and potency, and it is not likely that the contemporaneous Envs represent the Envs that induced the CrNA [89,90]. Nevertheless, in the first study, involving clade C infected individuals, short V1V2 domains correlated with the presence of CrNA, but the V1 and V2 were not studied separately [90]. In the second study, involving individuals infected with different clades, short V2 and short V5 regions correlated with the presence of CrNA in the sera [89]. Again, the caveat in these studies is that the Env sequences and sera were derived from the same time point during chronic infection; therefore it cannot be excluded that the Env characteristics observed were a consequence rather than a cause of the CrNA in serum, due to viral escape from the CrNA. In contrast, we studied the Envs of early viruses (median 5 months post-SC), before the presence of CrNA, and related the characteristics to the development of CrNA later in infection. It is arguably more likely that the observed Env signatures we identified indeed contributed to the shaping of the CrNA. Although other choices could have been made in the timing of our sampling, Moore et al. showed that transmitted/founder viruses raise autologous neutralization, which triggers the evolution of escape variants around month 6 (roughly around the same time as our sampling of env sequences), which in turn induce CrNA [77].
The SH-analysis confirmed a role for V1, but in addition to the deletions/insertions (at positions 141, 145 and 146) identified a number of other amino acid changes in V1 that were associated with the development of CrNA (at positions 134-140, 142 and 151). It has long been known that V1 and V2 are not absolutely required for function [10,91], but that they play an important role in resistance to antibody neutralization [28,31,[92][93][94][95][96]. On a population level the continuous neutralizing antibody-driven evolution of V1V2 during the pandemic appears to have resulted in the elongation and increased glycosylation of the V1V2 domains [97]. It may therefore not come as a surprise that short V1 domains, with a specific amino acid composition, might affect the induction of CrNA.
We note, however, that we found no residues in V2 that differed between the two groups. This could suggest that V1 length and composition are more important determining factors in the induction of CrNA than V2 length and composition, at least in our study population.
Previous studies indicated that the Envs from transmitted/ founder HIV-1 variants contain fewer PNGS compared to Envs from viruses in chronic infection [98]. The increase in number of PNGS coincides with a decreasing sensitivity to autologous neutralization and is proposed to be driven by the pressure of neutralizing antibodies [8,22,29,30,32,99]. Furthermore, a reduced prevalence of CrNA in recently infected individuals compared to historic serum samples was associated with an increased number of PNGS in early Env sequences [97], suggesting that an increase in Env glycosylation over the course of the epidemic results in decreased induction of CrNA. Thus, an increased number of PNGS and a higher level of N-linked glycosylation could interfere with the induction of bNAbs, as these glycans may interfere with the binding of the B cell receptor to protein epitopes on Env. We did not observe a correlation with the absolute number of PNGS and the later presence of CrNA. However, we did observe that a lower probability of glycosylation at the PNGS in early HIV-1 variants was associated with CrNA presence later in infection. In previous Env sequence analyses, NXS and NXT PNGS motifs have always been treated equally, but NXS motifs are two to three times less likely to be glycosylated than NXT sequons [82,83,100]. Furthermore, when an NXS motif is present, the amino acid at the second position (the X) becomes highly relevant to determining the probability of glycosylation [84].
Zooming in on the NXS motif at position 332, the glycan which is critical for the binding of many known bNAbs [52,[59][60][61][62]74] and which was associated with neutralization escape from early strain-specific antibodies and induction of CrNA [77], we observed a correlation between the amino acid at the second position and the presence of CrNA in serum later in infection. Thus, individuals who developed CrNA had significantly more often an NLS motif at position 332 compared to the individuals who did not develop CrNA who usually had an NIS motif. An NLS motif is two to three times less likely to be glycosylated compared to an NIS motif [84], implying that a lower chance of glycosylation at position 332 is associated with the development of CrNA. The probability of glycosylation and how HIV-1 might use this is unchartered territory and should be considered in other studies on the role of Env glycosylation in the coevolution of the virus and the human immune system.
The association of a lower probability of glycosylation at position 332 with the induction of CrNA is counterintuitive when taking into account that many known bNAbs target this glycan [52,[59][60][61][62]74]. There are two possible explanations for this paradox. First, this glycan is not often targeted in our study population and its absence increases the accessibility of other bNAb epitopes. Second, it is not the presence of this glycan per se that facilitates the induction of bNAbs to this site, but the emergence of this site during infection, possibly by means of escape from Abs that targeted the surrounding region in the absence of the glycan. This later scenario was indeed observed in individuals infected with clade C Env [77]. The situation in our cohort was slightly different because the NXS motif at position 332 was present in all sequences, only the 2 nd position changed over time. Furthermore the absence of the PNGS at position 332 in transmitted subtype C was accompanied by the presence of a PNGS at position 334 [77], while we never observed a PNGS motif at position 334 in our sequences. The impact of the PNGS at one of these two positions, and its relation with subtype specific transmission and antibody escape and induction, requires further study in different cohorts.
The SH analysis identified multiple amino acid positions in the C3 and V4 region that differed significantly between individuals that did or did not develop CrNA. Inspection of a quaternary model of the Env spike showed that many of these positions form two subclusters on the α2-helix in the C3 region (HXB2 numbering 335-352) and the V4 loop ( Figure 4). One cluster contains 6 residues: 333 (which may modulate glycosylation at position 332; see above), 336 and 337 from C3, and 405-407 from V4. The second subcluster involves 5 residues: 343, 346 and 347 from C3, and 396 and 399 from V4. The C3 α2-helix interacts intimately with V4 and several of the differences found between individuals that did or did not develop CrNA might impact the interaction between the two subdomains. For example, it has been reported that interactions between residues 335 and 412, and 337 and 412 play an important role in the interaction of these two regions through electrostatic interactions [101]. In our SH analyses, the Env sequences of individuals who developed CrNA, position 337 is often occupied by a negatively charged amino acid (E), whereas this position is mostly positively charged (K) in individuals that did not develop CrNA. A similar charge reversal is observed for position 343 which interacts with a number of V4 residues. Thus, changes in electrostatic interactions may influence the interaction between C3 and V4. Moreover, Kirchherr et al. identified three amino acid substitutions in the V4 region (393G, 397G, and 413N) that were associated with greater neutralization potency and breadth [102]. Positions 393 and 397 are not observed in our SH-analysis but the neighboring residues, 396, 399 and 400, are. Position 413 is observed in our analysis and is part of a PNGS. Envs associated with CrNA more frequently had an Asparagine (N) at position 413. Gnanakaran et al. noted that the C3 α2-helix in clade B viruses is more hydrophobic and shielded from solvent by the V4 loop and glycosylation, while this domain is less hydrophobic, more exposed and more immunogenic in clade C viruses [101,103]. Collectively these findings imply that the way in which the V4 folds over and covers the α2-helix of C3 might affect the induction of bNAbs [77].
The impact of other residues highlighted by the SHanalysis is somewhat unclear. Several hits were found in C1 and C2, including the β-sandwich domain which is known to interact with gp41 [104]. The two sites identified by the SH-analysis at positions 461 and 463/464 in V5 are in close proximity to the CD4BS and might impact its accessibility. Two other sites which were significantly different between the two groups were at position 322 (V3) and 440 (C4). Position 322 is important for coreceptor usage, a conversion of a negatively charged residue to a positively charged residue is sufficient to switch a CCR5-using to a CXCR4-using virus [105,106]. All the sequences we used were derived from CCR5-using viruses (data not shown), ruling out any bias caused by differences in coreceptor usage. Interestingly Rosen et al. showed a very strong covariation between residues 322 and 440 that is influenced by charge [107], pointing at a potential functional electrostatic interaction between these two residues. If and how these changes contribute to the development of CrNA remains to be further studied.
We had hypothesized that early HIV-1 variants from individuals who later developed CrNA might have an increased sensitivity to bNAbs. We used multiple bNAbs covering four well known target epitopes, namely CD4BS (b12 and VRC01), gp120 outer domain (447-52D, 2G12, PGT121 and PGT126), quaternary V1V2 (PG9, PG16 and PGT145) and MPER (2F5 and 4E10). However, we did not observe a significantly increased sensitivity to neutralization by these bNAbs. One could argue that this lack of correlation is due to the fact that individuals with CrNA are analyzed as a group without considering the epitope specificity of their CrNA. Each individual might have developed different specificities, blurring the results for individual bNAbs. Sera of 14 of our individuals were previously analyzed for binding to a panel of gp120 core proteins and their corresponding CD4BS knockout mutants and showed the presence of CD4BS directing Abs [92], but we could not find a correlation between the binding of CD4BS Abs and neutralization sensitivity of early clonal virus variants to VRC01. Another explanation could be that the specificities responsible for CrNA in our individuals may be different from those of the monoclonal bNAbs we tested. We did observe differences in sensitivity to the polyclonal HIVIg: clonal viral variants from individuals who developed CrNA showed a trend towards being more sensitive to HIVIg might be consistent with a more exposed Env structure. In summary, we cannot conclude which bNAb epitopes were immunogenic in our study population.
In a previous study we showed that the kinetics of CrNA development coincided with the kinetics of the induction of autologous neutralizing response, which was directed against viruses isolated early after SC [34]. This suggests that the development of CrNA is driven by epitopes that are exposed on early viruses [80]. Alternatively, CrNA may be induced by early (or slightly later) HIV-1 variants that have already escaped from the initial autologous neutralizing response [77]. The latter observation is consistent with the hypothesis that HIV-1 escape variants selected by autologous NAbs, in turn, may elicit new NAbs with altered specificities that allow for broad reactivity [52]. We studied early Env variants from a relatively wide time span (2 to 14 months after SC), therefore we cannot rule out or support one of the above hypotheses.
Currently known bNAbs take years to develop and show accumulation of many somatic mutations, suggesting that their development is driven by years of antigen exposure and possibly reflecting the continuous co-evolution of HIV-1 and the immune response directed to it [51,52,80,108,109]. This would suggest a more prominent role for viral evolution and diversity in the development of bNAbs. Consistent with this, greater Env diversity early in infection was associated with greater NAb breadth later in infection in a cohort of antiretroviral therapy naïve Kenyan women that were mostly infected with subtype A [110]. In our present study, involving therapy naïve, subtype B infected MSM from the ACS; we did not observe a correlation between early Env diversity and the development of CrNA in serum (data not shown).

Conclusions
In summary, our current results show that sequence and structural characteristics of Env from early subtype B HIV-1 viruses may be associated with the development of CrNA in serum during infection. We observed that the presence of a short V1 and lower probability of glycosylation, specifically at position 332, were associated with the induction of CrNA. In addition, a number of amino acid changes that mostly clustered in V1 and C3-V4 correlated with the development of CrNA. Additional studies are needed to further clarify the role of these amino acids in the induction of CrNA, but the determinants for CrNA induction described here might facilitate the design of vaccines aimed at inducing bNAbs.

Individuals and viruses
Samples studied here were derived from participants of the ACS of MSM who were infected with HIV-1 subtype B. From all ACS participants, 292 were previously tested for the presence of CrNA in their sera at 2-4 years post-SC [79]. Sera were tested for CrNA on a panel of 6 heterologous viruses from different subtypes [39] and ranked based on their geometric mean IC 50 titer and on the number of viruses from the panel that were neutralized. For the present study we selected cohort participants for whom the date of seroconversion was known, who had a follow-up of at least 4 years, and were therapy naïve at the time of screening for CrNA. Moreover, clonal HIV-1 variants had to be available. These criteria were fulfilled by 32 individuals who seroconverted between 1984 and 1996 [35,40,79]. One of these individuals was defined as elite neutralizer, according to the definition of Simek et al. [39] with a geometric mean IC 50 titer of 782. We excluded this individual from our study as it is the subject of other follow-up studies, leaving us with 31 individuals fulfilling our criteria (Table 1).
Clonal virus variants were previously obtained from cocultures of peripheral blood mononuclear cells (PBMCs) from HIV-1 infected individuals and 3-day phytohemagglutinin (PHA) stimulated PBMCs from healthy donors [111,112]. As virus isolation by coculturing PBMCs from infected individuals with stimulated healthy donor PBMCs can result in selection of a variants that are more fit for replication in PBMCs in vitro, we used a protocol in which limiting numbers of PBMCs from infected individuals and stimulated healthy donor PBMCs were mixed in multiple parallel cultures. This allows for the isolation of multiple replication competent clonal variants, avoiding the outgrowth and loss of slowly replicating variants [111,112].
Moreover, to prevent sequence changes during in vitro culture, the number of passages in PBMCs was kept to a minimum. This method yields in clonal sequences that are very similar to sequences derived from single genome amplification [113]. The ACS are conducted in accordance with the ethical principles set out in the declaration of Helsinki, and written consent was obtained prior to data collection. The study was approved by the Academic Medical Center Institutional Medical Ethics Committee.

Gp160 sequence analysis
The HIV env genes from proviral-DNA isolated from PBMCs, that were infected in vitro with a single clonal HIV-1 variant, were PCR amplified and subsequently sequenced as described previously [114][115][116]. Complete gp160 sequences could be analyzed for 26 individuals, whereas only C1-V5 env sequences could be analyzed for individuals ACH19566, 19342, 18860, 18839 and 18829 (Table 1). Nucleotide sequences were aligned using ClustalW in the software package of BioEdit [117], and edited manually. The reference sequence HXB2 was included in the alignment to number each aligned residue according to the corresponding position in this reference. Genetic analyses were performed on gp160 sequences starting at nucleotide position 91, thereby excluding the Env signal peptide. The total length of the gp160 sequences and the separate regions were calculated by counting the number of amino acids. The number of PNGS and number of NXT or NXS motifs were identified using N-Glycosite [118] at the Los Alamos HIV database website (http://www.hiv.lanl.gov/content/ sequence/GLYCOSITE/glycosite.html). Sequences with double PNGS were counted by N-Glycosite as followed: NNSS as +1 NXS motif, NNTT as +1 NXT sequon, and NN[TS][ST] as +1 NXT motif. A proline at the second position (site pattern NP[ST]) is strongly disfavored for glycosylation and therefore excluded as a PNGS. Net electrostatic charges of gp160 were calculated by counting all charged amino acid residues per sequence, where residues R and K counted as +1, H as +0.293, and D and E as −1. Intra-individual genetic diversity of the complete Env sequences generated from the earliest time point were analyzed for 23 individuals, using the Kimura-2 parameter model of evolution in the MEGA 4.1 software package (http://www.megasoftware.net).

Phylogenetic analyses
A Maximum Likelihood (ML) tree was constructed with complete HIV-1 env gp160 sequences from 26 individuals with diverse levels of CrNA. The best-fit nucleotide substitution model (GTR+I+G), selected by hierarchical likelihood ratio test (hLTRs, Model Test 3.7 [119] was implemented in the heuristic search for the best ML tree applying TBR branch-swapping algorithm using PAUP*4.0 [120], starting with a Neighbor-Joining (NJ) tree constructed under the Hasegawa-Kishino-Yano (HKY85) model of evolution [121]. The robustness of the NJ phylogeny was assessed by bootstrap analysis with 1,000 rounds of replication.

Neutralization assay
Neutralization sensitivity to known bNAbs and polyclonal HIVIg pools was tested in a PBMC based assay, for a minimum of one and a maximum of five clonal virus variants per individual. PBMCs were isolated from buffy coats obtained from healthy seronegative blood donors and cultured as described previously [112]. The neutralization sensitivity was tested for the twelve individuals who developed CrNA and the nine individuals who did not develop CrNA. The individuals who developed CrNA were included in this group following the criteria of having a geometric mean IC 50 titer ≥ 90 in serum and the ability to neutralize ≥ 5 viruses from the panel of six heterologous viruses in a pseudovirus assay developed by Monogram Biosciences. The panel consisted of six pseudoviruses with envelope sequences from primary isolates of HIV-1 subtypes A (94UG103), B (92BR020 and JRCSF), C (93IN905 and MGRM-C-026) and CRF01_AE (92TH021) [35]. Individuals of whom the sera had a geometric mean IC 50 titer ≤ 45 in serum and neutralized ≤ 2 viruses from the panel of 6 heterologous viruses were included in the group that did not develop CrNA (Table 1). In total eleven broadly reactive neutralizing monoclonal antibodies (bNAbs) were tested; IgG1 b12 (kindly provided by NIBSC; EVA3065), 2G12, 2F5 and 4E10 (kindly provided by NIBSC; ARP3277, EVA3063, ARP3239), PG9 and PG16 (AIDS reagent program #12149 and 12150), VRC01 (AIDS reagent program #12033), 447-52D (kindly provided by NIBSC; ARP3219)) and PGT121, PGT126 and PGT145 (kindly provided by IAVI NAC repository); and three pools of polyclonal HIVIg sera: pool 1 (AIDS reagent program #3957 lot. nr. 12-100158), pool 2 (AIDS reagent program #3957 lot. nr. 14-120074), and pool 3 (AIDS reagent program #3957 lot. nr. 11-098130; NABI lot. nr. IHV-50-111 and lot. nr. IHV-250-114 [122]). HIVIg is a pool of concentrated antibodies from the blood from HIV-positive asymptomatic persons with high levels of HIV-1 specific antibodies.
From each virus isolate, a final inoculum of 20 50% tissue culture infective doses (TCID50) was incubated for 1 h at 37°C with each specific monoclonal antibody in threefold serial dilution. The starting dilution was 25 μg/ml for b12, 2G12, 2F5, 4E10 and 447-52D, and 5 μg/ml for PG9, PG16, VRC01, PGT121, PGT126 and PGT145. To test the neutralization sensitivity of the virus clones against HIVIg a final inoculum of 20 TCID50 was incubated for 1 h at 37°C in threefold serial dilution with a starting dilution of 1500 μg/ml. Subsequently, 10 5 PHA-stimulated PBMCs, derived from healthy blood donors, were added to the mixture. After incubation with HIVIg an additional washing step was performed with phosphate-buffered saline after 4 h of incubation at 37°C. On days 7 and 11, virus production in culture supernatants was analyzed with a p24 antigen capture enzyme-linked immunosorbent assay [123]. The percent neutralization was calculated by determining the reduction in p24 production in the presence of the bNAbs or HIVIg compared to the cultures with virus only. When possible, IC 50 s were determined by linear regression. When the analyses required the assignment of one IC 50 value per individual, we took the median of the IC 50 values of the individual virus clones. For calculations, viruses with IC 50 values below the lowest dilution or above the highest dilution were assigned the lowest or highest dilution value, respectively.

Statistical analyses
Statistical analyses on the gp160 sequence and the neutralization data were performed using Graphpad Prism v5.01. Differences between sequences (length, number of PNGS, NXS or NXT motifs and net electrostatic charge) were compared using a Spearman correlation test, with the geometric mean IC 50 titer across the six heterologous viral panel as the absolute x-value. Differences and correlations were considered statistically significant when P values were ≤ 0.05. In cases of normal distribution of the data we used the unpaired t-test (for example to compare the mean neutralization titers for the eleven known bNAbs and polyclonal HIVIg pools between individuals that did or did not develop CrNA). When neutralization titers were not distributed normally, a Mann-Whitney test was performed to compare the two groups. For each individual, the median IC 50 values of each bNAb or HIVIg were used. Difference in NIS vs. NLS glycosylation at position 332 and its correlation with the presence of CrNA was compared using a Fischer's exact test.

Multiple sequence alignment for Sequence Harmony
A multiple sequence alignment was performed on 91 Env sequences from 21 individuals, starting at nucleotide position 91, which excludes the Env signal peptide, with a minimum of one and a maximum of eleven sequences per individual. In total we obtained 58 sequences from twelve individuals who developed CrNA and 33 sequences from nine individuals that did not develop CrNA (Table 1). In order to create a Multiple Sequence Alignment of sufficient consistency in the variable regions, three of the most widely used tools, Praline, Muscle and Clustal [124][125][126], were used with a range of settings (different Pam/Blosum matrices, different gap open and gap extend penalties) and evaluated. None gave completely satisfactory results, so the final alignment was based on Praline with global pre-profiling and default settings. The variable regions were extensively adjusted manually using the Jalview alignment viewer and editor [126], by optimizing the alignment of sequence patterns such as the NXS/T glycosylation sites and guided by the conservation scores reported by Jalview, which are based on the conservation of physicochemical properties, without penalizing the introduction or elongation of gaps (Additional file 7: Figure S4).

Comparison of viral sequences with Sequence Harmony
The Sequence Harmony (SH) method ( [90] and www.ibi. vu.nl/programs/seqharmwww) was used to analyze amino acid differences between the env sequences of the twelve individuals who developed CrNA and the nine individuals who did not develop CrNA ( Table 2). The SH algorithm is an entropy-based method, which detects positions within an alignment that display compositional differences in related protein sequences divided in two groups, and might therefore be linked to functional differences. In addition, an empirical Z-score is calculated, reflecting the significance of the SH-score obtained based on 100 random shuffling events of the sequences between the two groups. For details see reference [86] and the online documentation on the web server (www.ibi.vu.nl/programs/seqharmwww). SH measures the overlap in distribution of amino acid types between two subgroups (A and B; in this case env sequences obtained from individuals who developed CrNA and individuals who did not) at a certain position (i) in the sequence alignment as follows: where p A i;x indicates the observed frequency in group A for amino acid type x at position i in the sequence, and p B i;x analogously for amino acid frequencies observed in group B sequences. The final SH score is calculated by SH i = ½ ( SH i A/B + SH i B/A ). Therefore, an SH score of 0 indicates amino acid positions that are specific for one of the sequence groups, whereas an SH score of 1 indicates a complete overlap at this amino acid position between the two groups. In our dataset, the number of sequences included per individual varies from 1 to 11, which makes the analysis biased towards individuals with larger numbers of sequences included. To adjust for this bias we extended the SH method by including a weight for each sequence. We assigned a weight w=1/N p to each sequence as the inverse of the number of sequences N p included for individual p. Each individual will therefore have the same impact on the total score in each group. Cut-off scores were set as SH<0.7 for the residues in the variable regions and <0.85 for residues in the conserved regions, i.e. a less strict selection in the conserved region to allow also small(er) differences to be detected. The lower (negative) the z-score, the less likely that the results were found by chance.

Structural analysis
SH yielded a list of amino acid differences in Env between individuals who did developed CrNA and who did not, but structural analysis informs us on the location of these positions in the protein structure. A structure of the complete Env trimer is not available. In order to assist the inspection of the positions that were different between the two groups as identified by SH, we built a model that contained all gp120 domains. We started with the gp120 structure from 3JWD [104], because it contains the N-and C-termini of gp120. To this structure we added the V1V2 loops from 3U4E [64] and using 1GCG [127] to obtain an overlap with the 3JWD structure; as well as the V3 and V4 regions from 2B4C [128]. These structures were overlaid using the 'super' and 'pair_fit' functions of PyMol (The PyMOL Molecular Graphics System, Version 1.5.0.4 Schrödinger, LLC). No additional loop modeling was performed, so the actual conformation of the V1V2, V3 and the V4 regions as presented should be considered an arbitrary visualization of the residues identified by SH. Note that the glycans are absent from this model. The sites selected by the SH analysis (i.e., the residues scoring below the cut-off, as described above) were mapped on the three-dimensional structure of the resulting model using PyMol. Additional file 2: Figure S2. Title of data: Genetic relationships between viruses from individuals with diverse levels of CrNA in serum. Description of data: Complete gp160 sequences derived from 26 individuals with varying levels of CrNA in serum were aligned and a ML tree was constructed. Individuals with high, intermediate and low CrNA are indicated in red, blue or green, respectively.

Availability of supporting data
Additional file 3: Table S1. Title of data: Sequence Harmony results with consensus sequences. Description of data: The Sequence Harmony (SH) method ( [90] and www.ibi.vu.nl/programs/seqharmwww) was used to analyze amino acid differences between the consensus env sequences of the twelve individuals who developed CrNA and the nine individuals who did not develop CrNA, for details see Methods. The SH algorithm is an entropy-based method, which detects positions within an alignment that display compositional differences in related protein sequences divided in two groups, and might therefore be linked to functional differences. In addition, an empirical Z-score is calculated, reflecting the significance of the SH-score obtained based on 100 random shuffling events of the sequences between the two groups. Cut-off scores were set as SH<0.7 for the residues in the variable regions and <0.85 for residues in the conserved regions, i.e. a less strict selection in the conserved region to allow also small(er) differences to be detected. The lower (negative) the z-score, the less likely that the results were found by chance.
Additional file 4: Figure S3. Title of data: Multiple sequence alignment of gp160 sequences from CrNA and non-CrNA individuals used for SH analysis. Description of data: Multiple sequence alignment of 91 complete gp160 sequences from 21 individuals, starting at nucleotide position 91, excluding the Env signal peptide, with a minimum of one and a maximum of eleven sequences per individual. In total 58 sequences from twelve individuals who developed CrNA and 33 sequences from nine individuals that did not develop CrNA, CrNA and non-CrNA respectively, are depicted.
Additional file 5: Table S2. Title of data: Subclustering of positions identified by SH. Description of data: Six sites are isolated in the structure (and sequence) and form a cluster of their own. Three small clusters are purely sequential. Another sequential cluster (T138, N139, T140, N141, S142) is not present in any of the crystal structures analyzed, but aligns with a similar region (T138→T135, N139→I136, N141→N137 in 3U4E) which makes it cluster with another residue (K151 at 4Å). Finally, there are four other clusters containing sequentially distant residues with on average slightly more than four residues per cluster. In one case this clustering is tight, with a distance around 3Å, the three other distances involved are around 6-7Å. One of these clusters consists of three sequential parts and involves one tight and one looser distance. Most clusters are restricted to within one of the V or C regions, except for the one that connects C1 and C2, and the two others that bridge C3 and V4. Almost all residues selected within the V4 region are in close contact with the α2-helix (336-353) of the C3 region.
Additional file 6: Table S3. Title of data: Median neutralization IC 50 titer per individual per bNAb. Description of data: Median neutralization IC 50 titer of the primary HIV-1 variants from 21 selected individuals to neutralization by eleven different bNAbs covering the four known epitope clusters, namely CD4BS (b12 and VRC01), gp120 outer domain (447-52D, 2G12, PGT121 and PGT126), quaternary V1V2 (PG9, PG16 and PGT145) and MPER (2F5 and 4E10), and to the three polyclonal HIVIg pools. Interestingly, early HIV-1 variants of individuals who developed CrNA were more sensitive to neutralization by polyclonal HIVIg pool 1 compared to early viruses from individuals who did not develop CrNA (p = 0.037), although this was found for the other two pools tested. Additional file 7: Figure S4. Title of data: Correlation of SH-scores and Z-scores. Description of data: (A) Correlation of the SH scores as found by Sequence Harmony; (B) Correlation of the Z-scores found by Sequence Harmony. Each amino acid difference in Env between individuals who did or did not developed CrNA is represented by one dot. The value on the x-axis represents scores found when using 91 Env sequences as described in Materials, and the value on the y-axis represents the scores found when using the consensus sequences of the 12 individuals who did and the 9 individuals who did not developed CrNA. Statistical analyses on the SH-scores and the Z-scores were performed using Graphpad Prism v5.01. Correlations between the two methods were compared using a Pearson correlation test. Differences and correlations were considered statistically significant when P values were ≤ 0.05.