In this study we have presented evidence suggesting that a unique HLA allele frequency distribution in a cohort of clade B-infected Mexican individuals has left unique footprints on HIV sequences at the population level. We studied HLA-mediated HIV evolution in this cohort and compared our data with data from the IHAC cohort, the largest clade B-infected cohort used so far to assess HLA-mediated evolution, which is composed of individuals from Canada, Australia, and the USA (Brumme ZL, John M, et al, PLoS ONE 2009, in press). The two cohorts were shown to have notably different immunogenetic backgrounds, with an important admixture of Amerindian genes in the population of central and southern Mexico (Figure 4). These different immunogenetic backgrounds provided an opportunity to assess the role of different HLA allele distributions in HLA-mediated selection in two cohorts infected by viruses of the same clade. The present cohort was shown to reflect the typical characteristics of an HIV-infected Mexican cohort: it was enriched in individuals at relatively advanced stages of HIV disease and had an HLA allele frequency distribution similar to that of the general population (Figure 3), with some specific exceptions (e.g., B*39) that will have to be assessed in future studies.
Previous studies have suggested that HIV evolution at the population level follows broadly predictable, highly conserved mutational patterns associated with host CTL selective pressure [16, 23, 24, 33–35]. Conclusions drawn from direct comparisons between studies in different populations have been limited, mainly because those studies used different methods and models to assess HLA-mediated viral evolution, and these models frequently fail to account for important sources of confounding. We applied the recently described PDN model, which simultaneously accounts for HLA linkage disequilibrium, HIV codon covariation, and viral lineage effects, to clade B pol sequences from the present cohort (Additional file 1: Figure S2) and compared the results with those from the immunogenetically distinct IHAC cohort. Our data support the observation of highly conserved, universal HLA-associated footprints in the HIV proteome at the population level: many of the HLA-HIV codon associations found in the Mexican cohort have also been observed in the IHAC cohort and in previous studies of diverse cohorts [24, 33–35, 37].
Interestingly, however, our data also suggest the existence of unique HLA-associated footprints in HIV, which could be shaped by the specific HLA frequency distributions of different HIV-infected populations. The unique characteristics of HLA-mediated selection in the Mexican cohort were revealed not only by HLA-HIV codon pairs that were not detected in the IHAC cohort, but also by HIV positions previously identified as HLA associated that showed different HLA specificities and/or target amino acids in the two cohorts. Whether these unique HLA-associated footprints represent a real biological phenomenon rather than a statistical effect will have to be assessed further with experimental data; nevertheless, the evidence presented in this study strongly suggests real differences between the two cohorts. The Mexican cohort was much smaller than the IHAC cohort, and because the power to detect associations increases dramatically with sample size, only 20% of the expected associations were confirmed in the present cohort. Even so, the fact that 53% of the HLA-HIV codon associations in the Mexican cohort were novel strongly suggests differences in HLA-mediated evolution between the two clade B-infected cohorts. These novel associations could represent false negatives in the IHAC cohort, but that cohort is large enough that its false-negative rate is expected to be quite small, and any false negatives in it are likely to be rare events. It is also possible that the novel associations represent false positives in the present cohort; however, with an expected false discovery rate of 20% at the q < 0.2 threshold, the number of novel associations found in the Mexican cohort is striking.
In addition, of the 18 novel HLA-HIV codon pairs in the Mexican cohort, at least 11 (61%) can be explained by confirmed or potential CTL epitopes (Figure 6), which strongly supports the validity of these associations and the existence of real biological differences in HLA-mediated selection between the two cohorts.
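The q < 0.2 argument rests on simple arithmetic: when associations are accepted at a false discovery rate threshold q*, on average roughly a fraction q* of the k accepted associations are expected to be false positives. The sketch below uses the Benjamini-Hochberg procedure as a standard stand-in for turning raw p-values into q-values (an assumption; the PDN analysis may compute q-values differently) and applies the bound to the 30 HLA-HIV codon pairs reported for the Mexican cohort. The function names are hypothetical.

```python
from typing import List


def bh_qvalues(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order.

    Assumption: BH is used as a standard stand-in; it is not necessarily
    how the q-values in this study were computed.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that q-values
    # are monotone in p and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def expected_false_discoveries(n_accepted: int, q_threshold: float) -> float:
    """Approximate expected number of false positives among associations
    accepted at q < q_threshold (FDR control bounds the expected proportion)."""
    return n_accepted * q_threshold


if __name__ == "__main__":
    # Numbers from the text: 30 HLA-HIV codon pairs accepted at q < 0.2,
    # about 53% of which were novel.
    accepted = 30
    expected_fp = expected_false_discoveries(accepted, 0.2)
    novel = round(0.53 * accepted)
    print(f"expected false positives ~ {expected_fp:.0f}, novel pairs ~ {novel}")
```

If the 53% refers to the 30 pairs, that is about 16 novel associations against roughly 6 expected false positives, so even if every false positive fell among the novel pairs, most novel pairs would remain.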
Point differences between the population consensus sequences of the two cohorts that map to HLA-associated sites provide further evidence for a differential impact of HLA selection on HIV evolution at the population level (Table 3). This was the case for position RT 277, which was associated with A*03 in both the Mexican and the IHAC cohorts: the adapted form 277R has become fixed in the IHAC consensus, whereas the susceptible form 277K has remained in the Mexican consensus. Not surprisingly, the frequency of A*03 was three times higher in the HOMER cohort than in the Mexican cohort (p = 7.08E-10, q = 1.00E-08), supporting an important role of HLA allele frequency in the fixation of HLA escape mutations at the population level (Figure 4, Table 3). Similarly, PR 93 was associated with B*15 in the IHAC cohort; the susceptible form 93I appears in the IHAC consensus, whereas the adapted form 93L appears in the Mexican consensus. Although no direct HLA association was detected at this site in the Mexican cohort (probably because of limited statistical power), PR 93 was associated with other HIV sites, such as PR 71, which is HLA associated (Figure 6). Thus, changes in population consensus sequences may be linked to HLA-mediated selection. Also of interest is the observation that B*44 was involved in 5 of the 30 HLA-HIV codon pairs identified in the Mexican cohort. Two of these associations have been described in the IHAC cohort, and two involve positions previously identified as HLA associated but with different HLA specificities (Figure 6). The strong influence of B*44 on HLA-mediated HIV evolution in the Mexican cohort could reflect differences in the immunodominance hierarchies of CTL responses in the context of different HLA frequency distributions.
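The consensus comparison described above can be sketched as two steps: compute a majority-rule consensus for each cohort from aligned sequences, then report positions where the two consensuses differ and that are HLA-associated. This is a minimal illustration assuming pre-aligned, equal-length sequences and simple majority rule with ties broken by first occurrence. The helper names are hypothetical, and the toy alignments are made-up placeholders (not real RT sequence) arranged so that position 277 carries K in one cohort and R in the other, mirroring the RT 277 example.

```python
from collections import Counter
from typing import List, Set, Tuple


def majority_consensus(aligned: List[str]) -> str:
    """Most frequent residue in each column of pre-aligned, equal-length sequences.
    Ties go to the residue seen first in that column (Counter ordering)."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def consensus_differences(cons_a: str, cons_b: str, hla_sites: Set[int],
                          start: int = 1) -> List[Tuple[int, str, str]]:
    """(position, residue_a, residue_b) for HLA-associated positions where the
    two consensuses differ; `start` is the number of the first alignment column."""
    return [
        (start + i, a, b)
        for i, (a, b) in enumerate(zip(cons_a, cons_b))
        if a != b and (start + i) in hla_sites
    ]


# Hypothetical toy alignments covering positions 275-279; "A" is a placeholder residue.
cohort_1 = ["AAKAA", "AAKAA", "AARAA"]
cohort_2 = ["AARAA", "AARAA", "AAKAA"]
diffs = consensus_differences(majority_consensus(cohort_1),
                              majority_consensus(cohort_2),
                              hla_sites={277}, start=275)
print(diffs)  # [(277, 'K', 'R')]
```

Restricting the report to HLA-associated sites is what links a consensus difference to HLA selection; differences at other positions would need a separate explanation (for example, founder effects).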
Whereas strongly immunodominant CTL responses could mask the effect of other, less immunodominant responses in one cohort, those less immunodominant responses could have a greater impact on HLA-mediated HIV evolution in another cohort in which the immunodominant responses are infrequent. Notably, the frequencies of many strongly immunodominant HLA alleles, such as B*57, B*27, B*08, B*07, A*03, and A*11, are lower in the Mexican cohort than in the IHAC cohort (q < 0.05) (Figure 4). It is possible that in the IHAC cohort, CTL responses restricted by these alleles mask the effect of other, less immunogenic alleles that are frequent in the Mexican population. This could be the case for Cw*07, the most frequent HLA-C allele group in the Mexican cohort, which accounts for 10% of the HLA-HIV codon pairs observed in our analysis. These associations are unique to the Mexican cohort and are supported by predicted epitopes and/or a strong statistical association (q < 0.05) (Figure 6).
The case of B*39 is also noteworthy: it is the most frequent HLA-B allele group in the Mexican cohort, with a frequency 7 times higher than that observed in the IHAC cohort (p = 1.80E-44, q = 1.21E-42) (Figure 4). B*39 accounted for another 10% of the HLA-HIV codon pairs identified in the Mexican cohort, suggesting either a strong influence of this allele on HIV evolution in the immunogenetic context of the Mexican cohort or greater statistical power to detect associations. Interestingly, B*39-restricted associations were primarily escape associations (in which possession of B*39 made the target amino acid less likely to be present) whose target amino acid was a residue other than the consensus, suggesting that the consensus residue represents a possible escaped form for B*39 at these positions (Additional file 1: Table S3, Figure 6). This could point to a frequent role of the B*39 allele group in the conservation of HIV codons in the Mexican cohort. Such HLA-associated conservation of sites has been described previously for highly frequent HLA alleles that promote the accumulation of CTL-adapted variants in different populations [22, 23].
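The allele frequency comparisons (for example, B*39 in the two cohorts) and the adapted/escape distinction above both start from a 2x2 table of counts. The text does not state which test produced these p-values, so the sketch below assumes a two-sided Fisher exact test, a common choice for 2x2 tables, plus a Haldane-Anscombe pseudocount of 0.5 for the odds ratio. For an HLA-codon association, rows are HLA carriers versus non-carriers and columns are sequences with versus without the target amino acid; an odds ratio below 1 corresponds to the escape associations defined above and above 1 to adapted associations. The same test applies to carrier counts in two cohorts. Function names and counts are hypothetical; `math.comb` requires Python 3.8+.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / denom for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Small relative tolerance so tables tied with the observed one are counted.
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))


def odds_ratio(a: int, b: int, c: int, d: int, pseudocount: float = 0.5) -> float:
    """Odds ratio with a pseudocount (assumption) so empty cells do not divide by zero."""
    return ((a + pseudocount) * (d + pseudocount)) / ((b + pseudocount) * (c + pseudocount))


def association_direction(or_value: float) -> str:
    """'escape' if HLA carriers are less likely to carry the target residue, 'adapted' if more."""
    if or_value < 1:
        return "escape"
    if or_value > 1:
        return "adapted"
    return "none"


# Hypothetical counts: [[carriers with target, carriers without],
#                       [non-carriers with target, non-carriers without]]
a, b, c, d = 1, 3, 3, 1
print(association_direction(odds_ratio(a, b, c, d)), fisher_exact_two_sided(a, b, c, d))
```

With counts this small the direction is clear but the p-value is not significant, which is the same sample-size limitation discussed for the Mexican cohort.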
Overall, two key factors could explain the observation of different associations in cohorts that are infected by viruses of the same clade but have different HLA frequencies: 1) different patterns of immunodominance, which would reflect real differences in CTL epitope targeting; and 2) different statistical power to detect associations, which would reflect a statistical effect rather than a biological difference. For example, the absence of strong immunodominance patterns in certain populations could facilitate the detection of HIV polymorphisms associated with less immunodominant alleles. Confirming this possibility at the population level depends heavily on low false-positive and false-negative rates; although the false-negative rate in the IHAC data is low, further experimental data are needed to confirm this point. On the other hand, different HLA frequencies can simply change the statistical power to detect associations, which underlines the importance of assessing HLA-mediated selection in a diverse set of cohorts. A pure statistical power issue could also be resolved by combining different cohorts infected by the same viral clade into a larger reference set, supporting the creation of a universal set of associations that is updated periodically as new sequences are added. Such a sequence and association database would allow extrapolation from a large reference set to new demographic groups for which collecting cohorts would be difficult. The fact that 17 of the 23 novel HLA-HIV codon associations in the Mexican cohort involved HLA alleles whose frequencies were not significantly different from those in the IHAC cohort strongly suggests that these associations are not due to increased statistical power but may instead reflect differences in patterns of epitope targeting.
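The statistical power argument can be made concrete with a standard normal-approximation power calculation for comparing the frequency of a target amino acid between HLA carriers and non-carriers. This sketch is only illustrative: it assumes independent sequences and a single two-proportion z-test, whereas the PDN model also corrects for phylogeny, linkage disequilibrium, and multiple testing (so the effective significance level would be far smaller). The function name, cohort size, carrier frequencies, and amino acid frequencies below are hypothetical. It shows how, at a fixed cohort size, a rarer allele leaves fewer carriers and therefore less power to detect the same effect.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_power(n_total: int, carrier_freq: float,
                         p_carriers: float, p_noncarriers: float,
                         alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-proportion z-test.

    carrier_freq: fraction of individuals carrying the HLA allele.
    p_carriers / p_noncarriers: frequency of the target amino acid in each group.
    Ignores the negligible contribution of the opposite rejection tail.
    """
    if not 0 < carrier_freq < 1:
        raise ValueError("carrier_freq must be in (0, 1)")
    if not (0 < p_carriers < 1 and 0 < p_noncarriers < 1):
        raise ValueError("amino acid frequencies must be in (0, 1)")
    n1 = n_total * carrier_freq          # expected number of carriers
    n0 = n_total * (1 - carrier_freq)    # expected number of non-carriers
    pooled = (n1 * p_carriers + n0 * p_noncarriers) / n_total
    se_null = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n0))
    se_alt = sqrt(p_carriers * (1 - p_carriers) / n1
                  + p_noncarriers * (1 - p_noncarriers) / n0)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    effect = abs(p_carriers - p_noncarriers)
    return NormalDist().cdf((effect - z_crit * se_null) / se_alt)


if __name__ == "__main__":
    # Same hypothetical effect (50% vs 20% target residue) in a cohort of 300.
    for freq in (0.05, 0.10, 0.20):
        power = two_proportion_power(300, freq, 0.5, 0.2)
        print(f"carrier frequency {freq:.2f}: power ~ {power:.2f}")
```

Holding the effect size fixed and varying only the carrier frequency isolates the statistical explanation; if an association appears in one cohort but not the other while the allele frequencies are similar, this mechanism alone does not account for it.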
Furthermore, immunodominance effects and statistical power issues that depend on HLA frequencies could coexist in the same dataset. Examples of both phenomena in the Mexican cohort have been described above, suggesting that a set of immunogenetically diverse cohorts could greatly enrich HIV evolutionary studies without the need for very large cohorts. It should also be noted that broad two-digit HLA allele grouping does not reveal all possible divergence in HLA pressure, as a number of HLA subtypes with different peptide-binding motifs can be defined at the four-digit level within some allele groups, such as B*35, B*40, B*51, B*58, and A*02, all of which have highly characteristic distributions in different populations [9, 49]. Thus, significant divergence in selection could in some cases be explained by different dominant four-digit subtypes of a broad allele group in the compared cohorts. This could affect the statistical power to detect associations defined by different subtypes within a broad allele group in different populations, and further argues for a unique HLA-associated imprint on HIV in different populations.
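The masking effect of two-digit grouping can be illustrated by collapsing four-digit typings into their allele group while keeping a tally of the subtypes underneath. This is a hypothetical helper, assuming allele names written either in the older style (B*3501) or with a colon (B*35:01); the example typings are invented and do not represent the actual subtype distributions in either cohort. Two cohorts can show identical B*35 group counts while carrying different dominant subtypes, which a group-level analysis cannot distinguish.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Locus (e.g. "B", "Cw"), "*", two-digit group, optional colon, then subtype digits.
_ALLELE = re.compile(r"^([A-Za-z]+)\*(\d{2}):?(\d{2,})")


def parse_allele(allele: str) -> Tuple[str, str, str]:
    """Split 'B*3501' or 'B*35:01' into (locus, group, subtype)."""
    match = _ALLELE.match(allele)
    if match is None:
        raise ValueError(f"not a four-digit HLA typing: {allele!r}")
    return match.group(1), match.group(2), match.group(3)


def two_digit_group(allele: str) -> str:
    """Map a four-digit typing to its two-digit allele group, e.g. 'B*35'."""
    locus, group, _ = parse_allele(allele)
    return f"{locus}*{group}"


def subtype_composition(alleles: Iterable[str]) -> Dict[str, Counter]:
    """Count four-digit subtypes (normalized to colon style) within each group."""
    composition: Dict[str, Counter] = defaultdict(Counter)
    for allele in alleles:
        locus, group, subtype = parse_allele(allele)
        composition[f"{locus}*{group}"][f"{locus}*{group}:{subtype}"] += 1
    return dict(composition)


# Invented typings: both cohorts have three B*35 alleles, but different subtypes.
cohort_x = ["B*35:01", "B*35:01", "B*35:01"]
cohort_y = ["B*35:01", "B*35:12", "B*35:12"]
for name, typings in (("cohort_x", cohort_x), ("cohort_y", cohort_y)):
    print(name, subtype_composition(typings))
```

Normalizing both naming styles to one key matters when typings from different laboratories or eras are pooled; otherwise the same subtype would be counted twice under different labels.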
In summary, although the analysis of HLA-mediated HIV evolution in the Mexican population has important limitations, including the presence of false-positive associations and low power to detect associations, it yielded strong evidence that HLA-mediated HIV evolution in the Mexican cohort has unique characteristics. These include the striking proportion of HLA-HIV codon associations unique to the Mexican cohort (many of which are supported by predicted or confirmed CTL epitopes); the HLA-associated differences between the Mexican consensus sequence and the HOMER consensus (which reflect differential fixation of CTL escape mutations at the population level, strongly dependent on HLA frequency); and the high proportion of novel associations involving HLA alleles whose frequencies were similar in the Mexican and IHAC cohorts (which argues against a statistical power explanation for at least some of the significant associations).
To further characterize HLA-mediated HIV evolution, we compared HLA-HIV codon and HIV codon-HIV codon associations in free plasma virus and in PBMC proviral DNA from the Mexican cohort. As the PDNs for the two viral compartments show, actively replicating plasma viruses and PBMC-archived proviruses displayed different mutational patterns and different HLA-HIV codon associations at the population level. Significantly fewer HLA-HIV codon associations were observed in proviral sequences, and the two compartments had more distinct than shared HIV codon-HIV codon associations (Figure 7). This could be explained by the observation that proviral sequences frequently represent a stable reservoir of HIV sequences archived early in the course of infection, whereas plasma viruses represent sequences from later in infection. Thus, the proviral sequences may have been archived before some epitopes were targeted by host CTL responses, or before escape mutations had a chance to be selected at epitopes already targeted by CTLs, resulting in fewer associations in proviral sequences than in the extant plasma sequences. Indeed, previous studies have shown HLA-associated escape mutations in plasma viruses that are rare in proviruses within infected individuals. Proviral HLA-HIV codon pairs in the present data could not be mapped to known epitopes of early escape, although a larger cohort and analyses of other viral genes might further support this correlation.
However, given the differences in escape associations that we observed between the Mexican and IHAC cohorts, and given that the cohort in which these early-escape epitopes were described is immunogenetically similar to the IHAC cohort, the discordance between the proviral escape associations reported here and previously reported early-escape epitopes may reflect different patterns and kinetics of CTL epitope targeting in the two populations. The proviral associations in the Mexican cohort could thus represent early escape events in a Latin American setting.
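The shared-versus-distinct comparison between the plasma and proviral compartments reduces to set operations once each association is represented consistently. A minimal sketch, assuming associations are stored as pairs of site labels: HIV codon-HIV codon associations are undirected, so each pair is normalized to sorted order before comparison, and a Jaccard index summarizes the overlap. The function names, labels, and sets below are hypothetical placeholders, not the associations shown in Figure 7.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def normalize(pairs: Iterable[Pair]) -> Set[Pair]:
    """Codon-codon associations are undirected: store each pair in sorted order."""
    return {tuple(sorted(pair)) for pair in pairs}


def compare_compartments(plasma: Iterable[Pair],
                         proviral: Iterable[Pair]) -> Dict[str, Set[Pair]]:
    """Split associations into shared and compartment-specific sets."""
    p, v = normalize(plasma), normalize(proviral)
    return {"shared": p & v, "plasma_only": p - v, "proviral_only": v - p}


def jaccard(a: Set[Pair], b: Set[Pair]) -> float:
    """Fraction of all associations (in either compartment) found in both."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


plasma_pairs = [("codon_A", "codon_B"), ("codon_C", "codon_D"), ("codon_E", "codon_F")]
proviral_pairs = [("codon_B", "codon_A"), ("codon_G", "codon_H")]
result = compare_compartments(plasma_pairs, proviral_pairs)
print({k: sorted(v) for k, v in result.items()})
print(jaccard(normalize(plasma_pairs), normalize(proviral_pairs)))  # 0.25
```

Without the normalization step, ("codon_B", "codon_A") and ("codon_A", "codon_B") would be counted as different associations, inflating the apparent number of compartment-specific pairs.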
Surprisingly, some HLA associations detected in proviral sequences were not seen in the plasma virus dataset. Some of these HLA-HIV codon pairs observed exclusively in proviral sequences have fairly high q-values, possibly indicating false-positive associations. However, unique proviral associations could also suggest a chronological reshaping of HLA-mediated HIV evolution, reflecting rapidly reverting mutations that are lost soon after transmission to HLA-mismatched individuals. Alternatively, compartmentalization of HIV variants among organs within an infected host, and its relation to positive selection, has been described. This phenomenon could explain population-level differences between actively replicating viruses from a specific compartment with characteristic selective pressures and archived proviruses that persist in reservoir(s) originating from different anatomical and/or cellular compartments.
Shared associations between the plasma virus and PBMC provirus compartments may reflect sites in the viral proteome that are continuously targeted by CTLs throughout chronic infection, a characteristic that might be of interest when selecting candidate vaccine targets. On the other hand, these apparently more stable associations could also reflect epitopes whose early CTL targeting has stopped but for which no reversion has occurred, suggesting low fitness costs of escape. If the latter were true, some shared associations might be more likely to reach fixation at the population level in the future, with implications for our understanding of, and ability to predict, HIV adaptation in human populations.
Similarly, unique coevolving HIV codon pairs were detected in proviral and in plasma virus sequences, perhaps reflecting different patterns of compensatory mutations for the different HLA escape mutations observed in the two compartments. Alternatively, unique proviral HIV codon-HIV codon pairs could reflect a reorganization of mutational patterns that combines escape mutations selected in previous hosts with new mutations selected in the current host, while unique plasma virus HIV codon-HIV codon pairs could reflect sequential footprints left by viral adaptation to HLA-restricted responses during chronic infection in the current host. These observations have interesting implications for our understanding of HLA-mediated HIV evolution, suggesting that the appearance and density of the PDNs for a specific population are highly dynamic and could vary over time. The dynamic development of CTL responses over the course of infection within an individual has been reported previously [51, 52]. Further studies in follow-up cohorts or in carefully stratified cross-sectional cohorts might support or refute these observations.