Mapping of positive selection sites in the HIV genome in the context of RNA and protein structural constraints
© Snoeck et al; licensee BioMed Central Ltd. 2011
Published: 3 October 2011
Background
The HIV-1 genome is highly variable. On the one hand, recognized selective pressures target the virus, resulting in escape and adaptation. On the other hand, functional and structural constraints require sequence conservation. Mapping the sites of positive selective pressure on the viral genome provides a reference for understanding the limits of viral escape, and can serve as a template for discovering sites of genetic conflict with known or unknown host proteins.
Materials and methods
To build a comprehensive evolutionary, functional and structural map of the HIV-1 genome, we obtained full-length subtype B sequences from the Los Alamos database. We mapped sites under positive selective pressure (using the SLAC method in HYPHY), amino acid conservation, protein and RNA structure, overlapping coding frames, and CD8 T cell, CD4 T cell and antibody epitopes (based on the lists from the Los Alamos database website). Fisher's exact test was used for univariate analysis, and either standard logistic regression or Firth's penalized-likelihood binary logistic regression was used for multivariate analysis. Statistical analyses were performed in R version 2.13.0 (http://www.r-project.org/).
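SLAC (single-likelihood ancestor counting) classifies each codon by comparing its nonsynonymous and synonymous substitution counts against what neutral evolution would predict. The sketch below illustrates only that final per-codon decision step, under simplifying assumptions: the substitution counts and the expected numbers of nonsynonymous and synonymous sites are taken as already computed (HYPHY derives them from ancestral sequences reconstructed on a phylogeny), counts are treated as integers with a plain binomial test, and the function name `slac_like_call` and the 0.05 cutoff are hypothetical illustrative choices, not HYPHY's interface or the settings used in this study.

```python
from math import comb


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def binom_lower_tail(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(0, k + 1))


def slac_like_call(ns_subs: int, s_subs: int,
                   ns_sites: float, s_sites: float,
                   alpha: float = 0.05) -> tuple[str, float]:
    """Hypothetical SLAC-style classification of a single codon.

    ns_subs / s_subs: counted nonsynonymous / synonymous substitutions.
    ns_sites / s_sites: expected nonsynonymous / synonymous sites (both assumed > 0).
    alpha: illustrative significance cutoff (an assumption, not HYPHY's default).
    """
    n = ns_subs + s_subs
    if n == 0:
        return "neutral", 1.0  # no substitutions at this codon, nothing to test
    # Under neutrality, each substitution is nonsynonymous with probability p.
    p = ns_sites / (ns_sites + s_sites)
    dn = ns_subs / ns_sites
    ds = s_subs / s_sites
    if dn > ds:
        # Excess of nonsynonymous changes: test the upper tail.
        pval = binom_upper_tail(ns_subs, n, p)
        return ("positive" if pval < alpha else "neutral"), pval
    if dn < ds:
        # Deficit of nonsynonymous changes: test the lower tail.
        pval = binom_lower_tail(ns_subs, n, p)
        return ("negative" if pval < alpha else "neutral"), pval
    return "neutral", 1.0


if __name__ == "__main__":
    # 8 nonsynonymous and 0 synonymous changes with p = 2/3:
    # the tail probability is (2/3)**8, about 0.039, so the codon is called "positive".
    print(slac_like_call(8, 0, 2.0, 1.0))
```

Note that dN > dS is equivalent to the observed nonsynonymous fraction exceeding p, which is why the direction check and the binomial tail test agree.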
Results
Overall, 33% of amino acid positions were variable and 12% of the genome was under positive selection. Because interrelated constraining and diversifying forces shape the viral genome, we included variables from both classes of pressure in a multivariate model predicting conservation or positive selection: molecular structures (structured RNA and α-helix domains) predicted conservation, whereas CD4 T cell and antibody epitopes were associated with positive selection. Gene-specific analyses revealed departures from the genome-wide estimates. For example, CD8 T cell epitopes were generally well conserved, except in gp120, where they were enriched for sites under positive selective pressure. Similarly, CD4 T cell epitopes were enriched for positively selected sites genome-wide but were significantly more conserved in gag. Thus, the various constraints and selective pressures do not act evenly across the genome.
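Gene-specific enrichment statements of this kind reduce to 2×2 comparisons (inside vs. outside an epitope class, positively selected vs. not) tested with Fisher's exact test. The following standard-library sketch computes the two-sided p-value the way R's `fisher.test` defines it (summing every table with the same margins whose probability does not exceed that of the observed table) together with the sample odds ratio; R itself reports a conditional maximum-likelihood odds ratio instead. The gene labels `geneA`/`geneB` and all counts are hypothetical placeholders, not data from this study.

```python
import math


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return math.comb(row1, x) * math.comb(n - row1, col1 - x) / denom

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance (as in R) so that tied tables are not lost to rounding.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio (a*d)/(b*c); > 1 means selected sites are enriched in the epitope class."""
    if b * c == 0:
        return math.inf if a * d > 0 else math.nan
    return (a * d) / (b * c)


# Hypothetical per-gene counts:
# (epitope & selected, epitope & not selected, outside & selected, outside & not selected)
EXAMPLE_COUNTS = {
    "geneA": (30, 70, 20, 180),
    "geneB": (5, 95, 25, 175),
}


def per_gene_enrichment(
    counts: dict[str, tuple[int, int, int, int]],
) -> dict[str, tuple[float, float]]:
    """Return {gene: (odds ratio, two-sided p-value)} for each gene's 2x2 table."""
    return {gene: (odds_ratio(*t), fisher_exact_two_sided(*t)) for gene, t in counts.items()}


if __name__ == "__main__":
    for gene, (or_value, p) in per_gene_enrichment(EXAMPLE_COUNTS).items():
        print(f"{gene}: OR={or_value:.2f}, p={p:.3g}")
```

Running the test per gene, rather than once on pooled counts, is what allows a pattern such as "enriched genome-wide but conserved in one gene" to surface.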
Conclusions
Because the global map of the viral genome also reveals positively selected sites that lie outside canonical CD8 T cell, CD4 T cell and antibody epitopes, it identifies a class of residues that may be targeted by other host selective pressures, such as innate immunity effectors. For example, sequence adaptation has been observed in the viral capsid upon cross-species transmission of SIVsm to rhesus macaques, driven by selective pressure from restrictive TRIM5 alleles. Beyond informing combined analyses of host and viral genetic information, the results of this study may therefore help reveal novel mechanisms of antiretroviral response.
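Isolating positively selected sites that fall outside every canonical epitope is, at its core, a set difference over genome positions. A minimal sketch, assuming selected sites and epitopes share one coordinate system (for example HXB2 numbering) and that epitopes are given as inclusive (start, end) intervals; the function name and the example coordinates are hypothetical.

```python
def selected_sites_outside_epitopes(
    selected_sites: list[int],
    epitopes: dict[str, list[tuple[int, int]]],
) -> list[int]:
    """Return positively selected positions not covered by any epitope interval.

    epitopes maps an epitope class (e.g. "CD8", "CD4", "antibody") to inclusive
    (start, end) intervals in the same coordinates as selected_sites.
    """
    covered: set[int] = set()
    for intervals in epitopes.values():
        for start, end in intervals:
            covered.update(range(start, end + 1))
    return sorted(set(selected_sites) - covered)


if __name__ == "__main__":
    example_epitopes = {"CD8": [(1, 10)], "CD4": [(35, 45)], "antibody": [(60, 70)]}
    print(selected_sites_outside_epitopes([5, 12, 40, 65, 77], example_epitopes))  # [12, 77]
```

The returned positions are the candidates for non-adaptive-immune pressures such as restriction factors; for genome-scale inputs, a sorted interval sweep would avoid materializing every covered position.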
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.