Biodiversity is routinely applied to metagenomics of a variety of species, including the human microbiome, but only limited, if any, assessment of viromes in different ecological niches. Our study applies an efficient bioinformatic pipeline that we developed to assess the complexity of HIV-1 quasispecies in unique ecosystems within infected individuals. The power of pyrosequencing to generate extensive sequence data sets provides a foundation to apply population genetic analyses and extends the value for deep sequencing beyond analysis of rare variants that might indicate reduced sensitivity to drugs. Analysis of biodiversity based on sequence clustering provides a novel viral population profile for different environments independent of viral levels in cells or plasma, perhaps reflecting length of infection if sequences were archived in lineages of long-lived cells. Consistent with this model, complex viral population structure with high biodiversity appeared as early as eighteen months, or by four to six years, of infection in some individuals. Yet, similar periods of infection in other individuals were characterized by monomorphic viral populations with low complexity, indicating that biodiversity of V3 populations represents complex combinations of factors; for example, changes in viral fitness in the environmental landscape in response to host immunity, host target cells, or coreceptor evolution under selective pressure.
Another novel aspect of our study involved a combination of cross-sectional deep sequencing with conventional longitudinal sequences to provide high-resolution detection of evolutionary intermediates, which may be less fit or infrequent in peripheral blood, but nonetheless contribute to the genetic flexibility of the population. The specific order of amino acid substitutions over time may reflect important epistatic interactions that could focus detection of compensatory mutations contributing to fitness in the genetic landscape to other regions of the virus genome. Deep sequencing data sets fill in the evolutionary landscape and increase the power to infer the temporal accumulation of amino acid substitutions, or provide a basis for rational functional analysis of ancestral envelopes and the progeny that emerge from recurring viral population bottlenecks.
An apparent paradox from our analyses is the contribution by low-frequency, presumably less-fit viral variants, rather than the dominant variants, to next generation plasma HIV-1 populations with enhanced fitness. Low-frequency variants expand the fitness landscape for virus populations, while providing an array of evolutionary options to maximize survival in a changing ecosystem . Low frequency cell-associated HIV-1 quasispecies may represent residual genomes from a past dominant population archived in long-lived cells, a sequestered reservoir that only infrequently finds its way into the peripheral blood, and/or progenitors that gives rise to the next generation of dominant variants in the plasma. Transient dominance of a population leaves a molecular trail that persists as low frequency variants archived in peripheral blood. In agreement with studies of heterosexual HIV-1 transmission , archeological evidence of the earliest viral populations was found in our study of pediatric cells as long as four years after infection by maternal transmission, suggesting those early viruses, or at least their V3 domains, endure during the natural history of infection.
While the study focused on HIV-1 populations in human environments, the approach is applicable to an array of viruses with complex populations, including other subtypes or recombinant forms of HIV-1, hepatitis C or hepatitis B viruses, as well as the repertoire of related viruses that infect animals. Increased depth of sampling and extended length of the target region now possible by pyrosequencing combined with efficient bioinformatic pipelines provides a basis for developing quantitative measures of the ebb and flow of viral populations in changing environments.