Molecular clock of HIV-1 envelope genes under early immune selection

Park, Sung Yong; Love, Tanzy M. T.; Perelson, Alan S.; Mack, Wendy J.; Lee, Ha Youn

doi:10.1186/s12977-016-0269-6

Research
Open access
Published: 01 June 2016

Retrovirology volume 13, Article number: 38 (2016)

Abstract

Background

The molecular clock hypothesis, which holds that genes or proteins evolve at a constant rate, is a key tool for revealing phylogenetic relationships among species. With a molecular clock, an infection can be traced back to transmission from HIV-1 sequences sampled at a single time point. Whether a strict molecular clock applies to HIV-1’s early evolution in the presence of immune selection has not yet been fully examined.

Results

We identified molecular clock signatures in 1587 previously published HIV-1 full envelope gene sequences obtained from 15 subjects starting in acute infection. Each subject’s sequence diversity increased linearly during the first 150 days post infection, at rates ranging from $1.54 \times 10^{ - 5}$ to $3.91 \times 10^{ - 5}$ per base per day, with a mean of $2.69 \times 10^{ - 5}$ per base per day. The rate of diversification for 12 of the 15 subjects was comparable to the neutral evolution rate. Although temporal diversification was consistent with evolution in the absence of selection, mutations away from the founder virus were highly clustered at statistically identified selection sites, which diversified more than 65 times faster than non-selection sites. By mathematically quantifying deviations from the molecular clock under various selection scenarios, we demonstrate that the deviation from a constant clock becomes negligible as multiple escape lineages emerge. The most recent common ancestor of a virus pair from distinct escape lineages is most likely the transmitted/founder virus, indicating that HIV-1 molecular dating is feasible even after the founder viruses are no longer detectable.

Conclusions

The ability of HIV-1 to escape from immune surveillance in many different directions is the driving force behind the persistence of the molecular clock. This finding advances our understanding of the robustness of HIV-1’s molecular clock under immune selection and points to the potential for molecular dating.

Background

The molecular clock serves as a focal link between molecular evolution at the microscopic level and species evolution at the macroscopic level [1, 2]. The molecular clock hypothesis has been examined in a wide range of species at both the genomic and protein levels. Representative supporting data include (1) quantitative associations between amino acid sequence differences of homologous proteins and fossil-based divergence times of different organisms [3–5] and (2) linear relationships between the amount of nonsynonymous nucleotide substitutions and mammalian species divergence times [6].

Probing for an HIV-1 intrahost molecular clock is important because, if the molecular clock applies to the HIV-1 population within an infected individual, an infection can be traced back to transmission using sequences from a single time point. Accurately dating HIV-1 transmission is crucial for identifying risk behaviors that lead to transmission, monitoring prevention efforts, and determining when each immune response develops and matures. Estimates of the timing of infection can help define immune correlates of protection using data from HIV-1 vaccine and prevention trials; for instance, knowing the time of HIV-1 acquisition will be important for determining the antibody titer threshold for protection in the Antibody Mediated Prevention (AMP) study [7]. Furthermore, the ability to molecularly date the HIV-1 gene pool expands the opportunity to determine HIV-1 incidence using recently developed genomic assays [8, 9].

Whether HIV-1 evolves in a clock-like manner has been tested, but no consensus has been reached. Rigorous statistical evaluations of diverse HIV-1 sequence data sets from different genomic regions have revealed both clock-like and non-clock-like behaviors [10–15]. Although its validity remains controversial, the molecular clock hypothesis has been widely applied to estimate phylogenies and branching times of HIV-1 inter-host and intra-host populations: strict or relaxed molecular clocks were used to (1) date the ancestor of the main group of HIV-1 [16–18], (2) reconstruct the spread dynamics of HIV-1, estimating the location and timing of early transmission [19], and (3) quantify the intra-host HIV-1 envelope diversification rate, in a range of $1.72 \times 10^{ - 5}$ to $4.32 \times 10^{ - 5}$ per base per day [20–23].

The HIV-1 gene population within an infected individual shows heavy selection signatures and fast-paced evolution due to rapid turnover and a high mutation rate. Following transmission, an HIV-1 population evolves through the interplay of random mutation and immune selection amid population growth and decline before reaching a stable viral load. This dynamic phase is a period of heightened immune selection pressure that initiates an evolutionary arms race between the virus and the immune system. Around 1 month post infection, the first CD8+ T cell responses targeting the founder viruses lead to rapid viral escape through amino acid changes in CD8+ T cell epitope sequences, at rates as fast as 0.42 per day [24, 25]. This rate implies that a minor mutant making up 5 % of the total viral population could become the dominant lineage, making up 95 % of the population, in just 2 weeks. In the wake of these early CD8+ T cell responses, initial neutralizing antibody responses develop at around 3 months post infection, resulting in an ongoing pattern of viral escape and antibody evolution [26–28]. Understanding the effect of strong immune selection on HIV-1’s molecular clock is of interest because selection is often thought to be a rate-changing factor [2, 29]: by favoring particular lineages, it drives a genealogy away from that expected under random evolution and perturbs the molecular clock.
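The 2-week figure can be checked with a short calculation. The sketch below assumes, as is common in escape-rate analyses but not stated explicitly here, that the escape rate is a constant net growth advantage of the mutant over the wild type, so that the mutant frequency follows a logistic curve; the function names are hypothetical.

```python
import math


def logit(f: float) -> float:
    """Log-odds of a frequency 0 < f < 1."""
    return math.log(f / (1.0 - f))


def sweep_time(rate_per_day: float, f_start: float, f_end: float) -> float:
    """Days for an escape mutant to rise from f_start to f_end, assuming its
    log-odds grow linearly at rate_per_day (logistic replacement)."""
    return (logit(f_end) - logit(f_start)) / rate_per_day


# Escape rate of 0.42 per day, from 5 % to 95 % of the population.
print(round(sweep_time(0.42, 0.05, 0.95), 1))  # → 14.0
```

Because logit(0.95) − logit(0.05) = 2 ln 19 ≈ 5.89, the replacement takes about 5.89 / 0.42 ≈ 14 days under this assumption.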

In this study, we empirically and theoretically examine evidence of molecular clock conservation under selection by combining gene sequence data with mathematical models for HIV-1 evolution. We analyzed previously published HIV-1 envelope gene sequences collected from within 1 month of infection with sample intervals of days and weeks and traced HIV evolution at the onset of immune selection. Our primary goal is to examine whether a selection-induced heterogeneous phylogeny can conform to a strict molecular clock. By mathematically quantifying deviations from the molecular clock in an array of selection scenarios, we define conditions for the existence of a molecular clock.

Results

We examined HIV-1 diversification patterns under immune selection using serial measurements of HIV-1 envelope gene sequence diversity. We analyzed 1587 previously published HIV-1 whole envelope gene sequences obtained serially from 15 acutely infected individuals [27, 30–32]. Figure 1a plots HIV-1 envelope gene sequence diversity dynamics during the 150 days following the first sample. Because the time between infection and the first sample is uncertain for each subject, all subsequent data points are presented as the increase in diversity relative to the first sample and the time elapsed since the first sample.

A mixed effects model was used to analyze the 15 subjects’ diversity dynamics over time (see “Methods”). At the population level, in the first 150 days following the first sample, the linear diversification of all 15 subjects’ HIV-1 sequences was statistically significant (p < 0.0001), whereas quadratic attenuation was not (p = 0.76) (Fig. 1a). At the individual level, each subject’s sequence population showed a statistically significant linear relationship (Table 1). The rate of linear diversification ranged from $1.54 \times 10^{ - 5}$ to $3.91 \times 10^{ - 5}$ per base per day, with a population mean of $2.69\ (\pm 0.29) \times 10^{ - 5}$ per base per day (Table 1), which is close to the HIV-1 diversification rate under the neutral evolution assumption (i.e. all HIV-1 infected cells produce the same number of secondary infected cells in a single replication cycle), $2.16 \times 10^{ - 5}$ per base per day [33–35]. This neutral evolution rate was approximated as $2\varepsilon /\tau$ with the viral generation time $\tau = 2$ days and the HIV-1 single cycle base substitution rate $\varepsilon = 2.16 \times 10^{ - 5}$ per base per cycle [35]. The rates of diversification of 12 of the 15 subjects matched the neutral evolution rate (Table 1). We did not observe any difference in the linear diversification rate between males and females (p = 0.70, ANOVA), in contrast to a recent study that reported a greater evolution rate in risk groups with a higher proportion of men [36]. When we traced HIV-1 diversity over a longer time frame of 2 years (Fig. 1b), quadratic attenuation became significant at the population level (p = 0.028). Figure 2 plots each subject’s diversity dynamics over 2 years of infection with the best fit of the mixed effects model (Additional file 1: Table S1). The quadratic leveling-off of diversity over 2 years was significant in 4 of the 15 subjects: CH077, CH131, CH159 and CH505 (see p values for each quadratic term in Additional file 1: Table S1).
Our observations suggest that an intrahost HIV-1 population evolves in a clock-like manner, at a rate close to that set by the error rate of HIV-1 reverse transcriptase, for the first 150 days following infection, and that diversification starts to level off slowly afterwards.
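The neutral clock rate quoted above follows from simple arithmetic. This sketch reproduces $2\varepsilon/\tau$ with the stated parameter values and, for illustration only, the diversity increase over 150 days implied by the neutral rate and by the slowest, mean and fastest subject-level rates reported above; the function names are hypothetical.

```python
EPS = 2.16e-5  # single-cycle base substitution rate, per base per cycle [35]
TAU = 2.0      # viral generation time, days


def neutral_rate(eps: float = EPS, tau: float = TAU) -> float:
    """Neutral diversification rate 2*eps/tau, per base per day."""
    return 2.0 * eps / tau


def diversity_increase(rate_per_day: float, days: float) -> float:
    """Linear-clock diversity increase (per base) after `days`."""
    return rate_per_day * days


print(neutral_rate())  # → 2.16e-05
for label, rate in [("slowest", 1.54e-5), ("mean", 2.69e-5),
                    ("fastest", 3.91e-5), ("neutral", neutral_rate())]:
    print(f"{label:8s} {diversity_increase(rate, 150):.2e} per base over 150 days")
```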

Table 1 The rate of HIV-1 envelope gene sequence diversification with standard errors during the 150 days from the first sample in 15 subjects whose sequence data come from references [27, 30–32]

We next examined spatial patterns of mutations across the envelope gene. Figure 3a shows mutations away from the founder sequence along subject CH042’s envelope gene sequences; the mutations are clustered at putative selection sites rather than randomly scattered. Previous epitope mapping in these 15 subjects uncovered regions susceptible to immune selection and, thus, to viral escape via mutation [27, 30–32]. Experimentally identifying all selection sites, however, was not feasible. Alternatively, a statistical approach can provide a comprehensive list of putative selection sites based on patterns in sequence samples. We defined putative selection sites as nucleotide positions showing more base substitutions from the founder nucleotide than would be expected to occur by chance in the absence of selection. To designate putative selection sites, we first measured the mutant frequency: the proportion of sequences at a given time that do not match the founder sequence at a particular nucleotide site. Figure 3b plots the mutant frequency distribution of all sites along 25 full envelope gene sequences obtained from subject CH042 at 676 days after the first sample date. In the absence of selection, the number of sequences, $k$, at a given time post infection, $t$, that do not match the founder sequence at a particular nucleotide site would follow a binomial distribution,

$$P(k,t) = \frac{{N_{S} !}}{{k!(N_{S} - k)!}}\left( {\frac{t}{\tau }\varepsilon } \right)^{k} \left( {1 - \frac{t}{\tau }\varepsilon } \right)^{{N_{S} - k}} ,$$

(1)

where $N_{S}$ is the number of sampled sequences, $\varepsilon$ is the HIV-1 single cycle base substitution rate and $\tau$ is the viral generation time [20–23], so that $t\varepsilon /\tau$ is the probability that a sequence differs from the founder at a given site after $t/\tau$ generations. The best fit of the binomial distribution to subject CH042’s mutant frequency distribution is shown as the dashed line in Fig. 3b. This fit defines a threshold mutant frequency: sites whose mutant frequency exceeds the threshold are designated as putative selection sites.
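Eq. (1) and the threshold idea can be sketched as follows. The per-site mismatch probability $t\varepsilon/\tau$ is taken directly from Eq. (1); the tail-probability cutoff `alpha` and the helper names are hypothetical, since the exact criterion used to set the threshold from the fit is not specified here. The example treats 676 days from the first sample as the time since infection, purely for illustration.

```python
from math import comb


def mutant_count_pmf(k: int, n_seq: int, t_days: float,
                     eps: float = 2.16e-5, tau: float = 2.0) -> float:
    """Eq. (1): probability that k of n_seq sampled sequences differ from the
    founder at one site after t_days of evolution without selection."""
    p = eps * t_days / tau  # per-site mismatch probability after t/tau generations
    return comb(n_seq, k) * p**k * (1.0 - p)**(n_seq - k)


def threshold_count(n_seq: int, t_days: float, alpha: float = 0.05,
                    eps: float = 2.16e-5, tau: float = 2.0) -> int:
    """Smallest mutant count k with P(K >= k) < alpha under Eq. (1).
    Sites with at least this many non-founder sequences would be flagged
    as putative selection sites (hypothetical cutoff rule)."""
    tail = 1.0  # P(K >= 0)
    for k in range(n_seq + 1):
        if tail < alpha:
            return k
        tail -= mutant_count_pmf(k, n_seq, t_days, eps, tau)  # now P(K >= k + 1)
    return n_seq + 1  # no attainable count is rare enough


# 25 sequences sampled 676 days out (CH042-like sample size).
print(threshold_count(25, 676))  # → 2
```

With these inputs the expected number of non-founder sequences per site is only about 0.18, so two or more at a single site is already unlikely (tail probability ≈ 0.014) without selection. Because thousands of envelope sites are tested, a multiple-testing correction would tighten the cutoff; which correction to use is left open here.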

Indeed, some statistically identified selection sites match experimentally confirmed peptides reactive to autologous CD8+ T cells, including VQKEYAFFYK (169–178) and QFRNKTIVF (gp160 352–361) [32]. For some of the identified selection sites that coincide with known CD8+ T cell epitopes, the HLA restriction of the known epitope also matches the HLA type of the individual. Additional file 1: Table S2 links, where applicable, each designated selection site to a known CD8+ T cell epitope in the Los Alamos National Laboratory HIV-1 Molecular Immunology Database (http://www.hiv.lanl.gov/content/immunology/maps/maps.html). On average, around 60 % of the statistically identified selection sites were located within known CD8+ T cell epitope regions (Additional file 1: Table S1).

Clear immune selection signatures were visible when we compared diversity dynamics between selection and non-selection sites. In subject CH042, diversity increased more rapidly within designated selection sites than outside them (Fig. 3d); over 150 days, the slope of the diversity increase at selection sites was around 19 times greater than that at non-selection sites. We observed a similar pattern in subject CH256 (Fig. 3e) and in all other subjects. Our mixed effects model estimated that, on average, selection sites diversify around 65 times more rapidly than non-selection sites during the first 150 days post infection (Table 1). That the majority of base substitutions fall within sites covering only 0.12–3.14 % of the full envelope gene indicates that the capacity to select viral variants is highly concentrated within immune-targeted sites.

Although multiple mutant forms were prevalent at selection sites, some sites became homogeneous after the first 150 days of infection; for example, at 12 putative selection sites of subject CH131, the same single nucleotide replaced the founder nucleotide in all sequences at each respective site 670 days after the first sample date. In this subject, we observed a smaller increase in genetic diversity than predicted by the molecular clock (Fig. 2) because of these homogeneous selection sites. Some selection sites were also homogeneous in four other subjects, CAP045, CH077, CH185, and CH256, which showed deviations from the constant evolution rate caused by decreased diversity (Fig. 2). Nonetheless, hard sweep signatures are lacking in these five subjects because the fraction of heterogeneous selection sites remains substantial, ranging from 38.5 to 75.5 %; in the other 10 subjects, homogeneous sites make up fewer than 20 % of all identified putative selection sites. Thus, while homogeneity creates notable deviations from clock-like evolution after the first 150 days post infection, it is generally not the primary determinant of HIV-1 selection patterns.

Next, we sought to understand how clock-like evolution and the clustering of mutations at immune selection sites, both observed in a considerable number of envelope gene sequences from the 15 individuals, can coexist. We developed a model of HIV-1 gene evolution and within-host viral dynamics during the early phases of immune selection. As illustrated in Fig. 4a, in our model the founder lineage initially replicates in the absence of immune selection, with each infected cell producing $R_{0}$ secondary infected cells [30, 35]. Each replication cycle involves HIV-1 reverse transcriptase-mediated base substitution errors at rate $\varepsilon$. At the onset of selection, at generation $g_{s1}$, the model departs from neutral evolution: a single infected cell harboring an escape virus is assumed to arise and begin producing $R_{1}$ daughter cells, while the replicative capacity of wild-type infected cells is significantly compromised by immune recognition, so that each produces only $sR_{0}$ daughters on average, with $0 < sR_{0} < 1$. Thus, the selection coefficient of the wild-type virus relative to the escape mutant is $S = 1 - sR_{0}/R_{1}$. During the viral decline phase, the wild-type population is rapidly cleared by immune selection while the proportion of mutant-type virus within the total population increases, leading to viral escape. After the viral set point is reached (generation $g_{s}$ in Fig. 4a), all existing mutant-type infected cells are assumed to replace themselves without increasing the population size ($R_{1} = 1$). In this model, the total infected cell count, and the viral load, which is proportional to it, mimic the natural course of an HIV-1 infection: an exponential increase followed by a rapid decline, with a steady population level thereafter (Fig. 4b).
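The single-escape model can be sketched as a deterministic recursion on expected infected-cell counts. This is a simplified mean-field reading of the verbal description, not the stochastic genealogy model of Additional file 1; in particular, letting wild-type cells keep producing $sR_{0}$ daughters after the set point is an assumption, and all parameter values in the example are hypothetical.

```python
from typing import Tuple


def selection_coefficient(s: float, R0: float, R1: float) -> float:
    """Selection coefficient of wild type relative to the escape mutant, S = 1 - s*R0/R1."""
    return 1.0 - s * R0 / R1


def expected_counts(g: int, R0: float, R1: float, s: float,
                    g_s1: int, g_s: int) -> Tuple[float, float]:
    """Expected (wild-type, mutant) infected-cell counts at generation g,
    starting from one founder-infected cell (mean-field sketch)."""
    if g < g_s1:  # neutral expansion of the founder lineage
        return R0 ** g, 0.0
    # After g_s1, each wild-type cell yields s*R0 < 1 daughters (assumed to continue past g_s)
    wild = R0 ** g_s1 * (s * R0) ** (g - g_s1)
    # One escape cell appears at g_s1, expands by R1 until g_s, then R1 = 1
    mutant = R1 ** (min(g, g_s) - g_s1)
    return wild, mutant


# Hypothetical parameters, for illustration only.
params = dict(R0=6.0, R1=2.0, s=0.1, g_s1=10, g_s=30)
print(f"S = {selection_coefficient(params['s'], params['R0'], params['R1']):.2f}")
for g in (0, 5, 10, 15, 20, 30, 40):
    w, m = expected_counts(g, **params)
    print(f"g={g:2d}  wild={w:10.3g}  mutant={m:10.3g}  total={w + m:10.3g}")
```

With these values the total rises to a peak at $g_{s1}$, declines as the wild type is cleared, and levels off once the mutant stops expanding at $g_{s}$.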

Despite the modification of HIV-1 genealogy by immune selection, our model showed that any given wild-type virus pair most likely coalesces at the transmission point [Additional file 1: Eq. (S14)]. The genealogy of mutant-type pairs followed a neutral evolution scenario in which all mutant descendants originate from a single mutant ancestor. As shown in Fig. 4c, the coalescence probability of mutant-type pairs peaks when the first mutant virus appears and decreases exponentially afterwards [Additional file 1: Eq. (S17)]. Our calculation also showed that a wild–mutant virus pair most likely coalesces at the transmission point, following the same trend as a wild–wild pair [Additional file 1: Eq. (S20)]. The most recent common ancestor (MRCA) of both wild–wild and wild–mutant pairs was most likely the founder virus, whereas for mutant–mutant pairs it was most likely the first mutant virus. Therefore, the coalescence distribution of the total population depended on the ratio between the wild-type and mutant populations. When the mutant population was comparable in size to the wild-type population, the coalescent profile peaked at the origin of the infection, as shown in Fig. 4c.
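Why the population ratio matters can be seen by counting pair types. The sketch assumes virus pairs are drawn at random (with replacement) from a population in which a fraction $f$ is mutant; following the text, only mutant–mutant pairs are taken to have their most likely MRCA at the first mutant rather than at the founder. The function name is hypothetical.

```python
from typing import Dict


def pair_type_fractions(f: float) -> Dict[str, float]:
    """Fractions of randomly drawn virus pairs by type, given mutant fraction f
    (pairs sampled with replacement, so the fractions sum to 1)."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    return {
        "wild-wild": (1.0 - f) ** 2,
        "wild-mutant": 2.0 * f * (1.0 - f),
        "mutant-mutant": f ** 2,
    }


for f in (0.1, 0.5, 0.9):
    fr = pair_type_fractions(f)
    founder_mrca = fr["wild-wild"] + fr["wild-mutant"]  # equals 1 - f**2
    print(f"f={f:.1f}  pairs most likely coalescing at founder: {founder_mrca:.2f}")
```

When the two populations are comparable ($f = 0.5$), three quarters of pairs still trace back to the founder; only as the mutant takes over do mutant–mutant pairs dominate.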

The coalescent profile of all virus pairs permits us to evaluate diversity dynamics, from which we can assess deviations from the molecular clock. The deviation from a constant molecular clock can be quantified as the difference between the sequence diversity and the reference neutral clock diversity. As detailed in Additional file 1, when the mutant population was prevalent at the viral set point, the clock deviation, $\Delta_{1}$, was approximated as

$$\Delta _{1} \simeq 2\varepsilon \left\{ {g_{s1} + \frac{1}{{R_{1} - 1}} - \frac{1}{{R_{0} - 1}}} \right\}$$

(2)

Here, Eq. (2) indicates that the clock deviation is mainly determined by the time at which the escape lineage arises ($g_{s1}$): the later it appears, the greater the deviation. The replicative capacities of the mutant-type virus ($R_{1}$) and wild-type virus ($R_{0}$) also contribute to the clock deviation. In contrast, the deviation does not depend on the selective disadvantage of the wild-type population ($s$). Likewise, the clock deviation is approximately constant regardless of when viruses are sampled (generation $g$) after the mutant lineage arises.
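Eq. (2) is straightforward to evaluate. The sketch below computes $\Delta_{1}$ and, as an illustrative conversion that is not part of the original analysis, divides it by the neutral clock rate $2\varepsilon/\tau$ to express the deviation as an equivalent shift in days; the $R_{0}$, $R_{1}$ and $g_{s1}$ values in the example are hypothetical.

```python
def clock_deviation_single(eps: float, g_s1: float, R0: float, R1: float) -> float:
    """Eq. (2): approximate clock deviation (per base) with one escape lineage."""
    if R0 <= 1.0 or R1 <= 1.0:
        raise ValueError("Eq. (2) requires R0 > 1 and R1 > 1")
    return 2.0 * eps * (g_s1 + 1.0 / (R1 - 1.0) - 1.0 / (R0 - 1.0))


def deviation_in_days(delta: float, eps: float, tau: float) -> float:
    """Illustrative conversion: deviation divided by the neutral rate 2*eps/tau."""
    return delta / (2.0 * eps / tau)


eps, tau = 2.16e-5, 2.0
for g_s1 in (10, 15, 20):  # hypothetical escape times, in generations
    d1 = clock_deviation_single(eps, g_s1, R0=6.0, R1=2.0)
    print(f"g_s1={g_s1}: Delta_1={d1:.2e} per base ~ {deviation_in_days(d1, eps, tau):.1f} days")
# The wild-type selective disadvantage s does not enter, matching Eq. (2).
```

Because the conversion cancels $\varepsilon$, the shift in days is simply $\tau\{g_{s1} + 1/(R_{1}-1) - 1/(R_{0}-1)\}$, which makes the dominant role of $g_{s1}$ explicit.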

Our findings showed that a fraction of selection sites exhibited an extreme level of diversity (Fig. 3), indicating the presence of multiple escape lineages. Accordingly, we generalized the model to address more rigorously how the clock deviation changes as more mutant lineages accrue. The generalized N-mutant model showed that all virus pairs except intra-mutant pairs (those within a single mutant lineage), including wild–wild, wild–mutant, and inter-mutant pairs, most likely coalesce at the founder virus [see Additional file 1: Eqs. (S66), (S69), and (S72)]. The deviation from the molecular clock decreases as more distinct mutant lineages appear and is approximately the single mutant model deviation divided by the number of mutant lineages, $N$,

$$\Delta _{N} \simeq \frac{{\Delta _{1} }}{N}.$$

(3)

This calculation implies that the deviation vanishes, so that the molecular clock is fully conserved, in the large-$N$ limit. For instance, Fig. 4d shows that when two mutant viruses appear, the clock deviation is half that of the single mutant model. The conservation of the molecular clock in a selection-induced heterogeneous phylogeny follows from our demonstration that distinct mutant lineages most probably coalesce at the origin of the infection. As more distinct mutant lineages appear, the proportion of virus pairs coalescing at the initial transmission point increases within the total viral population, decreasing the deviation from the molecular clock. Therefore, even when the transmitted/founder lineage has been entirely eliminated from the viral population, this clock property allows the time since infection to be assessed from sequences of escape mutant populations, permitting molecular dating of HIV-1 infections.
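Eq. (3) can be rationalized by counting which pairs still carry the single-lineage deficit. In the sketch below, only intra-lineage pairs are assumed to contribute $\Delta_{1}$, so the deviation is $\Delta_{1}$ times the probability that two randomly drawn mutant viruses share a lineage, $\sum_i f_i^2$, where $f_i$ is the share of lineage $i$ in the mutant population. This heuristic, including its use for unequal lineage sizes, is an assumption for illustration and not the derivation in Additional file 1; for $N$ equal-sized lineages it reduces to Eq. (3).

```python
from typing import List


def same_lineage_probability(shares: List[float]) -> float:
    """Probability that two mutant viruses drawn at random (with replacement)
    come from the same lineage, given each lineage's share of the mutant pool."""
    total = sum(shares)
    return sum((x / total) ** 2 for x in shares)


def clock_deviation_multi(delta_1: float, shares: List[float]) -> float:
    """Heuristic deviation with several escape lineages: Delta_1 * sum(f_i**2)."""
    return delta_1 * same_lineage_probability(shares)


delta_1 = 6.8e-4  # hypothetical single-lineage deviation, per base
for n in (1, 2, 4, 10):
    print(n, clock_deviation_multi(delta_1, [1.0] * n))  # ~ delta_1 / n
print(clock_deviation_multi(delta_1, [0.7, 0.2, 0.1]))  # unequal shares: between delta_1/3 and delta_1
```

Unequal lineages reduce the deviation less than equal ones, because a dominant lineage supplies most of the intra-lineage pairs.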

In the presence of multiple escape lineages, our model predicted that the most probable coalescent point of any two distinct mutant lineages would be the same as that of wild-type virus pairs: the transmission point. We tested this prediction by comparing the diversity among mutant-type pairs from distinct lineages with the diversity among wild-type pairs. To clearly designate distinct mutant lineages, we selected four subjects whose envelope sequences at a chosen time showed a signature of escape within only one epitope. Table 2 lists the CD8+ T cell epitope sequences of both founder and mutant lineages for each of these four cases. Here, the wild-type lineage is defined with reference to the consensus sequence of each individual’s earliest sample. Each mutant lineage was grouped based on amino acid sequence variation within the CD8+ T cell epitope. As shown in Table 2, more than one escape lineage derived from the same epitope existed in all four cases. As predicted by our model, the diversity among mutant-type pairs from distinct lineages was highly similar to that among wild-type pairs (Fig. 5). In contrast, the diversity within each mutant lineage was considerably smaller than that of the founder lineage (Fig. 5), in good agreement with the prediction that intra-mutant-lineage pairs coalesce at a later generation than wild-type pairs.
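The comparison above reduces to averaging pairwise distances within and between lineage groups. The sketch assumes aligned, equal-length nucleotide sequences already grouped into a wild-type lineage and escape lineages (for example, by epitope amino acid sequence, as in Table 2); the toy sequences are hypothetical and only show the bookkeeping.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence


def per_base_distance(a: str, b: str) -> float:
    """Hamming distance between two aligned sequences, divided by length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def within_diversity(group: Sequence[str]) -> float:
    """Mean pairwise per-base distance among sequences of one lineage."""
    return mean(per_base_distance(a, b) for a, b in combinations(group, 2))


def between_diversity(g1: Sequence[str], g2: Sequence[str]) -> float:
    """Mean per-base distance over all pairs with one sequence from each lineage."""
    return mean(per_base_distance(a, b) for a in g1 for b in g2)


# Hypothetical toy alignment (10 bases), for illustration only.
wild = ["ACGTACGTAC", "ACGTACGTAA", "ACGTTCGTAC"]
escape_1 = ["ACGAACGTAC", "ACGAACGTAC"]
escape_2 = ["ACGCACGTAG", "ACGCACGTAC"]

print("wild-type pairs:    ", within_diversity(wild))
print("inter-lineage pairs:", between_diversity(escape_1, escape_2))
print("intra-lineage pairs:", within_diversity(escape_1), within_diversity(escape_2))
```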

Table 2 Founder and escape lineages of a given epitope from each subject’s single time point sequence data

Discussion

The present study examined the rate of HIV-1 envelope gene diversification within 15 individuals who were followed serially from the acute stage of infection. HIV-1 sequence diversity increased linearly for the first 150 days of infection, with a population mean of $2.69 \times 10^{ - 5}$ per base per day. The speed of this clock-like evolution varied across subjects, ranging from $1.54 \times 10^{ - 5}$ to $3.91 \times 10^{ - 5}$ per base per day. This rate, estimated from comprehensive sequence data obtained by single genome amplification and Sanger sequencing, is comparable to previous estimates of the intra-host HIV-1 nucleotide substitution rate, which range from $1.72 \times 10^{ - 5}$ to $4.32 \times 10^{ - 5}$ per base per day [20–23]. Whereas previous estimates were based on sequence data collected either during chronic infection or after seroconversion, our estimates use sequence data whose first samples were collected before the onset of immune selection, with subsequent samples taken at intervals of days to weeks. This allowed us to monitor the earliest HIV-1 evolution at the onset of immune selection. Understanding the molecular clock in these early phases is necessary to characterize transmission using a single time point sample. The rate of diversification for 12 of the 15 subjects was statistically comparable to the neutral evolution rate, $2.16 \times 10^{ - 5}$ per base per day, which was previously estimated from the HIV-1 single cycle mutation rate and viral generation time [30, 35]. Over 2 years of infection, HIV-1 diversification began to level off quadratically in 4 of the 15 subjects.

We then investigated the spatial distribution of mutations away from the transmitted/founder sequence across the envelope gene. Mutations were concentrated at putative selection sites, even though HIV-1 sequence populations diversified over time in a clock-like manner at rates consistent with the neutral evolution rate. We classified putative selection sites as positions that showed more base substitutions from the founder sequence than statistically expected. Around 60 % of the statistically identified selection sites were found within known CD8+ T cell epitopes in the Los Alamos HIV-1 Molecular Immunology Database (http://www.hiv.lanl.gov). We then compared the diversity dynamics within the selection sites with those within non-selection regions. On average across the 15 subjects examined here, these viral escape sites diversified more than 65 times faster than non-selection sites, indicating that the majority of mutations accumulate at immune selection sites spanning a small fraction of the entire envelope gene sequence.

Our observations emphasize the ability of multiple escape variants to arise from diverse amino acid changes at given selection sites, as illustrated in Table 2 and supported by previous studies [24, 30]. HIV can escape from immune pressure through several different mutational pathways. Viral escape from cytotoxic T cell responses can be mediated by non-synonymous mutations that directly abrogate peptide-MHC binding [37, 38]. Escape can also occur via impaired recognition of viral peptide-MHC complexes by cytotoxic T cells [39, 40], or via mutations that compromise intracellular epitope processing, for instance, by preventing NH₂-terminal trimming of the epitope [41]. Antibody escape patterns are also heterogeneous; site-directed mutagenesis has identified multiple resistant variants within the CD4 binding site of the viral envelope [42]. Considerable sequence variation within the D, V1 and V5 loops and the CD4-binding site of the HIV-1 envelope has been reported in a subject who developed broadly neutralizing antibody responses [27]. These diverse mechanisms for avoiding immune surveillance sustain multiple mutant lineages during HIV-1 escape.

To address how the molecular clock persists in a dynamic environment that favors various escape mutants, we proposed a mathematical model describing HIV-1 replication and evolution after transmission. This modeling approach allowed us to evaluate the deviation from a constant molecular clock under different immune selection scenarios. When a mutant lineage arises under immune pressure, the dominant determinant of the deviation from the clock was the timing of the escape mutant’s appearance: the later it appeared, the greater the deviation. Importantly, the clock deviation was inversely proportional to the number of distinct mutant lineages; the more distinct mutant lineages appeared, the more closely viral evolution resembled the molecular clock. This reduction in the clock deviation arises because distinct mutant lineages most likely coalesce at the founder virus, so a greater number of different mutant lineages increases the proportion of virus pairs coalescing at the initial transmission event within the total viral population. Therefore, the capacity of HIV-1 to escape in multiple directions maintains the clock-like evolution of the overall HIV-1 intra-host population.

The presence of multiple mutant lineages can be linked to soft selective sweeps, which occur when beneficial mutations at a site are supplied to the population at a rate of about one or more per generation [43]. A chronically infected individual is expected to have around $10^{8}$ productively infected CD4+ T cells [44]. At the peak of viremia during acute infection, there will be an even greater number of productively infected cells, each produced by one or more reverse transcription events. Thus, with a mutation rate of $\varepsilon = 2.16 \times 10^{ - 5}$ per base per cycle [33, 35], on the order of $10^{8} \times 2.16 \times 10^{ - 5} \approx 2 \times 10^{3}$ new base substitutions arise at each site in every replication cycle, so each selection site is likely to have developed mutations that favor viral escape before the onset of immune responses. The high mutation rate, together with the large HIV-1 population size, makes the appearance of multiple mutant lineages very probable, which ensures molecular clock persistence under selection. In speciation events, where the mutation rate is much smaller, around $10^{ - 8}$ per base per generation [45], standing genetic polymorphisms are presumably the major source of multiple mutant lineages instead.
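The order-of-magnitude argument can be made explicit. The sketch multiplies the number of infected cells by the per-site substitution rate and, under the added assumptions that new mutations at a site arrive as a Poisson process and that each of the three alternative bases is equally likely, computes the chance that one specific escape substitution is absent from a replication cycle. The function names are hypothetical, and $N = 10^{8}$ is the chronic-infection figure quoted above, which likely underestimates the acute peak.

```python
import math

EPS = 2.16e-5  # per base per replication cycle [33, 35]


def substitutions_per_site(n_cells: float, eps: float = EPS) -> float:
    """Expected new base substitutions at one site per replication cycle."""
    return n_cells * eps


def prob_specific_mutation_absent(n_cells: float, eps: float = EPS) -> float:
    """Poisson probability that one particular alternative base never arises in a
    cycle, assuming the three possible substitutions are equally likely."""
    return math.exp(-substitutions_per_site(n_cells, eps) / 3.0)


n = 1e8
print(f"{substitutions_per_site(n):.0f} substitutions per site per cycle")  # → 2160
print(f"P(specific escape mutation absent in one cycle) = {prob_specific_mutation_absent(n):.1e}")
```

Even a population three orders of magnitude smaller ($10^{5}$ cells) would supply about two substitutions per site per cycle, which is in the regime where soft sweeps become likely.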

Several factors prevent complete adherence to clock-like HIV-1 evolution. Recombination can alter coalescent patterns and thereby perturb clock-like diversification [46]. APOBEC3G/F-mediated hypermutation can lead to rejection of a single-rate molecular clock [47]. Viral latency and compartmentalization may produce lineages that have undergone a different number of replication cycles since the founder virus than other lineages [48]. Linked homogeneous selection sites can also cause departures from the molecular clock; we observed that temporal deviations from the molecular clock were associated with a greater number of homogeneous selection sites. However, we did not observe hard sweep signatures even in subjects with homogeneous sites, because the fraction of heterogeneous selection sites remained substantial, ranging from 38.5 to 75.5 %. Immune selection patterns are characterized predominantly by soft sweeps, in contrast to HIV-1 drug resistance evolution, which involves both hard and soft sweeps [43].

We observed subject-to-subject variation in the rate of early HIV-1 diversification. As previously shown, one of the main parameters affecting the neutral evolution rate is the viral generation time [35]. Estimates of the viral generation time, which are derived from the slope of plasma HIV-1 RNA decline during antiretroviral therapy, differ considerably [34, 49–52]. However, the accuracy of these estimates is limited by a lack of knowledge of in vivo drug efficacy in the patients under study. In addition to viral generation time, other individual-level factors may contribute to variability in diversification rates. For instance, although a recent study observed a greater evolution rate in risk groups with a higher proportion of men [36], we did not observe any difference in evolution rate between males and females.
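To gauge how much generation-time variation alone could account for the observed spread, one can invert the neutral rate $2\varepsilon/\tau$. The sketch holds $\varepsilon$ fixed at $2.16 \times 10^{-5}$ per base per cycle and assumes each subject evolves neutrally, an illustrative simplification rather than an analysis from this study; the function name is hypothetical.

```python
EPS = 2.16e-5  # single-cycle base substitution rate, per base per cycle


def implied_generation_time(rate_per_day: float, eps: float = EPS) -> float:
    """Generation time (days) that would make 2*eps/tau equal the observed rate."""
    return 2.0 * eps / rate_per_day


for label, rate in [("slowest", 1.54e-5), ("mean", 2.69e-5), ("fastest", 3.91e-5)]:
    print(f"{label:8s} rate {rate:.2e} -> tau ~ {implied_generation_time(rate):.2f} days")
```

Under these assumptions the observed range corresponds to generation times of roughly 1.1 to 2.8 days, bracketing the 2-day value used for the neutral rate.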

The present study opens better opportunities for molecular dating of early HIV-1 infections from a single time point sample. Our observation of clock-like evolution under immune selection validates our approach of dating an early HIV-1 infection using a patient’s Hamming distance distribution [30, 35, 53]. Furthermore, the observed variability in clock rates calls for extending the current method to model population variability, which could allow greater precision in estimating the time since infection from a single time point sample. In addition, the model should be applied with caution to individuals with long-term infection, given the quadratic attenuation observed in diversity dynamics over a time frame of 2 years. An accurate tool for estimating the timing of infection is needed to meet the growing need to define immune correlates of protection in ongoing vaccine and prevention trials [7].
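The basic idea behind single-time-point dating can be sketched by inverting the linear clock. This is not the Hamming distance distribution fitting of [30, 35, 53]; it is a simplified point estimate that assumes a homogeneous founder, neutral clock-like diversification at $2\varepsilon/\tau$ per base per day, and diversity measured as mean pairwise per-base distance. Parameter defaults come from the text; the function names and the example distance are hypothetical.

```python
EPS = 2.16e-5  # per base per cycle
TAU = 2.0      # days per viral generation


def expected_diversity(days: float, eps: float = EPS, tau: float = TAU) -> float:
    """Mean pairwise per-base distance expected under a neutral linear clock."""
    return 2.0 * eps * days / tau


def days_since_infection(mean_pairwise_distance: float,
                         eps: float = EPS, tau: float = TAU) -> float:
    """Point estimate obtained by inverting the linear clock (no rate variability)."""
    return mean_pairwise_distance * tau / (2.0 * eps)


d = 1.5e-3  # hypothetical observed mean pairwise per-base distance
print(f"{days_since_infection(d):.0f} days")  # → 69 days under these assumptions
# Rate variability matters: the subject-level rates reported above span ~2.5-fold.
for rate in (1.54e-5, 3.91e-5):
    print(f"if rate were {rate:.2e}/day: {d / rate:.0f} days")
```

The spread between the two rate-specific estimates illustrates why modeling population variability in clock rates, as suggested above, would sharpen single-time-point dating.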

Conclusions

In this study of HIV-1 intrahost evolution, we demonstrated that a molecular clock can hold even when a gene phylogeny becomes increasingly complex as the population evolves under selection. By tracing the evolution of HIV-1 at the onset of immune selection, we found that (1) the evolution rates of 12 of the 15 subjects conformed to the neutral evolution rate during the first 150 days post infection and (2) in contrast to this regular temporal evolution pattern, mutations were highly clustered at selection sites that diversified more than 65 times faster than non-selection sites. Our mathematical model links clock conservation to the multiple modes of HIV-1 escape: the complexity of escape was shown to ensure constant clock-like diversification over the first 150 days of infection. A molecular clock that functions under heavy selection may allow an HIV-1 gene population to be dated back to its transmission point, thereby providing crucial information for HIV-1 prevention efforts and a basis for genome-based HIV-1 incidence measures [8, 9].

Methods

Sources of published sequence data

A total of 1587 HIV-1 full envelope gene sequences from 15 subjects were obtained from previously published data [27, 30–32]. All subjects’ first samples were taken during the acute stage of infection: the first sample of subject CH505 was estimated to have been obtained 4 weeks after infection, and the first samples of the other 14 subjects were obtained during Fiebig stage I/II. The sequences from subjects CAP045, CH042, CH131, CH159, CH162, CH164, CH185, CH198, CH256 and CH505 were subtype C, and those from subjects CH040, CH058, CH077, SUMA0874 and WEAU0578 were subtype B [27, 30–32]. None of the subjects received antiretroviral therapy (ART) during the period in which serial samples were taken [27, 30–32]. Serial HIV-1 envelope gene sequences from the other 11 subjects in these references were excluded from our analysis for the following reasons: subject CAP239 had samples from only two time points, subject CH607 received ART, subjects CAP210 and CH470 showed signatures of more than a single founder variant, and subjects 1051, 1056, 1058, 1059, 6247, CH607, and TT31P were followed for less than 1 month.
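The inclusion criteria above can be expressed as a simple filter. This Python sketch is illustrative only: the record fields, function names, and example records are hypothetical, the minimum of three time points is inferred from the exclusion of a subject with only two, and "1 month" is approximated as 30 days.

```python
from dataclasses import dataclass
from typing import List

MIN_TIMEPOINTS = 3       # inferred: a subject with only two time points was excluded
MIN_FOLLOWUP_DAYS = 30   # assumption: "1 month" approximated as 30 days


@dataclass
class SubjectRecord:
    """Hypothetical per-subject summary; field names are illustrative."""
    subject_id: str
    n_timepoints: int     # number of serial sampling time points
    received_art: bool    # ART during the serial sampling period
    n_founders: int       # number of transmitted/founder variants detected
    followup_days: float  # days from first to last sample


def exclusion_reasons(rec: SubjectRecord) -> List[str]:
    """Return every criterion the record fails (an empty list means include)."""
    reasons = []
    if rec.n_timepoints < MIN_TIMEPOINTS:
        reasons.append("fewer than three time points")
    if rec.received_art:
        reasons.append("received ART")
    if rec.n_founders > 1:
        reasons.append("more than one founder variant")
    if rec.followup_days < MIN_FOLLOWUP_DAYS:
        reasons.append("follow-up shorter than 1 month")
    return reasons


def select_subjects(records: List[SubjectRecord]) -> List[str]:
    """IDs of subjects that pass all criteria."""
    return [r.subject_id for r in records if not exclusion_reasons(r)]


if __name__ == "__main__":
    # Hypothetical records, not the study's data.
    records = [
        SubjectRecord("hypothetical_1", 6, False, 1, 400.0),
        SubjectRecord("hypothetical_2", 2, False, 1, 400.0),
        SubjectRecord("hypothetical_3", 5, False, 2, 200.0),
    ]
    print(select_subjects(records))       # ['hypothetical_1']
    print(exclusion_reasons(records[2]))  # ['more than one founder variant']
```

Returning all failed criteria, rather than stopping at the first, makes it easy to report every reason a subject was excluded, as the text does for CH607.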

Mixed effect model for HIV-1 diversity dynamics

We used a linear mixed effects model to analyze the increase in diversity over time from the first sample across the 15 subjects. Random coefficients were specified to allow individual subjects to deviate from the population average regression coefficients for the linear and quadratic associations of diversity with time. The mixed effects model for HIV-1 intrahost diversity dynamics is written as

$$d_{i}(t) = (a + \eta_{i})\,t + (b + \mu_{i})\,t^{2}, \tag{4}$$

where $d_{i}(t)$ is the increase in diversity of subject $i$ at time $t$ (in days) after the first sample, and $a + \eta_{i}$ and $b + \mu_{i}$ are the linear and quadratic coefficients of subject $i$, respectively. Here, $a$ is the population linear diversification rate, $b$ is the population quadratic rate, and $\eta_{i}$ and $\mu_{i}$ are the subject-specific random deviations from these population values. Restricted maximum likelihood, as implemented in SAS Proc Mixed, was used to estimate and test the population average and random coefficients. The mixed model estimates were then used to evaluate each subject’s linear and quadratic diversity rates.
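Because $d_{i}(t)$ is measured relative to the first sample, equation (4) has no intercept: $d_{i}(0) = 0$. The Python sketch below illustrates the model structure with a deliberately simpler, swapped-in technique: a two-stage fit that estimates each subject’s linear and quadratic coefficients by least squares through the origin and then averages them to approximate $a$ and $b$. It is not the REML fit performed in SAS Proc Mixed, and it does not shrink subject estimates toward the population mean as a mixed model does; a REML fit in Python could instead use statsmodels’ `MixedLM`. All function names and the example data are hypothetical. Differentiating equation (4) gives the instantaneous diversification rate $d_{i}'(t) = (a + \eta_{i}) + 2(b + \mu_{i})t$, so a negative quadratic coefficient corresponds to attenuating diversification.

```python
from statistics import mean
from typing import Dict, List, Tuple

Series = Tuple[List[float], List[float]]  # (days since first sample, diversity increase)


def fit_quadratic_through_origin(times: List[float], divs: List[float]) -> Tuple[float, float]:
    """Least-squares fit of d(t) = alpha * t + beta * t**2 with no intercept.

    Solves the 2x2 normal equations
        [S2 S3] [alpha]   [Y1]
        [S3 S4] [beta ] = [Y2]
    with Sk = sum(t**k), Y1 = sum(t*d), Y2 = sum(t**2 * d), by Cramer's rule.
    """
    if len(times) != len(divs):
        raise ValueError("times and divs must have the same length")
    s2 = sum(t ** 2 for t in times)
    s3 = sum(t ** 3 for t in times)
    s4 = sum(t ** 4 for t in times)
    y1 = sum(t * d for t, d in zip(times, divs))
    y2 = sum(t ** 2 * d for t, d in zip(times, divs))
    det = s2 * s4 - s3 * s3
    if det <= 0:
        raise ValueError("need at least two distinct nonzero time points")
    alpha = (y1 * s4 - s3 * y2) / det
    beta = (s2 * y2 - s3 * y1) / det
    return alpha, beta


def two_stage_estimates(data: Dict[str, Series]):
    """Per-subject fits, then averages as rough stand-ins for a and b."""
    per_subject = {sid: fit_quadratic_through_origin(t, d) for sid, (t, d) in data.items()}
    a_hat = mean(alpha for alpha, _ in per_subject.values())
    b_hat = mean(beta for _, beta in per_subject.values())
    # Deviations play the role of eta_i and mu_i in equation (4).
    deviations = {sid: (alpha - a_hat, beta - b_hat) for sid, (alpha, beta) in per_subject.items()}
    return a_hat, b_hat, deviations


def instantaneous_rate(alpha: float, beta: float, t: float) -> float:
    """Derivative of alpha*t + beta*t**2; beta < 0 means attenuating growth."""
    return alpha + 2 * beta * t


if __name__ == "__main__":
    # Hypothetical, noise-free example data (not from the study).
    days = [0, 14, 30, 60, 90, 150]
    data = {
        "subject_A": (days, [2.0e-5 * t - 1.0e-8 * t ** 2 for t in days]),
        "subject_B": (days, [3.0e-5 * t - 2.0e-8 * t ** 2 for t in days]),
    }
    a_hat, b_hat, dev = two_stage_estimates(data)
    print(a_hat, b_hat)  # close to 2.5e-05 and -1.5e-08
    print(instantaneous_rate(a_hat, b_hat, 100))
```

With noisy data and unequal sampling across subjects, the two-stage averages can differ noticeably from REML estimates, which weight subjects by the information they carry.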

References

1. Kumar S. Molecular clocks: four decades of evolution. Nat Rev Genet. 2005;6(8):654–62.
2. Hedges SB, Kumar S. Discovering the timetree of life. In: Kumar S, Hedges SB, editors. The Timetree of life. New York: Oxford University Press; 2009. p. 3–18.
3. Margoliash E. Primary structure and evolution of cytochrome C. Proc Natl Acad Sci USA. 1963;50:672–9.
4. Doolittle RF, Blomback B. Amino-acid sequence investigations of fibrinopeptides from various mammals—evolutionary implications. Nature. 1964;202(492):147–52.
5. Kumar S, Hedges SB. A molecular timescale for vertebrate evolution. Nature. 1998;392(6679):917–20.
6. Kumar S, Subramanian S. Mutation rates in mammalian genomes. Proc Natl Acad Sci USA. 2002;99(2):803–8.
7. http://ampstudy.org.
8. Park SY, Love TM, Nelson J, Thurston SW, Perelson AS, Lee HY. Designing a genome-based HIV incidence assay with high sensitivity and specificity. AIDS. 2011;25(16):F13–9.
9. Park SY, Goeken N, Lee HJ, Bolan R, Dube MP, Lee HY. Developing high-throughput HIV incidence assay with pyrosequencing platform. J Virol. 2014;88(5):2977–90.
10. Posada D, Crandall KA. Selecting models of nucleotide substitution: an application to human immunodeficiency virus 1 (HIV-1). Mol Biol Evol. 2001;18(6):897–906.
11. Gojobori T, Moriyama EN, Kimura M. Molecular clock of viral evolution, and the neutral theory. Proc Natl Acad Sci USA. 1990;87(24):10015–8.
12. Leitner T, Albert J. The molecular clock of HIV-1 unveiled through analysis of a known transmission history. Proc Natl Acad Sci USA. 1999;96(19):10752–7.
13. Jenkins GM, Rambaut A, Pybus OG, Holmes EC. Rates of molecular evolution in RNA viruses: a quantitative phylogenetic analysis. J Mol Evol. 2002;54(2):156–65.
14. Salemi M. The intra-host evolutionary and population dynamics of human immunodeficiency virus type 1: a phylogenetic perspective. Infect Dis Rep. 2013;5(Suppl 1):e3.
15. Lemey P, Rambaut A, Pybus OG. HIV evolutionary dynamics within and among hosts. AIDS Rev. 2006;8(3):125–40.
16. Korber B, Muldoon M, Theiler J, Gao F, Gupta R, Lapedes A, Hahn BH, Wolinsky S, Bhattacharya T. Timing the ancestor of the HIV-1 pandemic strains. Science. 2000;288(5472):1789–96.
17. Worobey M, Gemmel M, Teuwen DE, Haselkorn T, Kunstman K, Bunce M, Muyembe JJ, Kabongo JM, Kalengayi RM, Van Marck E, et al. Direct evidence of extensive diversity of HIV-1 in Kinshasa by 1960. Nature. 2008;455(7213):661–4.
18. Salemi M, Strimmer K, Hall WW, Duffy M, Delaporte E, Mboup S, Peeters M, Vandamme AM. Dating the common ancestor of SIVcpz and HIV-1 group M and the origin of HIV-1 subtypes using a new method to uncover clock-like molecular evolution. FASEB J. 2001;15(2):276–8.
19. Faria NR, Rambaut A, Suchard MA, Baele G, Bedford T, Ward MJ, Tatem AJ, Sousa JD, Arinaminpathy N, Pepin J, et al. HIV epidemiology. The early spread and epidemic ignition of HIV-1 in human populations. Science. 2014;346(6205):56–61.
20. Edo-Matas D, Lemey P, Tom JA, Serna-Bolea C, van den Blink AE, van ’t Wout AB, Schuitemaker H, Suchard MA. Impact of CCR5delta32 host genetic background and disease progression on HIV-1 intrahost evolutionary processes: efficient hypothesis testing through hierarchical phylogenetic models. Mol Biol Evol. 2011;28(5):1605–16.
21. Maljkovic Berry I, Ribeiro R, Kothari M, Athreya G, Daniels M, Lee HY, Bruno W, Leitner T. Unequal evolutionary rates in the human immunodeficiency virus type 1 (HIV-1) pandemic: the evolutionary rate of HIV-1 slows down when the epidemic rate increases. J Virol. 2007;81(19):10625–35.
22. Novitsky V, Wang R, Rossenkhan R, Moyo S, Essex M. Intra-host evolutionary rates in HIV-1C env and gag during primary infection. Infect Genet Evol. 2013;19:361–8.
23. Lemey P, Kosakovsky Pond SL, Drummond AJ, Pybus OG, Shapiro B, Barroso H, Taveira N, Rambaut A. Synonymous substitution rates predict HIV disease progression as a result of underlying replication dynamics. PLoS Comput Biol. 2007;3(2):e29.
24. Goonetilleke N, Liu MK, Salazar-Gonzalez JF, Ferrari G, Giorgi E, Ganusov VV, Keele BF, Learn GH, Turnbull EL, Salazar MG, et al. The first T cell response to transmitted/founder virus contributes to the control of acute viremia in HIV-1 infection. J Exp Med. 2009;206(6):1253–72.
25. Ganusov VV, Goonetilleke N, Liu MK, Ferrari G, Shaw GM, McMichael AJ, Borrow P, Korber BT, Perelson AS. Fitness costs and diversity of the cytotoxic T lymphocyte (CTL) response determine the rate of CTL escape during acute and chronic phases of HIV infection. J Virol. 2011;85(20):10518–28.
26. Richman DD, Wrin T, Little SJ, Petropoulos CJ. Rapid evolution of the neutralizing antibody response to HIV type 1 infection. Proc Natl Acad Sci USA. 2003;100(7):4144–9.
27. Liao HX, Lynch R, Zhou T, Gao F, Alam SM, Boyd SD, Fire AZ, Roskin KM, Schramm CA, Zhang Z, et al. Co-evolution of a broadly neutralizing HIV-1 antibody and founder virus. Nature. 2013;496(7446):469–76.
28. McMichael AJ, Borrow P, Tomaras GD, Goonetilleke N, Haynes BF. The immune response during acute HIV-1 infection: clues for vaccine development. Nat Rev Immunol. 2010;10(1):11–23.
29. Margoliash E, Smith EL. Structure and functional aspects of cytochrome c in relation to evolution. In: Bryson V, Vogel HJ, editors. Evolving genes and proteins. New York: Academic Press; 1965. p. 221–42.
30. Keele BF, Giorgi EE, Salazar-Gonzalez JF, Decker JM, Pham KT, Salazar MG, Sun C, Grayson T, Wang S, Li H, et al. Identification and characterization of transmitted and early founder virus envelopes in primary HIV-1 infection. Proc Natl Acad Sci USA. 2008;105(21):7552–7.
31. Salazar-Gonzalez JF, Salazar MG, Keele BF, Learn GH, Giorgi EE, Li H, Decker JM, Wang S, Baalwa J, Kraus MH, et al. Genetic identity, biological phenotype, and evolutionary pathways of transmitted/founder viruses in acute and early HIV-1 infection. J Exp Med. 2009;206(6):1273–89.
32. Liu MK, Hawkins N, Ritchie AJ, Ganusov VV, Whale V, Brackenridge S, Li H, Pavlicek JW, Cai F, Rose-Abrahams M, et al. Vertical T cell immunodominance and epitope entropy determine HIV-1 escape. J Clin Invest. 2013;123(1):380–93.
33. Mansky LM, Temin HM. Lower in vivo mutation rate of human immunodeficiency virus type 1 than that predicted from the fidelity of purified reverse transcriptase. J Virol. 1995;69(8):5087–94.
34. Markowitz M, Louie M, Hurley A, Sun E, Di Mascio M, Perelson AS, Ho DD. A novel antiviral intervention results in more accurate assessment of human immunodeficiency virus type 1 replication dynamics and T-cell decay in vivo. J Virol. 2003;77(8):5037–8.
35. Lee HY, Giorgi EE, Keele BF, Gaschen B, Athreya GS, Salazar-Gonzalez JF, Pham KT, Goepfert PA, Kilby JM, Saag MS, et al. Modeling sequence evolution in acute HIV-1 infection. J Theor Biol. 2009;261(2):341–60.
36. Vrancken B, Baele G, Vandamme AM, van Laethem K, Suchard MA, Lemey P. Disentangling the impact of within-host evolution and transmission dynamics on the tempo of HIV-1 evolution. AIDS. 2015;29(12):1549–56.
37. Carlson JM, Le AQ, Shahid A, Brumme ZL. HIV-1 adaptation to HLA: a window into virus-host immune interactions. Trends Microbiol. 2015;23(4):212–24.
38. Bronke C, Almeida CA, McKinnon E, Roberts SG, Keane NM, Chopra A, Carlson JM, Heckerman D, Mallal S, John M. HIV escape mutations occur preferentially at HLA-binding sites of CD8 T-cell epitopes. AIDS. 2013;27(6):899–905.
39. Phillips RE, Rowland-Jones S, Nixon DF, Gotch FM, Edwards JP, Ogunlesi AO, Elvin JG, Rothbard JA, Bangham CR, Rizza CR, et al. Human immunodeficiency virus genetic variation that can escape cytotoxic T cell recognition. Nature. 1991;354(6353):453–9.
40. Iglesias MC, Almeida JR, Fastenackels S, van Bockel DJ, Hashimoto M, Venturi V, Gostick E, Urrutia A, Wooldridge L, Clement M, et al. Escape from highly effective public CD8+ T-cell clonotypes by HIV. Blood. 2011;118(8):2138–49.
41. Draenert R, Le Gall S, Pfafferott KJ, Leslie AJ, Chetty P, Brander C, Holmes EC, Chang SC, Feeney ME, Addo MM, et al. Immune selection for altered antigen processing leads to cytotoxic T lymphocyte escape in chronic HIV-1 infection. J Exp Med. 2004;199(7):905–15.
42. Dreja H, Pade C, Chen L, McKnight A. CD4 binding site broadly neutralizing antibody selection of HIV-1 escape mutants. J Gen Virol. 2015;96(7):1899–905.
43. Pennings PS, Kryazhimskiy S, Wakeley J. Loss and recovery of genetic diversity in adapting populations of HIV. PLoS Genet. 2014;10(1):e1004000.
44. Haase AT, Henry K, Zupancic M, Sedgewick G, Faust RA, Melroe H, Cavert W, Gebhard K, Staskus K, Zhang ZQ, et al. Quantitative image analysis of HIV-1 infection in lymphoid tissue. Science. 1996;274(5289):985–9.
45. Roach JC, Glusman G, Smit AF, Huff CD, Hubley R, Shannon PT, Rowen L, Pant KP, Goodman N, Bamshad M, et al. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science. 2010;328(5978):636–9.
46. Posada D. Unveiling the molecular clock in the presence of recombination. Mol Biol Evol. 2001;18(10):1976–8.
47. Simon V, Zennou V, Murray D, Huang Y, Ho DD, Bieniasz PD. Natural variation in Vif: differential impact on APOBEC3G/3F and a potential role in HIV-1 diversification. PLoS Pathog. 2005;1(1):0020–8.
48. Liu Y, Nickle DC, Shriner D, Jensen MA, Learn GH Jr, Mittler JE, Mullins JI. Molecular clock-like evolution of human immunodeficiency virus type 1. Virology. 2004;329(1):101–8.
49. Perelson AS, Essunger P, Cao Y, Vesanen M, Hurley A, Saksela K, Markowitz M, Ho DD. Decay characteristics of HIV-1-infected compartments during combination therapy. Nature. 1997;387(6629):188–91.
50. Kilby JM, Lee HY, Hazelwood JD, Bansal A, Bucy RP, Saag MS, Shaw GM, Acosta EP, Johnson VA, Perelson AS, et al. Treatment response in acute/early infection versus advanced AIDS: equivalent first and second phases of HIV RNA decline. AIDS. 2008;22(8):957–62.
51. Notermans DW, Goudsmit J, Danner SA, de Wolf F, Perelson AS, Mittler J. Rate of HIV-1 decline following antiretroviral therapy is related to viral load at baseline and drug regimen. AIDS. 1998;12(12):1483–90.
52. Louie M, Hogan C, Hurley A, Simon V, Chung C, Padte N, Lamy P, Flaherty J, Coakley D, Di Mascio M, et al. Determining the antiviral activity of tenofovir disoproxil fumarate in treatment-naive chronically HIV-1-infected individuals. AIDS. 2003;17(8):1151–6.
53. Love TM, Park SY, Giorgi EE, Mack WJ, Perelson AS, Lee HY. SPMM: estimating infection duration of multivariant HIV-1 infections. Bioinformatics. 2016;32(9):1308–15.


Authors’ contributions

SP, WJM, AP, and HL analyzed the sequence data and performed the statistical analyses. SP, TL and HL formulated the mathematical models. All authors read and approved the final manuscript.

Acknowledgements

We thank Richard Neher and Su-Chan Park for helpful discussions and Nolan Goeken and Jason Kaufman for reviewing this manuscript.

Competing interests

The authors declare that they have no competing interests.

Funding

This work was supported by NIH Grants R01-AI083115 and R01-AI095066 (HYL). Portions of this work were done under the auspices of the US Department of Energy under Contract DE-AC52-06NA25396 and supported by NIH Grants R01-AI028433, R01-OD011095 and UM1-AI100645 (ASP).

Author information

Authors and Affiliations

Department of Molecular Microbiology and Immunology, Keck School of Medicine, University of Southern California, 1450 Biggy Street, Los Angeles, CA, 90089, USA
Sung Yong Park & Ha Youn Lee
Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY, 14642, USA
Tanzy M. T. Love
Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos, NM, 87545, USA
Alan S. Perelson
Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, 90089, USA
Wendy J. Mack

Authors

Sung Yong Park
Tanzy M. T. Love
Alan S. Perelson
Wendy J. Mack
Ha Youn Lee

Corresponding author

Correspondence to Ha Youn Lee.

Additional information

Sung Yong Park and Tanzy M. T. Love contributed equally to this work.

Additional file

12977_2016_269_MOESM1_ESM.docx

Additional file 1. Mathematical models for HIV evolution under immune selection, Table S1 (The rate of HIV gene sequence diversification over 2 years from the first sample in 15 subjects), and Table S2 (Documented CD8+ T cell epitopes for statistically designated selection sites from 15 subjects).

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.


About this article

Cite this article

Park, S.Y., Love, T.M.T., Perelson, A.S. et al. Molecular clock of HIV-1 envelope genes under early immune selection. Retrovirology 13, 38 (2016). https://doi.org/10.1186/s12977-016-0269-6


Received: 22 February 2016
Accepted: 11 May 2016
Published: 01 June 2016
DOI: https://doi.org/10.1186/s12977-016-0269-6
