### Limiting Dilution and Nested PCR

cDNA template was diluted to ~1 viral genome per microliter. The dilution factor necessary to achieve single viral genomes was defined as the template dilution at which no more than 30% of reactions produced a product. According to the Poisson distribution, a cDNA dilution that yields PCR products in no more than 30% of wells contains one amplifiable cDNA template per positive PCR more than 80% of the time. This dilution was determined empirically using a dilution series and varied between samples and cDNA preparations. The dilution series and PCR reactions were set up using a QIAGEN BR3000 liquid handling robot (QIAGEN, Valencia, CA). All PCR reactions used Phusion High-Fidelity polymerase (Finnzymes, Espoo, Finland). A nested PCR approach was used for all amplifications. The following primers, designed to amplify a region of the viral *nef* gene, were used for the first round of PCR: 5'-CAAAGAAGGAGACGGTGGAG-3' and 5'-CATCAAGAAAGTGGGCGTTC-3'. Second-round PCR was conducted using 2 µl of the first-round PCR product and the following internal primers: 5'-TCAGCAACTGCAGAACCTTG-3' and 5'-CGTAACATCCCCTTGTGGAA-3'. For all PCR reactions, the following conditions were used: 98°C for 30 s; 30 cycles of 98°C for 5 s, 63°C for 1 s, and 72°C for 10 s; then 72°C for 5 min. PCR products were run on a 1.5% agarose gel and purified using the ChargeSwitch kit (Invitrogen, Carlsbad, California, USA) according to the manufacturer's instructions. Samples were bi-directionally sequenced using ET-terminator chemistry on an Applied Biosystems 3730 Sequencer (Applied Biosystems, Foster City, California, USA) with the internal primers described above. DNA sequence alignments were performed using CodonCode Aligner version 2.0 (CodonCode Corporation, Dedham, Massachusetts, USA).
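The Poisson argument behind the 30% threshold can be checked with a short calculation. The sketch below assumes that templates are seeded into wells independently, so that the number of templates per well is Poisson-distributed and every template amplifies; the function name `single_template_fraction` is ours and purely illustrative.

```python
import math


def single_template_fraction(positive_fraction):
    """Return (mean templates per well, P(exactly one template | well is positive)).

    Assumes Poisson seeding: P(0 templates) = exp(-lam) = 1 - positive_fraction.
    """
    lam = -math.log(1.0 - positive_fraction)  # mean templates per well
    p_one = lam * math.exp(-lam)              # P(exactly one template)
    return lam, p_one / positive_fraction


lam, frac = single_template_fraction(0.30)
print(f"mean templates per well: {lam:.3f}")                    # 0.357
print(f"single-template fraction among positives: {frac:.3f}")  # 0.832
```

Under these assumptions, about 83% of positive wells at the 30% threshold start from a single template, in line with the "more than 80%" figure quoted above. Lower positive fractions push this proportion higher at the cost of more empty wells.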

### Modeling Sequence Evolution in Primary HIV-1/SIV Infection

The details of our model for characterizing sequence evolution in acute HIV-1 infection will be described by Lee et al. (HY Lee, EE Giorgi, BF Keele, B Gaschen, GS Athreya, JF Salazar-Gonzalez, KT Pham, PA Goepfert, JM Kilby, MS Saag, EL Delwart, MP Busch, BH Hahn, GM Shaw, BT Korber, T Bhattacharya, and AS Perelson, Modeling Sequence Evolution in Acute HIV-1 Infection, submitted for publication). We provide here an overview of the salient features of the model and its underlying assumptions. We assume that, after transmission, a systemic infection starts with a single infected cell in a new host. The number of secondary infections caused by one infected cell placed in a population of cells fully susceptible to infection is called the basic reproductive number, *R*_{0}. The available data in humans infected with HIV-1 and in monkeys infected with SIV and SHIV show that virus grows exponentially until a viral load peak is attained a few weeks after infection [41–43]. Following the peak, viral levels decline and establish a set-point. At the set-point, each infected cell, on average, successfully infects one other cell during its lifetime.

We assumed a homogeneous infection in which the virus grows exponentially with no selection pressure, no recombination, and a constant mutation rate across positions and across lineages. Cells are infected at random by the viruses released from an infected cell. Viral production starts on average about 24 hours after a cell is initially infected [44, 45], and most likely continues until cell death. While each of the *R*_{0} infections could occur at different times, we took a first step in assessing the role of asynchrony by assuming the infections occur at two different times. The average time to new infection defines the viral generation time, *τ*. Each new infection entails a single round of reverse transcription, which introduces errors in the proviral DNA with the number of mutations given by the binomial distribution, *Binom* (*n*; *N*_{B}, *ε*), where *n* is the number of new base substitutions. The binomial distribution implies that, in each reverse transcription cycle, base substitutions occur independently with probability *ε* at each site of an SIV genome of length *N*_{B}. The Monte Carlo (MC) model explicitly emulates every new infection event with mutations, tracking the population of proviral *nef* genes in the infected cells by introducing base substitutions as the infection propagates in a new host.
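For reference, the binomial law named above can be written out explicitly. This is the standard probability mass function, stated in the symbols of the text; it is not a formula taken from the model paper.

```latex
\[
  \mathrm{Binom}(n;\, N_B, \varepsilon)
    = \binom{N_B}{n}\, \varepsilon^{n}\, (1-\varepsilon)^{N_B - n},
  \qquad
  \mathbb{E}[n] = N_B\, \varepsilon .
\]
```

With the parameter values given in the following paragraph (*N*_{B} = 792 and *ε* = 2.16 × 10^{-5}), the expected number of new substitutions per infection is about 0.017, so most infection events copy the *nef* gene without error.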

In Ref. [2], we determined that the MC simulation and the mathematical model showed good agreement with the level of sequence diversity sampled from acute HIV-1 subjects presumably infected with a single variant. Based on the prediction made by the model, the group of identical sequences, usually the consensus sequence of sampled strains, was presumed to be the initial founder strain established by the systemic infection in each host. The parameters used in the acute HIV-1 model were: i) the average generation time of productively infected cells, defined as the average time interval between the infection of a target cell and the subsequent infection of new cells by progeny virions, estimated as 2 days [44]; ii) the HIV-1 single-cycle forward mutation rate, estimated as *ε* = 2.16 × 10^{-5} per site per cycle [46]; and iii) the basic reproductive ratio, defined as the number of newly infected cells that arise from any one infected cell when almost all cells are uninfected, estimated as *R*_{0} = 6 [41]. In the asynchronous infection model, the first time at which a newly infected cell infects other cells, *τ*, is chosen as 1.5 days. The length of the simulated *nef* gene, *N*_{B}, is 792 bases. We used these parameter values to analyze our data set. For example, *R*_{0} values calculated from the viral ramp-up slope during primary SIV infection ranged from 2.2 to 68 [43], which is consistent with the choice of *R*_{0} = 6. Improving the model will require more accurate estimates of these basic parameters during early SIV infection.
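To make the simulation procedure concrete, here is a deliberately simplified sketch of one MC run using the parameter values quoted above. It is not the authors' implementation: it assumes synchronous generations (no asynchrony), stores each genome only as the set of sites that differ from the founder (ignoring repeated hits and back-mutations), and caps the population by uniform subsampling. The names `binomial_inverse_cdf`, `simulate`, and `max_cells` are hypothetical.

```python
import random

N_B = 792          # simulated nef length, from the text
EPSILON = 2.16e-5  # substitutions per site per cycle, from the text
R0 = 6             # basic reproductive ratio, from the text


def binomial_inverse_cdf(u, n_sites, eps):
    """Return the smallest n whose Binom(n_sites, eps) CDF is >= u (0 <= u < 1)."""
    pmf = (1.0 - eps) ** n_sites  # P(n = 0)
    cdf = pmf
    n = 0
    while u > cdf and n < n_sites:
        # Recurrence: P(n + 1) = P(n) * (n_sites - n) / (n + 1) * eps / (1 - eps)
        pmf *= (n_sites - n) / (n + 1) * eps / (1.0 - eps)
        n += 1
        cdf += pmf
    return n


def simulate(generations, rng, r0=R0, max_cells=2000):
    """Propagate infection for a number of synchronous generations.

    Each genome is the frozenset of sites that differ from the founder.
    """
    cells = [frozenset()]  # single founder cell, no substitutions
    for _ in range(generations):
        offspring = []
        for genome in cells:
            for _ in range(r0):
                # One reverse transcription cycle per new infection.
                k = binomial_inverse_cdf(rng.random(), N_B, EPSILON)
                if k:
                    genome_new = genome | frozenset(rng.sample(range(N_B), k))
                else:
                    genome_new = genome
                offspring.append(genome_new)
        # Assumed cap (not from the text): subsample to keep memory bounded.
        if len(offspring) > max_cells:
            offspring = rng.sample(offspring, max_cells)
        cells = offspring
    return cells


rng = random.Random(0)
cells = simulate(generations=5, rng=rng)
mean_hd0 = sum(len(g) for g in cells) / len(cells)
print(f"{len(cells)} cells, mean HD0 = {mean_hd0:.3f}, "
      f"divergence = {mean_hd0 / N_B:.2e} per base")
```

Because the model is neutral, uniform subsampling should not favor any lineage, which is why a population cap is a tolerable shortcut in a sketch like this. A faithful reproduction would also need the two-time asynchronous infection scheme described in the text.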

The mutation rate, *ε*, and the generation time, *τ*, control the rate of increase in divergence and hence diversity. The larger the mutation rate, the faster the genomes mutate, and hence the steeper the growth in diversity. The greater the generation time, the slower the genomes diversify, and hence the smaller the growth in diversity. The slope of diversification is approximately proportional to *ε*/*τ*. On the other hand, *R*_{0} mainly controls the growth in the infected cell population size. As the viral population grows, the number of cells one infected cell infects decreases because fewer cells are available for infection. The basic reproductive ratio, *R*_{0}, affects the rate of evolution in a relatively minor way. Low values (e.g. 2 ≤ *R*_{0} ≤ 4) slow down the growth in the infected cell population, thus affecting the speed of evolution. For example, from *R*_{0} = 6 to *R*_{0} = 2 there is a 15.9% increase in the slope of diversity. On the other hand, for *R*_{0} ≥ 6, the dependence of the rate of diversification on *R*_{0} is reduced. The slope of diversity increases by 5.5% as we increase *R*_{0} from 6 to 10. The dynamics of diversity do not depend on the number of initial infected cells.

Once we sample a finite number of sequences from the MC simulation at a given time, we first measure the Hamming distance (*HD*_{0}) between each sampled sequence and the founder sequence and the Hamming distance (*HD*) between sequences sampled at the same time. Here, the Hamming distance is the number of base substitutions between two sequences. Based on the calculated *HD*_{0} and *HD*, we define the basic measurements for quantifying the evolution of HIV-1 sequence populations. Divergence is defined as the average *HD*_{0} per base from the initial founder strain; diversity is defined as the average intersequence Hamming distance per base among sequence pairs at a given time; variance is defined as the variance of the intersequence per-base *HD* distribution; maximum *HD* is defined as the maximum *HD* measured between all sequence pairs sampled; and sequence identity is defined as the proportion of sequences identical to the founder strain. Both the MC simulation and the mathematical calculation showed that divergence, diversity, and variance increase linearly as a function of time and that sequence identity decays exponentially as a function of time [Fig. 2]. These behaviors are characteristic of neutral evolution, which is characterized by a Poisson distribution of Hamming distances and a star-phylogeny topology. It has been shown that the distribution of pairwise genetic distances is approximately Poisson in the evolution of mitochondrial DNA [28]. To address the issue of finite sample size, we repeated MC simulations, sampling a finite number of *nef* genes at a given time, and computed 95% CIs for each quantity. We then examined whether the measurements from the SIV *nef* gene samples were compatible with the model prediction. To infer the number of days elapsed since infection based on sampled strains, we first fit the Poisson distribution to the observed distribution of Hamming distances between sampled *nef* genes and the transmitted *nef* gene; we then determined the mean of the Poisson distribution and calculated days post infection using Eq. (2).
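The five summary measurements defined above translate directly into code. The sketch below assumes aligned sequences of equal length given as strings; the function names and the toy sequences are illustrative only, and the use of the population variance (`pvariance`) rather than the sample variance is our assumption, since the text does not specify which estimator was used.

```python
from itertools import combinations
from statistics import mean, pvariance


def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def summarize(founder, sample):
    """Compute divergence, diversity, variance, maximum HD, and sequence identity."""
    n_sites = len(founder)
    hd0 = [hamming(founder, s) for s in sample]                # distances to founder
    hd = [hamming(a, b) for a, b in combinations(sample, 2)]   # pairwise distances
    hd_per_base = [d / n_sites for d in hd]
    return {
        "divergence": mean(hd0) / n_sites,
        "diversity": mean(hd_per_base),
        "variance": pvariance(hd_per_base),  # assumed estimator
        "max_hd": max(hd),
        "identity": sum(d == 0 for d in hd0) / len(sample),
    }


# Toy data, not from the study: a 10-base founder and four sampled sequences.
founder = "ACGTACGTAC"
sample = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "TCGTACGTAC"]
stats = summarize(founder, sample)
print(stats)
```

For this toy sample, two of the four sequences match the founder, so sequence identity is 0.5, and the largest pairwise distance is 2, between the two singly mutated sequences.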

A key property of the Poisson distribution arising from neutral evolution without selection and recombination is that the level of diversity is comparable to that of variance. We used this property to examine whether sampled strains had evolved from a single founder strain. In each MC run, we obtained the values of diversity and variance from the sampled sequences with a given sample size at each time and located those values in the plane of diversity and variance. By repeating MC simulations, we collected all the values of diversity and variance and computed 95% CIs in the plane of diversity and variance. The computed 95% CIs form a conical region within which the diversity and variance of sequences sampled from an animal with homogeneous infection (i.e. infection with a single founder strain without any selection pressure or recombination) are expected to be located [Figure 5]. As the sample size increases, the conical region becomes smaller [Figure 5]. Another requirement for homogeneous infection is that the sequence diversity should be less than the upper limit of the 95% CIs of the diversity at a given time following infection with a single virus strain.