Genetic characterization of the complete genome of a highly divergent simian T-lymphotropic virus (STLV) type 3 from a wild Cercopithecus mona monkey

Background: The recent discoveries of novel human T-lymphotropic virus type 3 (HTLV-3) and highly divergent simian T-lymphotropic virus type 3 (STLV-3) subtype D viruses from two different monkey species in southern Cameroon suggest that the diversity and cross-species transmission of these retroviruses are much greater than currently appreciated.

Results: We describe here the first full-length sequence of a highly divergent STLV-3d(Cmo8699AB) virus, obtained by PCR-based genome walking using DNA from two dried blood spots (DBS) collected from a wild-caught Cercopithecus mona monkey. The genome of STLV-3d(Cmo8699AB) is 8913 bp long and shares only 77% identity with other PTLV-3s. Phylogenetic analyses using Bayesian and maximum likelihood inference clearly show that this highly divergent virus forms an independent lineage within the diversity of PTLV-3, with high posterior probability and bootstrap support. Molecular dating of concatenated gag-pol-env-tax sequences inferred a divergence date of about 115,117 years ago for STLV-3d(Cmo8699AB), indicating an ancient origin for this newly identified lineage. The major structural, enzymatic, and regulatory gene regions of STLV-3d(Cmo8699AB) are intact, suggesting that the virus is likely replication competent, with a predicted pathogenic potential comparable to that of other PTLV-3s.

Conclusion: Taken together, the inferred ancient origin of STLV-3d(Cmo8699AB), the presence of this highly divergent virus in two primate species from the same geographical region, and the ease with which STLVs can be transmitted across species boundaries all suggest that STLV-3d may be more prevalent and widespread than currently recognized. Given the high human exposure to nonhuman primates in this region and the unknown pathogenicity of this divergent PTLV-3, increased surveillance and expanded prevention activities are necessary. Our ability to obtain the complete viral genome from DBS further highlights the utility of this method for molecular epidemiologic studies.

STLVs have been identified in diverse Old World monkeys and apes. STLV-1 has been found in at least 20 different Old World primate species in Africa and Asia, and phylogenetic analysis shows that STLV-1s cluster by geography rather than by host species, suggesting that they are easily transmitted among NHPs [2,3,5,16,17]. There are currently seven recognized PTLV-1 subtypes (A to G), composed of genetically related HTLV-1 and STLV-1 strains from different primate species. The close relatedness of the various HTLV-1s and STLV-1s and their clustering into distinct subtypes suggest that at least seven independent cross-species transmission events shaped the genetic diversity of HTLV-1. STLV-2 currently comprises only two strains, STLV-2(PP1664) and STLV-2(PanP), both identified in two different troops of captive bonobos (Pan paniscus) [6].
Here, we report the first full-length genome sequence of STLV-3(Cmo8699AB) from a wild C. mona monkey. We confirm that this virus is a highly divergent and novel STLV-3. Across the genome, we found evidence that STLV-3d(Cmo8699AB) is distinct from other PTLVs. Robust phylogenetic analysis of the major gene regions of STLV-3d(Cmo8699AB), together with new tax sequences from the divergent STLV-3c(Cni3034) and STLV-3c(Cni3038) viruses, demonstrates that STLV-3d(Cmo8699AB) is a novel and ancient lineage distinct from all previously known PTLV-3 subtypes, strongly supporting its subtype D designation. Detailed examination of the complete genome predicted that all enzymatic, structural, and regulatory genes are intact. Whether STLV-3d shares the replication and pathogenic potential shown or hypothesized for other PTLV-3s has yet to be determined [14,15,30]. Given the inferred ancient origin of STLV-3d(Cmo8699AB), its presence in two primate species from the same geographical region, and the documented propensity of STLVs to cross species boundaries, STLV-3d may be more widespread than currently realized. These results raise a public health concern of unknown magnitude for STLV-3d, particularly in a region with frequent exposure to NHPs through hunting and butchering.

DNA preparation and PCR-based genome walking
Using NucliSens nucleic acid isolation kits (Biomérieux, Durham, NC) as previously described [24], nucleic acids were extracted from two dried blood spots (DBS), each collected by a different hunter, from a wild-caught C. mona monkey (Cmo8699AB) and from a C. nictitans monkey (Cni7867AB). Because DBS material was limited, we increased DNA yield by an additional elution of nucleic acids from the silica beads with water. DNA from Cni3034 and Cni3038 was prepared from whole blood using the Qiagen DNA extraction protocol (Valencia, CA). DNA quality and yield were evaluated by semi-quantitative PCR amplification of the β-actin gene as previously described [31,32] and confirmed with the Quant-iT dsDNA HS Assay kit (Invitrogen, Carlsbad, CA). A minimum total input of 10 ng of DNA was used in each reaction mixture under standard PCR conditions. DNA preparation and PCR assays were performed in separate laboratories equipped exclusively for processing and testing NHP samples, following established precautions to prevent contamination.
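Because DBS extracts can be dilute, checking that each reaction reaches the 10-ng minimum is simple arithmetic. A minimal sketch, assuming concentrations from a fluorometric assay in ng/µL and a hypothetical 5-µL maximum template volume per reaction (neither the volume limit nor the helper name comes from the study):

```python
# Hypothetical helper: template volume needed to reach the minimum DNA input.
MIN_INPUT_NG = 10.0  # minimum total DNA per PCR, as stated in the Methods


def template_volume_ul(conc_ng_per_ul: float, max_template_ul: float = 5.0) -> float:
    """Return the template volume (µL) that delivers MIN_INPUT_NG of DNA.

    conc_ng_per_ul: DNA concentration, e.g. from a Quant-iT dsDNA HS reading.
    max_template_ul: hypothetical per-reaction template limit (an assumption).
    Raises ValueError when the extract is too dilute to reach the minimum.
    """
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    volume = MIN_INPUT_NG / conc_ng_per_ul
    if volume > max_template_ul:
        raise ValueError(
            f"{volume:.1f} µL needed, exceeds the {max_template_ul} µL template limit"
        )
    return volume


print(template_volume_ul(2.5))  # 10 ng divided by 2.5 ng/µL
```

An extract that fails this check would need concentration or pooling before PCR rather than a larger template volume.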

Sequence and phylogenetic analysis and dating the origin of STLV-3d(Cmo8699AB)
The full-length, gap-stripped PTLV-3 genomes were compared with the SimPlot program (version 3.5.1), with STLV-3d(Cmo8699AB) as the query sequence, using the F84 (ML) model and a transition/transversion ratio of 2.0 [33]. RNA secondary structure of the LTR region was predicted using the mfold web server [34] at http://mfold.bioinfo.rpi.edu/. Splice acceptor (sa) and splice donor (sd) sites were predicted using the NetGene2 program available at http://www.cbs.dtu.dk/services/NetGene2/ [35]. ORFs were identified and analyzed using the ORF Finder program available at http://www.ncbi.nlm.nih.gov/projects/gorf/.
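SimPlot computes distances between the query and each reference in overlapping windows (200-bp windows moved in 20-bp steps for Fig. 2). The sketch below illustrates the windowing only. As a simplifying assumption, it reports uncorrected percent identity rather than F84-corrected distances, and it expects sequences that are already aligned and gap-stripped. The function name is illustrative.

```python
def sliding_identity(query: str, ref: str, window: int = 200, step: int = 20):
    """Percent identity of query vs. ref in sliding windows.

    Assumes the two sequences are aligned and gap-stripped (equal length).
    Uses uncorrected identity, not the F84 model used by SimPlot.
    Returns a list of (window_midpoint, percent_identity) tuples.
    """
    if len(query) != len(ref):
        raise ValueError("sequences must be aligned to equal length")
    points = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        r = ref[start:start + window]
        matches = sum(a == b for a, b in zip(q, r))
        points.append((start + window // 2, 100.0 * matches / window))
    return points


# Toy example: identical first half, completely different second half.
query = "A" * 400
ref = "A" * 200 + "C" * 200
for midpoint, identity in sliding_identity(query, ref)[:6]:
    print(midpoint, identity)
```

Plotting these points against genome position gives the kind of profile shown in Fig. 2, where dips mark divergent regions such as the LTR.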
Percent nucleotide divergence was calculated using DNASTAR MegAlign 7.2 software (http://www.DNASTAR.com). Two datasets were used for phylogenetic analysis. To investigate the phylogenetic relationships among PTLVs, the first dataset included tax sequences from complete PTLV genomes available in GenBank and the new STLV-3 tax sequences from Cmo8699AB, Cni7867AB, Cni3034, Cni3038, and Lal9859 obtained in the current study. For further phylogenetic resolution of STLV-3d among PTLVs, a larger dataset included concatenated gag, pol, env, and tax sequences from complete PTLV genomes available in GenBank and the complete genome of STLV-3d(Cmo8699AB) determined here. Sequences were aligned using Clustal W, followed by manual editing and removal of indels. Nucleotide substitution saturation was assessed using pairwise transition and transversion versus divergence plots in the DAMBE program [36]. Unequal nucleotide composition was measured using the TREE-PUZZLE program [37]. Nucleotide substitution models and parameters were estimated from the edited Clustal W alignments using Modeltest v3.7 [38]. For the tax-only alignments, the best-fitting model was a variant of the general time reversible (GTR) model (GTR+I+G) with six substitution rate categories (rA↔C = 2.62, rA↔G = 13.07, rA↔T = 2.79, rC↔G = 2.26, rC↔T = 4.54, rG↔T = 1), gamma-distributed rate heterogeneity (α = 0.7071), and an estimated proportion of invariable sites (0.3436). The best model for the concatenated gag-pol-env-tax alignment was GTR+G, with six substitution rates (rA↔C = 2.53, rA↔G = 11.47, rA↔T = 2.58, rC↔G = 2.15, rC↔T = 4.3, rG↔T = 1) and gamma-distributed rate heterogeneity (α = 0.366).
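The six GTR exchangeabilities reported above define the instantaneous rate matrix Q, with off-diagonal entries q_ij = r_ij·π_j and diagonal entries chosen so that each row sums to zero. The base frequencies π are not reported here, so the sketch assumes equal frequencies purely for illustration. Gamma rate heterogeneity and invariable sites are handled separately in likelihood software and are omitted.

```python
BASES = "ACGT"

# Exchangeabilities for the concatenated gag-pol-env-tax alignment (from the text).
CONCAT_RATES = {
    ("A", "C"): 2.53, ("A", "G"): 11.47, ("A", "T"): 2.58,
    ("C", "G"): 2.15, ("C", "T"): 4.3, ("G", "T"): 1.0,
}


def gtr_rate_matrix(rates, freqs):
    """Return a normalized 4x4 GTR rate matrix Q as nested lists (order ACGT).

    rates: symmetric exchangeabilities keyed by base pair.
    freqs: equilibrium base frequencies in ACGT order (must sum to 1).
    Q is scaled so the mean substitution rate, -sum(pi_i * Q_ii), equals 1.
    """
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                r = rates[(a, b)] if (a, b) in rates else rates[(b, a)]
                q[i][j] = r * freqs[j]
    # Diagonal makes each row sum to zero.
    for i in range(4):
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    mean_rate = -sum(freqs[i] * q[i][i] for i in range(4))
    return [[x / mean_rate for x in row] for row in q]


EQUAL_FREQS = [0.25, 0.25, 0.25, 0.25]  # assumption: frequencies not reported
Q = gtr_rate_matrix(CONCAT_RATES, EQUAL_FREQS)
for base, row in zip(BASES, Q):
    print(base, " ".join(f"{x:7.3f}" for x in row))
```

With equal frequencies, the A↔G to A↔C rate ratio stays at 11.47/2.53, about 4.5, reflecting the strong transition bias in these alignments.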
Phylogenetic trees were inferred using Bayesian analysis implemented in the BEAST software package [39] and with maximum likelihood (ML) using the PhyML program available online at http://atgc.lirmm.fr/phyml/ [40]. Support for the branching order of the ML-inferred trees was evaluated using 500 bootstrap replicates. Two independent BEAST runs were performed, consisting of 10 million and 100 million Markov chain Monte Carlo (MCMC) generations for the tax-only and PTLV concatamer alignments, respectively, with sampling every 1,000 generations, an uncorrelated log-normal relaxed molecular clock, and a burn-in of 100,000 to 1 million generations. Both the constant-size coalescent and the Yule speciation process were used as tree priors to infer the viral tree topologies. Convergence of the MCMC was assessed by calculating the effective sample size (ESS) of the runs using the program Tracer (v1.4; http://beast.bio.ed.ac.uk/Tracer). All parameter estimates had ESSs > 300. The tree with the maximum product of the posterior clade probabilities (maximum clade credibility tree) was chosen from the posterior distribution of 9,001 sampled trees (after discarding the first 1,000 sampled trees as burn-in) with the program TreeAnnotator version 1.4.6, included in the BEAST software package [40]. Trees were viewed and edited using FigTree v1.1.2 (http://tree.bio.ed.ac.uk/software/figtree).
Table 1 (fragment): 8699TR5, TTT GGT AGG GAT TTT TGT TAG GAA GG, 2560; Inner: 7867EF2, TCC TTG TAT CTT TTT CCC CAT TGG; 8699TR1, AAG GTA TTG TAG AGG CGA GCT GAC, 2147.
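Tracer summarizes convergence with the ESS of each sampled parameter, and all parameters here exceeded 300. The sketch shows one common autocorrelation-based estimator, truncating the sum at the first non-positive autocorrelation. Tracer's own estimator may differ in detail, so the numbers from this sketch are illustrative only.

```python
def effective_sample_size(samples):
    """Estimate the ESS of an MCMC trace.

    ESS = N / (1 + 2 * sum(rho_k)), summing lag-k autocorrelations rho_k
    until the first non-positive value (a common truncation rule).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    var = sum(c * c for c in centered) / n
    if var == 0:
        raise ValueError("trace is constant; ESS is undefined")
    rho_sum = 0.0
    for lag in range(1, n):
        cov = sum(centered[t] * centered[t + lag] for t in range(n - lag)) / n
        rho = cov / var
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


# A slowly mixing toy trace versus a rapidly alternating one.
sticky = [0.0] * 50 + [1.0] * 50
alternating = [0.0, 1.0] * 50
print(round(effective_sample_size(sticky), 1), effective_sample_size(alternating))
```

The sticky trace has 100 samples but an ESS of only about 3, which is why long chains and thinning (sampling every 1,000 generations) are needed to reach ESS values above 300.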
Divergence dates for the most recent common ancestor (MRCA) of STLV-3d(Cmo8699AB) were obtained from both the tax-only and the concatenated gag-pol-env-tax alignments using Bayesian inference with a relaxed molecular clock in BEAST. The PTLV evolutionary rate assumed a global molecular clock model and was estimated according to the formula: evolutionary rate (r) = branch length (bl)/divergence time (t) [27]. Calibration dates were derived from well-established genetic and archaeological evidence for the timing of the migration of the ancestors of indigenous Melanesians and Australians from Southeast Asia [14,16,29,41]. The PTLV evolutionary rate was estimated by using the divergence time of 40,000-60,000 years ago (ya) for the Melanesian HTLV-1 lineage (HTLV-1mel) and 15,000-30,000 ya for the MRCA of the HTLV-2a/HTLV-2b Native American strains as strong priors in a Bayesian MCMC relaxed molecular clock method implemented in BEAST [39]. Using two calibration points has previously been shown to provide more reliable estimates of PTLV substitution rates than a single calibration date [41,42]. The upper and lower divergence times estimated from anthropological data defined the interval of a strong uniform prior distribution from which the MCMC sampler drew possible divergence times for the corresponding node in the tree.

Figure 1. STLV-3d(Cmo8699AB) genomic organization (a) and schematic representation of the PCR-based genome walking strategy (b). (a) Non-coding long terminal repeats (LTR) and coding regions for all major proteins (gag, group-specific antigen; pro, protease; pol, polymerase; env, envelope; rex, regulator of expression; tax, transactivator). (b) Short tax and LTR sequences (fragments A, G, H, and I) were amplified using generic primers as previously described [7,27,31]. Using a previously described PCR-based genome walking strategy [14], the complete proviral sequence (8913 bp) was then obtained using STLV-3d-specific primers located within each major gene region in combination with generic PTLV primers (fragments B-F). Approximate amplicon sizes are shown by the solid bars. The positions of predicted splice donor (sd) and acceptor (sa) sites are shown in parentheses.
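The relation r = bl/t used for the PTLV rate can be applied in both directions: a calibrated branch yields a rate, and that rate converts other branch lengths into ages. The sketch uses hypothetical branch lengths and a strict clock. BEAST instead samples calibration times from the uniform priors and rates under a relaxed clock, so this shows only the underlying arithmetic, not the study's analysis.

```python
def evolutionary_rate(branch_length: float, divergence_time: float) -> float:
    """r = bl / t, in substitutions/site/year."""
    if divergence_time <= 0:
        raise ValueError("divergence time must be positive")
    return branch_length / divergence_time


def node_age(branch_length: float, rate: float) -> float:
    """t = bl / r: age of a node from its branch length under a strict clock."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return branch_length / rate


# Hypothetical numbers: a calibration branch of 0.03 substitutions/site
# spanning 50,000 years (midpoint of the 40,000-60,000 ya HTLV-1mel window).
r = evolutionary_rate(0.03, 50_000)
print(f"rate = {r:.2e} substitutions/site/year")
# Dating a hypothetical uncalibrated branch of 0.06 substitutions/site.
print(f"age = {node_age(0.06, r):,.0f} years")
```

Sampling the calibration time across the whole 40,000-60,000 ya interval, rather than fixing its midpoint, is what propagates calibration uncertainty into the wide HPD intervals reported for the MRCA estimates.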
The predicted Tax and Gag proteins of STLV-3d(Cmo8699AB) were the most conserved, with the highest similarity (90% and 89%, respectively) to other prototypical PTLV-3 strains (Table 2). The greatest genetic divergence between STLV-3d(Cmo8699AB) and other PTLV-3s was found in the non-coding LTR region (26-29%) and in the protease (Pro) (21-24%) and Rex (28-31%) proteins (Table 2). These genetic relationships are further illustrated by a similarity plot comparing STLV-3d(Cmo8699AB) with other prototypical PTLV-3s across the entire genome (Fig. 2), in which the highest and lowest sequence identities were observed in the tax and LTR regions, respectively.

Evolutionary relationship of STLV-3d to other PTLVs
Analysis of the two PTLV datasets for nucleotide substitution saturation, using pairwise transition and transversion versus divergence plots, revealed that transitions and transversions plateaued at 3rd codon positions (cdp), indicating sequence saturation (data not shown), as previously observed [42]. In contrast, transitions and transversions increased linearly at the 1st and 2nd cdp without reaching a plateau, indicating that these positions retained sufficient phylogenetic signal (data not shown). BEAST and PhyML were therefore used to infer phylogenetic relationships of PTLV sequences using only 1st and 2nd cdp and the best-fit parameters defined above. The final nucleotide alignment lengths were 630 bp and 4126 bp for the tax-only and viral concatamer sequences, respectively. Robust phylogenetic analysis of concatenated gag-pol-env-tax (Fig. 3) and tax (Fig. 4) sequences of STLV-3d(Cmo8699AB) and other PTLVs inferred a novel PTLV-3 subtype with very high posterior probabilities and bootstrap support. STLV-3d(Cmo8699AB) formed a lineage distinct from the known PTLV-3 East African (subtype A) and West and Central African (subtype B) clades (Fig. 3). Full-length genome sequences were not available for these analyses from the West African STLV-3c viruses found in four C. nictitans or from the STLV-3b viruses identified in L. albigena and C. cephus from Cameroon [20,26]. However, phylogenetic analysis using longer tax sequences that we obtained from two of these STLV-3 subtype C viruses (Cni3034 and Cni3038) and from a single L. albigena (Lal9859NL) inferred a fourth distinct molecular subtype containing the STLV-3d(Cmo8699AB) and Cni7867AB tax sequences (Fig. 4). The new STLV-3(Lal9859NL) sequence clustered with other subtype B sequences from West-Central Africa (Fig. 4).
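Saturation plots such as those produced by DAMBE compare, for every sequence pair, transition and transversion counts (or distances) with overall divergence, separately by codon position. A minimal sketch of the per-pair counting and of the 1st+2nd-position filter used for the final alignments, assuming in-frame, gap-free sequences that start at codon position 1 (function names are illustrative):

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ts_tv_by_codon_position(seq1: str, seq2: str):
    """Count transitions and transversions at codon positions 1, 2 and 3.

    Assumes an in-frame, gap-free pairwise alignment starting at codon
    position 1. Non-ACGT characters are skipped.
    Returns {position: (transitions, transversions)}.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    counts = {1: [0, 0], 2: [0, 0], 3: [0, 0]}
    for i, (a, b) in enumerate(zip(seq1.upper(), seq2.upper())):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        pos = i % 3 + 1
        # A transition keeps the purine/pyrimidine class; a transversion switches it.
        is_transition = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
        counts[pos][0 if is_transition else 1] += 1
    return {p: tuple(c) for p, c in counts.items()}


def first_second_positions(seq: str) -> str:
    """Drop every 3rd codon position, keeping 1st and 2nd positions only."""
    return "".join(c for i, c in enumerate(seq) if i % 3 != 2)


print(ts_tv_by_codon_position("ATGATG", "GTAACC"))
print(first_second_positions("ATGCCC"))
```

Plotting such counts for all pairs against their overall distance reveals the plateau at 3rd positions that motivated discarding them.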
Moreover, we identified another STLV-3 subtype D strain, STLV-3d(Cni7867AB), from a C. nictitans in the same geographic region; it has 99% identity to STLV-3(Cmo8699AB) in the LTR-gag, pol-env, and tax-LTR regions and clusters tightly within the STLV-3 subtype D clade (Fig. 4). Combined, these results strongly support the identification and taxonomic classification of STLV-3(Cmo8699AB) and STLV-3(Cni7867AB) as a new PTLV-3 subtype. As shown previously with individual genes, the position of the PTLV-3 clade relative to PTLV-1, PTLV-2, and PTLV-4 was not completely resolved in the current Bayesian inference: PTLV-3 clustered weakly with PTLV-2 and PTLV-4 in the gag-pol-env-tax concatamer analysis and with PTLV-1 in the tax-only analysis (Figs. 3, 4).

Figure 2. Similarity plot analysis of the full-length STLV-3d(Cmo8699AB) and prototypical PTLV-3 genomes using a 200-bp window size in 20-bp step increments on gap-stripped sequences. The F84 (maximum likelihood) model was used with an estimated transition-to-transversion ratio of 2.28. HTLV-3b(Pyl43) was not included in the analysis because of its high identity (> 99%) to STLV-3b(CTO604) and because of a 366-bp deletion in the pX region of this virus [15].

Figure 4. Identification of a highly divergent STLV-3 subtype inferred by phylogenetic analyses of partial PTLV tax sequences (630 bp). First and second codon positions were used to generate PTLV phylogenies by sampling 10,000 trees with a Markov chain Monte Carlo method under a relaxed clock model; the maximum clade credibility tree, i.e., the tree with the maximum product of the posterior clade probabilities, is shown. Maximum likelihood trees were also inferred using PhyML, and identical tree topologies were obtained with both methods. Posterior probabilities of the inferred Bayesian topologies (numerator) and bootstrap support (1,000 replicates) for the PhyML topologies (denominator) are provided at major nodes. STLV-3d and other new sequences generated in the current study from STLV-3c- and STLV-3b-infected animals are boxed. Branch lengths are proportional to median divergence times in years estimated from the post-burn-in trees, with the scale at the bottom indicating 20,000 years.
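The maximum clade credibility tree described in the Methods and in the Fig. 4 legend is the sampled tree whose clades have the largest product of posterior clade frequencies. A minimal sketch, assuming each sampled tree has already been reduced to its set of clades (frozensets of taxon names) and ignoring the branch-length and node-height annotation that TreeAnnotator also performs. The helper names are illustrative:

```python
import math
from collections import Counter


def mcc_tree(trees):
    """Return the sampled tree maximizing the product of clade posteriors.

    trees: list of trees, each a frozenset of clades; a clade is a frozenset
    of taxon names. Clade posterior = fraction of sampled trees containing it.
    Log-probabilities are summed to avoid underflow with many clades.
    """
    n = len(trees)
    counts = Counter(clade for tree in trees for clade in tree)

    def log_credibility(tree):
        return sum(math.log(counts[clade] / n) for clade in tree)

    return max(trees, key=log_credibility)


def clades(*groups):
    """Build a tree representation from taxon groups (illustrative helper)."""
    return frozenset(frozenset(g) for g in groups)


# Toy posterior sample of three trees on taxa A-D.
t1 = clades("AB", "ABC")
t2 = clades("AB", "ABD")
t3 = clades("AB", "ABC")
print(sorted(sorted(c) for c in mcc_tree([t1, t2, t3])))
```

Because the score is a product over all clades in a tree, the MCC tree is always one of the sampled topologies, unlike a majority-rule consensus.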

Divergence dates for the most recent common ancestor of STLV-3d(Cmo8699AB)
Additional molecular analyses were performed to estimate the divergence time of the MRCA of the potential new PTLV-3 subtype lineage using the 1st and 2nd cdp alignments, Bayesian inference, and two independent calibration points. The posterior mean evolutionary rate for PTLV was estimated to be 6.29 × 10⁻⁷ and 5.36 × 10⁻⁷ substitutions/site/year (Table 3) for the concatenated gene and the tax-only alignments, respectively, which is consistent with rates determined previously both with and without enforcing a molecular clock [14,21-23,29,41]. STLV-3d(Cmo8699AB) is inferred to have split from PTLV-3a and PTLV-3b at a mean of 115,117 ya (95% highest posterior density (HPD), 52,822-200,926 ya) based on the PTLV concatamer alignments (Table 3), suggesting that this is the oldest PTLV-3 lineage identified to date. Using the conserved tax-only alignment, STLV-3c and STLV-3d shared a common ancestor about 18,452 ya (95% HPD, 4,386-36,666 ya), compared with 41,524 ya (95% HPD, 17,149-68,097 ya) for the divergence of STLV-3a and -b (Table 3). The inferred mean MRCA for the PTLV-3 group is 75,795 ya (95% HPD, 33,342-127,209 ya) and 120,574 ya (95% HPD, 52,894-201,260 ya) based on the tax-only and PTLV concatamer alignments, respectively. The divergence dates for PTLV-3 inferred in the current analyses are older than those reported previously because our analyses include the two new highly divergent STLV-3c and -d viruses, which substantially increase the MRCA date for this clade. All other PTLV divergence dates are consistent with those obtained recently using 1st and 2nd cdp of individual PTLV genes, including the finding of lower divergence dates using only the highly conserved tax gene [42].
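The 95% HPD intervals reported above are the narrowest intervals containing 95% of the posterior samples. A minimal sketch of that shortest-interval estimate from a list of MCMC samples (the values in the example are toy numbers, not the study's posterior):

```python
import math


def hpd_interval(samples, mass=0.95):
    """Return the shortest interval containing `mass` of the samples."""
    if not samples or not 0 < mass <= 1:
        raise ValueError("need samples and 0 < mass <= 1")
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # samples that must fall inside the interval
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


# Toy right-skewed "ages": the HPD hugs the dense low end.
ages = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110]
print(hpd_interval(ages, mass=0.5))
```

For skewed posteriors such as divergence times, the HPD is asymmetric around the mean, which is why the intervals above extend much further above the mean than below it.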

STLV-3d proteome analysis
The predicted protein translation of the STLV-3d(Cmo8699AB) genome revealed all major structural and enzymatic proteins (Gag, Pro, Pol, and Env) and regulatory proteins (Tax and Rex) (Fig. 1 and Table 2). Analysis of the overlapping open reading frames (ORFs) of gag and pro, and of pro and pol, predicts that translation occurs by one or more successive -1 ribosomal frameshifts that align the different ORFs. The conserved slippage sequence (6(A)-8 nt-6(G)-11 nt-6(C)) is present in the gag-pro overlap of STLV-3d(Cmo8699AB). The pro-pol overlap slippage sequence carries the same point mutation found among the other prototypical PTLV-3s (GTTAAAC versus TTTAAAC in HTLV-1, HTLV-2, and HTLV-4). As in other PTLV-3s, the Gag protein of STLV-3d(Cmo8699AB) is 420 amino acids (aa) long and is predicted to be cleaved into three core protein products: p19 (matrix), p24 (capsid), and p15 (nucleocapsid). Gag is one of the most highly conserved PTLV proteins, and that of STLV-3d(Cmo8699AB) has > 88% similarity to those of prototypical PTLV-3 subtypes (Table 2). The highest amino acid similarity to other PTLV-3 subtypes is found in the p24 capsid protein (94-96%), while the p15 nucleocapsid protein is the most divergent (80-83%).
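The pro-pol comparison above turns on the heptamer slippery site at which -1 frameshifting occurs. A commonly cited simplified consensus is X XXY YYZ (three identical nucleotides, then three A or T/U, then a non-G base). The sketch scans a DNA sequence for that pattern. The consensus is a simplification assumed here, not a rule stated in the study, and real frameshift efficiency also depends on downstream RNA structure.

```python
import re

# Simplified -1 frameshift heptamer X XXY YYZ on the DNA strand:
# X = any base repeated three times, Y = A or T repeated three times,
# Z = A, C or T. The lookahead lets overlapping candidates be reported.
SLIPPERY = re.compile(r"(?=(([ACGT])\2\2([AT])\3\3[ACT]))")


def find_slippery_sites(seq):
    """Return (1-based position, heptamer) for each candidate slippery site."""
    return [(m.start() + 1, m.group(1)) for m in SLIPPERY.finditer(seq.upper())]


print(find_slippery_sites("ccTTTAAACgg"))  # HTLV-1-type pro-pol heptamer
print(find_slippery_sites("GTTAAAC"))      # PTLV-3-type variant
```

Under this simplified consensus the PTLV-3 GTTAAAC variant is not recognized, which illustrates why the shared point mutation is notable; whether it affects frameshifting in vivo is not addressed here.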
The predicted length of the STLV-3d(Cmo8699AB) Env glycoprotein is 493 aa, similar to the Env proteins of STLV-3b(CtoNG604) and HTLV-3b(Pyl43) [10,35]. The surface (SU) and transmembrane (TM) proteins, at 315 aa and 178 aa, respectively, are comparable in length to those of all other PTLV-3 subtypes. The TM protein is highly conserved across PTLV-3 subtypes (90-92% similarity), including STLV-3d(Cmo8699AB). The high aa identity of the Gag p24 and Env proteins suggests that this divergent virus would be cross-reactive in standard HTLV-1/2 Western blot (WB) assays. Unfortunately, serum or plasma was not available from animals Cmo8699AB and Cni7867AB to test this hypothesis. The STLV-3d(Cmo8699AB) SU also contains highly conserved residues believed to be important for viral entry (data not shown), similar to those described recently for HTLV-3b(Pyl43) [47].
PTLV Tax proteins are important for trans-activation of viral gene expression, viral replication, and viral pathogenesis. Comparison of the Tax proteins of prototypical PTLVs and STLV-3d revealed conservation of critical functional motifs, including the nuclear localization signal (NLS), the CREB-binding protein (CBP)/p300 binding motifs, and the nuclear export signal (NES) (data not shown). Amino acid sequences (M1, M22, and M47) that are important for Tax1 transactivation and activation of the nuclear factor κB (NF-κB) pathway [48] are also preserved in the STLV-3d(Cmo8699AB) and STLV-3d(Cni7867AB) Tax proteins (data not shown). The C-terminal transcriptional activating domain (CR2) at positions 313-318, which is important for CBP/p300 binding and up-regulation of transcription, is also present. The CR2 motif [(S/T)T(V/I)PFS] is conserved among all PTLV-3 subtypes and is identical to that found in STLV-3a subtypes. At the Tax C-terminus, STLV-3d also possesses a conserved PDZ-binding motif that is present in PTLV-1 and PTLV-3 Tax but not in PTLV-2 or HTLV-4 Tax [14,30,42,49,50]. The PDZ domain has been shown to be an important binding site for Tax in mediating signal transduction and interleukin-2-independent growth induction for T-cell transformation [50,51]. Taken together, preservation of these sequence motifs in the predicted STLV-3d(Cmo8699AB) Tax protein suggests that it interacts with cellular regulatory pathways in a manner similar to the Tax proteins of PTLV-1 and other PTLV-3s. All functional motifs, including a potential PDZ-binding motif, are present in the STLV-3d Tax; the Tax sequences of STLV-3c(Cni3034 and Cni3038) obtained in the current study are missing 12 aa residues at the N-terminus. This suggests that these motifs are highly conserved across the very divergent PTLV-3 group (data not shown).
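Motif checks like those described above can be expressed as pattern searches over a predicted Tax sequence. A minimal sketch: the CR2 pattern is taken from the text, whereas the C-terminal class I PDZ-binding consensus ((S/T)-X-(V/I/L) at the very end of the protein) is a commonly used approximation assumed here. The example sequence is a made-up fragment, not an STLV-3d Tax sequence.

```python
import re

CR2 = re.compile(r"[ST]T[VI]PFS")          # (S/T)T(V/I)PFS, as given in the text
PDZ_CLASS_I = re.compile(r"[ST].[VIL]$")   # assumed C-terminal class I PDZ-binding consensus


def scan_tax(protein):
    """Report CR2 hits (1-based start, motif) and whether a C-terminal
    PDZ-binding motif is present. A trailing stop symbol '*' is ignored."""
    seq = protein.upper().rstrip("*")
    return {
        "CR2": [(m.start() + 1, m.group()) for m in CR2.finditer(seq)],
        "PDZ_binding": PDZ_CLASS_I.search(seq) is not None,
    }


print(scan_tax("MKSTVPFSAAETEV*"))  # hypothetical fragment
```

Run on a full-length Tax translation, a CR2 hit near positions 313-318 and a positive C-terminal check would correspond to the conservation described above; absence of the C-terminal match is what distinguishes the PTLV-2 and HTLV-4 Tax proteins.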
The pX region between env and the 3' LTR contains multiple coding regions shown to be important for HTLV-1 viral replication, T-cell activation, and cellular gene expression, two of whose open reading frames (ORFs) encode the ubiquitous Tax and Rex proteins [54]. Two high-confidence putative splice donor sites were predicted at positions 414 and 5058, in the LTR (sd-LTR) and env (sd-Env), respectively, which are used for expression of the Env protein (Fig. 1). A conserved splice acceptor site is located at position 7552; together with the sd-Env site, it produces the singly spliced mRNA encoding the Tax and Rex proteins (Fig. 1). The positions of these putative splice sites are similar to those in other PTLV-3s [14,15,21-23,27]. Analysis of the pX region of STLV-3d(Cmo8699AB) revealed only a single additional ORF (ORFI), which begins with a methionine and is predicted to encode a protein of 131 aa (Fig. 1), in contrast to other PTLV-3s, which are predicted to have at least two additional ORFs in the pX region. BLAST analysis of the ORFI protein yielded only matches to miscellaneous fungal and mammalian proteins with very low identity (< 30%). Further studies are required to evaluate the function of the ORFI protein.
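ORF Finder-style scanning, as used to identify ORFI, can be approximated by reading each of the three forward frames and recording ATG-to-stop stretches above a length threshold. A minimal sketch under stated simplifications: only the forward strand is scanned (antisense ORFs such as ASP would require the reverse complement), only the first ATG before each stop is used, and the 100-aa default threshold is an arbitrary illustrative choice.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_aa=100):
    """Find ATG-initiated ORFs on the forward strand in all three frames.

    Returns sorted (start, end, length_aa) tuples with 1-based inclusive
    coordinates; `end` is the last base of the stop codon and `length_aa`
    counts the encoded residues (the stop codon is excluded).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                length_aa = (i - start) // 3
                if length_aa >= min_aa:
                    orfs.append((start + 1, i + 3, length_aa))
                start = None
    return sorted(orfs)


# Toy sequence: Met followed by five Ala codons and a stop, in frame 3.
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
print(find_orfs(toy, min_aa=6))
```

Applied to the pX region with a threshold near 100 aa, a scan like this would report ORFI (131 aa) alongside tax and rex; ORFs reading through splice junctions would still need separate handling.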
In vivo studies have demonstrated that the recently characterized basic leucine zipper (bZIP) factor encoded on the complementary minus strand of the HTLV-1 RNA genome [55] can enhance viral infectivity and persistence [56]. Although it was originally discovered in HTLV-1 [55] and thus called the HTLV-1 bZIP (HBZ) protein, putative HBZ proteins have also been reported for all other PTLVs [14,15,42]. Consequently, it has been proposed that HBZ be renamed the HTLV antisense protein (ASP) [57]. As in other PTLVs, the ASP ORF of STLV-3d(Cmo8699AB) encodes a 21-aa arginine-rich region followed by 4 conserved leucine heptads and a leucine octet (Fig. 7), suggesting a similar pathway of inactivation of cAMP response element-binding protein 2 (CREB-2) and, therefore, down-regulation of viral transcription [14,55]. Interestingly, the first "leucine" heptad in HTLV-1 and other PTLVs starts with another nonpolar amino acid, phenylalanine, unlike the leucine typically found in mammalian bZIP proteins. ASP has also been reported to modulate Tax activity by binding to the transcription factors JunB and c-Jun [58] as well as to the ubiquitous AP-1 regulatory element [59]. The presence of an AP-1 site in the STLV-3d(Cmo8699AB) LTR may indicate a novel mechanism for the regulation of viral transcription by ASP, as recently suggested for HTLV-3b(2026ND) [14]. Additional studies are necessary to validate and investigate a role for ASP in Tax expression and PTLV replication.
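The heptad organization described for the ASP leucine zipper can be checked by scanning for residues spaced exactly seven positions apart. A simplified sketch: it reports runs of a chosen residue set at strict i, i+7, i+14, ... spacing, and it lets phenylalanine be included to reflect the F-initiated first heptad noted above. It does not model the leucine octet, and the test sequence is synthetic.

```python
def leucine_heptads(protein, residues="L", min_repeats=4):
    """Find runs of `residues` spaced exactly seven positions apart.

    Returns (1-based start, number of repeats) for each maximal run of at
    least `min_repeats` positions at i, i+7, i+14, ...
    """
    seq = protein.upper()
    hits = []
    for start, aa in enumerate(seq):
        if aa not in residues:
            continue
        # Skip positions that extend a run already started seven residues upstream.
        if start >= 7 and seq[start - 7] in residues:
            continue
        count, pos = 1, start + 7
        while pos < len(seq) and seq[pos] in residues:
            count += 1
            pos += 7
        if count >= min_repeats:
            hits.append((start + 1, count))
    return hits


# Synthetic zipper whose first heptad starts with Phe, as described for PTLV ASPs.
print(leucine_heptads("AAAAAAF" + "AAAAAAL" * 4, residues="LF"))
```

With residues="L" only, the same synthetic sequence yields a four-repeat run starting one heptad later, which shows how a strict leucine-only scan would miss the F-initiated first heptad.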

Discussion
Screening of human populations with high exposure to NHPs has resulted in the discovery of novel retroviruses, including HTLV-3, HTLV-4, and simian foamy virus (SFV) [1,7,8,32]. We have previously demonstrated that hunter-collected DBS specimens from wild-caught NHPs are not only an effective collection strategy for documenting STLV diversity but also allow monitoring of retroviral cross-species transmission events at the primate-hunter interface [24]. Using these primate DBS specimens, we recently identified novel STLV-3s in wild-caught C. mona and C. nictitans monkeys by analysis of partial gene sequences [24]. To characterize this new virus, we obtained its complete proviral genome using nucleic acids extracted entirely from two DBS collected by two hunters in the field. To our knowledge, this is the first full-length genome of a simian retrovirus obtained entirely from DBS. The ability to generate a complete viral genome from the equivalent of about 0.25 ml of whole blood further demonstrates the utility of this collection strategy for monitoring and characterizing viral diversity.
Robust phylogenetic analysis of both the conserved tax region and the concatenated gag-pol-env-tax sequences inferred a novel, highly divergent lineage within the PTLV-3 clade with high statistical support. The formation of a fourth lineage within the diversity of PTLV-3, containing STLV-3 sequences from two distinct primate species (C. mona and C. nictitans), strongly supports the proposed nomenclature and classification of this new virus as STLV-3 subtype D. The discovery of nearly identical STLV-3d viruses (Cmo8699AB and Cni7867AB) in two different primate species within the same region of Cameroon, together with the inferred ancient divergence of STLV-3d about 115,000 ya, also suggests a higher prevalence and a wider distribution for this virus.
PTLVs have an ancient evolutionary history, with the ancestral HTLVs inferred to have arisen many thousands of years ago following zoonotic transmission from STLV-infected NHPs [14,24,29,42,61]. This contrasts with the relatively recent emergence of the human immunodeficiency virus (HIV) from simian immunodeficiency virus-infected NHPs in the last century [62,63]. The recent discovery of HTLV-3, HTLV-4, and novel STLV-1-like viruses among people who hunt and butcher NHPs suggests that these interspecies transmission events are not rare and are most likely ongoing [1,7]. From phylogenetic analysis, it has been inferred that STLV-1 may have crossed species boundaries to humans on at least seven separate occasions, resulting in the multiple HTLV-1 subtypes [28]. Given the inferred ancient origin of STLV-3d(Cmo8699AB) and PTLV-3, the wide geographic distribution of STLV-3 across Africa, the long history of human exposure to simians in Africa, and the lack of screening for HTLV in blood banks in Africa, human infections with STLV-3d-like viruses might be expected to occur there. Thus, although HTLV-3 has so far been identified in only three persons from Cameroon, all with subtype B viruses, it is tempting to speculate that, as with HTLV-1, HTLV-3 diversity will be driven by transmission of each of the four STLV-3 subtypes to humans. More surveillance studies at the NHP-human interface are needed to determine the prevalence, diver-

Figure 7. Conservation of the antisense protein (ASP) of STLV-3d(Cmo8699AB) and other prototypical PTLV-3s. The conserved arginine-rich region and potential leucine zipper motifs are indicated.
Molecular differences between the HTLV-1 and HTLV-2 Tax proteins have been proposed to modulate function, transmissibility, and pathogenesis [61]. We therefore examined the predicted protein sequences of STLV-3d(Cmo8699AB) for important functional and regulatory motifs in order to infer the replication competency and pathogenic potential of this divergent viral subtype. All enzymatic, structural, and regulatory proteins were preserved in STLV-3d(Cmo8699AB), including the Tax CBP/p300 binding, NES, and CR2 domains, which are all important for viral transcription and transformation [64-66]. In addition, the presence of a PDZ-binding motif in the STLV-3d Tax, which has been shown to be critical for signal transduction and T-cell transformation in HTLV-1-infected cells [50,51], suggests that the STLV-3d(Cmo8699AB) Tax is more similar to the Tax of PTLV-1 and other PTLV-3s than to the Tax of PTLV-2, which lacks a PDZ-binding motif [14,42]. Furthermore, as in all PTLVs, STLV-3d possesses a conserved ASP basic leucine zipper motif on the antisense strand between the env and tax/rex gene regions. ASP has been shown to participate in regulation of viral replication and possibly oncogenesis [50,51]. Combined, these findings show that the STLV-3d(Cmo8699AB) genome is intact and suggest that the virus is likely to be replication competent and may have a pathogenic potential similar to that of HTLV-1, as has also been predicted for HTLV-3 subtype B; however, further studies are required to validate this hypothesis.
The LTR region of STLV-3d(Cmo8699AB) has two of the three 21-bp repeat Tax-responsive elements (TRE) typically found in the HTLV-1 and HTLV-2 LTRs. The three TREs (distal, central, and proximal) are involved in basal transcription and have been shown to confer Tax1, Tax2, and Tax3 responsiveness [67]. Studies have also shown that mutations in the central TRE result in a greater loss of basal transcription than mutations in the distal or proximal TRE-1 [68]. As with all PTLV-3s, the STLV-3d(Cmo8699AB) LTR lacks only the distal TRE, an absence that does not appear to have deleterious effects on gene expression and viral replication [30,69]. Nonetheless, further studies are necessary to determine whether this difference affects the transcriptional activity of STLV-3d(Cmo8699AB).
Another notable difference between STLV-3d and other PTLV-3s was observed in the leucine-rich activation region of the putative NES domain of the Rex protein, which is involved in regulating viral expression. STLV-3d(Cmo8699AB) has a single aa substitution at position 94, from aspartic acid or glycine to alanine, similar to that seen in the HTLV-2 Rex protein (Rex-2). Mutagenesis studies substituting alanine for serine residues in this region have demonstrated a significant reduction in the phosphorylation required for efficient RNA binding by Rex-2 [70]. These results suggest that the alanine at aa position 94 of the STLV-3d(Cmo8699AB) Rex may similarly reduce its biologic activity. The effects of this change on the processing of viral transcripts and the regulation of viral replication by the STLV-3d Rex will require further investigation.

Conclusion
In summary, complete genome analysis of STLV-3d(Cmo8699AB) reveals that this novel virus is a highly divergent member of the PTLV-3 group, which we designate subtype D. Robust genetic analysis shows that STLV-3d(Cmo8699AB) has an ancient origin and an intact genome. Furthermore, we demonstrate that complete viral genomes can be obtained from limited amounts of genomic material extracted from DBS collected in the field. This collection strategy will facilitate monitoring of viral diversity and cross-species transmission at the human-primate interface. Expanded surveillance will help us better understand the epidemiology and public health importance of STLV zoonoses.
Ministry of Defense, Ministry of Scientific Research and Innovation, and Ministry of Forestry and Fauna provided authorizations and support for this work. Use of trade names is for identification only and does not imply endorsement by the U.S. Department of Health and Human Services, the Public Health Service, or the Centers for Disease Control and Prevention. The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the Centers for Disease Control and Prevention.