Here we report the first complete nucleotide sequence and genomic characterization of the recently discovered HTLV-4. We show that the genome of this novel human virus is genetically equidistant from HTLV-1, HTLV-2, and HTLV-3. Robust phylogenetic and molecular clock analyses confirm that HTLV-4 clearly falls outside the diversity of PTLV-1, PTLV-2, and PTLV-3, demonstrating that HTLV-4 is the only known member of a distinct PTLV group we call PTLV-4. Combined, these results strongly support the HTLV-4/PTLV-4 nomenclature proposed for this virus. The phylogenetic stability seen across HTLV-4 and other PTLV genomes also indicates the absence of major recombination events in PTLV despite evidence of dual infections in humans and primates [9, 49]. Furthermore, these results support the distinct evolutionary history of HTLV-4 and other PTLVs, demonstrating that they are not recent genetic recombinants of pre-existing viral genomes. This finding contrasts with other retroviruses like HIV, in which frequent recombination contributes substantially to genetic diversity.
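The notion of genetic equidistance used above can be illustrated with a minimal sketch that computes uncorrected pairwise distances (p-distances) from an alignment and reports how much the distances from one query sequence to each reference differ. The sequences and the names `p_distance`, `pairwise_distances`, and `distance_spread` are hypothetical placeholders, not data or methods from this study; the analyses reported here may have used model-corrected distances rather than simple p-distances.

```python
from itertools import combinations
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def pairwise_distances(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """p-distance for every unordered pair of sequences in the alignment."""
    return {
        (n1, n2): p_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(alignment, 2)
    }


def distance_spread(alignment: Dict[str, str], query: str) -> float:
    """Largest minus smallest distance from the query to the other sequences.

    A spread near zero means the query is roughly equidistant from the rest.
    """
    dists = [d for pair, d in pairwise_distances(alignment).items() if query in pair]
    return max(dists) - min(dists)


# Toy alignment (hypothetical sequences, for illustration only).
toy_alignment = {
    "query": "ACGTACGTAC",
    "ref_a": "ACGTACGTTT",
    "ref_b": "TTGTACGTAC",
    "ref_c": "ACGTTTGTAC",
}
spread = distance_spread(toy_alignment, "query")
```

In this toy case the query differs from each reference at two of ten sites, so the spread is zero; with real genomes, a small spread relative to the distances themselves would be consistent with the equidistance described in the text.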
Bayesian MCMC statistical methods have recently been developed to accurately infer dates of evolutionary events, to investigate the origin of viral epidemics, and to estimate historical population dynamics [32, 51]. Molecular dating of the HTLV-4 predecessor using these robust methods suggests that this novel PTLV lineage originated almost 200 millennia ago, which predates the inferred origin of the ancestors of HTLV-1, HTLV-2, and HTLV-3 by about 76,000 – 191,000 years. The inferred ancient existence of the PTLV-4 lineage thus permits two equally parsimonious hypotheses on the origin of HTLV-4. First, it is possible that HTLV-4(1863LE) is a current descendant of the ancestral PTLV-4 that infected humans as they evolved in Africa and represents a strain circulating within humans living in this geographic region. Interestingly, the inferred date of the HTLV-4 ancestor also coincides with the appearance of Homo sapiens sapiens, estimated to have occurred around 200,000 – 400,000 ya, suggesting the emergent human lineage may have been a suitable host for the ancestral PTLV-4. If this is not just a historical coincidence in the evolution of virus and host, then HTLV-4 may indeed be the oldest human deltaretrovirus, as inferred from the molecular dating of all four HTLV groups. Alternatively, HTLV-4(1863LE) could be the result of a more recent zoonotic infection with a very divergent STLV present in NHPs in the forests of Cameroon. Additional information on the diversity of HTLV-4 and its likely simian counterpart will be needed to determine whether HTLV-4(1863LE) truly originated as H. sapiens sapiens evolved and persists in humans today, or represents a more recent zoonotic transmission from an NHP. As yet, a simian counterpart of HTLV-4 has not been identified in Cameroon or elsewhere despite the identification of other novel STLVs in this region [9, 10, 22]. Nonetheless, the inability to find "STLV-4" may be due to sampling and screening biases in the selection of NHP species and the geographic locations examined [9, 32].
The inference of an ancient split of HTLV-4(1863LE) from the PTLV-2 lineage, combined with the wide geographic distribution of STLVs and a history of STLVs crossing into humans [2, 8–10, 18–21], implies that HTLV-4 infection may be more prevalent than currently recognized. Repeated and historical cross-species infections of humans with various STLV-1 strains led to the emergence and dissemination of several HTLV-1 subtypes in West-Central Africa [2, 4–6]. Similar evidence suggests that the newly identified HTLV-3 infections also potentially arose from multiple, independent past or contemporary introductions of different STLV-3 strains into humans [6, 8, 31]. Given that both HTLV-1 and HTLV-2 followed human population migrations out of Africa and across the globe as humans evolved, HTLV-4 and HTLV-3 may also have spread globally. A more precise determination of the origin and distribution of HTLV-4 infection will require further studies, such as expanded surveillance in both humans and NHPs. However, serosurveys for HTLV-4 may be complicated by the inability to discriminate this infection from HTLV-2, because both show similar WB profiles and because the sensitivity of serological assays for identifying HTLV-4 is currently unknown [6, 35]. Thus, additional diagnostic tools are required to determine the level of HTLV-4 penetration into the general population and to search for the potential primate origin of HTLV-4(1863LE). Screening for HTLV-4 will be facilitated by the development and application of serologic and molecular assays based on the sequences reported here. For example, since the HTLV-4 Gag matrix and nucleocapsid and the envelope surface proteins are divergent from those of PTLV-1, PTLV-2, and PTLV-3, it may be possible to use them in serologic assays to differentiate the four PTLV groups.
Virus classification is a topic of ongoing discussion, and suggestions for nomenclature are typically based on lumping or splitting of taxa into distinct groups. Deltaretrovirus species are classified by the International Committee on Taxonomy of Viruses (ICTV) by differences in genome sequence and viral oncogenes, antigenic properties, natural host range, and pathogenicity. For example, HTLV-1 and HTLV-2 are distinguished mostly by phylogenetic diversity and the variable disease outcomes of each virus. Recently, a new deltaretrovirus species, STLV-5, was proposed based on limited analyses of small tax/rex sequences from a Macaca arctoides virus (strain MarB43) that was originally classified as STLV-1 [4, 10]. Herein, we show by robust phylogenetic analysis of major coding regions and complete viral genomes that expansion of the current PTLV nomenclature from four to six putative major taxonomic species or groups should be considered. Our natural classification of PTLV groups is based on three criteria: rigorous phylogenetic inference that demonstrates with high confidence the formation of very distinctive monophyletic lineages outside the diversity of all known viral groups, genetic distances demonstrating that the putative new lineage is nearly equidistant from all previously characterized groups, and the placement of the new PTLV groups near the root of the PTLV phylogeny. The first four PTLV phylogroups consist of HTLV-1/STLV-1, HTLV-2, HTLV-3/STLV-3, and HTLV-4. We confirm the existence of the putative STLV-5(MarB43) lineage, while the sixth group consists of the STLV-2(PanP) and STLV-2(PP1664) viruses. However, for simplicity we suggest maintaining the STLV-2 nomenclature historically used for this particular viral group. Each proposed new viral group clearly falls outside the diversity of its nearest PTLV relatives (PTLV-1 and HTLV-2, respectively), is monophyletic with strong bootstrap support and posterior probabilities, and is roughly genetically equidistant from other PTLVs; hence, each should be classified as a distinct viral species. As with all viral nomenclature, the PTLV classification proposed here will require approval by the ICTV.
In addition to understanding viral evolutionary history, analysis of full-length genomes can also provide basic information on the replication and pathogenic potential of new viruses. Thus, we examined in detail the genetic structure and sequence of HTLV-4 to determine whether important functional motifs involved in viral expression and HTLV-induced leukemogenesis are preserved [26–30, 44]. All enzymatic, regulatory, and structural proteins are well conserved in HTLV-4(1863LE), including functional motifs in Tax that are important for viral gene expression and T-cell proliferation, suggesting HTLV-4 is replication competent. We also observed several important molecular features of the HTLV-4 genome involved in viral expression and pathogenicity that are either similar to or distinct from those of other HTLVs. For example, the absence of a PDZ domain in the Tax protein of HTLV-4(1863LE), a domain known to be important in cellular signal transduction and T-cell transformation [29–31], is similar to what is seen in HTLV-2 but not in HTLV-1 and HTLV-3. The absence of PDZ suggests that the HTLV-4 Tax may be more phenotypically similar to the HTLV-2 Tax than to the HTLV-1 Tax. Furthermore, the high amino acid identity of the Tax4 and Tax2 proteins also suggests that Tax4 may function similarly to Tax2. However, whether the absence of a PDZ domain in HTLV-4 is associated with an absence of specific cellular and/or clinical outcomes, as in HTLV-2, will require further investigation.
We also identified unique putative c-Myb and Pbx-1 transcription factor binding sites in the U3 region of the LTR of HTLV-4(1863LE). c-Myb is a proto-oncogene that is expressed in T cells induced by mitogen or antigenic stimulation and is involved in cell cycle progression and proliferation of T lymphocytes, such that continuous deregulation of cell cycling may play a role in leukemogenesis. c-Myb has been shown to bind to the HTLV-1 and feline leukemia virus LTRs to increase viral transcription [53, 54]. Like c-Myb, dysregulation of the homeoprotein Pbx-1 can also promote leukemogenesis by disturbing hematopoiesis. We demonstrate here that the potential c-Myb binding site in the HTLV-4 LTR specifically binds c-Myb, suggesting that it may also promote LTR-mediated viral expression, which may help overcome the loss of the distal 21-bp repeat element observed in the HTLV-4 LTR. Similarly, Pbx-1 has been demonstrated to up-regulate transcription of another retrovirus, murine leukemia virus (MuLV), by binding to conserved Pbx-1 transcription factor sites present in MuLV LTRs. The presence of putative c-Myb and Pbx-1 binding sites in the HTLV-4 LTR may provide novel mechanisms of transcriptional control at both the viral and cellular levels not previously known for HTLV. Nevertheless, the involvement of these putative novel binding sites in viral transcription and leukemogenesis will require additional studies.
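As an illustration of how putative transcription factor binding sites such as those discussed above can be located in an LTR sequence, the following minimal sketch scans both strands of a DNA string for an IUPAC consensus motif. It is not the method used in this study, which is not described in this section. The motif `YAACNG` is used here only as an assumed, commonly cited consensus for the c-Myb recognition element, and the input sequence and the function names are hypothetical.

```python
import re
from typing import List, Tuple

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]",
    "S": "[CG]", "W": "[AT]", "N": "[ACGT]",
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str) -> List[Tuple[int, str, str]]:
    """Return (start, strand, matched_site) for every motif occurrence.

    `start` is a 0-based coordinate on the forward strand. The lookahead
    group allows overlapping matches to be reported.
    """
    body = "".join(IUPAC[c] for c in motif.upper())
    pattern = re.compile("(?=(" + body + "))")
    seq = seq.upper()
    hits = []
    for m in pattern.finditer(seq):
        hits.append((m.start(), "+", m.group(1)))
    rc = reverse_complement(seq)
    for m in pattern.finditer(rc):
        site = m.group(1)
        # Convert the reverse-strand offset to a forward-strand coordinate.
        hits.append((len(seq) - m.start() - len(site), "-", site))
    return sorted(hits)


# Hypothetical 10-nt fragment, for illustration only.
# ASSUMPTION: "YAACNG" stands in for a c-Myb consensus site.
example_hits = find_motif("GGCAACTGAA", "YAACNG")
```

A consensus match of this kind only flags a candidate site; as the text notes, binding and any role in transcription have to be established experimentally.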
Although HBZ was originally reported to be exclusive to HTLV-1, we now provide additional evidence for a putative HBZ region among all PTLVs, including HTLV-4(1863LE). Despite the absence of canonical bZIP domains, preliminary experiments show that proteins are expressed from the HTLV-3 and HTLV-4 antisense mRNAs and that all were potent inhibitors of Tax induction of HTLV LTR activity, with cellular localizations similar to that of the HTLV-1 HBZ (unpublished data). These results not only confirm the predicted HBZ sequences and proteins in these viruses but also demonstrate the potential importance of HBZ in PTLV replication. The finding of a potential bZIP region on the antisense strand of all PTLV genomes also indicates that this protein should be renamed from HBZ to AEP, for antisense encoding protein, as suggested. The potential role of AEP in HTLV-induced oncogenesis may be less clear, since HTLV-1 and HTLV-2 infections result in different clinical outcomes, while pathologies for HTLV-3 and HTLV-4 have not yet been reported. Additional studies are required to confirm the potential effect of the predicted AEP transcripts and proteins on HTLV-4 and PTLV expression and any role they may have in leukemogenesis.