An endogenous lentivirus in the germline of a rodent
Retrovirology volume 19, Article number: 30 (2022)
Lentiviruses (genus Lentivirus) are complex retroviruses that infect a broad range of mammals, including humans. Unlike many other retrovirus genera, lentiviruses have only rarely been incorporated into the mammalian germline. However, a small number of endogenous retrovirus (ERV) lineages have been identified, and these rare genomic “fossils” can provide crucial insights into the long-term history of lentivirus evolution. Here, we describe a previously unreported endogenous lentivirus lineage in the genome of the South African springhare (Pedetes capensis), demonstrating that the host range of lentiviruses has historically extended to rodents (order Rodentia). Furthermore, through comparative and phylogenetic analysis of lentivirus and ERV genomes, considering the biogeographic and ecological characteristics of host species, we reveal broader insights into the long-term evolutionary history of the genus.
The lentiviruses (genus Lentivirus) are an unusual group of retroviruses (family Retroviridae) that infect mammals and are associated with a range of slow, progressive diseases in their respective host species groups (Table 1). They are most familiar as the genus of retroviruses that includes human immunodeficiency virus type 1 (HIV-1), but the group also includes viruses that infect a broad range of other mammalian groups. Lentiviruses are distinguished from other retroviruses by several features, including unique accessory genes, a characteristic nucleotide composition [2, 3], and the capacity to infect non-dividing target cells.
All retroviruses replicate via an obligate step in which a DNA copy of the viral genome is integrated into a host cell chromosome. The integrated viral genome (a form referred to as a ‘provirus’) is flanked on either side by identical long terminal repeat (LTR) sequences, each composed of functionally distinct U3, R and U5 regions. Occasionally, germline cells may be infected and subsequently go on to form viable progeny, so that integrated retroviral proviruses are vertically inherited as host alleles. Such endogenous retrovirus (ERV) insertions are relatively common features of vertebrate genomes [7, 8]. Phylogenetic studies indicate that, following genome invasion, ERVs can increase their germline copy number through a variety of mechanisms, including active replication. However, most ERV insertions are genetically fixed and highly degraded by germline mutation, reflecting their ancient origins. Frequently, deletion of the entire internal coding region occurs via homologous recombination between the provirus LTRs, so that only a ‘solo LTR’ sequence is left behind.
Even though their sequences are often extensively degraded, ERVs provide a valuable source of retrospective information about the long-term evolutionary interactions between retroviruses and their hosts. For example, identification of orthologous ERV insertions in related species provides a robust means of deriving minimum age calibrations for retrovirus groups, based on host species divergence estimates (which are in part informed by the fossil record). More broadly, ERV sequences can be used to explore the long-term evolutionary history of ancient (presumably extinct) retrovirus groups [13, 14], and to inform our understanding of their interactions with host genes. ERV sequences can even be used to guide the reconstitution of ancient retrovirus proteins so that their biological properties may be empirically investigated in vitro [16,17,18].
Lentiviruses have only rarely been incorporated into the germline of host species. However, a handful of Lentivirus-derived ERV lineages have now been identified (Table 1), and these sequences demonstrate that viruses clearly recognisable as lentiviruses circulated in mammals many millions of years ago. For example, rabbit endogenous lentivirus K (RELIK) insertions were found to occur at orthologous positions in the rabbit (Oryctolagus cuniculus) and hare (Lepus europaeus) genomes, demonstrating that genome invasion occurred prior to divergence of these species ~ 12 million years ago (Mya) [12, 19]. Endogenous lentiviruses have also been identified in lemurs (family Lemuridae) [20, 21]; mustelids (family Mustelidae) [22, 23]; and dermopterans (order Dermoptera—a group of arboreal gliding mammals native to Southeast Asia) [24,25,26]. Together, these sequences provide a range of minimum age calibrations in the Miocene epoch (23.5–5.3 Mya), based on host species divergence date estimates derived from the fossil record [11, 22, 25]. Widespread circulation among mammals is further supported by molecular clock-based age estimates that extend into the Eocene epoch (56–33.9 Mya) [24, 26].
In this study we perform comprehensive screening of published mammalian genomes and identify a previously unreported endogenous lentivirus lineage in the genome of the South African springhare (Pedetes capensis), demonstrating that lentivirus host range extends to rodents. Furthermore, through comparative and phylogenetic analysis, incorporating all available data, we provide broader insight into the origins and long-term evolutionary history of lentiviruses.
Materials and methods
Genome screening in silico
We used database-integrated genome screening (DIGS) to derive a non-redundant database of lentivirus-derived ERV loci contained in published genome sequence assemblies. In DIGS, the output of systematic, sequence similarity search-based ‘screens’ is captured in a relational database. The DIGS tool is a Perl-based framework in which the Basic Local Alignment Search Tool (BLAST) program suite (version 2.2.31+) is used to perform systematic similarity searches of sequence databases (e.g., genome assemblies) and the MySQL relational database management system (MySQL Community Server version 8.0.30) is used to record and organise output data. Whole genome sequence (WGS) data of 431 mammalian species were obtained from the National Center for Biotechnology Information (NCBI) genome database (Additional file 1: Table S1). Query polypeptide sequences were derived from representative lentivirus species (Table 1). DNA sequences in WGS assemblies that disclosed significant similarity to lentivirus queries (as determined by BLAST e-value) were classified via comparison to published retrovirus genome sequences (again using BLAST). Consensus genome sequences for the endogenous lentivirus lineages RELIK, PSIV1, PSIV2, MELV and DELV were extracted from the supplementary material of the associated publications.
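The capture step described above can be illustrated with a minimal sketch. This is not the DIGS tool itself (which is Perl- and MySQL-based); it assumes BLAST has been run with default tabular output (-outfmt 6) and uses Python's built-in sqlite3 module as a stand-in for MySQL. The table layout, the e-value cut-off of 1e-5 and the two example hit lines are hypothetical and chosen only for illustration.

```python
import sqlite3

# Default column order of BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")

def parse_hits(lines):
    """Yield one dict per tab-separated BLAST hit line, converting numeric fields."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(FIELDS, line.split("\t")))
        for key in INT_FIELDS:
            row[key] = int(row[key])
        for key in FLOAT_FIELDS:
            row[key] = float(row[key])
        yield row

def store_hits(conn, organism, lines, max_evalue=1e-5):
    """Store hits with e-value <= max_evalue (an assumed cut-off) in a hypothetical 'hit' table."""
    conn.execute("""CREATE TABLE IF NOT EXISTS hit (
        organism TEXT, probe TEXT, scaffold TEXT,
        sstart INTEGER, send INTEGER, evalue REAL, bitscore REAL)""")
    kept = 0
    for row in parse_hits(lines):
        if row["evalue"] <= max_evalue:
            conn.execute(
                "INSERT INTO hit VALUES (?,?,?,?,?,?,?)",
                (organism, row["qseqid"], row["sseqid"],
                 row["sstart"], row["send"], row["evalue"], row["bitscore"]))
            kept += 1
    conn.commit()
    return kept

# Made-up hit lines: one strong match and one weak match.
example = [
    "HIV1_pol\tscaffold_12\t38.5\t410\t230\t9\t55\t460\t10432\t11661\t3e-62\t215",
    "HIV1_gag\tscaffold_98\t27.1\t85\t58\t2\t140\t222\t771\t1025\t0.8\t31.2",
]
conn = sqlite3.connect(":memory:")
n_kept = store_hits(conn, "Pedetes capensis", example)
```

Keeping every retained hit in a table, rather than in flat files, is what allows later filtering and classification steps to be expressed as database queries.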
We compiled a set of endogenous lentivirus loci (Additional file 2: Table S2) by using structured query language (SQL) to filter the classified, non-redundant results of >130,000 searches, selecting matches based on their degree of similarity to lentivirus reference sequences, or the taxonomic characteristics of the species in which they occur. Using this approach we separated putatively novel lentivirus ERV loci from both (a) orthologs or paralogs of previously characterised lentivirus ERVs, and (b) non-lentiviral sequences that cross-matched to lentivirus probes due to shared ancestry (e.g., clade II ERVs) [30, 31]. We confirmed that putative novel lentivirus ERVs were indeed derived from lentiviruses (rather than other, related retroviruses) through phylogenetic and genomic analysis as described below.
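The kind of SQL-based filtering described above might look like the following sketch. It is an illustration only: the 'locus' table, its columns, the bitscore threshold and all rows are hypothetical, and sqlite3 is again used in place of MySQL. The list of host orders with previously described endogenous lentiviruses follows the lineages named in the text (lagomorphs, lemurs, mustelids and dermopterans).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table of classified, non-redundant loci.
conn.execute("""CREATE TABLE locus (
    organism TEXT, host_order TEXT, scaffold TEXT,
    closest_ref TEXT, ref_genus TEXT, bitscore REAL)""")
rows = [  # made-up example rows
    ("Oryctolagus cuniculus", "Lagomorpha", "chr3", "RELIK", "Lentivirus", 410.0),
    ("Pedetes capensis", "Rodentia", "scaffold_12", "DELV", "Lentivirus", 215.0),
    ("Mus musculus", "Rodentia", "chr7", "MMTV", "Betaretrovirus", 380.0),
    ("Mustela putorius", "Carnivora", "scaffold_4", "MELV", "Lentivirus", 502.0),
]
conn.executemany("INSERT INTO locus VALUES (?,?,?,?,?,?)", rows)

# Host orders in which endogenous lentivirus lineages were already described.
KNOWN_ORDERS = ("Lagomorpha", "Primates", "Carnivora", "Dermoptera")

# Keep loci whose closest reference is a lentivirus (dropping cross-matching
# non-lentiviral ERVs) and whose host order has no previously described lineage.
query = """
    SELECT organism, scaffold, closest_ref
    FROM locus
    WHERE ref_genus = 'Lentivirus'
      AND bitscore >= ?
      AND host_order NOT IN (?, ?, ?, ?)
    ORDER BY bitscore DESC
"""
candidates = conn.execute(query, (100.0, *KNOWN_ORDERS)).fetchall()
```

Candidates returned by a query of this kind would still need the phylogenetic and genomic confirmation described in the text before being treated as novel lentivirus ERVs.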
Phylogenetic and genomic analysis
Nucleotide and protein phylogenies were reconstructed using maximum likelihood (ML) as implemented in RAxML (version 8.2.12). Protein substitution models were selected via hierarchical maximum likelihood ratio test using the PROTAUTOGAMMA option in RAxML. To estimate the ages of solo LTRs we measured divergence from an LTR consensus sequence and applied a neutral rate calibration, as described by Subramanian et al. We used Se-Al (version 2.0) to visualise alignments and create consensus sequences.
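The LTR dating approach can be sketched as follows, under stated assumptions. Divergence is taken here as the simple proportion of differing sites (p-distance) between an aligned LTR and the consensus, and age is divergence divided by a neutral substitution rate; because the consensus approximates the ancestral sequence, the rate is applied to a single branch. The rate of 0.004 substitutions per site per million years and the toy sequences are hypothetical; the source does not state the rate or distance measure that was used.

```python
def p_distance(seq, consensus):
    """Proportion of differing sites, skipping columns with a gap or N in either sequence."""
    if len(seq) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for a, b in zip(seq.upper(), consensus.upper()):
        if a in "-N" or b in "-N":
            continue  # ignore uninformative alignment columns
        compared += 1
        if a != b:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared

def ltr_age_my(seq, consensus, rate_per_site_per_my):
    """Estimated age in millions of years: divergence from consensus / neutral rate."""
    return p_distance(seq, consensus) / rate_per_site_per_my

# Toy 50-nt consensus and an LTR copy differing at two sites (made-up data).
consensus = "TGAAGCTTCA" * 5
ltr = consensus[:3] + "G" + consensus[4:27] + "C" + consensus[28:]

ASSUMED_RATE = 0.004  # hypothetical neutral rate (substitutions/site/My)
divergence = p_distance(ltr, consensus)          # 2 differences over 50 sites
age = ltr_age_my(ltr, consensus, ASSUMED_RATE)   # roughly 10 My under this rate
```

Estimates of this kind scale inversely with the assumed rate, so the choice of neutral rate calibration largely determines the resulting age range.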
Results and discussion
We systematically screened WGS data representing 431 mammalian species (Additional file 1: Table S1) for endogenous lentivirus loci using similarity search-based approaches. As probes we used a comprehensive set of polypeptide products derived from the reference genomes shown in Table 1. We identified a total of 842 distinct lentivirus-derived ERV loci, most of which represented members of previously described lentivirus ERV lineages (Table 2). However, we also identified lentivirus-derived sequences in the genome of a species group in which they have not previously been described: rodents (order Rodentia).
Matches to lentiviral Gag and Pol proteins were identified in WGS data of the South African springhare (Pedetes capensis), and the reverse transcriptase (RT) coding region encoded by one of these ERVs groups with previously described lentivirus species (Additional file 3: Fig. S1a). Initially, only four copies of springhare endogenous lentivirus (SpELV) were identified in the P. capensis genome. However, we were able to identify the 5’ LTR of a partial provirus sequence by using sequences upstream of the gag ORF of the longest SpELV insertion (spanning the region where a 5’ LTR might be expected) as a query in BLASTn-based searches of the P. capensis genome assembly. This revealed the presence of a repetitive sequence showing the characteristic features of a retroviral LTR (i.e., ~ 500 nucleotides in length with terminal TG and CA dinucleotides) in the expected position upstream of the gag ORF. Using this LTR sequence as input for screening enabled us to identify another 10 SpELV loci represented by solo LTR sequences (Table 3). We generated a consensus SpELV genome using all fourteen loci identified in our screen (Additional file 4: Fig. S2). We did not identify an envelope (env) gene associated with any SpELV insertions, nor did we identify any contigs containing complete proviruses with paired LTR sequences. Furthermore, because the longest provirus sequence we identified was truncated in pol we could not determine whether any accessory genes might have been encoded downstream of this gene. Nonetheless, the partial genome obtained in our analysis exhibits the characteristic features of lentivirus genomes, including (a) a primer-binding site specific for tRNA Lysine (Additional file 5: Fig. S3); (b) a Pro-Pol ORF expressed via -1 ribosomal frameshifting (Additional file 5: Fig. S3); (c) an adenine-rich (34%) genome (Additional file 6: Fig. S4) containing few CpG dinucleotides (0.29%); and (d) a putative trans-activator response (TAR) element (Additional file 4: Fig. S2, Additional file 5: Fig. S3). We estimated the age of the SpELV lineage utilising a molecular clock-based approach in which divergence is calculated by comparing individual LTR sequences to an LTR consensus. We obtained age estimates in the range of 8–18 Mya for SpELV loci (Table 3), consistent with an origin in the Middle Miocene.
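Several of the sequence features listed above (adenine content, CpG frequency, and the terminal TG and CA dinucleotides of an LTR) are simple to compute. The sketch below shows one way to do so; it is an assumption that CpG frequency is expressed as the percentage of all overlapping dinucleotides, since the source does not define how the 0.29% figure was calculated, and the toy sequence is made up.

```python
def base_composition(seq):
    """Percentage of A, C, G and T among unambiguous bases."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("no unambiguous bases in sequence")
    return {b: 100.0 * bases.count(b) / len(bases) for b in "ACGT"}

def cpg_percent(seq):
    """CpG dinucleotides as a percentage of all overlapping dinucleotides (assumed definition)."""
    seq = seq.upper()
    n_dinucleotides = len(seq) - 1
    if n_dinucleotides < 1:
        raise ValueError("sequence too short")
    cpg = sum(1 for i in range(n_dinucleotides) if seq[i:i + 2] == "CG")
    return 100.0 * cpg / n_dinucleotides

def has_ltr_termini(seq):
    """True if the sequence starts with TG and ends with CA."""
    seq = seq.upper()
    return seq.startswith("TG") and seq.endswith("CA")

# Toy 12-nt sequence (made-up data): 6 A, 2 C, 2 G, 2 T and one CpG.
toy = "TGAAACGATACA"
composition = base_composition(toy)
cpg = cpg_percent(toy)
termini_ok = has_ltr_termini(toy)
```

Applied to a consensus genome, functions like these would give the adenine percentage and CpG frequency; applied to candidate LTRs, the termini check offers a quick screen before closer inspection.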
We used maximum likelihood-based phylogenetic approaches to reconstruct the evolutionary relationships between contemporary lentiviruses and the extinct lentiviruses represented by ERVs. Phylogenetic trees based on conserved regions of Gag-Pol clearly separate the lentiviruses into two robustly supported subclades (Fig. 1). One (here labelled ‘Archaeolentivirus’) contains SpELV together with dermopteran endogenous lentivirus (DELV), which occurs in the germline of colugos (an unusual group of arboreal gliding mammals that are native to Southeast Asia) [24,25,26]. A second (here labelled ‘Neolentivirus’) contains all other endogenous lentivirus lineages and all known contemporary lentiviruses. We obtained relatively high support for internal branching relationships within the Neolentivirus clade: reconstructions support the existence of a distinct ‘primate’ group of neolentiviruses containing both simian and prosimian sub-lineages, and an ‘artiodactyl’ group incorporating both the bovine lentiviruses and the small ruminant lentiviruses. In addition, the primate lentiviruses group separately from all other neolentiviruses, which together constitute a ‘grassland-associated’ clade composed of lentiviruses that infect(ed) grassland-adapted host species (Fig. 1).
Plotting information about (a) known lentivirus distribution and (b) biogeographic range onto a time-calibrated phylogeny of boreoeutherian mammals provides some thought-provoking insights into lentivirus ecology and evolution (Fig. 2). First, minimum age estimates established via orthology demonstrate that lentiviruses were widespread in the Miocene Epoch (i.e. ~ 20–5 Mya), both in terms of their host range and biogeographic distribution. It may be significant that the diverse mammalian groups in which lentiviruses of the ‘grassland-associated’ clade are found (horses, bovids, mustelids and felids; see Fig. 1) all adapted to a grassland habitat during this period, in interconnected biogeographic areas (Laurasia and Africa) [36,37,38] (Fig. 2).
Regarding the ultimate origins of lentiviruses in mammals, molecular clock-based analyses of DELV insertions support the presence of archaeolentiviruses in Asia (the only region where colugos occur) up to 60 Mya, i.e., throughout most of the Cenozoic Era. The identification of SpELV shows that archaeolentiviruses also circulated in springhare ancestors, which are found only in Africa. This raises the question of whether archaeolentiviruses could have been present in the rodent-colugo ancestor that existed > 80 Mya (Fig. 2). Such ancient origins would be consistent with the presence of primate lentivirus ancestors in the common ancestor of haplorrhine and strepsirrhine primates, and the arrival of lentiviruses in Madagascar ~ 60 Mya with founder populations of ancestral lemurs [40, 41] (Fig. 2). However, if extensive transmission between mammalian orders has occurred in the past, there would be other ways to account for observed lentivirus distributions without invoking such ancient origins.
We describe a novel endogenous lentivirus lineage in the genome of the South African springhare. The identification of SpELV demonstrates that lentivirus host range has historically extended to rodents.
Availability of data and materials
All data are openly available in the Lentivirus-GLUE project hosted on GitHub: https://giffordlabcvr.github.io/Lentivirus-GLUE/.
Narayan O, Clements JE. Biology and pathogenesis of lentiviruses. J Gen Virol. 1989;70(Pt 7):1617–39.
van der Kuyl AC, Berkhout B. The biased nucleotide composition of the HIV genome: a constant factor in a highly variable virus. Retrovirology. 2012;9:92.
van Hemert F, van der Kuyl AC, Berkhout B. On the nucleotide composition and structure of retroviral RNA genomes. Virus Res. 2014;193:16–23.
Yamashita M, Emerman M. Retroviral infection of non-dividing cells: old and new perspectives. Virology. 2006;344(1):88–93.
Brown PO. Integration. In: Coffin JM, Hughes SM, Varmus HE, editors. Retroviruses. Cold Spring Harbor Laboratory Press: Cold Spring Harbor; 1997.
Gifford R, Tristem M. The evolution, distribution and diversity of endogenous retroviruses. Virus Genes. 2003;26(3):291–315.
Hayward A, Cornwallis CK, Jern P. Pan-vertebrate comparative genomics unmasks retrovirus macroevolution. Proc Natl Acad Sci U S A. 2015;112(2):464–9.
Johnson WE. Origins and evolutionary consequences of ancient endogenous retroviruses. Nat Rev Microbiol. 2019;17(6):355–70.
Stoye JP. Studies of endogenous retroviruses reveal a continuing evolutionary saga. Nat Rev Microbiol. 2012;10(6):395–406.
Belshaw R, et al. Rate of recombinational deletion among human endogenous retroviruses. J Virol. 2007;81(17):9437–42.
Gifford RJ. Viral evolution in deep time: lentiviruses and mammals. Trends Genet. 2012;28(2):89–100.
Keckesova Z, et al. Identification of a RELIK orthologue in the European hare (Lepus europaeus) reveals a minimum age of 12 million years for the lagomorph lentiviruses. Virology. 2009;384(1):7–11.
Halo JV, et al. Origin and recent expansion of an endogenous gammaretroviral lineage in domestic and wild canids. Retrovirology. 2019;16(1):6.
Diehl WE, et al. Tracking interspecies transmission and long-term evolution of an ancient retrovirus using the genomes of modern mammals. Elife. 2016;5: e12704.
Compton AA, Malik HS, Emerman M. Host gene evolution traces the evolutionary history of ancient primate lentiviruses. Philos Trans R Soc Lond B Biol Sci. 2013;368(1626):20120496.
Dewannieux M, et al. Identification of an infectious progenitor for the multiple-copy HERV-K human endogenous retroelements. Genome Res. 2006;16(12):1548–56.
Blanco-Melo D, Gifford RJ, Bieniasz PD. Co-option of an endogenous retrovirus envelope for host defense in hominid ancestors. Elife. 2017. https://doi.org/10.7554/eLife.22519.
Goldstone DC, et al. Structural and functional analysis of prehistoric lentiviruses uncovers an ancient molecular interface. Cell Host Microbe. 2010;8(3):248–59.
Katzourakis A, et al. Discovery and analysis of the first endogenous lentivirus. Proc Natl Acad Sci U S A. 2007;104(15):6261–5.
Gifford RJ, et al. A transitional endogenous lentivirus from the genome of a basal primate and implications for lentivirus evolution. Proc Natl Acad Sci U S A. 2008;105(51):20362–7.
Gilbert C, et al. Parallel germline infiltration of a lentivirus in two Malagasy lemurs. PLoS Genet. 2009;5(3): e1000425.
Han GZ, Worobey M. Endogenous lentiviral elements in the weasel family (Mustelidae). Mol Biol Evol. 2012;29(10):2905–8.
Cui J, Holmes EC. Endogenous lentiviruses in the ferret genome. J Virol. 2012;86(6):3383–5.
Hron T, et al. Endogenous lentivirus in Malayan colugo (Galeopterus variegatus), a close relative of primates. Retrovirology. 2014;11(1):84.
Han GZ, Worobey M. A primitive endogenous lentivirus in a colugo: insights into the early evolution of lentiviruses. Mol Biol Evol. 2015;32(1):211–5.
Hron T, et al. Life history of the oldest lentivirus: characterization of ELVgv integrations in the dermopteran genome. Mol Biol Evol. 2016;33(10):2659–69.
Zhu H, et al. Database-integrated genome screening (DIGS): exploring genomes heuristically using sequence similarity search tools and a relational database. bioRxiv. 2018.
Altschul SF, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–402.
Kitts PA, et al. Assembly: a resource for assembled genomes at NCBI. Nucleic Acids Res. 2016;44(D1):D73-80.
Dewannieux M, Heidmann T. Endogenous retroviruses: acquisition, amplification and taming of genome invaders. Curr Opin Virol. 2013;3(6):646–56.
Gifford R, et al. Evolution and distribution of class II-related endogenous retroviruses. J Virol. 2005;79(10):6478–86.
Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312–3.
Subramanian RP, et al. Identification, characterization, and comparative genomic distribution of the HERV-K (HML-2) group of human endogenous retroviruses. Retrovirology. 2011;8:90.
Rambaut A. Se-Al: sequence alignment editor. 1997: Edinburgh.
Gifford RJ. Lentivirus-GLUE. 2021; Available from: https://giffordlabcvr.github.io/Lentivirus-GLUE.
Ge D, et al. Evolutionary history of lagomorphs in response to global environmental change. PLoS ONE. 2013;8(4): e59668.
Toljagić O, et al. Millions of Years Behind: Slow Adaptation of Ruminants to Grasslands. Syst Biol. 2017;67(1):145–57.
Law CJ. Evolutionary shifts in extant mustelid (Mustelidae: Carnivora) cranial shape, body size and body shape coincide with the Mid-Miocene Climate Transition. Biol Lett. 2019;15(5):20190155.
Springer MS, et al. The historical biogeography of Mammalia. Philos Trans R Soc Lond B Biol Sci. 2011;366(1577):2478–502.
Karanth KP, et al. Ancient DNA from giant extinct lemurs confirms single origin of Malagasy primates. Proc Natl Acad Sci U S A. 2005;102(14):5090–5.
Poux C, et al. Asynchronous colonization of Madagascar by the four endemic clades of primates, tenrecs, carnivores, and rodents as inferred from nuclear genes. Syst Biol. 2005;54(5):719–30.
Gifford RJ, et al. Nomenclature for endogenous retrovirus (ERV) loci. Retrovirology. 2018;15(1):59.
Dimmic MW, et al. rtREV: an amino acid substitution matrix for inference of retrovirus and reverse transcriptase phylogeny. J Mol Evol. 2002;55(1):65–73.
Kumar S, et al. TimeTree: A Resource for Timelines, Timetrees, and Divergence Times. Mol Biol Evol. 2017;34(7):1812–9.
Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 2003;31(13):3406–15.
We thank Daniel Blanco-Melo, Anne Emory, Ron Swanstrom and Greg Towers for helpful discussions.
RJG is funded by the Medical Research Council of the United Kingdom (MC_UU_12014/12). NIH funding. The funding bodies had no role in the design of the study and collection, analysis, and interpretation of data, or in writing the manuscript.
The authors declare no competing interests.
Table S1. Whole genome sequence assemblies screened in this study.
Table S2. Details of all endogenous lentivirus loci identified in this study.
Figure S1. Phylogenetic and genomic characteristics of springhare endogenous lentivirus. (a) Maximum likelihood (ML) phylogeny based on an alignment of reverse transcriptase (RT) protein sequences and showing the reconstructed evolutionary relationships between lentiviruses and other retroviruses. Asterisks indicate nodes with bootstrap support > 70% (1000 replicates). The scale bar shows evolutionary distance in substitutions per site. (b) ML phylogeny showing reconstructed evolutionary relationships between SpELV long terminal repeat (LTR) sequences. Numbers next to nodes indicate bootstrap support values (1000 replicates). The scale bar shows evolutionary distance in substitutions per site. (c) Consensus genome structures of ancient lentiviral paleoviruses. DELV = Dermopteran endogenous lentivirus; RELIK = Rabbit endogenous lentivirus type K; MELV = Mustelidae endogenous lentivirus; BIV = Bovine immunodeficiency virus; SIV = Simian immunodeficiency virus; FIV = Feline immunodeficiency virus; HIV = Human immunodeficiency virus; PSIV = Prosimian immunodeficiency virus; RV = Retrovirus; LV = Leukemia virus.
Figure S2. The SpELV consensus sequence. Inverted repeats present at the ends of the 5′ long terminal repeat (LTR) sequence are highlighted in light grey. Regions of nucleic acid secondary structure, the transactivation responsive (TAR) element and primer binding site (PBS) are highlighted in dark grey. The locations of the proteins encoded by the gag and pol genes were determined by homology to the DELV consensus sequence [24,25,26].
Figure S3. The putative SpELV TAR (transactivation responsive region) element. Secondary structures were predicted using the MFOLD thermodynamic folding algorithm and assessed by comparison to well-characterised examples in other lentiviruses.
Figure S4. Nucleotide compositional bias in lentivirus genomes. Nucleotide compositions of whole lentivirus genomes were normalised to length and plotted as percentages using R in R Studio (version 4.2.1). Reference genome sequences for each virus correspond to those given in Table 1. Bovine immunodeficiency virus (BIV); Dermopteran endogenous lentivirus (DELV); Equine infectious anaemia virus American strain (EIAV_Am); Feline immunodeficiency virus (FIV); Human immunodeficiency virus 1 (HIV_1M); Mustelidae endogenous lentivirus (MELV); Prosimian immunodeficiency virus 2 (PSIV); Rabbit endogenous lentivirus type K (RELIK); Springhare endogenous lentivirus (SpELV); Small ruminant lentivirus A (SRLV_A); Adenine (A), Guanine (G), Cytosine (C), Thymine (T).
Cite this article
Kambol, R., Gatseva, A. & Gifford, R.J. An endogenous lentivirus in the germline of a rodent. Retrovirology 19, 30 (2022). https://doi.org/10.1186/s12977-022-00615-2