Endogenous lentivirus in Malayan colugo (Galeopterus variegatus), a close relative of primates
© Hron et al.; licensee BioMed Central Ltd. 2014
Received: 30 June 2014
Accepted: 9 September 2014
Published: 4 October 2014
A significant fraction of mammalian genomes is composed of endogenous retroviral (ERV) sequences that are formed by germline infiltration of various retroviruses. In contrast to other retroviral genera, lentiviruses only rarely form ERV copies. We performed a computational search aimed at identifying novel endogenous lentiviruses in vertebrate genomes.
Using an in silico strategy, we screened 104 publicly available vertebrate genomes for the presence of endogenous lentivirus sequences. In addition to the previously described cases, the search revealed an endogenous lentivirus in the genome of the Malayan colugo (Galeopterus variegatus). At least three complete copies of this virus, denoted ELVgv, and approximately one hundred solo LTR sequences were detected in the colugo genome. The assembled consensus sequence of ELVgv had a typical lentivirus genome organization, including three predicted accessory genes. Phylogenetic analysis placed this virus as a distinct subgroup within the lentivirus genus. The insertion into the dermopteran lineage was estimated to have occurred more than thirteen million years ago.
We report the discovery of the first endogenous lentivirus in the mammalian order Dermoptera, a taxon closely related to the Primates. Lentiviruses have infiltrated the mammalian germline several times over millions of years. The colugo virus described here possibly represents the oldest documented endogenization event, and its discovery may provide new insights into lentivirus evolution. This is also the first report of an endogenous lentivirus in an Asian mammal, indicating a long-term presence of lentiviruses on the Asian continent.
Lentiviruses have been described in several mammalian orders, including Primates, Artiodactyla, Perissodactyla, and Carnivora. They cause a variety of chronic diseases and are a major public health concern, especially because of the HIV/AIDS pandemic. In contrast to other retroviral genera, lentiviruses rarely generate ERV copies. ERVs are formed following germline infection and subsequent vertical transmission of the integrated provirus. Such genomic “viral fossils” make it possible to study the long-term evolutionary history of lentiviruses. The first endogenous lentivirus was described in 2007 in the genome of the European rabbit. Since then, there have been only a few additional reports of lentiviruses infiltrating the genomes of hares, lemurs, and ferrets. We performed a large-scale screen of all publicly available vertebrate genomes for endogenous lentivirus sequences. Here, we report the identification of the first endogenous lentivirus in the mammalian order Dermoptera, in the genome of the Malayan colugo (G. variegatus). We discuss the genomic and phylogenetic characteristics of this virus, which place it among the oldest described members of the lentivirus genus.
Further BLAST searches of the colugo genomic contigs revealed three complete ELVgv proviruses (provirus I at positions 11,594-19,841 of contig JMZW01084956; provirus II at positions 14,164-23,469 of contig JMZW01174031; provirus III at positions 40,701-51,516 of contig JMZW01021293). The search also identified approximately 100 solo long terminal repeats (LTRs), which are formed by recombination between the two LTRs flanking the viral internal sequence. The BLASTn criteria used to identify solo LTRs were: e-value < 10⁻¹⁰⁰, at least 80% identity to the LTR of the full-length ELVgv provirus, and at least 50% coverage. In addition, several smaller contigs containing fragments of internal viral sequences were detected (data not shown). Because the colugo genome assembly covers the majority of the genome (assembly size 2.8 Gbp, accession number JMZW00000000), it can be assumed that the genome contains at least three complete provirus copies and roughly 30 times as many solo LTRs.
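The solo-LTR thresholds above can be illustrated with a short Python sketch that filters BLASTn hits of the ELVgv LTR against colugo contigs. It assumes the hits are in BLAST+ tabular format (`-outfmt 6`, default 12 columns) and that "coverage" means the aligned fraction of the LTR query; both are assumptions, as are the function names, the toy rows, and the 450 bp LTR length used in the example, none of which come from this study. Note that LTRs of the complete proviruses also pass these filters and would have to be excluded separately.

```python
"""Hypothetical sketch: filter BLASTn hits of the ELVgv LTR against colugo contigs."""
from dataclasses import dataclass
from typing import Iterable, List

# Thresholds quoted in the text for calling solo LTRs.
MAX_EVALUE = 1e-100   # e-value < 10^-100
MIN_IDENTITY = 80.0   # percent identity to the full-length provirus LTR
MIN_COVERAGE = 0.50   # assumed: aligned fraction of the LTR query length


@dataclass
class Hit:
    subject: str     # contig accession (sseqid)
    identity: float  # percent identity (pident)
    qstart: int      # alignment start on the LTR query
    qend: int        # alignment end on the LTR query
    sstart: int      # alignment start on the contig
    send: int        # alignment end on the contig (smaller than sstart on minus strand)
    evalue: float


def parse_outfmt6(lines: Iterable[str]) -> List[Hit]:
    """Parse BLAST tabular output with the default 12 columns (-outfmt 6)."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        hits.append(Hit(subject=f[1], identity=float(f[2]),
                        qstart=int(f[6]), qend=int(f[7]),
                        sstart=int(f[8]), send=int(f[9]),
                        evalue=float(f[10])))
    return hits


def passes_ltr_thresholds(hit: Hit, ltr_length: int) -> bool:
    """Apply the e-value, identity and coverage cut-offs to one hit."""
    coverage = (abs(hit.qend - hit.qstart) + 1) / ltr_length
    return (hit.evalue < MAX_EVALUE
            and hit.identity >= MIN_IDENTITY
            and coverage >= MIN_COVERAGE)


def solo_ltr_candidates(lines: Iterable[str], ltr_length: int) -> List[Hit]:
    # LTRs of complete proviruses also pass; they must be removed
    # separately, e.g. by checking for adjacent internal viral sequence.
    return [h for h in parse_outfmt6(lines) if passes_ltr_thresholds(h, ltr_length)]


# Toy rows (made up for illustration); columns: qseqid sseqid pident length
# mismatch gapopen qstart qend sstart send evalue bitscore.
EXAMPLE_ROWS = [
    "LTR\tcontig_A\t91.2\t440\t30\t5\t1\t440\t1000\t1439\t1e-150\t600",  # passes
    "LTR\tcontig_B\t78.0\t400\t80\t6\t1\t400\t2000\t2399\t1e-120\t450",  # identity too low
    "LTR\tcontig_C\t85.0\t200\t25\t2\t1\t200\t5000\t4801\t1e-110\t300",  # coverage < 50%
    "LTR\tcontig_D\t88.0\t300\t30\t3\t50\t349\t100\t399\t1e-90\t350",    # e-value too high
]

if __name__ == "__main__":
    for h in solo_ltr_candidates(EXAMPLE_ROWS, ltr_length=450):
        print(h.subject, h.sstart, h.send)
```

With the hypothetical 450 bp LTR, only `contig_A` survives; each of the other toy rows fails exactly one criterion.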
Four lines of evidence suggest that ELVgv inserted into the colugo germline millions of years ago. First, the three complete proviruses have accumulated many genetic defects, including insertions and deletions of various sizes, multiple frameshifts and stop codons, and insertions of SINE and LINE sequences (Figure 2). Second, solo LTRs form only after prolonged residence in the germline. Third, comparison of the LTR sequences of individual proviruses can be used to estimate insertion times. These estimates are only approximate: they rely on the 5’ and 3’ LTRs being identical at the time of insertion, so that any divergence between them is assumed to have arisen after integration at the neutral substitution rate of the host genome. We assumed mammalian substitution rates ranging from 2.2 to 4.5 × 10⁻⁹ per site per year. Provirus I had 20 differences between its 5’ and 3’ LTRs, giving an estimated insertion time of 5.1 - 10.3 million years ago (MYA). Similarly, proviruses II and III yielded integration time estimates of 10.1 - 20.7 MYA and 13.2 - 27.0 MYA, respectively. All three proviruses have distinct, perfect or nearly perfect target site duplications, indicating that they have not undergone recombination after integration and that the LTRs belong to the original integrating virus (Figure 2). The genetic distances between individual proviruses range from 0.078 to 0.105 substitutions per site. However, we did not attempt to use these distances to estimate the integration age, because it is not known whether the proviruses arose from independent insertions of a circulating exogenous virus, from reinfection of germline cells, or from intracellular retrotransposition. In addition, assembling genomic contigs from short Illumina reads is inherently difficult in repetitive regions such as ERVs; in particular, reads from equivalent internal positions of different proviruses may not be assigned to the correct provirus.
Fourth, seven of the solo LTR insertions reside in regions of apparent segmental genomic duplication (Additional file 8). Because these integrations must have happened before the duplication events, the age of the duplications provides a minimum age for the integrations, which reaches up to 7 MYA.
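The LTR-based dating used as the third line of evidence can be sketched in Python as follows. The sketch assumes the relation commonly used for LTR dating (e.g., Johnson and Coffin, 1999, in the reference list), T = D / (2r), where D is the proportion of differing sites between the aligned 5’ and 3’ LTRs and r is the neutral substitution rate; the factor of two reflects independent substitution in both LTRs. Gap columns are skipped and no multiple-hit correction is applied, which are simplifying assumptions. The 440-site LTR length in the example is hypothetical (the actual LTR length is not given in this section) and was chosen only because it reproduces the range reported for provirus I.

```python
"""Sketch: LTR-LTR divergence dating of a single provirus."""
from typing import Tuple

RATE_LOW = 2.2e-9   # substitutions per site per year (lower rate used in the text)
RATE_HIGH = 4.5e-9  # upper rate used in the text


def p_distance(ltr5: str, ltr3: str) -> float:
    """Proportion of differing sites between two aligned LTRs.

    Alignment columns with a gap in either sequence are skipped
    (an assumption; no correction for multiple hits is applied).
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def insertion_age_mya(divergence: float, rate: float) -> float:
    # Both LTRs accumulate substitutions independently after integration,
    # so divergence = 2 * rate * time, i.e. time = divergence / (2 * rate).
    return divergence / (2.0 * rate) / 1e6


def insertion_age_range_mya(divergence: float) -> Tuple[float, float]:
    """Return (youngest, oldest) estimates; the faster rate gives the younger age."""
    return (insertion_age_mya(divergence, RATE_HIGH),
            insertion_age_mya(divergence, RATE_LOW))


if __name__ == "__main__":
    # 20 differences over an assumed 440 comparable LTR sites (hypothetical length).
    young, old = insertion_age_range_mya(20 / 440)
    print(f"{young:.1f} - {old:.1f} MYA")  # prints "5.1 - 10.3 MYA"
```

Because the age scales linearly with divergence, the ratio between the oldest and youngest estimate is always 4.5/2.2 ≈ 2.05, which matches the spread of each range reported above.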
The Malayan colugo (G. variegatus) belongs to the small order Dermoptera, which contains only one other extant species, the Philippine colugo (Cynocephalus volans). Colugos, primates, and treeshrews (Scandentia) are grouped together in the clade Euarchonta. The placement of Dermoptera within this group is still debated. Comparative chromosome painting suggested that treeshrews and colugos are more closely related to each other and form a sister group to primates, whereas screening of protein-coding exons indicated that colugos are closer to primates than to treeshrews. In either scenario, the dermopteran lineage is estimated to have split off 80–90 MYA. This is considerably older than the highest estimate of the ELVgv insertion age and indicates that the genome invasion was an independent event in Dermoptera. Consistent with this, for about half of the ELVgv integration sites, the empty pre-integration site could be identified in primates and other mammals (data not shown). It will be informative to determine whether ELVgv is present in the genus Cynocephalus, which diverged from Galeopterus about 18.3 MYA, and in the multiple subspecies of G. variegatus. The timescale of ELVgv genome infiltration is at the upper limit of previously described lentiviral invasions in leporids (12 MYA), lemurs (4.2 MYA), and ferrets (12 MYA). The origins of these ancient lentiviruses and their ancestral relationships cannot be resolved with the current data because the phylogenetic analyses are inconclusive. Its ancient origin and presence in what is potentially the closest relative of primates make the colugo virus an interesting addition to the lentivirus genus that may add to our understanding of lentivirus evolution.
DE and JP designed the study. All authors participated in the data collection and analysis, and in writing of the manuscript. All authors read and approved the final manuscript.
We would like to acknowledge Richard K. Wilson and The Genome Institute, Washington University School of Medicine, for the generation and public release of the Galeopterus sequence assembly. We thank Jiří Hejnar and members of his laboratory for helpful comments on the manuscript. This work was funded by program LK11215 provided by the Czech Ministry of Education, Youth and Sports. Access to computing and storage facilities provided by ELIXIR CZ and the National Grid Infrastructure MetaCentrum, administered under the programme “Projects of Large Infrastructure for Research, Development, and Innovations” (LM2010005), is greatly appreciated.
- Johnson WE: A proviral puzzle with a prosimian twist. Proc Natl Acad Sci U S A. 2008, 105 (51): 20051-20052. 10.1073/pnas.0811419106.
- Stoye JP: Studies of endogenous retroviruses reveal a continuing evolutionary saga. Nat Rev Microbiol. 2012, 10 (6): 395-406.
- Katzourakis A, Tristem M, Pybus OG, Gifford RJ: Discovery and analysis of the first endogenous lentivirus. Proc Natl Acad Sci U S A. 2007, 104 (15): 6261-6265. 10.1073/pnas.0700471104.
- Gifford RJ, Katzourakis A, Tristem M, Pybus OG, Winters M, Shafer RW: A transitional endogenous lentivirus from the genome of a basal primate and implications for lentivirus evolution. Proc Natl Acad Sci U S A. 2008, 105 (51): 20362-20367. 10.1073/pnas.0807873105.
- Gilbert C, Maxfield DG, Goodman SM, Feschotte C: Parallel germline infiltration of a lentivirus in two Malagasy lemurs. PLoS Genet. 2009, 5 (3): e1000425. 10.1371/journal.pgen.1000425.
- Cui J, Holmes EC: Endogenous lentiviruses in the ferret genome. J Virol. 2012, 86 (6): 3383-3385. 10.1128/JVI.06652-11.
- Keckesova Z, Ylinen LM, Towers GJ, Gifford RJ, Katzourakis A: Identification of a RELIK orthologue in the European hare (Lepus europaeus) reveals a minimum age of 12 million years for the lagomorph lentiviruses. Virology. 2009, 384 (1): 7-11. 10.1016/j.virol.2008.10.045.
- Han GZ, Worobey M: Endogenous lentiviral elements in the weasel family (Mustelidae). Mol Biol Evol. 2012, 29 (10): 2905-2908. 10.1093/molbev/mss126.
- Belshaw R, Watson J, Katzourakis A, Howe A, Woolven-Allen J, Burt A, Tristem M: Rate of recombinational deletion among human endogenous retroviruses. J Virol. 2007, 81 (17): 9437-9442. 10.1128/JVI.02216-06.
- Zuker M: Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 2003, 31 (13): 3406-3415. 10.1093/nar/gkg595.
- Kohany O, Gentles AJ, Hankus L, Jurka J: Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor. BMC Bioinformatics. 2006, 7: 474. 10.1186/1471-2105-7-474.
- Petersen TN, Brunak S, von Heijne G, Nielsen H: SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods. 2011, 8: 785-786. 10.1038/nmeth.1701.
- Krogh A, Larsson B, von Heijne G, Sonnhammer EL: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol. 2001, 305 (3): 567-580. 10.1006/jmbi.2000.4315.
- Duckert P, Brunak S, Blom N: Prediction of proprotein convertase cleavage sites. Protein Eng Des Sel. 2004, 17 (1): 107-112. 10.1093/protein/gzh013.
- Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S: MEGA5: Molecular Evolutionary Genetics Analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 2011, 28 (10): 2731-2739. 10.1093/molbev/msr121.
- Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32 (5): 1792-1797. 10.1093/nar/gkh340.
- Dimmic MW, Rest JS, Mindell DP, Goldstein RA: rtREV: an amino acid substitution matrix for inference of retrovirus and reverse transcriptase phylogeny. J Mol Evol. 2002, 55 (1): 65-73. 10.1007/s00239-001-2304-y.
- Huelsenbeck JP, Ronquist F: MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics. 2001, 17 (8): 754-755. 10.1093/bioinformatics/17.8.754.
- Johnson WE, Coffin JM: Constructing primate phylogenies from ancient retrovirus sequences. Proc Natl Acad Sci U S A. 1999, 96 (18): 10254-10260. 10.1073/pnas.96.18.10254.
- Kumar S, Subramanian S: Mutation rates in mammalian genomes. Proc Natl Acad Sci U S A. 2002, 99 (2): 803-808. 10.1073/pnas.022629899.
- Chinwalla AT, Cook LL, Delehaunty KD, Fewell GA, Fulton LA, Fulton RS, Graves TA, Hillier LW, Mardis ER, McPherson JD, Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B, Bloom T, Bork P, et al: Initial sequencing and comparative analysis of the mouse genome. Nature. 2002, 420 (6915): 520-562. 10.1038/nature01262.
- Janečka JE, Helgen KM, Lim NT-L, Baba M, Izawa M, Boeadi, Murphy WJ: Evidence for multiple species of Sunda colugo. Curr Biol. 2008, 18 (21): R1001-R1002. 10.1016/j.cub.2008.09.005.
- Martin RD: Colugos: obscure mammals glide into the evolutionary limelight. J Biol. 2008, 7 (4): 13. 10.1186/jbiol74.
- Nie WH, Fu BY, O’Brien PCM, Wang JH, Su WT, Tanomtong A, Volobouev V, Ferguson-Smith MA, Yang FT: Flying lemurs - the ‘flying tree shrews’? Molecular cytogenetic evidence for a Scandentia-Dermoptera sister clade. BMC Biol. 2008, 6: 11. 10.1186/1741-7007-6-18.
- Janecka JE, Miller W, Pringle TH, Wiens F, Zitzmann A, Helgen KM, Springer MS, Murphy WJ: Molecular and genomic data identify the closest living relative of primates. Science. 2007, 318 (5851): 792-794. 10.1126/science.1147555.
- Hedges SB, Dudley J, Kumar S: TimeTree: a public knowledge-base of divergence times among organisms. Bioinformatics. 2006, 22 (23): 2971-2972. 10.1093/bioinformatics/btl505.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.