Contribution of type W human endogenous retroviruses to the human genome: characterization of HERV-W proviral insertions and processed pseudogenes
© The Author(s) 2016
Received: 18 April 2016
Accepted: 23 August 2016
Published: 9 September 2016
Human endogenous retroviruses (HERVs) are ancient sequences that integrated into germ line cells and have been vertically transmitted to offspring; they constitute about 8 % of our genome. Over time, HERVs accumulated mutations that compromised their coding capacity. A prominent exception is the HERV-W locus 7q21.2, which produces a functional Env protein (Syncytin-1) co-opted for placental syncytiotrophoblast formation. While the expression of HERV-W sequences has been investigated for correlations with disease, an exhaustive description of the group's composition and characteristics is still not available, and current information on the HERV-W group derives from studies published several years ago that necessarily relied on the incomplete human genome assemblies available at the time. This hampers comparison and correlation with current human genome assemblies.
In the present work we identified and described in detail the distribution and genetic composition of 213 HERV-W elements. The bioinformatic analysis characterized several previously unreported features and provided a phylogenetic classification into two main subgroups with different ages and structural characteristics. New findings on the HERV-W genomic context of insertion and on co-localization with sequences putatively involved in disease development are also reported.
The present work is a detailed overview of the HERV-W contribution to the human genome and provides a robust genetic background useful to clarify the role of HERV-W in pathologies with poorly understood etiology; to our knowledge, it represents the most complete and exhaustive HERV-W dataset to date.
More than 40 years after the first evidence of a discrepancy between the amount of genetic material and organismal complexity, it is now established that less than 2 % of the human genome is composed of protein-coding regions. By comparison, it is striking that human endogenous retroviruses (HERVs) represent four times this fraction, constituting about 8 % of our DNA. HERV sequences appear to have been acquired through a conventional infective process that occurred mostly over 30 million years ago. Reverse transcription of the viral genome and its subsequent integration into germ line cells allowed the Mendelian transmission of these elements to offspring, resulting in their coevolution with the host genome.
HERVs belong to class-I transposable elements, also termed retrotransposons, which duplicate through a reverse-transcribed RNA intermediate. Besides HERVs, this group also comprises elements devoid of long terminal repeats (LTRs), such as long and short interspersed nuclear elements (LINEs and SINEs, respectively). Despite their abundance, HERV classification has long been incomplete and sometimes controversial, and a comprehensive dataset of the HERV elements present in the human genome has been provided only recently. In particular, HERVs are distributed among three main classes based on sequence similarity with exogenous retroviruses: class I (Gammaretrovirus- and Epsilonretrovirus-like), class II (Betaretrovirus-like) and class III (Spumaretrovirus-like). Each class encloses a variable number of groups. HERV groups have traditionally been designated by a letter corresponding to the human tRNA that binds the primer binding site (PBS) during reverse transcription. For example, HERV-K elements are thought to use a lysine tRNA. Some groups have also occasionally been named after a neighboring gene (HERV-ADP) or a particular amino acid motif (HERV-FRD). These nomenclatures are now considered inadequate, and taxonomic studies of HERV groups are usually performed with a phylogenetic approach, commonly based on the highly conserved pol gene. Currently, HERV primary integrations can be divided into 39 groups, and this picture is further complicated by 31 additional “non-canonical” groups of mosaic forms that arose from secondary integrations or recombination events.
For a few HERV groups, viral spread in human chromosomes was due not only to new infections generating novel proviral integrations, but also to alternative mechanisms. Several elements of the multi-copy HERV-W group are known to derive from the reverse transcription and mobilization of proviral RNA transcripts by the human LINE-1 (L1) machinery, which inserted them into new genomic regions. These sequences are structurally colinear with the proviral mRNA and are called processed pseudogenes. Moreover, the human genome harbors several hundred solitary HERV-W LTRs deriving from homologous recombination between the 5′- and 3′ LTRs, which removed the retroviral internal portion.
Regardless of the mechanism of formation, the genomic persistence of HERV sequences during evolution led to the accumulation of numerous mutations, insertions and deletions that have generally compromised their coding capacity. A prominent exception is, once again, the HERV-W group. Initially identified for its possible role in multiple sclerosis (MS), this group showed a high expression level in placental tissues. Further investigations revealed that a HERV-W provirus named ERVWE1, located at locus 7q21.2, (1) retained a complete env open reading frame (ORF); (2) was able to produce a functional protein, called Syncytin-1; and (3) had been co-opted by the human genome for trophoblast cell fusion during pregnancy, which forms the syncytiotrophoblast, a structure important for regulating exchanges between mother and fetus [12–14].
Following these findings, the expression and coding capacity of the HERV-W group have been investigated in different tissues, above all to find correlations with various diseases, such as MS [15–21], schizophrenia [22, 23] and bipolar disorder [24, 25], as well as a number of pathologies with poorly understood etiology, such as osteoarthritis and cutaneous T cell lymphoma [26, 27]. However, despite the great interest in HERV-W expression, no correlation with human pathologies has been conclusively demonstrated so far, and the characterization of the group at the genomic level remains a major genetic goal and a bioinformatic challenge. Specifically, current knowledge of HERV-W genomic distribution and copy number still relies on analyses performed several years ago [8, 30, 31]. Voisset et al. described the presence of 70, 100, and 30 HERV-W-related gag, pro, and env regions, respectively, using a PCR approach with HERV-W-specific primers on DNA from isolated chromosomes. Costas identified a total of 140 HERV-W elements through an NCBI BLAST search of the draft sequence of the human genome. Pavlícek et al. reported 311 HERV-W elements and 343 solitary LTRs, identified with the RepeatMasker program in the GoldenPath assembly covering 87 % of the human genome. These works are milestones in the characterization of the HERV-W group, but the lack of a complete human genome at the time and the use of different methodologies make it hard to compare and correlate these data with the current version of the human genome.
Moreover, apart from the well-described Syncytin-1 provirus, detailed information about the group's composition and the characteristics of its members is largely lacking, preventing a comprehensive analysis of their possible involvement in human pathologies. A detailed knowledge of the genomic origin of HERV-W transcripts is in fact essential to interpret the expression profiles mentioned above [16–25, 32] and to evaluate their possible involvement in disease development and/or progression. Furthermore, the mere presence of integrated HERV elements can affect human physiology and health through alternative mechanisms, even in the absence of gene expression or gene products. This can occur, for example, (1) through physical disruption of a gene by a HERV insertion [33, 34]; (2) through deleterious recombination events that can produce genomic alterations ranging from deletions and duplications to large-scale chromosomal rearrangements; and (3) through the regulatory activity of HERVs and their LTRs, which naturally contain promoters, enhancers, polyadenylation signals and splice donor sites [5, 36–38] and can also regulate human gene expression in a tissue-specific manner [39–49].
In this context, current HERV-W expression studies are not sufficient to understand the actual effects that these elements can exert. On the one hand, owing to their multi-copy nature, it is not always clear from which genomic locus a HERV-W mRNA is transcribed; on the other hand, the potential effects of such sequences are not solely connected to their expression capacity, but also depend on their localization and on their ability to (dys)regulate host functions through mechanisms beyond the production of RNA or protein.
In light of this, a precise and updated HERV-W genomic map is urgently needed to better evaluate the role of these elements in human health and their actual influence on the host genome. Here we report a comprehensive analysis of the presence and distribution of HERV-W sequences within the human genome, with a detailed description of the structural and phylogenetic features characterizing the group.
HERV-W identification and general classification
In a recent work aimed at the global classification of HERV clades and sequences in the human genome, we reported the presence of 126 elements belonging to the HERV-W group. These data were obtained with the bioinformatic tool RetroTector (ReTe), a program package designed to identify full ERV integrations in vertebrate genomes and to attempt reconstruction of the corresponding ORFs and proteins. To recognize HERV sequences, ReTe uses a collection of generic conserved motifs, a few of which lie within the env and gag genes and can be mutated or lost in defective proviruses. This “bias” was reported to be responsible for the low representation of HERV class III proviruses, which have an aberrant gag and may lack an env. In light of this, to build an updated dataset of HERV-W sequences in the human genome GRCh37/hg19 assembly, we used a combined strategy based on (1) the ReTe analysis and (2) a traditional Genome Browser BLAT search, using the assembled RepBase reference LTR17-HERV17-LTR17 as a query. This integrated approach led to the characterization of a total of 213 HERV-W related sequences: the 126 previously identified by ReTe and 87 additional elements retrieved by Genome Browser BLAT. A high proportion of the newly identified HERV-W sequences showed large, recurrent deletions that removed extended portions of the gag, pol and env genes (described in more detail in the structural characterization section). Hence, the defective nature of most HERV-W sequences likely accounts for their underrepresentation in the ReTe output, confirming the importance of a dual approach in HERV identification.
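The merging step described above can be pictured as a union of two interval sets. The sketch below is a minimal illustration, assuming each hit is reduced to a hypothetical (chromosome, start, end) tuple and that hits overlapping on the same chromosome represent the same element; it does not reproduce ReTe or BLAT output formats, and the function names are illustrative.

```python
from typing import List, Tuple

# (chromosome, start, end); 1-based inclusive coordinates are assumed here.
Hit = Tuple[str, int, int]


def merge_call_sets(rete_hits: List[Hit], blat_hits: List[Hit]) -> List[Hit]:
    """Union of two call sets: hits overlapping on the same chromosome are collapsed
    into one element spanning both (treating overlap as identity is an assumption)."""
    merged: List[Hit] = []
    for chrom, start, end in sorted(rete_hits + blat_hits):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def blat_only(rete_hits: List[Hit], blat_hits: List[Hit]) -> List[Hit]:
    """BLAT hits that overlap no ReTe hit, i.e. candidate additions to the ReTe set."""
    return [
        b for b in blat_hits
        if not any(r[0] == b[0] and r[1] <= b[2] and b[1] <= r[2] for r in rete_hits)
    ]
```

Sorting the pooled tuples groups hits by chromosome and start, so a single left-to-right pass is enough to collapse overlaps.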
The main characteristics of the HERV-W elements are summarized in the supplementary material (Additional file 1: Table S1). We named the HERV-W elements according to their genomic location, so that each sequence has a unique and direct identifier. When multiple sequences occur in the same band, their order within the band is indicated by a letter in alphabetical order, as previously described. HERV-W elements occur on all chromosomes except chromosome 16, which apparently contains no HERV-W proviruses or pseudogenes, and show no recognizable clustering.
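The naming convention can be sketched as follows. The function name and input layout are hypothetical, and ordering the letters by ascending start coordinate within a band is an assumption about how the alphabetical order maps onto the sequences.

```python
from collections import defaultdict
from string import ascii_lowercase
from typing import Dict, List, Tuple


def name_elements(elements: List[Tuple[str, int]]) -> Dict[Tuple[str, int], str]:
    """elements: (cytogenetic band, start coordinate) pairs.
    A band holding a single element keeps the bare band name; a band holding several
    gets letter suffixes a, b, c... in order of start coordinate (assumed ordering).
    Bands with more than 26 elements are not handled in this sketch."""
    by_band: Dict[str, List[int]] = defaultdict(list)
    for band, start in elements:
        by_band[band].append(start)

    names: Dict[Tuple[str, int], str] = {}
    for band, starts in by_band.items():
        starts.sort()
        if len(starts) == 1:
            names[(band, starts[0])] = band
        else:
            for letter, start in zip(ascii_lowercase, starts):
                names[(band, start)] = band + letter
    return names
```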
The 213 HERV-W sequences were first divided into three categories according to previously reported structural characteristics, which mostly concern the LTRs and reflect the mechanism of formation: proviruses (65), processed pseudogenes (135) and undefined elements (13). Briefly, with respect to the LTR17 RepBase consensus (780 nucleotides), proviral sequences show complete LTRs (referred to here as proviral LTRs) and were inserted into human DNA by conventional retroviral integration. Proviral LTRs have the usual composition of two unique regions (U3 and U5) separated by a repeated region (R), giving a U3-R-U5 structure. As described by Pavlícek et al., pseudogenes are LINE-1-processed HERV-W sequences presenting (1) truncated LTRs (referred to here as pseudogenic LTRs), with the 5′ LTR showing an R-U5 structure (starting from nucleotide 256 of the consensus) and the 3′ LTR showing a U3-R structure (ending at position 326 of the consensus), (2) a poly(A) tail of variable length, and (3) a common TT/AAAA insertion motif and a variable-length (5–15 bp) target site duplication. Finally, undefined elements are sequences that have lost these regions in both LTRs and therefore could not be assigned, owing to the absence of the signatures described above.
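As a rough illustration, the LTR signatures above can be turned into boundary checks against the LTR17 consensus coordinates. This sketch assumes each LTR has already been reduced to its aligned (start, end) span on the consensus, and the 20-nucleotide tolerance is a hypothetical parameter; the actual classification also relied on the poly(A) tail and target site duplication, which are not modeled here.

```python
from typing import Optional, Tuple

LTR_LEN = 780          # LTR17 RepBase consensus length
PSEUDO_5_START = 256   # pseudogenic 5' LTR (R-U5) starts here
PSEUDO_3_END = 326     # pseudogenic 3' LTR (U3-R) ends here
TOL = 20               # hypothetical tolerance for fuzzy alignment ends

# Aligned (start, end) on the consensus, or None if the LTR is absent.
Span = Optional[Tuple[int, int]]


def near(a: int, b: int) -> bool:
    return abs(a - b) <= TOL


def ltr_type(span: Span, side: str) -> Optional[str]:
    """'proviral', 'pseudogenic' or None for one LTR; side is '5prime' or '3prime'."""
    if span is None:
        return None
    start, end = span
    if near(start, 1) and near(end, LTR_LEN):
        return "proviral"
    if side == "5prime" and near(start, PSEUDO_5_START) and near(end, LTR_LEN):
        return "pseudogenic"
    if side == "3prime" and near(start, 1) and near(end, PSEUDO_3_END):
        return "pseudogenic"
    return None


def classify(five: Span, three: Span) -> str:
    types = {ltr_type(five, "5prime"), ltr_type(three, "3prime")} - {None}
    if types == {"proviral"}:
        return "provirus"
    if types == {"pseudogenic"}:
        return "processed pseudogene"
    return "undefined"  # no informative LTR, or conflicting signatures
```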
Our results differ from previous analyses performed several years ago on incomplete draft versions of the human genome [8, 30, 31] with different detection methodologies, which led to discordant results that are not always easy to retrieve and correlate with current data. On the one hand, two studies on HERV-W distribution and composition [30, 31] reported fewer elements than our dataset. In particular, Voisset et al. described the presence of 70, 100, and 30 HERV-W-related gag, pro, and env regions, respectively, without further indication of their origin, while Costas identified a total of 140 HERV-W elements, 73 fewer than the present analysis. On the other hand, the study by Pavlícek et al. reported a higher number of HERV-W sequences (311) than our dataset. The lack of supplementary information for the Pavlícek HERV-W dataset (e.g. nucleotide sequences or genomic locations) did not allow a direct comparison with our results. However, Pavlícek et al. retrieved their HERV-W sequences from a draft version of the human genome using the RepeatMasker program, which, in the presence of large recurrent deletions such as those observed in HERV-W sequences, may fail to identify whole elements. Hence, several fragments previously reported as independent elements possibly belonged to the same provirus or pseudogene. This hypothesis seems to be confirmed by a subsequent study that used the same dataset to analyze the length distribution of HERV-W processed pseudogenes. That report showed that the most represented length class in the Pavlícek dataset comprised very short sequences (0–0.5 Kb), with a low proportion of elements >3.5 Kb. In contrast, in our dataset >90 % of sequences fall in the 1–7.5 Kb range, with around 25 % >6.5 Kb.
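The length classes contrasted above can be tabulated with a small helper. This is an illustrative sketch with a hypothetical name; the classes deliberately overlap (1–7.5 Kb and >6.5 Kb), mirroring how they are quoted in the text, so the shares need not sum to 100 %.

```python
from typing import Dict, Iterable


def length_fractions(lengths_bp: Iterable[int]) -> Dict[str, float]:
    """Share (%) of elements falling in each of the length classes quoted in the text."""
    lengths = list(lengths_bp)
    n = len(lengths)
    return {
        "0-0.5 Kb": 100 * sum(x <= 500 for x in lengths) / n,
        "1-7.5 Kb": 100 * sum(1000 <= x <= 7500 for x in lengths) / n,
        ">6.5 Kb": 100 * sum(x > 6500 for x in lengths) / n,
    }
```

A dataset inflated by fragmented calls would shift weight into the first class, which is the pattern described for the earlier RepeatMasker-based dataset.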
Overall, the use of the ReTe software, which links conserved retroviral motifs into chains to reconstruct the original element, together with visual inspection of all aligned sequences and their flanking integration sites with respect to the group reference, probably led to more reliable sequence recognition. Furthermore, the overestimation of HERV-W members in the Pavlícek dataset could also be due to the possible inclusion of HERV9 sequences, which are highly related to HERV-W but constitute a separate phylogenetic group. To avoid such bias, we initially included a HERV9 consensus in every HERV-W phylogenetic tree, ensuring that none of the sequences classified as HERV-W clustered with the HERV9 group (data not shown). Importantly, a significant contribution to the knowledge of the HERV-W group in the human genome was recently provided by a study in which cDNA obtained from HERV-W RNA transcripts in brain samples of MS patients and controls was amplified in the env region and assigned to single HERV-W loci by Genome Browser BLAT on the NCBI36/hg18 assembly (March 2006). Although that study did not aim at a genomic characterization of the HERV-W group and was biased toward env sequences, it provided a remarkable genomic map of 176 HERV-W loci, including 35 proviruses, in its supplementary material. Compared with this study, our analysis identified 37 further HERV-W elements (9 proviruses, 18 processed pseudogenes and 10 undefined sequences) and provided a more defined classification of proviruses and processed pseudogenes.
To characterize the HERV-W structure, we first aligned and analyzed the 213-sequence dataset against the assembled reference LTR17-HERV17-LTR17, built from RepBase Update consensus sequences for the HERV-W LTRs and internal portion. HERV-W sequences showed a typical proviral structure, with the gag, pro, pol and env genes flanked by two LTRs. Briefly, the gag gene (nucleotides 2718–4191) encodes the structural matrix (MA), capsid (CA) and nucleocapsid (NC) proteins; the pro-pol genes (4195–7692) encode the three viral enzymes protease, reverse transcriptase (RT) and integrase (IN); and the env gene (7720–9348) encodes the envelope surface (SU) and transmembrane (TM) subunits. The 5′- and 3′ LTRs (1–780 and 9406–10186, respectively) are generated during reverse transcription and are identical at the time of integration. In addition, almost all identified HERV-W sequences contain a 2 Kb non-coding region, located between the 5′ LTR and the gag gene and characterized by an AG-rich expansion of variable length. This portion was previously reported for three HERV-W cDNA clones, but neither its function nor its origin has yet been proposed or demonstrated.
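The reference coordinates listed above can be used to annotate any position on the assembled reference. A minimal sketch, assuming 1-based inclusive coordinates; positions in the gaps between listed regions, other than the non-coding stretch between the 5′ LTR and gag, are simply reported as unannotated.

```python
import bisect

# Region boundaries on the assembled LTR17-HERV17-LTR17 reference, as given in the text.
REGIONS = [
    (1, 780, "5' LTR"),
    (2718, 4191, "gag"),
    (4195, 7692, "pro-pol"),
    (7720, 9348, "env"),
    (9406, 10186, "3' LTR"),
]
STARTS = [start for start, _, _ in REGIONS]


def region_of(pos: int) -> str:
    """Name of the reference region containing a 1-based position."""
    i = bisect.bisect_right(STARTS, pos) - 1  # last region starting at or before pos
    if i >= 0 and pos <= REGIONS[i][1]:
        return REGIONS[i][2]
    if 780 < pos < 2718:
        return "non-coding region between 5' LTR and gag"
    return "unannotated"
```

For instance, the nucleocapsid zinc finger coordinates quoted later in the text (4021 onward) fall within gag, and the GPY/F motif coordinates (7501 onward) within pro-pol.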
Phylogenetic analysis and HERV-W proposed subgroup classification
For proviral sequences, the 5′- and 3′ LTRs were analyzed together in the same phylogenetic tree (Fig. 4a). In contrast, the truncated structure of pseudogenic 5′- and 3′ LTRs leaves only a short common region (R; about 90 nucleotides), which required separate analyses (Fig. 4b, c). The gag, pol and env gene trees are included in the supplementary material (Additional file 2: Fig. S1).
In the LTR trees, the distribution of proviral and pseudogenic sequences into two major clusters allowed us to divide them into two distinct subgroups, named 1 and 2. The subgroup of each HERV-W member is reported in Additional file 1: Table S1. Of the 213 HERV-W group members, 69 % belong to subgroup 1 (38 proviruses and 108 pseudogenes) and 24 % to subgroup 2 (25 proviruses and 27 pseudogenes). The remaining 7 % consists of sequences lacking both LTRs, which consequently could not be classified.
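The subgroup shares can be checked directly from the counts given above. The remainder of 15 elements corresponds to the 13 undefined sequences plus 2 proviruses, a figure inferred from the totals (65 proviruses, 63 assigned to a subgroup) rather than stated explicitly in the text.

```python
TOTAL = 213

counts = {
    "subgroup 1": 38 + 108,  # proviruses + pseudogenes
    "subgroup 2": 25 + 27,
}
counts["unclassified"] = TOTAL - sum(counts.values())

# Percentages rounded to whole numbers, as reported in the text.
shares = {group: round(100 * n / TOTAL) for group, n in counts.items()}
```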
Table 1 Recurrent mutations in HERV-W subgroup 2 LTRs (column groups: PVd subgroup 2, PGe subgroup 2 and solo LTRs subgroup 2; type 2A and type 2B additional mutations in PVd subgroups 2A and 2B)
The identified key substitutions were then also investigated in the pseudogenic HERV-W dataset, where their strong association with the sequence distribution in the NJ trees was confirmed for the first 5 positions (96–100 % frequency in subgroup 2 versus 0–3.5 % in subgroup 1), while the last two mutations were shared by about 75 % of sequences (Table 1). Owing to the truncated structure of pseudogenic LTRs, the subgroup division was evident in the 3′ LTR tree (U3-R, positions 1–326 of LTR17), where 5 of the 7 key positions are retained. The pseudogenic 5′ LTRs (R-U5, positions 256–780) instead harbor only the two less represented key positions and showed a less defined topology, underlining the importance of the described substitutions for the phylogenetic structure of the group.
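Frequencies of this kind can be computed column by column from an alignment. The sketch below uses hypothetical toy data: the real key positions and variant bases are those listed in Table 1, and excluding gapped or missing columns from each site's denominator is an assumption suited to truncated pseudogenic LTRs.

```python
from typing import Dict, List, Tuple


def key_position_frequencies(
    aligned: Dict[str, str],           # sequence id -> gapped sequence aligned to LTR17
    subgroup: Dict[str, str],          # sequence id -> subgroup label
    key_sites: List[Tuple[int, str]],  # (1-based alignment column, subgroup-2 base)
) -> Dict[Tuple[int, str], Dict[str, float]]:
    """Percentage of sequences per subgroup carrying the subgroup-2 base at each key site.
    Sequences with a gap or a missing column at a site are left out of that site's
    denominator (an assumption of this sketch)."""
    result: Dict[Tuple[int, str], Dict[str, float]] = {}
    for col, base in key_sites:
        per_group: Dict[str, float] = {}
        for group in sorted(set(subgroup.values())):
            informative = [
                aligned[s][col - 1]
                for s, g in subgroup.items()
                if g == group and col <= len(aligned[s]) and aligned[s][col - 1] != "-"
            ]
            hits = sum(1 for b in informative if b.upper() == base)
            per_group[group] = 100 * hits / len(informative) if informative else float("nan")
        result[(col, base)] = per_group
    return result
```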
Extended analysis of HERV-W genomic LTRs
Given the relevance of LTR structural characteristics for HERV-W classification, we retrieved about 800 HERV-W LTRs present in the hg19 assembly via Genome Browser BLAT. This wider dataset was used to assess the overall reliability of the subgroup definition. The NJ tree analysis supported our classification, yielding a topology that resembled the one observed for proviral and pseudogenic LTRs (Additional file 3: Fig. S2) and a comparable distribution of solitary elements between the two subgroups (71 % classified as subgroup 1 and 29 % as subgroup 2). When investigated for recurrent substitutions, the key positions defined for subgroup 2 were confirmed to be shared by 87–98 % of subgroup members and to be rarely present (1–6 %) in the rest of the HERV-W LTR dataset.
The NJ trees built for the retroviral gag, pro, pol and env genes did not reveal any subgroup structure (Additional file 2: Fig. S1), and nucleotide analysis confirmed that the sequences share a comparable degree of similarity. This result shows that the phylogenetically relevant variation within the HERV-W group is located in the LTRs.
An LTR-based classification was previously suggested by Costas, who identified three distinct HERV-W subfamilies, named 1, 2 and 3, on the basis of nucleotide differences in a shorter version of the 3′ LTR, truncated at position 326 of LTR17 as is typical of pseudogenes. Our data indicate instead that there are only two main HERV-W subgroups: subgroup 1 (corresponding to Costas subfamily 3) and subgroup 2 (corresponding to Costas subfamilies 1 and 2). The subgroup 2 key mutations include the 5 mutations observed by Costas plus 2 more in the 3′-terminal portion of the LTR. Compared with the previous classification, the one we propose is primarily based on phylogenetic analysis, is corroborated by high-frequency key positions found in both 5′ and 3′ full-length LTRs, and is confirmed for the first time in a comprehensive dataset of HERV-W solitary LTRs.
Time of integration
It is known that, at the time of integration, the 5′- and 3′ LTRs of the same provirus are identical and then accumulate random substitutions independently. Hence, to estimate the age of the HERV-W group, we assumed a substitution rate of 0.13 % per nucleotide per million years for the human genome and used this rate to convert the divergence of each HERV-W sequence into time. On this basis, we calculated the percentage of divergent nucleotides (D %) (1) between the 5′- and 3′ LTRs of each HERV provirus; (2) between each LTR (proviral and pseudogenic) and a consensus generated for each subgroup; and (3) between a 150–300 nucleotide region of each HERV-W internal gene (gag, pro, pol RT, pol IN and env; proviral and pseudogenic) and a generated consensus. For the two consensus-based approaches, since substitutions accumulate randomly in each sequence, the subgroup-generated consensus should ideally approximate the ancestral sequence.
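The divergence measures above can be sketched as an uncorrected p-distance plus a conversion to time. Two points are assumptions rather than statements from the text: gapped columns are ignored, and the LTR versus LTR distance is divided by twice the rate (both LTRs diverge independently from an identical start), whereas a sequence versus consensus distance is divided by the rate once, a common convention for this kind of estimate. The consensus is built by simple majority rule, one plausible way to obtain a generated consensus; all function names are hypothetical.

```python
from collections import Counter
from typing import List

RATE = 0.13  # % substitutions per nucleotide per million years, as assumed in the text


def majority_consensus(aligned: List[str]) -> str:
    """Column-wise majority rule over gapped, equal-length sequences; gaps do not vote."""
    consensus = []
    for column in zip(*aligned):
        bases = Counter(b for b in column if b != "-")
        consensus.append(bases.most_common(1)[0][0] if bases else "-")
    return "".join(consensus)


def divergence_percent(a: str, b: str) -> float:
    """Uncorrected p-distance (D %) over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return 100 * sum(x != y for x, y in pairs) / len(pairs)


def age_ltr_vs_ltr(ltr5: str, ltr3: str) -> float:
    # Two independently mutating copies: divergence accumulates at twice the rate.
    return divergence_percent(ltr5, ltr3) / (2 * RATE)


def age_vs_consensus(seq: str, consensus: str) -> float:
    # The consensus stands in for the ancestor, so only one lineage accumulates changes.
    return divergence_percent(seq, consensus) / RATE
```

Under these assumptions, a 5 % divergence from the subgroup consensus would correspond to roughly 38 million years, whereas the same 5 % between the two LTRs of one provirus would correspond to roughly 19 million years.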
It is important to note that the traditional comparison of the two LTRs of the same sequence alone would not be sufficient for a reliable estimation. In fact: (1) the LTR versus LTR method could not be applied at all to pseudogenic sequences, which represent 63 % of the whole dataset, because of the short region shared by their 5′- and 3′ LTRs; (2) even for proviral sequences, the loss of one or both LTRs made this calculation possible for only 70 % of them (23 % of all HERV-W members). The two additional approaches complemented and improved the estimation of integration time, allowing us to consider a larger subset of elements (94 % of all HERV-W members) and to include pseudogenes as well as older and less intact sequences, which were not previously taken into account. Importantly, combining multiple divergence calculations also improved the reliability and precision of the age estimates. Expressing the integration time of each HERV-W sequence as an average value made it possible to determine the standard deviation and to reduce estimation biases related to outliers and to the different selective pressures reported to act on LTRs compared with the rest of the retroviral genome (Fig. 5b). For some proviruses, the age estimate was 0.3- to 2-fold higher with the LTR versus consensus method than with the LTR versus LTR method. Although there is no clear explanation, it is possible to speculate that the exogenous viruses that gave rise to these sequences harbored some nucleotide differences in their LTRs that are not properly represented in the consensus sequence, which is built on the majority of viruses, leading to an apparently higher number of mutations. In addition, the data showed a higher divergence in the gag, pol-3′ (including IN) and env portions, leading to older age estimates than for the internal pro and pol-5′ regions (Fig. 5c), suggesting different mutation rates in specific viral portions.
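Averaging the per-method estimates for a locus and reporting their spread can be sketched as below. The helper name is hypothetical, and the use of the sample (n − 1) standard deviation is an assumption; the text does not state which estimator was used.

```python
from statistics import mean, stdev
from typing import Dict, Tuple


def combined_age(estimates: Dict[str, float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-method ages (million years) for one
    locus. At least two estimates are required, since a spread cannot be computed
    from a single value."""
    values = list(estimates.values())
    if len(values) < 2:
        raise ValueError("need at least two independent estimates")
    return mean(values), stdev(values)
```

A locus whose LTR versus consensus estimate sits far from its gene-based estimates will show a large standard deviation, which is one way to flag the discrepancies discussed above.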
Taken together, these results suggest that HERV-W integration started about 40 million years ago in Catarrhini primates, after the divergence between New World and Old World monkeys. This is in line with previous studies [31, 57], which were based either on the presence of HERV-W PCR products in blood samples from different Old World monkeys or on divergence calculations among HERV-W subfamilies, and thus gave only a general overview of HERV-W acquisition in primates. In the present study, the time of insertion was estimated for each HERV-W locus through at least two different methods of age calculation, providing a precise and exhaustive picture of the group's spread among primates, with a rather long period of activity that lasted until 25–20 million years ago.
The estimated age of individual HERV-W sequences was generally also supported by the identification of each locus's orthologs in other primates, back to the oldest common ancestor (O.C.A.) (Additional file 1: Table S1). Most sequences are shared from human to rhesus macaque (61 %) or to gibbon (31 %), implying integrations that must have occurred after the separation from the Platyrrhini parvorder (40 million years ago) and before the divergence of rhesus macaque (around 30 million years ago) or gibbon (around 20 million years ago), respectively, from the lineage leading to humans. A few elements were found only from orangutan (12), gorilla (3) or chimpanzee (2) onward (Additional file 1: Table S1), but in these cases the estimated age was higher than expected. This probably suggests that such sequences were lost in older primate lineages, although their absence in rhesus and gibbon could also be due to the lower efficiency of Genome Browser comparisons between the human genome and the assemblies of more distantly related Catarrhini. Finally, a single HERV-W element, at locus 12q13.3, was found only in the human genome assembly hg19. This finding is unexpected because no human-specific HERV-W elements have been reported so far, but it could not be supported by a reliable age estimate owing to the shortness of the sequence (about 1500 nucleotides) and the lack of both LTRs.
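The orthology-based bracketing can be expressed as a lookup over lineages ordered by divergence time. This sketch uses only the approximate divergence times quoted in the text; the lineage list, the function name and the assumption that presence is nested (no secondary losses, which the text notes may not always hold) are illustrative.

```python
from typing import Dict, List, Optional, Tuple

# (lineage, approximate divergence from the human lineage in Myr), closest first.
# Other great apes would slot in between human and gibbon.
LINEAGES: List[Tuple[str, float]] = [
    ("human", 0.0),
    ("gibbon", 20.0),
    ("rhesus macaque", 30.0),
    ("New World monkeys", 40.0),
]


def insertion_window(has_ortholog: Dict[str, bool]) -> Tuple[str, float, Optional[float]]:
    """Return (oldest common ancestor lineage, minimum age, maximum age) in Myr.
    The insertion predates the split of the most distant lineage sharing the locus and
    postdates the split of the next, more distant lineage (None if there is none)."""
    oca_index = 0
    for i, (lineage, _) in enumerate(LINEAGES):
        if has_ortholog.get(lineage, False):
            oca_index = i
    oca, min_age = LINEAGES[oca_index]
    max_age = LINEAGES[oca_index + 1][1] if oca_index + 1 < len(LINEAGES) else None
    return oca, min_age, max_age
```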
PBS type and gammaretroviral features
We also identified and analyzed structural features typically shared among retroviral sequences within the same genus, which can be used as taxonomic and phylogenetic markers. As previously reported, the main gammaretroviral features are (1) one nucleocapsid zinc finger motif, involved in binding the retroviral RNA during packaging; (2) the GPY/F motif in the C-terminal integrase (IN) domain of pol, which binds host DNA and could have a role in integration specificity [61, 62]; and (3) a nucleotide frequency bias caused by the action of encapsidated host RNA editing systems.
The gag nucleocapsid zinc finger, corresponding to nucleotides 4021–4062 of the RepBase assembled LTR17-HERV17-LTR17 reference sequence, has the typical CX2CX4HX4C amino acid motif. It was found in almost all sequences that retained the corresponding region, with a higher prevalence in proviral sequences, which were also the most complete in terms of genetic composition. Notably, a second zinc finger was identified in 96 % of the sequences (nucleotides 4093–4130). This second zinc finger has a modified structure compared with the usual one, lacking one of the variable residues (CX2CX3HX4C). The amino acid composition of the two motifs was highly conserved (Fig. 7b). The presence of a second zinc finger had not been previously reported for the HERV-W group, and its structure is consistent with the second zinc finger found in a subset of HERV-H sequences, another gammaretroviral HERV group. However, while a correlation between the presence of this second motif and sequence age was proposed for HERV-H, we could not observe such a correlation for HERV-W (data not shown).
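Searching translated Gag sequences for the two motifs amounts to a pair of regular expressions. This is a minimal sketch, assuming the nucleotide region has already been translated in the correct frame; the function name is hypothetical and the toy peptide in the check is invented to contain one motif of each type.

```python
import re
from typing import List, Tuple

# Motifs from the text; [^*] keeps a match from spanning an in-frame stop codon.
ZF_CANONICAL = re.compile(r"C[^*]{2}C[^*]{4}H[^*]{4}C")  # CX2CX4HX4C
ZF_SECOND = re.compile(r"C[^*]{2}C[^*]{3}H[^*]{4}C")     # CX2CX3HX4C


def find_zinc_fingers(protein: str) -> List[Tuple[str, int, str]]:
    """Return (motif type, 0-based start, matched peptide) for the non-overlapping
    hits of each pattern, sorted by position."""
    hits = []
    for label, pattern in (("canonical", ZF_CANONICAL), ("second", ZF_SECOND)):
        hits.extend((label, m.start(), m.group()) for m in pattern.finditer(protein))
    return sorted(hits, key=lambda h: h[1])
```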
The IN domain contains a GPY/F motif, a stretch of conserved amino acids with the general structure WXnGPYXV, corresponding to nucleotides 7501–7521 in the reference sequence. Since the C-terminal part of the pol gene was deleted in 85 % of sequences, the motif could be assessed only in the remaining few members, where it was found with 100 % frequency. For this feature too, logo analysis showed a conserved amino acid sequence (Fig. 7c).
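The frequency reported above refers only to elements that retain the C-terminal IN region. This sketch makes that denominator explicit; the function name is hypothetical, and the upper bound of 10 residues for Xn in the WXnGPYXV pattern is an assumption, since the text does not specify n.

```python
import re
from typing import Optional, Sequence

# WXnGPY/FXV with n limited to 1-10 residues (the bound is an assumption).
GPYF = re.compile(r"W[^*]{1,10}GP[YF][^*]V")


def motif_frequency(in_peptides: Sequence[Optional[str]]) -> float:
    """in_peptides: translated C-terminal IN region per element, or None when deleted.
    Deleted regions are excluded from the denominator, so the percentage refers only
    to elements that retain the region."""
    retained = [p for p in in_peptides if p is not None]
    if not retained:
        raise ValueError("no element retains the region")
    return 100 * sum(bool(GPYF.search(p)) for p in retained) / len(retained)
```

With 17 of 20 elements deleted (85 %), a motif present in both remaining elements still yields 100 %, which is how the figure in the text should be read.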
Regarding nucleotide composition, HERV-W members show a weak bias within purines, tending to be richer in adenine (about 30 %) and poorer in guanine (around 22 %) (data not shown). Among gammaretroviruses, G depletion was previously observed for the HERV-H group in association with a higher cytosine content, while G-to-A hypermutation was reported for the HERV-K group and is a well-known effect of the APOBEC3 defensive action against the lentivirus HIV-1. Hence, it is possible to speculate that this editing system could have acted as a control mechanism limiting the mobility of HERV-W and other endogenous elements during evolution, also considering the ability of APOBEC3 to strongly inhibit the LINE-mediated transposition of other retroelements.
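Pooled base composition of the kind summarized above can be obtained with a simple counter. This sketch uses a hypothetical function name and ignores gaps and ambiguity codes, which is an assumption about how such percentages are computed.

```python
from collections import Counter
from typing import Dict, Iterable


def base_composition(sequences: Iterable[str]) -> Dict[str, float]:
    """Pooled percentage of A, C, G and T across all sequences.
    Gaps and IUPAC ambiguity codes are skipped."""
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: 100 * counts[b] / total for b in "ACGT"}
```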
Genomic context of insertion
The current major focus of HERV research is their expression and coding capacity; however, the impact of these sequences on the host also depends on their genomic surroundings. The integration context can modulate HERV activity, and HERV sequences inserted near human genes are known to be able to influence their expression [39–41, 43, 45, 48, 49]. As reported for other HERV groups, the analysis of the genomic context of all 213 HERV-W elements confirmed that most sequences are located in intergenic regions, with the exception of 80 elements inserted into human genes.
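Deciding whether an element falls within a gene, and in which intron, reduces to interval overlap against exon coordinates. This is a minimal sketch with hypothetical names: introns are numbered 5′ to 3′ along the gene's own strand, and the sketch does not attempt to reproduce the orientation notation used in the table that follows.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # 1-based inclusive genomic coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def locate_in_gene(element: Interval, exons: List[Interval], gene_strand: str) -> Optional[str]:
    """'exonic', 'intron k', or None when the element lies outside the gene span.
    An element inside the gene span that touches no exon lies within a single intron."""
    exons = sorted(exons)
    if not overlaps(element, (exons[0][0], exons[-1][1])):
        return None  # intergenic with respect to this gene
    if any(overlaps(element, e) for e in exons):
        return "exonic"
    introns = [(exons[i][1] + 1, exons[i + 1][0] - 1) for i in range(len(exons) - 1)]
    n = len(introns)
    idx = next(i for i, intron in enumerate(introns) if overlaps(element, intron))
    # On the minus strand, transcription runs toward lower coordinates.
    number = idx + 1 if gene_strand == "+" else n - idx
    return f"intron {number}"
```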
HERV-W genomic context: insertions into human coding genes
Gene | Insertion site | Gene or encoded protein function and associations
HIVEP3 | Int 1 (−) | Transcription factor; binds Ig and T-cell receptor recombination signals
RASAL2 | Int 1 (+)b | RAS superfamily of small GTPases protein activator-like. Associations: BMI, weight
ZNF678 | Int 2 (+)a,b | Zinc finger protein. Associations: body height
LCLAT1 | Int 2 (+)b | Predominantly remodels anionic phospholipids in the endoplasmic reticulum
ASB3 | Int 1/2 (−)b | Suppressor of cytokine signaling proteins and their binding partners
KYNU | Int 2 (+)a,b | Biosynthesis of NAD cofactors from tryptophan. Associations: body height, cholesterol, schizophrenia
COBLL1 | Int 2 (−)a | Cordon-bleu WH2 repeat protein-like 1. Associations: BMI, cholesterol, HDL, triglycerides, stroke, response to statin therapy, anthropometric sexual dimorphism
AGPS | Int 1 (+)b | Mutations cause rhizomelic chondrodysplasia punctata type 3
DIRC3 | Int 1 (−)b | Disrupted in renal carcinoma 3 long non-coding RNA. Associations: diabetes mellitus
SLC22A14 | Int 1 (+)b | Solute carrier transmembrane protein
NEK11 | Int 14/13 (+)b | Never in mitosis (NIMA)-related kinase. Involved in DNA replication and in the G2/M checkpoint response to DNA damage. Related to embryonic lethality and preeclampsia
XRN1 | Int 1 (−)b | Exoribonuclease involved in long non-coding RNA decapping and miRNA regulation
ZMAT3 | Int 2/3 (−)a,b | Zinc finger, matrin-type 3. Acts as a bona fide target gene of p53/TP53
ZNF595 | Int 3 (+) | Zinc finger protein. Functions as a transcription factor
ACOX3 | Int 1 (−)a,b | Oxidizes the CoA esters of 2-methyl-branched fatty acids
ARFIP1 | Int 2 (+)b | ADP-ribosylation factor interacting protein 1. Enhances cholera toxin activity
DEPDC1B | Int 2 (−)b | Significantly upregulated in non-small cell lung carcinoma cell lines (associated with reduced patient survival)
ACOT13 | Int 1 (+) | Acyl-CoA thioesterase. Involved in the regulation of lipid composition and metabolism
EYS | Int 13 (−)b | Expressed in the photoreceptor layer; mutated in autosomal recessive retinitis pigmentosa
TBX18 | Int 7 (−)a,b | Role in embryonic development. Associations: cholesterol, coronary disease
ATG5 | Int 6 (−)a,b | Autophagy-related, apoptosis-specific protein. Associations: lipoproteins, systemic lupus erythematosus
PDSS2 | Int 2 (−)b | Prenyl (decaprenyl) diphosphate synthase, subunit 2; synthesizes the side chain of coenzyme Q. Associated with primary coenzyme Q10 deficiency 3 (fatal encephalomyopathy and nephrotic syndrome)
SLC16A10 | Int 1 (+)a,b | Na+-independent transport of aromatic amino acids across the plasma membrane. Associations: cholesterol, LDL
AIG1 | Int 1 (+)b | Androgen-induced. Associations: C-reactive protein, insulin, myocardial infarction
BZW2 | Int 3 (+) | Basic leucine zipper and W2 domains 2
SUGCT/C7orf10 | Int 1 (+)b | Mutations are associated with glutaric aciduria type III. Other associations: BMI, fat distribution, cardiomegaly, coronary disease, pancreatic and prostatic neoplasms
NRCAM | Int 2 (−)a | Neuronal cell adhesion molecule. Associations: autism, obsessive-compulsive disorder, schizophrenia
FOXP2 | Int 2 (+)a,b | Required for the development of speech and language regions of the brain during embryogenesis. Associated with speech-language disorders
SLC18A1 | Int 10/11 (−)b | Involved in vesicular transport of biogenic amines. Associations: bipolar disorder, major depressive disorder
NKAIN3 | Int 3 (+) | Na+/K+-transporting ATPase interacting protein. Associations: mental competency, neuroblastoma, stroke
CYP7B1 | Int 1 (−)b | Cytochrome P450 enzyme. Associations: congenital bile acid synthesis defect, spastic paraplegia. Other associations: Alzheimer disease, lipoproteins, schizophrenia
UBE2W | Int 2 (−)b | Ubiquitin-conjugating enzyme; together with ubiquitin-activating (E1) and ligating (E3) enzymes, coordinates the addition of ubiquitin to proteins. Interacts with FANCL and regulates the monoubiquitination of the Fanconi anemia protein FANCD2
ZNF704 | Int 2 (−)b | Zinc finger protein
PTPRD | Int 12 (−)b | Protein tyrosine phosphatase, receptor type D. Restless legs syndrome. Associations: asthma, BMI, cholesterol, lipids, lipoproteins, triglycerides, diabetes
CD72 | Int 1 (−)a,b | B-cell proliferation and differentiation antigen. Associations: lupus erythematosus
CYP2C19 | Int 6 (+)b | Cytochrome P450 enzyme responsible for the metabolism of therapeutic agents. Associated with metabolic defects and variants
ENTPD1 | Int 1 (+)b | Ectonucleoside triphosphate diphosphohydrolase 1. Associated with spastic paraplegia
ANO3 | Int 14 (+)a,b | May act as a chloride channel. Associations: dystonia 24. Other associations: BMI, obesity, C-reactive protein, cholesterol, coronary disease, schizophrenia
AAMDC | Int 2 (+)b | Adipogenesis-associated Mth938 domain containing
PRSS23 | Int 2 (+)b | Conserved member of the trypsin family of serine proteases
RIMKLB | Int 5 (+)b | Catalyzes the ATP-dependent condensation of N-acetylaspartate (NAA) and glutamate to produce N-acetylaspartylglutamate (NAAG)
SLC41A2 | Int 1 (−) | Solute carrier family 41 member 2
ALG5 | Int 7/8 (−)b | Participates in N-linked glycosylation of proteins
TCRA | Int 1 (+)b | T-cell receptor alpha locus
FAM179B | Int 7 (+) | Family with sequence similarity 179 member B
C14orf37 | Int 4 (−)b | Associations: attention deficit disorder with hyperactivity
SLFN14 | Int 3 (−)b | Implicated in the regulation of cell growth and T-cell development (studies in mouse)
ACACA | Int 2/6 (−)b | Biogenesis of long-chain fatty acids. Associations: BMI, breast cancer
STXBP4 | Int 8 (+)b | Translocation of transport vesicles from the cytoplasm to the plasma membrane, such as insulin-stimulated GLUT4 translocation in adipocytes. Associations: BMI, cholesterol
ZNF90 | Int 1 (+)b | Zinc finger protein 90. May be involved in transcriptional regulation
ZNF780A | Ex 9 (−)b | Zinc finger protein 780A
CYP2A7 | Int 1 (−)b | Cytochrome P450, family 2, subfamily A, polypeptide 7
IGSF5 | Ex 1–2, Int 1 (+)b | Participates in tight junctions (kidney, gut) or acts as an adhesion molecule (testis). Associations: coronary disease, lipoproteins, Parkinson disease, stroke
FAAH2 | Int 7 (+)b | Degradation and inactivation of bioactive fatty acid amides
CD24 | Int 1 (−) | Surface antigen of mature granulocytes and B cells
HERV-W genomic context: insertions into human non-coding genes
Gene | Insertion site | Gene function and associations
LOC101929147 | Int 4 (+) | Uncharacterized antisense long non-coding RNA
TCONS_00000271 | Int 3 (+) | Large intergenic non-coding RNA
LOC284581 | Int 1 (+)b | Uncharacterized antisense long non-coding RNA
STARD7-AS1 | Int 1 (+)b | StAR-related lipid transfer domain protein 7 antisense long non-coding RNA (LOC285033)
TCONS_00004484 | Int 1 (−) | Long intergenic non-coding RNA
MIR548N | Int 1 (+)b | microRNA 548n
CLRN1-AS1 | Int 1 (+) | CLRN1 antisense non-coding RNA
TCONS_00007753 | Int 1 (−) | Long intergenic non-coding RNA
LOC100507053 | Int 1 (+) | Uncharacterized antisense long non-coding RNA
TCONS_00007833 | Int 1 (−) | Long intergenic non-coding RNA
MIR5684 | Int 2 (+) | MicroRNA involved in post-transcriptional regulation of gene expression
TCONS_00011526 | Ex 1, Int 1 (−) | Long intergenic non-coding RNA
TCONS_l2_00024517, TCONS_l2_00024518, TCONS_l2_00024519 | Int 2, Ex 3 (+); Int 1, Ex 2 (+); Int 1 (+) | Long intergenic non-coding RNAs
DQ594967 | Ex 1 (−)b | Antisense non-coding RNA
TCONS_00015019 | Int 1 (−) | Long intergenic non-coding RNA
LOC441389 | Int 5, Ex 6 (+)b | Uncharacterized long non-coding RNA
TCONS_00017977 | Int 1 (−) | Long intergenic non-coding RNA
PRSS23 | Int 2 (+) | Protease, serine 23 near-coding RNA
TMPRSS4-AS1 | Int 2 (−)b | Antisense non-coding RNA
LINC00383 | Ex 1, Int 1 (+) | Long intergenic non-coding RNA
TCONS_00021873 | Int 2 (+) | Long intergenic non-coding RNA
MIR548XHG | Ex 1, Int 1 (−) | MIR548X host gene long non-coding RNA
AL163953.3 | Int 3 (+) | Long non-coding RNA
AK125686 | Int 2 (−)b | Antisense non-coding RNA
TCONS_00016997 | Ex 1–2, Int 1 (+) | Long intergenic non-coding RNA
HERV-W genomic context: transcription factor (TF) binding sites
Env putative proteins analysis
Env puteins analysis
ORF length (amino acids)
Notably, seven Env puteins retain a coding sequence without internal stop codons. Among them, three env genes (4q13.3, 5q11.2 and Xp22.31) are theoretically long enough to encode a complete protein (Additional file 4: Fig. S3). However, even though uninterrupted, these ORFs show frameshifts relative to the Syncytin-1 reading frame. The 20q13.2 (483 aa) and 4q21.22 (320 aa) sequences are the most conserved with respect to Syncytin-1, presenting no stop codons and only one frameshift, between positions 441–442 and 75–76, respectively. Conversely, Xq22.3b (542 aa) and 9q22.31 (267 aa) present no frameshifts but show a single internal stop codon (at positions 39 and 149, respectively) that could potentially be reverted by a single point mutation, as already demonstrated ex vivo for the Xq22.3b N-trenv.
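Whether a stop codon can be reverted by one point mutation depends only on the genetic code. The Python sketch below assumes the standard genetic code and uses illustrative codons rather than the actual Xq22.3b or 9q22.31 sequences; it lists the codons for a target amino acid that lie one substitution away from a given stop codon. W is used as the target because position 39 carries a W in Syncytin-1:

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def single_point_reversions(stop_codon: str, target_aa: str) -> list:
    """Codons encoding target_aa that differ from stop_codon at exactly one position."""
    stop_codon = stop_codon.upper()
    if CODON_TABLE.get(stop_codon) != "*":
        raise ValueError(f"{stop_codon} is not a stop codon")
    hits = []
    for i in range(3):
        for base in BASES:
            if base == stop_codon[i]:
                continue
            mutant = stop_codon[:i] + base + stop_codon[i + 1:]
            if CODON_TABLE[mutant] == target_aa:
                hits.append(mutant)
    return hits
```

For example, TAG and TGA each differ from TGG (W) at a single position, so either stop codon can revert to W with one change, whereas TAA would need two.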
Regarding amino acid composition, all investigated Env puteins have accumulated several substitutions, giving an average identity of about 85 % with the Syncytin-1 sequence. To evaluate the possible biological activity of the puteins, we characterized in detail the motifs known to be most involved in the physiological function of Syncytin-1. First, the envelope precursor must be processed into the mature SU and TM units through a proteolytic cleavage at the conserved RKNR furin cleavage site. Mutation of this conserved motif has been reported to abrogate proteolytic cleavage and the fusogenic activity of Env proteins, which also appeared at the cell membrane with delayed kinetics compared with the wild-type envelope. In the HERV-W puteins, the RKNR motif was frequently mutated at the first position, mostly through the conversion of R to C or H (73 % of the analyzed ORFs), but was maintained in 7 sequences. After cleavage, the mature SU and TM proteins are linked through a covalent disulphide bond between the SU CWIC and the TM CX6CC motifs. While the TM motif showed a high degree of amino acid homology with Syncytin-1, in the SU motif we found an I > M substitution in 100 % of sequences. Another fundamental step driving fusion activity is the interaction between the N-terminal 124-aa receptor-binding domain of SU and a human sodium-dependent neutral amino acid transporter (hASCT1 or hASCT2), which acts as the type D mammalian retrovirus receptor. Within the binding domain, the SDGGGX2DX2R motif was recognized as essential for receptor contact, and it was found in 58 % of the sequences. The fusogenic activity of Syncytin-1 resides in the TM portion, which includes a fusion peptide and a fusion core formed by the amino- and carboxy-terminal heptad repeats.
In the Env puteins, the fusion peptide carried at least one substitution, with residue A332 mutated in all sequences into R or G (and in one case into E). The fusion core was also affected by several mutations in both heptad regions, such as the R433Q substitution present in 25 out of 26 carboxy-terminal repeats. Interestingly, the 75-amino-acid heptad repeat region also showed a higher concentration of internal stop codons, harboring 50 % of all stop codons found in the analyzed puteins. Moreover, in conventional Env proteins the fusogenic activity is prevented by an inhibitory R peptide located in the TM intracytoplasmic tail, which is normally removed by viral proteases. In Syncytin-1, a four-amino-acid deletion at the LQMV cleavage site made the protein constitutively competent for fusion. This deletion was not present in any other analyzed HERV-W Env putein. Finally, the Syncytin-1 TM subunit also contains a conserved immunosuppressive domain that was thought to contribute to maternal immunotolerance, although later findings suggested the absence of this activity. In any case, in the selected Env puteins this domain presents several amino acid substitutions and, in 5 sequences, a premature termination at position 383. Hence, compared with the Syncytin-1 protein encoded by locus 7q21.2, the Env puteins of the other HERV-W loci are highly defective, especially at sites involved in known physiological functions. Nevertheless, despite these mutations, they may still be able to produce shorter proteins with biological significance and/or a role in disease development, as observed for other HERV sequences.
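The motif checks described above can be approximated by pattern matching on a translated putein. The following Python sketch converts the compact motif notation used in the text (X for any residue, Xn for n residues) into regular expressions. The motif list is taken from the text; the test sequence is a hypothetical toy peptide, not a HERV-W putein. Only exact occurrences are reported, so a putein carrying the I > M change (CWMC) shows no CWIC hit:

```python
import re

# Motifs named in the text, in X/Xn notation.
MOTIFS = {
    "furin cleavage site": "RKNR",
    "SU CWIC": "CWIC",
    "TM CX6CC": "CX6CC",
    "receptor binding SDGGGX2DX2R": "SDGGGX2DX2R",
}


def motif_to_regex(motif: str) -> str:
    """Translate X (any residue) and Xn (n residues) into regex syntax."""
    return re.sub(r"X(\d*)", lambda m: f".{{{m.group(1)}}}" if m.group(1) else ".", motif)


def scan_motifs(protein: str) -> dict:
    """1-based start positions of each motif (non-overlapping matches)."""
    protein = protein.upper()
    return {
        name: [m.start() + 1 for m in re.finditer(motif_to_regex(motif), protein)]
        for name, motif in MOTIFS.items()
    }
```

An empty list for a motif flags a putein in which that functional site has been altered.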
Because it is retained despite the large recurrent flanking deletions that affect 85 % of HERV-W env genes, the short env portion of about 30 nucleotides at positions 8289–8318 was also translated and compared with Syncytin-1. As shown in Additional file 6: Fig. S5, all 138 HERV-W elements that retain this portion showed recurrent amino acid substitutions. In particular, the N at position 3 was changed in 136/138 sequences, being substituted by H in 93 % of the elements, while the V at position 8 was substituted in 135 sequences, with an I in 90 % of cases. This prevalence indicates that the Syncytin-1 sequence is probably the exception, suggesting a previously unreported functional relevance of this short domain.
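Counts such as 136/138 substituted sequences come from tallying, column by column, the residues that differ from the Syncytin-1 reference. A minimal Python sketch, assuming the puteins are already aligned to an ungapped reference and padded to the same length; the peptides in the test are hypothetical:

```python
from collections import Counter


def substitution_profile(reference: str, puteins: list) -> dict:
    """Map each 1-based reference position to a Counter of substituted residues.

    Positions are alignment columns, which equal reference positions only
    when the reference itself carries no gaps. Gaps ('-') in a putein are
    not counted as substitutions.
    """
    profile = {}
    for seq in puteins:
        if len(seq) != len(reference):
            raise ValueError("puteins must be aligned to the reference")
        for pos, (ref_aa, aa) in enumerate(zip(reference, seq), start=1):
            if aa != ref_aa and aa != "-":
                profile.setdefault(pos, Counter())[aa] += 1
    return profile


def summarize(profile: dict, n_sequences: int) -> list:
    """One line per substituted position: sequences changed and commonest residue."""
    lines = []
    for pos in sorted(profile):
        changes = profile[pos]
        residue, count = changes.most_common(1)[0]
        lines.append(
            f"position {pos}: changed in {sum(changes.values())}/{n_sequences}, "
            f"most often to {residue} ({100 * count / n_sequences:.0f} % of sequences)"
        )
    return lines
```

Whether percentages are expressed over all sequences (as here) or only over the changed ones is a reporting choice that should be stated explicitly.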
MSRV sequences homology with HERV-W elements
HERV-W loci homology of previously described MSRV sequences and probes
MSRV GenBank entry
n° of discordant bases
Mapped portion in LTR17-HERV17-LTR17
AF127227 (544 bp)
3q23a* (99.5 %)
AF127228 (1932 bp)
Xq22.3b* (99.6 %)
pol-env (5444–5838 and 7682–9200)
AF127229 (2004 bp)
3p12.3* (99.9 %)
pol-env-3′ LTR (5452–6792 and 8290–8318 and 9115–9732)
18q21.32* (99.9 %)
AF123882 (2477 bp)
12q21.3* (99.8 %)
AF331500 (1629 bp)
Xq22.3b* (99.7 %)
5p12* (99.4 %)
AF123881 (1511 bp)
3q26.32* (99.9 %)
AF009668 (2304 bp)
1p34.2 (99.1 %)
2p12a (100 %)
2p24.2 (100 %)
6q27b (98.5 %)
6q15 (97.2 %)
3p12.3 (99.4 %)
AF009666 (324 bp)
1p34.2 (99.5 %)
AF009667 (118 bp)
17q22 (98.2 %)
AF123880 (1003 bp)
5p12 (99.6 %)
5′ LTR (255–803)
3p24.1 (100 %)
3q26.32 (98 %)
AF072494 pol probe (678 bp)
6q21b (99.6 %)
AF072496 gag probe (536 bp)
6q21b (99.6 %)
AF072497 pro probe (364 bp)
1p34.2 (99.2 %)
pro-pol (4166–4522 and 5641–5549)
AF072498 env probe (591 bp)
Xq22.3b (99.5 %)
The MSRV sequences containing an env gene (or a portion of it) and showing the highest identity with one of the HERV-W loci analyzed for Env puteins were manually translated and aligned with the corresponding HERV-W Env putein and the Syncytin-1 protein for further comparison (Additional file 7: Fig. S6). Interestingly, relative to the Syncytin-1 sequence, the HERV-W puteins and the corresponding MSRV puteins shared the great majority of amino acid substitutions, and often the same amino acid change was common to all sequences analyzed. AF127227 and 3q23a share the same frameshift at position 270 of the Syncytin-1 sequence. Moreover, AF127227 and AF127228 show an internal stop codon at the same position observed in 3q23a and Xq22.3b, respectively (position 39, a W in Syncytin-1). By contrast, AF331500 lacks this internal stop codon and, like Syncytin-1, presents a W at this position. As already observed for HERV-W, the MSRV Env puteins also showed at least one amino acid change in all domains relevant to the biological activity of Syncytin-1. Given the proposed role of MSRV Env proteins in pathogenesis, the presence of shared recurrent substitutions, which possibly prevent the functionality of MSRV Env puteins compared with Syncytin-1, raises further questions that will have to be addressed. Overall, while more MSRV RNA expression studies are needed, the HERV-W genomic map and characterization reported here are a further step toward properly assessing the MSRV/HERV-W role in the context of MS.
Since the discovery of the role of Syncytin-1 in placentation [11–13, 92], great attention has been devoted to the expression potential of the HERV-W group, in an effort to better understand its impact on the host. Many studies have focused on correlations between HERV-W and several human diseases, primarily MS [15–21, 28, 75, and reviewed in 76] and other major neurological pathologies such as schizophrenia and bipolar disorder [23, 25, 93]. Despite this broad investigation, no definitive correlation between HERV-W group expression and any human disease has been confirmed. Even in MS, the most extensively studied case, the findings remain highly discordant. One of the problems in this scenario is the lack of a complete and updated description of the HERV-W sequences in the human genome, of their genomic background and of the individual HERV-W members. Such information could help interpret the wide range of HERV-W expression data collected so far.
Therefore, using more up-to-date genome data and a dual bioinformatics identification approach, we analyzed the GRCh37/hg19 assembly and identified a total of 213 unambiguously classified HERV-W members. Each HERV-W sequence was precisely localized and characterized in terms of structure, phylogeny and evolution, allowing each individual HERV-W member to be specifically identified and highlighting various previously unreported characteristics of the group.
First, we observed several nucleotide differences between HERV-W members and the assembled LTR17-HERV17-LTR17 reference, which was built from a small number of sequences and therefore does not properly represent the entire group. Second, we classified the HERV-W members into two subgroups through an LTR phylogenetic analysis, strongly supported by the identification of key mutated positions in both LTRs that are shared by the majority (from 95 % up to 100 %) of sequences within the same subgroup. Besides the LTR mutations relevant for classification, the comparison of the subgroups showed single-nucleotide differences along the whole retroviral sequence. For this reason, we propose two new consensus sequences, one for each subgroup (Additional file 8: File S1), which in our opinion better represent the overall composition of the HERV-W group.
In the present study, the period of insertion was estimated for the first time for each HERV-W locus using at least two different age calculation methods. This provided a precise and exhaustive picture of the spread of the group among primates and improved the reliability and applicability of the method. Moreover, the analysis showed significantly different dynamics in the spread of the two subgroups, as also pointed out by the analysis of PBS type variability.
The analysis, in individual HERV-W members, of the structural features described for gammaretroviruses allowed us to characterize these features for the first time in terms of prevalence and sequence conservation within the group. Notably, in addition to the canonical zinc finger motif, we found a previously unreported second putative zinc finger with an unusual structure, lacking one variable residue. Another interesting feature reported here for the first time is a weak purine bias in HERV-W elements, with an enrichment in A and a corresponding underrepresentation of G.
With regard to the genomic context of the group, we provide an updated overview of the 80 HERV-W elements inserted into human genes and of their predicted capacity to bind cellular TFs. In particular, 55 HERV-W elements were found within coding genes, 8 more than previously observed [20, 70], while 25 elements were inserted in human non-coding genes, the great majority of which (22) are reported here for the first time.
The Env putein analysis led us to identify and functionally characterize 16 full-length or near full-length env genes, 3 more than previously reported, and 10 conserved but shorter env genes. Although the corresponding puteins are highly defective and mutated compared with Syncytin-1, these genes may still be able to produce shorter proteins with biological significance, as observed for other HERV sequences.
In light of the debated connection between HERV-W loci expression and MS, we investigated the elements known as MSRV to evaluate their identity with one or more HERV-W loci, in agreement with what has been previously reported. Our results confirmed that the majority of MSRV-related sequences have 97 to 100 % identity with a single HERV-W locus, but more complex patterns of identity, apparently involving 3 or even 6 loci, were also observed. Furthermore, the comparison between the MSRV Env puteins and the puteins of the most similar HERV-W loci showed common amino acid substitutions with respect to Syncytin-1 that affect all domains reported as relevant for its biological activity.
In conclusion, this report provides, to our knowledge, the most exhaustive and up-to-date overview of the HERV-W group in terms of structure, evolution and context of integration into the human genome, revealing that this polymorphic multicopy family is not represented solely by its single member Syncytin-1. We showed that HERV-W elements were acquired by primates over a rather long period and evolved within, and together with, the host genome, which exerted a selective pressure leading to the modification of HERV-W structures, including the previously shown co-option of one member for an important physiological function [12, 13]. Overall, the characterization of HERV-W composition and genomic context of insertion presented here will be essential to investigate the effects that HERV-W elements can exert in different tissues beyond protein expression, both in physiological conditions and in their putative involvement in human disease development and clinical manifestations, and to better define their real impact on, and contribution to, our genome.
HERV-W identification and localization
The 213 HERV-W sequences were collected from the GRCh37/hg19 assembly using a dual approach that combines (1) analysis of the hg19 assembly with the ReTe program package and (2) a traditional BLAT search in the UCSC Genome Browser database, using the RepBase Update assembled LTR17-HERV17-LTR17 consensus as a query. To avoid misclassification or the inclusion of incomplete sequences, the elements found by both approaches were then confirmed as HERV-W based on (1) RepeatMasker analysis of the HERV-W sequence and its genomic flanking portions, (2) structural alignment and comparison with the HERV-W group RepBase reference LTR17-HERV17-LTR17 and (3) phylogenetic trees.
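How hits from the two searches were matched is not specified. One possible approach is sketched here in Python, assuming BED-style 0-based half-open coordinates and an arbitrary 50 % overlap threshold (both hypothetical choices): ReTe and BLAT candidates covering the same genomic region are paired before the RepeatMasker, alignment and phylogenetic checks:

```python
from typing import NamedTuple


class Hit(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (assumed BED-style)
    end: int    # 0-based, exclusive


def overlap_length(a: Hit, b: Hit) -> int:
    """Number of bases shared by two hits (0 if on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def shared_hits(rete_hits: list, blat_hits: list, min_fraction: float = 0.5) -> list:
    """Pair hits whose overlap covers at least min_fraction of the shorter hit.

    min_fraction is a hypothetical threshold, not a value from the study.
    """
    pairs = []
    for a in rete_hits:
        for b in blat_hits:
            shorter = min(a.end - a.start, b.end - b.start)
            if shorter > 0 and overlap_length(a, b) / shorter >= min_fraction:
                pairs.append((a, b))
    return pairs
```

The nested loop is quadratic, which is adequate for a few hundred candidates; genome-wide inputs would call for a sorted sweep or an interval tree.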
HERV-W solitary LTRs were retrieved by UCSC Genome Browser BLAT search using LTR17 as a query, and kindly provided by Professor Jens Mayer (Saarland University).
Sequences alignment and structural characterization
The HERV-W nucleotide composition was characterized in detail with respect to the RepBase Update assembled LTR17-HERV17-LTR17 reference through multiple alignments performed with the MAFFT online program, version 7, followed by analysis on the Geneious bioinformatics software platform, version 8.1.4. All insertions and deletions were annotated, and the presence of other repetitive elements was reported.
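Annotating insertions and deletions relative to the reference amounts to walking along each aligned pair and recording runs of gaps. A Python sketch, assuming a pairwise view of the alignment with '-' as the gap character and positions reported in ungapped reference coordinates:

```python
def annotate_indels(ref_aln: str, query_aln: str) -> list:
    """List indels of the query relative to the reference as (kind, ref_position, length).

    For an insertion, ref_position is the reference base after which the
    extra bases occur (0 means before the first base); for a deletion, it
    is the first missing reference base. Positions are 1-based.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    events = []
    ref_pos = 0  # reference bases consumed so far
    i, n = 0, len(ref_aln)
    while i < n:
        r, q = ref_aln[i], query_aln[i]
        if r == "-" and q != "-":  # bases present only in the query
            j = i
            while j < n and ref_aln[j] == "-" and query_aln[j] != "-":
                j += 1
            events.append(("insertion", ref_pos, j - i))
            i = j
        elif q == "-" and r != "-":  # reference bases missing from the query
            j = i
            while j < n and query_aln[j] == "-" and ref_aln[j] != "-":
                j += 1
            events.append(("deletion", ref_pos + 1, j - i))
            ref_pos += j - i
            i = j
        else:  # match, mismatch, or a column gapped in both sequences
            if r != "-":
                ref_pos += 1
            i += 1
    return events
```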
Phylogenetic trees were built with MEGA software, version 6, using pairwise deletion and the p-distance method with 500 bootstrap replicates. In addition to the HERV-W nucleotide sequences and the RepBase Update LTR17 and HERV17 consensus sequences, each tree initially included a generated HERV9 consensus. This was done to identify and remove any members of this HERV-W-related family.
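The p-distance with pairwise deletion is the fraction of differing sites, computed separately for each pair of sequences after discarding only the columns where that pair has a gap or an ambiguous base. A Python sketch of the distance step alone (tree building and the 500 bootstrap replicates are left to MEGA and are not reproduced here):

```python
from itertools import combinations

VALID = set("ACGT")


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Pairwise deletion: only columns with a gap or ambiguous base in this
    pair are skipped, so each pair may be compared over a different set of sites.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in VALID and b in VALID:
            compared += 1
            differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def distance_matrix(seqs: dict) -> dict:
    """p-distances for every unordered pair of named sequences."""
    return {(x, y): p_distance(seqs[x], seqs[y]) for x, y in combinations(seqs, 2)}
```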
Time of integration estimation
The age of individual HERV-W members was estimated from the percentage of divergent nucleotides (D %) between (1) the 5′ and 3′ LTRs of each provirus, (2) proviral and pseudogene single LTRs and a generated consensus for each subgroup, and (3) proviral and pseudogene gag, pro, pol RT, pol IN and env portions of 150–300 nucleotides and a generated consensus for each subgroup. Divergence values were estimated in MEGA 6 as Kimura 2-parameter corrected pairwise distances, excluding gaps and CpG dinucleotides. The D % values were then used according to previous methodologies to estimate the time of integration (T), assuming a human genome substitution rate of 0.13 % per nucleotide per million years, with the formula T = D/0.13. For the proviral 5′ versus 3′ LTR divergence, a factor of 2 was applied, assuming that each LTR evolved independently in the genome (T = D/0.13/2). The final age of each sequence was expressed as the average of the estimated times of integration, excluding those values with a standard deviation >20 %.
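The age calculation combines a Kimura 2-parameter distance with the substitution rate given above. The Python sketch below follows the standard K2P formula d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), where P and Q are the proportions of transitions and transversions. The rule used to drop CpG sites (any column inside a CpG in either sequence) is an assumption, since the exact filtering procedure is not described:

```python
import math

TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def cpg_columns(seq: str) -> set:
    """Alignment columns occupied by a CpG dinucleotide in seq (gaps are skipped)."""
    seq = seq.upper()
    bases = [i for i, b in enumerate(seq) if b != "-"]
    masked = set()
    for i, j in zip(bases, bases[1:]):
        if seq[i] == "C" and seq[j] == "G":
            masked.update((i, j))
    return masked


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance, excluding gaps, ambiguous bases and CpG sites."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    skip = cpg_columns(seq1) | cpg_columns(seq2)
    sites = transitions = transversions = 0
    for i, (a, b) in enumerate(zip(seq1, seq2)):
        if i in skip or a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if frozenset((a, b)) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    # math.log raises ValueError if the sequences are saturated (argument <= 0)
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def integration_age(d_percent: float, rate: float = 0.13, ltr_pair: bool = False) -> float:
    """Time of integration in million years: T = D/0.13, or T = D/0.13/2 for 5'-3' LTR pairs."""
    age = d_percent / rate
    return age / 2 if ltr_pair else age
```

The distance returned is a fraction, so it is multiplied by 100 before being passed to integration_age. A divergence of 2.6 % corresponds to about 20 million years from a consensus comparison, or about 10 million years for a 5′-3′ LTR pair.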
PBS and gammaretroviral features representation
The presence and composition of the PBS nucleotide sequence, of the nucleocapsid zinc finger and of the C-terminal integrase (IN) GPY/F amino acid motifs were analyzed using MAFFT alignment and the Geneious platform. The degree of conservation at each position was represented as a sequence logo generated with WebLogo at http://weblogo.berkeley.edu. Each PBS was assigned to the corresponding human tRNA type by similarity analysis against a tRNA library built from the Transfer RNA database (tRNAdb) of Leipzig University and from the PBS library provided by Professor Jonas Blomberg.
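PBS assignment relies on the PBS being complementary to the 3′-terminal 18 nucleotides of the priming tRNA. A Python sketch of a mismatch-based assignment, assuming an 18-nt PBS and a library of tRNA sequences keyed by name; the library in the test is hypothetical, and real entries would come from tRNAdb. Because tRNA genes often lack the 3′ CCA, which is added post-transcriptionally, such sequences would need CCA appended before comparison:

```python
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """Reverse complement as DNA (U in RNA input is treated like T)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def assign_pbs(pbs: str, trna_library: dict, length: int = 18) -> tuple:
    """Return (tRNA name, mismatches) for the tRNA whose 3'-terminal `length`
    nucleotides best match the reverse complement of the PBS."""
    if len(pbs) != length:
        raise ValueError(f"expected a {length}-nt PBS")
    target = reverse_complement(pbs)
    best_name, best_mismatches = None, length + 1
    for name, trna in trna_library.items():
        tail = trna.upper().replace("U", "T")[-length:]
        if len(tail) < length:
            continue  # sequence too short to compare
        mismatches = sum(x != y for x, y in zip(target, tail))
        if mismatches < best_mismatches:
            best_name, best_mismatches = name, mismatches
    return best_name, best_mismatches
```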
The genomic context of each HERV-W sequence was characterized by intersecting its genomic coordinates with the UCSC Genome Browser Genes and Gene Predictions tracks [101–103]. The elements co-localized with human genes were further analyzed by BLAST search after activating the OMIM, UCSC, RefSeq and GENCODE gene annotations. The presence of TF binding sites was characterized by intersecting the genomic coordinates of HERV-W members with the UCSC Genome Browser Regulation ENCODE Txn Factor ChIP tracks [105, 106]. TF binding sites were considered reliable when their score ranged from 800 to 1000.
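The 'Int n' and 'Ex n' labels in the gene tables above can be derived by intersecting an element's coordinates with the exons of the host gene, numbering exons and introns in the direction of transcription. The Python sketch below does this and applies the 800–1000 score window to TF ChIP peaks. It assumes 0-based half-open coordinates, a single chromosome per call, and hypothetical TF names:

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: int  # 0-based, inclusive (assumed)
    end: int    # 0-based, exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    return a.start < b.end and b.start < a.end


def insertion_context(element: Interval, exons: list, strand: str) -> list:
    """Label the exons ('Ex n') and introns ('Int n') of one transcript hit by an element.

    Numbering follows the direction of transcription, so on the '-' strand
    exon 1 is the right-most exon in genomic coordinates.
    """
    ordered = sorted(exons, reverse=(strand == "-"))
    labels = []
    for n, exon in enumerate(ordered, start=1):
        if overlaps(element, exon):
            labels.append(f"Ex {n}")
        if n < len(ordered):
            nxt = ordered[n]
            # Intron between exon n and exon n+1, valid for either strand order.
            intron = Interval(min(exon.end, nxt.end), max(exon.start, nxt.start))
            if overlaps(element, intron):
                labels.append(f"Int {n}")
    return labels


def reliable_tf_sites(element: Interval, peaks: list, low: int = 800, high: int = 1000) -> list:
    """Names of (name, Interval, score) TF peaks overlapping the element with low <= score <= high."""
    return sorted({name for name, peak, score in peaks
                   if low <= score <= high and overlaps(element, peak)})
```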
Env puteins analysis
The selected env genes were translated in all possible reading frames using the Geneious platform. Alignment with the ERVWE1/Syncytin-1 precursor (NCBI reference sequence NP_055405.3) was performed with MAFFT and allowed us to reconstruct the complete protein and to annotate all frameshifts and stop codons. The structurally and functionally relevant domains were analyzed on the Geneious platform.
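Translation in all reading frames, with stop codon positions recorded per frame, is the step that precedes alignment to NP_055405.3. A Python sketch using the standard genetic code; the frame labels such as '+1' and '-2' are a naming convention chosen here, not Geneious output:

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str, offset: int = 0) -> str:
    """Translate from a 0-based offset; codons containing non-ACGT characters become 'X'."""
    seq = seq.upper().replace("-", "")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(offset, len(seq) - 2, 3))


def stops_in_all_frames(seq: str) -> dict:
    """1-based putein positions of stop codons ('*') in each of the six reading frames."""
    seq = seq.upper().replace("-", "")
    rev_comp = seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    stops = {}
    for strand, template in (("+", seq), ("-", rev_comp)):
        for offset in range(3):
            protein = translate(template, offset)
            stops[f"{strand}{offset + 1}"] = [i + 1 for i, aa in enumerate(protein) if aa == "*"]
    return stops
```

Frameshifts cannot be seen from a single frame; they appear when alignment to the reference requires switching between frames, which is why the per-frame output is only a starting point.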
Analysis of MSRV sequences
Previously published MSRV sequences and probes were retrieved from GenBank and analyzed by BLAT search in the GRCh37/hg19 assembly to find the best-matching HERV-W locus/loci based on nucleotide sequence similarity. Alignments of the MSRV sequences with their best-matching HERV-W elements were manually inspected on the Geneious platform, and discordant positions were annotated. The HERV-W locus/loci homology was then confirmed with the Recco software against our whole HERV-W dataset, as described.
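Identity values such as those in the MSRV table can be derived from the annotated discordant positions. A Python sketch, assuming a gapped pairwise alignment and excluding gap columns from the comparison (whether the original analysis counted indels as discordant is not stated):

```python
def compare_aligned(msrv: str, locus: str) -> tuple:
    """Return (discordant alignment columns, percent identity) for two aligned sequences.

    Columns with a gap in either sequence are excluded from the comparison;
    columns are reported 1-based.
    """
    if len(msrv) != len(locus):
        raise ValueError("sequences must be aligned to the same length")
    discordant = []
    compared = 0
    for col, (a, b) in enumerate(zip(msrv.upper(), locus.upper()), start=1):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            discordant.append(col)
    if compared == 0:
        raise ValueError("no comparable columns")
    identity = 100.0 * (compared - len(discordant)) / compared
    return discordant, identity
```

As a rule of thumb, 99.5 % identity corresponds to about one discordant base per 200 compared positions.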
HERV-W consensus sequences generation
The HERV-W group and subgroups consensus sequences were generated from our HERV-W dataset using Geneious bioinformatics software platform, version 8.1.4 .
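Geneious offers several consensus threshold settings, and the ones used here are not stated. The Python sketch below therefore assumes a simple majority rule: a base is called when it occurs in more than half of the sequences, 'N' is written otherwise, and columns in which a gap is the majority character are dropped:

```python
from collections import Counter


def majority_consensus(alignment: list, threshold: float = 0.5, gap: str = "-") -> str:
    """Simple majority-rule consensus of equal-length aligned sequences.

    A column's most frequent character is kept when its frequency exceeds
    `threshold`; otherwise 'N' is written. Columns won by a gap are dropped.
    """
    if not alignment:
        raise ValueError("empty alignment")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for col in range(length):
        counts = Counter(seq[col].upper() for seq in alignment)
        char, count = counts.most_common(1)[0]
        if count / len(alignment) <= threshold:
            consensus.append("N")
        elif char != gap:
            consensus.append(char)
    return "".join(consensus)
```

Raising the threshold makes the consensus more conservative; building one consensus per subgroup simply means calling the function on each subgroup's alignment separately.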
NG performed the analysis and wrote the manuscript. MC participated in the analysis and in the writing. JB and ET conceived and coordinated the study. All authors helped edit the manuscript. All authors read and approved the final manuscript.
The authors thank Jens Mayer for useful discussions and critical observations.
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921.
- International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature. 2004;431:931–45.
- Bannert N, Kurth R. The evolutionary dynamics of human endogenous retroviral families. Annu Rev Genom Hum Genet. 2006;7:149–73.
- Blomberg J, Benachenhou F, Blikstad V, Sperber G, Mayer J. Classification and nomenclature of endogenous retroviral sequences (ERVs): problems and recommendations. Gene. 2008;448:115–23.
- Vargiu L, Rodriguez-Tomé P, Sperber GO, Cadeddu M, Grandi N, Blikstad V, Tramontano E, Blomberg J. Classification and characterization of human endogenous retroviruses; mosaic forms are common. Retrovirology. 2016;13:7.
- Cohen M, Larsson E. Human endogenous retroviruses. BioEssays. 1988;9:191–6.
- Jern P, Sperber GO, Blomberg J. Use of endogenous retroviral sequences (ERVs) and structural markers for retroviral phylogenetic inference and taxonomy. Retrovirology. 2005;2:50.
- Pavlícek A, Paces J, Elleder D. Processed pseudogenes of human endogenous retroviruses generated by LINEs: their integration, stability, and distribution. Genome Res. 2002;12:391–9.
- Mager DL, Goodchild NL. Homologous recombination between the LTRs of a human retrovirus-like element causes a 5-kb deletion in two siblings. Am J Hum Genet. 1989;45:848–54.
- Villesen P, Aagaard L, Wiuf C, Pedersen FS. Identification of endogenous retroviral reading frames in the human genome. Retrovirology. 2004;1:32.
- Blond JL, Besème F, Duret L, Bouton O, Bedin F, Perron H, Mandrand B, Mallet F. Molecular characterization and placental expression of HERV-W, a new human endogenous retrovirus family. J Virol. 1999;73:1175–85.
- Mi S, Lee X, Li X, Veldman GM, Finnerty H, Racie L, Lavallie E, Tang X, Edouard P, Howes S Jr, Keith JC, Mccoy JM. Syncytin is a captive retroviral envelope protein involved in human placental morphogenesis. Nature. 2000;403:785–9.
- Blond JL, Lavillette D, Cheynet V, Bouton O, Oriol G, Chapel-Fernandes S, Mandrand B, Mallet F, Cosset FL. An envelope glycoprotein of the human endogenous retrovirus HERV-W is expressed in the human placenta and fuses cells expressing the type D mammalian retrovirus receptor. J Virol. 2000;74:3321–9.PubMedPubMed CentralView ArticleGoogle Scholar
- Mangeney M, Renard M, Schlecht-Louf G, Bouallaga I, Heidmann O, Letzelter C, Richaud A, Ducos B, Heidmann T. Placental syncytins: Genetic disjunction between the fusogenic and immunosuppressive activity of retroviral envelope proteins. Proc Natl Acad Sci U S A. 2007;104:20534–9.PubMedPubMed CentralView ArticleGoogle Scholar
- Christensen T. Association of human endogenous retroviruses with multiple sclerosis and possible interactions with herpes viruses. Rev Med Virol. 2005;15:179–211.PubMedView ArticleGoogle Scholar
- Perron H, Perin JP, Rieger F, Alliel PM. Particle-associated retroviral RNA and tandem RGH/HERV-W copies on human chromosome 7q: possible components of a “chain-reaction” triggered by infectious agents in multiple sclerosis? J Neurovirol. 2000;6(Suppl 2):S67–75.
- Perron H, Lazarini F, Ruprecht K, Péchoux-Longin C, Seilhean D, Sazdovitch V, Créange A, Battail-Poirot N, Sibaï G, Santoro L, Jolivet M, Darlix J-L, Rieckmann P, Arzberger T, Hauw J-J, Lassmann H. Human endogenous retrovirus (HERV)-W ENV and GAG proteins: physiological expression in human brain and pathophysiological modulation in multiple sclerosis lesions. J Neurovirol. 2005;11:23–33.
- Brudek T, Christensen T, Aagaard L, Petersen T, Hansen HJ, Møller-Larsen A. B cells and monocytes from patients with active multiple sclerosis exhibit increased surface expression of both HERV-H Env and HERV-W Env, accompanied by increased seroreactivity. Retrovirology. 2009;6:104.
- García-Montojo M, de la Hera B, Varadé J, de la Encarnación A, Camacho I, Domínguez-Mozo M, Arias-Leal A, García-Martínez A, Casanova I, Izquierdo G, Lucas M, Fedetz M, Alcina A, Arroyo R, Matesanz F, Urcelay E, Alvarez-Lafuente R. HERV-W polymorphism in chromosome X is associated with multiple sclerosis risk and with differential expression of MSRV. Retrovirology. 2014;11:2.
- Schmitt K, Richter C, Backes C, Meese E, Ruprecht K, Mayer J. Comprehensive analysis of human endogenous retrovirus group HERV-W locus transcription in multiple sclerosis brain lesions by high-throughput amplicon sequencing. J Virol. 2013;87:13837–52.
- Hon GM, Erasmus RT, Matsha T. Multiple sclerosis-associated retrovirus and related human endogenous retrovirus-W in patients with multiple sclerosis: a literature review. J Neuroimmunol. 2013;263:8–12.
- Karlsson H, Bachmann S, Schröder J, McArthur J, Torrey EF, Yolken RH. Retroviral RNA identified in the cerebrospinal fluids and brains of individuals with schizophrenia. Proc Natl Acad Sci U S A. 2001;98:4634–9.
- Perron H, Mekaoui L, Bernard C, Veas F, Stefas I, Leboyer M. Endogenous retrovirus type W GAG and envelope protein antigenemia in serum of schizophrenic patients. Biol Psychiatry. 2008;64:1019–23.
- Frank O, Giehl M, Zheng C, Hehlmann R, Leib-Mosch C, Seifarth W. Human endogenous retrovirus expression profiles in samples from brains of patients with schizophrenia and bipolar disorders. J Virol. 2005;79:10890–901.
- Perron H, Hamdani N, Faucard R, Lajnef M, Jamain S, Daban-Huard C, Sarrazin S, LeGuen E, Houenou J, Delavest M, Moins-Teisserenc H, Moins-Teiserenc H, Bengoufa D, Yolken R, Madeira A, Garcia-Montojo M, Gehin N, Burgelin I, Ollagnier G, Bernard C, Dumaine A, Henrion A, Gombert A, Le Dudal K, Charron D, Krishnamoorthy R, Tamouza R, Leboyer M. Molecular characteristics of human endogenous retrovirus type-W in schizophrenia and bipolar disorder. Transl Psychiatry. 2012;2:e201.
- Bendiksen S, Martinez-Zubiavrra I, Tümmler C, Knutsen G, Elvenes J, Olsen E, Olsen R, Moens U. Human endogenous retrovirus W activity in cartilage of osteoarthritis patients. Biomed Res Int. 2014;2014:1–14.
- Maliniemi P, Vincendeau M, Mayer J, Frank O, Hahtola S, Karenko L, Carlsson E, Mallet F, Seifarth W, Leib-Mösch C, Ranki A. Expression of human endogenous retrovirus-w including syncytin-1 in cutaneous T-cell lymphoma. PLoS ONE. 2013;8:e76281.
- Antony JM, Deslauriers AM, Bhat RK, Ellestad KK, Power C. Human endogenous retroviruses and multiple sclerosis: innocent bystanders or disease determinants? Biochim Biophys Acta. 2011;1812:162–76.
- Magiorkinis G, Belshaw R, Katzourakis A. “There and back again”: revisiting the pathophysiological roles of human endogenous retroviruses in the post-genomic era. Philos Trans R Soc Lond B Biol Sci. 2013;368:20120504.
- Voisset C, Bouton O, Bedin F, Duret L, Mandrand B, Mallet F, Paranhos-Baccala G. Chromosomal distribution and coding capacity of the human endogenous retrovirus HERV-W family. AIDS Res Hum Retrovir. 2000;16:731–40.
- Costas J. Characterization of the intragenomic spread of the human endogenous retrovirus family HERV-W. Mol Biol Evol. 2002;19:526–33.
- Perron H, Germi R, Bernard C, Garcia-Montojo M, Deluen C, Farinelli L, Faucard R, Veas F, Stefas I, Fabriek BO, Van-Horssen J, Van-der-Valk P, Gerdil C, Mancuso R, Saresella M, Clerici M, Marcel S, Creange A, Cavaretta R, Caputo D, Arru G, Morand P, Lang AB, Sotgiu S, Ruprecht K, Rieckmann P, Villoslada P, Chofflon M, Boucraut J, Pelletier J, et al. Human endogenous retrovirus type W envelope expression in blood and brain cells provides new insights into multiple sclerosis disease. Mult Scler. 2012;18:1721–36.
- Varmus HE. Form and function of retroviral proviruses. Science. 1982;216:812–20.
- Kim H-S. Genomic impact, chromosomal distribution and transcriptional regulation of HERV elements. Mol Cells. 2012;33:539–44.
- Hedges DJ, Deininger PL. Inviting instability: transposable elements, double-strand breaks, and the maintenance of genome integrity. Mutat Res. 2007;616:46–59.
- Khodosevich K, Lebedev Y, Sverdlov E. Endogenous retroviruses and human evolution. Comp Funct Genom. 2002;3:494–8.
- Jern P, Coffin JM. Effects of retroviruses on host genome function. Annu Rev Genet. 2008;42:709–32.
- Schön U, Diem O, Leitner L, Günzburg WH, Mager DL, Salmons B, Leib-Mösch C. Human endogenous retroviral long terminal repeat sequences as cell type-specific promoters in retroviral vectors. J Virol. 2009;83:12643–50.
- Kowalski PE, Freeman JD, Mager DL. Intergenic splicing between a HERV-H endogenous retrovirus and two adjacent human genes. Genomics. 1999;57:371–9.
- Medstrand P, Landry JR, Mager DL. Long terminal repeats are used as alternative promoters for the endothelin B receptor and apolipoprotein C-I genes in humans. J Biol Chem. 2001;276:1896–903.
- Dunn CA, Medstrand P, Mager DL. An endogenous retroviral long terminal repeat is the dominant promoter for human beta 1,3-galactosyltransferase 5 in the colon. Proc Natl Acad Sci U S A. 2003;100:12841–6.
- Jordan IK, Rogozin IB, Glazko GV, Koonin EV. Origin of a substantial fraction of human regulatory sequences from transposable elements. Trends Genet. 2003;19:68–72.
- van de Lagemaat LN, Landry J-R, Mager DL, Medstrand P. Transposable elements in mammals promote regulatory variation and diversification of genes with specialized functions. Trends Genet. 2003;19:530–6.
- Dunn CA, van de Lagemaat LN, Baillie GJ, Mager DL. Endogenous retrovirus long terminal repeats as ready-to-use mobile promoters: the case of primate beta3GAL-T5. Gene. 2005;364:2–12.
- Medstrand P, van de Lagemaat LN, Dunn CA, Landry J-R, Svenback D, Mager DL. Impact of transposable elements on the evolution of mammalian gene regulation. Cytogenet Genome Res. 2005;110:342–52.
- Sin HS, Huh JW, Kim DS, Kang DW, Min DS, Kim TH, Ha HS, Kim HH, Lee SY, Kim HS. Transcriptional control of the HERV-H LTR element of the GSDML gene in human tissues and cancer cells. Arch Virol. 2006;151:1985–94.
- Piriyapongsa J, Polavarapu N, Borodovsky M, McDonald J. Exonization of the LTR transposable elements in human genome. BMC Genom. 2007;8:291.
- Conley AB, Piriyapongsa J, Jordan IK. Retroviral promoters in the human genome. Bioinformatics. 2008;24:1563–7.
- Isbel L, Whitelaw E. Endogenous retroviruses in mammals: an emerging picture of how ERVs modify expression of adjacent genes. BioEssays. 2012;34:734–8.
- Sperber GO, Airola T, Jern P, Blomberg J. Automated recognition of retroviral sequences in genomic data–RetroTector. Nucleic Acids Res. 2007;35:4964–76.
- Kent WJ. BLAT--the BLAST-like alignment tool. Genome Res. 2002;12:656–64.
- Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005;110:462–7.
- Subramanian RP, Wildschutte JH, Russo C, Coffin JM. Identification, characterization, and comparative genomic distribution of the HERV-K (HML-2) group of human endogenous retroviruses. Retrovirology. 2011;8:90.
- Pavlíček A, Pačes J, Zíka R, Hejnar J. Length distribution of long interspersed nucleotide elements (LINEs) and processed pseudogenes of human endogenous retroviruses: implications for retrotransposition and pseudogene detection. Gene. 2002;300:189–94.
- Dangel AW, Mendoza AR, Menachery CD, Baker BJ, Daniel CM, Carroll MC, Wu LC, Yu CY. The dichotomous size variation of human complement C4 genes is mediated by a novel family of endogenous retroviruses, which also establishes species-specific genomic patterns among old world primates. Immunogenetics. 1994;40:425–36.
- Lebedev YB, Belonovitch OS, Zybrova NV, Khil PP, Kurdyukov SG, Vinogradova TV, Hunsmann G, Sverdlov ED. Differences in HERV-K LTR insertions in orthologous loci of humans and great apes. Gene. 2000;247:265–77.
- Kim HS, Takenaka O, Crow TJ. Isolation and phylogeny of endogenous retrovirus sequences belonging to the HERV-W family in primates. J Gen Virol. 1999;80:2613–9.
- Perelman P, Johnson WE, Roos C, Seuánez HN, Horvath JE, Moreira MAM, Kessing B, Pontius J, Roelke M, Rumpler Y, Schneider MPC, Silva A, O’Brien SJ, Pecon-Slattery J. A molecular phylogeny of living primates. PLoS Genet. 2011;7:1–17.
- Blomberg J, Benachenhou F, Blikstad V, Sperber G, Mayer J. Classification and nomenclature of endogenous retroviral sequences (ERVs): problems and recommendations. Gene. 2009;448:115–23.
- Bowzard JB, Bennett RP, Krishna NK, Ernst SM, Rein A, Wills JW. Importance of basic residues in the nucleocapsid sequence for retrovirus Gag assembly and complementation rescue. J Virol. 1998;72:9034–44.
- Malik HS, Eickbush TH. Modular evolution of the integrase domain in the Ty3/Gypsy class of LTR retrotransposons. J Virol. 1999;73:5186–90.
- Singleton TL, Levin HL. A long terminal repeat retrotransposon of fission yeast has strong preferences for specific sites of insertion. Eukaryot Cell. 2002;1:44–55.
- Jern P, Sperber GO, Ahlsén G, Blomberg J. Sequence variability, gene structure, and expression of full-length human endogenous retrovirus H. J Virol. 2005;79:6325–37.
- Zsíros J, Jebbink MF, Lukashov VV, Voûte PA, Berkhout B. Biased nucleotide composition of the genome of HERV-K related endogenous retroviruses and its evolutionary implications. J Mol Evol. 1999;48:102–11.
- Mangeat B, Turelli P, Caron G, Friedli M, Perrin L, Trono D. Broad antiretroviral defence by human APOBEC3G through lethal editing of nascent reverse transcripts. Nature. 2003;424:99–103.
- Chiu Y-L, Greene WC. The APOBEC3 cytidine deaminases: an innate defensive network opposing exogenous retroviruses and endogenous retroelements. Annu Rev Immunol. 2008;26:317–53.
- Chiu Y-L, Witkowska HE, Hall SC, Santiago M, Soros VB, Esnault C, Heidmann T, Greene WC. High-molecular-mass APOBEC3G complexes restrict Alu retrotransposition. Proc Natl Acad Sci U S A. 2006;103:15588–93.
- van de Lagemaat LN, Medstrand P, Mager DL. Multiple effects govern endogenous retrovirus survival patterns in human gene introns. Genome Biol. 2006;7:R86.
- Medstrand P, Van De Lagemaat LN, Mager DL. Retroelement distributions in the human genome: variations associated with age and proximity to genes. Genome Res. 2002;12:1483–95.
- Li F, Nellåker C, Yolken RH, Karlsson H. A systematic evaluation of expression of HERV-W elements; influence of genomic context, viral structure and orientation. BMC Genom. 2011;12:22.
- Hadjiargyrou M, Delihas N. The intertwining of transposable elements and non-coding RNAs. Int J Mol Sci. 2013;14:13307–28.
- Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman JR, Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M, Subramanian G, Thomas PD, Zhang J, Gabor Miklos GL, Nelson C, Broder S, Clark AG, Nadeau J, McKusick VA, Zinder N, et al. The sequence of the human genome. Science. 2001;291:1304–51.
- Gimenez J, Mallet F. ERVWE1 (endogenous retroviral family W, Env(C7), member 1). Atlas Genet Cytogenet Oncol Haematol. 2008;12:134–48.
- de Parseval N, Lazar V, Casella J-F, Benit L, Heidmann T. Survey of human genes of retroviral origin: identification and transcriptome of the genes with coding capacity for complete envelope proteins. J Virol. 2003;77:10414–22.
- Roebke C, Wahl S, Laufer G, Stadelmann C, Sauter M, Mueller-Lantzsch N, Mayer J, Ruprecht K. An N-terminally truncated envelope protein encoded by a human endogenous retrovirus W locus on chromosome Xq22.3. Retrovirology. 2010;7:69.
- Cheynet V, Ruggieri A, Oriol G, Blond J-L, Boson B, Vachot L, Verrier B, Cosset F-L, Mallet F. Synthesis, assembly, and processing of the Env ERVWE1/syncytin human endogenous retroviral envelope. J Virol. 2005;79:5585–93.
- Bonnaud B, Bouton O, Oriol G, Cheynet V, Duret L, Mallet F. Evidence of selection on the domesticated ERVWE1 env retroviral element involved in placentation. Mol Biol Evol. 2004;21:1895–901.
- Schiavetti F, Thonnard J, Colau D, Boon T, Coulie PG. A human endogenous retroviral sequence encoding an antigen recognized on melanoma by cytolytic T lymphocytes. Cancer Res. 2002;62:5510–6.
- Perron C, Geny A, Laurent C, Mouriquand J, Pellat J, Perret J, Seigneurin J. Leptomeningeal cell line from multiple sclerosis with reverse transcriptase activity and viral particles. Res Virol. 1989;140:551–61.
- Perron H, Lalande B, Gratacap B, Laurent A, Genoulaz O, Geny C, Mallaret M, Schuller E, Stoebner P, Seigneurin J. Isolation of retrovirus from patients with multiple sclerosis. Lancet. 1991;337:862–3.
- Komurian-Pradel F, Paranhos-Baccala G, Bedin F, Ounanian-Paraz A, Sodoyer M, Ott C, Rajoharison A, Garcia E, Mallet F, Mandrand B, Perron H. Molecular cloning and characterization of MSRV-related sequences associated with retrovirus-like particles. Virology. 1999;260:1–9.
- Perron H, Garson JA, Bedin F, Beseme F, Paranhos-Baccala G, Komurian-Pradel F, Mallet F, Tuke PW, Voisset C, Blond JL, Lalande B, Seigneurin JM, Mandrand B. Molecular identification of a novel retrovirus repeatedly isolated from patients with multiple sclerosis. The Collaborative Research Group on Multiple Sclerosis. Proc Natl Acad Sci U S A. 1997;94:7583–8.
- Garcia-Montojo M, Dominguez-Mozo M, Arias-Leal A, Garcia-Martinez Á, de las Heras V, Casanova I, Faucard R, Gehin N, Madeira A, Arroyo R, Curtin F, Alvarez-Lafuente R, Perron H. The DNA copy number of human endogenous retrovirus-W (MSRV-Type) is increased in multiple sclerosis patients and is influenced by gender and disease severity. PLoS ONE. 2013;8:e53623.
- Mameli G, Astone V, Arru G, Marconi S, Lovato L, Serra C, Sotgiu S, Bonetti B, Dolei A. Brains and peripheral blood mononuclear cells of multiple sclerosis (MS) patients hyperexpress MS-associated retrovirus/HERV-W endogenous retrovirus, but not human herpesvirus 6. J Gen Virol. 2007;88(Pt 1):264–74.
- Mameli G, Poddighe L, Astone V, Delogu G, Arru G, Sotgiu S, Serra C, Dolei A. Novel reliable real-time PCR for differential detection of MSRVenv and syncytin-1 in RNA and DNA from patients with multiple sclerosis. J Virol Methods. 2009;161:98–106.
- Dolei A, Perron H. The multiple sclerosis-associated retrovirus and its HERV-W endogenous family: a biological interface between virology, genetics, and immunology in human physiology and disease. J Neurovirol. 2009;15:4–13.
- Blomberg J, Ushameckis D, Jern P. Evolutionary aspects of human endogenous retroviral sequences (HERVs) and disease. In: Sverdlov ED, editor. Retroviruses and primate genomes evolution. Austin: Landes Bioscience; 2000. pp. 204–238.
- Voisset C, Weiss RA, Griffiths DJ. Human RNA “rumor” viruses: the search for novel human retroviruses in chronic disease. Microbiol Mol Biol Rev. 2008;72:157–96.
- Laufer G, Mayer J, Mueller BF, Mueller-Lantzsch N, Ruprecht K. Analysis of transcribed human endogenous retrovirus W env loci clarifies the origin of multiple sclerosis-associated retrovirus env sequences. Retrovirology. 2009;6:37.
- Flockerzi A, Maydt J, Frank O, Ruggieri A, Maldener E, Seifarth W, Medstrand P, Lengauer T, Meyerhans A, Leib-Mösch C, Meese E, Mayer J. Expression pattern analysis of transcribed HERV sequences is complicated by ex vivo recombination. Retrovirology. 2007;4:39.
- Deb-Rinker P, Klempan TA, O’Reilly RL, Torrey EF, Singh SM. Molecular characterization of a MSRV-like sequence identified by RDA from monozygotic twin pairs discordant for schizophrenia. Genomics. 1999;61:133–44.
- Blaise S, de Parseval N, Heidmann T. Functional characterization of two newly identified human endogenous retrovirus coding envelope genes. Retrovirology. 2005;2:19.
- Christensen T. HERVs in neuropathogenesis. J Neuroimmune Pharmacol. 2010;5:326–35.
- Chance MR, Sagi I, Wirt MD, Frisbie SM, Scheuring E, Chen E, Bess JW Jr, Henderson LE, Arthur LO, South TL, et al. Extended x-ray absorption fine structure studies of a retrovirus: equine infectious anemia virus cysteine arrays are coordinated to zinc. Proc Natl Acad Sci U S A. 1992;89:10041–5.
- Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. The human genome browser at UCSC. Genome Res. 2002;12:996–1006.
- Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30:772–80.
- Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, Buxton S, Cooper A, Markowitz S, Duran C, Thierer T, Ashton B, Meintjes P, Drummond A. Geneious basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 2012;28:1647–9.
- Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol. 2013;30:2725–9.
- Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: a sequence logo generator. Genome Res. 2004;14:1188–90.
- Jühling F, Mörl M, Hartmann RK, Sprinzl M, Stadler PF, Pütz J. tRNAdb 2009: compilation of tRNA sequences and tRNA genes. Nucleic Acids Res. 2009;37(Database issue):D159–62.
- Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S, Barnes I, Bignell A, Boychenko V, Hunt T, Kay M, Mukherjee G, Rajan J, Despacio-Reyes G, Saunders G, Steward C, Harte R, Lin M, Howald C, Tanzer A, Derrien T, Chrast J, Walters N, Balasubramanian S, Pei B, Tress M, et al. GENCODE: the reference human genome annotation for the ENCODE project. Genome Res. 2012;22:1760–74.
- Hsu F, Kent WJ, Clawson H, Kuhn RM, Diekhans M, Haussler D. The UCSC known genes. Bioinformatics. 2006;22:1036–46.
- Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005;33(Database issue):D501–4.
- Karolchik D, Barber GP, Casper J, Clawson H, Cline MS, Diekhans M, Dreszer TR, Fujita PA, Guruvadoo L, Haeussler M, Harte RA, Heitner S, Hinrichs AS, Learned K, Lee BT, Li CH, Raney BJ, Rhead B, Rosenbloom KR, Sloan CA, Speir ML, Zweig AS, Haussler D, Kuhn RM, Kent WJ. The UCSC Genome Browser database: 2014 update. Nucleic Acids Res. 2014;42(Database issue):D764–70.
- Gerstein MB, Kundaje A, Hariharan M, Landt SG, Yan K-K, Cheng C, Mu XJ, Khurana E, Rozowsky J, Alexander R, Min R, Alves P, Abyzov A, Addleman N, Bhardwaj N, Boyle AP, Cayting P, Charos A, Chen DZ, Cheng Y, Clarke D, Eastman C, Euskirchen G, Frietze S, Fu Y, Gertz J, Grubert F, Harmanci A, Jain P, Kasowski M, et al. Architecture of the human regulatory network derived from ENCODE data. Nature. 2012;489:91–100.
- Wang J, Zhuang J, Iyer S, Lin X, Whitfield TW, Greven MC, Pierce BG, Dong X, Kundaje A, Cheng Y, Rando OJ, Birney E, Myers RM, Noble WS, Snyder M, Weng Z. Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome Res. 2012;22:1798–812.
- Maydt J, Lengauer T. Recco: recombination analysis using cost optimization. Bioinformatics. 2006;22:1064–71.