Whole genome sequencing of 51 breast cancers reveals that tumors are devoid of bovine leukemia virus DNA


Controversy exists regarding the association of bovine leukemia virus (BLV) and breast cancer. PCR-based experimental evidence indicates that BLV DNA is present in breast tissue and that as many as 37% of cancer cases may be attributable to viral exposure. Since this association might have major consequences for human health, we evaluated 51 whole genomes of breast cancer samples for the presence of BLV DNA. Among 32 billion sequencing reads retrieved from the NCBI database of genotype and phenotype, none mapped to any of the BLV genome strains tested. Controls for sequence divergence and proviral loads further validated the approach. This unbiased analysis thus excludes a clonal insertion of BLV in breast tumor cells and strongly argues against an association between BLV and breast cancer.


BLV naturally infects cattle, water buffalo, yak and zebu [1–4]. Sporadic infections with BLV have occasionally been reported in other species such as alpaca [5]. Experimentally, BLV can also be transmitted to a number of species including sheep [6], goats [6], rats [7] and rabbits [8]. BLV infection causes B cell lymphocytosis, leukemia and/or lymphoma in natural and some experimental hosts [1]. There is also controversial evidence suggesting that BLV might infect humans: (1) antibodies against the BLV capsid were detected in 74% of human sera from the Berkeley Community, California [9], and (2) BLV DNA was detected in breast tissues using PCR [10–12]. Based on a positive correlation between the rates of BLV infection and tumor frequencies (36–59% compared to 29–45% in normal tissue), as many as 37% of breast cancer cases may be attributable to BLV exposure [12].

Although these observations were met with some skepticism within the scientific community [13], the potential consequences for human health clearly require further investigation.

Results and discussion

To avoid potential experimental artifacts associated with DNA amplification techniques, we directly analyzed whole genomes of breast tumors and adjacent tissues. After retrieval of raw DNA sequences from the NCBI dbGaP [14, 15], paired reads were probed for alignment on different BLV strains using Bowtie2. As a positive control, a nuclear DNA fragment (chr12: 53,959,600–53,964,000) was selected from the human genome. This fragment is devoid of repeated sequences, which would lead to an overestimation of aligned reads, and its length was set to 4.4 kb so that its two copies per diploid cell match the monoploid 8.8 kb BLV genome. Alignment of the 51 breast tumor genomes on the nuclear control sequence identified between 283 and 1287 paired reads (illustrated in Fig. 1 and summarized in Table 1). In contrast, no homology was found with 5 different BLV subtypes (highlighted in blue on the phylogenetic tree of Fig. 2a). In 19 biopsies adjacent to the breast tumors, 386–1197 paired reads aligned onto the nuclear DNA sequence whereas none mapped on BLV (Table 1). All DNA samples contained extranuclear DNA, as indicated by alignment of a control mitochondrial sequence (NC_012920) (Table 1).
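As an illustration of how aligned paired reads could be tallied from an aligner's SAM output, the sketch below counts pairs in which both mates mapped. The counting rule (primary alignments only, each pair counted once through its first mate) and the function name `count_aligned_pairs` are assumptions made for this example; the text does not state how the reported pair counts were derived.

```python
from typing import Iterable

# Flag bits defined by the SAM format specification.
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
FIRST_IN_PAIR = 0x40
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_aligned_pairs(sam_lines: Iterable[str]) -> int:
    """Count read pairs in which both mates aligned (hypothetical counting rule)."""
    pairs = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        flag = int(line.split("\t")[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # keep primary alignments only
        if not (flag & PAIRED and flag & FIRST_IN_PAIR):
            continue  # count each pair once, via its first mate
        if flag & (UNMAPPED | MATE_UNMAPPED):
            continue  # at least one mate failed to align
        pairs += 1
    return pairs


# Toy records: one fully aligned pair, one unaligned pair, one half-aligned pair.
example_sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t99\tctrl\t100\t42\t100M\t=\t300\t300\t*\t*",
    "r1\t147\tctrl\t300\t42\t100M\t=\t100\t-300\t*\t*",
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r2\t141\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r3\t73\tctrl\t500\t42\t100M\t=\t500\t0\t*\t*",
]
print(count_aligned_pairs(example_sam))
```

Applied to the alignments against the control locus and against each BLV reference, a count of this kind gives the two quantities compared in Table 1.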

Fig. 1

Representative alignment of dbGaP sequencing reads to human and BLV DNA. Breast cancer patients were BRC3 from USA (study phs000472), MEX-BR-15 from Mexico and SX1A2 from Vietnam (study phs000369). Aligned reads were visualized using integrative genomics viewer (IGV)

Table 1 Absence of BLV DNA in 51 whole genomes of breast tumors

Fig. 2

Analysis of sequence variation and proviral load in sequence alignments. a Neighbour-joining phylogenetic tree of BLV and HTLV-1 genomes. b Using the ART simulation tool (NIH), Illumina-like 100 bp paired-reads were generated in silico from the mutants. 880 simulated reads were probed for alignment on BLV AF033818 using Bowtie2 and visualized using IGV. c Correlation between proviral loads and predicted number of reads

Although no paired read corresponding to five different BLV variants could be identified, the possibility remains that extensive sequence variability impaired detection. On average, the whole genome sequencing procedure generated 660 million reads per sample. Given that the BLV provirus is 8.8 kb long and that a normal human diploid genome is 6.6 billion base pairs, an 8.8 kb-long monoploid sequence, such as a BLV provirus integrated in a single copy per cell, would generate 880 reads on average (660,000,000/6,600,000,000 × 8,800). If the strain in the sample diverges from the five reference sequences, a fraction of these reads would not be retrieved. Therefore, BLV variants were generated in silico by introducing 2, 3, 6, 10 and 20% nucleotide changes in reference AF033818 (mutants 0.02, 0.03, 0.06, 0.10 and 0.20, respectively). The phylogenetic analysis of Fig. 2a illustrates that the in silico generated divergence far exceeds the maximal natural sequence variation observed worldwide [16]. For each in silico variant, 880 Illumina-like reads were then simulated using the ART simulation tool and mapped on BLV genome AF033818. Most reads (818 of 880) generated from mutant 0.02 aligned on reference sequence AF033818 (Fig. 2b). Even the highly divergent mutant 0.10 still aligned 41% of its 880 reads on the reference. A divergence of 20% (mutant 0.20) was required to significantly impair detection, although BLV-specific reads were still identified (Fig. 2b).
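The in silico divergence step can be sketched as follows. This is a minimal illustration, assuming that a fixed fraction of positions is picked at random and that each picked position receives a substitution, a deletion or an insertion with equal probability; the function name `mutate`, the seed handling and the toy sequence are hypothetical, and the ART read simulation itself is not reproduced here.

```python
import random

BASES = "ACGT"


def mutate(sequence: str, rate: float, seed: int = 0) -> str:
    """Return a copy of `sequence` with a fraction `rate` of positions altered."""
    rng = random.Random(seed)  # seeded so that a given mutant is reproducible
    n_sites = round(len(sequence) * rate)
    sites = set(rng.sample(range(len(sequence)), n_sites))
    out = []
    for i, base in enumerate(sequence):
        if i not in sites:
            out.append(base)
            continue
        kind = rng.choice(("substitution", "deletion", "insertion"))
        if kind == "substitution":
            # replace the base with one of the other nucleotides
            out.append(rng.choice([b for b in BASES if b != base]))
        elif kind == "insertion":
            # keep the base and add one random nucleotide after it
            out.append(base)
            out.append(rng.choice(BASES))
        # deletion: the base is simply dropped
    return "".join(out)


# Toy reference standing in for AF033818 (not the real sequence).
toy_reference = "ACGTTGCAAGCTTAGGCTAA" * 50
mutants = {rate: mutate(toy_reference, rate) for rate in (0.02, 0.03, 0.06, 0.10, 0.20)}
for rate, seq in mutants.items():
    print(rate, len(seq))
```

Because insertions and deletions change the sequence length, a mutant differs from its reference by at most one base in length per altered site.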

Whole genome analysis thus excludes clonal integration of natural and highly divergent BLV strains in breast tumors. Since only a small proportion of cells may carry the provirus, the sensitivity of the analysis was related to the proviral load. Any natural BLV variant that would infect 10% of the tumor cells is expected to generate about 100 reads (Fig. 2c, dotted blue line). The number of expected reads decreases along with the percentage of infected cells to reach approximately one read with a proviral load of 0.1% (Fig. 2c, dotted blue line). Considering a 59% prevalence of breast tumors positive for BLV [12], 30 samples out of our 51 should be positive. Even with an individual proviral load around 0.1%, this should yield about 30 reads (on average one per patient) mapping on BLV, whereas none were found.
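The read-count arithmetic behind this sensitivity argument can be written out as a short calculation. The sketch below uses only the figures quoted in the text (660 million reads per sample, a 6.6 billion bp diploid genome, an 8.8 kb provirus, 51 tumors, 59% prevalence); the function and constant names are illustrative, and the linear scaling with the fraction of infected cells is the assumption underlying Fig. 2c.

```python
READS_PER_SAMPLE = 660_000_000      # average reads per whole genome
DIPLOID_GENOME_BP = 6_600_000_000   # normal human diploid genome
PROVIRUS_BP = 8_800                 # BLV provirus length


def expected_reads(infected_fraction: float = 1.0) -> float:
    """Expected BLV reads per sample for one provirus copy per infected cell."""
    per_clonal_sample = READS_PER_SAMPLE * PROVIRUS_BP / DIPLOID_GENOME_BP
    return per_clonal_sample * infected_fraction


def expected_positive_samples(n_samples: int = 51, prevalence: float = 0.59) -> float:
    """Number of samples expected to carry BLV at the reported prevalence."""
    return n_samples * prevalence


for load in (1.0, 0.10, 0.01, 0.001):
    print(f"proviral load {load:.1%}: {expected_reads(load):.2f} expected reads")

positives = expected_positive_samples()
print(f"expected positive samples: {positives:.1f}")
print(f"expected reads across the cohort at 0.1% load: {positives * expected_reads(0.001):.1f}")
```

With these inputs a clonal insertion gives 880 reads, a 10% load gives roughly 88 (of the order of the "about 100" read off Fig. 2c), and a 0.1% load gives a little under one read per sample.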

Using whole genome analysis, we concluded that there is no evidence for a single BLV-specific or even related sequence. The discrepancies and limitations of this report and others pertain to:

  1. The origin of the samples

    It is indeed possible that tumor biopsies from previous studies originating from the US [11, 12] and Colombia [10] significantly differ from those reported in the dbGaP NCBI database. Even if we restrict our observations to samples originating from the US (n = 35), the discrepancy remains highly significant. Indeed, Buehring reported 67 breast tumors positive for BLV out of 114 cases [12] whereas we found none out of 35 cases (the p value for the Fisher test is 1.12 × 10⁻⁶).

  2. The DNA extraction technique

    In situ PCR suggested that BLV proviral DNA is localized in the cytoplasm [11, 12]. Analysis of mitochondria-specific sequences (Table 1) shows that the dbGaP NCBI database includes reads corresponding to the 16 kb-long, circular and extranuclear mitochondrial DNA, indicating that extranuclear DNA was present in the sequenced samples.

  3. The strain divergence

    Artificial in silico simulation of highly divergent mutants still identified BLV-specific reads (Fig. 2b). Since nucleotide substitutions among BLV strains worldwide are limited to 2.3% [16], it remains questionable whether these mutants still belong to the same species. Further analyses show that breast tumor genomes do not map on HTLV-1 sequences (data not shown). Why BLV-conserved sequences were previously identified by PCR remains an enigma.

  4. Viral expression

    Although BLV is expressed at trace levels in the bovine species, the p24 viral capsid protein was detected in 5% of breast tumors [12]. This observation is inconsistent with RNA-Seq analysis of 154.7 billion transcriptome sequencing reads from The Cancer Genome Atlas Research Network [17, 18].

Our present study based on whole genome analysis excludes a clonal insertion of BLV in tumor cells and does not support converging lines of evidence which previously suggested an association between BLV infection and breast cancer.


Methods

Raw DNA sequences from whole genomes of breast tumors and normal breast tissues adjacent to the tumor were retrieved from the NCBI database of genotype and phenotype (dbGaP). These sequences were extracted from two studies: (1) estrogen receptor positive breast cancer: aromatase inhibitor response study (accession number phs000472) [14] and (2) sequence analysis of mutations and translocations across breast cancer subtypes (accession number phs000369) [15]. Archive files were downloaded with prefetch v2.5.7, and sequencing reads were extracted with fastq-dump v2.5.7 (NCBI SRA Toolkit) using the “split-3” option to separate paired reads from single reads. Paired reads were probed for alignment on different BLV variants (accession numbers: AF033818, AF275515, D00647, K02120, LC080667) and, as a positive control, on human genomic sequences using Bowtie2 (version 2.2.5). We used the “very-sensitive” option of Bowtie2 to maximize the likelihood of viral detection. Analyses were performed on a computing cluster running Linux. BLV divergent sequences were created in silico by introducing substitutions, deletions or insertions with equal probabilities in 2, 3, 6, 10 and 20% of the reference AF033818 (mutants 0.02, 0.03, 0.06, 0.10 and 0.20, respectively). A neighbor-joining phylogenetic tree was built using Clustal Omega (EMBL-EBI) and visualized with Dendroscope 3. Illumina-like paired reads were generated from the BLV sequence using the ART simulation tool (version GreatSmokyMountains-04-17-2016, NIH).
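The download, extraction and alignment steps described above could be chained roughly as in the sketch below. It only assembles the command lines; the run accession `SRR0000000`, the index prefix `blv_index`, the FASTA and SAM file names and the helper names are placeholders invented for this example, and the access configuration needed for controlled dbGaP data is omitted. The read-extraction tool is written under its SRA Toolkit name, `fastq-dump`.

```python
import subprocess
from typing import List


def build_index_command(reference_fasta: str, index_prefix: str) -> List[str]:
    """Command that builds a Bowtie2 index from a reference FASTA file."""
    return ["bowtie2-build", reference_fasta, index_prefix]


def build_commands(accession: str, index_prefix: str, out_sam: str) -> List[List[str]]:
    """Commands for one sequencing run: download, extract reads, align pairs."""
    return [
        # download the archive for this run
        ["prefetch", accession],
        # write paired reads to <accession>_1.fastq and <accession>_2.fastq
        ["fastq-dump", "--split-3", accession],
        # align the paired reads with the most sensitive end-to-end preset
        ["bowtie2", "--very-sensitive", "-x", index_prefix,
         "-1", f"{accession}_1.fastq", "-2", f"{accession}_2.fastq",
         "-S", out_sam],
    ]


def run_pipeline(accession: str, index_prefix: str, out_sam: str) -> None:
    """Run the commands in order, stopping at the first failure."""
    for command in build_commands(accession, index_prefix, out_sam):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them; the names are placeholders.
    print(" ".join(build_index_command("blv_references.fasta", "blv_index")))
    for command in build_commands("SRR0000000", "blv_index", "SRR0000000_blv.sam"):
        print(" ".join(command))
```

Keeping command construction separate from execution makes it easy to inspect or log the exact invocations before submitting them to a cluster scheduler.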


References

  1.

    Gillet N, Florins A, Boxus M, Burteau C, Nigro A, Vandermeers F, et al. Mechanisms of leukemogenesis induced by bovine leukemia virus: prospects for novel anti-retroviral therapies in human. Retrovirology. 2007;4:18.

  2.

    Meas S, Seto J, Sugimoto C, Bakhsh M, Riaz M, Sato T, et al. Infection of bovine immunodeficiency virus and bovine leukemia virus in water buffalo and cattle populations in Pakistan. J Vet Med Sci. 2000;62:329–31 (cited 20 Jun 2016).

  3.

    Jiménez C, Bonilla JA, Dolz G, Rodriguez LR, Herrero L, Bolaños E, et al. Bovine leukaemia-virus infection in Costa Rica. J Vet Med Ser B. 1995;42:385–90 (cited 20 Jun 2016).

  4.

    Ma J-G, Zheng W-B, Zhou D-H, Qin S-Y, Yin M-Y, Zhu X-Q, et al. First report of bovine leukemia virus infection in yaks (Bos mutus) in China. Biomed Res Int. 2016;2016:9170167 (cited 2016 Jun 27).

  5.

    Lee LC, Scarratt WK, Buehring GC, Saunders GK. Bovine leukemia virus infection in a juvenile alpaca with multicentric lymphoma. Can Vet J. 2012;53:283–6 (cited 20 Jun 2016).

  6.

    Kettmann R, Mammerickx M, Portetelle D, Grégoire D, Burny A. Experimental infection of sheep and goat with bovine leukemia virus: localization of proviral information on the target cells. Leuk Res. 1984;8:937–44 (cited 2016 May 25).

  7.

    Altanerova V, Portetelle D, Kettmann R, Altaner C. Infection of rats with bovine leukaemia virus: establishment of a virus-producing rat cell line. J Gen Virol. 1989;70(Pt 7):1929–32 (cited 25 May 2016).

  8.

    Altanerova V, Ban J, Altaner C. Induction of immune deficiency syndrome in rabbits by bovine leukaemia virus. AIDS. 1989;3:755–8 (cited 25 May 2016).

  9.

    Buehring GC, Philpott SM, Choi KY. Humans have antibodies reactive with Bovine leukemia virus. AIDS Res Hum Retrovir. 2003;19:1105–13.

  10.

    Giovanna M, Carlos U. Bovine leukemia virus gene segment detected in human breast tissue. Open J Med. 2013;2013:84–90.

  11.

    Buehring GC, Shen HM, Jensen HM, Yeon Choi K, Sun D, Nuovo G. Bovine leukemia virus DNA in human breast tissue. Emerg Infect Dis. 2014;20:772–82.

  12.

    Buehring GC, Shen HM, Jensen HM, Jin DL, Hudes M, Block G. Exposure to bovine leukemia virus is associated with breast cancer: a case–control study. PLoS One. 2015;10(9):e0134304. doi:10.1371/journal.pone.0134304.

  13.

    Sinha G. Bovine leukemia virus possibly linked to breast cancer. J Natl Cancer Inst. 2016;108 (cited 22 Apr 2016).

  14.

    Ellis MJ, Ding L, Shen D, Luo J, Suman VJ, Wallis JW, et al. Whole-genome analysis informs breast cancer response to aromatase inhibition. Nature. 2012;486:353–60. doi:10.1038/nature11143.

  15.

    Banerji S, Cibulskis K, Rangel-Escareno C, Brown KK, Carter SL, Frederick AM, et al. Sequence analysis of mutations and translocations across breast cancer subtypes. Nature. 2012;486:405–9. doi:10.1038/nature11154.

  16.

    Polat M, Takeshima S-N, Hosomichi K, Kim J, Miyasaka T, Yamada K, et al. A new genotype of bovine leukemia virus in South America identified by NGS-based whole genome sequencing and molecular evolutionary genetic analysis. Retrovirology. 2016;13:4 (cited 31 May 2016).

  17.

    Tang KW, Alaei-Mahabadi B, Samuelsson T, Lindh M, Larsson E. The landscape of viral expression and host gene fusion and adaptation in human cancer. Nat Commun. 2013;4:2513.

  18.

    Khoury JD, Tannir NM, Williams MD, Chen Y, Yao H, Zhang J, et al. Landscape of DNA virus associations across human malignant cancers: analysis of 3,775 cases using RNA-Seq. J. Virol. 2013;87:8916–26 (cited 4 May 2016).

Authors’ contributions

NAG and LW designed the experiment, analyzed the data and wrote the paper. Both authors read and approved the final manuscript.


Acknowledgements

We thank David Colignon from CECI (consortium of high-performance computing centres of UCL, ULB, ULg, UMons, and UNamur) and Wouter Coppieters from the GIGA-Genomics platform of the University of Liège for their advice on cluster computing. We are grateful to the NIH dbGaP for providing access to studies phs000369 and phs000472. We thank David Halzen for manuscript editing.

Competing interests

Both authors declare that they have no competing interests.

Availability of data and materials

The datasets analysed during the current study are available from the corresponding author on reasonable request.

Ethics approval and consent to participate

Human DNA sequences were retrieved from the NCBI database of Genotype and Phenotype and processed following the NIH Code of Conduct for Genomic Data Use.


Funding

This work received financial support from the “Fonds National de la Recherche Scientifique” (FNRS), the Télévie, the Interuniversity Attraction Poles (IAP) Program “Virus-host interplay at the early phases of infection” BELVIR initiated by the Belgian Science Policy Office, the Belgian Foundation against Cancer (FBC), the “Centre anticancéreux près ULg” (CAC) and the “Fonds Léon Fredericq” (FLF), the “AgricultureIsLife” project of Gembloux Agrobiotech (GxABT), the “ULg Fonds Spéciaux pour la Recherche”, the COFUND program, the ERA-IB Astinprod and the “Plan Cancer” of the “Service Public Fédéral”. NAG is supported by a grant from the Télévie. LW is a research director of the FNRS.

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information



Corresponding authors

Correspondence to Nicolas A. Gillet or Luc Willems.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

Cite this article

Gillet, N.A., Willems, L. Whole genome sequencing of 51 breast cancers reveals that tumors are devoid of bovine leukemia virus DNA. Retrovirology 13, 75 (2016).


Keywords

  • Breast cancer
  • Bovine leukemia virus
  • BLV