Mapping, annotation and distribution of integrations from RNA-seq. (A) Integration and library statistics for tumors 324 through 410 subjected to RNA-seq, including Cufflinks library scale factors (SF). The fractions of chimeric pairs (R1/R2) originating in the LTR or in mouse sequence, respectively, are shown, as is the fraction of pairs spanning proviral-mouse junctions (fusions). The percentage of integrations supported by fusions is shown in black, and the percentage without fusion support is shown in pink. Numbers within bars indicate the number of integrations. (B) Venn diagram showing the overlap (P = 2.03E-30, hypergeometric probability) between genes assigned from integrations containing a chimeric fusion and genes assigned from integrations in RTCGD and in the BALB/c and NMRI datasets listed in Table 1. (C) Distance map showing the positions of RNA-seq integrations supported by chimeric fusions relative to the nearest RefSeq gene annotation. (D) Distance map showing the distribution of RNA-seq integrations relative to integrations from RTCGD and the BALB/c and NMRI datasets. The horizontal line marks a distance of 10 kb. For each integration, the panel also shows the gene assignments common to the integration datasets and whether the integration was confirmed by DNA analyses (numbers in parentheses indicate the number of integrations). Integrations marked by purple arrows may have been assigned to a different gene in (B) (described in the main text). (E) Coverage of each integration site relative to the mean coverage of all integrations in the same tumor. The minimum coverage corresponds to a single chimeric read pair. The coverage of the integration marked by a red arrow in tumor 327 is above the mean. (*) Integrations supported by chimeric fusions and confirmed in DNA analyses. (^) Integrations supported by chimeric fusions that were not assigned to a previously tagged gene. The numbering of the integrations in C-E follows the numbering in Additional file 2.
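The overlap significance in (B) is reported only as a hypergeometric probability. A minimal sketch of such a test follows, assuming the common convention of an upper-tail probability P(X >= k). It also assumes a background "universe" of N genes, of which K were tagged in the reference datasets and n were assigned from fusion-supported RNA-seq integrations, with k genes shared. The caption gives none of these counts, so the function and its parameters are hypothetical and cannot reproduce the reported P value.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    Hypothetical parameters (not taken from the paper):
      N: size of the background gene universe
      K: genes tagged in the reference integration datasets
      n: genes assigned from fusion-supported RNA-seq integrations
      k: observed number of genes in the overlap
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    # The overlap cannot exceed the smaller of the two gene sets.
    upper = min(n, K)
    # Sum the probability mass of every overlap at least as large as k.
    # comb(a, b) returns 0 when b > a, so impossible terms vanish.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    # Exact integer arithmetic until this final division to a float.
    return hits / comb(N, n)
```

As a cross-check, `scipy.stats.hypergeom.sf(k - 1, N, K, n)` computes the same upper tail under SciPy's (M, n, N) parameterization. The exact integer sum used here avoids depending on SciPy and stays accurate for very small probabilities of the order reported in (B).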