Tracing the HIV-1 subtype B mobility in Europe: a phylogeographic approach

Background The prevalence and the origin of HIV-1 subtype B, the most prevalent circulating clade among the long-term residents in Europe, have been studied extensively. However the spatial diffusion of the epidemic from the perspective of the virus has not previously been traced. Results In the current study we inferred the migration history of HIV-1 subtype B by way of a phylogeography of viral sequences sampled from 16 European countries and Israel. Migration events were inferred from viral phylogenies by character reconstruction using parsimony. With regard to the spatial dispersal of the HIV subtype B sequences across viral phylogenies, in most of the countries in Europe the epidemic was introduced by multiple sources and subsequently spread within local networks. Poland provides an exception where most of the infections were the result of a single point introduction. According to the significant migratory pathways, we show that there are considerable differences across Europe. Specifically, Greece, Portugal, Serbia and Spain, provide sources shedding HIV-1; Austria, Belgium and Luxembourg, on the other hand, are migratory targets, while for Denmark, Germany, Italy, Israel, Norway, the Netherlands, Sweden, Switzerland and the UK we inferred significant bidirectional migration. For Poland no significant migratory pathways were inferred. Conclusion Subtype B phylogeographies provide a new insight about the geographical distribution of viral lineages, as well as the significant pathways of virus dispersal across Europe, suggesting that intervention strategies should also address tourists, travellers and migrants.


Background
Pandemic HIV-1 group M infection originated in Africa from the simian immunodeficiency virus (SIVcpz) infecting chimpanzees [1][2][3][4][5][6]. The subtype B epidemic in the United States and elsewhere was the result of a single point introduction (migration) of the virus from Haiti around the late sixties [7,8]. The introduction of HIV-1 into Europe occurred mainly through homosexual contacts or needle sharing in or from the USA [9][10][11][12][13], or through heterosexual contacts with individuals from Central Africa [14,15]. At the beginning of the HIV-1 epidemic (the early 1980s) the prevalence of HIV-1 infection was higher among men having sex with men (MSM) than among heterosexuals. For this reason, and also because subtype B was identified at a high prevalence among MSM in the USA, it was the predominant clade in Europe. The prevalence of non-B subtypes in Europe has been increasing over recent years [16][17][18][19][20][21][22][23][24][25][26][27][28][29][30][31]. However, the AIDS epidemic among long-term residents is still dominated by viruses assigned to subtype B [32,33].
RNA viruses such as HIV-1 are measurably evolving populations characterized by very high nucleotide substitution rates [34,35]. Phylogenies can be used for molecular epidemiology studies and, notably, they contain information about the temporal and spatial dynamics of the virus [36]. The latter is the geographic pattern of viral lineages sampled from different localities, also termed phylogeography, which tracks the migration of the virus. For several viral infections, the dispersal of the parasite and its host cannot be easily tracked, suggesting that phylogenies may be a better way to monitor migratory pathways of the virus [37,38]. This methodology has recently been applied to phylogeographic studies of influenza A (H5N1) [37] and HCV [39] epidemics, showing the pathways of viral dispersal.
Thus, phylogenies are the 'state of the art' in characterizing viral genealogy and evolution and also serve as tools to track migration for organisms for which there is no other way to monitor their dispersal [38]. Although several phylogenetic studies have analyzed HIV-1 clades by geographic region in Europe, none has inferred the history of the virus's migration through its phylogeny. In the present study, we inferred the migration history of HIV-1 among 17 countries in Europe, by way of a phylogeography of subtype B sequences.

Results
Migration events were inferred through virus phylogenies by using Slatkin and Maddison's method [40] (illustrated in Figure 1). Trees were built by maximum likelihood (ML) methodology, and the countries from which sequences were sampled were assigned to the tips of the 1000 ML bootstrap trees. Inclusion of a large number of phylogenies takes phylogenetic uncertainty into account, because migration events are estimated over a set of trees rather than a single one.

Phylogenetic analyses
Phylogenies of subtype B sequences from 16 countries in Europe and Israel (Table 1) showed no considerable grouping of sequences by country; however, in the case of Poland most of the sequences (65, 72%) formed a single monophyletic clade (Figure 2). Similarly, a fraction of sequences from Austria (16, 18%), Luxembourg (13, 14%) and Portugal (20, 22%) fell within single clusters; however, the number of viral lineages spreading within local transmission networks was much lower in these areas than in Poland. Notably, in Poland individuals infected locally were mainly IDUs (39/65, 60%). Bayesian phylogenetic methods were used to further confirm the monophyletic nature of the B sequences from Poland, Austria, Luxembourg and Portugal. The final analysis was performed including a few sequences of the different monophyletic clusters identified in the ML trees and 1-2 from the other countries as references. Sequences again appeared as monophyletic in this analysis, with high posterior probability support (>0.8; data not shown), further supporting our previous results.
Figure 1. This tree contains 8 sequences sampled from 2 countries (A and B). Tips (HIV-1 sequences) were labelled according to their sampling country. A. If there are no epidemiological links between the two populations A and B, viral sequences will consist of two monophyletic groups, therefore representing distinct epidemics. B. If an individual sampled within population B acquired the infection in geographic area A, one branch sampled from population B would cluster within the monophyletic clade of population A. The migration pattern for each country was estimated by counting "state" (country label) changes at each internal node of the tree by the criterion of parsimony. For each country we counted "exporting" (From) and "importing" (To) migration events. Specifically, as shown in Fig. 1B, a state change (A-B) is counted as an exporting migration event for country A and as an importing event for B. In our study migration events correspond to mobility of HIV-1 strains or infections and, therefore, inferred exporting or importing migration events are proportional to country-wise mobility of HIV-1 subtype B strains.
ML phylogenies suggest that sequences from the rest of Europe show distinct grouping patterns. Specifically, a number of sequences from each locality cluster within short monophyletic clades (consisting of approximately 2-6 sequences), while others show no grouping according to their geographic origin (Figure 2E). These findings suggest that, except in the case of Poland and, to a lesser extent, Austria, Portugal and Luxembourg, where a considerable percentage of infections resulted from a single migration and subsequent spread among the local population, there is a high level of mixing across Europe for the rest of the countries.
For patients recruited in the prospective study, information on the most likely origin of the HIV infection was collected through a questionnaire. Among them, 572 sequences were used in the current analysis. Interestingly, among those for whom this information was available (456 patients), 90.4% claimed that they acquired the subtype B.

Statistical Phylogeography
To test the significance of specific pathways of location changes (migration events) between countries, we estimated the expected number of changes, under the null hypothesis of complete geographic mixing, for each pair of countries (Tables S1 and S2 in Additional file 1), as described previously [37,39]. The total number of location changes between countries (migration events) for all trees was significantly lower than expected by chance under the null hypothesis of panmixis, confirming that, although there is a high level of HIV dispersal between countries, there is still geographic subdivision among the subtype B lineages analyzed. Moreover, the results of this test showed major differences across Europe (Additional files 2 and 3). In particular, for Austria, Luxembourg and Poland no significant exporting migration was observed, while for Poland importing migration was also not significant. Poland is therefore the country with the lowest HIV migration or, in other words, with the most isolated HIV epidemic among the countries analysed (Figure 3). For Austria and Luxembourg, on the other hand, there was evidence that some of the subtype B infections were the result of migration from Italy and from Portugal and Switzerland, respectively, while, as for Poland, no significant outgoing migration was observed. According to the ML trees, only a few sequences from Israel and Greece fell within the Polish monophyletic cluster, suggesting limited migration to the latter countries (Figure 2D).
Germany, Greece, Italy, Norway, the Netherlands, Portugal, Spain, Serbia, Switzerland, and the UK appeared as sources of subtype B mobility (high levels of exporting migration; "From") to other countries (Additional files 2 and 3). When significant migration was detected from a country to more than 2 others, the former was designated as an "exporter". Notably, Greece's migratory targets were dispersed across 7 countries, while those of Spain and the Netherlands were dispersed across 5 and 6 countries, respectively (Figure 3). High levels of HIV migration, measured as the highest difference between the observed and the expected migration events under panmixis, were detected from Italy to Austria and Switzerland, from Portugal to Luxembourg and also from the Netherlands to Germany (Table S2 in Additional file 1). On the other hand, Belgium, Denmark, Sweden and Israel showed only limited export of HIV-1 subtype B (Additional files 2 and 3).
Major migratory targets of HIV-1 subtype B (importing migration; "To") were Austria, Belgium, Germany, Italy, Luxembourg, Norway, the Netherlands, Sweden, Spain, Switzerland, and the UK (a similar criterion as for the "From" migration was used to assign countries) (Additional files 4 and 5), while limited migration was observed into Serbia and Israel (Supplementary information Figure 1c, d in Additional files 4 and 5). Notably, except for Poland, significant importing migration was detected for all countries across Europe (Figure 3).
[Figure: Parts of the phylogenetic tree inferred for subtype B sequences sampled across Europe.]
Based on these findings, evidence for directional HIV dispersal was detected: Spain, Greece, Portugal and Serbia acted as sources of migration events ("exporters") (Figure 3); Austria, Belgium and Luxembourg were migratory targets ("importers") (Figure 3; Luxembourg and Austria were classified among the "importers" due to the high migration (>7) inferred from Portugal towards Luxembourg); and significant bidirectional HIV migration was found for Denmark, Germany, Italy, Israel, Norway, the Netherlands, Sweden, Switzerland and the UK (Figure 3). Israel and Sweden were classified among localities with bidirectional migration because in both countries significant bidirectional mobility was detected. In contrast, for Poland no significant importing or exporting migration was found, in accordance with the high percentage of sequences grouping according to sampling location.
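The designation rule described above (a country with significant migration to more than 2 others is an "exporter", with a similar criterion for "importers") can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name, the labels "limited" and "isolated", and the toy pathways between placeholder countries P to U are assumptions, and the case-by-case judgments reported for Austria, Luxembourg, Israel and Sweden are not reproduced.

```python
def classify_countries(significant_pairs, countries, min_partners=3):
    """Label each country from its number of significant pathways.

    significant_pairs: iterable of (source, target) pathways judged significant.
    min_partners=3 encodes "more than 2 others" (assumed to apply in both directions).
    """
    outgoing = {c: set() for c in countries}
    incoming = {c: set() for c in countries}
    for source, target in significant_pairs:
        outgoing[source].add(target)
        incoming[target].add(source)

    roles = {}
    for c in countries:
        exports = len(outgoing[c]) >= min_partners
        imports = len(incoming[c]) >= min_partners
        if exports and imports:
            roles[c] = "bidirectional"
        elif exports:
            roles[c] = "exporter"
        elif imports:
            roles[c] = "importer"
        elif outgoing[c] or incoming[c]:
            roles[c] = "limited"    # some significant pathways, below the cutoff
        else:
            roles[c] = "isolated"   # no significant pathway at all
    return roles


# Hypothetical pathways between placeholder countries (not data from the study).
toy_countries = ["P", "Q", "R", "S", "T", "U"]
toy_pairs = [("P", "Q"), ("P", "R"), ("P", "S"), ("R", "Q"), ("S", "Q"),
             ("S", "R"), ("S", "T"), ("R", "S"), ("T", "S")]
roles = classify_countries(toy_pairs, toy_countries)
print(roles)
```

In this toy input, P exports to three countries and imports from none, Q only imports, S does both, and U has no significant pathway, which mirrors the four patterns described in the text.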
To further confirm our findings, all steps of the analyses (phylogenetic analysis with ML bootstrapping, inference of migration events and statistical phylogeography) were repeated in a second run. Notably, migration events inferred on 1000 newly inferred ML bootstrap trees were almost identical to those of the first run (R^2 = 0.98, p < 0.001; data not shown). Moreover, statistical phylogeography revealed that, of the 46 and 50 significantly high migration events inferred in the two rounds of analyses, 43 were identical, suggesting that the major migratory pathways were reproducible.

Discussion
Our results based on a phylogeographic study of a large number of sequences sampled from 16 [32,53]. Moreover, because of its policies, the Netherlands attracts foreign drug users and male homosexuals, two populations known to be at higher risk for HIV infection [51].
Migratory pathways inferred through viral phylogenies cannot be directly validated by other sources of information (epidemiological figures, mobility and immigration information, tourism, etc.), because these data are not stratified by subtype. Moreover, due to the high mobility of the population within Europe and the complexity of the epidemic spread, information about the locus of infection for an individual does not necessarily match the geographic origin of the source. On the other hand, phylogenetic analysis of viral sequences provides a realistic approach for the reconstruction of HIV transmission chains or networks [36,46,47,49,54-56], suggesting that statistical phylogeography is appropriate for inferring the spatial dispersal of a viral epidemic.
Given the high complexity of the epidemic, dense sampling is needed in order to accurately reconstruct the spatial characteristics of the subtype B infections in Europe.
This is one of the limitations of this study. On the other hand, the analysis of our dataset, which was the largest available at the time of analysis, provides for the first time a description of the geographic distribution of viral lineages, as well as of the significant migrations of HIV subtype B across Europe, by means of viral phylogenies. Dense sampling for each locality would be ideal for such purposes; however, the limited availability of sequences for several countries and the computation time required are the major limitations of such a study.
We paid special attention to the representativeness of our data. The prospective SPREAD collection strategy (data from 2002-2004) was specifically designed to avoid such a bias [53], while the retrospectively collected CATCH data (1996-2002) were sampled as part of national surveillance studies designed to investigate the transmission of drug resistance or as part of the standard clinical practice of baseline sequencing for all newly diagnosed cases in each participating center [57]. For most countries where national data were available, the data were a rather good representation of the national epidemic.
In conclusion, HIV-1 subtype B phylogeographies provide, for the first time, insight into the pathways of spatial diffusion and virus migration across Europe. In most countries HIV-1 subtype B was introduced from multiple sources and subsequently spread locally, but the pattern is not uniform across Europe. The countries grouped into sources (Greece, Portugal, Serbia and Spain) and sinks (Austria, Belgium and Luxembourg) of virus migration, as well as countries with significant bidirectional migration (Denmark, Germany, Italy, Israel, Norway, the Netherlands, Sweden, Switzerland and the UK). The only exception was Poland, where a significant number of sequences fell within a monophyletic cluster. These results suggest that mobility of the virus matches mobility of the host, such that in order to reduce further spread of the epidemic, prevention measures should be directed not only towards national populations, but also towards migrants, travellers and tourists, who are the major sources and targets of HIV dispersal.

HIV-1 sequences
Protease (PR) and partial reverse transcriptase (RT) sequences were sampled from HIV-1 seropositive individuals who had never received antiretroviral drugs (ARV), as described previously [53,57]. Specifically, partial PR/RT sequences were sampled from 17 countries in Europe including Israel. Sequences were collected from two studies: the Combined Analysis of Resistance Transmission over Time of Chronically and Acute Infected HIV Patients (CATCH), in a retrospective setting [57], and a prospective study, the Strategy to Control SPREAD of HIV Drug Resistance (SPREAD) [53]. In the CATCH analysis all sequences were collected during 1996-2002 from geographically distinct centres across the participating countries, except for Belgium and the Netherlands, where HIV-1 sequences were sampled from a single geographic area.
In the prospective setting (SPREAD), samples were collected during 2002-2004 according to two different approaches in order to ensure representative sampling [53]. Notably, although data from the period 1996-2002 were retrospectively analyzed, they were collected as part of national surveillance studies designed to investigate the transmission of drug resistance or as part of the standard clinical practice of baseline sequencing for all newly diagnosed cases in each participating center [57]. In the prospective setting a standardized sampling strategy was designed in order to ensure representative sampling in all countries [53]. For the purpose of this study we included only those sequences classified as subtype B. All individuals were sampled at a single time point. The subtyping process was performed by phylogenetic analysis [53,57]. The prevalence of the transmission risk groups among the study population is shown in Table 1.

Phylogenetic analyses

Sampling strategy
For the estimation of country-wise clustering (migration), we first need to infer the phylogenies of the sequences under study. One of the issues to be addressed was how many sequences needed to be included for each country. The dataset needs to be large enough 1) to include most of the available information from each country and 2) to estimate rare migration events. On the other hand, we had to restrict the number of sequences to keep the computation time needed for phylogenetic inference reasonable, while maintaining an informative number of sequences for the calculation of migration events. For this reason, we performed a preliminary analysis of migration for 4 countries including 10, 20, 25 or 90 sequences per country. For each dataset, we tested whether the distribution of the total number of migration events across the set of all credible trees differed significantly from a distribution of randomly generated trees (phylogenetic inference was performed by the ML method). The results of this preliminary analysis showed that with 25 sequences per country, the largest number of countries reached migration levels significantly different from the distribution for a random set of trees (P < 0.01). However, the larger the number of sequences included per country, the higher the signal for clustering with regard to the total number of changes across the inferred versus the random set of trees.
Consequently, we included in the analyses the largest number of sequences (90) available per country, except for Belgium, Greece, the Netherlands, Israel, Norway and Serbia, for which fewer sequences were available; only for the last three countries was the number of sequences included much lower than 90. As a result of choosing an approximately equal number of strains per country, irrespective of the prevalence or the total number of infected individuals across Europe, we calculated the relative mobility per infected individual. Therefore, the numbers in the migration matrices are directly comparable, reflecting actual differences in mobility between countries. For example, we estimated higher migration from the UK to Spain (5.34) than from Germany to Italy (3.23) (Table S2 in Additional file 1).
Phylogenetic analyses for the estimation of the migration process were performed in a single dataset consisting of 1337 sequences analyzed in two independent runs (Table 1).

Alignment and phylogenetic tree reconstruction
The alignment of the subtype B partial RT sequences sampled from 1337 individuals was performed using CLUSTAL W version 1.74 [58] and manually edited according to the encoded reading frame. In order to avoid any bias due to convergent evolution at antiretroviral drug resistance mutations on the phylogenetic analysis, we excluded all sites associated with major resistance in PR ( Phylogenetic trees were inferred by the maximum likelihood method under the general time-reversible (GTR) model of nucleotide substitution with Γ-distributed rate heterogeneity among sites, as implemented in RAxML [59]. Bootstrapping was performed on the maximum likelihood trees (1000 replicates) to assess the reliability of the obtained topologies.

Inference of migration events
All bootstrap-generated trees (1000) were used for the estimation of the HIV-1 migration events by using the cladistic approach first described by Slatkin and Maddison [40], as implemented in MacClade [60]. Specifically, all the terminal nodes (tips) of the inferred trees were assigned a character state according to their geographic origin (e.g. 0, 1, 2, 3 for Austria, Belgium, Denmark, France, etc.). The algorithm reconstructs "ancestral" states, which in our case correspond to countries, at each internal node by the criterion of parsimony [40]. Parsimony selects the reconstruction that minimizes the total number of steps on the tree [41].
When two branches from 2 different locations (e.g. 0 and 1) join with each other, and thus more than one character can be reconstructed at the node, the ancestral state at the internal node is assigned the union of the two characters [0, 1], and a migration event is counted. If the number of such events between two groups of sequences remains low, the possibility of migration between these particular groups also remains low.
Specifically, the migration events between HIV-1 sequences sampled in different locations were estimated for each dataset according to the following method: 1) For nodes with more than one equally parsimonious reconstruction (e.g. 0, 1 or 0), implicit examination of all most parsimonious reconstructions (MPRs) was used when the number of MPRs was large [61,62], while explicit examination was used when the number of MPRs was small, as implemented in MacClade. As a result, for a particular type of character change, e.g. [0,1], MacClade reports a minimum, a maximum and an average number of [0,1] changes estimated over all possible MPRs. We estimated the average number of migration events for each tree used in the analyses. 2) Polytomies, which correspond to nodes with more than two descendant nodes, were interpreted as regions of uncertain evolution (soft polytomies), as implemented in MacClade.
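The parsimony counting described above can be illustrated with a small sketch of Fitch's algorithm on a binary tree. This is not MacClade's implementation: it resolves ambiguous nodes by picking a single most parsimonious reconstruction (ties broken alphabetically) instead of averaging over all MPRs, it does not handle polytomies, and the nested-tuple tree encoding and function names are assumptions made for illustration. The example tree mimics the situation of Figure 1B, with one sequence from country B nested within the clade of country A.

```python
from collections import Counter


def downpass(node):
    """Fitch downpass on a nested-tuple binary tree whose leaves are country labels.

    Returns (state_set, children), where children are annotated the same way.
    """
    if isinstance(node, str):
        return (frozenset([node]), ())
    left, right = (downpass(child) for child in node)
    shared = left[0] & right[0]
    # Intersection if the children agree on some state, otherwise their union.
    states = shared if shared else left[0] | right[0]
    return (states, (left, right))


def count_migrations(annotated, parent_state=None, counts=None):
    """Uppass: fix one state per node and tally directed changes (from, to)."""
    if counts is None:
        counts = Counter()
    states, children = annotated
    if parent_state in states:
        state = parent_state          # no change along this branch
    else:
        state = min(states)           # arbitrary but deterministic tie-break
        if parent_state is not None:
            counts[(parent_state, state)] += 1
    for child in children:
        count_migrations(child, state, counts)
    return counts


# Eight tips from two countries; one "B" tip sits inside the "A" clade.
tree = ((("A", "A"), ("A", "B")), (("B", "B"), ("B", "B")))
migrations = count_migrations(downpass(tree))
print(dict(migrations))
```

With the alphabetical tie-break the root is resolved as A, so both state changes are counted as A to B. The alternative reconstruction with the root in B would give one B to A and one A to B change, which is why averaging over all MPRs, as done in the study, yields fractional counts per tree.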

Inference of migration matrices
For each dataset a 17 × 17 migration matrix was estimated between HIV-1 sequences sampled in different European countries. Each migration event was calculated as the median of the distribution estimated from all trees (1000) used in the analysis. In the matrix, all 'from' events and 'to' events are pooled per country.
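As a rough illustration of this step, the sketch below takes one table of directed migration counts per tree, computes the median of each country-to-country entry across trees, and then pools the 'from' and 'to' totals per country. The function names and the three toy trees with placeholder countries A, B and C are assumptions; the study used 17 countries and 1000 trees.

```python
from statistics import median


def median_migration_matrix(per_tree_counts, countries):
    """Median number of (source, target) events across a set of trees.

    per_tree_counts: list of dicts mapping (source, target) to the number of
    events inferred on one tree; missing pairs count as zero.
    """
    matrix = {}
    for source in countries:
        for target in countries:
            if source != target:
                values = [counts.get((source, target), 0) for counts in per_tree_counts]
                matrix[(source, target)] = median(values)
    return matrix


def pooled_totals(matrix, countries):
    """Pool all 'from' and all 'to' events per country."""
    exported = {c: sum(v for (s, _), v in matrix.items() if s == c) for c in countries}
    imported = {c: sum(v for (_, t), v in matrix.items() if t == c) for c in countries}
    return exported, imported


# Hypothetical counts from three trees (not data from the study).
toy_countries = ["A", "B", "C"]
toy_trees = [
    {("A", "B"): 2, ("B", "C"): 1},
    {("A", "B"): 3},
    {("A", "B"): 1, ("B", "C"): 1, ("C", "A"): 1},
]
matrix = median_migration_matrix(toy_trees, toy_countries)
exported, imported = pooled_totals(matrix, toy_countries)
print(matrix[("A", "B")], exported, imported)
```

Using the median rather than the mean keeps a pathway that appears on only a minority of trees (here C to A) from contributing to the matrix.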

Statistical phylogeography
To further estimate which migration events were significantly different from the expected number of changes under the null hypothesis of full geographic mixing of HIV-1 sequences, we tested whether the distribution of each migration event estimated over 1000 bootstrap trees was statistically different from the distribution estimated from the same set of 1000 trees after reshuffling taxa at the tips. This analysis was performed using Mesquite [63]. Equality of medians between observed and expected migration events was assessed by means of the Kruskal-Wallis one-way analysis of variance, and the level of significance was adjusted according to the Bonferroni correction for multiple comparisons.
The differences between the observed and the expected values indicate the level of HIV-1 country-dependent structure in the dataset, and thus also the relative mobility of the virus between countries. This strategy also made it possible to estimate significant differences when unequal numbers of strains were included per country.
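The comparison of observed and expected distributions can be sketched as below. This is an illustrative, standard-library version and not the Mesquite workflow used in the study: the helper names are hypothetical, the unadjusted significance level of 0.05 and the count of 17 × 16 directed comparisons are assumptions (the text does not state them), and the toy samples are far smaller than the 1000 values per distribution used in the analysis. For two samples the Kruskal-Wallis statistic has one degree of freedom, so the chi-square tail probability can be written with the complementary error function.

```python
import math
import random


def shuffle_tip_labels(labels, rng):
    """Null model: reassign the same country labels to the tips at random."""
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


def kruskal_wallis_two_samples(observed, expected):
    """Kruskal-Wallis H (tie-corrected) and p-value for two samples (1 d.f.)."""
    pooled = sorted([(v, 0) for v in observed] + [(v, 1) for v in expected])
    n = len(pooled)
    sizes = (len(observed), len(expected))
    rank_sums = [0.0, 0.0]
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        mean_rank = (i + 1 + j) / 2          # average of ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += mean_rank
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    h = 12 / (n * (n + 1)) * sum(r * r / m for r, m in zip(rank_sums, sizes)) - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:                      # every value identical
        return 0.0, 1.0
    h = max(h / correction, 0.0)
    return h, math.erfc(math.sqrt(h / 2))    # chi-square survival, 1 d.f.


# Toy migration counts for one pathway (hypothetical, not study data).
observed_counts = [5, 6, 7, 8]               # from the inferred trees
expected_counts = [1, 2, 3, 4]               # from trees with reshuffled tips
h_stat, p_value = kruskal_wallis_two_samples(observed_counts, expected_counts)

alpha = 0.05                                 # assumed unadjusted level
n_comparisons = 17 * 16                      # assumed: all directed country pairs
is_significant = p_value < alpha / n_comparisons

shuffled = shuffle_tip_labels(["A", "A", "B", "B", "C"], random.Random(0))
print(round(h_stat, 3), round(p_value, 4), is_significant)
```

With only four values per sample the toy pathway does not pass the Bonferroni-adjusted threshold, which illustrates why large sets of trees are needed for this kind of test.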
Notably in order to assess the validity of our results, the whole process of phylogenetic analysis, inference of migration events and statistical phylogeography was repeated twice.