Patterns of evolution of host proteins involved in retroviral pathogenesis

Background Evolutionary analysis may serve as a useful approach to identify and characterize host defense and viral proteins involved in genetic conflicts. We analyzed patterns of coding sequence evolution of genes with known (TRIM5α and APOBEC3G) or suspected (TRIM19/PML) roles in virus restriction, or in viral pathogenesis (PPIA, encoding Cyclophilin A), in the same set of human and non-human primate species. Results and conclusion This analysis revealed previously unidentified clusters of positively selected sites in APOBEC3G and TRIM5α that may delineate new virus-interaction domains. In contrast, our evolutionary analyses suggest that PPIA is not under diversifying selection in primates, consistent with the interaction of Cyclophilin A being limited to the HIV-1M/SIVcpz lineage. The strong sequence conservation of the TRIM19/PML sequences among primates suggests that this gene does not play a role in antiretroviral defense.


Background
Evolutionary genomics approaches have been proposed as powerful tools to identify protein regions relevant for host-pathogen interactions [1]. Identifying signatures of genetic conflict can open the way to biological testing of hypotheses regarding the function of host proteins. In retrovirology, the utility of this approach was recently demonstrated in evolutionary analyses of the antiretroviral defense genes TRIM5α, encoding a retrovirus restriction factor targeting the viral capsid [2,3], and APOBEC3G, coding for a cytidine deaminase that hypermutates viral DNA in primates [4,5,6]. Both genes were shown to have been shaped by positive selection, which led to the rapid fixation of adaptive amino acid replacement substitutions. The two genes revealed two different patterns of positive selection: a localized region of rapid change in TRIM5α [3], and a pattern where positively selected residues are scattered throughout the sequence in APOBEC3G [5].

APOBEC3G
To trace the evolutionary history of these genes, we first sequenced their coding regions from eleven primate species [see Additional files 1 and 2]. We then analyzed their substitutional patterns in the framework of the accepted primate phylogeny [7] using several codon-based maximum likelihood procedures as implemented in the codeml tool of the PAML program package [12] (Figure 1).
To obtain an overview of the coding sequence evolution, we estimated the ratio of nonsynonymous (KA) to synonymous (KS) substitutions per site (averaged over the entire sequence) for each branch of the trees using the free-ratio model of codeml [12]. As in previous reports [3,5,6], this analysis revealed generally high KA/KS values on the different branches of the TRIM5α and APOBEC3G trees (average KA/KS ~1.1 for both genes), indicating that these genes show accelerated amino acid replacement rates due to the action of positive selection [13]. In contrast, PPIA and TRIM19 (PML) show low KA/KS values (0.05 and 0.15, respectively, when averaged over the entire tree), suggesting that their protein sequences have been strongly preserved by purifying selection (Figure 1).
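As a rough illustration of how a free-ratio run might be set up, the following Python sketch writes a codeml control file. The option names are standard codeml settings, but the file names, the codon-frequency choice, and the starting value of omega (codeml's name for KA/KS) are assumptions made for this example and are not taken from the analysis described above.

```python
def free_ratio_ctl(seqfile, treefile, outfile):
    """Return codeml control-file text for a free-ratio run (one omega per branch)."""
    settings = {
        "seqfile": seqfile,    # codon alignment (hypothetical file name)
        "treefile": treefile,  # accepted species tree (hypothetical file name)
        "outfile": outfile,    # main result file (hypothetical file name)
        "runmode": 0,          # 0: evaluate the user-supplied tree
        "seqtype": 1,          # 1: codon sequences
        "CodonFreq": 2,        # 2: F3x4 codon frequencies (assumed for this sketch)
        "model": 1,            # 1: free-ratio, a separate omega for every branch
        "NSsites": 0,          # 0: a single omega across sites
        "fix_omega": 0,        # 0: estimate omega rather than fixing it
        "omega": 0.4,          # arbitrary starting value
    }
    return "\n".join(f"{key} = {value}" for key, value in settings.items()) + "\n"


# Example with placeholder file names
ctl_text = free_ratio_ctl("trim5a.phy", "primates.tre", "trim5a_free_ratio.out")
```

The returned text can be saved to a file and passed to codeml; the per-branch omega estimates are then read from the result file.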
In more detailed analyses, we then utilized models that allow for different KA/KS rates at different sites of the sequences, because adaptive evolution often occurs at a limited number of sites [14]. We first compared a null model ("M1a", [15,16]), which assumes two site classes (sites under purifying selection and neutrally evolving sites), to an alternative model ("M2a", [15,16]), which adds a third site class that allows for sites with KA/KS > 1, using likelihood ratio tests [17]. This comparison revealed that the alternative model provides a significantly better fit (P < 10^-30) for the TRIM5α and APOBEC3G genes than the null model, whereas the null model could not be rejected for TRIM19 and PPIA (Table 1). The KA/KS for the additional site class is larger than 1 for both TRIM5α (KA/KS ~6.4) and APOBEC3G (KA/KS ~4.4), strongly suggesting adaptive protein evolution driven by positive selection at a subset of sites. Thus, this analysis supports the hypothesis that TRIM5α and APOBEC3G evolved under positive selection. In contrast, nearly all sites of TRIM19 and PPIA (91.5% and 100%, respectively) are under purifying selection (Table 1).
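The likelihood ratio test described above can be sketched in a few lines of Python. The statistic is twice the log-likelihood difference between M2a and M1a, and it is conventionally compared to a chi-square distribution with two degrees of freedom, because M2a adds two free parameters (the proportion and the KA/KS of the extra site class). The log-likelihood values below are invented for illustration and are not the values in Table 1.

```python
import math


def lrt_m1a_vs_m2a(lnl_null, lnl_alt):
    """Likelihood ratio test of M1a (null) against M2a (alternative).

    Returns the test statistic 2 * (lnL_alt - lnL_null) and its P value
    under a chi-square distribution with 2 degrees of freedom.
    """
    # Clamp at zero: the nested null model cannot truly fit better,
    # so a negative difference only reflects optimization noise.
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # For 2 degrees of freedom the chi-square survival function
    # has the closed form exp(-x / 2).
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Hypothetical log-likelihoods, for illustration only
stat, p_value = lrt_m1a_vs_m2a(lnl_null=-5200.0, lnl_alt=-5125.0)
```

With these made-up inputs the statistic is 150, which corresponds to a P value below 10^-30. Equal log-likelihoods give a statistic of 0 and a P value of 1, in which case the null model is not rejected.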

Protein domains
Using a recently developed Bayesian approach [16], we analyzed the site class under positive selection in TRIM5α and APOBEC3G in more detail. For TRIM5α, 11 of 493 (2%) codon sites can be predicted to be positively selected with high confidence (P > 0.95, Figure 2A). Two clusters of positively selected sites are found in the SPRY domain. The first cluster resides between amino acids 322 and 340 in the variable region 1 (v1, [18]), a region previously described as a "patch" of positive selection [3]. Replacement of the v1 region, or of specific amino acids within v1, modifies the restriction pattern of TRIM5α [19,20]. The second cluster, localized between amino acids 381 and 389, corresponds to the previously described variable region v2 of the SPRY domain [18]. Human TRIM5α in which the v2 region is replaced by the rhesus monkey v2 exhibits no inhibitory activity against HIV-1 or an N-MLV L117H chimera [19,20]. However, the role of v2 in species-specific lentiviral restriction has not yet been extensively tested.
The analysis also predicts a large number (24 of 384, 6%) of positively selected sites in the APOBEC3G sequence (Figure 2B). This result is consistent with the previous report by Sawyer et al. [5]. However, the inclusion of several new species from an additional hominoid lineage, Hylobatidae (gibbons and siamangs), points to the existence of a cluster of residues under positive selection between amino acids 62 and 103, the region that defines the Vif-interaction domain [21]. The protein Vif, which counteracts the activity of APOBEC3G, is encoded by nearly all lentiviruses [22]. Within the Vif-interaction domain of APOBEC3G, 10 residues can be pinpointed as having evolved under strong positive selection. Interestingly, the APOBEC3G amino acid position 128, which controls the ability of the HIV-1 Vif protein to bind and inactivate this host defense factor [23,24], is correctly identified as being positively selected (P > 0.987).
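The step from per-site posterior probabilities to clusters of selected residues can be illustrated with a small Python sketch. It keeps codon positions whose posterior probability exceeds 0.95 and groups neighbouring positions into clusters. The gap and minimum-size parameters are hypothetical choices for this example, not criteria stated in the analysis, and the input values are invented rather than taken from Figure 2 or Additional file 3.

```python
def selected_sites(posteriors, threshold=0.95):
    """Return sorted codon positions whose posterior probability exceeds threshold."""
    return sorted(pos for pos, prob in posteriors.items() if prob > threshold)


def cluster_sites(sites, max_gap=10, min_size=2):
    """Group sorted positions into clusters.

    Consecutive positions stay in the same cluster when they are at most
    max_gap residues apart; clusters with fewer than min_size sites are dropped.
    """
    clusters, current = [], []
    for pos in sites:
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_size:
                clusters.append(current)
            current = []
        current.append(pos)
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


# Invented posterior probabilities keyed by codon position (illustration only)
toy_posteriors = {12: 0.97, 40: 0.99, 46: 0.96, 58: 0.98, 61: 0.50, 120: 0.99, 250: 0.96}
sites = selected_sites(toy_posteriors)
clusters = cluster_sites(sites, max_gap=20)
```

In this toy input, positions 40, 46 and 58 form the only cluster; the remaining high-confidence sites are isolated and are therefore reported as scattered rather than clustered.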
The parallel assessment of multiple genes in the same set of primates allows for several considerations and conclusions. First, by including additional primate lineages, we modify and complement previously observed patterns for two antiviral defense genes/proteins. For TRIM5α, our analysis confirms previous results by Sawyer et al. [3], but underscores the potential interest of the second variable region of the SPRY domain, which may be of functional relevance and merits further experimental analysis. With respect to APOBEC3G, our analysis extends previous reports that showed protein-wide distribution of positively selected residues. It suggests that this protein potentially carries a functionally relevant cluster of selected residues that coincides with the region of HIV-1 Vif interaction [23,24]. Positively selected sites identified by Bayes empirical Bayes inference with probabilities P > 0.95 for the two proteins are listed in Additional file 3.
Second, the failure to identify signatures of positive selection in the TRIM19 (PML) gene suggests that its encoded protein does not have antiviral activity, or that the protein acts as an intermediary, lacking a physical protein-protein interaction with the pathogen. TRIM19 (PML) has been implicated in many functions, for example, in apoptosis and cell proliferation [9]. In addition, TRIM19 (PML) expression may act as an effector of the antiviral state induced by type I interferons [9]. Overexpression of TRIM19 (PML) is reported to confer resistance to infection by vesicular stomatitis virus and influenza A virus. Rabies virus, Lassa virus and lymphocytic choriomeningitis virus replicate to higher levels in PML-negative cells, whereas overexpression of the protein has no significant effect. Various roles have been proposed for TRIM19 (PML) in retroviral replication [8,25], although these findings remain controversial [26]. Many other viruses, including herpes simplex virus type 1, disturb the nuclear bodies that contain, among other proteins, TRIM19 (PML). However, it is unclear whether these effects are a consequence of the viral infection or a sign of the participation of TRIM19 (PML) in antiviral defense. Thus, the effect of TRIM19 (PML) might be indirect. Failure to identify a signature of positive selection militates against a direct role of this protein in antiviral defense, because prolonged contact with multiple pathogens over long evolutionary time periods would be expected to leave signatures of positive selection indicative of a genetic conflict.
Finally, the absence of a signature of positive Darwinian selection in Cyclophilin A provides a complement to the understanding of the role of this protein in retroviral pathogenesis. Cyclophilin A interacts directly with the HIV-1 capsid, an interaction that may protect HIV-1 from antiviral restriction activity [27]. Although required by members of the HIV-1M/SIVcpz lineage for replication, it is not needed by other primate immunodeficiency viruses [11]. Owl monkeys exhibit post-entry restriction of HIV-1 mediated by a TRIM5-Cyclophilin A fusion protein generated by retroposition [28]. Evolutionary analysis of PPIA indicates that Cyclophilin A has been preserved by strong purifying selection, leaving its protein sequence virtually unchanged. This is consistent with the interaction of Cyclophilin A and the viral capsid being limited to the HIV-1M/SIVcpz lineage.
Together, the results presented here further support that an evolutionary genomics approach may be very useful for systematically assessing functional roles of primate host proteins potentially relevant in viral pathogenesis [29]. Candidates for this approach may include other members of the TRIM or APOBEC families [30,31] as well as proteins involved in pathogen recognition and life cycle. Signatures of positive selection, but also the absence of signs of a genetic conflict, constitute relevant informa-