Discovery of novel targets for multi-epitope vaccines: Screening of HIV-1 genomes using association rule mining
© Paul and Piontkivska; licensee BioMed Central Ltd. 2009
Received: 14 April 2009
Accepted: 06 July 2009
Published: 06 July 2009
Studies have shown that in the genome of human immunodeficiency virus type 1 (HIV-1), regions responsible for interactions with the host's immune system, namely cytotoxic T-lymphocyte (CTL) epitopes, tend to cluster in relatively conserved regions. On the other hand, "epitope-less" regions, or regions with a relatively low density of epitopes, tend to be more variable. However, very little is known about relationships among epitopes from different genes, in other words, whether particular epitopes from different genes occur together in the same viral genome. To identify CTL epitopes in different genes that co-occur in HIV genomes, association rule mining was used.
Using a set of 189 best-defined HIV-1 CTL/CD8+ epitopes from 9 different protein-coding genes, as described by Frahm, Linde & Brander (2007), we examined the complete genomic sequences of 62 HIV-1 reference sequences (including 13 subtypes and sub-subtypes, with approximately 4 representative sequences for each, and 18 circulating recombinant forms). The results showed that, despite the inclusion of recombinant sequences, which would be expected to break up associations of epitopes in different genes when two different genomes recombine, there exist particular combinations of epitopes (epitope associations) that occur repeatedly across the worldwide population of HIV-1. For example, the Pol epitope LFLDGIDKA is significantly associated with epitopes GHQAAMQML and FLKEKGGL from Gag and Nef, respectively, and this association rule is observed even among circulating recombinant forms.
We have identified CTL epitope combinations co-occurring in HIV-1 genomes across different subtypes and recombinant forms. Such co-occurrence has important implications for the design of complex (multi-epitope) vaccines and/or drugs that would target multiple HIV-1 regions at once and thus may be expected to overcome challenges associated with viral escape.
In the course of viral infection, the recognition of viral peptides by class I major histocompatibility complex (MHC) molecules and the subsequent interactions of the peptide/MHC complex with cytotoxic T lymphocytes (CTLs, or CD8+ T cells) play an important role in controlling the infection [1, 2]. Viral CTL epitopes (short viral peptides recognized by the immune system components, CTLs and MHC class I molecules) are an integral and critical part of this recognition process, and amino acid changes at CTL epitopes have been shown to play a role in viral "escape" (that is, evading recognition by the immune system) in human (HIV) and simian (SIV) immunodeficiency viruses [3–8]. In particular, in HIV certain CTL epitopes are subjected to consistent selective pressure from the host's immune system, leading to rapid accumulation of amino acid changes, while other CTL epitopes evolve under purifying selection [9, 10]. Furthermore, the rapidly accumulating genetic diversity of the global HIV-1 pandemic underscores the need to develop vaccines that protect against multiple subtypes and strains simultaneously.
The epitope-vaccine approach has been suggested as a strategy to circumvent the high mutation rate of HIV-1 and the resulting viral escape from the host's immune system, as well as the development of resistance to antiviral drugs [12–14]. The inclusion of CTL epitope sequences in vaccines has several advantages, including the possibility of targeting a majority of viral variants if highly conserved epitopes are used. Likewise, when epitopes from different genes or genomic regions are included in the same vaccine, such multi-epitope vaccines can induce broader cellular immune responses [15, 16].
Several strategies can be used to develop multi-epitope vaccines, including (a) the generation of tetramer epitope vaccines, with epitopes chosen based on the presence of the principal neutralizing determinant; (b) the generation of synthetic peptides, with candidate epitopes predicted in silico from the peptide binding affinity of anchor residues, focusing on those capable of binding to multiple HLA alleles; and (c) the juxtaposition of multiple HLA-DR-restricted helper T lymphocyte (HTL) epitopes, identified by screening HIV-1 antigens for peptides that contain the HLA-DR-supertype binding motif. However, an inherent limitation of in silico epitope prediction is that it generates a rather large number of initially predicted epitopes, many of which are false positives; hence, many potential candidates require subsequent experimental validation [20–24]. Furthermore, because of the enormous genetic diversity of HIV, some predicted epitope candidates may be specific to only certain subtypes [21, 25, 26], whereas the extent of amino acid sequence conservation alone does not determine potential immunogenicity. Other methods, such as artificial neural networks and hidden Markov models, also have limitations, such as adjustable parameters whose optimal values are hard to find initially, overfitting, overtraining and difficulty of interpretation. For example, in a study by Andersen et al. (2000) on the experimental binding of 84 peptides to class I MHC molecules, there was no correlation between predicted and experimental binding, and a high possibility of false negatives. Thus, in this study we develop a novel strategy, based on the association rule mining technique, to identify the best epitope candidates for multi-epitope vaccines from the pool of experimentally well-supported epitopes.
Briefly, association rule mining, a method that detects associations between items (frequent item sets) and formulates conditional implication rules among them [31–33], is used to examine relationships between 218 "best-defined" CTL epitopes (from the list of Frahm, Linde & Brander, 2007). Our results show that some CTL epitopes are significantly associated with each other, so that they co-occur in the majority of the reference viral genomes, including circulating recombinant forms. At least 23 association rules were identified that involve CTL epitopes from 3 different genes: Gag, Pol and Nef. We also identified several combinations of 3 to 5 CTL epitopes that are frequently found together in the same viral genome despite the high mutation and recombination rates of HIV-1, and thus can be used as likely candidates for multi-epitope vaccine development.
Materials and methods
HIV-1 genomic sequence data and alignment
List of 62 HIV-1 reference sequences included in the study: 44 non-recombinant sequences, grouped by subtype, and 18 circulating recombinant forms (CRFs) (2005 subtype reference set of the HIV Sequence Database, Los Alamos National Laboratory).
The average numbers of breakpoints in the CRF genomes were summarized from the breakpoint maps in the HIV Sequence Database at Los Alamos.
The set of 218 CTL epitopes described as "the best-defined HIV CTL epitopes" by Frahm, Linde & Brander (2007), which includes only epitopes supported by strong experimental evidence in humans, was used. These epitopes, together with their genomic coordinates relative to the reference HXB2 sequence (GenBank accession number K03455), are described in Additional file 1.
Selecting epitopes for association rule mining
To determine whether the same associations exist among non-recombinant sequences and circulating recombinant forms (CRFs), three data sets were created. The first sequence set (hereafter "62-all") included all 62 HIV-1 reference sequences used in the study, the second included only the 44 non-recombinant sequences ("44-non-CRFs"), and the third included the 18 CRFs ("18-CRFs"). Because of the requirement that an epitope be present as a "perfect match" in at least one sequence, as described above, 1 and 29 epitopes were removed from the 189-epitope list for the second and third data sets, respectively, resulting in lists of 188 and 160 epitopes (Additional file 1).
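The construction of the three data sets and the "present in at least one sequence" filter can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each genome has already been translated into a single protein string keyed by a sequence name, and the function names (`split_datasets`, `present_epitopes`) and toy sequences are hypothetical.

```python
from typing import Dict, List, Set


def present_epitopes(proteins: Dict[str, str], epitopes: List[str]) -> List[str]:
    """Keep epitopes found as an exact ("perfect match") substring in at least one sequence."""
    return [e for e in epitopes if any(e in seq for seq in proteins.values())]


def split_datasets(proteins: Dict[str, str], crf_ids: Set[str]) -> Dict[str, Dict[str, str]]:
    """Build the three sequence sets: all genomes, non-recombinants only, CRFs only."""
    return {
        "all": dict(proteins),
        "non-CRFs": {k: v for k, v in proteins.items() if k not in crf_ids},
        "CRFs": {k: v for k, v in proteins.items() if k in crf_ids},
    }


if __name__ == "__main__":
    # Toy, made-up sequences; a real analysis would use the translated reference genomes.
    proteins = {
        "A1.x": "MGARASVLSGGELDKFLKEKGGL",
        "B.y": "MGARASILSGGKLDKFLKEKGGL",
        "CRF01.z": "MGARASVLSGGELDKWEKIRLRP",
    }
    epitopes = ["FLKEKGGL", "GELDKWEKI", "SLYNTVATL"]
    for name, subset in split_datasets(proteins, {"CRF01.z"}).items():
        # Each data set gets its own epitope list, as in the text.
        print(name, present_epitopes(subset, epitopes))
```

Filtering per data set, rather than once globally, is what makes the epitope lists differ in length between the 62-all, 44-non-CRFs and 18-CRFs sets.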
Additionally, one hundred "pseudo-datasets" of 62 sequences each (62 × 100) were created by randomly selecting sequences from the original sequence set (random sampling with replacement). Similar to the bootstrap test widely used in phylogenetics, these pseudo-datasets were used as controls to determine the significance of detected associations at the same thresholds as the 62-all data set (i.e., 75% support and 95% confidence); in other words, to test whether the associations identified in our original 62-sequence set could be attributed to chance overrepresentation of certain sequence types. The number of epitopes analyzed in each data set is given in Additional file 2. Essentially the same association rules were identified in the pseudo-datasets as in the 62-all data set, which is consistent with the expectation that the high support and confidence thresholds used here already prune away most insignificant rules.
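A minimal sketch of this resampling control follows, assuming each genome is represented as the set of epitopes it carries as perfect matches. For brevity it mines only two-epitope rules (A ⇒ B), whereas WEKA's Apriori also reports rules over larger epitope sets, and it summarizes each rule by the fraction of pseudo-datasets in which it is recovered; the source does not state how recovery was scored, so that summary, like the function names, is an assumption.

```python
import random
from itertools import permutations
from typing import Dict, FrozenSet, List, Set, Tuple

Rule = Tuple[str, str]  # (antecedent epitope, consequent epitope)


def pairwise_rules(rows: List[FrozenSet[str]], min_sup: float, min_conf: float) -> Set[Rule]:
    """Two-epitope rules A -> B with support(A and B) >= min_sup and confidence >= min_conf."""
    if not rows:
        return set()
    n = len(rows)
    rules: Set[Rule] = set()
    for a, b in permutations(sorted(set().union(*rows)), 2):
        n_a = sum(1 for r in rows if a in r)
        n_ab = sum(1 for r in rows if a in r and b in r)
        if n_a > 0 and n_ab / n >= min_sup and n_ab / n_a >= min_conf:
            rules.add((a, b))
    return rules


def bootstrap_recovery(rows: List[FrozenSet[str]], min_sup: float, min_conf: float,
                       replicates: int = 100, seed: int = 0) -> Dict[Rule, float]:
    """Fraction of pseudo-datasets (genomes resampled with replacement, same size as
    the original set) in which each rule found in the original data is recovered."""
    rng = random.Random(seed)
    original = pairwise_rules(rows, min_sup, min_conf)
    hits = {rule: 0 for rule in original}
    for _ in range(replicates):
        pseudo = rng.choices(rows, k=len(rows))  # random sampling with replacement
        found = pairwise_rules(pseudo, min_sup, min_conf)
        for rule in original:
            if rule in found:
                hits[rule] += 1
    return {rule: count / replicates for rule, count in hits.items()}
```

For the 62-all set this would be called with `min_sup=0.75`, `min_conf=0.95` and `replicates=100`; rules recovered in nearly every pseudo-dataset are unlikely to reflect the chance overrepresentation of a few sequence types.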
Association rule mining
Association rule mining is a data mining technique that discovers relationships (associations, or rules) that exist within a data set [31–33, 40]. One commonly known application of association rule mining is "market basket" analysis [40–42]. Beyond marketing analysis, however, association rule mining has many useful applications to biological problems, including the discovery of relationships between genotypes and phenotypes in bacterial genomes, predicting drug resistance in HIV, and predicting MHC-peptide binding. In this study, association rule mining was used to discover novel relationships between CTL epitopes that consistently co-occur in viral genomes despite high mutation and recombination rates, so that such epitopes can be used as promising candidates in the design of multi-epitope vaccines.
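For reference, the two quantities used throughout can be written with the standard definitions of Agrawal et al. (1993), treating each genome as a "transaction" and each epitope present as a perfect match as an "item". The notation ($T$, $N$, $X$, $Y$) is introduced here for illustration only.

```latex
% T: the set of N genomes; each genome t \in T is the set of epitopes it carries as perfect matches.
% X, Y: non-empty, disjoint sets of epitopes.
\operatorname{supp}(X) = \frac{\lvert \{\, t \in T : X \subseteq t \,\} \rvert}{N},
\qquad
\operatorname{conf}(X \Rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)},
\qquad X \cap Y = \varnothing .
```

A rule $X \Rightarrow Y$ is reported when $\operatorname{supp}(X \cup Y) \ge \text{minsup}$ and $\operatorname{conf}(X \Rightarrow Y) \ge \text{minconf}$; in this study minsup was 0.75 (raised to 0.95 for the 18-CRFs set) and minconf was 0.95.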
Summary of the discovered CTL epitope association rules (support ≥ 0.75 and confidence ≥ 0.95). Rows: number of epitope associations; unique epitope associations, broken down by size (associations with 2, 3, 4, 5, 6 and 7 epitopes); unique epitope associations with epitopes from only one gene (Gag only, Pol only, Nef only); unique epitope associations with epitopes from two genes; and unique epitope associations with epitopes from all three genes (Gag-Pol-Nef).
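The tallies in this summary can be reproduced from a list of mined rules along the following lines. This is a sketch under an assumption the table does not spell out: rules that involve the same set of epitopes (for example A ⇒ B and B ⇒ A) are counted as one "unique" association. The rule representation, the `gene_of` mapping and the placeholder epitope IDs are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (antecedent, consequent)


def summarize_rules(rules: Iterable[Rule], gene_of: Dict[str, str]) -> Tuple[Counter, Counter]:
    """Collapse rules into unique epitope sets and tally them by size and by genes spanned."""
    # Assumption: rules over the same epitope set count as one unique association.
    unique = {antecedent | consequent for antecedent, consequent in rules}
    by_size = Counter(len(s) for s in unique)
    # Key is the sorted tuple of genes, e.g. ("Gag", "Nef", "Pol") for a three-gene association.
    by_genes = Counter(tuple(sorted({gene_of[e] for e in s})) for s in unique)
    return by_size, by_genes


if __name__ == "__main__":
    # Placeholder epitope IDs and toy rules, not results from the study.
    gene_of = {"gag1": "Gag", "pol1": "Pol", "pol2": "Pol", "nef1": "Nef"}
    toy_rules = [
        (frozenset({"pol1"}), frozenset({"nef1"})),
        (frozenset({"nef1"}), frozenset({"pol1"})),
        (frozenset({"pol1", "pol2"}), frozenset({"gag1", "nef1"})),
    ]
    by_size, by_genes = summarize_rules(toy_rules, gene_of)
    print(by_size)
    print(by_genes)
```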
Estimates of the nucleotide substitution rates
The relative degree of sequence divergence among reference sequences and different genomic regions was evaluated by comparing the numbers of synonymous and nonsynonymous substitutions. In particular, the number of synonymous nucleotide substitutions per synonymous site (dS) and the number of nonsynonymous nucleotide substitutions per nonsynonymous site (dN) were estimated by the Nei-Gojobori method as implemented in the MEGA4 program. This simple method was used because it is expected to have lower variance than more complicated substitution models. Standard errors were estimated with 100 bootstrap replications. Pairwise dN and dS values were estimated separately for the so-called "associated" epitope regions (epitopes found to be involved in any association rule), non-associated epitope regions (epitopes not involved in any association rule) and non-epitope regions (regions that did not harbor any of the CTL epitopes used in the study).
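The final steps of the Nei-Gojobori calculation and the bootstrap can be sketched as below. This assumes the per-codon counts of synonymous and nonsynonymous sites and differences for a sequence pair (the first, pathway-averaging step of the method) are already available, and it applies the Jukes-Cantor correction, one of the options MEGA offers alongside uncorrected p-distances; the source does not state which option was used. Resampling codons with replacement is a common way to bootstrap dN and dS, but MEGA4's exact procedure may differ, and all names here are hypothetical.

```python
import math
import random
import statistics
from typing import Sequence, Tuple

# Per-codon tallies for one pair of sequences: (S, N, Sd, Nd) = synonymous sites,
# nonsynonymous sites, synonymous differences, nonsynonymous differences.
CodonTally = Tuple[float, float, float, float]


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction of a p-distance: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(codons: Sequence[CodonTally]) -> Tuple[float, float]:
    """Return (dN, dS): pN = Nd/N and pS = Sd/S summed over the region, then corrected."""
    s = sum(c[0] for c in codons)
    n = sum(c[1] for c in codons)
    sd = sum(c[2] for c in codons)
    nd = sum(c[3] for c in codons)
    return jukes_cantor(nd / n), jukes_cantor(sd / s)


def bootstrap_se(codons: Sequence[CodonTally], replicates: int = 100,
                 seed: int = 0) -> Tuple[float, float]:
    """Standard errors of dN and dS from resampling codons with replacement."""
    rng = random.Random(seed)
    dns, dss = [], []
    for _ in range(replicates):
        dn, ds = dn_ds(rng.choices(codons, k=len(codons)))
        dns.append(dn)
        dss.append(ds)
    return statistics.stdev(dns), statistics.stdev(dss)
```

Averaging such pairwise values within each region class (associated epitopes, non-associated epitopes, non-epitope regions) gives the kind of comparison reported in the results.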
Results and discussion
Mining for association rules
In order to identify CTL epitope regions that consistently co-occur in HIV-1 genomes, 189 CTL epitopes were mapped in 62 HIV-1 reference sequences (Table 1). A "perfect match" was recorded as "epitope presence", while one or more amino acid differences between the canonical CTL epitope sequence and the respective HIV sequence were recorded as "epitope absence"; association rule mining was then applied to determine whether certain CTL epitopes consistently co-occurred. Using the data mining tool WEKA [46, 47], the initial minimum support and confidence values were set to 0.75 and 0.95, respectively, to ensure that only the most frequently co-occurring epitopes were identified. In other words, a minimum support value of 75% ensures that only epitopes present as a "perfect match" in at least 75% of the sequences are included in association rules (e.g., epitope A is present in at least 46 sequences out of 62). The support for the 18-CRFs data set was later raised to 0.95 (i.e., an even more conservative threshold) to limit the overall number of associations: with 75% support, this data set generated many more association rules than the other data sets, because it had 31 CTL epitopes with at least 75% support, compared with 25 and 26 for the 62-all and 44-non-CRFs data sets, respectively. A confidence level of 95%, on the other hand, indicates that the identified association rule (e.g., epitope A being associated with epitope B) holds in at least 95% of the sequences where epitope A occurs. In the case of 62 reference sequences, this means that at least 44 reference sequences had both epitopes present.
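The presence/absence encoding and the support/confidence filtering described above can be sketched in plain Python. This is a small level-wise (Apriori-style) search in the spirit of Agrawal et al. (1993), not WEKA's implementation, whose search strategy and output options differ; it assumes translated protein sequences are available as strings, and all function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]


def encode(proteins: Dict[str, str], epitopes: List[str]) -> List[Itemset]:
    """One 'transaction' per genome: the epitopes present as exact ("perfect") matches.
    Any amino acid difference from the canonical epitope counts as absence."""
    return [frozenset(e for e in epitopes if e in seq) for seq in proteins.values()]


def frequent_itemsets(rows: List[Itemset], min_sup: float) -> Dict[Itemset, float]:
    """Level-wise search for epitope sets whose support is at least min_sup."""
    if not rows:
        return {}
    n = len(rows)

    def support(items: Itemset) -> float:
        # Fraction of genomes that carry every epitope in `items`.
        return sum(1 for r in rows if items <= r) / n

    level = {s for s in {frozenset([e]) for r in rows for e in r} if support(s) >= min_sup}
    result: Dict[Itemset, float] = {}
    k = 1
    while level:
        for s in level:
            result[s] = support(s)
        k += 1
        # Join: unions of two frequent (k-1)-sets that contain exactly k epitopes.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_sup}
    return result


def association_rules(freq: Dict[Itemset, float],
                      min_conf: float) -> List[Tuple[Itemset, Itemset, float, float]]:
    """Rules X -> Y (X, Y non-empty and disjoint) from each frequent set, with confidence >= min_conf."""
    out = []
    for itemset, sup in freq.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                x = frozenset(lhs)
                conf = sup / freq[x]  # every subset of a frequent set is itself frequent
                if conf >= min_conf:
                    out.append((x, itemset - x, sup, conf))
    return out
```

With the thresholds used here, a call would look like `association_rules(frequent_itemsets(encode(proteins, epitopes), 0.75), 0.95)`, with the support argument raised to 0.95 for the 18-CRFs set. Because an epitope set's support never exceeds that of its subsets, the level-wise search can discard any candidate with an infrequent subset without counting it.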
Interestingly, one of the epitopes frequently found in three-gene associations (Figure 2), Nef FLKEKGGL (an HLA-B*08-restricted epitope) [56, 8], also referred to as the B8-FL8 epitope, is a frequently targeted, highly immunodominant epitope in HLA-B*08 individuals that often elicits a strong epitope-specific CD8+ T-cell response [57, 58]. This epitope has also been shown to be targeted by specific T cell receptors that have an unusually long complementarity-determining region 3 (CDR3) and are capable of recognizing escape mutants arising in that epitope, a response associated with slow disease progression. Furthermore, the strong amino acid sequence conservation of this epitope region identified in our study is consistent with clinical data indicating a rather limited capacity of the virus to tolerate amino acid changes in this epitope, as evidenced by the lack of amino acid variation in some patients with a persistent and strong CTL response despite being infected for over 13 years [8, 58]. Overall, strong functional constraints on the virus and the lower fitness of escape mutants are likely contributors to the high sequence conservation of the B8-FL8 epitope; hence, it represents a promising vaccine candidate, although further studies are needed.
As Figure 2 shows, the distribution of highly conserved epitope regions that participate in associations spanning three genes varied among and within genes. Notably, all 23 three-gene association rules included the same Nef epitope (B8-FL8, FLKEKGGL). The Pol gene had the highest number of distinct associated epitopes (9), while the Gag gene had 3 different epitopes involved in multiple association rules. Some of these associations involved epitopes from adjacent or overlapping regions; e.g., Gag GLNKIVRMY is associated with the Pol IVTDSQYAL epitope and other adjacent/overlapping epitopes in at least 9 association rules (Figure 2). Other epitopes, such as Gag GHQAAMQML, instead participate in association rules that involve multiple non-overlapping epitope regions in the Pol gene. It is possible that different mechanisms are responsible for the long-term evolutionary maintenance of different types of epitope associations, such as those involving CTL epitopes from relatively closely located regions (200–300 codons apart) and those including epitopes from distant parts of the genome, although further studies are necessary. Overall, this approach allows us to identify co-evolving regions in viral genomes that are highly conserved at the amino acid level and are subject to strong purifying selection, which eliminates the majority of amino acid changes arising in such regions.
Selection at CTL epitopes involved in association rules
Average pairwise dN and dS values estimated at non-epitope and CTL epitope regions (CTL epitopes involved in association rules vs. CTL epitopes not involved in association rules), with P values.
Significance of CTL epitope "participation" in the association rules
Properties of 22 CTL epitopes that frequently co-occur in the reference HIV-1 genomes (per the 62-all sequence set). Columns: non-overlapping genomic region; amino acid sequence; HLA allele; amino acid coordinates; number of "unique" association rules in which each epitope is involved; number of association rules in which each region is involved.
Overall, we were able to identify several highly conserved epitopes that are relatively widespread across the worldwide HIV-1 population and are present not only in non-recombinant subtypes but also in circulating recombinant forms. Such highly conserved epitopes may be considered promising candidates for multi-epitope vaccine design, as they are likely to be targeted in a majority of HIV lineages, thereby increasing population coverage. Beyond high conservation, however, CTL epitopes identified as participants in association rules (such as those depicted in Figure 2) offer additional benefits. In particular, an association between epitopes generally implies that if one epitope from the rule is present in the viral genome, the other epitopes from the rule will also be present with high likelihood. Furthermore, because these epitopes may be located in different genes, and are often far apart from each other, a recombination or mutation event may remove only some, but not all, target epitopes; it would thus only diminish the efficiency of a multi-epitope vaccine instead of completely disabling its action. Our earlier studies identified at least 10 CTL epitope regions that exhibit evidence of persistent purifying selection (Piontkivska and Hughes 2004, Table 2: http://jvi.asm.org/cgi/content/full/78/21/11758/T2 therein). Of these highly conserved epitopes, the Pol epitope LFLDGIDKA (recognized by HLA-B81) is also part of several association rules identified in this study, including rules spanning three genes and four CTL epitopes (two epitopes from Gag and one each from Pol and Nef), and as such represents a promising candidate for multi-epitope vaccine development.
Because the HIV genomes and CTL epitope definitions were drawn from the reference sequences of the HIV Sequence Database and the list of "best-defined" epitopes of the HIV Immunology Database, respectively, the patients' HLA haplotypes, stages of infection and CTL responses are unknown. However, some of the associated epitopes have been shown to be immunogenic in acute HIV-1 infection, particularly those participating in associations involving epitopes from three different genes, while others have been shown to be strongly immunogenic in drug-naive patients (Additional file 4). Furthermore, while some CTL epitopes may certainly be prone to escape mutations when exposed to immune pressure elicited by the restricting HLA allele, the associated epitopes identified in this study are recognized by different HLA alleles, with some combinations representing three different alleles from the same HLA locus. For example, the epitope association of Gag SEGATPQDL, Pol LFLDGIDKA and Nef FLKEKGGL is recognized by the HLA-B*4001, B*81 and B*0801 alleles, respectively; because an individual carries at most two HLA-B alleles, all three restricting alleles cannot be present in the same patient. On the other hand, a recent study has shown that some CTL epitopes are promiscuous: epitope presentation and CTL recognition can occur in the context of alternative, non-restricting HLA class I alleles, often from different HLA supertypes. As shown in Table 4, five of the 22 associated epitopes have been designated as promiscuous, with at least one promiscuous epitope identified in each gene (Gag, Pol and Nef). Therefore, inclusion of these epitopes may enhance the efficiency of a multi-epitope vaccine across a broader range of host HLA haplotypes (although "functionally homozygous" individuals who express both the original and alternative HLA alleles may be at a disadvantage [55, 60]).
Further studies are needed to address the mechanisms of immune control of HIV infection through combinations of HLA alleles and CTL epitopes, particularly, promiscuous epitopes.
While our results demonstrated the presence of several highly conserved CTL epitopes, found in association with each other, in multiple HIV-1 reference genomes including CRFs, the underlying functional significance of these regions for the virus remains poorly understood. Very few of the epitope regions found in association rules contained molecular features such as glycosylation, myristoylation, amidation or phosphorylation sites, and they lacked any cell attachment motif or leucine zipper motif [61, 62]. Yet the highly conserved nature of these CTL epitopes hints at their major functional significance. One possibility is that the strong sequence conservation is driven by functional constraints related to potential RNA secondary and tertiary structures formed by the genomic regions of these epitopes, individually or in combination with each other. Because such RNA-level constraints would restrict synonymous as well as nonsynonymous changes, the overall extent of sequence divergence would then be expected to be lower at these epitopes than elsewhere in the genome; indeed, both dN and dS values were lower at the associated epitopes than at the other epitopes or non-epitope regions (Table 3).
It is also noteworthy that some epitopes are not involved in any association despite being present in more than 75% of the reference sequences, hinting at some underlying mechanism that holds the "associated epitopes" together. It is possible that associated epitopes from different genes co-evolve because of functional and structural constraints imposed by protein-protein interactions that are necessary for many viral processes. Since some HIV proteins are expressed as polyproteins (such as Gag-Pol), regulation of polypeptide processing in the cell is an important part of the viral life cycle and is often mediated by interactions between domains that belong to different processed proteins. For example, within Gag-Pol, several regions located close to the N and C termini of the protease (PR) have been shown to influence PR activation. Likewise, modulating reverse transcriptase (RT) activation has been shown to affect Gag-Pol interaction and polypeptide processing, while interactions between the C-terminal flexible loop of Nef and the Gag-Pol polyprotein are essential for HIV assembly. While the molecular mechanisms of potential interactions involving associated epitope regions are currently unknown, these regions represent interesting candidates for future experimental studies to elucidate these interactions and their functional significance.
Our results revealed the presence of multiple associated co-evolving CTL epitope regions in HIV-1 genomes that are also significantly conserved across a broad range of HIV-1 subtypes and sub-subtypes. However, further studies are needed to ascertain the efficiency of these associated epitopes in multi-epitope vaccines as well as to uncover the underlying structural and/or functional constraints behind co-occurrences of the highly conserved epitopes.
Application of association rule mining revealed that certain CTL epitope combinations (including epitopes from three different genes) consistently co-occur in HIV-1 genomic sequences from major geographic regions around the world. Epitopes that are both well supported by experimental evidence and highly conserved across different non-recombinant and recombinant forms of HIV-1 can be considered ideal candidates for multi-epitope vaccines against HIV-1.
HLA: Human Leukocyte Antigen
This work was partially supported by the Kent State University Research Council.
- Klein J, Horejsi V: Immunology. 1997, Oxford: Blackwell Science, 2
- Bjorkman PJ, Saper MA, Samraoui B, Bennett WS, Strominger JL, Wiley DC: The foreign antigen binding site and T cell recognition regions of class I histocompatibility antigens. Nature. 1987, 329: 512-518.
- Klenerman P, Wu Y, Phillips R: HIV: current opinion in escapology. Curr Opin Microbiol. 2002, 5: 408-413.
- Goulder PJ, Watkins DI: HIV and SIV CTL escape: implications for vaccine design. Nat Rev Immunol. 2004, 4 (8): 630-640.
- Altman JD, Feinberg MB: HIV escape: there and back again. Nat Med. 2004, 10: 229-230.
- Allen TM, O'Connor DH, Jing P, Dzuris JL, Mothé BR, Vogel TU, Dunphy E, Liebl ME, Emerson C, Wilson N, Kunstman KJ, Wang X, Allison DB, Hughes AL, Desrosiers RC, Altman JD, Wolinsky SM, Sette A, Watkins DI: Tat-specific CTL select for SIV escape variants during resolution of primary viraemia. Nature. 2000, 407: 386-390.
- O'Connor DH, McDermott AB, Krebs KC, Dodds EJ, Miller JE, Gonzalez EJ, Jacoby TJ, Yant L, Piontkivska H, Pantophlet R, Burton DR, Rehrauer WM, Wilson N, Hughes AL, Watkins DI: A dominant role for CD8+ T-lymphocyte selection in simian immunodeficiency virus sequence variation. J Virol. 2004, 78: 14012-14022.
- Price DA, Goulder PJ, Klenerman P, Sewell AK, Easterbrook PJ, Troop M, Bangham CR, Phillips RE: Positive selection of HIV-1 cytotoxic T lymphocyte escape variants during primary infection. Proc Natl Acad Sci USA. 1997, 94: 1890-1895.
- Piontkivska H, Hughes AL: Between-host evolution of cytotoxic T-lymphocyte epitopes in human immunodeficiency virus type 1: an approach based on phylogenetically independent comparisons. J Virol. 2004, 78 (21): 11758-11765.
- Piontkivska H, Hughes AL: Patterns of sequence evolution at epitopes for host antibodies and cytotoxic T-lymphocytes in human immunodeficiency virus type 1. Virus Res. 2006, 116: 98-105.
- Yang X, Yang H, Zhou G, Zhao GP: Infectious disease in the genomic era. Annu Rev Genomics Hum Genet. 2008, 9: 21-48.
- Chen YH, Xiao Y, Yu T, Dierich MP: Epitope vaccine: a new strategy against HIV-1. Immunol Today. 1999, 20: 588-589.
- Xiao Y, Yun L, Chen Y-H: Epitope-vaccine as a new strategy against HIV-1 mutation. Immunol Lett. 2001, 77: 3-6.
- Liu Z, Xiao Y, Chen Y-H: Epitope-vaccine strategy against HIV-1: today and tomorrow. Immunobiology. 2003, 208 (4): 423-428.
- Newman M, Livingston B, McKinney D, Chesnut R, Sette A: The multi-epitope approach to development of HIV vaccines. AIDS Vaccine 2001: 5–8 September 2001; Philadelphia. 2001, Abstract No: 305.
- Cano CA: The multi-epitope polypeptide approach in HIV-1 vaccine development. Genet Anal. 1999, 15: 149-153.
- Sette A, Livingston B, McKinney D, Appella E, Fikes J, Sidney J, Newman M, Chesnut R: The development of multi-epitope vaccines: epitope identification, vaccine design and clinical evaluation. Biologicals. 2001, 29: 271-276.
- Livingston B, Crimi C, Newman M, Higashimoto Y, Appella E, Sidney J, Sette A: A Rational Strategy to Design Multiepitope Immunogens Based on Multiple Th Lymphocyte Epitopes. J Immunol. 2002, 168 (11): 5499-5506.
- Wilson CC, Palmer B, Southwood S, Sidney J, Higashimoto Y, Appella E, Chesnut R, Sette A, Livingston BD: Identification and antigenicity of broadly cross-reactive and conserved human immunodeficiency virus type 1-derived helper T-lymphocyte epitopes. J Virol. 2001, 75 (9): 4195-4207.
- Davies MN, Flower DR: Harnessing bioinformatics to discover new vaccines. Drug Discov Today. 2007, 12: 389-395.
- Martin W, Sbai H, De Groot AS: Bioinformatics tools for identifying class I-restricted epitopes. Methods. 2003, 29: 289-298.
- Flower DR: Towards in silico prediction of immunogenic epitopes. Trends Immunol. 2003, 24: 667-674.
- Rammensee H, Bachmann J, Emmerich NP, Bachor OA, Stevanovic S: SYFPEITHI: database for MHC ligands and peptide motifs. Immunogenetics. 1999, 50: 213-219.
- Doytchinova IA, Walshe VA, Jones NA, Gloster SE, Borrow P, Flower DR: Coupling in silico and in vitro analysis of peptide-MHC binding: a bioinformatic approach enabling prediction of superbinding peptides and anchorless epitopes. J Immunol. 2004, 172: 7495-7502.
- De Groot AS, Bosma A, Chinai N, Frost J, Jesdale BM, Gonzalez MA, Martin W, Saint-Aubin C: From genome to vaccine: in silico predictions, ex vivo verification. Vaccine. 2001, 19: 4385-4395.
- Frahm N, Linde C, Brander C: Identification of HIV-Derived, HLA Class I Restricted CTL Epitopes: Insights into TCR Repertoire, CTL Escape and Viral Fitness. HIV Molecular Immunology 2006/2007. Edited by: Korber BT, Brander C, Haynes BF, Koup R, Moore JP, Walker BD, Watkins DI. 2007, Published by Los Alamos National Laboratory, Theoretical Biology and Biophysics, Los Alamos, New Mexico LA-UR 07-4752, 3-28. [http://www.hiv.lanl.gov/content/immunology/pdf/2006_07/optimal_ctl_article.pdf]
- Brusic V, Rudy G, Harrison LC: Prediction of MHC binding peptides using artificial neural networks. Complex Systems: Mechanism of Adaptation. Edited by: Stonier RJ, Yu XS. 1994, Amsterdam: IOS Press, 253-260.
- Udaka K, Mamitsuka H, Nakaseko Y, Abe N: Prediction of MHC class I binding peptides by a query learning algorithm based on hidden Markov models. J Biol Phys. 2002, 28: 183-194. 10.1023/A:1019931731519.
- Flower DR, Doytchinova IA, Paine K, Taylor P, Blythe MJ, Lamponi D, Zygouri C, Guan P, McSparron H, Kirkbride H: Computational vaccine design. Drug Design: Cutting Edge Approaches. Edited by: Flower DR. 2002, Cambridge: Royal Society of Chemistry, 136-180.
- Andersen MH, Tan L, Sondergaard I, Zeuthen J, Elliott T, Haurum JS: Poor correspondence between predicted and experimental binding of peptides to class I MHC molecules. Tissue Antigens. 2000, 55 (6): 519-539.
- Zaki MJ, Parthasarathy S, Ogihara M, Li W: New Algorithms for Fast Discovery of Association Rules. Proceedings of the third International Conference on Knowledge Discovery and Data Mining: 14–17 August 1997; Newport Beach. Edited by: Heckerman D, Mannila H, Pregibon D, Uthurusamy R. 1997, Menlo Park: AAAI Press, 283-286. [http://www.aaai.org/Papers/KDD/1997/KDD97-060.pdf]
- Megiddo N, Srikant R: Discovering Predictive Association Rules. Proceedings of the fourth International Conference on Knowledge Discovery and Data Mining: 27–31 August 1998; New York. Edited by: Agrawal R, Stolorz P, Piatetsky-Shapiro G. 1998, Menlo Park: AAAI Press, 274-278. [https://www.aaai.org/Papers/KDD/1998/KDD98-048.pdf]
- Oyama T, Kitano K, Satou K, Ito T: Extraction of knowledge on protein-protein interaction by association rule discovery. Bioinformatics. 2002, 18 (5): 705-714.View ArticlePubMedGoogle Scholar
- HIV Sequence Database by Los Alamos National Laboratory. [http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html]
- Leitner T, Korber B, Daniels M, Calef C, Foley B: HIV-1 Subtype and Circulating Recombinant Form (CRF) Reference Sequences, 2005. HIV Sequence Compendium-2005. Edited by: Leitner T, Foley B, Hahn B, Marx P, McCutchan F, Mellors J, Wolinsky S, Korber B. 2005, Published by Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM LA-UR 06-0680, 41-48. [http://www.hiv.lanl.gov/content/sequence/HIV/COMPENDIUM/2005/partI/leitner.pdf]
- Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG: The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997, 25 (24): 4876-4882.
- HIV Sequence Database by Los Alamos National Laboratory. [http://www.hiv.lanl.gov/content/sequence/HIV/CRFs/breakpoints.html]
- Korber B, Foley BT, Kuiken C, Pillai SK, Sodroski JG: Numbering positions in HIV relative to HXB2CG. Human retroviruses and AIDS. Edited by: Korber B, Kuiken CL, Foley B, Hahn B, McCutchan F, Mellors JW, Sodroski J. 1998, Published by Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos, NM, 102-111. [http://hcv.lanl.gov/content/sequence/HIV/COMPENDIUM/1998/III/HXB2.pdf]
- Nei M, Kumar S: Molecular Evolution and Phylogenetics. 2000, New York: Oxford University Press.
- Witten IH, Frank E: Data Mining: Practical Machine Learning Tools and Techniques. 2nd edition. Morgan Kaufmann Series in Data Management Systems. 2005, Morgan Kaufmann.
- Agrawal R, Imielinski T, Swami A: Mining association rules between sets of items in large databases. Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data: 26–28 May 1993; Washington, D.C. Edited by: Buneman P, Jajodia S. 1993, New York: ACM Press, 207-216. [http://eprints.kfupm.edu.sa/50864/1/50864.pdf]
- Chen M-C, Wu H-P: An association-based clustering approach to order batching considering customer demand patterns. Omega. 2005, 33: 333-343. 10.1016/j.omega.2004.05.003.
- Tamura M, D'Haeseleer P: Microbial genotype-phenotype mapping by class association rule mining. Bioinformatics. 2008, 24: 1523-1529.
- Srisawat A, Kijsirikul B: Using Associative Classification for Predicting HIV-1 Drug Resistance. Proceedings of the Fourth International Conference on Hybrid Intelligent Systems: 5–8 December 2004; Kitakyushu, Japan. 2004, 280-284. 10.1109/ICHIS.2004.92.
- Yardimci GG, Kucukural A, Saygin Y, Sezerman U: Modified Association Rule Mining Approach for the MHC-Peptide Binding Problem. Computer and Information Sciences-ISCIS 2006 (Lecture Notes in Computer Science). Proceedings of the 21st International Symposium: 1–3 November 2006; Istanbul, Turkey. 2006, 165-173. 10.1007/11902140.
- Frank E, Hall M, Trigg L, Holmes G, Witten IH: Data mining in bioinformatics using Weka. Bioinformatics. 2004, 20: 2479-2481.
- Gewehr JE, Szugat M, Zimmer R: BioWeka – extending the Weka framework for bioinformatics. Bioinformatics. 2007, 23: 651-653.
- Nei M, Gojobori T: Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol. 1986, 3: 418-426.
- Tamura K, Dudley J, Nei M, Kumar S: MEGA4 Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol. 2007, 24: 1596-1599.
- Zhuang J, Jetzt AE, Sun G, Yu H, Klarmann G, Ron Y, Preston BD, Dougherty JP: Human Immunodeficiency Virus Type 1 Recombination: Rate, Fidelity, and Putative Hot Spots. J Virol. 2002, 76 (22): 11273-11282.
- Sidney J, Peters B, Frahm N, Brander C, Sette A: HLA class I supertypes: a revised and updated classification. BMC Immunol. 2008, 9: 1-
- Thomson MM, Perez-Alvarez L, Najera R: Molecular epidemiology of HIV-1 genetic forms and its significance for vaccine development and therapy. Lancet Infect Dis. 2002, 2 (8): 461-471.
- Hemelaar J, Gouws E, Ghys PD, Osmanov S: Global and regional distribution of HIV-1 genetic subtypes and recombinants in 2004. AIDS. 2006, 20 (16): W13-23.
- Van Heuverswyn F, Li Y, Neel C, Bailes E, Keele BF, Liu W, Loul S, Butel C, Liegeois F, Bienvenue Y, Ngolle EM, Sharp PM, Shaw GM, Delaporte E, Hahn BH, Peeters M: Human immunodeficiency viruses: SIV infection in wild gorillas. Nature. 2006, 444 (7116): 164-
- Keele BF, Van Heuverswyn F, Li Y, Bailes E, Takehisa J, Santiago ML, Bibollet-Ruche F, Chen Y, Wain LV, Liegeois F, Loul S, Ngole EM, Bienvenue Y, Delaporte E, Brookfield JF, Sharp PM, Shaw GM, Peeters M, Hahn BH: Chimpanzee reservoirs of pandemic and nonpandemic HIV-1. Science. 2006, 313 (5786): 523-526.
- Altfeld MA, Livingston B, Reshamwala N, Nguyen PT, Addo MM, Shea A, Newman M, Fikes J, Sidney J, Wentworth P, Chesnut R, Eldridge RL, Rosenberg ES, Robbins GK, Brander C, Sax PE, Boswell S, Flynn T, Buchbinder S, Goulder PJR, Walker BD, Sette A, Kalams SA: Identification of novel HLA-A2-restricted Human Immunodeficiency Virus Type 1-specific Cytotoxic T-Lymphocyte epitopes predicted by the HLA-A2 supertype peptide-binding motif. J Virol. 2001, 75 (3): 1301-1311.
- Meyer-Olson D, Brady KW, Bartman MT, O'Sullivan KM, Simons BC, Conrad JA, Duncan CR, Lorey S, Siddique A, Draenert R, Addo M, Altfeld M, Rosenberg E, Allen TM, Walker BD, Kalams SA: Fluctuations of functionally distinct CD8+ T-cell clonotypes demonstrate flexibility of the HIV-specific TCR repertoire. Blood. 2006, 107: 2373-2383.
- Dong T, Stewart-Jones G, Chen N, Easterbrook P, Xu X, Papagno L, Appay V, Weekes M, Conlon C, Spina C, Little S, Screaton G, van der Merwe A, Richman DD, McMichael AJ, Jones EY, Rowland-Jones SL: HIV-specific cytotoxic T cells from long-term survivors select a unique T cell receptor. J Exp Med. 2004, 200: 1547-1557.
- Frahm N, Yusim K, Suscovich TJ, Adams S, Sidney J, Hraber P, Hewitt HS, Linde CH, Kavanagh DG, Woodberry T, Henry LM, Faircloth K, Listgarten J, Kadie C, Jojic N, Sango K, Brown NV, Pae E, Zaman MT, Bihl F, Khatri A, John M, Mallal S, Marincola FM, Walker BD, Sette A, Heckerman D, Korber BT, Brander C: Extensive HLA class I allele promiscuity among viral CTL epitopes. Eur J Immunol. 2007, 37 (9): 2419-2433.
- Carrington M, Nelson GW, Martin MP, Kissner T, Vlahov D, Goedert JJ, Kaslow R, Buchbinder S, Hoots K, O'Brien SJ: HLA and HIV-1: heterozygote advantage and B*35-Cw*04 disadvantage. Science. 1999, 283 (5408): 1748-1752.
- Dubay JW, Roberts SJ, Brody B, Hunter E: Mutations in the leucine zipper of the human immunodeficiency virus type 1 transmembrane glycoprotein affect fusion and infectivity. J Virol. 1992, 66: 4748-4756.
- Doherty SR, Oliveira TD, Seebregts C, Danaviah S, Gordon M, Cassol S: BioAfrica's HIV-1 Proteomics Resource: Combining protein data with bioinformatics tools. Retrovirology. 2005, 2 (1): 18-
- Loregian A, Marsden HS, Palu G: Protein-protein interactions as targets for antiviral chemotherapy. Rev Med Virol. 2002, 12 (4): 239-262.
- Jacks T, Power MD, Masiarz FR, Luciw PA, Barr PJ, Varmus HE: Characterization of ribosomal frameshifting in HIV-1 gag-pol expression. Nature. 1988, 331: 280-283.
- Zybarth G, Carter C: Domains Upstream of the Protease (PR) in Human Immunodeficiency Virus Type 1 Gag-Pol Influence PR Autoprocessing. J Virol. 1995, 69 (6): 3878-3884.
- Figueiredo A, Moore KL, Mak J, Sluis-Cremer N, de Bethune M-P, Tachedjian G: Potent Nonnucleoside Reverse Transcriptase Inhibitors Target HIV-1 Gag-Pol. PLoS Pathog. 2006, 2 (11): e119-
- Costa LJ, Zheng YH, Sabotic J, Mak J, Fackler OT, Peterlin BM: Nef binds p6* in GagPol during replication of human immunodeficiency virus type 1. J Virol. 2004, 78 (10): 5311-5323.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.