BioAfrica's HIV-1 Proteomics Resource: Combining protein data with bioinformatics tools
© Doherty et al; licensee BioMed Central Ltd. 2005
Received: 30 September 2004
Accepted: 09 March 2005
Published: 09 March 2005
Most online resources for investigating HIV biology contain either bioinformatics tools, protein information or sequence data, but not all three. The objective of this study was to develop a comprehensive online proteomics resource that integrates bioinformatics with the latest information on HIV-1 protein structure, gene expression, post-transcriptional/post-translational modification, functional activity, and protein-macromolecule interactions. The BioAfrica HIV-1 Proteomics Resource (http://bioafrica.mrc.ac.za/proteomics/index.html) is a website that contains detailed information about the HIV-1 proteome and protease cleavage sites, as well as data-mining tools that can be used to manipulate and query protein sequence data, a BLAST tool for initiating structural analyses of HIV-1 proteins, and a proteomics tools directory. The Proteome section contains extensive data on each of 19 HIV-1 proteins, including their functional properties, a sample analysis of HIV-1HXB2, structural models and links to other online resources. The HIV-1 Protease Cleavage Sites section provides information on the position, subtype variation and genetic evolution of Gag, Gag-Pol and Nef cleavage sites. The HIV-1 Protein Data-mining Tool includes a set of 27 group M (subtypes A through K) reference sequences that can be used to assess the influence of genetic variation on immunological and functional domains of the protein. The BLAST Structure Tool identifies proteins with similar, experimentally determined topologies, and the Tools Directory provides a categorized list of websites and relevant software programs. This combined database and software repository is designed to facilitate the capture, retrieval and analysis of HIV-1 protein data, and to convert it into clinically useful information relating to the pathogenesis, transmission and therapeutic response of different HIV-1 variants.
The HIV-1 Proteomics Resource is readily accessible through the BioAfrica website at: http://bioafrica.mrc.ac.za/proteomics/index.html
Although the HIV-1 genome contains only 9 genes, it is capable of generating more than 19 gene products. These products can be divided into three major categories: structural and enzymatic (Gag, Pol, Env), immediate-early regulatory (Tat, Rev and Nef), and late regulatory (Vif, Vpu, Vpr) proteins. Tat, Rev and Nef are synthesized from small multiply-spliced mRNAs; Env, Vif, Vpu and Vpr are generated from singly-spliced mRNAs; and the Gag and Gag-Pol precursor polyproteins are synthesized from full-length mRNA. The matrix (p17), capsid (p24) and nucleocapsid (p7) proteins are produced by protease cleavage of Gag and Gag-Pol, a fusion protein derived by ribosomal frame-shifting. Cleavage of Nef generates two different protein isoforms: one myristylated, the other non-myristylated. The viral enzymes (protease, reverse transcriptase, RNase H and integrase) are formed by protease cleavage of Gag-Pol. Alternative splicing, together with co-translational and post-translational modification, leads to additional protein variability.
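The relationships in the preceding paragraph can be written down as a small lookup structure. The sketch below is only an illustration of that paragraph: the names `MRNA_CLASS`, `CLEAVAGE_PRODUCTS` and `mrna_class_of` are hypothetical and are not part of the BioAfrica resource.

```python
# Transcript class for each primary translation product, as described in the text.
MRNA_CLASS = {
    "multiply-spliced": ["Tat", "Rev", "Nef"],
    "singly-spliced": ["Env", "Vif", "Vpu", "Vpr"],
    "full-length": ["Gag", "Gag-Pol"],
}

# Mature proteins released by protease cleavage of each precursor polyprotein.
CLEAVAGE_PRODUCTS = {
    "Gag": ["matrix (p17)", "capsid (p24)", "nucleocapsid (p7)"],
    "Gag-Pol": [
        "matrix (p17)", "capsid (p24)", "nucleocapsid (p7)",
        "protease", "reverse transcriptase", "RNase H", "integrase",
    ],
}


def mrna_class_of(protein):
    """Return the transcript class that encodes the given primary product."""
    for mrna_class, proteins in MRNA_CLASS.items():
        if protein in proteins:
            return mrna_class
    raise KeyError(f"unknown primary product: {protein}")
```

For example, `mrna_class_of("Vpu")` looks up the transcript class for Vpu, and `CLEAVAGE_PRODUCTS["Gag-Pol"]` lists the mature proteins derived from the Gag-Pol precursor.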
The proteome link
Protease cleavage sites link
Post-translational cleavage of the Gag, Gag-Pol and Nef precursor proteins occurs at the cell membrane during virion packaging, and is essential to the production of infectious viral particles. Drugs that inhibit this process, the protease inhibitors (PIs), are the most potent antiretroviral agents currently available. Thus it is important to collect information, not only on the sequence of protease enzymes from different HIV-1 subtypes, but also on the natural polymorphisms and resistance mutations that may affect their catalytic activities, drug responsiveness, substrate specificities, and cleavage site characteristics. Studies have shown that resistance mutations in the protease of subtype B are associated with impaired proteolytic processing and decreased enzymatic activity, and that compensatory mutations at Gag and Gag-Pol cleavage sites can partially overcome these defects. These findings suggest that variation at protease cleavage sites may play an important role, not only in regulation of the viral life cycle, but also in disease progression and response to therapy.
The cleavage site section of the BioAfrica webpage is the direct extension of a recent publication in the Journal of Virology describing the location and variability of protease cleavage sites (Figure 5). Together, these two resources provide information on the structure, amino acid composition, genetic variation and evolutionary history of protease cleavage sites, and on the natural selection pressures exerted on these sites. The section also serves as a baseline for understanding the impact of natural polymorphisms and resistance mutations on the catalytic efficiency of the protease enzyme, and on its ability to recognize and cleave individual Gag, Gag-Pol and Nef substrates. Such studies are important for understanding the mechanisms underlying the emergence of PI-induced drug resistance, and for designing alternative, optimized therapies.
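One simple way to quantify the positional variability discussed above is the Shannon entropy of each column in a set of aligned cleavage-site peptides. The sketch below is an assumption about how such a measure could be computed; it is not the method used on the BioAfrica cleavage site pages, and the function names are hypothetical. Any peptides passed to it must already be aligned to the same length.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues observed at one aligned position."""
    counts = Counter(column)
    total = len(column)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def site_variability(aligned_peptides):
    """Per-position entropy for a list of equal-length, pre-aligned peptides."""
    lengths = {len(p) for p in aligned_peptides}
    if len(lengths) != 1:
        raise ValueError("peptides must be aligned to the same length")
    # zip(*...) walks the alignment column by column.
    return [column_entropy(col) for col in zip(*aligned_peptides)]
```

A fully conserved position yields an entropy of 0 bits, while a position at which four peptides show three copies of one residue and one copy of another yields about 0.81 bits. Higher values flag positions where substitutions are tolerated and where compensatory changes might be sought.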
Protein data-mining tools link
The HIV-1 Protein Data-Mining Tool contains twelve sequence analysis techniques for assessing protein variability among different strains of HIV-1 (Figure 6). These tools allow the user to manipulate, analyze and compare published [9–12] and newly-acquired data in a user-friendly, hands-on manner. The analysis is initiated by selecting a particular subset of HIV-1 proteins, either from the user's database, or from the representative dataset of group M viruses (subtypes A through K). Using this dataset, the investigator can then perform a variety of protein-specific analyses. With a single click of the mouse, users can download the amino acid sequence in FASTA format; obtain sequence annotations from SwissProt or GenBank; identify functional motifs using BLOCKS, PROSITE or ProDom; perform similarity searches using the BLAST program available at GenBank; conduct structural comparisons using the BioAfrica BLAST Structure program; determine amino acid composition and predict hydrophobicity; predict tertiary structure using the Swiss-Model homology modelling server; and obtain a list of potential protein-macromolecule interactions from the Database of Interacting Proteins (DIP). A representative analysis of HIV-1 Tat is shown in Additional file 1. The selected dataset, consisting of eight reference strains – four subtype B (HXB2-1983-France, RF-1983-US, JRFL-1986-US, WEAU160-1990-US) and four subtype C (92BR025-1992-Brazil, 96BW0502-1996-Botswana, TV002c12-1998-SouthAfrica, TV001c8.5-1998-SouthAfrica) isolates – was analyzed using PROSITE. As shown in Additional file 1, all eight isolates had identical amidation, cysteine-rich and myristylation motifs at amino acid codons 47–50, 22–37 and 44–49, respectively. Three (75%) of the B isolates contained a second myristylation site at codons 42–47, as did three (75%) subtype C viruses. One (25%) of the C viruses carried an extra GNptGS myristylation motif at position 79–84.
In addition, all four (100%) C isolates contained a novel myristylation motif, GSeeSK, at amino acid position 83–88, that was not present in the four B viruses selected for study. However, the most striking difference between the two subtypes was the increased number of phosphorylation motifs in subtype C relative to B viruses. This increase, which occurs in cAMP/cGMP-dependent kinase, protein kinase C (PKC) and casein kinase II (CKII) phosphorylation sites, has been reported previously, but the significance of these findings remains to be established. The analysis also highlighted the atypical nature of the HIV-1HXB2 isolate, which, in addition to a premature stop codon, contained no cAMP/cGMP, PKC or CKII phosphorylation sites.
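The kind of motif scan used in the Tat analysis above can be approximated with regular expressions. The sketch below is not the PROSITE software: the patterns are our own transcription of the short PROSITE-style consensus patterns for these site types and should be checked against the current PROSITE entries before any real use. The names `MOTIFS` and `scan_motifs` are hypothetical.

```python
import re

# PROSITE-style consensus patterns rewritten as regular expressions.
# Assumption: transcribed from memory of the PROSITE pattern syntax; verify before use.
MOTIFS = {
    "N-myristoylation": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",
    "PKC phosphorylation": r"[ST].[RK]",
    "CKII phosphorylation": r"[ST].{2}[DE]",
    "cAMP/cGMP-dependent kinase phosphorylation": r"[RK]{2}.[ST]",
    "Amidation": r".G[RK][RK]",
}


def scan_motifs(sequence):
    """Return (motif name, start, end, matched residues) tuples, 1-based inclusive."""
    sequence = sequence.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        # A lookahead with a capture group reports overlapping matches as well.
        for match in re.finditer(f"(?=({pattern}))", sequence):
            start = match.start() + 1
            end = start + len(match.group(1)) - 1
            hits.append((name, start, end, match.group(1)))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))
```

Both myristylation motifs reported in the text, GNptGS and GSeeSK, satisfy the N-myristoylation pattern as written here. Short patterns of this kind match frequently by chance, so a hit indicates a candidate site rather than a confirmed modification.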
The blast structure tool link
The HIV-1 BLAST Structure Tool facilitates the analysis of HIV-1 protein structure by allowing for rapid retrieval of archived structural data stored in the public databases (Figure 7). Users may input any HIV-1 amino acid sequence and obtain a list of similar HIV protein sequences for which structural data have been experimentally determined and deposited into the Protein Data Bank (PDB). After downloading the data from the PDB, subsequent structural analyses can be performed using the software programs and web-servers listed in the Proteomics Tools Directory. For example, a query using an amino acid sequence of HIV-1 Integrase protein from NCBI (gi|15553624|gb|AAL01959.1) results in a list of 54 structural models (e.g., PDB_ID|1K6Y) within the PDB. Each of these structural models can be retrieved from the PDB, and the most appropriate one can then be used as a template for generating a homology model of the query protein sequence.
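The idea behind this search, ranking sequences of known structure by their local similarity to a query, can be illustrated with a small self-contained sketch. It deliberately swaps in a plain Smith-Waterman local alignment score with a fixed match/mismatch scheme in place of BLAST, which is a faster heuristic that uses substitution matrices and reports statistical significance. The function names, scoring values and the dictionary of structure sequences are hypothetical; nothing here queries the PDB or reproduces the BioAfrica tool.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_structures(query, structures):
    """Order structure identifiers by decreasing similarity to the query.

    `structures` maps an identifier to the sequence of a solved structure.
    Ties are broken alphabetically by identifier.
    """
    scored = [(local_alignment_score(query, seq), ident)
              for ident, seq in structures.items()]
    return [ident for score, ident in sorted(scored, key=lambda s: (-s[0], s[1]))]
```

The top-ranked identifiers would then be the candidates to download and inspect as possible templates for homology modelling.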
The proteomics tools directory link
The HIV-1 Proteomics Tools Directory is divided into two web pages. The initial webpage is a concise compilation of some of the most commonly used protein-specific Internet resources (Figure 8). This "beginners" page displays a short list of websites for each of the following twelve categories: "protein databases", "specialized viral-protein databases", "motif and transcription factor databases", "protein sequence similarity searches", "protein sequence alignment", "protein sequence prediction tools", "protein sequence analysis", "protein sequence manipulation", "protein structure analysis", "molecular modelling tools", "tutorials", and "downloads". In addition, the Proteomics Tools Directory has an advanced web page for users who are looking for alternative, or more specialized, protein analysis tools (Figure 9). The advanced webpage displays a list of more than 200 links to different websites and web-servers. These data sources contain a variety of information ranging from specialized protein sequence databases to software programs capable of performing rigid body protein-protein molecular docking simulations.
The impending rollout of antiretroviral therapy to millions of HIV-1-infected people in sub-Saharan Africa provides a unique opportunity to monitor the efficacy of non-B treatment programs from their very inception, and to obtain critical new information for the optimization of treatment strategies that are safe, affordable and appropriate for the developing world. An integral part of this massive humanitarian effort will be the collection of large amounts of clinical and laboratory data, including genetic information on viral subtype and resistance mutations, as well as routine CD4+ T-cell counts and viral load measurements. The mere collection of this data, however, does not ensure that it will be used to its maximum potential. To achieve full benefit from this explosive source of new information, the data will need to be appropriately collated, stored, analyzed and interpreted.
The rapidly emerging field of Bioinformatics has the capacity to greatly enhance treatment (and vaccine) efforts by serving as a bridge between Medical Informatics and Experimental Science. By correlating genetic variation and potential changes in protein structure with clinical risk factors, disease presentation, and differential response to treatment and vaccine candidates, it may be possible to obtain valuable new insights that can be used to support and guide rational decision-making, both at the clinical and public health levels. The HIV-1 Proteomics Resource, described in this report, is a first step in the development of improved methods for extracting and analyzing genomics data, for converting it into biologically useful information related to the structure, function and physiology of HIV-1 proteins, and for assessing the role these proteins play in disease progression and response to therapy. The Resource, developed at the Molecular Virology and Bioinformatics Unit of the Africa Centre of Health and Population Studies, is a centralized user-friendly database that is easily accessed through the BioAfrica website at http://bioafrica.mrc.ac.za/proteomics.
List of abbreviations used
- BLAST: Basic Local Alignment Search Tool
- CKII: casein kinase II
- DIP: Database of Interacting Proteins
- Gag: group-specific antigen polyprotein
- GIF: Graphics Interchange Format
- HIV: Human Immunodeficiency Virus
- HIV-1: Human Immunodeficiency Virus Type-1
- HTTP: Hypertext Transfer Protocol
- NCBI: National Center for Biotechnology Information
- PDB: Protein Data Bank
- PKC: protein kinase C
- Rev: ART/TRS anti-repression transactivator protein
- RNase H: ribonuclease H
- Tat: transactivating regulatory protein
- Vif: virion infectivity factor
- Vpr: viral protein R
- Vpu: viral protein U
Development of the BioAfrica HIV-1 Proteomics Resource was supported by a program grant from the Wellcome Trust, U.K. (#061238). The website is hosted by the South African Medical Research Council (MRC).
- Freed EO: HIV-1 replication. Somat Cell Mol Genet. 2001, 26: 13-33. 10.1023/A:1021070512287.
- Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O'Donovan C, Redaschi N, Yeh LSL: UniProt: The Universal Protein knowledgebase. Nucleic Acids Res. 2004, 32: D115-119. 10.1093/nar/gkh131.
- Kuiken C, Korber B, Shafer RW: HIV sequence databases. AIDS Rev. 2003, 5: 52-61.
- Ratner L, Haseltine W, Patarca R, Livak KJ, Starcich B, Josephs SF, Doran ER, Rafalski JA, Whitehorn EA, Baumeister K: Complete nucleotide sequence of the AIDS virus, HTLV-III. Nature. 1985, 313: 277-284. 10.1038/313277a0.
- Kao SY, Calman AF, Luciw PA, Peterlin BM: Anti-termination of transcription within the long terminal repeat of HIV-1 by tat gene product. Nature. 1987, 330: 489-493. 10.1038/330489a0.
- Feinberg MB, Baltimore D, Frankel AD: The role of Tat in the human immunodeficiency virus life cycle indicates a primary effect on transcriptional elongation. Proc Natl Acad Sci USA. 1991, 88: 4045-4049.
- Cullen BR: Human Immunodeficiency Virus as a Prototypic Complex Retrovirus. J Virol. 1991, 65: 1053-1056.
- Mammano F, Petit C, Clavel F: Resistance-associated loss of viral fitness in human immunodeficiency virus type 1: phenotypic analysis of protease and gag coevolution in protease inhibitor-treated patients. J Virol. 1998, 72: 7632-7637.
- de Oliveira T, Engelbrecht S, van Rensburg EJ, Gordon M, Bishop K, zur Megede J, Barnett SW, Cassol S: Variability at Human Immunodeficiency Virus Type 1 Subtype C Protease Cleavage Sites: an Indication of Viral Fitness?. J Virol. 2003, 77: 9422-9430. 10.1128/JVI.77.17.9422-9430.2003.
- zur Megede J, Engelbrecht S, de Oliveira T, Cassol S, Scriba TJ, van Rensburg EJ, Barnett SW: Novel evolutionary analyses of full-length HIV type 1 subtype C molecular clones from Cape Town, South Africa. AIDS Res Hum Retroviruses. 2002, 18: 1327-1332. 10.1089/088922202320886370.
- Morgado MG, Guimaraes ML, Galvao-Castro B: HIV-1 polymorphism: a challenge for vaccine development – a review. Mem Inst Oswaldo Cruz. 2002, 97: 143-150.
- Burns CC, Gleason LM, Mozaffarian A, Giachetti C, Carr JK, Overbaugh J: Sequence variability of the integrase protein from a diverse collection of HIV type 1 isolates representing several subtypes. AIDS Res Hum Retroviruses. 2002, 18: 1031-1041. 10.1089/08892220260235399.
- Bairoch A, Apweiler R: The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 2000, 28: 45-48. 10.1093/nar/28.1.45.
- Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL: GenBank. Nucleic Acids Res. 2005, 33 (Database Issue): D34-38. 10.1093/nar/gki063.
- Henikoff JG, Greene EA, Pietrokovski S, Henikoff S: Increased coverage of protein families with the BLOCKS database servers. Nucleic Acids Res. 2000, 28: 228-230. 10.1093/nar/28.1.228.
- Hulo N, Sigrist CJA, Saux VL, Langendijk-Genevaux PS, Bordoli L, Gattiker A, de Castro E, Bucher P, Bairoch A: Recent improvements to the PROSITE database. Nucleic Acids Res. 2004, 32 (Database Issue): 134-137. 10.1093/nar/gkh044.
- Servant F, Bru C, Carrere S, Courcelle E, Gouzy J, Peyruc D, Kahn D: ProDom: Automated clustering of homologous domains. Brief Bioinform. 2002, 3: 246-251.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410. 10.1006/jmbi.1990.9999.
- Schwede T, Kopp J, Guex N, Peitsch MC: SWISS-MODEL: an automated protein homology-modeling server. Nucleic Acids Res. 2003, 31: 3381-3385. 10.1093/nar/gkg520.
- Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D: The Database of Interacting Proteins: 2004 update. Nucleic Acids Res. 2004, 32 (Database Issue): 449-451. 10.1093/nar/gkh086.
- de Oliveira T, Salemi M, Gordon M, Vandamme AM, van Rensburg EJ, Engelbrecht S, Coovadia HM, Cassol S: Mapping Sites of Positive Selection and Amino Acid Diversification in the HIV Genome: An Alternative Approach to Vaccine Design?. Genetics. 2004, 167: 1047-1058. 10.1534/genetics.103.018135.
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res. 2000, 28: 235-242. 10.1093/nar/28.1.235.
- De Oliveira T, Doherty RS, Seebregts C, Monosi B, Gordon M, Cassol S: The BioAfrica Website: An Integrated Bioinformatics Website for Studying the Explosive HIV-1 Subtype C Epidemic in Africa. Digital Biology: The Emerging Paradigm Conference, NIH: 6 – 7. 2003, Maryland, USA, November.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.