

  • Review
  • Open Access

BioAfrica's HIV-1 Proteomics Resource: Combining protein data with bioinformatics tools

  • 1,
  • 1,
  • 2,
  • 1,
  • 1 and
  • 1, 3
Retrovirology 2005, 2:18

https://doi.org/10.1186/1742-4690-2-18

  • Received: 30 September 2004
  • Accepted: 09 March 2005
  • Published:

Abstract

Most online resources for investigating HIV biology offer only one of the following: bioinformatics tools, protein information or sequence data. The objective of this study was to develop a comprehensive online proteomics resource that integrates bioinformatics with the latest information on HIV-1 protein structure, gene expression, post-transcriptional/post-translational modification, functional activity, and protein-macromolecule interactions. The BioAfrica HIV-1 Proteomics Resource http://bioafrica.mrc.ac.za/proteomics/index.html is a website that contains detailed information about the HIV-1 proteome and protease cleavage sites, as well as data-mining tools that can be used to manipulate and query protein sequence data, a BLAST tool for initiating structural analyses of HIV-1 proteins, and a proteomics tools directory. The Proteome section contains extensive data on each of 19 HIV-1 proteins, including their functional properties, a sample analysis of HIV-1 HXB2, structural models and links to other online resources. The HIV-1 Protease Cleavage Sites section provides information on the position, subtype variation and genetic evolution of Gag, Gag-Pol and Nef cleavage sites. The HIV-1 Protein Data-mining Tool includes a set of 27 group M (subtypes A through K) reference sequences that can be used to assess the influence of genetic variation on the immunological and functional domains of each protein. The BLAST Structure Tool identifies proteins with similar, experimentally determined topologies, and the Tools Directory provides a categorized list of websites and relevant software programs. This combined database and software repository is designed to facilitate the capture, retrieval and analysis of HIV-1 protein data, and to convert these data into clinically useful information relating to the pathogenesis, transmission and therapeutic response of different HIV-1 variants.
The HIV-1 Proteomics Resource is readily accessible through the BioAfrica website at: http://bioafrica.mrc.ac.za/proteomics/index.html

Keywords

  • Protein Data Bank
  • Protease Cleavage Site
  • Natural Polymorphism
  • Proteomics Tool
  • Protein Structure Analysis

Background

Although the HIV-1 genome contains only 9 genes, it is capable of generating more than 19 gene products. These products can be divided into three major categories: structural and enzymatic (Gag, Pol, Env); immediate-early regulatory (Tat, Rev and Nef); and late regulatory (Vif, Vpu, Vpr) proteins. Tat, Rev and Nef are synthesized from small multiply-spliced mRNAs; Env, Vif, Vpu and Vpr are generated from singly-spliced mRNAs; and the Gag and Gag-Pol precursor polyproteins are synthesized from full-length mRNA. The matrix (p17), capsid (p24) and nucleocapsid (p7) proteins are produced by protease cleavage of Gag and of Gag-Pol, a fusion protein derived by ribosomal frame-shifting. Cleavage of Nef generates two different protein isoforms, one myristylated and the other non-myristylated. The viral enzymes (protease, reverse transcriptase, RNase H and integrase) are formed by protease cleavage of Gag-Pol. Alternative splicing, together with co-translational and post-translational modification, leads to additional protein variability [1].
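
The classification above can be captured as a small lookup table, which is convenient when annotating sequences by product. This is a minimal illustrative sketch in Python: the enum and function names are hypothetical, and the table simply restates the categories and mRNA classes given in the text (Gag-Pol is listed as its own entry because it arises from the same full-length mRNA as Gag).

```python
from enum import Enum


class Category(Enum):
    """Protein categories as named in the Background text."""
    STRUCTURAL_ENZYMATIC = "structural and enzymatic"
    EARLY_REGULATORY = "immediate-early regulatory"
    LATE_REGULATORY = "late regulatory"


class MRNA(Enum):
    """Classes of viral mRNA from which each product is translated."""
    FULL_LENGTH = "full-length"
    SINGLY_SPLICED = "singly spliced"
    MULTIPLY_SPLICED = "small multiply spliced"


# Product -> (category, mRNA class), restating the text above.
GENE_PRODUCTS = {
    "Gag": (Category.STRUCTURAL_ENZYMATIC, MRNA.FULL_LENGTH),
    "Gag-Pol": (Category.STRUCTURAL_ENZYMATIC, MRNA.FULL_LENGTH),
    "Env": (Category.STRUCTURAL_ENZYMATIC, MRNA.SINGLY_SPLICED),
    "Tat": (Category.EARLY_REGULATORY, MRNA.MULTIPLY_SPLICED),
    "Rev": (Category.EARLY_REGULATORY, MRNA.MULTIPLY_SPLICED),
    "Nef": (Category.EARLY_REGULATORY, MRNA.MULTIPLY_SPLICED),
    "Vif": (Category.LATE_REGULATORY, MRNA.SINGLY_SPLICED),
    "Vpu": (Category.LATE_REGULATORY, MRNA.SINGLY_SPLICED),
    "Vpr": (Category.LATE_REGULATORY, MRNA.SINGLY_SPLICED),
}


def products_from(mrna):
    """Return the products translated from a given mRNA class, sorted by name."""
    return sorted(name for name, (_, source) in GENE_PRODUCTS.items() if source is mrna)


def products_in(category):
    """Return the products belonging to a given category, sorted by name."""
    return sorted(name for name, (cat, _) in GENE_PRODUCTS.items() if cat is category)


print(products_from(MRNA.MULTIPLY_SPLICED))  # ['Nef', 'Rev', 'Tat']
```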

Phylogenetic analysis, on its own, provides little information about the conformational, immunological and functional properties of HIV-1 proteins; instead, it focuses on the evolution and historical significance of sequence variants. To understand the clinical significance of genetic variation, sequence analysis needs to be combined with methods that assess changes in the structural and biological properties of HIV-1 proteins. At present, information and tools for the systematic analysis of HIV-1 proteins are limited and scattered across a wide range of online resources [2, 3]. To facilitate studies of the biological consequences of genetic variation, we have developed a user-friendly proteomics resource that integrates common approaches to HIV-1 protein analysis (Figure 1). We are currently using this resource to better understand the structure-function relationships underlying the emergence of antiretroviral drug resistance, and to examine the process of immune escape from cytotoxic T-lymphocytes (CTLs).
Figure 1

Site map of BioAfrica's HIV-1 Proteomics Resource, showing the separation of the Beginner's and Advanced areas of the website, along with all major subject headings.

We have organized the Proteomics Resource under the following main subject headings (Figures 2 and 3):
Figure 2

Schematic representation of BioAfrica's HIV-1 Proteomics Resource, showing its five major components: the HIV-1 Proteome (General Overview, Domains/Folds/Motifs, Genomic Location, Protein-Macromolecule Interactions, Primary and Secondary Database Entries, and References and Recommended Readings), the HIV-1 Protease Cleavage Sites section, the HIV-1 Protein Data-mining Tool, the HIV-1 BLAST Structure Tool, and the Proteomics Tools Directory (for Beginners and Advanced investigators).

Figure 3

The central webpage of BioAfrica's HIV Proteomics Resource http://bioafrica.mrc.ac.za/proteomics/index.html

1. HIV Proteome: information about structure and sequence, as well as references and tutorials, for each of the HIV-1 proteins (Figure 4);
Figure 4

The central webpage of the HIV-1 Proteome section of the BioAfrica website http://bioafrica.mrc.ac.za/proteomics/HIVproteome.html.

2. HIV-1 Cleavage Sites: information about the position and sequence of HIV-1 Gag, Pol and Nef cleavage sites (Figure 5);
Figure 5

The HIV-1 Protease Cleavage Sites section of the BioAfrica website http://bioafrica.mrc.ac.za/proteomics/HIVcleavagesites.html.

3. HIV Protein Data Mining Tool: an application for characterizing the proteins of HIV-1 group M isolates (subtypes A to K) using information available in public databases and tools (Figure 6);
Figure 6

The central webpage of the HIV-1 Protein Data Mining Tool section of the BioAfrica website, where a specific HIV-1 genomic region is selected for analysis http://bioafrica.mrc.ac.za/proteomics/TOOLprot.html.

4. HIV Structure BLAST: a similarity search for analyzing HIV protein sequences against proteins with corresponding structural data (Figure 7);
Figure 7

The BLAST HIV-1 protein structure similarity search is an online tool that searches the PDB for all protein structures whose amino acid sequences are similar to the query sequence http://bioafrica.mrc.ac.za/blast/hivPDBblast.html.

5. Proteomics Online Tools: a directory of data resources and tools available for both protein sequence and protein structure analyses of HIV (Figures 8 and 9).
Figure 8

The introductory listing of proteomics resources for HIV research, chosen to give a general overview of online tools and databases relevant to the analysis of HIV protein data http://bioafrica.mrc.ac.za/proteomics/proteomicstools.html.

Figure 9

The advanced listing of online tools and databases relevant for the analysis of HIV protein data http://bioafrica.mrc.ac.za/proteomics/proteomics-advanced.html.

The proteome link
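
Post-translational modification is one of the protein properties the Proteome section covers, and the abbreviations used in this report include protein kinase C (PKC) and casein kinase II (CKII). As a minimal sketch of how candidate phosphorylation sites could be flagged in a protein sequence, the Python below scans for two PROSITE consensus patterns, PS00005 ([ST]-x-[RK], PKC) and PS00006 ([ST]-x(2)-[DE], CKII). This is an assumption about the kind of analysis involved, not the Resource's own code; the function name `scan_motifs` and the test peptide are hypothetical.

```python
import re

# PROSITE consensus patterns rewritten as Python regular expressions.
# Wrapping each pattern in a lookahead (?=(...)) makes overlapping hits visible.
MOTIFS = {
    "PKC (PS00005)": re.compile(r"(?=([ST].[RK]))"),
    "CKII (PS00006)": re.compile(r"(?=([ST].{2}[DE]))"),
}


def scan_motifs(sequence):
    """Map each motif name to a list of (1-based start, matched peptide) tuples."""
    seq = sequence.upper()
    return {
        name: [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq)]
        for name, pattern in MOTIFS.items()
    }


# Hypothetical peptide, used only to illustrate the output format.
print(scan_motifs("MSTKSAEDR"))
# {'PKC (PS00005)': [(2, 'STK')], 'CKII (PS00006)': [(5, 'SAED')]}
```

Both patterns are short and match very frequently in real proteins, so hits should be read as candidate sites to be checked against experimental data rather than as confirmed modifications.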

Protease cleavage sites link
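
Protease cleavage sites are commonly described as octapeptides spanning positions P4 to P4' on either side of the scissile bond, which lies between P1 and P1'. As a minimal sketch, assuming variant sites have already been aligned to a reference, the Python below reports which positions differ. The function name `compare_site` is hypothetical; the reference octapeptide is the HXB2 matrix/capsid (p17/p24) site SQNY/PIVQ, and the variant is invented for illustration.

```python
# Schechter-Berger labels for the eight residues flanking the scissile bond,
# which lies between P1 and P1'.
POSITIONS = ("P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'")


def compare_site(reference, variant):
    """List substitutions between two aligned P4-P4' octapeptides."""
    reference, variant = reference.upper(), variant.upper()
    if len(reference) != 8 or len(variant) != 8:
        raise ValueError("expected two aligned 8-residue (P4-P4') peptides")
    return [
        f"{pos}: {ref}->{alt}"
        for pos, ref, alt in zip(POSITIONS, reference, variant)
        if ref != alt
    ]


# HXB2 p17/p24 site SQNY/PIVQ versus a hypothetical variant.
print(compare_site("SQNYPIVQ", "AQNYPVVQ"))  # ['P4: S->A', "P2': I->V"]
```

Labelling changes by position rather than by raw index keeps the output aligned with how cleavage-site polymorphisms are usually reported, for example when asking whether a substitution falls on the P1 or P1' side of the cut.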

Protein data-mining tools link
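
The Data-mining Tool uses a set of group M reference sequences to assess how genetic variation affects functional and immunological domains. The text does not specify how variation is quantified; one common measure is the Shannon entropy of each alignment column, which is 0 for a fully conserved position and rises as residues diversify. The Python below is a minimal sketch under that assumption; the function names and the toy alignment are hypothetical, and gap characters are simply ignored.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    n = len(residues)
    # Sum of p * log2(1/p) over the residue frequencies p in this column.
    return sum((k / n) * math.log2(n / k) for k in Counter(residues).values())


def entropy_profile(alignment):
    """Per-position entropy for a list of equal-length aligned sequences."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("sequences must be aligned to a common length")
    return [column_entropy(col) for col in zip(*alignment)]


# Toy alignment standing in for subtype reference sequences.
print(entropy_profile(["MKV", "MRV", "MKV", "MRA"]))
```

In this toy case the first column is invariant (0 bits), the second splits evenly between two residues (1 bit), and the third is intermediate, which is the kind of ranking one would overlay on known domains.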

The blast structure tool link
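
The BLAST Structure Tool relies on BLAST [18], which heuristically approximates local sequence alignment. To illustrate the underlying idea, the Python below is a minimal sketch of exact local alignment scoring (Smith-Waterman) with a simple match/mismatch scheme and a linear gap penalty. It is not BLAST: it uses no substitution matrix such as BLOSUM62, computes no E-values, and the function names, scoring parameters and library entries are hypothetical.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores are floored at zero so a local alignment can start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_hits(query, library):
    """Score every library sequence against the query, highest score first."""
    scored = [(name, smith_waterman(query, seq)) for name, seq in library.items()]
    return sorted(scored, key=lambda hit: hit[1], reverse=True)


# Hypothetical entries standing in for structure-derived sequences.
library = {"entry_1": "XXPIVQYY", "entry_2": "GGGG"}
print(rank_hits("PIVQNIQ", library))  # [('entry_1', 8), ('entry_2', 0)]
```

This exact version runs in time proportional to the product of the sequence lengths, which is why database searches such as those against the PDB use BLAST's word-seeding heuristics, substitution matrices and affine gap penalties instead.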

The proteomics tools directory link

Conclusion

The impending rollout of antiretroviral therapy to millions of HIV-1-infected people in sub-Saharan Africa provides a unique opportunity to monitor the efficacy of treatment programs for non-B subtypes from their very inception, and to obtain critical new information for optimizing treatment strategies that are safe, affordable and appropriate for the developing world. An integral part of this massive humanitarian effort will be the collection of large amounts of clinical and laboratory data, including genetic information on viral subtype and resistance mutations, as well as routine CD4+ T-cell counts and viral load measurements. The mere collection of these data, however, does not ensure that they will be used to their full potential. To benefit fully from this rapidly growing source of information, the data will need to be appropriately collated, stored, analyzed and interpreted.

The rapidly emerging field of bioinformatics has the capacity to greatly enhance treatment (and vaccine) efforts by serving as a bridge between medical informatics and experimental science. By correlating genetic variation and potential changes in protein structure with clinical risk factors, disease presentation, and differential responses to treatment and vaccine candidates, it may be possible to obtain valuable new insights that can support and guide rational decision-making at both the clinical and public health levels. The HIV-1 Proteomics Resource described in this report is a first step toward improved methods for extracting and analyzing genomic data, converting them into biologically useful information on the structure, function and physiology of HIV-1 proteins, and assessing the role these proteins play in disease progression and response to therapy. The Resource, developed at the Molecular Virology and Bioinformatics Unit of the Africa Centre for Health and Population Studies, is a centralized, user-friendly database that is easily accessed through the BioAfrica website at http://bioafrica.mrc.ac.za/proteomics [23].

List of abbreviations used

AA: amino acid
BLAST: Basic Local Alignment Search Tool
CKII: casein kinase II
CTLs: cytotoxic T-lymphocytes
DIP: Database of Interacting Proteins
DNA: deoxyribonucleic acid
Env: envelope glycoprotein
Gag: group-specific antigen polyprotein
GIF: Graphics Interchange Format
HIV: Human Immunodeficiency Virus
HIV-1: Human Immunodeficiency Virus Type-1
HTTP: Hypertext Transfer Protocol
LTR: long terminal repeat
mRNA: messenger RNA
NCBI: National Center for Biotechnology Information
Nef: negative factor
PDB: Protein Data Bank
pI: isoelectric point
PIs: protease inhibitors
PKC: protein kinase C
Pol: polymerase polyprotein
Rev: ART/TRS anti-repression transactivator protein
RNA: ribonucleic acid
RNase H: ribonuclease H
Tat: transactivating regulatory protein
Vif: virion infectivity factor
Vpr: viral protein R
Vpu: viral protein U

Declarations

Acknowledgements

Development of the BioAfrica HIV-1 Proteomics Resource was supported by a program grant from the Wellcome Trust U.K. (#061238). The website is hosted by the South African Medical Research Council (MRC).

Authors’ Affiliations

(1)
Molecular Virology and Bioinformatics Unit, Africa Centre for Health and Population Studies, Doris Duke Medical Research Institute, Nelson R. Mandela School of Medicine, University of KwaZulu-Natal, Durban, South Africa
(2)
Biomedical Informatics Research Division, South African Medical Research Council, Cape Town, South Africa
(3)
Department of Medical Virology, University of Pretoria, Pretoria, South Africa

References

  1. Freed EO: HIV-1 replication. Somat Cell Mol Genet. 2001, 26: 13-33. 10.1023/A:1021070512287.
  2. Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, O'Donovan C, Redaschi N, Yeh LSL: UniProt: The Universal Protein knowledgebase. Nucleic Acids Res. 2004, 32: D115-119. 10.1093/nar/gkh131.
  3. Kuiken C, Korber B, Shafer RW: HIV sequence databases. AIDS Rev. 2003, 5: 52-61.
  4. Ratner L, Haseltine W, Patarca R, Livak KJ, Starcich B, Josephs SF, Doran ER, Rafalski JA, Whitehorn EA, Baumeister K: Complete nucleotide sequence of the AIDS virus, HTLV-III. Nature. 1985, 313: 277-284. 10.1038/313277a0.
  5. Kao SY, Calman AF, Luciw PA, Peterlin BM: Anti-termination of transcription within the long terminal repeat of HIV-1 by tat gene product. Nature. 1987, 330: 489-493. 10.1038/330489a0.
  6. Feinberg MB, Baltimore D, Frankel AD: The role of Tat in the human immunodeficiency virus life cycle indicates a primary effect on transcriptional elongation. Proc Natl Acad Sci USA. 1991, 88: 4045-4049.
  7. Cullen BR: Human Immunodeficiency Virus as a Prototypic Complex Retrovirus. J Virol. 1991, 65: 1053-1056.
  8. Mammano F, Petit C, Clavel F: Resistance-associated loss of viral fitness in human immunodeficiency virus type 1: phenotypic analysis of protease and gag coevolution in protease inhibitor-treated patients. J Virol. 1998, 72: 7632-7637.
  9. de Oliveira T, Engelbrecht S, van Rensburg EJ, Gordon M, Bishop K, zur Megede J, Barnett SW, Cassol S: Variability at Human Immunodeficiency Virus Type 1 Subtype C Protease Cleavage Sites: an Indication of Viral Fitness? J Virol. 2003, 77: 9422-9430. 10.1128/JVI.77.17.9422-9430.2003.
  10. zur Megede J, Engelbrecht S, de Oliveira T, Cassol S, Scriba TJ, van Rensburg EJ, Barnett SW: Novel evolutionary analyses of full-length HIV type 1 subtype C molecular clones from Cape Town, South Africa. AIDS Res Hum Retroviruses. 2002, 18: 1327-1332. 10.1089/088922202320886370.
  11. Morgado MG, Guimaraes ML, Galvao-Castro B: HIV-1 polymorphism: a challenge for vaccine development – a review. Mem Inst Oswaldo Cruz. 2002, 97: 143-150.
  12. Burns CC, Gleason LM, Mozaffarian A, Giachetti C, Carr JK, Overbaugh J: Sequence variability of the integrase protein from a diverse collection of HIV type 1 isolates representing several subtypes. AIDS Res Hum Retroviruses. 2002, 18: 1031-1041. 10.1089/08892220260235399.
  13. Bairoch A, Apweiler R: The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 2000, 28: 45-48. 10.1093/nar/28.1.45.
  14. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL: GenBank. Nucleic Acids Res. 2005, 33 (Database Issue): D34-38. 10.1093/nar/gki063.
  15. Henikoff JG, Greene EA, Pietrokovski S, Henikoff S: Increased coverage of protein families with the BLOCKS database servers. Nucleic Acids Res. 2000, 28: 228-230. 10.1093/nar/28.1.228.
  16. Hulo N, Sigrist CJA, Saux VL, Langendijk-Genevaux PS, Bordoli L, Gattiker A, de Castro E, Bucher P, Bairoch A: Recent improvements to the PROSITE database. Nucleic Acids Res. 2004, 32 (Database Issue): 134-137. 10.1093/nar/gkh044.
  17. Servant F, Bru C, Carrere S, Courcelle E, Gouzy J, Peyruc D, Kahn D: ProDom: Automated clustering of homologous domains. Brief Bioinform. 2002, 3: 246-251.
  18. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410. 10.1006/jmbi.1990.9999.
  19. Schwede T, Kopp J, Guex N, Peitsch MC: SWISS-MODEL: an automated protein homology-modeling server. Nucleic Acids Res. 2003, 31: 3381-3385. 10.1093/nar/gkg520.
  20. Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D: The Database of Interacting Proteins: 2004 update. Nucleic Acids Res. 2004, 32 (Database Issue): 449-451. 10.1093/nar/gkh086.
  21. de Oliveira T, Salemi M, Gordon M, Vandamme AM, van Rensburg EJ, Engelbrecht S, Coovadia HM, Cassol S: Mapping Sites of Positive Selection and Amino Acid Diversification in the HIV Genome: An Alternative Approach to Vaccine Design? Genetics. 2004, 167: 1047-1058. 10.1534/genetics.103.018135.
  22. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res. 2000, 28: 235-242. 10.1093/nar/28.1.235.
  23. De Oliveira T, Doherty RS, Seebregts C, Monosi B, Gordon M, Cassol S: The BioAfrica Website: An Integrated Bioinformatics Website for Studying the Explosive HIV-1 Subtype C Epidemic in Africa. Digital Biology: The Emerging Paradigm Conference, NIH, Maryland, USA, 6-7 November 2003.

Copyright

© Doherty et al; licensee BioMed Central Ltd. 2005

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
