GuavaH: a compendium of host genomic data in HIV biology and disease
© Bartha et al.; licensee BioMed Central Ltd. 2014
Received: 3 December 2013
Accepted: 7 January 2014
Published: 15 January 2014
There is an ever-increasing volume of data on host genes that are modulated during HIV infection, influence disease susceptibility or carry genetic variants that impact HIV infection. We created GuavaH (Genomic Utility for Association and Viral Analyses in HIV, http://www.GuavaH.org), a public resource that supports multipurpose analysis of genome-wide genetic variation and gene expression profiles across multiple phenotypes relevant to HIV biology.
We included original data from 8 genome and transcriptome studies addressing viral and host responses in vivo and ex vivo. These studies cover phenotypes such as HIV acquisition, plasma viral load, disease progression, viral replication cycle, latency and viral-host genome interaction. This represents genome-wide association data from more than 4,000 individuals, exome sequencing data from 392 individuals, in vivo transcriptome microarray data from 127 patients/conditions, and 60 sets of RNA-seq data. Additionally, GuavaH allows visualization of protein variation in ~8,000 individuals from the general population. The publicly available GuavaH framework supports queries on (i) unique single nucleotide polymorphisms across different HIV-related phenotypes, (ii) gene structure and variation, (iii) in vivo gene expression in the setting of human infection (CD4+ T cells), and (iv) in vitro gene expression data in models of permissive infection, latency and reactivation.
The complexity of analyzing host genetic influences on HIV biology and pathogenesis calls for comprehensive search tools built on curated data. The tool developed here allows queries and supports validation of the rapidly growing body of host genomic information pertinent to HIV research.
The field of HIV research has adopted genome-wide technologies in order to meet the goal of understanding the complex interplay between host and pathogen. A growing number of approaches allow the interrogation of DNA variation (genome-wide genotyping, exome and whole genome sequencing), RNA variation (transcriptome analyses by gene expression arrays or deep sequencing), as well as large-scale functional screens (gene silencing using siRNA or shRNA, gain of function using gene overexpression). This is complemented with proteome and protein interaction analyses. The objective of these studies is to characterize the behavior of any gene/protein in the context of HIV infection in vitro or in vivo.
These studies are generally evaluated using stringent statistical thresholds, which are necessary given the large number of hypotheses tested simultaneously in most genome-wide scans. In addition, many studies require external validation, such as association results in a separate set of infected individuals, or expression results across various biological conditions. Accessing those resources is difficult because raw data or complete sets of analysis statistics are rarely available, or can only be obtained by re-contacting the original sources. Currently, there is a lack of integrated analysis tools through which researchers can easily access well-curated data to reinforce their own observations, to replicate findings externally or to generate novel hypotheses.
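The multiple-testing burden described above can be made concrete with a short sketch. It uses the Bonferroni correction, which is one common way to control the family-wise error rate and is not necessarily the procedure applied in any of the studies discussed here. The function names, the SNP identifiers and the p-values are placeholders chosen for illustration only.

```python
from __future__ import annotations


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant(p_values: dict[str, float], alpha: float, n_tests: int) -> list[str]:
    """Return the identifiers whose p-value passes the corrected threshold."""
    cutoff = bonferroni_threshold(alpha, n_tests)
    return sorted(name for name, p in p_values.items() if p < cutoff)


# Made-up p-values for placeholder identifiers; these are not real results.
example = {"snp_a": 3e-9, "snp_b": 2e-6, "snp_c": 0.04}

# Assumption: roughly one million independent tests across the genome.
threshold = bonferroni_threshold(0.05, 1_000_000)
hits = significant(example, 0.05, 1_000_000)
```

Under the assumption of about one million independent tests, a nominal error rate of 0.05 gives a per-test threshold of 5 × 10⁻⁸, so a p-value that looks convincing in a single-hypothesis setting (such as the placeholder `snp_b`) does not pass.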
GuavaH currently provides results from GWAS of HIV disease phenotypes including more than 4,000 individuals. GWAS use large-scale genotyping technology (usually arrays interrogating 500,000 to 1 million single nucleotide polymorphisms, SNPs) complemented with statistical approaches that allow imputation of millions of additional variants that are not directly measured by the assay. The main challenge of GWAS is the stringent statistical threshold for claiming association (usually p < 5 × 10⁻⁸). The power to identify SNPs associated with a given phenotype depends on the frequency and the effect size of the genetic variant, and on sample size. Thus, large numbers of study participants and meta-analyses across studies are required. GuavaH includes association results on HIV control (set point plasma viral load [1, 2] and elite control) and on susceptibility to infection in a cohort of highly exposed seronegative individuals. In addition to these traditional GWAS of clinically related outcomes, GuavaH includes data from a recent genome-to-genome analysis of host genetic variants impacting the nucleic acid sequence of the infecting virus. The genome-to-genome approach identifies loci of host-pathogen conflict independently of clinical data. Thus, GuavaH allows the interrogation of any SNP across multiple studies and phenotypes, and facilitates the validation of associations identified in other studies.
The GuavaH resource also includes functional transcriptome analyses from in vivo and in vitro studies. The in vivo data were obtained by microarray studies of CD4+ T cells from 127 individuals chronically infected with HIV, representing the full spectrum of viral load. These data can be contrasted with temporal in vitro analysis of the HIV replication cycle in a T cell line (SupT1), representing 12 data points from HIV-infected cells and 12 data points from uninfected cells analysed by sequencing. For example, Figure 1 illustrates the in vivo and in vitro increase in TRIM5α expression during active HIV-1 infection. Given the growing importance of latency research, we also incorporated detailed RNA sequencing data on the dynamic process of entering and maintaining latency in a primary cell model, and on the expression changes in host and viral transcripts upon reactivation with various pharmacological agents and immunological stimuli. GuavaH allows the interrogation of any gene across studies and cellular systems, and facilitates the validation of expression profiles identified in other studies.
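The dependence of power on allele frequency, effect size and sample size can be sketched with a textbook normal approximation for a one-degree-of-freedom additive test on a standardized quantitative trait. This is an illustrative approximation under stated assumptions, not the method used in the studies included in GuavaH; the function name `gwas_power` and its parameters are hypothetical.

```python
from statistics import NormalDist


def gwas_power(n: int, maf: float, beta: float, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided 1-df additive association test.

    Assumptions (illustrative only): the trait is standardized to unit
    variance, beta is the per-allele effect in standard-deviation units,
    and genotypes follow Hardy-Weinberg proportions.
    """
    if not 0.0 < maf < 1.0:
        raise ValueError("maf must lie strictly between 0 and 1")
    # Fraction of trait variance explained by the variant.
    var_explained = 2.0 * maf * (1.0 - maf) * beta ** 2
    if var_explained >= 1.0:
        raise ValueError("effect size too large for a unit-variance trait")
    # Non-centrality parameter of the association test statistic.
    ncp = n * var_explained / (1.0 - var_explained)
    std = NormalDist()
    # Two-sided critical value for the chosen significance threshold.
    z_crit = -std.inv_cdf(alpha / 2.0)
    shift = ncp ** 0.5
    # Probability that the test statistic falls beyond either critical value.
    return std.cdf(shift - z_crit) + std.cdf(-shift - z_crit)


# Usage: compare a cohort of 4,000 with a five-fold larger meta-analysis
# for the same hypothetical variant (maf and beta are arbitrary choices).
power_single = gwas_power(n=4_000, maf=0.2, beta=0.1)
power_meta = gwas_power(n=20_000, maf=0.2, beta=0.1)
```

With the arbitrary values chosen here, the larger sample yields markedly higher power at the same threshold, which is the reason large cohorts and meta-analyses are needed for variants of modest effect or low frequency.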
Online resources on host genes in HIV biology and disease
Associated sites to GuavaH
Querying of cellular responses to HIV in vitro (SupT1 cells)
Querying of expression data during HIV latency and upon reactivation in a primary CD4+ T cell model
Interactive HIV-host genome-to-genome map of the HLA class I locus and viral genome variation
Interactive overlapping of output from genome-wide surveys of host cell genes linked to HIV infection
NCBI HIV-1 Human protein interaction database
The HIV-1, human protein interaction data are based on literature reports.
Visualization, interpretation and analysis of pathway knowledge
VirusMINT – Virus molecular interaction database
Interactions between human and HIV proteins are integrated in the human protein interaction network
Promoting easy access to genome-wide association and functional data fits the goal defined in 2009 by The Global HIV Vaccine Enterprise of understanding the role of host genetics in HIV research: “New high-throughput genetic approaches have the potential to identify major genetic factors contributing to clinical outcome in HIV-1 infection. Ideally, every human gene that impacts on each mode of HIV transmission and disease outcome should be identified to improve our understanding of the mechanisms of protection”. GuavaH is a useful tool for visualizing the host genomic effects attributable to a given gene of interest and its potential functional implications in a variety of in vitro and in vivo settings of HIV infection.
Availability of supporting data
GuavaH provides access to published datasets and to unpublished data upon discussion with the researchers in charge of the original work. It also allows depositing of new sets of data for public or private querying. Contact: firstname.lastname@example.org
We thank Paul de Bakker, Florencia Pereyra, Bruce Walker, David Goldstein, Pejman Mohammadi, Julia di Iulio and Margalida Rotger for their contributions to the original data presented in this website.
1. Fellay J, Ge D, Shianna KV, Colombo S, Ledergerber B, Cirulli ET, Urban TJ, Zhang K, Gumbs CE, Smith JP, et al: Common genetic variation and the control of HIV-1 in humans. PLoS Genet. 2009, 5: e1000791. 10.1371/journal.pgen.1000791.
2. Fellay J, Shianna KV, Ge D, Colombo S, Ledergerber B, Weale M, Zhang K, Gumbs C, Castagna A, Cossarizza A, et al: A whole-genome association study of major determinants for host control of HIV-1. Science. 2007, 317: 944-947. 10.1126/science.1143767.
3. International HIV Controllers Study, Pereyra F, Jia X, McLaren PJ, Telenti A, de Bakker PI, Walker BD, Ripke S, Brumme CJ, Pulit SL, et al: The major genetic determinants of HIV-1 control affect HLA class I peptide presentation. Science. 2010, 330: 1551-1557.
4. Lane J, McLaren PJ, Dorrell L, Shianna KV, Stemke A, Pelak K, Moore S, Oldenburg J, Alvarez-Roman MT, Angelillo-Scherrer A, et al: A genome-wide association study of resistance to HIV infection in highly exposed uninfected individuals with hemophilia A. Hum Mol Genet. 2013, 22: 1903-1910. 10.1093/hmg/ddt033.
5. Bartha I, Carlson JM, Brumme CJ, McLaren PJ, Brumme ZL, John M, Haas DW, Martinez-Picado J, Dalmau J, Lopez-Galindez C, et al: A genome-to-genome analysis of associations between human genetic variation, HIV-1 sequence diversity, and viral control. eLife. 2013, 2: e01123. 10.7554/eLife.01123.
6. MacArthur DG, Balasubramanian S, Frankish A, Huang N, Morris J, Walter K, Jostins L, Habegger L, Pickrell JK, Montgomery SB, et al: A systematic survey of loss-of-function variants in human protein-coding genes. Science. 2012, 335: 823-828. 10.1126/science.1215040.
7. Rotger M, Dang KK, Fellay J, Heinzen EL, Feng S, Descombes P, Shianna KV, Ge D, Gunthard HF, Goldstein DB, et al: Genome-wide mRNA expression correlates of viral control in CD4+ T-cells from HIV-1-infected individuals. PLoS Pathog. 2010, 6: e1000781. 10.1371/journal.ppat.1000781.
8. Mohammadi P, Desfarges S, Bartha I, Joos B, Zangger N, Munoz M, Gunthard HF, Beerenwinkel N, Telenti A, Ciuffi A: 24 hours in the life of HIV-1 in a T cell line. PLoS Pathog. 2013, 9: e1003161. 10.1371/journal.ppat.1003161.
9. Bushman FD, Barton S, Bailey A, Greig C, Malani N, Bandyopadhyay S, Young J, Chanda S, Krogan N: Bringing it all together: big data and HIV research. AIDS. 2013, 27: 835-838. 10.1097/QAD.0b013e32835cb785.
10. McMichael AJ, McCutchan F: Host genetics and viral diversity: report from a global HIV vaccine enterprise working group. Nat Prec. 2010, 10.1038/npre.2010.4797.2.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.