Early and chronic datasets are comparable with respect to host and viral diversity. The 17 HLA-A, 23 HLA-B and 19 HLA-C alleles (total 56) observed at frequencies > 1% in the early and/or chronic datasets are displayed in Panels A-C, respectively. HLA allele frequencies were comparable between the early and chronic datasets, with the exception of HLA-A*02:06, A*30:02 and B*39:01, which were more frequent in the early cohort than in the chronic cohort (denoted by “*” for p < 0.05 and “**” for p < 0.01, Fisher’s exact test). Note, however, that no HIV-1 polymorphisms restricted by these three HLA class I alleles were assessed in the present study (see Figure 2 and Additional file 1). Panel D: Unrooted maximum-likelihood phylogenies of early (left), chronic (middle) and combined cohort (right) Gag sequences, on a distance scale of 0.01 substitutions per nucleotide site. Mean patristic (pairwise) genetic distances between Gag sequences were comparable for the early and chronic cohorts; moreover, no gross cohort-specific clustering was observed in the combined phylogeny.
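For readers who wish to reproduce this kind of allele-frequency comparison, the following is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table (allele carriers versus non-carriers, by cohort). It is not the authors' code: the function name `fisher_exact_two_sided` and the counts in the usage example are hypothetical placeholders, and the two-sided rule used here (summing all tables no more probable than the observed one) is one common convention that is assumed rather than taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are cohorts (e.g. early, chronic); columns are allele carriers
    and non-carriers. All margins are treated as fixed.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the first cohort.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum every table that is no more probable than the observed one;
    # the small relative tolerance guards against floating-point ties.
    p_value = sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-7)
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical counts, NOT data from the study:
    # 12 of 200 early-cohort and 5 of 300 chronic-cohort individuals
    # carry a given allele.
    p = fisher_exact_two_sided(12, 188, 5, 295)
    print(f"p = {p:.4f}")
```

In practice one such test would be run per allele, using the carrier counts observed in each cohort, and the resulting p-values compared against the 0.05 and 0.01 thresholds used for the asterisks in Panels A-C.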