Phylogenetic analysis using full-length clade C sequences from diverse geographical locations. A. Maximum likelihood phylogenetic tree built from full-length clade C sequences from Southeast Asia (India, China and Myanmar) and Southern Africa (South Africa, Zambia and Botswana), with reference clade B sequences as the out-group. The analysis identified genetic divergence between HIV-1CIndieC1 (C.IN.93IN901, marked with a filled blue circle) and HIV-1C1084i (C.ZM.HIV1084i, marked with a red square), which fall into distinct phylogenetic clusters. B. Clinical isolates from two different geographical locations (Southern Africa and Southeast Asia; n = 250) were included in the maximum likelihood (ML) phylogenetic tree along with reference HIV-1C sequences from the Los Alamos database from Zambia, Botswana, South Africa and India (n = 21) and HIV-1B sequences (n = 4) from the US, Netherlands, Thailand and France (outliers). Only representative sequences were chosen from the Los Alamos database, to avoid a “cohort effect” in the phylogenetic analysis. The tree was constructed in MEGA 5 using the general time reversible model with gamma-distributed rate variation and invariant sites (GTR + G + I), which was predicted to be the best-fit model. Southern African sequences are marked with filled squares (red: Zambia; pink: South Africa) and Southeast Asian sequences with filled circles (green: Bangladesh; blue: India). C. The predicted amino acid sequence of the tat gene from the Zambian HIV-1C molecular clone HIV-1C1084i is aligned, using ClustalW, with sequences from representative virus isolates of HIV-1B (HIV-1BJR-CSF; HIV-1BADA) and HIV-1C (HIV-1CMJ4; HIV-1CIndieC1) and with consensus sequences. Clade C signature amino acid residues previously defined by Ranga et al., present in the majority of HIV-1C Tat sequences in the Los Alamos database, are indicated by asterisks below the consensus clade C sequence (CON_C). With the exception of C31, all these residues are conserved across all clade C isolates. Dots represent residues identical to the consensus of the respective clade. Residues in red indicate the C30C31/C30S31 motif in Tat. Sequences are grouped by clade, as indicated on the left.
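The dot notation used in panel C can be illustrated with a minimal sketch. The function name `dot_format` and the example strings are hypothetical and are not taken from the study's sequences or software; the sketch only assumes that the isolate and consensus sequences are already aligned to the same length.

```python
def dot_format(consensus: str, sequence: str, gap: str = "-") -> str:
    """Render an aligned sequence relative to a consensus.

    Residues identical to the consensus become '.', differing residues
    are shown as-is, and gap characters are always kept.
    (Hypothetical helper for illustration only.)
    """
    if len(consensus) != len(sequence):
        raise ValueError("aligned sequences must have equal length")
    rendered = []
    for ref, res in zip(consensus, sequence):
        # Keep gaps visible even when both rows have a gap at this column.
        rendered.append("." if res == ref and res != gap else res)
    return "".join(rendered)


# Toy strings, not real Tat sequences: one substitution at the 7th column.
print(dot_format("MEPVDPNLEPW", "MEPVDPRLEPW"))  # ......R....
```

Displayed this way, only the positions where an isolate departs from its clade consensus stand out, which is how variation at a motif such as C30C31/C30S31 becomes visible at a glance.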