The role of recombination in the emergence of a complex and dynamic HIV epidemic

Retrovirology

Table 1 Comparison of subtype assignments (jpHMM results versus current database assignment that is based on the original literature)

AG set: full length (world), N = 140
Database subtype	A	G	02	AG
Number of sequences	72	12	48	8
Number of problematic sequences ¹	1	0	2	0
Number of discordant sequences ²	0	0	1	0

BC set: full length (world), N = 509
Database subtype	B	C	07	08	BC
Number of sequences	152	334	7	4	12
Number of problematic sequences ¹	15	12	0	0	3
Number of discordant sequences ²	2	0	0	0	2

BC set: fragments (Asia), N = 4413
Database subtype	B	C	07	08	BC
Number of sequences	3133	1048	17	171	44
Number of problematic sequences ¹	0	0	0	0	0
Number of discordant sequences ²	24	6	6	102	27

BF set: full length (world), N = 220
Database subtype	B	F	12	17	28	29	BF
Number of sequences	152	12	11	2	3	4	36
Number of problematic sequences ¹	15	0	0	0	0	0	0
Number of discordant sequences ²	2	2	6	2	1	1	1

BF set: fragments (S. America), N = 4153
Database subtype	B	F	12	17	28	29	BF
Number of sequences	3070	242	261	0	0	0	580
Number of problematic sequences ¹	0	0	0	0	0	0	0
Number of discordant sequences ²	74	19	31	0	0	0	107

1. Problematic sequences are those that could not be unequivocally assigned. They meet one of the following criteria: 1) an unusually high content of the IUPAC code N (defined as > 100 contiguous Ns, or > 7% N for sequences shorter than 1000 nt, > 5% N for sequences of 1000-2999 nt, or > 3% N for sequences of 3000 nt or longer); 2) an artifactual deletion of > 100 nt.
2. Discordant sequences are those for which the database assignment (the majority of which were extracted from the literature) differs from the jpHMM prediction.
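
The N-content thresholds in footnote 1 can be written as a small check. The following Python sketch is illustrative only and is not the authors' code: the function names `max_n_percent` and `has_excess_n` are hypothetical, the sequence length is assumed to be the raw length including Ns, and lower-case input is assumed to be equivalent to upper-case. The second criterion (an artifactual deletion of > 100 nt) needs an alignment against a reference and is not covered here.

```python
import re


def max_n_percent(length: int) -> int:
    """Return the tolerated percentage of N for a sequence of this length.

    Thresholds follow footnote 1: 7% below 1000 nt, 5% for 1000-2999 nt,
    3% for 3000 nt or longer.
    """
    if length < 1000:
        return 7
    if length < 3000:
        return 5
    return 3


def has_excess_n(seq: str, max_run: int = 100) -> bool:
    """Return True if the sequence has an unusually high N content.

    A sequence is flagged when it contains more than `max_run` contiguous Ns
    or when its overall N percentage exceeds the length-dependent threshold.
    """
    seq = seq.upper()  # assumption: case is not meaningful
    if not seq:
        return False
    # A run of at least max_run + 1 Ns means "more than max_run contiguous Ns".
    if re.search("N{%d,}" % (max_run + 1), seq):
        return True
    # Integer arithmetic avoids floating-point edge cases at the threshold.
    return seq.count("N") * 100 > max_n_percent(len(seq)) * len(seq)
```

Called as `has_excess_n(sequence)`, the function returns a boolean that could be used to set a sequence aside before subtype assignments are compared. Both comparisons are strict, matching the ">" signs in the footnote, so a sequence sitting exactly on a threshold is not flagged.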

ISSN: 1742-4690