
Table 1 Comparison of subtype assignments (jpHMM results versus current database assignment that is based on the original literature)

From: The role of recombination in the emergence of a complex and dynamic HIV epidemic

   

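AG set

Full length (world), N = 140

| Database subtype | A | G | 02 | AG |
| --- | --- | --- | --- | --- |
| Num of sequences | 72 | 12 | 48 | 8 |
| Num of problematic sequences¹ | 1 | 0 | 2 | 0 |
| Num of discordant sequences² | 0 | 0 | 1 | 0 |

BC set

Full length (world), N = 509

| Database subtype | B | C | 07 | 08 | BC |
| --- | --- | --- | --- | --- | --- |
| Num of sequences | 152 | 334 | 7 | 4 | 12 |
| Num of problematic sequences¹ | 15 | 12 | 0 | 0 | 3 |
| Num of discordant sequences² | 2 | 0 | 0 | 0 | 2 |

Fragments (Asia), N = 4413

| Database subtype | B | C | 07 | 08 | BC |
| --- | --- | --- | --- | --- | --- |
| Num of sequences | 3133 | 1048 | 17 | 171 | 44 |
| Num of problematic sequences¹ | 0 | 0 | 0 | 0 | 0 |
| Num of discordant sequences² | 24 | 6 | 6 | 102 | 27 |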

   

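BF set

Full length (world), N = 220

| Database subtype | B | F | 12 | 17 | 28 | 29 | BF |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Num of sequences | 152 | 12 | 11 | 2 | 3 | 4 | 36 |
| Num of problematic sequences¹ | 15 | 0 | 0 | 0 | 0 | 0 | 0 |
| Num of discordant sequences² | 2 | 2 | 6 | 2 | 1 | 1 | 1 |

Fragments (S. America), N = 4153

| Database subtype | B | F | 12 | 17 | 28 | 29 | BF |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Num of sequences | 3070 | 242 | 261 | 0 | 0 | 0 | 580 |
| Num of problematic sequences¹ | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Num of discordant sequences² | 74 | 19 | 31 | 0 | 0 | 0 | 107 |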

  1. Problematic sequences are those that could not be unequivocally assigned. They meet one of the following criteria: (1) they contain an unusually high content of the IUPAC code N (defined as > 100 continuous Ns, or > 7% N for sequences shorter than 1000 nt, or > 5% N for sequences of 1000-2999 nt, or > 3% N for sequences of 3000 nt or longer); (2) they contain an artifactual deletion of > 100 nt.
  2. Classification of the sequences was compared between the database assignments (the majority of which were extracted from the literature) and the jpHMM predictions.
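
The N-content thresholds in note 1 can be sketched as a small filter. This is only an illustration of the stated criteria, not the authors' code: the function names `has_high_n_content` and `is_problematic` are hypothetical, the input is assumed to be an ungapped nucleotide string, and the artifactual-deletion criterion is passed in as a flag because detecting it would require an alignment that is not described here.

```python
import re


def has_high_n_content(seq: str) -> bool:
    """Criterion 1 of note 1: unusually high content of IUPAC code N.

    Assumes `seq` is an ungapped nucleotide string (hypothetical input format).
    """
    seq = seq.upper()
    length = len(seq)
    # More than 100 continuous Ns, regardless of sequence length.
    if re.search(r"N{101,}", seq):
        return True
    n_count = seq.count("N")
    # Length-dependent percentage thresholds from the note.
    if length < 1000:
        max_percent = 7
    elif length < 3000:
        max_percent = 5
    else:
        max_percent = 3
    # Integer arithmetic avoids floating-point issues at the exact boundary.
    return n_count * 100 > max_percent * length


def is_problematic(seq: str, has_artifactual_deletion: bool = False) -> bool:
    """Either criterion marks a sequence as problematic.

    `has_artifactual_deletion` (a deletion of > 100 nt) must be determined
    elsewhere; it is an assumed input, not something this sketch computes.
    """
    return has_high_n_content(seq) or has_artifactual_deletion
```

For example, a 500 nt fragment with 36 Ns (7.2%) would be flagged, while the same fragment with 35 Ns (exactly 7%) would not, because the thresholds are strict inequalities.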