
Table 1 Comparison of subtype assignments (jpHMM results versus current database assignment that is based on the original literature)

From: The role of recombination in the emergence of a complex and dynamic HIV epidemic

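AG set: full length (world), N = 140

| Database subtype | A | G | 02 | AG |
|---|---|---|---|---|
| Num of sequences | 72 | 12 | 48 | 8 |
| Num of problematic sequences (note 1) | 1 | 0 | 2 | 0 |
| Num of discordant sequences (note 2) | 0 | 0 | 1 | 0 |

BC set: full length (world), N = 509

| Database subtype | B | C | 07 | 08 | BC |
|---|---|---|---|---|---|
| Num of sequences | 152 | 334 | 7 | 4 | 12 |
| Num of problematic sequences (note 1) | 15 | 12 | 0 | 0 | 3 |
| Num of discordant sequences (note 2) | 2 | 0 | 0 | 0 | 2 |

BC set: fragments (Asia), N = 4413

| Database subtype | B | C | 07 | 08 | BC |
|---|---|---|---|---|---|
| Num of sequences | 3133 | 1048 | 17 | 171 | 44 |
| Num of problematic sequences (note 1) | 0 | 0 | 0 | 0 | 0 |
| Num of discordant sequences (note 2) | 24 | 6 | 6 | 102 | 27 |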
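BF set: full length (world), N = 220

| Database subtype | B | F | 12 | 17 | 28 | 29 | BF |
|---|---|---|---|---|---|---|---|
| Num of sequences | 152 | 12 | 11 | 2 | 3 | 4 | 36 |
| Num of problematic sequences (note 1) | 15 | 0 | 0 | 0 | 0 | 0 | 0 |
| Num of discordant sequences (note 2) | 2 | 2 | 6 | 2 | 1 | 1 | 1 |

BF set: fragments (S. America), N = 4153

| Database subtype | B | F | 12 | 17 | 28 | 29 | BF |
|---|---|---|---|---|---|---|---|
| Num of sequences | 3070 | 242 | 261 | 0 | 0 | 0 | 580 |
| Num of problematic sequences (note 1) | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Num of discordant sequences (note 2) | 74 | 19 | 31 | 0 | 0 | 0 | 107 |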
  1. Problematic sequences are those that could not be unequivocally assigned. They meet one of the following criteria: 1) they contain an unusually high content of the IUPAC code N (defined as > 100 consecutive Ns, or > 7% N for sequences shorter than 1000 nt, or > 5% N for sequences of 1000-2999 nt, or > 3% N for sequences of 3000 nt or longer); 2) they contain an artifactual deletion of > 100 nt.
  2. Sequence classifications were compared between the database assignments (the majority of which were extracted from the literature) and the jpHMM predictions.
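
The N-content criterion in note 1 can be sketched in a few lines of Python. This is only an illustration of the thresholds as stated in the note, not the authors' actual screening code: the function names `n_fraction_threshold` and `has_excess_n` are hypothetical, the sequence length is assumed to be the raw string length (gaps are not stripped), and an empty sequence is assumed not to be flagged. The second criterion (an artifactual deletion of > 100 nt) needs an alignment against a reference and is not covered here.

```python
import re


def n_fraction_threshold(length: int) -> float:
    """Maximum tolerated fraction of N for a sequence of this length (note 1)."""
    if length < 1000:
        return 0.07
    if length < 3000:
        return 0.05
    return 0.03


def has_excess_n(seq: str) -> bool:
    """Return True if the sequence meets the N-content criterion of note 1."""
    seq = seq.upper()
    if not seq:
        # Assumption: an empty sequence is not flagged by this check.
        return False
    # More than 100 consecutive Ns, i.e. a run of at least 101.
    if re.search(r"N{101,}", seq):
        return True
    # Overall fraction of N above the length-dependent threshold.
    return seq.count("N") / len(seq) > n_fraction_threshold(len(seq))
```

For example, a 500-nt fragment with 36 Ns (7.2%) would be flagged, whereas a 4000-nt sequence with a single run of exactly 100 Ns (2.5%) would pass both parts of the check.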