Extrapolation of the data volume needed to predict the phenotype vector from the V3 sequence. (A) Error of the predicted phenotype vectors. Predictions were calculated by leave-one-out cross-validation (LOOCV), and the error was estimated as the Euclidean distance between the predicted and observed phenotype vectors. Blue and red bars mark R5 and X4 reference clones, respectively. (B) Estimated effect of training set size on the prediction error. Error functions were fitted to simulated training set sizes (2-22) for all tested clones (thick black line) and for the subsets of X4 clones (dashed red line) and R5 clones (dashed blue line). Thin black lines represent the top and bottom 0.25 quantiles of the averaged error function for all clones, with two vertical gray lines indicating the distance between the quantiles. Dashed horizontal lines represent the cut-offs for recognizing R5/X4 viruses (black) and dual-tropic viruses (magenta). The training set sizes at which the averaged function reaches the two cut-offs are marked by arrows and given in the legend.
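
The error measure in panel (A) can be illustrated with a minimal sketch: each clone is left out in turn, its phenotype vector is predicted from the remaining clones, and the error is the Euclidean distance between the predicted and observed vectors. The predictor used here (`nearest_neighbor_predict`, which copies the phenotype of the training sequence with the fewest mismatches) is a placeholder assumed for illustration only and is not the prediction method of the study. The function names and the toy sequences and phenotype vectors are likewise hypothetical.

```python
import numpy as np


def nearest_neighbor_predict(train_X, train_Y, query_x):
    """Placeholder predictor (an assumption, not the study's method):
    return the phenotype vector of the training sequence that has the
    fewest position-wise mismatches with the query sequence."""
    mismatches = (train_X != query_x).sum(axis=1)
    return train_Y[np.argmin(mismatches)]


def loocv_errors(X, Y, predict=nearest_neighbor_predict):
    """Return one prediction error per clone under leave-one-out
    cross-validation, measured as the Euclidean distance between the
    predicted and the observed phenotype vector."""
    X = np.asarray(X)
    Y = np.asarray(Y, dtype=float)
    errors = np.empty(len(X))
    for i in range(len(X)):
        keep = np.arange(len(X)) != i          # hold out clone i
        predicted = predict(X[keep], Y[keep], X[i])
        errors[i] = np.linalg.norm(predicted - Y[i])
    return errors


# Hypothetical toy data: three aligned sequences (one row per clone)
# and a two-component phenotype vector for each clone.
X = np.array([list("AAAA"), list("AAAT"), list("GGGG")])
Y = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
errors = loocv_errors(X, Y)
```

Any other predictor with the same three-argument signature can be passed as `predict`, so the error calculation stays independent of the model being evaluated.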