Our HIV data analysis web portal stores records obtained from patients across the Midwest. We considered only patients whose records contain at least five laboratory tests taken at different times during treatment. Among other things, each patient laboratory test (from now on referred to as a patient's record) provides CD4 count and RNA level readings.
To ensure the selection of unique datasets, we drew ten random datasets of 1,300 patients' records each, without replacement, from the 32,297 records. To check that the selected records are independent, we performed a chi-square test, which yielded a p-value of 0.0032. We then measured the spread of the CD4 and RNA readings by calculating the standard deviation (S.D.) of each patient's CD4 and RNA levels. We also calculated the correlation between the CD4 and RNA readings for each patient. At the end of this process, we obtained 3,900 datasets. To summarize the data, we used the resulting 3,900 datasets to construct twenty-eight groups of patients' records, where the average CD4, RNA, and correlation values within each group differ by no more than 5%.
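The sampling and per-patient statistics described above can be illustrated with a minimal Python sketch. It is not the portal's actual implementation: the function names (`draw_datasets`, `patient_summary`, `pearson`) are hypothetical, the sketch assumes that "without replacement" means no record appears in more than one dataset, and it assumes the sample (n - 1) standard deviation and the Pearson correlation coefficient, neither of which is specified in the text.

```python
import math
import random
from statistics import mean, stdev


def draw_datasets(record_ids, n_datasets=10, size=1300, seed=0):
    """Draw n_datasets disjoint random subsets of `size` records each.

    Assumption: sampling without replacement across all datasets,
    so no record is shared between two datasets.
    """
    rng = random.Random(seed)
    picked = rng.sample(record_ids, n_datasets * size)
    return [picked[i * size:(i + 1) * size] for i in range(n_datasets)]


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either series is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def patient_summary(cd4, rna):
    """Spread and correlation of one patient's paired CD4/RNA readings."""
    if len(cd4) != len(rna) or len(cd4) < 5:
        raise ValueError("need at least five paired CD4/RNA readings")
    return {
        "cd4_sd": stdev(cd4),  # sample S.D. (assumed; the text does not say)
        "rna_sd": stdev(rna),
        "corr": pearson(cd4, rna),
    }
```

For example, `draw_datasets(list(range(32297)))` returns ten lists of 1,300 distinct record indices, and `patient_summary` can then be applied to the CD4 and RNA readings of each patient represented in a dataset.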