A major susceptibility locus for HTLV-1 infection in childhood maps to chromosome 6q27

Human T-cell leukemia/lymphoma virus type 1 (HTLV-1) is a human oncoretrovirus causing adult T-cell leukemia/lymphoma and chronic neuromyelopathy. We previously showed by segregation analysis that a dominant gene controls HTLV-1 infection through breast-feeding in children of African origin. Here, we report the mapping of this locus by a genome-wide linkage analysis based on the genetic model provided by segregation analysis. Five pedigrees of African origin with HTLV-1 seropositive children were included in the study. Significant evidence for linkage (LOD score of 3.36, P = 0.00004) was obtained for chromosomal region 6q27 when using the robust analysis including only HTLV-1-infected subjects. When HTLV-1 seronegative children born to infected mothers were added to the analysis, a maximum LOD score of 2.79 (P = 0.0002) was obtained for chromosome 2p25. This result was mostly due to the largest pedigree of our sample, which alone gave a LOD score of 2.90 (P = 0.00013). We further excluded the role of exonic variants of two candidate genes located in the linked regions, CCR6 (chemokine receptor 6) in 6q27 and ID2 (inhibitor of DNA binding 2) in 2p25. Our results, mapping a major susceptibility locus to chromosome 6q27 and suggesting genetic heterogeneity with another locus at 2p25, pave the way to the determination of the molecular basis of predisposition to HTLV-1 infection in children.


INTRODUCTION
Human T-cell leukemia/lymphoma virus type 1 (HTLV-1), the first human oncoretrovirus to be discovered (1), causes a lymphoproliferative malignancy of CD4-activated cells known as adult T-cell leukemia/lymphoma (ATL) and a chronic myelopathy called tropical spastic paraparesis/HTLV-1-associated myelopathy (TSP/HAM) (2). HTLV-1, which infects 15 to 20 million people, is not a ubiquitous virus. Instead, there are clusters of high endemicity in certain geographic areas or ethnic groups. HTLV-1 antibody prevalence rates, based on strict diagnostic criteria (western blot and/or specific immunofluorescence confirmation), may range from 0.01 to 15% in adults of the general population (3). Areas with HTLV-1 seroprevalences of >2% in adults (i.e. high HTLV-1 seroprevalence) include the south-western islands of Japan, the Caribbean Basin, parts of South America, tropical Africa and parts of the Middle East (Iran) and Melanesia (3).
Three modes of transmission have been demonstrated for HTLV-1. Mother-to-child transmission varies from 10 to 20% and is thought to occur after the decline of protective IgG maternal antibodies (around 6-9 months of age) via the ingestion of maternal lymphocytes containing the HTLV-1 provirus during breast-feeding (4-6). HTLV-1 is also transmitted by sexual intercourse, mainly from men to women (7,8). This may account for the higher age-specific HTLV-1 seroprevalence in women than in men. Finally, intravenous transmission, through needle sharing or blood transfusion, appears to be the most efficient mode of transmission (9). Transmission via the sexual or intravenous routes has been specifically linked to TSP/HAM development. ATL seems to be rare following transmission by blood transfusion, with only a very small number of cases reported, all in immuno-compromised patients. Several studies have shown that ATL in adults is very likely to result from early childhood infection (10-12).
In 1994, we began a large epidemiological study to determine risk factors for HTLV-1 infection in a highly endemic general population of African origin from French Guiana, with an estimated overall HTLV-1 seroprevalence of 8% (13). In a previous study, we presented evidence for the existence of a dominant major gene conferring predisposition to HTLV-1 infection in children (≤10 years old) born to HTLV-1 seropositive mothers (14). We report here the results of a genome-wide scan carried out to map this major locus by model-based linkage analysis and to investigate the possible role of candidate genes located in linked regions.

RESULTS
Figure 1 presents the families included in the linkage analysis. Table 1 presents the three regions with multipoint LOD scores >1 for either infected-only or mixed analysis during the primary screen. In these regions (2p25, 6q27 and 11p15), we genotyped 14, 16 and 4 additional microsatellites, respectively. A decrease in maximum LOD score was observed for 11p15, whereas the two remaining regions had a multipoint LOD score >2.08 (P < 0.001). A maximum multipoint LOD score of 3.36 (P = 0.00004) was obtained for region 6q27 (Fig. 2A), at marker D6S297, in the infected-only analysis. Positive LOD scores at this location were obtained for all families except family D, for which a slightly negative value (-0.07) was obtained (Table 2). In the mixed analysis, a maximum multipoint LOD score of 2.79 (P = 0.0002) was obtained for region 2p25 (Fig. 2B) at D2S297. This LOD score was entirely due to a single family (family A in Fig. 1), which itself had a maximum LOD score of 2.90 (P = 0.00013) at D2S162 (Table 2). We searched for a possible founder effect in this isolated population by constructing haplotypes in the two regions of interest for all families. We found no common haplotypes among the five families (data not shown).
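The asymptotic P-values quoted with these LOD scores can be checked with a short calculation. The sketch below assumes the usual conversion for a LOD score maximized over one parameter, in which 2 ln(10) x LOD is compared with a 50:50 mixture of a point mass at zero and a chi-square distribution with one degree of freedom. The text does not spell out the conversion that was used, and the function name lod_to_p is ours.

```python
import math

def lod_to_p(lod: float) -> float:
    """One-sided asymptotic P-value for a LOD score (assumed 1 degree of freedom)."""
    if lod <= 0:
        # Under the assumed mixture, half of the null distribution sits at zero.
        return 1.0
    z = math.sqrt(2.0 * math.log(10.0) * lod)   # root of the likelihood-ratio statistic
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of the standard normal

# LOD scores taken from the text above.
for lod in (2.08, 2.79, 2.90, 3.36):
    print(f"LOD {lod:.2f} -> P = {lod_to_p(lod):.5f}")
```

Under this assumption, the values obtained for LOD scores of 2.79, 2.90 and 3.36 round to the P-values reported above, and a LOD score of 2.08 corresponds to a P-value of about 0.001.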
We then defined a one-LOD score interval corresponding roughly to a 90% confidence region (15) for each of the two linked regions. We listed all known genes included in these regions: 44 annotated genes within a 6.1 Mb segment on chromosome 6q26-q27, between markers D6S1277 and D6S1693, and 30 annotated genes within a 3.4 Mb segment on chromosome 2p25.2-p24.1, between D2S2952 and D2S168. As a first effort to identify the causative gene, we analyzed one attractive positional candidate within each of the two regions. As the HTLV-1-infected children had been infected through breast-feeding, we identified two candidate genes involved in anti-infectious immunity via the development of Peyer's patches, the first barrier to infection in the digestive tract, as documented by reports on the corresponding knockout mice. The first of these genes, CCR6, is located close to D6S169 and encodes chemokine receptor 6, which is also involved in the development of regulatory T lymphocytes (16,17). The second, inhibitor of DNA binding 2 (ID2), is located close to D2S162 and encodes an inhibitor of basic helix-loop-helix transcription factors. These factors have been shown to play an important role in regulating lymphopoiesis and T-cell development (18-21).
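A one-LOD support interval can be read directly off a multipoint LOD curve: it is the contiguous region around the peak in which the LOD score stays within one unit of its maximum. The sketch below is a minimal illustration on a grid of map positions, without interpolation between grid points. The positions and LOD values are invented for the example and are not the study data, and the function name is ours.

```python
def one_lod_interval(positions, lods):
    """Return (left, right, peak) positions of the region where LOD >= max - 1."""
    peak = max(range(len(lods)), key=lods.__getitem__)
    threshold = lods[peak] - 1.0
    left = peak
    while left > 0 and lods[left - 1] >= threshold:
        left -= 1                      # extend towards smaller map positions
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= threshold:
        right += 1                     # extend towards larger map positions
    return positions[left], positions[right], positions[peak]

# Hypothetical curve (positions in cM), for illustration only.
example_positions = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
example_lods = [0.5, 1.9, 2.6, 3.36, 2.9, 2.1, 0.8]
print(one_lod_interval(example_positions, example_lods))
```

On a real curve the interval boundaries would normally be reported as the flanking markers, as is done in the text.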
The CCR6 and ID2 variants detected by the sequencing of the five HTLV-1 seropositive children are shown in Figure 3A and B, respectively. Six CCR6 SNPs and one small deletion within untranslated exon 3 of ID2 (+1656delA) were found in at least three of the five children tested and were genotyped in the whole case/control sample. They were all in Hardy-Weinberg equilibrium both in the group of 48 HTLV-1 seronegative subjects (including children born to seropositive mothers and young adults) and in the group of 59 HTLV-1 seropositive children and ATL patients. Apart from +2778 A>G, the five remaining CCR6 SNPs were in very strong linkage disequilibrium (R² > 0.8), defining only two common haplotypes, and no significant difference in the distribution of the six SNPs was observed between HTLV-1 seropositive and seronegative subjects (Table 3). Although the frequency of the ID2 deletion was slightly higher in seropositive (0.31) than in seronegative (0.24) subjects, this difference was not significant (P = 0.39), nor was the difference in genotypic distribution between the two groups (P = 0.51).
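The Hardy-Weinberg check mentioned above can be illustrated with a goodness-of-fit test on genotype counts. The sketch below assumes a Pearson chi-square test with one degree of freedom; the text does not say which test was used, and an exact test may be preferable with small counts. The genotype counts are invented for the example.

```python
import math

def hwe_chi2(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square test (1 df) of Hardy-Weinberg proportions from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)    # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 0.0, 1.0                # monomorphic marker: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square upper tail, 1 df
    return chi2, p_value

# Hypothetical counts for 48 subjects, chosen to match the expected proportions.
print(hwe_chi2(27, 18, 3))
# Hypothetical counts with no heterozygotes at all, a strong departure.
print(hwe_chi2(10, 0, 10))
```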

DISCUSSION
We report here the first genome-wide scan by linkage analysis searching for major genes conferring susceptibility to HTLV-1 infection. Host genetic factors for susceptibility to HTLV-1 infection and related diseases (ATL, TSP/HAM) have previously been investigated by means of association studies, assessing the role of specific candidate genes. The genes of the major histocompatibility complex have been most studied (reviewed in 22,23), but few associations with HTLV-1 infection or ATL have been replicated. Certain HLA class I alleles were found to be associated with ATL in Japanese patients (24,25), whereas the frequency of some class II alleles has been reported to be particularly high in ATL patients and HTLV-1-infected subjects of African origin (26). Our genome-wide scan provided no evidence for linkage to the major histocompatibility complex region (6p21). Negative LOD scores were also obtained for chromosomal region 1p35-31.3, which contains the SLC2A1 gene encoding the glucose transporter GLUT1, which is the only known coreceptor of HTLV-1 on T-cells (27,28).
The main finding of this study is the identification of a major locus conferring predisposition to childhood HTLV-1 infection on chromosome 6q27. Evidence for linkage to this region was obtained in four families, in the infected-only analysis, with the LOD score for the remaining family (family D) being only slightly negative (-0.07) at D6S297 where the linkage peak is located. Interestingly, the LOD score of family D became more negative (-0.62 at D6S297 and -1.68 at D6S386) in the mixed analysis (Table 2), explaining to a large extent the overall decrease of LOD score in this analysis. This observation is consistent with genetic heterogeneity in our sample, although too few families were studied for this hypothesis to be formally tested. Close to the 6q27 linkage peak, we identified CCR6 as a relevant candidate gene (16,17,29,30). The CCR6 SNPs detected in HTLV-1-infected children and tested in the case/control sample defined only two common haplotypes which were clearly not associated with HTLV-1 infection.
A second interesting LOD score, due mostly to family A (LOD = 2.90), was obtained for chromosome 2p25. This observation again raises questions about genetic heterogeneity, although it should be noted that family A also made a substantial contribution (LOD score of 1.11 in the infected-only analysis) to 6q27 linkage (Table 2). A larger sample is required for the formal investigation of possible heterogeneity. We further investigated the 2p25 region by analyzing one attractive candidate gene, ID2 (19,21,31,32).
Only one variant, a small deletion in untranslated exon 3 of ID2, was found in the HTLV-1-infected child from family A. However, this variant was not located within the dominant risk haplotype of the 2p25 region segregating within family A. Finally, although this variant was slightly more common in the sample of HTLV-1-infected subjects than in seronegative controls, this difference was not significant. All these results indicate that this ID2 deletion is not involved in HTLV-1 infection.
In conclusion, our results provide molecular evidence, based on linkage analysis, that a major locus with a dominant mode of inheritance is responsible for conferring predisposition to HTLV-1 infection in children of African origin. This locus mapped to chromosome 6q27, and the role of variants located within the coding regions of CCR6 can be excluded. This finding paves the way for identification of the molecular mechanisms of HTLV-1 infection through breast-feeding. It may also have major implications for our understanding of ATL, which occurs in young adults who are very likely to have been infected during childhood. Following on the strategy previously used to identify susceptibility variants in leprosy (33), the next step in our work will be linkage disequilibrium mapping of the 6q27 region to identify the polymorphisms associated with predisposition to HTLV-1 infection.

Infected-only analysis considered only the HTLV-1 seropositive subjects. In the mixed analysis, HTLV-1 seronegative children born to a seropositive mother after a first HTLV-1 seropositive child with a current age <10 years were also included.

Human Molecular Genetics, 2006, Vol. 15, No. 22

Subjects studied
A large epidemiological study was conducted from November 1994 to November 1998 in two isolated villages in French Guiana, located in the Amazonian rainforest of north-eastern South America. The methodology of this study has been described elsewhere (13,14) and will be summarized only briefly here. Demographic and medical data were collected by interview and/or from medical files. Information concerning familial relationships was obtained by means of several interviews, and the validity of genealogical data was checked with the local medical team and the population. Serological data concerning HTLV-1 infection were obtained by both ELISA (Cobas Core, anti-HTLV-1/2 EIA; Roche, Basel, Switzerland) and immunofluorescence assays (IFA) on HTLV-1-producing MT2 cells. All samples giving positive or borderline ELISA or IFA results were subjected to western blotting for confirmation (western blot HTLV2.3; Diagnostic Biotech, Singapore), with stringent criteria for a positive result (13).
All the subjects included in this survey belong to an ethnic group referred to as the Noir-Marron. This group is descended from African slaves who escaped from the plantations of 18th century Suriname. Our previous segregation analysis for this population provided evidence for the existence of a dominant major gene conferring predisposition to HTLV-1 infection in children. Thus, for the linkage analysis reported here, we focused on the Noir-Marron pedigrees from the two studied villages with at least one HTLV-1-infected child aged ≤10 years, born to a seropositive mother and breast-fed. Twelve children met these criteria out of 559 children of the same age (HTLV-1 seroprevalence of 2.2% in children ≤10 years of age). DNA was available for 10 of them, who belonged to five families (Fig. 1). These families provided a total of 46 subjects for whom DNA samples could be obtained. All children and, more generally, all subjects of those families were breast-fed for at least 12 months.
We also collected an additional sample for a case/control association study investigating the role of specific polymorphisms of two candidate genes located within the regions found to be linked to HTLV-1 infection. The cases were unrelated HTLV-1-infected subjects who were either the children of HTLV-1 seropositive mothers from the Noir-Marron population of French Guiana (24 subjects) or ATL patients of African ancestry from French Guiana and the French West Indies (35 subjects), who are very likely to have been infected during childhood. The controls were independent HTLV-1 seronegative subjects who were either the children of HTLV-1 seropositive mothers from French Guiana (27 subjects) or young individuals (≤25 years old) from a general population from Cameroon (21 subjects).
Informed consent was obtained from adults or from the parents of minors, and the study was carried out in accordance with the human experimentation guidelines of the CCPPRB (Comité Consultatif de Protection des Personnes dans la Recherche Biomédicale) of Necker Hospital, Paris, France and the CNIL (Commission Nationale de l'Informatique et des Libertés).

Genotyping
Genomic DNA was extracted from blood samples, using the QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany). The primary genetic scan was carried out with a genome-wide panel of 382 polymorphic microsatellite markers with an average spacing of 10 cM and an average heterozygosity of 0.78-0.82 (ABI Prism Linkage Mapping Set 2, version 2.5, Applied Biosystems). No Mendelian inconsistencies were observed among the members of each family. Additional microsatellite markers were genotyped in regions with a multipoint LOD score >1.
Two genes within the two linked regions were identified by positional mapping and analyzed in detail. The first was ID2, a gene with three exons, located on chromosome 2p25.1 and encoding an inhibitor of basic helix-loop-helix transcription factors (34,35). The second was CCR6, the gene encoding chemokine receptor 6, which consists of four exons and is located on chromosome 6q27 (36,37). Direct sequencing was carried out to search for variants within the coding and flanking regions of these two genes in one HTLV-1 seropositive child from each of the five families (Fig. 1).
Oligonucleotide primers were designed with PRIMER3 software (38) to amplify DNA fragments containing exons. Additional internal primers were also generated and used for sequencing (Supplementary Material, Table S1). PCR was performed with ExTaq DNA polymerase (Takara Bio, Kyoto, Japan) using 25 ng of human DNA in a 15 µl reaction according to the manufacturer's instructions. Sequencing reactions with all the primers were performed according to the dye terminator method, using an Applied Biosystems 3730 DNA Analyzer (Applied Biosystems, Foster City, CA, USA). Results were aligned and analyzed for the identification of genetic variations using Genalys software [Takahashi et al. (39), test version available at www.cng.fr].
The detected variants were numbered in accordance with the reference sequences of human ID2 and CCR6 genomic DNA (GenBank accession nos NM_002166 and NM_031409, respectively), with the A of the ATG initiation codon denoted +1. Variants found in at least three of the five children were subsequently genotyped for the whole case/control sample described earlier and tested for association with HTLV-1 seropositive/seronegative status.
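The numbering convention can be made concrete with a small helper. The sketch below maps a 0-based index in a reference sequence to a coordinate in which the A of the ATG is +1. Treating the base immediately upstream as -1, with no position 0, is a common convention that we assume here; the text states only that the A is +1. The toy sequence and function names are ours.

```python
def relative_position(index: int, atg_index: int) -> int:
    """Map a 0-based sequence index to a coordinate with the ATG 'A' as +1."""
    offset = index - atg_index
    # Assumed convention: no position 0, so upstream bases run -1, -2, ...
    return offset + 1 if offset >= 0 else offset

def variant_label(index: int, atg_index: int) -> str:
    """Format the coordinate with an explicit sign, e.g. '+3' or '-1'."""
    pos = relative_position(index, atg_index)
    return f"+{pos}" if pos > 0 else str(pos)

toy_sequence = "GGCATGAAA"             # hypothetical sequence, not a real gene
atg = toy_sequence.index("ATG")        # 0-based index of the initiation 'A'
print(variant_label(atg, atg), variant_label(atg - 1, atg), variant_label(atg + 2, atg))
```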

Statistical analysis
Model-based linkage analysis was performed, using the dominant model generated from our previous segregation analysis in the same Noir-Marron population (14). The frequency of the dominant allele conferring predisposition to HTLV-1 [...] According to the model, all seropositive children under the age of 10 years are predicted to be genetic cases. Detailed penetrance curves were presented in a previous paper (14).
As HTLV-1 exposure during childhood could be uncertain for seronegative subjects, in particular for adults for whom it is impossible to know the HTLV-1 status of their mother when they were breast-fed, we began by carrying out a robust linkage analysis considering only the HTLV-1 seropositive subjects (infected-only analysis). This strategy corresponds to the classical affected-only analysis that is often recommended in the model-based linkage analysis of complex traits when the unaffected phenotype raises some uncertainty (40,41). We then carried out a mixed analysis including the HTLV-1 seronegative children born to a seropositive mother after a first HTLV-1 seropositive child aged <10. The use of this procedure ensured that all the seronegative children studied had likely been exposed to HTLV-1 infection, as all the children in these families were breast-fed by an infected mother.
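The difference between the two analyses comes down to how phenotypes are coded before linkage analysis. The sketch below is one possible reading of the rules described above, using the LINKAGE-style affection codes (0 unknown, 1 unaffected, 2 affected). The Subject fields and function names are hypothetical, and we assume that the age limit of 10 years applies to the current age of the seronegative child.

```python
from dataclasses import dataclass
from typing import Optional

UNKNOWN, UNAFFECTED, AFFECTED = 0, 1, 2   # LINKAGE-style affection codes

@dataclass
class Subject:
    seropositive: Optional[bool]          # None when serology is unavailable
    age: Optional[float]                  # current age in years
    mother_seropositive: bool
    born_after_seropositive_sib: bool

def infected_only_code(s: Subject) -> int:
    """Infected-only analysis: every non-infected subject is coded unknown."""
    return AFFECTED if s.seropositive else UNKNOWN

def mixed_code(s: Subject) -> int:
    """Mixed analysis: seronegative children with likely exposure count as unaffected."""
    if s.seropositive:
        return AFFECTED
    likely_exposed_child = (
        s.seropositive is False
        and s.age is not None and s.age < 10
        and s.mother_seropositive
        and s.born_after_seropositive_sib
    )
    return UNAFFECTED if likely_exposed_child else UNKNOWN

child = Subject(seropositive=False, age=4, mother_seropositive=True,
                born_after_seropositive_sib=True)
print(infected_only_code(child), mixed_code(child))
```

Coding uncertain subjects as unknown rather than unaffected is what makes the infected-only analysis robust to misclassified exposure.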
Linkage analyses were performed with GENEHUNTER (42) and MERLIN (43,44). The multipoint LOD score results for the chromosomal regions of interest (presented in Table 1 and Fig. 2) were obtained by parametric MERLIN analysis, using marker allele frequencies estimated from the founders of our population, as performed by MERLIN. The information content of the regions of interest was analyzed with GENEHUNTER. Genetic distances between markers were obtained from the Marshfield database and ordered according to the physical map when necessary. In addition, we performed simulations using the SLINK software (45,46) to assess both the validity of LOD-score distribution (simulations under the null hypothesis of no linkage) and the power to detect linkage (simulations under the hypothesis of linkage) within our sample. More specifically, we generated replicates of phenotypic and genotypic data under our dominant model (keeping unknown the phenotypes and genotypes of subjects who were actually missing) within our sample. We simulated 50 000 replicates under the null hypothesis of no linkage and confirmed that the observed distributions of P-values were fully consistent with asymptotic expectations, using both the infected-only and the mixed analyses. Thus, asymptotic P-values for LOD scores are provided in the text. Simulations under the hypothesis of linkage showed that our sample had a power of 60.3 and 81.8% to detect linkage with a type-I error of 0.001 (LOD score >2.08) using the infected-only and the mixed analyses, respectively.
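The logic of these simulations, in which the type-I error and the power are estimated as the share of replicates whose LOD score exceeds 2.08, can be illustrated on a much simpler design. The sketch below is not SLINK and does not use the study pedigrees or the dominant model. It assumes a toy setting of 20 fully informative, phase-known meioses, and every name and parameter in it is illustrative.

```python
import math
import random

def max_lod(n_meioses: int, n_recombinants: int) -> float:
    """LOD score maximized over the recombination fraction (phase-known meioses)."""
    theta = min(n_recombinants / n_meioses, 0.5)
    if theta == 0.5:
        return 0.0                       # no evidence for linkage
    lod = n_meioses * math.log10(2.0)
    lod += (n_meioses - n_recombinants) * math.log10(1.0 - theta)
    if n_recombinants > 0:
        lod += n_recombinants * math.log10(theta)
    return lod

def exceed_rate(true_theta, n_meioses, n_replicates, threshold, rng):
    """Share of simulated replicates with a maximized LOD above the threshold."""
    hits = 0
    for _ in range(n_replicates):
        k = sum(rng.random() < true_theta for _ in range(n_meioses))
        if max_lod(n_meioses, k) > threshold:
            hits += 1
    return hits / n_replicates

rng = random.Random(2006)                # fixed seed for reproducibility
type_one_error = exceed_rate(0.5, 20, 20000, 2.08, rng)   # null: no linkage
power = exceed_rate(0.05, 20, 20000, 2.08, rng)           # linkage at theta = 0.05
print(f"type-I error ~ {type_one_error:.4f}, power ~ {power:.3f}")
```

In this toy design the exceedance rate under no linkage is close to 0.001, in line with the asymptotic threshold used in the text. The power figure depends entirely on the assumed recombination fraction and number of meioses, and is not comparable to the values reported for the study sample.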
For the association study, genotype distributions and allelic frequencies in HTLV-1 seropositive subjects and seronegative controls were compared using classical χ² tests (PROC FREQ procedure of the SAS program, version 6.12, SAS Institute, Cary, NC, USA). Linkage disequilibrium measures, such as R², between polymorphisms of a given gene were estimated using HAPLOVIEW (47).
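Both quantities named in this paragraph have simple closed forms in the two-allele case. The sketch below is a minimal illustration rather than a reproduction of the SAS or HAPLOVIEW analyses. It assumes a Pearson chi-square without continuity correction for a 2 x 2 table of allele counts, and phased haplotype counts for R² (HAPLOVIEW estimates haplotype frequencies from unphased genotypes). All counts are invented.

```python
import math

def chi2_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square (1 df) for the table [[a, b], [c, d]], with its P-value."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))   # chi-square upper tail, 1 df
    return chi2, p_value

def r_squared(n11: int, n12: int, n21: int, n22: int) -> float:
    """R² between two biallelic sites from phased haplotype counts."""
    n = n11 + n12 + n21 + n22
    p1 = (n11 + n12) / n                 # allele 1 frequency at the first site
    q1 = (n11 + n21) / n                 # allele 1 frequency at the second site
    d = n11 / n - p1 * q1                # linkage disequilibrium coefficient D
    return d * d / (p1 * (1.0 - p1) * q1 * (1.0 - q1))

# Hypothetical allele counts: rows are cases and controls, columns the two alleles.
print(chi2_2x2(20, 10, 10, 20))
# Hypothetical haplotype counts: complete LD, then no LD.
print(r_squared(50, 0, 0, 50), r_squared(25, 25, 25, 25))
```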

Figure 1. Families included in the linkage analyses. HTLV-1-infected and uninfected subjects are shown in black and white, respectively. Question marks indicate individuals with unknown HTLV-1 serological status and without available DNA. Stars indicate individuals with known HTLV-1 serological status but without available DNA. The arrow indicates the infected child in each family, whose DNA was sequenced for SNP detection. Note that in family B, four of the infected individuals developed ATL.

Figure 2. Multipoint linkage analysis in chromosomal regions 6q27 (A) and 2p25 (B). The black solid line with triangles corresponds to LOD scores for all families in the mixed analysis; the black solid line with squares corresponds to LOD scores for all families in the infected-only analysis. The gray solid line with triangles corresponds to the LOD score for family A in the mixed analysis only.

Figure 3. Variants identified in CCR6 (A) and ID2 (B). Boxes correspond to exons, with the coding sequences of the exons shown in black. Variants are numbered with the A of the ATG initiation codon as '+1', and rs numbers are indicated for variants already reported in databases. Variants shown in bold were genotyped for the entire case/control sample.

Table 1. Maximum multipoint LOD scores >1 obtained in the five Noir-Marron families from French Guiana in the model-based linkage analysis
No significant differences were observed between the two groups for either the allelic or the genotypic distribution of the variants.