An insight to HTLV-1-associated myelopathy/tropical spastic paraparesis (HAM/TSP) pathogenesis; evidence from high-throughput data integration and meta-analysis

Background Human T-lymphotropic virus 1-associated myelopathy/tropical spastic paraparesis (HAM/TSP) is a progressive disease of the central nervous system that significantly affected spinal cord, nevertheless, the pathogenesis pathway and reliable biomarkers have not been well determined. This study aimed to employ high throughput meta-analysis to find major genes that are possibly involved in the pathogenesis of HAM/TSP. Results High-throughput statistical analyses identified 832, 49, and 22 differentially expressed genes for normal vs. ACs, normal vs. HAM/TSP, and ACs vs. HAM/TSP groups, respectively. The protein–protein interactions between DEGs were identified in STRING and further network analyses highlighted 24 and 6 hub genes for normal vs. HAM/TSP and ACs vs. HAM/TSP groups, respectively. Moreover, four biologically meaningful modules including 251 genes were identified for normal vs. ACs. Biological network analyses indicated the involvement of hub genes in many vital pathways like JAK-STAT signaling pathway, interferon, Interleukins, and immune pathways in the normal vs. HAM/TSP group and Metabolism of RNA, Viral mRNA Translation, Human T cell leukemia virus 1 infection, and Cell cycle in the normal vs. ACs group. Moreover, three major genes including STAT1, TAP1, and PSMB8 were identified by network analysis. Real-time PCR revealed the meaningful down-regulation of STAT1 in HAM/TSP samples than AC and normal samples (P = 0.01 and P = 0.02, respectively), up-regulation of PSMB8 in HAM/TSP samples than AC and normal samples (P = 0.04 and P = 0.01, respectively), and down-regulation of TAP1 in HAM/TSP samples than those in AC and normal samples (P = 0.008 and P = 0.02, respectively). No significant difference was found among three groups in terms of the percentage of T helper and cytotoxic T lymphocytes (P = 0.55 and P = 0.12). Conclusions High-throughput data integration disclosed novel hub genes involved in important pathways in virus infection and immune systems. The comprehensive studies are needed to improve our knowledge about the pathogenesis pathways and also biomarkers of complex diseases.


Background
HTLV-associated myelopathy/tropical spastic paraparesis (HAM/TSP) is a chronic neurodegenerative disease with progressive characteristics that disturbs the functioning of the sensory and motor nerves [1]. Indeed, infection with HTLV-1 can lead to asymptomatic carrier (AC) state or two diseases including Adult T-Cell Leukemia Lymphoma (ATLL) or/and HAM/TSP [2].
About 10-20 million people worldwide have been infected with HTLV-1 [3]. Endemic areas include the Middle East, Japan, the Caribbean basin, Central Africa, the Melanesian Islands, and South America. Only 2-5% of those infected with the virus develop HAM/TSP [4,5].
Patients with HAM/TSP often have symptoms such as back pain, stiffness, and pain in the lower limbs, urinary frequency, and progressive weakness. Mild cognitive impairment is also common. The clinical signs of the disease imitate multiple sclerosis when the spinal cord is involved, such that sick people need walking aids after 1 year of illness [6].
HTLV-1 may weaken or impair the immune system, which results in autoimmunity to neurons. It also provides an immunosuppressive microenvironment that authorizes the HTLV-1 infected cells to escape host immune response and causes HTLV-1-associated diseases [7].
Studies on HTLV-1 as a factor that deregulates the host's immune system has lasted for many years and has sometimes yielded polemical results. Despite various studies on how to treat HAM/TSP, it is still a challenge for clinicians [8][9][10][11][12]. Therefore, identifying prognostic biomarkers that implicated in the pathogenesis is vital to understand the development and progression of a disease, as well as its diagnosis and treatment. Since now, different genes that are involved in mTOR, NF-kappa B, PI3K, and MAPK signaling pathways have been known in the HAM/TSP cases. Also, apoptosis can occur in the cell nucleus of the HAM/TSP patients [2,13,14].
Microarray technology can simultaneously measure tens of thousands of genes from different tissue samples in a high-throughput and cost-effective manner [15]. However, the results may be irreproducible [16] or be influenced by the data perturbations [17,18]. One possible solution to find robust information is the integration of multiple datasets which is called meta-analysis [19][20][21][22]. To this end, various statistical procedures are employed to combine and analyze the results of the independent studies. Meta-analysis increases the validity of the results and makes the possible estimation of gene expression differences [23].
In this study, we integrated 16 datasets in three groups to find gene signatures by network analysis of differentially expressed genes. The results specified the genes and pathways, which possibly have critical roles in the development of the HAM/TSP pathogenesis. Flow cytometry was employed to determine the ratio of CD4+ to CD8+ and better understanding the pathogenesis of the virus. Moreover, the real-time PCR confirmed different expressions of the determined genes in the HAM/TSP cases versus AC and normal subjects.

Database searching and identification of eligible studies
We searched the Gene Expression Omnibus (http://www. ncbi.nlm.nih.gov/geo/) and ArrayExpress (https ://www. ebi.ac.uk/array expre ss/) by end of 2018 to find datasets reporting the expression levels of miRNA and mRNA in the HAM/TSP and AC subjects. To find the relevant reports, keywords including Human T-lymphotropic virus 1-associated myelopathy/tropical spastic paraparesis, HTLV-1, TSP, HAM/TSP, asymptomatic carrier, AC, ACs were firstly used. The inclusion criteria were then research and regular studies that performed the highthroughput microarray studies on the human subjects. The normal samples were also considered to compare with these groups. The exclusion criteria were studies performed on the non-human samples, cell line, and nonblood samples. Moreover, two independent investigators searched and gathered data from each included study. The quality and consistency of the studies were evaluated by the R package MetaQC (0.1.13) [24]. Finally, the obtained data were classified into three groups named as ACs vs. normal, HAM/TSP vs. normal, and HAM/TSP vs. ACs.

Pre-processing and meta-analysis
The expression data in each group were background corrected and quantile normalized using the Affy package implemented in R programming language (3.6.1) (http://www.r-proje ct.org). The datasets were integrated Conclusions: High-throughput data integration disclosed novel hub genes involved in important pathways in virus infection and immune systems. The comprehensive studies are needed to improve our knowledge about the pathogenesis pathways and also biomarkers of complex diseases. Keywords: HTLV-1, HTLV-1-associated myelopathy/tropical spastic paraparesis, HAM/TSP, High-throughput data integration, Meta-analysis, Microarray individually at miRNA and mRNA levels using random effect method (REM) and then differentially expressed miRNAs (DEMs) and differentially expressed genes (DEGs) were identified by the R package MetaDE (1.0.5), respectively. The low number of DEGs caused that the p-values of less than 0.005 and logFC > |1| were further considered as a significant difference to have more DEGs and networks construction. The experimentally validated targets of each DEMs were obtained using miR-TarBase (http://miRTa rBase .cuhk.edu.cn/) [25] and then integrated super-horizontally with DEGs. The common genes were considered for further analysis.

Networks construction
To construct the network comprises protein-protein interactions (PPIs) in each group, the STRING database version 11.0 was employed [26]. Seven interaction sources including physical interactions, functional association, high-throughput experiments, genomic context, co-expression, databases, and text-mining were considered. Then, the PPIs networks were analyzed in terms of degree by NetworkAnalyzer in Cytoscape 3.7.1. The degree is defined as the number of edges connected to a node [27]. The genes with higher aforementioned criteria were considered as hub genes.

Module finding and pathways analysis
The ACs vs normal network clustering was implemented using the fast unfolding clustering algorithm in Gephi (0.9.2) [2,28,29]. The biologically meaningful modules were then chosen. The networks and modules were visualized by Cytoscape (3.7.1). To find the meaningful pathways in which hub genes are involved, g:Profiler web tool (version: 1185_e69_eg16) was employed [30]. The overall expressed gene lists for each group were considered as the background. Ten top pathway terms with higher P-value were selected for further interpretations.

Patient population and sample collection
The blood samples were collected from eight patients with ACs, eight patients with HAM/TSP, and eight normal samples who referred to the neurology department of Ghaem Hospital, Mashhad University of Medical Sciences (MUMS). All specimens were collected after acquiring informed consent from the patient's guardians. Two trained neurologists affirmed the diagnosis of HAM/TSP according to WHO criteria. All contributors had seropositive test for HTLV-1 by enzyme-linked immunosorbent assay (ELISA, Diapro, Italy). The results of serology were confirmed by PCR [31]. The participants had no history of treatment with IFNs. This study was approved by the Ethics Committee of Biomedical Research at TUMS (IR.TUMS.SPH.REC.1396.242).

Flow cytometry analysis
To determine T helper and cytotoxic cells populations in HAM/TSP, ACs and normal groups; PerCP anti CD3 antibody (bio legend company cat no: 344813), Phicoerythrin (PE) anti CD4 antibody (bio legend company cat no: 317409) and PE anti CD8 antibody (bio legend company cat no: 301007) were used. Fresh peripheral blood samples were treated by lysis buffer for destroying the red blood cells and platelets. Samples were analyzed on a FACS caliber Becton Dichinson. All analyses were done in the lymphocyte gate.

HTLV-1 proviral Load
Peripheral blood mononuclear cells (PBMCs) were isolated from EDTA-treated blood samples using Ficoll density gradient medium (Cedarlane, Hornsby, ON, Canada). The commercial blood mini kit (Qiagen, Germany) was applied to extract DNA from PBMCs. In order to measure the PVL of HTLV-I in PBMCs, a realtime PCR using a commercial real-time-based absolute quantification kit (HTLV-1 RG; Novin Gene, Karaj, Iran) was performed [32].

Quantitative real-time PCR
Total RNA was extracted from fresh PBMCs using TriPure isolation reagent (Roche, Germany) according to the manufacturer's instructions. Double-stranded cDNA was synthesized using the RevertAid TM firststrand cDNA synthesis kit (Fermentas, Germany). Following primers and probes were designed and used to determine the expression levels of STAT1, PSMB8, TAP1 : STAT1 (forward primer: 5ʹ-AAC ATG GAG GAG TCC ACC AATG-3ʹ, reverse primer: 5ʹ-GAT CAC CAC AAC GGG CAG AG-3ʹ and TaqMan probe: FAM-TCT GGC GGC TGA ATT TCG GCA CCT -BHQ1), PSMB8 (forward primer: 5ʹ-GTT CAG ATT GAG ATG GCC CATG-3ʹ, reverse primer: 5ʹ-CGT TCT CCA TTT CGC AGA TAG TAC -3ʹ and TaqMan probe: FAM-CCA CCA CGC TCG CCT TCA AGT TCC -BHQ1), TAP1 (forward primer: 5ʹ-TAC CGC CTT CGT TGT CAG TTATG-3ʹ, reverse primer: 5ʹ-GAG CCC AGG CAG CCT AGA AG-3ʹ and TaqMan probe: Fam-CGC ACA GGG TTT CCA GAG CCGCC-BHQ1). The primers and probes of Tax and HBZ were synthesized according to published data [33]. The relative 2 standard curves real-time PCR was carried out on the cDNA samples using TaqMan master mix (Takara, Otsu, Japan) and a Q-6000 machine (Qiagen, Germany). The GAPDH gene was employed as a housekeeping gene to normalize the mRNA expression levels, and also to control the error between samples [32,34].

Statistical analysis
Statistical analysis was carried out using GraphPad Prism Software Version 7 (GraphPad software, Inc). Quantitative data were expressed as mean ± SEM and percentages. The comparisons between various groups were accomplished using ANOVA. Pearson's or Spearman's tests were used for the analysis of the correlation between variables. The outcomes were considered significant if P ≤ 0.05.

Differentially expressed genes and miRNAs
A total of four miRNAs including hsa-mir-218, hsamir-206, hsa-mir-31, and hsa-mir-34A were identified as DEMs between normal and AC group. The target genes of the mentioned DEMs were further identified in miR-TarBase. A total of 663 genes were identified as target and added to 180 DEGs obtained across microarray datasets. After removing duplicate genes, 832 DEGs were specified. Also, a total of 49 and 22 genes were identified as DEGs for normal vs. HAM/TSP and ACs vs. HAM/TSP groups, respectively (Additional file 1: Table S1).

Protein-protein interactions networks (PPINs) and Module finding
To explore more information about the relationships between the DEGs, PPINs were constructed by STRING. The networks were analyzed in terms of topology and centrality parameters. The nodes with a higher degree and betweenness were selected as hub genes. From these analyses, 24 and 6 hub genes were specified for normal vs. HAM/TSP and ACs vs. HAM/TSP groups, respectively (Fig. 1a, b). The highly connected network of the Normal vs. AC group caused that the modules were explored. A total of 23 modules were identified, which four of them including 251 genes were highly connected and biologically meaningful (Fig. 2a-d).
The color of each node in network is representative of the degree level from bold to pale color, which in turn shows the important role of that node in the network.

Pathway enrichment
In order to find the biological pathway controlled by nodes of each network, the enrichment analysis was carried out. The modules identified from Normal vs. AC group enriched in the following pathways: Module 1: Metabolism of RNA, mRNA Splicing, RNA  (Table 2).

Demographic data
The mean age of three groups was as follow: normal controls: 41 ± 2.8, ACs: 42 ± 3.5, and HAM/TSP patients: 48 ± 3.6. Any significant difference was found between the ages of three groups.

Flow cytometry
Flow Cytometry Data Analyze of T helper and cytotoxic T lymphocytes was done by Flowjo 7.6.1. No significant difference was found among the three groups in terms of the percentage of T helper (P = 0.55) and cytotoxic T lymphocytes (P = 0.12) (Fig. 3).

HTLV-1 proviral load
All HAM/TSP patients had proviral loads (PVLs) in the range of 216-1160 and all ACs had PVLs in the range of 32-140. The mean PVL of HTLV-1 in the HAM/TSP patients was 455.8 ± 114.7, which was significantly higher (P = 0.002) than that in the ACs (60.88 ± 12.92) (Fig. 4a).

Real time-quantitative PCR for validation of expression changes
The expression levels of Tax and HBZ were measured in the samples, which revealed the insignificant up-regulation of Tax in ACs group (1.41 ± 0.27) than that in HAM/ Fig. 1 The PPINs constituted between the identified hub DEGs of a Normal vs. HAM/TSP and b ACs vs. HAM/TSP groups. The color is indicative of the degree level, so that bold color indicates the higher degree of node TSP (1.22 ± 0.16) group (P = 0.42) and significant higher expression level of HBZ in HAM/TSP group (0.08 ± 0.01) than that in ACs group (0.009 ± 0.001) (P = 0.0008) (Fig. 4b, c). Moreover, the network analyses revealed STAT1 and PSMB8 as the nodes with high degree value in normal vs. TSP and ACs vs. TSP groups. Therefore, we examined them with TAP1 as a random gene for further step of validating the meta-analysis results. The differential expression of these genes was analyzed by comparing expression levels in PBMCs of normal, ACs, and HAM/ TSP subjects using RT-qPCR. To this purpose, the differential expressions of genes were analyzed by comparing expression levels in normal, AC, and HAM/TSP samples. The results revealed the meaningful downregulation of STAT1 in HAM/TSP (1.8 ± 0.43) samples than those in the AC (3.6 ± 0.52) and normal (3.3 ± 0.36) samples (P = 0.01 and P = 0.02, respectively) (Fig. 4d). The remarkable down-regulation of TAP1 in HAM/TSP (1.2 ± 0.27) samples than those in the AC (3.0 ± 0.56) and normal (2.7 ± 0.61) samples was observed (P = 0.008 and P = 0.02, respectively) (Fig. 4e). Also, the expression level of PSMB8 has significantly increased in the HAM/ TSP (8.5 ± 1.5) samples than those in the AC (3.8 ± 0.74) and normal (3.1 ± 0.61) samples (P = 0.04 and P = 0.01, respectively) (Fig. 4f ). Moreover, the correlation analysis

Discussion
Despite four decades of researches on HTLV-1, many questions remain regarding the pathogenicity mechanism and key proteins involved in various pathological pathways. Moreover, it is also ambiguous which factors and proteins determine the final destiny of infection by HTLV1 toward HAM/TSP or/and ATLL, while some infected subjects remain in the form of asymptomatic carriers. Microarray technology is widely used to analyze and measure gene expression at the high-throughput scale. Despite the high benefits of using this technology, the result of a population cannot be generalized to another population. Data integration and providing a metaanalysis of the reported data improve the validity and reliability of the results. Genomics, transcriptomics, and proteomics data can be combined to find biomarkers and possible pathogenesis pathways [23].
From differential expression analysis of the miRNAs samples between normal and ACs groups, four miRNAs including hsa-mir-218, hsa-mir-206, hsa-mir-31, and hsamir-34A were identified, which can be considered as biomarkers for diagnosis of AC state.
In complying with previous reports, the identified DEGs were involved in the immune system of the HAM/ TSP subjects. Moreover, the involved molecular network as the primary model was introduced through collection and integration of high-throughput data. We validated two main hub genes of STAT1 and PSMB8, and also TAP1 to confirm our results.
STAT1 is an important intermediary in responding to IFNs. After binding IFN-I to the cellular receptor, signal transduction occurs through protein kinases which results in the activation of Jak kinase. It, in turn, causes phosphorylation of tyrosine in STAT1 and STAT2. The activated STATs are embedded in the dimer with ISGF3 and IRF9 and enter the nucleus which leads to up-regulation of IFNs and enhances the antiviral response [41,42]. The significant down-regulation of STAT1 in patients with HAM/TSP was observed compared with Fig. 4 a HTLV-I-proviral load. The PVL in HAM/TSP patients was significantly higher than in ACs (P = 0.002). b Tax gene expression. No significant difference was found between ACs and HAM/TSP groups (P = 0.42). c HBZ gene expressions which was significantly higher in the HAM/TSP group than that in the ACs group (P = 0.0008). d STAT1 gene expressions in the Normal, ACs, and HAM/TSP groups. STAT1 gene expression in the HAM/TSP was significantly higher than in Normal (P = 0.02). The STAT1 between AC and HAM/TSP patients was statistically different (P = 0.01). No significant difference was found between Normal and AC patients (P = 0.91). e TAP1 gene expressions in the Normal, ACs, and HAM/TSP groups. TAP1 gene expression in the HAM/TSP was significantly higher than in Normal (P = 0.02). The TAP1 between AC and HAM/TSP patients was statistically different (P = 0.008). No significant difference was found between Normal and AC patients (P = 0.72). e PSMB8 gene expressions in the Normal, ACs, and HAM/TSP groups. PSMB8 gene expression in the HAM/TSP was significantly higher than in Normal (P = 0.01). The PSMB8 between AC and HAM/TSP patients was statistically different (P = 0.04). No significant difference was found between Normal and AC patients (P = 0.64) asymptomatic carriers and healthy individuals. The decrease in the expression of STAT1 is the response of the infected cells to escape HTLV-1 from the immune response associated with HAM/TSP.
The expression change of STAT1 in ATLL patients has been reported in several studies [43]. However, no studies have addressed the dysregulation of STAT1 expression in HAM/TSP patients. The reduction of STAT1 and subsequent MHC-I in this disease can significantly affect the action of CD8 and NK cells as important cells in the HAM/TSP pathogenesis [44,45].
A significant increase was observed in the expression of PSMB8 in patients with HAM/TSP in comparison to those who carry the virus and normal subjects. PSMB8 is one of the 17 subunits essential for the synthesis of the 20S proteasome unit [46]. The targeting of proteasome in the HAM/TSP disease is a known mechanism which affects the pathogenicity of HTLV-1 by increasing the activity of genes such as IKBKG [2]. PSMB8 can influence the immune responses due to involvement in the process of apoptosis [47], so its increase in patients with HAM/ TSP may be because of this function. Although previous studies reported the role of apoptosis in the HAM/TSP pathogenesis [2], there is no comprehensive information regarding the role of PSMB8.
TAP1 is another gene which significantly down-regulated in the HAM/TSP group compared with asymptomatic carriers and normal groups. TAP1 protein which is expressed by the TAP gene involves the transfer of antigen from the cytoplasm to the endoplasmic reticulum to accompany with MHC-I. HTLV-1 seems to run out from the antiviral response in association with MHC-I due to impairment in the TAP1 function [48]. Such occurrence was also observed as a result of infections by other viruses such as EBV, CMV, and adenovirus [49]. Similar to STAT1, a It is noteworthy that the immune decrease in the TAP1 expression can also significantly affect CD8 and NK cells [44,45]. Therefore, it seems that escaping from CTLimmune response is one of the important mechanisms for pathogenicity in HAM/TSP; however, more accurate and detailed studies are needed. In HAM/TSP, the disorder expression of the STAT1 and TAP1 proteins can disrupt the immune system.
The enrichment of modules identified from normal vs. ACs group revealed the involvement of hub genes in Infectious disease, Viral Messenger RNA Synthesis, Metabolism of RNA, Pathways in Cancer, Human T-cell leukemia virus 1 infection, and Antigen processing which activate after virus infection and asymptomatic state. These hub genes can be more evaluated in further studies.
The mechanisms involved in the HAM/TSP development are complicated, so identification of proteins which have different expressions than the normal group is critical to find the complete pathogenesis pathway [2].
Determining viral factors such as proviral load along with measuring the expression levels of Tax and HBZ genes will be effective in finding the virus action in the patient group. Moreover, host-related factors such as age, family history of the disease, genetics, and host immune status are important [52][53][54][55][56][57].
Destruction of cells in the central nervous system may be due to the release of inflammatory substances from lymphocytes produced by the immune response to the contaminated TCD4+ cells, which are referred as "bystander" damage. It is most likely the mechanism of tissue damage in HAM/TSP disease. In this study, there was no significant difference in the ratio of CD4 to CD8 in the HAM/TSP patients than asymptomatic carriers and healthy subjects; however, a slight increase was observed in the asymptomatic carriers group in comparison to the HAM/TSP and healthy subjects. This may be due to the function of the immunity system to prevent virus replication and progress toward HAM/ TSP disease, but more studies with a higher sample size are required. Eventually, patients with HAM/TSP have impairment in their immune system induced by the HTLV-1 infection, which includes the innate and adaptive immunity to develop the disease and increase apoptosis [2].

Conclusion
We employed meta-analysis of high throughput data to find the involved genes in the pathogenesis mechanisms of HAM/TSP disease. The network analysis disclosed novel hub genes involved in important pathways in virus infection and then interferon, cytokine, interleukin, and immune systems. Finally, the comprehensive studies are needed to improve our knowledge about the pathogenesis pathways and also biomarkers of complex diseases.
Additional file 1: Table S1. List of the identified DEGs in each group.