Linear motif discovery based on motif definition. To discover linear motifs (LMs), multiple sequence alignment (MSA), Shannon entropy plots, and sequence context in terms of structural disorder were used. (A) Shannon entropy plot of HIV-1 Tat and Rev, used to analyze variation in sequence conservation. The plot uses the MSA to quantify diversity at each amino acid position of the full-length protein. Positions that are invariant across the aligned sequences have an entropy of zero, whereas positions with highly variable substitutions appear as large peaks. The basic regions of both Tat and Rev, which carry the motif pattern RXXRRXRRR, lie in regions that show more than 90% conservation, indicating their importance in the viral genome. (B) Disorder prediction with IUPred indicates that the arginine-rich motif (ARM) in both Tat and Rev scores above the threshold of 0.5, predicting that these regions are natively disordered. (C) Sequence comparison of the ARM in different viral proteins of human immunodeficiency virus type 1 (HIV-1), human T-lymphotropic virus (HTLV1), human herpes virus (HHV) and adenovirus (Ad). Tat, Rev, Rex, UL56, UL54, pVII and E4orf4 are viral open reading frames; their corresponding ARMs are shown with arginine residues highlighted in red. The numbers identify residue positions in the full-length proteins. (D) Distribution of basic amino acids in two LM patterns in the host proteome. AXXAAX is a well-defined classical NLS pattern of the host, whereas AXXAAXAAA is an uncharacterized viral LM. This analysis highlights the specificity of the ARM from viral proteins as a short linear motif (SLiM).
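
The per-position entropy in (A) and the RXXRRXRRR pattern can be illustrated with a minimal Python sketch. This is not the pipeline used to produce the figure: the function names (`column_entropy`, `entropy_profile`, `find_arm`) are hypothetical, the aligned sequences are toy data, entropy is reported in bits, and gap characters are assumed to count as an ordinary symbol.

```python
import math
import re
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column.

    Assumption: a gap character ('-') is treated like any other symbol.
    """
    total = len(column)
    counts = Counter(column)
    # H = sum over residues of p * log2(1 / p); an invariant column gives 0.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def entropy_profile(aligned_seqs):
    """Entropy at every column of an alignment given as equal-length strings."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [column_entropy(col) for col in zip(*aligned_seqs)]


# RXXRRXRRR written as a regular expression, with X as any residue.
# The lookahead lets overlapping occurrences be reported as well.
ARM_PATTERN = re.compile(r"(?=(R..RR.RRR))")


def find_arm(sequence):
    """Return (1-based start position, matched segment) for each ARM hit."""
    return [(m.start() + 1, m.group(1)) for m in ARM_PATTERN.finditer(sequence)]


if __name__ == "__main__":
    # Toy alignment (not real viral sequences): column 1 is invariant,
    # columns 2 and 3 vary.
    toy_msa = ["RKA", "RKG", "RRG", "RRT"]
    for pos, h in enumerate(entropy_profile(toy_msa), start=1):
        print(f"position {pos}: entropy = {h:.2f} bits")

    # Toy peptide containing one segment that fits the pattern.
    print(find_arm("GGRKKRRQRRRAP"))
```

In this sketch a fully conserved column scores zero and more variable columns score higher, which matches how the peaks in (A) are read. Scanning a host proteome as in (D) would apply `find_arm` (or an analogous pattern for AXXAAX) to each protein sequence in turn.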