The Shapiro–Senapathy algorithm (S&S) is an algorithm for predicting splice junctions in the genes of animals and plants.[1][2] It has been used to discover disease-causing splice site mutations and cryptic splice sites.
A splice site is the border between an exon and an intron in a gene. These sites contain a particular sequence motif that is necessary for recognition and processing by the RNA splicing machinery.[1]
The S&S algorithm uses sliding windows of eight nucleotides, corresponding to the length of the splice site sequence motif, to identify these conserved sequences and thus potential splice sites.[1] Using a weighted table of nucleotide frequencies, it outputs a consensus-based percentage score indicating the likelihood that the window contains a splice site.[1]
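The windowed scoring can be illustrated with a minimal Python sketch. It assumes a consensus-style score in the spirit of S&S: the percentage frequency of each observed nucleotide is summed across the window and rescaled to 0 to 100 between the lowest and highest sums the table allows. The aligned sites used to build the table are hypothetical toy sequences, not the published S&S frequency tables, and the function names are illustrative; the exact formula, window boundaries, and tables of the original publication are not reproduced here.

```python
from collections import Counter

BASES = "ACGT"


def frequency_table(sites):
    """Percentage of each nucleotide at each position of equal-length aligned sites."""
    table = []
    for pos in range(len(sites[0])):
        counts = Counter(site[pos] for site in sites)  # missing bases count as 0
        table.append({base: 100.0 * counts[base] / len(sites) for base in BASES})
    return table


def consensus_score(window, table):
    """Rescale the summed percentages of `window` to 0-100 between the worst and best possible sums."""
    total = sum(column[base] for column, base in zip(table, window))
    lowest = sum(min(column.values()) for column in table)
    highest = sum(max(column.values()) for column in table)
    return 100.0 * (total - lowest) / (highest - lowest)


def scan(sequence, table):
    """Slide a window as wide as the table along `sequence`, yielding (start, score)."""
    width = len(table)
    for start in range(len(sequence) - width + 1):
        yield start, consensus_score(sequence[start:start + width], table)


# Hypothetical aligned 8-nt donor-like sites, used only to build a toy table.
toy_sites = ["CAGGTAAG", "AAGGTGAG", "CAGGTAAG", "GAGGTAAG"]
table = frequency_table(toy_sites)
for start, score in scan("TTCAGGTAAGTT", table):
    print(start, round(score, 1))
```

A window that carries the most frequent nucleotide at every position scores 100, so high-scoring windows mark candidate splice sites for closer inspection.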
The S&S algorithm serves as the basis of other software tools, such as Human Splicing Finder,[3] Splice-site Analyzer Tool,[4] dbass (Ensembl),[5] Alamut,[6] and SROOGLE.[7]
Specific mutations in the splice sites of various genes have been found to cause breast cancer (e.g., BRCA1, PALB2), ovarian cancer (e.g., SLC9A3R1, COL7A1, HSD17B7), colon cancer (e.g., APC, MLH1, DPYD), colorectal cancer (e.g., COL3A1, APC, HLA-A), skin cancer (e.g., COL17A1, XPA, POLH), and Fanconi anemia (e.g., FANC, FANA). Table 1 lists mutations in the donor and acceptor splice sites of different genes, identified by S&S, that cause a variety of cancers.
Specific splice site mutations in various genes have also been found to cause inherited disorders, including type 1 diabetes (e.g., PTPN22, TCF1 (HNF-1A)), hypertension (e.g., LDL, LDLR, LPL), Marfan syndrome (e.g., FBN1, TGFBR2, FBN2), cardiac diseases (e.g., COL1A2, MYBPC3, ACTC1), and eye disorders (e.g., EVC, VSX1). Table 2 lists a few example mutations in the donor and acceptor splice sites of different genes, identified using S&S, that cause a variety of inherited disorders.
Xeroderma pigmentosum, an autosomal recessive disorder, can be caused by faulty proteins formed when a new, preferred splice donor site (identified using the S&S algorithm) is used, resulting in defective nucleotide excision repair.[31]
Type I Bartter syndrome (BS) is caused by mutations in the gene SLC12A1. The S&S algorithm helped reveal two novel heterozygous mutations, c.724+4A>G in intron 5 and c.2095delG in intron 16, leading to complete skipping of exon 5.[32]
Mutations in the MYH gene, which is responsible for removing oxidatively damaged DNA lesions, confer cancer susceptibility. The IVS1+5C variant plays a causative role in activating a cryptic splice donor site and in the alternative splicing of intron 1. The S&S algorithm shows that guanine (G) at position IVS+5 is well conserved (at a frequency of 84%) among primates. This supports the conclusion that the G/C SNP in the conserved splice junction of the MYH gene causes alternative splicing of intron 1 of the β-type transcript.[33]
S&S splice site scores were calculated in a study of EBV infection in X-linked lymphoproliferative disease.[61] Familial tumoral calcinosis (FTC), an autosomal recessive disorder characterized by ectopic calcifications and elevated serum phosphate levels, has been identified as resulting from aberrant splicing.[62]
Application of S&S in hospitals for clinical practice and research
Applying the S&S technology platform in modern clinical genomics research has advanced the diagnosis and treatment of human diseases.
In the era of next-generation sequencing (NGS), S&S is applied extensively in clinical practice. Clinicians and molecular diagnostic laboratories apply S&S through various computational tools, including HSF,[3] SSF,[4] and Alamut.[6] It aids the discovery of genes and mutations in patients whose diseases have been stratified, or whose disease cannot be determined from clinical investigations.
In this context, S&S has been applied on cohorts of patients in different ethnic groups with various cancers and inherited disorders. A few examples are given below.
Clinical and Mutational Characterizations of Ten Indian Patients with Beta-Ketothiolase Deficiency[68]
2016
Indian
10 Patients
4
Unclear speech developmental delay
Progressive SCAR14 with unclear speech, developmental delay, tremor, and behavioral problems caused by a homozygous deletion of the SPTBN2 pleckstrin homology domain[69]
Dr. Senapathy's original objective in developing a method for identifying splice sites was to find complete genes in raw, uncharacterized genomic sequence that could be used in the Human Genome Project.[73][2] In the landmark paper addressing this objective,[73] he described, for the first time, the basic method for identifying splice sites within a given sequence based on the position weight matrix (PWM)[1] of the splicing sequences in different eukaryotic organism groups. He also created the first exon detection method, defining an exon as a sequence bounded by an acceptor and a donor splice site, both with S&S scores above a threshold, and containing an open reading frame (ORF), which he treated as mandatory for an exon. Dr. Senapathy also described the first algorithm for finding complete genes based on the identified exons.[73][2]
Dr. Senapathy demonstrated that only deleterious mutations in donor or acceptor splice sites, which would severely impair the protein, reduce the splice site score (later known as the Shapiro–Senapathy score), whereas non-deleterious variations do not. The S&S method was subsequently adapted to study cryptic splice sites resulting from disease-causing mutations. This method for detecting deleterious splicing mutations in eukaryotic genes has been used extensively in disease research in humans, animals, and plants over the past three decades, as described above.
The basic method for identifying splice sites and for defining exons and genes was subsequently used by researchers to find splice sites, exons, and eukaryotic genes in a variety of organisms. These methods also formed the basis of all subsequent tools developed for discovering genes in uncharacterized genomic sequences. They have also been used in other computational approaches, including machine learning and neural networks, and in alternative splicing research.
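The exon definition described here (a stretch bounded by an acceptor and a donor that both score above a threshold, and that contains an open reading frame) can be sketched as follows. This is a simplified illustration, not Dr. Senapathy's published procedure: the site positions and scores are supplied by hand as hypothetical inputs, a stretch is treated as open if at least one of its three reading frames has no stop codon, and all names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_frames(segment):
    """Return the reading frames (0, 1, 2) of `segment` that contain no stop codon."""
    frames = []
    for frame in range(3):
        codons = (segment[i:i + 3] for i in range(frame, len(segment) - 2, 3))
        if not any(codon in STOP_CODONS for codon in codons):
            frames.append(frame)
    return frames


def candidate_exons(sequence, acceptors, donors, threshold):
    """List (start, end, open_frames) for stretches bounded by a strong acceptor and a strong donor.

    `acceptors` maps a hypothetical exon start index to its splice site score and
    `donors` maps an exon end index (exclusive) to its score. Both scores must
    reach `threshold`, and the stretch must be open in at least one frame.
    """
    exons = []
    for start, acceptor_score in sorted(acceptors.items()):
        if acceptor_score < threshold:
            continue
        for end, donor_score in sorted(donors.items()):
            if end <= start or donor_score < threshold:
                continue
            frames = open_frames(sequence[start:end])
            if frames:
                exons.append((start, end, frames))
    return exons


# Hypothetical sequence and hand-assigned site scores, for illustration only.
sequence = "ATGAAACCCGGGTTT"
print(candidate_exons(sequence, {0: 90.0, 3: 50.0}, {15: 85.0}, threshold=80.0))
```

The acceptor at index 3 is discarded because its score falls below the threshold, while the stretch from 0 to 15 passes both score checks and is open in frames 0 and 2.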
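The comparison this relies on, scoring a splice site before and after a variant, can be sketched with the same consensus-style score used for window scanning. Assumptions: the frequency table is built from hypothetical toy sites rather than the real S&S tables, and no particular score drop is claimed to mark a variant as deleterious. The sketch only shows that a change at a position that is invariant in the table lowers the score more than a change at a variable position.

```python
from collections import Counter

BASES = "ACGT"


def frequency_table(sites):
    """Percentage of each nucleotide at each position of equal-length aligned sites."""
    table = []
    for pos in range(len(sites[0])):
        counts = Counter(site[pos] for site in sites)
        table.append({base: 100.0 * counts[base] / len(sites) for base in BASES})
    return table


def consensus_score(window, table):
    """Rescale the summed percentages of `window` to 0-100 between the worst and best possible sums."""
    total = sum(column[base] for column, base in zip(table, window))
    lowest = sum(min(column.values()) for column in table)
    highest = sum(max(column.values()) for column in table)
    return 100.0 * (total - lowest) / (highest - lowest)


def score_change(ref_window, alt_window, table):
    """Return (reference score, variant score, difference) for one splice site window."""
    ref = consensus_score(ref_window, table)
    alt = consensus_score(alt_window, table)
    return ref, alt, alt - ref


# Hypothetical toy sites; not real splice site frequencies.
table = frequency_table(["CAGGTAAG", "AAGGTGAG", "CAGGTAAG", "GAGGTAAG"])
# G>A at the fourth position, which is invariant (G) in the toy table.
print(score_change("CAGGTAAG", "CAGATAAG", table))
# C>G at the first position, which varies among the toy sites.
print(score_change("CAGGTAAG", "GAGGTAAG", table))
```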
Discovering the mechanisms of aberrant splicing in diseases
The Shapiro–Senapathy algorithm has been used to determine the various mechanisms of aberrant splicing caused by deleterious mutations in splice sites, which underlie numerous diseases. Deleterious splice site mutations impair the normal splicing of gene transcripts and thereby make the encoded protein defective. A mutation can make a splice site “weak” compared with the original site, so that the splice junction is no longer recognized by the spliceosomal machinery. This can cause the exon to be skipped in the splicing reaction, so that it is lost from the spliced mRNA (exon skipping). Alternatively, part or all of an intron can be retained in the mRNA when a splice site mutation makes the site unrecognizable (intron inclusion). Partial exon skipping or intron inclusion can lead to premature termination of translation, producing a defective protein that can cause disease. S&S has thus paved the way for determining the mechanisms by which a deleterious mutation leads to a defective protein, resulting in different diseases depending on which gene is affected.
Splice site mutations can lead to exon skipping, intron inclusion, or the use of a cryptic splice site, resulting in either a truncated protein or a protein lacking a small region of the coding sequence.[76]
An example of a splicing aberration (exon skipping) caused by a mutation in the donor splice site of exon 8 of the MLH1 gene, which led to colorectal cancer, is given below. It shows that a mutation in a splice site can profoundly affect the sequence and structure of the mRNA, and the sequence, structure, and function of the encoded protein, leading to disease.
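The link between exon skipping and premature termination can be made concrete with a small Python sketch. It is a simplification under stated assumptions: the exons are hypothetical coding-only sequences beginning at the ATG, untranslated regions and downstream effects on the mRNA are ignored, and the helper names are illustrative. Skipping an exon whose length is not a multiple of three shifts the reading frame, which can expose an early stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(cds):
    """Index of the first in-frame stop codon in `cds`, or None if there is none."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return None


def skip_exon(exons, k):
    """Splice out exon `k` and describe the effect on the reading frame."""
    spliced = "".join(exon for i, exon in enumerate(exons) if i != k)
    effect = "in-frame deletion" if len(exons[k]) % 3 == 0 else "frameshift"
    stop = first_stop(spliced)
    # A stop codon before the final codon position counts as premature.
    premature = stop is not None and stop < len(spliced) - 3
    return spliced, effect, premature


# Hypothetical three-exon coding sequence: ATG GCT GAA CTT AAG TGA.
exons = ["ATGGCTG", "AA", "CTTAAGTGA"]
print(first_stop("".join(exons)))  # the normal stop codon is the last codon
print(skip_exon(exons, 1))
```

Skipping the two-nucleotide middle exon shifts the frame so that TAA is read as the fourth codon, truncating the hypothetical protein.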
S&S in cryptic splice sites research and medical applications
Splice sites must be identified with high precision, because the consensus splice sequences are very short and gene sequences contain many other sequences that resemble authentic splice sites, known as cryptic, non-canonical, or pseudo splice sites. When an authentic splice site is mutated, a cryptic splice site close to it may be used erroneously in its place, resulting in an aberrant mRNA. The aberrant mRNA may include part of the neighboring intron or lose part of an exon, which may introduce a premature stop codon. The result may be a truncated protein that has lost its function completely.
The Shapiro–Senapathy algorithm can identify cryptic splice sites as well as authentic ones. Cryptic sites can often be stronger than authentic sites, with higher S&S scores. However, a cryptic site that lacks an accompanying complementary donor or acceptor site is not used in a splicing reaction. When a neighboring authentic site is mutated so that it becomes weaker than the cryptic site, the cryptic site may be used instead, resulting in a cryptic exon and an aberrant transcript.
Numerous diseases are caused by cryptic splice site mutations or by the use of cryptic splice sites following mutations in authentic splice sites.[78][79][80][81][82]
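This scenario can be sketched by scoring a reference sequence and a mutant sequence with the same consensus-style score and listing nearby windows that outscore the mutated authentic site. It is a hypothetical illustration: the toy table and sequences are invented, the 50-nucleotide search radius is an arbitrary assumption, and the sketch ignores whether a complementary donor or acceptor site is present, which the text notes is required for a cryptic site to be used.

```python
from collections import Counter

BASES = "ACGT"


def frequency_table(sites):
    """Percentage of each nucleotide at each position of equal-length aligned sites."""
    table = []
    for pos in range(len(sites[0])):
        counts = Counter(site[pos] for site in sites)
        table.append({base: 100.0 * counts[base] / len(sites) for base in BASES})
    return table


def consensus_score(window, table):
    """Rescale the summed percentages of `window` to 0-100 between the worst and best possible sums."""
    total = sum(column[base] for column, base in zip(table, window))
    lowest = sum(min(column.values()) for column in table)
    highest = sum(max(column.values()) for column in table)
    return 100.0 * (total - lowest) / (highest - lowest)


def cryptic_candidates(ref, alt, site, table, radius=50):
    """Score the authentic site in `ref` and `alt`, and list windows near it in `alt`
    that now outscore the mutated authentic site (the radius is an assumption)."""
    width = len(table)
    ref_score = consensus_score(ref[site:site + width], table)
    alt_score = consensus_score(alt[site:site + width], table)
    first = max(0, site - radius)
    last = min(len(alt) - width, site + radius)
    candidates = []
    for start in range(first, last + 1):
        score = consensus_score(alt[start:start + width], table)
        if start != site and score > alt_score:
            candidates.append((start, round(score, 1)))
    return ref_score, alt_score, candidates


table = frequency_table(["CAGGTAAG", "AAGGTGAG", "CAGGTAAG", "GAGGTAAG"])
ref = "TTCAGGTAAGCCAAGGTGAGTT"  # authentic site at 2, weaker cryptic site at 12
alt = "TTCAGATAAGCCAAGGTGAGTT"  # hypothetical G>A inside the authentic site
print(cryptic_candidates(ref, alt, site=2, table=table))
```

In the reference sequence the window at position 12 scores below the intact authentic site and is not flagged; after the mutation it outscores the weakened authentic site.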
mRNA splicing plays a fundamental role in regulating gene function. It has recently been shown that A-to-G conversions at splice sites can lead to mRNA mis-splicing in Arabidopsis.[88] In the molecular characterization and evolutionary analysis of carnivorous sundew (Drosera rotundifolia L.) class V β-1,3-glucanase, the predicted splicing and exon–intron junctions followed the GT/AG rule (S&S).[89] Unspliced (LSDH) and spliced (SSDH) transcripts of NAD+-dependent sorbitol dehydrogenase (NADSDH) from strawberry (Fragaria ananassa Duch., cv. Nyoho) were investigated under phytohormone treatments.[90]
Ambra1 is a positive regulator of autophagy, a lysosome-mediated degradative process involved in both physiological and pathological conditions. To date, this function of Ambra1 has been characterized only in mammals and zebrafish.[84] Reducing rbm24a or rbm24b gene products by morpholino knockdown significantly disrupted somite formation in mouse and zebrafish.[85] The Shapiro–Senapathy algorithm has been used extensively to study the intron–exon organization of fut8 genes; the intron–exon boundaries of Sf9 fut8 agreed with the consensus donor and acceptor splice site sequences determined using S&S.[86]
Houdayer, Claude (2011). "In Silico Prediction of Splice-Affecting Nucleotide Variants". In Silico Tools for Gene Discovery. Methods in Molecular Biology. Vol. 760. Humana Press. pp. 269–281. doi:10.1007/978-1-61779-176-5_17. ISBN 9781617791758. PMID 21780003.
Damiola, Francesca; Schultz, Inès; Barjhoux, Laure; Sornin, Valérie; Dondon, Marie-Gabrielle; Eon-Marchais, Séverine; Marcou, Morgane; Caron, Olivier; Gauthier-Villars, Marion (2015-11-12). "Mutation analysis of PALB2 gene in French breast cancer families". Breast Cancer Research and Treatment. 154 (3): 463–471. doi:10.1007/s10549-015-3625-7. ISSN 0167-6806. PMID 26564480. S2CID 12852074.
Dudley, Beth; Brand, Randall E.; Thull, Darcy; Bahary, Nathan; Nikiforova, Marina N.; Pai, Reetesh K. (August 2015). "Germline MLH1 Mutations Are Frequently Identified in Lynch Syndrome Patients With Colorectal and Endometrial Carcinoma Demonstrating Isolated Loss of PMS2 Immunohistochemical Expression". The American Journal of Surgical Pathology. 39 (8): 1114–1120. doi:10.1097/pas.0000000000000425. ISSN 0147-5185. PMID 25871621. S2CID 26069072.
Mensenkamp, Arjen R.; Vogelaar, Ingrid P.; van Zelst–Stams, Wendy A.G.; Goossens, Monique; Ouchene, Hicham; Hendriks–Cornelissen, Sandra J.B.; Kwint, Michael P.; Hoogerbrugge, Nicoline; Nagtegaal, Iris D. (March 2014). "Somatic Mutations in MLH1 and MSH2 Are a Frequent Cause of Mismatch-Repair Deficiency in Lynch Syndrome-Like Tumors". Gastroenterology. 146 (3): 643–646.e8. doi:10.1053/j.gastro.2013.12.002. ISSN 0016-5085. PMID 24333619.
van der Post, Rachel S.; Vogelaar, Ingrid P.; Manders, Peggy; van der Kolk, Lizet E.; Cats, Annemieke; van Hest, Liselotte P.; Sijmons, Rolf; Aalfs, Cora M.; Ausems, Margreet G.E.M. (October 2015). "Accuracy of Hereditary Diffuse Gastric Cancer Testing Criteria and Outcomes in Patients With a Germline Mutation in CDH1". Gastroenterology. 149 (4): 897–906.e19. doi:10.1053/j.gastro.2015.06.003. ISSN 0016-5085. PMID 26072394.
Castiglia, Daniele; Pagani, Elena; Alvino, Ester; Vernole, Patrizia; Marra, Giancarlo; Cannavò, Elda; Jiricny, Josef; Zambruno, Giovanna; D'Atri, Stefania (June 2003). "Biallelic somatic inactivation of the mismatch repair gene MLH1 in a primary skin melanoma". Genes, Chromosomes and Cancer. 37 (2): 165–175. doi:10.1002/gcc.10193. ISSN 1045-2257. PMID 12696065. S2CID 1228058.
Sidwell, R.U.; Sandison, A.; Wing, J.; Fawcett, H.D.; Seet, J-E.; Fisher, C.; Nardo, T.; Stefanini, M.; Lehmann, A.R. (July 2006). "A novel mutation in the XPA gene associated with unusually mild clinical features in a patient who developed a spindle cell melanoma". British Journal of Dermatology. 155 (1): 81–88. doi:10.1111/j.1365-2133.2006.07272.x. ISSN 0007-0963. PMID 16792756. S2CID 42003864.
Nozu, Kandai; Iijima, Kazumoto; Kawai, Kazuo; Nozu, Yoshimi; Nishida, Atsushi; Takeshima, Yasuhiro; Fu, Xue Jun; Hashimura, Yuya; Kaito, Hiroshi (2009-07-10). "In vivo and in vitro splicing assay of SLC12A1 in an antenatal salt-losing tubulopathy patient with an intronic mutation". Human Genetics. 126 (4): 533–538. doi:10.1007/s00439-009-0697-7. ISSN 0340-6717. PMID 19513753. S2CID 20181541.
Becker, A. J.; Löbach, M.; Klein, H.; Normann, S.; Nöthen, M. M.; von Deimling, A.; Mizuguchi, M.; Elger, C. E.; Schramm, J. (March 2001). "Mutational analysis of TSC1 and TSC2 genes in gangliogliomas". Neuropathology and Applied Neurobiology. 27 (2): 105–114. doi:10.1046/j.0305-1846.2001.00302.x. ISSN 0305-1846. PMID 11437991. S2CID 9696988.
Schick, Volker; Majores, Michael; Engels, Gudrun; Spitoni, Sylvia; Koch, Arend; Elger, Christian E.; Simon, Matthias; Knobbe, Christiane; Blümcke, Ingmar (2006-09-30). "Activation of Akt independent of PTEN and CTMP tumor-suppressor gene mutations in epilepsy-associated Taylor-type focal cortical dysplasias". Acta Neuropathologica. 112 (6): 715–725. doi:10.1007/s00401-006-0128-y. ISSN 0001-6322. PMID 17013611. S2CID 35008161.
Muller, Danièle; Mazoyer, Sylvie; Stoppa-Lyonnet, Dominique; Sinilnikova, Olga M.; Andrieu, Nadine; Fricker, Jean-Pierre; Bignon, Yves-Jean; Longy, Michel; Lasset, Christine (2015-12-01). "Mutation analysis of PALB2 gene in French breast cancer families". Breast Cancer Research and Treatment. 154 (3): 463–471. doi:10.1007/s10549-015-3625-7. ISSN 1573-7217. PMID 26564480. S2CID 12852074.
Masunaga, Takuji; Ogawa, Junki; Akiyama, Masashi; Nishikawa, Takeji; Shimizu, Hiroshi; Ishiko, Akira (2017). "Compound heterozygosity for novel splice site mutations of ITGA6 in lethal junctional epidermolysis bullosa with pyloric atresia". The Journal of Dermatology. 44 (2): 160–166. doi:10.1111/1346-8138.13575. ISSN 1346-8138. PMID 27607025. S2CID 3934121.
Hansen, Thomas vO; Nielsen, Finn C.; Gerdes, Anne-Marie; Ousager, Lilian B.; Jensen, Uffe B.; Skytte, Anne-Bine; Albrechtsen, Anders; Rossing, Maria (February 2017). "Genetic screening of the FLCN gene identify six novel variants and a Danish founder mutation". Journal of Human Genetics. 62 (2): 151–157. doi:10.1038/jhg.2016.118. ISSN 1435-232X. PMID 27734835. S2CID 24558301.
Jääskeläinen, Pertti; Kuusisto, Johanna; Miettinen, Raija; Kärkkäinen, Päivi; Kärkkäinen, Satu; Heikkinen, Sami; Peltola, Paula; Pihlajamäki, Jussi; Vauhkonen, Ilkka (2002-11-04). "Mutations in the cardiac myosin-binding protein C gene are the predominant cause of familial hypertrophic cardiomyopathy in eastern Finland". Journal of Molecular Medicine. 80 (7): 412–422. doi:10.1007/s00109-002-0323-9. ISSN 0946-2716. PMID 12110947. S2CID 7089974.
Attanasio, M; Lapini, I; Evangelisti, L; Lucarini, L; Giusti, B; Porciani, MC; Fattori, R; Anichini, C; Abbate, R (2008-04-23). "FBN1 mutation screening of patients with Marfan syndrome and related disorders: detection of 46 novel FBN1 mutations". Clinical Genetics. 74 (1): 39–46. doi:10.1111/j.1399-0004.2008.01007.x. ISSN 0009-9163. PMID 18435798. S2CID 205406696.
Rossing, Maria; Albrechtsen, Anders; Skytte, Anne-Bine; Jensen, Uffe B; Ousager, Lilian B; Gerdes, Anne-Marie; Nielsen, Finn C; Hansen, Thomas vO (2016-10-13). "Genetic screening of the FLCN gene identify six novel variants and a Danish founder mutation". Journal of Human Genetics. 62 (2): 151–157. doi:10.1038/jhg.2016.118. ISSN 1434-5161. PMID 27734835. S2CID 24558301.
Abdelkreem, Elsayed; Akella, Radha Rama Devi; Dave, Usha; Sane, Sudhir; Otsuka, Hiroki; Sasai, Hideo; Aoyama, Yuka; Nakama, Mina; Ohnishi, Hidenori (2016-12-08). "Clinical and Mutational Characterizations of Ten Indian Patients with Beta-Ketothiolase Deficiency". JIMD Reports. 35. Springer Berlin Heidelberg. pp. 59–65. doi:10.1007/8904_2016_26. ISBN 9783662558324. PMC 5585108. PMID 27928777.
Yıldız Bölükbaşı, Esra; Afzal, Muhammad; Mumtaz, Sara; Ahmad, Nafees; Malik, Sajid; Tolun, Aslıhan (2017-06-21). "Progressive SCAR14 with unclear speech, developmental delay, tremor, and behavioral problems caused by a homozygous deletion of the SPTBN2 pleckstrin homology domain". American Journal of Medical Genetics Part A. 173 (9): 2494–2499. doi:10.1002/ajmg.a.38332. ISSN 1552-4825. PMID 28636205. S2CID 5586800.
Davoodi-Semiromi, Abdoreza; Lanyon, George W.; Davidson, Rosemary; Connor, Michael J. (2000-11-06). "Aberrant RNA splicing in the hMSH2 gene: Molecular identification of three aberrant RNA in Scottish patients with colorectal cancer in the West of Scotland". American Journal of Medical Genetics. 95 (1): 49–52. doi:10.1002/1096-8628(20001106)95:1<49::aid-ajmg10>3.0.co;2-p. ISSN 1096-8628. PMID 11074494.
van den Hurk, José A. J. M.; van de Pol, Dorien J. R.; Wissinger, Bernd; van Driel, Marc A.; Hoefsloot, Lies H.; de Wijs, Ilse J.; van den Born, L. Ingeborgh; Heckenlively, John R.; Brunner, Han G. (2003-06-25). "Novel types of mutation in the choroideremia (CHM) gene: a full-length L1 insertion and an intronic mutation activating a cryptic exon". Human Genetics. 113 (3): 268–275. doi:10.1007/s00439-003-0970-0. ISSN 0340-6717. PMID 12827496. S2CID 23750723.
Infante, Joana B.; Alvelos, Maria I.; Bastos, Margarida; Carrilho, Francisco; Lemos, Manuel C. (January 2016). "Complete androgen insensitivity syndrome caused by a novel splice donor site mutation and activation of a cryptic splice donor site in the androgen receptor gene". The Journal of Steroid Biochemistry and Molecular Biology. 155 (Pt A): 63–66. doi:10.1016/j.jsbmb.2015.09.042. ISSN 0960-0760. PMID 26435450. S2CID 33393364.
Niba, E.; Nishuda, A.; Tran, V.; Vu, D.; Matsumoto, M.; Awano, H.; Lee, T.; Takeshima, Y.; Nishio, H. (June 2016). "Cryptic splice site activation by a splice donor site mutation of dystrophin intron 64 is determined by intronic splicing regulatory elements". Neuromuscular Disorders. 26: S96. doi:10.1016/j.nmd.2016.06.042. ISSN 0960-8966. S2CID 54267534.
Qadah, Talal; Finlayson, Jill; Joly, Philippe; Ghassemifar, Reza (2013-11-25). "Molecular and Cellular Analysis of a Novel HBA2 Mutation (HBA2: c.94A>G) Shows Activation of a Cryptic Splice Site and Generation of a Premature Termination Codon". Hemoglobin. 38 (1): 13–18. doi:10.3109/03630269.2013.858639. ISSN 0363-0269. PMID 24274170. S2CID 28120011.
Gasparini, Fabio; Skobo, Tatjana; Benato, Francesca; Gioacchini, Giorgia; Voskoboynik, Ayelet; Carnevali, Oliana; Manni, Lucia; Valle, Luisa Dalla (2016-02-01). "Characterization of Ambra1 in asexual cycle of a non-vertebrate chordate, the colonial tunicate Botryllus schlosseri, and phylogenetic analysis of the protein group in Bilateria". Molecular Phylogenetics and Evolution. 95: 46–57. doi:10.1016/j.ympev.2015.11.001. ISSN 1055-7903. PMID 26611831.
Michalko, Jaroslav; Renner, Tanya; Mészáros, Patrik; Socha, Peter; Moravčíková, Jana; Blehová, Alžbeta; Libantová, Jana; Polóniová, Zuzana; Matušíková, Ildikó (2016-08-31). "Molecular characterization and evolution of carnivorous sundew (Drosera rotundifolia L.) class V β-1,3-glucanase". Planta. 245 (1): 77–91. doi:10.1007/s00425-016-2592-5. ISSN 0032-0935. PMID 27580619. S2CID 23450167.
Wongkantrakorn, N.; Duangsrisai, S. (2015-02-15). "The level of mRNA NAD-SDH is regulated through RNA splicing by sugars and phytohormones". Russian Journal of Plant Physiology. 62 (2): 279–282. doi:10.1134/s1021443715010161. ISSN 1021-4437. S2CID 5619745.
Feng, Jiayue; Li, Jing; Liu, Hong; Gao, Qinghua; Duan, Ke; Zou, Zhirong (2012-10-03). "Isolation and Characterization of a Calcium-Dependent Protein Kinase Gene, FvCDPK1, Responsive to Abiotic Stress in Woodland Strawberry (Fragaria vesca)". Plant Molecular Biology Reporter. 31 (2): 443–456. doi:10.1007/s11105-012-0513-8. ISSN 0735-9640. S2CID 14378361.
Philip, Anna; Syamaladevi, Divya P.; Chakravarthi, M.; Gopinath, K.; Subramonian, N. (2013-03-19). "5′ Regulatory region of ubiquitin 2 gene from Porteresia coarctata makes efficient promoters for transgene expression in monocots and dicots". Plant Cell Reports. 32 (8): 1199–1210. doi:10.1007/s00299-013-1416-3. ISSN 0721-7714. PMID 23508257. S2CID 12170634.