In the present era of functional genomics, the challenge remains to elucidate gene function, such as that of A1BG, together with its likely regulatory networks and signaling pathways.[1] "Since regulation of gene expression in vivo mainly occurs at the transcriptional level, identifying the location of genetic regulatory elements is a key to understanding the machinery regulating gene transcription. A major goal of current genome research is to identify the locations of all gene regulatory elements, including promoters, enhancers, silencers, insulators and boundary elements, and to analyze their relationship to the current annotation of human genes."[2][3] Although "many genome-wide strategies have been developed for identifying functional elements", "no method yet has the resolution to precisely identify all regulatory elements or can be readily applied to the entire human genome."[4]
"The experimental evidence demonstrates that genome binding specificity is achieved through the interplay of at least three factors: DNA sequence; DNA shape; and occlusion by chromatin."[5]
There is one CRISPRi-validated cis-regulatory element on 19q13.43: Gene ID: 116286197 LOC116286197. There are also four Sharpr-MPRA regulatory regions: (1) Gene ID: 112553117 LOC112553117 Sharpr-MPRA regulatory region 1998, (2) Gene ID: 112553119 LOC112553119 Sharpr-MPRA regulatory region 10473, (3) Gene ID: 112577453 LOC112577453 Sharpr-MPRA regulatory region 7872, and (4) Gene ID: 112577454 Sharpr-MPRA regulatory region 9894.
Def. nucleotide "sequences, usually upstream, which are recognized by specific regulatory transcription factors, thereby causing gene response to various regulatory agents", [that] "may be found in both promoter and enhancer regions"[6] are called response elements.
Some bZIP proteins, "including LIP19, OsZIP-2a, and OsZIP-2b, do not bind to DNA sequences. Instead, these bZIP proteins form heterodimers with other bZIPs to regulate transcriptional activities (Nantel and Quatrano, 1996; Shimizu et al., 2005)."[7]
"This genomic region represents a DNase I hypersensitive site (DHS) that was predicted to be an enhancer by the ENCODE (ENCyclopedia Of DNA Elements) project based on various combinations of H3K27 acetylation and binding of p300, GATA1 and RNA polymerase II in K562 erythroleukemia cells. It was validated as a high-confidence cis-regulatory element for the ZNF582 (zinc finger protein 582) gene on chromosome 19 based on multiplex CRISPR/Cas9-mediated perturbation in K562 cells."[8]
Gene ID: 116286197 CRISPRi-validated cis-regulatory element chr19.6329 is at NC_000019.10 (56186901..56187499).[8]
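Gene ID: 147948 ZNF582 is at NC_000019.10 (56382751..56393585, complement).[9] The start of the CRISPRi-validated cis-regulatory element chr19.6329 is (56382751 - 56186901) = 195850 nts from the lower coordinate of ZNF582. Because ZNF582 is annotated on the complement strand, its transcription begins at the higher coordinate (56393585).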
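The coordinate arithmetic above can be sketched in a few lines. This is a minimal illustration that uses only the NC_000019.10 coordinates quoted in the text; the variable and function names are hypothetical, and treating the higher coordinate as the 5' end of a complement-strand gene is stated as an assumption in the code.

```python
# Coordinates on NC_000019.10 as quoted in the text (1-based, inclusive).
ELEMENT = (56186901, 56187499)  # CRISPRi-validated element chr19.6329
ZNF582 = (56382751, 56393585)   # ZNF582, annotated on the complement strand


def gap_between(left, right):
    """Number of nucleotides strictly between two non-overlapping intervals."""
    return right[0] - left[1] - 1


# Difference used in the text: lower gene coordinate minus element start.
start_to_start = ZNF582[0] - ELEMENT[0]

# Length of the element itself (inclusive coordinates).
element_length = ELEMENT[1] - ELEMENT[0] + 1

# Nucleotides separating the element end from the lower gene coordinate.
gap_to_gene = gap_between(ELEMENT, ZNF582)

# Assumption: for a complement-strand gene the 5' end is the higher coordinate.
element_start_to_gene_5prime = ZNF582[1] - ELEMENT[0]

print(start_to_start, element_length, gap_to_gene, element_start_to_gene_5prime)
```

Which of these distances is most meaningful depends on whether the element is measured from its start or its end, and on which end of the gene is taken as its beginning.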
"This genomic sequence was predicted to be a transcriptional regulatory region based on chromatin state analysis from the ENCODE (ENCyclopedia Of DNA Elements) project. It was validated as a functional enhancer by the Sharpr-MPRA technique (Systematic high-resolution activation and repression profiling with reporter tiling using massively parallel reporter assays) in K562 erythroleukemia cells (group: K562 Activating DNase unmatched - State 1:Tss, active promoter, TSS/CpG island region), with weaker activation in HepG2 liver carcinoma cells (group: HepG2 Activating DNase matched - State 1:Tss)."[10]
"This genomic sequence was predicted to be a transcriptional regulatory region based on chromatin state analysis from the ENCODE (ENCyclopedia Of DNA Elements) project. It was validated as a functional enhancer by the Sharpr-MPRA technique (Systematic high-resolution activation and repression profiling with reporter tiling using massively parallel reporter assays) in HepG2 liver carcinoma cells (group: HepG2 Activating DNase matched - State 5:Enh, candidate strong enhancer, open chromatin). It also displayed weak repressive activity by Sharpr-MPRA in K562 erythroleukemia cells (group: K562 Repressive non-DNase unmatched - State 24:Quies, heterochromatin/dead zone)."[11]
"This genomic sequence was predicted to be a transcriptional regulatory region based on chromatin state analysis from the ENCODE (ENCyclopedia Of DNA Elements) project. It was validated as a functional enhancer by the Sharpr-MPRA technique (Systematic high-resolution activation and repression profiling with reporter tiling using massively parallel reporter assays) in both HepG2 liver carcinoma cells (group: HepG2 Activating DNase unmatched - State 1:Tss, active promoter, TSS/CpG island region) and K562 erythroleukemia cells (group: K562 Activating DNase unmatched - State 1:Tss)."[12]
"This genomic sequence was predicted to be a transcriptional regulatory region based on chromatin state analysis from the ENCODE (ENCyclopedia Of DNA Elements) project. It was validated as a functional enhancer by the Sharpr-MPRA technique (Systematic high-resolution activation and repression profiling with reporter tiling using massively parallel reporter assays) in K562 erythroleukemia cells (group: K562 Activating DNase unmatched - State 1:Tss, active promoter, TSS/CpG island region), with weaker activation in HepG2 liver carcinoma cells (group: HepG2 Activating DNase matched - State 1:Tss)."[13]
"The growth hormone-regulated transcription factors STAT5 and BCL6 coordinately regulate sex differences in mouse liver, primarily through effects in male liver, where male-biased genes are upregulated and many female-biased genes are actively repressed."[14] "CUX2, a highly female-specific liver transcription factor, contributes to an analogous regulatory network in female liver. Adenoviral overexpression of CUX2 in male liver induced 36% of female-biased genes and repressed 35% of male-biased genes. In female liver, CUX2 small interfering RNA (siRNA) preferentially induced genes repressed by adenovirus expressing CUX2 (adeno-CUX2) in male liver, and it preferentially repressed genes induced by adeno-CUX2 in male liver. CUX2 binding in female liver chromatin was enriched at sites of male-biased DNase hypersensitivity and at genomic regions showing male-enriched STAT5 binding. CUX2 binding was also enriched near genes repressed by adeno-CUX2 in male liver or induced by CUX2 siRNA in female liver but not at genes induced by adeno-CUX2, indicating that CUX2 binding is preferentially associated with gene repression. Nevertheless, direct CUX2 binding was seen at several highly female-specific genes that were positively regulated by CUX2, including A1bg [A1BG in humans], Cyp2b9, Cyp3a44, Tox [TOX in humans], and Trim24 [TRIM24 in humans]."[14]
"Gene list comparisons were performed using the compare class utility provided by the Regulatory Sequence Analysis Tools (34). Comparisons were made with previous 3-AT and rapamycin data sets (5, 14) and with several predefined gene lists such as genes induced by promoters bound in chromatin immunoprecipitation (ChIP-chip) experiments (35), genes in the MIPS functional catalogue (36), and gene ontology categories (37) as described by Godard et al. (38). The significance of overlap between gene lists was quantitatively determined by the hypergeometric distribution (39), using the number of probe sets on the S98 array as the population size, or by calculating the representation factor (40) using the web utility Microarray Analysis Tools. Upstream noncoding regulatory sequences were retrieved and analyzed using Regulatory Sequence Analysis Tools (34). The program DNA-Pattern was used to search for and catalogue occurrences of consensus GCRE (TGABTVW) and GATA (GATAAG, GATAAH, GATTA) motifs in yeast promoters. The program oligo-analysis (41) was used to search the promoter regions of co-regulated genes for overrepresented sequence motifs. Analysis of the 5-noncoding regions of the GCN4-dependent activation core identified the consensus [general control responsive element] GCRE motif (TGABTVW)."[15]
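The overlap test mentioned in the quotation can be sketched as an upper-tail hypergeometric probability. This is a minimal illustration, not the procedure of the cited study: the function names and the example numbers are hypothetical, and the representation factor is written in its commonly used form (observed overlap divided by the overlap expected for two independent lists), which is an assumption about the cited web utility.

```python
from math import comb


def overlap_pvalue(population, size_a, size_b, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of list-A genes found when size_b genes are drawn
    without replacement from a population containing size_a list-A genes.
    """
    total = comb(population, size_b)
    tail = sum(
        comb(size_a, k) * comb(population - size_a, size_b - k)
        for k in range(overlap, min(size_a, size_b) + 1)
    )
    return tail / total


def representation_factor(population, size_a, size_b, overlap):
    """Observed overlap divided by the overlap expected by chance."""
    expected = size_a * size_b / population
    return overlap / expected


# Hypothetical numbers, not taken from the quoted study.
POPULATION, LIST_A, LIST_B, SHARED = 6000, 300, 200, 25
print(overlap_pvalue(POPULATION, LIST_A, LIST_B, SHARED))
print(representation_factor(POPULATION, LIST_A, LIST_B, SHARED))
```

A representation factor above 1 indicates more overlap than expected by chance, and the p-value indicates how unlikely an overlap at least that large would be.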
"The ABA responsive element (ABRE) is a key cis‐regulatory element in ABA signalling. However, its consensus sequence (ACGTG(G/T)C) is present in the promoters of only about 40% of ABA‐induced genes in rice aleurone cells, suggesting other ABREs may exist."[16]
"Many ABA‐inducible genes in various species contain a conserved cis‐regulatory ABA responsive element (ABRE) with the consensus sequence ACGTG(G/T)C (Hattori et al. 2002; Shen et al. 2004)."[16]
Specific "sequences considered as exact Abf1 motif occurrences": CGTNNNNNACGA(C/T), CGTNNNNNA(C/T)GAC, CGTNNNNNA(C/T)GA(C/T), CGTNNNNN(A/G)(C/T)GA(C/T).[5]
"Most bZIP proteins show high binding affinity for the ACGT motifs, which include CACGTG (G box), GACGTC (C box), TACGTA (A box), AACGTT (T box), and a GCN4 motif, namely TGA(G/C)TCA (Landschulz et al., 1988;[17] Nijhawan et al., 2008[18])."[7]
"The human TGF-β1 promoter region contains two binding sequences for AP-1, designated AP-1 box A (TGACTCT) and box B (TGTCTCA), which mediate the up-regulation of promoter activity after [High glucose] HG stimulation."[19]
Abscisic acid-responsive elements (CACGTG).[20]
"The [palindromic E-box motif (CACGTG)] motif is bound by the transcription factor Pho4, [and has the] class of basic helix-loop-helix DNA binding domain and core recognition sequence (Zhou and O'Shea 2011)."[5]
The Pho4 homodimer binds to DNA sequences containing the bHLH binding site 5'-CACGTG-3'.[21]
The upstream activating sequence (UAS) for Pho4p is 5'-CAC(A/G)T(T/G)-3', found in the promoters of HIS4 and PHO5, and relates phosphate limitation to the regulation of the purine and histidine biosynthesis pathways.[22]
The "3' end of mature hTR (45) has an ACA trinucleotide 3 nt upstream of its 3' end. In addition, the 3' region of hTR contains a single H box consensus sequence (5'-AGAGGA-3')."[23]
The "binding affinities of both bZIP proteins were similar to CREA/T (ATGACGTCAT), a CRE sequence with flanking adenine and thymine (A/T) at positions -4 and +4. [The] bZIP domains of both STF1 and HY5 have similar binding properties for recognizing ACGT-containing elements (ACEs). [Although] the G-box is a known target site for the HY5 protein, the C-box sequences are the preferred binding sites for both STF1 and HY5."[24]
"AP-2 proteins can bind to G/C-rich elements, such as 5’-[G/C]CCN(3,4)GG[G/C]-3’ (41, 42)."[25]
Consensus sequences for the Activating protein 2 (AP-2) are GCCTGGCC.[26]
"The ATF4 binding consensus sequence has been reported as (G/A/C)TT(G/A/T)C(G/A)TCA (38), which matches the ChIP-seq data."[27]
"The 3′UTRs were searched for the 13-bp pattern WWWUAUUUAUWW with mismatch=−1 which was computationally derived as previously described (2). The pattern was further statistically validated against larger sets of mRNA data (10 872 mRNA with 3′UTR; GenBank 119) showing occurrence of the motif in 6.8% of human mRNA."[28]
"3′ untranslated regions play an important role in regulating mRNA fate by complexing with RNA binding proteins that help control mRNA localization, translation, and stability [1, 2, 3]. Identification of a consensus UUAUUUAU sequence in the 3′ UTRs of human and mouse mRNAs encoding tumor necrosis factor (TNF-α) and a variety of other inflammatory mediators led to the suggestion that these AU-rich elements (AREs) could be important for regulating gene expression [4]."[29]
The upstream activating sequence (UAS) for Adr1p is 5'-TTGGGG-3' or 5'-TTGG(A/G)G-3'.[22]
The upstream activating sequence (UAS) for Aft1p is 5'-PyPuCACCCPu-3' or 5'-(C/T)(A/G)CACCC(A/G)-3'.[22]
"The GCC box, also referred to as the AGC box (10), GCC element (11), or AGCCGCC sequence (13), is an ethylene-responsive element found in the promoters of a large number of [pathogenesis related] PR genes whose expression is up-regulated following pathogen attack."[30]
"The androgen response element sequence, 5'-GGTACACGGTGTTCT-3', was obtained from the National Center of Biotechnology Information (NCBI)."[31]
5'-TGGAGAACAGCCTGTTCTCCA-3' or 5'-AGAACAGCCTGTTCT-3'[32] "Using the identified AREs within our experiment a refined extended canonical ARE model is proposed and deposited in transcription factor databases [...]."[32]
The consensus sequence is 5'-A/C-T-C/T-3'.[33] The core nucleotides for AGCE1 include 5'-A/C-T-C/T-G-T-G-3', "located between the TATA box and transcription initiation site (positions −25 to −1) is an authentic regulator of human AG transcription."[34]
5'-GC(A/C/T)(A/G/T)(A/G/T)(C/G/T)T(A/C)A-3' is the consensus sequence of a functional antioxidant response element (ARE) at the HIF1A locus.[35]
"The 3' flanking area contained the highly conserved hexanucleotide sequence A-A-T-A-A-A found in eukaryotic messages between the terminator codon and the polyadenylylation site (44)."[36]
"Chen and Shyu [11] divided AREs into two classes of AUUUA-containing AREs and a third class of non-AUUUA AREs. Class I AUUUA-containing AREs had 1-3 copies of scattered AUUUA motifs coupled with a nearby U-rich region or U stretch, whereas class II AUUUA-containing AREs had at least two overlapping copies of the nonamer UUAUUUA(U/A)(U/A) in a U-rich region. Non-AUUUA AREs had a U-rich region and other unknown features, and the relationship of these sequences to AUUUA-containing AREs remains poorly understood. Subsequent studies based on analyses of a set of 4884 AUUUA-containing AREs led to a new classification based primarily on the number of overlapping AUUUA-repeats [8, 9, 10]. This classification system, with five clusters distinguished by the number of repeats, was used to identify AUUUA-containing AREs in the human genome. AREs identified using this classification were found to be abundant in 3′ UTRs of human genes."[29]
The "genome binding of two [auxin response factors] ARFs (ARF2 and ARF5/Monopteros [MP]) differ largely because these two factors have different preferred ARF binding site (ARFbs) arrangements (orientation and spacing)."[37] "ARFbs were originally defined as TGTCTC (Ulmasov et al., 1995, Guilfoyle et al., 1998), [...]. More recently, protein binding microarray (PBM) experiments suggested that TGTCGG are preferred ARFbs, [...] (Boer et al., 2014, Franco-Zorrilla et al., 2014, Liao et al., 2015)."[37]
A more general consensus sequence may be 1(C/G/T)-2N-3(G/T)-4G-5(C/T)-6(C/T)-7N-8N-9N-10N, where ARF2[b] is 1(C/G/T)-2(A/C/T)-3(G/T)-4G-5(C/T)-6(C/T)-7(G/T)-8(C/G)-9(A/C/T)-10(A/G/T) and ARF5/MP[b] is 1(C/G/T)-2N-3(G/T)-4G-5T-6C-7(G/T)-8N-9-10N.[37] ARF1[b] has 4G.[37]
While there appear to be at least two B boxes, TGGGCA is one B-box,[38] where the "mP2 EB fragment used for binding was the 118 nucleotide fragment extending from the Dde I site at position -140 to the Dde I site at position -23 [...]. This fragment contains the GC, E, B, CAAT, and TATA boxes."[38]
The other B box is associated with the human transforming growth factor β1 (TGF-β1) binding sequences[39] and has the consensus sequence 5'-TGTCTCA-3'.
The upstream transcription factor II B recognition element is designated BREu.
"The transcription factor II B recognition elements BREu "CGACGCA" and BREd "ATGGTTG" were upstream (− 279 to − 273 of the transcript) and downstream (− 165 to − 159 of the transcript) of the TATA box, respectively."[40]
The general consensus sequence using degenerate nucleotides is 5’-SSRCGCC-3’, where S = G or C and R = A or G.[41]
The consensus sequence is 5’-G/C G/C G/A C G C C-3’.[42]
"Altogether, the specific contacts observed suggest a consensus binding motif of 5′-T-T-A-x-x-x-x-T-3′."[43] "Dimerization of [cadaverine C-terminal] CadC enables the binding of two DBDs to the two Cad1 consensus target sites."[43] "The DNA consensus sequence 5′-T-T-A-x-x-x-x-T-3′ is present once in the quasi-palindromic Cad1 17-mer DNA, consistent with the formation of a 1:1 complex. However, a second consensus facilitates the formation of the 2:1 complex of CadC with Cad1 41-mer DNA as evidenced by the CadC model with the minimal Cad1 26-mer DNA that spans the two AT-rich regions, i.e. consensus sites."[43]
The upstream activating sequence for the calcineurin-responsive transcription factor (Crz1p) is 5'-TG(A/C)GCCNC-3'.[22]
"The putative ChREBP binding sites [are] ChoRE1 (CACGTGACCGGATCTTG, -324 to -308) and ChoRE2 (TCCGCCCCCATCACGTG, -298 to - 282) [...], where the 5-nt spacer [Carb and Carb1 is] between the two E-boxes in ChoRE motifs [CarbE1, CarbE2 and CarbE3]."[44]
"GARE and a novel CARE (CAACTC regulatory elements) elements are present in the promoter of rice RAmy1A (Ueguchi-Tanaka et al. 2000; Sutoh and Yamauchi 2003)."[45]
"RIN [Ripening Inhibitor] binds to DNA sequences known as the CA/T-rich-G (CArG) box, which is the general target of MADS box proteins (Ito et al., 2008)."[46]
"MADS-box proteins bind to a consensus sequence, the CArG box, that has the core motif CC(A/T)6GG (15)."[47]
"Of the [Flowering Locus C] FLC binding sites, 69% contained at least one CArG-box motif with the core consensus sequence CCAAAAAT(G/A)G and an AAA extension at the 3′ end [...]."[47]
Three "other MADS-box flowering-time regulators, SOC1, SVP, and AGAMOUS-LIKE 24 (AGL24), bind to two different CArG-box motifs at 502 bp (CTAAATATGG) and 287 bp (CAATAATTGG) upstream of the translation start in the SEP3 gene (24), consistent with different specificities for the different MADS-box proteins."[47] These together with the core motif CC(A/T)6GG (15) suggest a more general CArG-box motif of (C(C/A/T)(A/T)6(A/G)G).
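The proposed generalization can be checked against the sites quoted above with a regular expression. This is a minimal sketch: the pattern names and the dictionary labels are hypothetical, and the two FLC entries simply spell out the (G/A) alternative in the quoted consensus.

```python
import re

# Generalized motif proposed in the text: C(C/A/T)(A/T)6(A/G)G
GENERAL_CARG = re.compile(r"C[ACT][AT]{6}[AG]G")
# Core motif quoted from the literature: CC(A/T)6GG
CORE_CARG = re.compile(r"CC[AT]{6}GG")

# Sites quoted in the text; labels are for illustration only.
SITES = {
    "SEP3 -502": "CTAAATATGG",
    "SEP3 -287": "CAATAATTGG",
    "FLC (G variant)": "CCAAAAATGG",
    "FLC (A variant)": "CCAAAAATAG",
}

for label, site in SITES.items():
    fits_general = bool(GENERAL_CARG.fullmatch(site))
    fits_core = bool(CORE_CARG.fullmatch(site))
    print(f"{label}: general={fits_general} core={fits_core}")
```

All four sites fit the generalized pattern, while only the G variant of the FLC consensus fits the stricter core motif, which is the motivation for relaxing the second and ninth positions.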
The upstream activating sequence (UAS) for Cat8p is 5'-CGGNBNVMHGGA-3', where N = A, C, G, T, B = C, G, T, V = A, C, G, M = A, C, and H = A, C, T; i.e. 5'-CGG(A/C/G/T)(C/G/T)(A/C/G/T)(A/C/G)(A/C)(A/C/T)GGA-3'.[22]
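The conversion from degenerate one-letter codes to the bracketed notation used throughout this text can be sketched as a simple table lookup. The table follows the standard IUPAC nucleotide ambiguity codes; the function name `expand` is hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(motif):
    """Rewrite a degenerate motif in bracketed form, e.g. R -> (A/G)."""
    parts = []
    for symbol in motif.upper():
        bases = IUPAC[symbol]
        if len(bases) == 1:
            parts.append(bases)
        else:
            parts.append("(" + "/".join(bases) + ")")
    return "".join(parts)


print(expand("CGGNBNVMHGGA"))  # Cat8p UAS
print(expand("SSRCGCC"))       # degenerate consensus quoted earlier in the text
```

The alternatives inside each bracket are listed alphabetically, so the output may order them differently from a hand-written consensus while denoting the same set of sequences.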
"The M-CAT consensus sequence [is] CATTCCT".[48]
"A [chloramphenicol acetyltransferase] CAT-box-like element, GCCATT [34], adjacent to the GC-box, is conserved in the three promoters."[48]
"Most bZIP proteins show high binding affinity for the ACGT motifs, which include [...] GACGTC (C-box) [...]."[7]
Analysis "of the recombinant (soybean [Glycine max] TGACG-motif binding factor 1) STF1 protein revealed the C-box (nGACGTCn) to be a high-affinity binding site (Cheong et al., 1998). [...] To test whether STF1 and HY5 have similar DNA-binding properties, the binding properties of each were compared with eight different DNA sequences that represent G-, C-, and C/G-box motifs [TGACGTGT]. C-box sequences carrying the mammalian cAMP responsive element (CRE; TGACGTCA) motif and the Hex sequence (TGACGTGGC), a hybrid C/G-box (Cheong et al., 1998), were high-affinity binding sites for both proteins [...]."[24]
The human ribosomal protein L11 gene (HRPL11) has [...] two potential snRNA-coding sequences in intron 4: the C box beginning at +4131 (GGTGATG), [...] a D box beginning at +4237 (TCCTG), [...].[49]
"Members of the box C/D snoRNA family, which are the subject of the present report, possess characteristic sequence elements known as box C (UGAUGA) and box D (GUCUGA)."[50]
Substituting T for U yields C box = 5'-AGTAGT-3' in the translation direction on the template strand.
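The substitution above can be sketched as follows. The sequence given in the text corresponds to the DNA-alphabet form of box C read in reverse order; interpreting the conversion as a plain reversal (rather than a reverse complement) is an inference from the sequences themselves, and the function names are hypothetical.

```python
def rna_to_dna(seq):
    """Replace U with T so an RNA motif is written in the DNA alphabet."""
    return seq.upper().replace("U", "T")


def reverse(seq):
    """Read a sequence in the opposite direction without complementing it."""
    return seq[::-1]


BOX_C_RNA = "UGAUGA"  # box C as quoted for box C/D snoRNAs
BOX_D_RNA = "GUCUGA"  # box D as quoted for box C/D snoRNAs

box_c_dna = rna_to_dna(BOX_C_RNA)
box_d_dna = rna_to_dna(BOX_D_RNA)

print(box_c_dna, reverse(box_c_dna))
print(box_d_dna, reverse(box_d_dna))
```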
The upstream activating sequence (UAS) for Hap4p is 5'-CCAAT-3'.[22]
"The 5' non-coding part contains the sequence elements characteristic for eukaryotic promoters such as TATA and CAAT boxes as well as an inverted motif typical for cell-cycle regulated genes named "cell-cycle box" (CCB). The consensus sequence of CCB is CACGAAAA (Nasmyth, 1985), however, more relaxed variants such as CACGAAA, ACGAAA and C-CGAAA were described in budding yeast CLN1 and CLN2 (Ogas et al, 1991)."[51]
"The minimum DNA-binding elements are 6-bp CGCG box, (A/C/G)CGCG(C/G/T)."[41]
"The circadian control element (circadian; Anderson et al., 1994) was found in 10 FvTCP genes."[52]
Circadian control elements (CAANNNNATC).[20]
The "Class C" DNA binding site at position -379/-374 in a reverse (-) orientation with a consensus sequence of CACGNG of the bHLH Hey-1 protein had a strong DNA binding activity.[53]
A "putative cold-responsive element (CRE) [...] is specified by a conserved 5-bp core sequence (CCGAC) typical for C-repeat (CRT)/dehydration-responsive elements (DRE) that are recognized by cold-specific transcription factors (TFs) [16]."[54]
"Chlamydomonas reinhardtii activates the transcription of the Cyc6 and the Cpx1 genes (encoding cytochrome c6 and coprogen oxidase) in response to copper deficiency."[55]
"A consensus copper-response element [CuRE] TTTGC(T/G)C(A/G) (12) is a binding site for Mac1p."[55]
"An additional EMSA result demonstrated that [Aspergillus fumigatus (Af)] AfMac1 directly binds to a copper response element in the promoter regions of the ctrA2 and ctrC genes with a defined consensus DNA motif (5′-TGTGCTCA-3′) (Park et al., 2017[56]), which is strikingly similar to the Mac1-binding motif in S. cerevisiae (Jamison McDaniels et al., 1999; Keller et al., 2000), suggesting that the mechanism of Mac1-mediated copper homeostasis may be conserved across fungal species."[24]
"In barley, the combination of an ABRE and one of two known coupling elements CE1 (TGCCACCGG) and CE3 (GCGTGTC) constitutes an ABA responsive complex (ABRC) in the regulation of the ABA‐inducible genes HVA1 and HVA22 (Shen and Ho 1995; Shen et al. 1996)."[16]
"In Arabidopsis, the CE3 element is practically absent; thus, Arabidopsis relies on paired ABREs to form ABRCs (Gomez‐Porras et al. 2007) or on the coupling of a DRE (TACCGACAT) with ABRE (Narusaka et al. 2003; Nakashima et al. 2006)."[16]
"To identify potential cis-regulatory elements in the promoter sequences of ZmGRXCC genes, the 1500 bp sequences of each [maize CC-type glutaredoxin (GRX)] ZmGRXCC gene upstream of the ATG start codon were selected from the maize genome as the promoter, and the promoter sequence was screened using PlantCARE [32]. The elements searched included [...] CE3 (coupling element 3, -CACGCG-) for ABA responsiveness [...]."[57]
"Within the cAMP-responsive element of the somatostatin gene, we observed an 8-base palindrome, 5'-TGACGTCA-3', which is highly conserved in many other genes whose expression is regulated by cAMP."[58]
The upstream activating sequence (UAS) for Aca1p, a basic "leucine zipper (bZIP) transcription factor involved in carbon source utilization", is 5'-TGACGTCA-3',[22] the same as a CRE.
The upstream activating sequence (UAS) for Sko1p, involved "in osmotic and oxidative stress responses", is 5'-TGACGTCA-3',[22] the same as a CRE.
Root specific elements are the same as CREs.
"Cytokinin fulfills its diverse roles in planta through a series of transcriptional responses."[59]
"Cytokinin employs a two-component multi-step phosphorelay for its perception and signaling transduction (12–14). In Arabidopsis, there are three cytokinin receptors (ARABIDOPSIS HISTIDINE KINASEs; AHK2, 3, 4) and eleven type-B response regulators (ARABIDOPSIS RESPONSE REGULATORs; B-ARRs) (8, 15)."[59]
"The cytokinin transcriptional response centrally affects the family of ARRs. Type-B ARRs (B-ARRs) are transcription factors (TFs) with a GARP-like DNA binding domain at their C-termini and a receiver domain at their N-termini. Type-A ARRs (A-ARRs) are similar to the N-termini receiver domain of B-ARRs but do not possess a DNA binding domain."[59]
"The most well-understood mechanism for controlling cytoplasmic polyadenylation is regulation of mRNAs containing the cytoplasmic polyadenylation element (CPE; consensus UUUUUAU) by CPE-binding protein (CPEB)1."[60]
"Cytoplasmic polyadenylation is determined by the cytoplasmic polyadenylation element (CPE; consensus sequence UUUUUAU) that resides in mRNA 3′ untranslated regions (UTRs)."[61]
"Most paralogous FOX proteins bind to the canonical DNA response element 5′-RYAAAYA-3′ (R = A or G, Y = C or T) (11–13)."[62]
There is one D box 5'-AGTCTG-3'.[50]
The human ribosomal protein L11 gene (HRPL11) has two potential snRNA-coding sequences in intron 4: a D box beginning at +4237 (TCCTG).[49]
A D-box is (TGAGTGG).[63]
"A consensus sequence, 5'-TAGCCGCCGRRRR-3' (where R = an unspecified purine nucleoside [A/G]), was generated from these data."[64]
"The extent of homology for the entire 13 bp ranged from 56 to 100%. However, for the symmetrical core sequence CCGCC 75 to 100% homology was observed with only conservative substitutions occurring in the nonhomologous positions."[64]
The downstream B recognition element [(A/G)T(A/G/T)(G/T)(G/T)(G/T)(G/T)] designated as the BREd,[47] or dBRE, is an additional core promoter element that occurs downstream of the TATA box and is recognized by general transcription factor II B.[47]
A core promoter that contains all three subelements of the downstream core element [DCE] may be much less common than one containing only one or two.[65] "SI resides approximately from +6 to +11, SII from +16 to +21, and SIII from +30 to +34."[65]
The consensus sequence for the DCE is CTTC...CTGT...AGC.[65] These three consensus elements are referred to as subelements: "SI is CTTC, SII is CTGT, and SIII is AGC."[65]
The early DPE consensus sequence was RGWCGTG.[66][67]
The DPE consensus sequence is the more general sequence RGWYVT, or (A/G)G(A/T)(C/T)(A/C/G)T.[68]
The DPE in "the ATP‐binding cassette subfamily G member 2 gene in the marine pufferfish Takifugu rubripes" is 5'-AGTCTC-3'.[40]
"The most dramatic impact on immunoglobulin gene enhancer activity was observed upon mutation of sites that contain an E2-box motif (G/ACAGNTGN)."[69]
"We scanned the ORE1 promoter and found a putative EIN3 binding site (EBS), ATGAACCT, located 1056~1064 bp upstream from the start codon (ATG) of the gene [...]."[70]
"EIN3/EIL1 transcription factors were reported to bind to a consensus DNA sequence of A[CT]G[AT]A[CT]CT [34,35]."[70]
"The released aminoterminal of ATF6 (ATF6-N) then migrates to the nucleus and binds to the ER stress response element (ERSE) containing the consensus sequence CCAAT-N9-CCACG to activate genes encoding ER chaperones, ERAD components, and XBP1 (Chen et al., 2010; Yamamoto et al., 2004; Yoshida et al., 2001)."[71]
Endosperm expression (TGTGTCA).[20]
The consensus sequence for the E-box element is CANNTG, with a palindromic canonical sequence of CACGTG.[72]
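A scan for the degenerate E-box and a check of the palindromic property can be sketched as below. The two input sequences are the ChoRE1 and ChoRE2 sites quoted earlier in this text; the function names are hypothetical, and the lookahead pattern is used so that overlapping candidates would also be reported.

```python
import re

# Degenerate E-box CANNTG; the lookahead allows overlapping hits to be found.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_eboxes(seq):
    """Return (0-based offset, matched hexamer) for every CANNTG in seq."""
    return [(m.start(), m.group(1)) for m in EBOX.finditer(seq.upper())]


def is_palindromic(site):
    """True when a site equals its own reverse complement."""
    return site.upper().translate(COMPLEMENT)[::-1] == site.upper()


CHORE1 = "CACGTGACCGGATCTTG"  # quoted ChoRE1 site
CHORE2 = "TCCGCCCCCATCACGTG"  # quoted ChoRE2 site

print(find_eboxes(CHORE1))
print(find_eboxes(CHORE2))
print(is_palindromic("CACGTG"))
```

Only the canonical CACGTG hexamer is reported in each site; the second, more degenerate half of each ChoRE does not fit the strict CANNTG pattern.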
Ethylene responsive elements (ATTTCAAA).[20]
"Most paralogous FOX proteins bind to the canonical DNA response element 5′-RYAAAYA-3′ (R = A or G, Y = C or T) (11–13)."[62]
"[A] short sequence [in TSE1 contains] a GAACT motif that [binds] a tendon-specific nuclear protein."[73]
"Although this GARC [GA responsive complex] may not always be tripartite, most often it includes three sequence motifs, the TAACAAA box or GA responsive element (GARE), the pyrimidine box CCTTTT, and the TATCCAC box (Skriver et al., 1991;Gubler and Jacobsen, 1992; Rogers et al., 1994)."[74]
"Several GA-responsive cis-acting elements (GARE) and GARE-like elements (TAACAA/GA, or TAACGTA) have been identified in the promoters of hydrolase genes expressed in the aleurone (Ueguchi-Tanaka et al. 2000; Sutoh and Yamauchi 2003; Washio 2003), expansin genes expressed in internodes (Lee et al. 2001), and many GAMYB-regulated genes expressed in anthers (Tsuji et al. 2006)."[45]
GTGA-box has the consensus sequence GATA.[75]
"A GC box sequence, one of the most common regulatory DNA elements of eukaryotic genes, is recognized by the Sp1 transcription factor; its consensus sequence is represented as 5'-G/T G/A GGCG G/T G/A G/A C/T-3' [or 5′-KRGGCGKRRY-3′] (Briggs et al., 1986)."[76]
"The GCC box, also referred to as the AGC box (10), GCC element (11), or AGCCGCC sequence (13), is an ethylene-responsive element found in the promoters of a large number of [pathogenesis related] PR genes whose expression is up-regulated following pathogen attack."[30]
"The program DNA-Pattern was used to search for and catalogue occurrences of consensus GCRE (TGABTVW) [TGA(C/G/T)T(A/C/G)(A/T)] and GATA (GATAAG, GATAAH, GATTA) motifs in yeast promoters."[15]
"The predicted Gln3p and Gcn4p binding sites in the UGA3 promoter are [...] the consensus Gln3p (GATA) and Gcn4p (GCRE) [TGAGTCA] binding sites present in the minimal UGA3 promoter at -206 and -112, respectively, [...]."[15]
"The transcription factors Uga3, Dal81 and Leu3 belong to the class III family (Zn(II)2Cys6 proteins), and they recognize highly related sequences rich in GGC triplets [15]."[77]
"MEME analysis identified phylogenetically conserved CCGN4CGG motifs in promoters of several [branched-chain amino acid] BCAA biosynthetic genes"[78]
Gibberellin responsive elements (CCTTTTG, AAACAGA).[20]
"Computer analysis of the nt −653 to nt −483 region identified two sites that resemble the [γ-interferon activated sequence] GAS consensus sequence, TTNCNNNAA (19). Similar GAS-like sites have been shown to mediate the effects of various cytokines, including [growth hormone] GH, on the transcription of other genes (19, 20). The first site, TTCCTAGAA (ALS-GAS1), is located between nt −633 and nt −625; the second site, TTAGACAAA (ALS-GAS2), is located between nt −553 and nt −545."[79]
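How closely each quoted site resembles the GAS consensus can be sketched by counting the positions that disagree with TTNCNNNAA. This is a minimal illustration; the function name `mismatch_positions` is hypothetical, and only the codes needed for this consensus are included in the lookup table.

```python
# Subset of the IUPAC codes needed for the GAS consensus.
ALLOWED = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}

GAS_CONSENSUS = "TTNCNNNAA"


def mismatch_positions(consensus, site):
    """Return 1-based positions where site disagrees with the consensus."""
    if len(consensus) != len(site):
        raise ValueError("consensus and site must have the same length")
    return [
        position
        for position, (code, base) in enumerate(zip(consensus, site.upper()), start=1)
        if base not in ALLOWED[code]
    ]


ALS_GAS1 = "TTCCTAGAA"  # nt -633 to -625
ALS_GAS2 = "TTAGACAAA"  # nt -553 to -545

print(mismatch_positions(GAS_CONSENSUS, ALS_GAS1))
print(mismatch_positions(GAS_CONSENSUS, ALS_GAS2))
```

The first site fits the consensus at every position, while the second differs at the fourth position (G in place of C), which is consistent with the quoted description of the sites as resembling the consensus.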
"DNA-binding by the GR-DBD has been well-characterized; it is highly sequence-specific, directly recognizing invariant guanine nucleotides of two AGAACA [TGTTCT] half sites called the glucocorticoid response element (GRE), and binds as a dimer in head-to-head orientation with mid-nanomolar affinity (4,12–18). [...] The consensus DNA glucocorticoid response element (GRE) is comprised of two half-sites (AGAACA) separated by a three base-pair spacer (13,15,60,61)."[80]
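The half-site arrangement described in the quotation can be sketched by building an inverted repeat from one half-site, a three-nucleotide spacer, and the reverse complement of the half-site. The function names are hypothetical, and the GCC spacer is chosen only because it reproduces the 15-nt sequence 5'-AGAACAGCCTGTTCT-3' quoted earlier in this text.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (N is kept as N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def inverted_repeat(half_site, spacer):
    """Half-site, spacer, then the reverse complement of the half-site."""
    return half_site.upper() + spacer.upper() + reverse_complement(half_site)


HALF_SITE = "AGAACA"

print(reverse_complement(HALF_SITE))
print(inverted_repeat(HALF_SITE, "NNN"))
print(inverted_repeat(HALF_SITE, "GCC"))
```

Because the second half is the reverse complement of the first, the same half-site sequence is read on each strand, which is the layout expected for a dimer bound in head-to-head orientation.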
The upstream activating sequence (UAS) for Gcr1p, a transcriptional activator involved in the regulation of glycolysis, is 5'-CTTCC-3'.[22]
"Comparison of the sequence of the newly cloned mouse MMP-9 promoter region with our previous human isolate revealed that [...] four units of GGGG(T/A)GGGG sequence (GT box) were conserved between the two species."[81]
"The similar UPRE-1 is also found in the promoter region of the P. pastoris KAR2 (CAGCGTG), INO1 (CAACTTG) and HAC1 (CAACTTG) genes [15]. The presence of an HAC1 UPRE implies that Hac1p can up-regulate its own transcription. Unconventional splicing of HAC1 mRNA after ER stress signaling generates the active form of basic leucine zipper (bZIP) transcription factor Hac1p, which binds to the UPRE [16]."[82]
"The box H/ACA snoRNAs [...] have the consensus H box sequence (5'-ANANNA-3') but have no other primary sequence identity."[23]
The "3' end of mature [human telomerase] hTR (45) has an ACA trinucleotide 3 nt upstream of its 3' end. In addition, the 3' region of hTR contains a single H box consensus sequence (5'-AGAGGA-3')."[23]
"Comparison with the murine telomerase RNA (mTR) (7) suggests that the snoRNA-like features of hTR are evolutionarily conserved. The mTR 3' end [...] includes consensus H (5'-ACAGGA-3') and ACA box sequences."[23]
An H box has a consensus sequence of 3'-ACACCA-5'.[83]
"In humans, telomerase is composed of a reverse transcriptase (hTERT), which uses the RNA component (hTERC) to dock onto the 3′ single-stranded telomere end. hTERT may then processively synthesise telomeric repeats from the template provided by hTERC, before dissociating (7–9). All telomerase RNAs possess a 3′ end element necessary for its stability (10). In hTERC, this is two stem-loop structures separated by an H-box (ANANNA) and ACA motif (H/ACA). The binding of telomerase factors dyskerin, NOP10, and NHP2 at the H/ACA motif form the so-called ‘pre-ribonucleoprotein complex’, before GAR1 binds in transition to the mature RNP (11, 12). hTERC then binds to chaperone TCAB1, which assists its trafficking to the Cajal bodies where the functional telomerase complex localises (13). Recruitment to the telomeres in S-phase is mediated by the protective complex shelterin (14, 15). Correct assembly of the telomerase complex, with appropriate co-factors for maturation, stability, and subcellular localisation, is necessary for its function and thus telomere maintenance."[84]
"The KAP-2 protein [...] binds to the H-box (CCTACC) element in the bean CHS15 chalcone synthase promoter".[85]
"Two distinct sequence elements, the H-box (consensus CCTACC(N)7CT) and the G-box (CACGTG), are required for stimulation of the chs15 promoter by 4-CA."[86]
"In response to elevated temperatures, cells from many organisms rapidly transcribe a number of mRNAs. In Saccharomyces cerevisiae, this protective response involves two regulatory systems: the heat shock transcription factor (Hsf1) and the Msn2 and Msn4 (Msn2/4) transcription factors."[87]
"Yeast Hsf1 is an essential protein that binds to inverted repeats of nGAAn called heat shock elements (HSEs) within the promoters of many HSPs and activates their transcription."[87]
The Hex sequence has the consensus (TGACGTGGC).[24]
"Most HMG box proteins contain two or more HMG boxes and appear to bind DNA in a relatively sequence-aspecific manner (5, 13, 15, 16 and references therein). [...] they all appear to bind to the minor groove of the A/T A/T C A A A G-motif (10, 14, 18-20)."[88]
Gene ID: 6927 is HNF1A HNF1 homeobox A aka TCF1 on 12q24.31: "The protein encoded by this gene is a transcription factor required for the expression of several liver-specific genes. The encoded protein functions as a homodimer and binds to the inverted palindrome 5'-GTTAATNATTAAC-3'. Defects in this gene are a cause of maturity onset diabetes of the young type 3 (MODY3) and also can result in the appearance of hepatic adenomas. Alternative splicing results in multiple transcript variants encoding different isoforms."[89]
"Canonical Wnt signaling results in the accumulation and binding of β-catenin to DNA-binding partner TCF1."[90] TCF-1 binding site is CCTTTGA.[90]
"HNF3 can bind to the site in the absence of HNF6 (Lahuna et al. 1997)."[91]
"Transcription factors Pax-4 and Pax-6 are known to be key regulators of pancreatic cell differentiation and development. [...] The gene-targeting experiments revealed that Pax-4 and Pax-6 cannot substitute for each other in tissue with overlapping expression of both genes. [The] DNA-binding specificities of Pax-4 and Pax-6 are similar. The Pax-4 homeodomain [HD] was shown to preferentially dimerize on DNA sequences consisting of an inverted TAAT motif, separated by 4-nucleotide spacing."[92]
The "crucial difference between the binding sites of Antennapedia class and TTF-1 HDs is in the motifs 5'-TAAT-3', recognized by Antennapedia [a Hox gene, a subset of homeobox genes, first discovered in Drosophila which controls the formation of legs during development], and 5'-CAAG-3', preferentially bound by TTF-1. [The] binding of wild type and mutants TTF-1 HD to oligonucleotides containing either 5'-TAAT-3' or 5'-CAAG-3' indicate that only in the presence of the latter motif the Gln50 in TTF-1 HD is utilized for DNA recognition."[93]
The upstream activating sequence (UAS) for the Hsf1p is NGAAN.[22]
"Yeast Hsf1 is an essential protein that binds to inverted repeats of nGAAn called heat shock elements (HSEs) within the promoters of many HSPs and activates their transcription."[87]
"Putative EPO promoter HREs. Location of two conserved potential promoter HREs (pHRE1 and pHRE2; [CACGC]) close to the GATA ([GATA]) and [Wilms tumor gene] WT1 ([GCCTCTCCCCCACCCCCACCCGCGCACGCAC]) sites in the EPO proximal 5' region. [For human and other vertebrates] is a UCSC Genome Browser output (version hg19), including 161 transcription factor ChIP-sequencing (ChIP-seq) tracks derived from the ENCODE database (version 3), clusters of DNaseI hypersensitivity sites (HSS) from 125 cell types, and the transcriptional start site (TSS), with a closer view of the region in 50 vertebrates extracted using the 100-MULTIZ whole-genome multiple sequence alignment algorithm."[94]
"Deletion analysis by a series of 5′-deletion constructs identified the responsive region to RUNX-2 as being between −81 bp and −76 bp, containing a putative RUNX-2 binding sequence (TGAGGG), which is similar to that identified in the promoter region of human interleukin-3 (TGTGGG) (33)."[95]
"Deletion, mutagenesis, and tandem repeat analyses identified the core responsive element as the region between −89 and −60 bp (termed the hypertrophy box [HY box]), which showed specific binding to RUNX‐2."[95]
The Inr has the consensus sequence YYANWYY.[96]
"Kadonaga and colleagues (Vo ngoc et al. 2017) devised and implemented a novel multistep approach that combines experimental and computational methods to reinvestigate the human Inr consensus sequence. First, they generated two 5′-GRO-seq (5′ end-selected global run-on followed by sequencing) libraries with human MCF-7 cells to identify the 5′ ends of nascent capped transcripts. Second, they developed a peak-calling algorithm named FocusTSS to find transcripts in the 5′-GRO-seq data sets that were initiated at a focused position on the genome, hence identifying clear TSSs to enable analysis of Inr sequences. FocusTSS identified 7678 TSSs that were in both data sets. Third, to identify sequence motifs enriched among the focused TSSs, they used the HOMER motif discovery tool (Heinz et al. 2010), which yielded an Inr-like consensus sequence of BBCABW from −3 to +3 (where, B = C/G/T, W = A/T, and +1 is [A]). Forty percent of the focused TSSs contained a perfect match to the BBCABW consensus Inr."[97]
Consensus sequence for an Inr-like/TCT is TTCTCT.[40]
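The Inr consensus sequences above are written with IUPAC degenerate codes (B = C/G/T, W = A/T, Y = C/T, N = any base). As a minimal sketch, assuming only the standard IUPAC table, such a consensus can be translated into a regular expression and tested against a candidate site. The function names and the test hexamers are illustrative and do not come from the cited sources.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(consensus: str) -> str:
    """Translate an IUPAC consensus such as BBCABW into a regex string."""
    return "".join(IUPAC[base] for base in consensus.upper())

def matches_consensus(site: str, consensus: str) -> bool:
    """True when the whole site fits the consensus, position by position."""
    return re.fullmatch(iupac_to_regex(consensus), site.upper()) is not None

print(iupac_to_regex("BBCABW"))               # [CGT][CGT]CA[CGT][AT]
print(matches_consensus("TCCATT", "BBCABW"))  # True (made-up hexamer)
print(matches_consensus("TTCTCT", "BBCABW"))  # False: the TCT motif has T, not A, at +1
```

The last line illustrates why the Inr-like/TCT sequence is treated as a separate consensus: it does not fit BBCABW.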
"This ICRE (consensus sequence TYTTCACATGY) contains the core sequence CANNTG, which is also known as an E box and which serves as a recognition site for DNA-binding proteins of the basic helix-loop-helix (bHLH) family (3). Members of the bHLH family comprise determinants of cellular differentiation and proliferation in mammalian and invertebrate systems such as the myogenic transcription factors MyoD, MRF4, myogenin and Myf-5(4) as well as factors not restricted to specialized tisues (E12, E47, daughterless, c-Myc and Mad; 5-7). Proteins of the bHLH group may form either homodimers or heterodimers or both, dependent on the individual structure of the respective interaction surface provided by the HLH domain(8)."[98]
"The UAS INO is thus also referred to as the inositol/choline-responsive element (ICRE). The UAS INO contains the consensus sequence CATGTGAAAT, which includes the canonical basic helix-loop-helix (bHLH) binding site CANNTG (Lopes et al. 1991)."[99]
"All IN01 fusion constructs that retained regulation in response to the phospholipid precursors inositol and choline, contained at least one copy of a nine bp repeated element (consensus, 5'-ATGTGAAAT-3')."[100]
Consensus sequence for IRF-3 is 5'-GCTTTCC-3'.[44]
There "are totally 11 members (from IRF-1 to IRF-11) identified from vertebrates [10]. All the IRF members share a well-conserved N-terminal helix-turn-helix IRF superfamily domain (also called DNA-binding domain, DBD) with five conserved tryptophan (Trp) residues, which could recognize DNA sequences containing 5’-GAAA-3’ tetranucleotide, such as the IFN-stimulated response element (ISREs, GAAANNGAAA) [11, 12]. Moreover, it was reported that the IRS consensus (-78/-66, AANNGAAA), which existed in the promoter region of IFN-β, could be bound by the IRF family members [13]. As for the C-terminus, most of the IRFs share an IRF-3 superfamily domain, which was also named as IRF associated domain 1 (IAD1). IRF-1 and IRF-2 do not possess conserved IAD1 domain, but they contain non-conserved activation domain (the last 100 amino acids of IRF-1 were rich in tyrosine) or repression domain (the final 25 amino acids of IRF-2 were rich in histidine, arginine and lysine) in their C-terminus, respectively."[101]
Jasmonic acid-responsive elements (TGACG, CGTCA).[20]
"Krüppel-like factor 1 (KLF1/EKLF) is a transcription factor that globally activates genes involved in erythroid cell development. [...] KLF1 belongs to the KLF family of transcription factors that binds the G-rich strand of so-called CACCC-box motifs located in regulatory regions of numerous erythroid genes."[102]
"Using the in vitro CASTing method, we identified a new set of sequences bound by [congenital dyserythropoietic anemia] CDA-KLF1, and based on them we defined the consensus binding site as 5′-NGG-GG(T/G)-(T/G)(T/G)(T/G)-3′. It differs from the consensus binding sites for [wild-type] WT-KLF1, 5′-NGG-G(C/T)G-(T/G)GG-3′, and for [neonatal anemia] Nan-KLF1, 5′-NGG-G(C/A)N-(T/G)GG-3′, as well."[102]
With HAS1 ending at zero and TDA1 beginning at above 1000 bp, the binding sites and their consensus sequences are: Leu3 at 536-545 nts, (C/G)C(G/T)NNNN(A/C)G(C/G); at 569-574 nts, Mig1 (C/T)(C/T)CC(A/G)G and Sdd4 (A/C/T)CCCAC; Rgt1 at 585-592 nts, (A/T)(A/T)N(A/T)(C/T)CCG; Rgt1 at 610-617 nts, (A/T)(A/T)N(A/T)(C/T)CCG; and Rgt1 at 630-637 nts, CGG(A/G)(A/T)N(A/T)(A/T).[103]
The trp promoter has a consensus -35 sequence (TTGACA).[104]
"The conserved region contains a consensus M-box element (TCACATGA) for binding of MITF. This MITF binding site is aligned and conserved between at least 11 different species [...]. The clear conservation of these elements suggests that gpnmb has similar regulation in all mammals."[105]
Consensus sequences: NN(A/C/T)(A/C/T)NC(C/T)(A/C/T)(A/C/T)(A/T)(A/C/T)(A/C/T)N(A/G)(C/G/T)(A/C/T)NNN.[5]
The primary consensus sequence is apparently TT(A/T)CCNN(A/T)TNGG(A/T)AA.[5]
A "subset of bound and unbound motif occurrences [...] contained all of the most highly conserved nucleotides of the 16-bp pseudosymmetric motif (TTnCCnnnTnnGGnAA) ([...] Shore and Sharrocks 1995; Hughes and de Boer 2013)."[5]
Consensus sequence for Met31 binding motif is AAACTGTGG in sulfur amino acid metabolism.[22]
"Both Met31p and Met32p bind to the 5′‐AAACTGTG‐3′ core sequence which is, besides the 5′‐TCACGTG‐3′ element, the second regulatory element known to be involved in the regulation of several MET genes (Thomas et al., 1989)."[106]
"To execute the transcriptional regulation, [Metal-responsive transcription factor-1] MTF-1 binds to the specific site, called [metal responsive element] MRE (core sequence = TGCRCNC), in the promoter region of target gene (Günther et al. 2012a)."[107]
"These genomic sequences were analysed for the presence of [...] middle sporulation elements (MSE) motif (5'-ACACAAA-3') using the NCBI BLAST tool."[108]
Of the midsporulation element (MSE), "a minimal element, CRCAAA(A/T), is sufficient for sporulation specificity."[109]
The upstream activating sequence (UAS) for the Mig1p transcription factor is 5'-SYGGGG-3' or 5'-(C/G)(C/T)GGGG-3'.[22]
The upstream activating sequence (UAS) for the Msn2,4p transcription factor is 5'-CCCCT-3'.[22]
"[msn2p] is a transcription factor that binds to stress-response elements (STREs) resulting in the induction of more than 200 genes.10,11 STRE has a core pentameric cis-acting sequence CCCCT and is located in promoter regions of the induced genes."[110]
"The 24-nt Xenopus Mos [polyadenylation response element] PRE (Charlesworth et al, 2002) contained a match to the SELEX-derived murine Musashi RNA binding consensus sequence (G/AU1−3AGU) (Imai et al, 2001), and included a 3′ U residue essential for PRE function (Charlesworth et al, 2002) [...]."[111]
"Regarding the 3′ UTR cis-regulatory sequences such as AREs (PAS) [110], BRD-Box [111] and MBE [112] mediates negative post-transcriptional regulation by affecting mRNA transcript stability and translational efficiency [110], [140]. In our case, the 3′ cis-regulatory signals, BRD-Box and MBE, located upstream and downstream PAS [...] may regulate tissue-specific alternative polyadenylation which has been detected in approximately 54% of human genes [142]."[112]
"The [Musashi-binding element] MBE consensus sequence is (G/A)U1–3AGU."[113]
"These elements fit the type II MYB consensus sequence A(A/C)C(A/T)A(A/C)C, suggesting that they are MYB recognition elements (MREs)."[114]
MYB binding site involved in drought induction (TAACTG).[20]
Myocyte enhancer factor-2 (MEF2) proteins are a family of transcription factors which through control of gene expression are important regulators of cellular differentiation and consequently play a critical role in embryonic development.[115] In adult organisms, Mef2 proteins mediate the stress response in some tissues.[115]
"The current study delineates the conformational paradigm, clustered recognition, and comparative DNA binding preferences for MEF2A and MEF2B-specific MADS-box/MEF2 domains at the YTA(A/T)4TAR consensus motif."[116] Y = (C/T) and R = (A/G). The consensus sequence is (C/T)TA(A/T)(A/T)(A/T)(A/T)TA(A/G).[116]
"The 3′ UTR of eEF1A contains a putative [Nanos/Pumilio response element (PRE)] PRE sequence (TGTAAAT), suggesting that it is a Nanos/Pumilio target."[117]
Recurrent "kmers occurring in 3′ UTRs were identified in the single-read FLASH data. The polyadenylation signal [ ATA box ] AATAAA had the highest Z-score, and three similar sequences were also found in the top 10 kmers. However, on correlation with down-regulation, the TGTAAAT motif was found by MEME (Bailey and Elkan 1994) at 294 sites (E-value 2.7 × 10−562), which is different from the motif identified by RIP-seq [...]."[118]
"Human pColQ1a carries consensus sequences for transcriptional factors E-protein (E-box, CANNTG), NFAT (GGAAA), c-Ets transcription factor [c-Ets, (C/A)GGA(A/T)], Elk-1, N-box (CCGGAA), and MEF2 (CTAAAAATAA), which play essential roles in muscle-specific and NMJ-specific transcriptional activities (Lee et al., 2004)."[119]
"The [basic helix–loop–helix] bHLH proteins from group E were usually bound to the CACGCG [Coupling element] or CACGAG (N-box) motif."[120]
"Group E comprises two families in which the proteins have a conserved Pro or Gly residue within the basic region that mediates preferential binding to the N-box sequences CACGGC or CACGAC."[121]
"The HEY1 gene binds E-box (CANGTG) and N-box (CACNAG) sites (31,32)."[122]
The "putative consensus binding sites of Notch target genes in human IDE promoter" included the N-box from "the first translation start site (ATG)" -3711/-3715 position in a forward (+) orientation with a consensus sequence of CACNAG of the bHLH protein HES-1 with a strong DNA binding activity.[53] For the closer binding position -310/-305 in a reverse (-) orientation of the bHLH protein Hey-1 CACNAG had a weak DNA binding activity.[53] The "Class C" DNA binding site at position -379/-374 in a reverse (-) orientation with a consensus sequence of CACGNG of the bHLH Hey-1 protein had a strong DNA binding activity.[53]
"The Saccharomyces cerevisiae Ndt80 protein is the founding member of a class of p53-like transcription factors that is known as the NDT80/PhoG-like DNA-binding family."[123]
Ndt80 [Non-DiTyrosine 80] is a meiosis-specific transcription factor required for successful completion of meiosis and spore formation.[124] The DNA-binding domain of Ndt80 has been isolated, and the structure reveals that this protein is a member of the Ig-fold family of transcription factors.[125] Ndt80 also competes with the repressor SUM1 for binding to promoters containing MSEs.[126]
Direct "binding of Ndt80 to the [mid sporulation element CRCAAAA/T (Ozsarac et al. 1997)] MSE is necessary for activating transcription of the middle sporulation genes."[127]
"These genomic sequences were analysed for the presence of [...] middle sporulation elements (MSE) motif (5'-ACACAAA-3') using the NCBI BLAST tool."[108]
Mutation "of the core NFATp binding sequence (GGAAAA) in the IL2 promoter NFAT site entirely eliminates the function of the site, as does mutation of an adjacent non-canonical AP-1 site that is not essential for NFATp binding but that is required for formation of the NFATp-Fos-Jun complex(6, 15).3"[128]
Nuclear factor 1 (NF-1) is a family of closely related transcription factors that constitutively bind as dimers to specific sequences of DNA with high affinity.[129] Family members contain an unusual DNA binding domain that binds to the recognition sequence 5'-TTGGCXXXXXGCCAA-3'.[130]
Consensus sequences for the nuclear factor 1 are TGGCA, TGGCG and TGGAA.[131]
An apparent consensus sequence for the NF1 is TGG(A/C)(A/G).
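The apparent consensus above can be reproduced by taking, at each position, the set of bases seen across the reported sequences. This is a minimal sketch that assumes the sequences are already aligned and of equal length; the function name is hypothetical.

```python
def degenerate_consensus(sites):
    """Column-wise union of bases, written in the (A/C) style used here."""
    if len({len(site) for site in sites}) != 1:
        raise ValueError("all sites must have the same length")
    out = []
    for column in zip(*sites):
        bases = sorted(set(column))
        out.append(bases[0] if len(bases) == 1 else "(" + "/".join(bases) + ")")
    return "".join(out)

print(degenerate_consensus(["TGGCA", "TGGCG", "TGGAA"]))  # TGG(A/C)(A/G)
```

A union of this kind is permissive: TGG(A/C)(A/G) also admits TGGAG, which is not among the three reported sequences.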
There is only one nucleotide difference between the SESN2 CARE and the ATF4-inducible asparagine synthase gene ASNS consensus sequence (GTTTCATCA).[132]
The upstream activating sequence (UAS) for the Oaf1p transcription factor is 5'-CGGN3TNAN9-12CCG-3'.[22]
"As a transcription factor, ORE1 was reported to bind to consensus DNA sequences of [ACG][CA]GT[AG]N{5,6}[CT]AC[AG] [29] or T[TAG][GA]CGT[GA][TCA][TAG] [37]."[70]
Consensus sequences are 5'-(A/C/G)(A/C)GT(A/G)N5,6(C/T)AC(A/G)-3' or 5'-T(A/G/T)(A/G)CGT(A/G)(A/C/T)(A/G/T)-3'.[70]
"A p53 consensus DNA RE is composed of a tandem of two decameric palindromic sequences (half-sites) 5′-RRRCWWGYYY-3′, where R = purine, Y = pyrimidine and W is either A or T. There is a variability in composition of p53 REs, thus two half-sites can be separated by a spacer DNA, typically 0–13 bp in length and many p53 DNA REs have varying numbers of half-sites (19,20,22,33–37)."[133]
p53 response elements found in the promoter of TUG1: 5'-CAGGCCC-3' and 5'-GGGCGTG-3'.[44]
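The full p53 response element described above, two 5′-RRRCWWGYYY-3′ half-sites separated by a spacer of 0 to 13 bp, can be expressed as one regular expression. The sketch below is an illustration under that definition only. The test sequence is made up, and for each start position only the shortest qualifying spacer is reported.

```python
import re

# R = A/G, W = A/T, Y = C/T, as defined in the quoted consensus.
HALF_SITE = "[AG]{3}C[AT]{2}G[CT]{3}"

# A lookahead lets matches that overlap one another all be reported.
P53_RE = re.compile(f"(?=({HALF_SITE})([ACGT]{{0,13}}?)({HALF_SITE}))")

def find_p53_res(sequence):
    """Return (0-based start, first half-site, spacer length, second half-site)."""
    return [(m.start(), m.group(1), len(m.group(2)), m.group(3))
            for m in P53_RE.finditer(sequence.upper())]

# Hypothetical sequence: two half-sites separated by the dinucleotide AC.
print(find_p53_res("TTAGACATGCCTACGGGCATGTCCAA"))
# [(2, 'AGACATGCCT', 2, 'GGGCATGTCC')]
```

Single heptamers such as those reported for TUG1 are shorter than a decameric half-site, so this pattern would not find them.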
"As VRI [target gene: vrille (VRI)] accumulates in the nucleus during the mid to late day, it binds VRI/PDP1ϵ binding sites (V/P-boxes) [consensus of V box: A(/G)TTA(/T)T(/C), of P-box: GTAAT(/C)], to repress Clk and cry transcription (Hardin, 2004)."[134]
Consensus sequence is 5'-CGACCCC-3'.[44]
"The [palindromic E-box motif (CACGTG)] motif is bound by the transcription factor Pho4, [and has the] class of basic helix-loop-helix DNA binding domain and core recognition sequence (Zhou and O'Shea 2011)."[5]
The Pho4 homodimer binds to DNA sequences containing the bHLH binding site 5'-CACGTG-3'.[21]
The upstream activating sequence (UAS) for Pho4p is 5'-CAC(A/G)T(T/G)-3' in the promoters of HIS4 and PHO5 regarding phosphate limitation with respect to regulation of the purine and histidine biosynthesis pathways [66].[22]
"Electrophoretic mobility shift assays identified a pollen-specific cis-acting element POLLEN1 (AGAAA) mapped at AtACBP4 (−157/−153) which interacted with nuclear proteins from flower and this was substantiated by DNase I footprinting."[75]
"Given that AtACBP4pro::GUS (−156/−67) could drive promoter activity for pollen expression, [electrophoretic mobility shift assays] EMSAs were carried out to investigate the role of the putative POLLEN1 cis-element, AGAAA (−150/−146), and its adjacent co-dependent regulatory element TCCACCATA (–141/–133)."[75]
"POLLEN1 and the TCCACCATA element are co-dependent regulatory elements responsible for pollen-specific activation of tomato LAT52 (Bate and Twell 1998)."[75]
"Two predicted [Pleiohomeotic] Pho-Phol binding sites, CGCCATTT, that closely resemble the extended Pho-Phol consensus sequence, CGCCAT(T/A)TT (Kahn et al. 2014), are located within PRE1.1 [...]."[135]
"Two domains upstream of the start site of transcription have been identified for which a consensus sequence has been formulated(1-5). These domains are the -35 sequence (5'-T-T-G-A-C-A) and the Pribnow box (5'-T-A-T-A-A-T) in the -10 region. Both domains are in close contact with the RNA polymerase during initiation of RNAsynthesis (2,6)."[104]
"The main cis-element present in their promoters is an endosperm-specific box [19,20], which consists of two motifs: a GLM (GCN4-like motif) (5′ G(A)TGA(G) GTCAT 3′) that shares homology with yeast GCN4 [21], and a 7 bp P-box (Prolamin box) (5′TGTAAAG3′) [22–24]."[136]
"Prediction of cis-regulatory elements using bioinformatics tools: Upstream 1500bp sequence of transcriptional start site of each gene were searched using PLANTCARE database to identify the cis-regulatory elements of NtSUT1-5 genes. Moreover, manual scanning was also done to identify the presence of sugar responsive elements such as sucrose box (NNAATCA) (Chen et al., 2002; Fillion et al., 1999) and pyrimidine box (CCTTTT, TTTTTTCC) (Washio, 2003)."[137]
"Promoter analysis of five SUTs revealed the presence of sugar responsive element, A-box (TACGTA), which is involved in regulation of sucrose transporters upon addition of sucrose (Kühn, 2011; Osuna et al., 2007). Sugar responsive elements such as sucrose box (NNAATCA) (Chen et al., 2002; Fillion et al., 1999) important for sugar responsive gene expression, pyrimidine box (CCTTTT) (Washio, 2003) partially involved in sugar repression were also observed."[137]
"The basal regulatory elements identified include a putative TATA-box (−30/−24) for RNA polymerase binding and a CAAT box (−64/−61; [...]). Several putative floral expression-related cis-elements identified included a putative 6-nucleotide Q element (−770/−665), three GTGA boxes (−372/−369, −209/−206 and −164/−161) and four putative highly-conserved POLLEN1 boxes (−737/−733, −711/−707, −150/−146 and −36/−32; [...])."[75]
The consensus sequence for a Q element is 5'-AGGTCA-3'.[75]
The quinone reductase (QRDRE) gene contains TCCCCTTGCGTG which has the DRE core of TNGCGTG.[138]
Consensus sequences: C(A/C/G)(A/C/G)(A/G)(C/G/T)C(A/C/T)(A/G/T)(C/G/T)(A/G/T)(A/C/G)(A/C)(A/C/T)(A/C/T).[5]
"Rap1 is another GRF that organizes chromatin, binds promoters of genes that encode ribosomal and glycolytic proteins, and binds telomeres (Shore 1994; Ganapathi et al. 2011; Hughes and de Boer 2013). [...] DNA shape analysis revealed that Rap1 motifs possess an intrinsically wide minor groove spanning the central degenerate region of the motif that was wider at binding-competent sites [...]. A clear trend was observed between increased width of the minor groove in the central degenerate region of the motif and increased Rap1 binding in vitro."[5]
Copying an apparent consensus sequence for Rap1 (CCCACCAACAAAA) and putting it in "⌘F" finds none between ZSCAN22 and A1BG and none between ZNF497 and A1BG, as can also be found by the computer programs.
Purified "Reb1 bound [...] exact TTACCCK occurrences [...] with >60% of 780 occurrences at promoters. [And can have] the extended motif VTTACCCGNH (IUPAC nomenclature) (Rhee and Pugh 2011)."[5]
Copying the apparent consensus sequence for Reb1 (TTACCC(G/T)) and putting it in "⌘F" finds one between ZSCAN22 and A1BG and none between ZNF497 and A1BG, as can also be found by the computer programs. However, an extended Reb1 sequence (ATTACCCGAA) finds none between ZSCAN22 and A1BG or between ZNF497 and A1BG.
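A "⌘F" search matches only one literal string on the strand as displayed, so a degenerate site such as TTACCC(G/T) needs one search per variant and per strand. The sketch below shows one way a program could cover both strands in a single pass. The function names and the 20-nt test sequence are made up for illustration and are not the A1BG promoter sequences.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_both_strands(sequence: str, pattern: str):
    """Return sorted (0-based start on the given strand, strand, matched site)."""
    sequence = sequence.upper()
    hits = []
    for strand, text in (("+", sequence), ("-", reverse_complement(sequence))):
        for m in re.finditer(f"(?=({pattern}))", text):
            start = m.start()
            if strand == "-":
                # Convert the position back to coordinates of the given strand.
                start = len(sequence) - start - len(m.group(1))
            hits.append((start, strand, m.group(1)))
    return sorted(hits)

# TTACCC(G/T) written as a regular expression.
print(find_both_strands("GGTTACCCGAAAGGGTAACC", "TTACCC[GT]"))
# [(2, '+', 'TTACCCG'), (11, '-', 'TTACCCT')]
```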
"Robbins et al. (18) have reported that expression of pRB in mouse fibroblasts suppresses transcription of c-fos and have identified an element, termed the retinoblastoma control element (RCE), in the c-fos promoter necessary for this suppression. More recently, sequences homologous to the RCE have been identified in the TGF-β1, -β2, and -β3 promoters by Kim et al. (19)."[139]
"Comparison of the sequence of the newly cloned mouse MMP-9 promoter region with our previous human isolate revealed that [...] four units of GGGG(T/A)GGGG sequence (GT box) were conserved between the two species."[140]
"Expression of some matrix metalloproteinases (MMPs) are regulated by cytokines and tumor promoters, namely tumor necrosis factor-𝛂 (TNF-𝛂), epidermal growth factor, interleukin-1, and 12-O-tetradecanoylphorbol-13-acetate (TPA) (15-20)."[140]
Expression "of v-Src induces the synthesis of MMP-9, which is mediated by alterations in activity of binding factors for the AP-1 site and the sequence motif GGGGTGGGG (GT box). This GT box is homologous to the so-called retinoblastoma (Rb) control element (RCE) (29,30), and Rb can produce an anti-oncogene or tumor suppressor gene product (31-38) which is involved in regulating transcription of certain genes."[140]
Binding site for NF𝛋B in humans (GGAATTCCCC) with a core of (GAATTC), Sp-1 (CCGCCCC), 12-O-tetradecanoylphorbol-13-acetate (TPA) responsive element (TRE) (TGAGTCA), and GC box (GGGCGG).[140]
"Angiotensin II (Ang II) up-regulates plasminogen-activator inhibitor type-1 (PAI-1) expression in mesangial cells to enhance extracellular matrix formation. The proximal promoter region (bp -87 to -45) of the human PAI-1 gene contains several potent binding sites for transcription factors [two phorbol-ester-response-element (TRE)-like sequences; D-box (-82 to -76) and P-box (-61 to 54), and one Sp1 binding site-like sequence, Sp1-box 1 (-72 to -67)]."[63]
"The methylation-interference experiment demonstrated that human recombinant Sp1 bound to the so-called GT box (TGGGTGGGGCT, -78 to -69), which contains the Sp1-box 1."[63]
D-box (TGAGTGG), Sp1-box 1 (GGGGCT), P-box (TGAGTTCA), Sp1-box 2 (CTGCCC), and TATA box (TATAAA).[63]
Retinoic acid response elements (RAREs).
"Retinoic acid is considered as the earliest factor for regulating anteroposterior axis of neural tube and positioning of structures in developing brain through retinoic acid response elements (RARE) consensus sequence (5′–AGGTCA–3′) in promoter regions of retinoic acid-dependent genes."[141]
"Several studies have suggested that the target gene of the RA signal generally contains two direct-repeat half sites of the consensus sequence AGGTCA that are spaced by one to five base pairs (14,16,32,38)."[142]
"Xavier-Neto’s review demonstrated that the magic AGGTCA has high affinity but poor specificity (16). Some other [nuclear receptors] NRs also utilized the RARE with the same spacer models that are used by RXRs/RARs, for example, orphan receptors, vitamin D receptors (VDR) and peroxisome proliferator-activated receptors (PPAR) (32,39). Identifying a bona fide RARE is more difficult than a simple inspection. In order to attribute the RARE in Cx43 to a candidate sequence, some observations have been conducted in our study using molecular, biological and biophysical methods and functional approaches. In a ligand-dependent luciferase assay, RARE was located between the −1,426 to −341 base pair position. The constitutively active mutant Cx43 RARE represses the luciferase activity in the absence of the ligand and has no response to the 9cRA. Our findings indicate that RARE in the Cx43 promoter is a functional element."[142]
Additional response elements that include the 5'-AGGTCA-3' are Q elements, ROR-response elements and Thyroid hormone response elements.
A likely general consensus sequence may be 5'-AG(A/G)TCA-3'.[142]
Copying the apparent consensus sequence for the RARE (AGGTCA) and putting it in "⌘F" finds two between ZSCAN22 and A1BG and three between ZNF497 and A1BG, as can also be found by the computer programs.
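Because a target of the RA signal is described above as two direct-repeat AGGTCA half-sites spaced by one to five base pairs, counting single AGGTCA hexamers does not by itself identify a RARE. The following sketch, with hypothetical names and made-up test sequences, looks for the paired arrangement and labels each hit by its spacer length (DR1 to DR5). The half-site is a parameter, so a looser pattern can be supplied as a regular expression.

```python
import re

def find_direct_repeats(sequence, half_site="AGGTCA", min_gap=1, max_gap=5):
    """Return (0-based start, 'DRn') for half-site pairs n bp apart."""
    pattern = re.compile(
        f"(?=({half_site})([ACGT]{{{min_gap},{max_gap}}}?)({half_site}))"
    )
    return [(m.start(), f"DR{len(m.group(2))}")
            for m in pattern.finditer(sequence.upper())]

print(find_direct_repeats("CCAGGTCATTAGGTCAGG"))   # [(2, 'DR2')]
print(find_direct_repeats("AGGTCACCGAAAGGTCA"))    # [(0, 'DR5')]
print(find_direct_repeats("AGGTCAGGGGGGGAGGTCA"))  # [] (spacer of 7 is too long)
```

Only the given strand is scanned here; a complete search would repeat the scan on the reverse complement.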
"Using MAP-C [Mutation Analysis in Pools by Chromosome conformation capture], we show that inducible interchromosomal pairing between HAS1pr-TDA1pr alleles in saturated cultures of Saccharomyces yeast is mediated by three transcription factors, Leu3, Sdd4 (Ypr022c), and Rgt1. The coincident, combined binding of all three factors is strongest at the HAS1pr-TDA1pr locus and is also specific to saturated conditions. We applied MAP-C to further explore the biochemical mechanism of these contacts, and find they require the structured regulatory domain of Rgt1, but no known interaction partners of Rgt1."[103]
With HAS1 ending at zero and TDA1 beginning at above 1000 bp, the three Rgt1 sites are at 585-592 nts, (A/T)(A/T)N(A/T)(C/T)CCG; at 610-617 nts, (A/T)(A/T)N(A/T)(C/T)CCG; and at 630-637 nts, CGG(A/G)(A/T)N(A/T)(A/T). The third Rgt1 site is the inverse complement of the first two.[103]
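The relationship between the third Rgt1 site and the first two can be verified by reversing the degenerate consensus and complementing every allowed base. This is a small sketch with hypothetical helper names, assuming the parenthesised notation used in this resource.

```python
import re

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}

def parse(consensus):
    """Split e.g. '(A/T)N(C/T)CCG' into a list of base sets, one per position."""
    tokens = re.findall(r"\(([ACGT/]+)\)|([ACGTN])", consensus)
    return [set(alts.split("/")) if alts else {single} for alts, single in tokens]

def fmt(positions):
    """Write base sets back in the (A/T) notation, alternatives sorted."""
    out = []
    for bases in positions:
        ordered = sorted(bases)
        out.append(ordered[0] if len(ordered) == 1 else "(" + "/".join(ordered) + ")")
    return "".join(out)

def reverse_complement_consensus(consensus):
    """Reverse the positions and complement each allowed base."""
    flipped = [{COMPLEMENT[b] for b in bases} for bases in reversed(parse(consensus))]
    return fmt(flipped)

print(reverse_complement_consensus("(A/T)(A/T)N(A/T)(C/T)CCG"))
# CGG(A/G)(A/T)N(A/T)(A/T)
```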
Rgt1 is also known as glucose transporter gene repressor.
Root specific elements (TGACGTCA).[20]
RAR-related orphan receptor "ROR-γ binds DNA with specific sequence motifs AA/TNTAGGTCA (the classic RORE motif) or CT/AG/AGGNCA (the variant RORE motif)13, 31."[143]
Copying the apparent consensus sequence for the RORE (ATATAGGTCA) and putting it in "⌘F" finds one located between ZSCAN22 and A1BG and none between ZNF497 and A1BG as can be found by the computer programs.
Copying the apparent consensus sequence for the variant RORE (CTGGGACA) and putting it in "⌘F" finds two located between ZSCAN22 and A1BG and one between ZNF497 and A1BG as can be found by the computer programs.
The consensus sequence for the RRE is 5'-CATCTG-3'.[144]
The SRE wild type (SREwt) contains the nucleotide sequence ACAGGATGTCCATATTAGGACATCTGC, of which CCATATTAGG is the CArG box, TTAGGACAT is the C/EBP box, and CATCTG is the E box.[145]
5'-CCATATTAGG-3' is a CArG box that does not occur in either promoter of A1BG.
5'-CATCTG-3' is an E box that does not occur in either promoter of A1BG.
5'-TTAGGACAT-3' is a C/EBP box that does not occur in either promoter of A1BG using "⌘F".
5'-ACAGGATGT-3', the first nine nucleotides of the above SREwt sequence, occurs once between ZNF497 and A1BG and not at all between ZSCAN22 and A1BG using "⌘F".
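The three boxes named within SREwt overlap one another, which is easier to see from their coordinates. The sketch below uses plain string search on the SREwt sequence quoted above. Positions are 1-based and counted from the 5' end of that 27-nt sequence, and the helper name is illustrative.

```python
SRE_WT = "ACAGGATGTCCATATTAGGACATCTGC"
BOXES = {"CArG box": "CCATATTAGG", "C/EBP box": "TTAGGACAT", "E box": "CATCTG"}

def locate(sequence, motif):
    """Return every 0-based start of motif in sequence, allowing overlaps."""
    starts, i = [], sequence.find(motif)
    while i != -1:
        starts.append(i)
        i = sequence.find(motif, i + 1)
    return starts

for name, motif in BOXES.items():
    for start in locate(SRE_WT, motif):
        print(f"{name}: {start + 1}-{start + len(motif)}")
# CArG box: 10-19
# C/EBP box: 15-23
# E box: 21-26
```

The C/EBP box therefore shares five nucleotides with the CArG box and three with the E box, so a mutation in either overlap can affect two elements at once.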
The "positive effect of W element may result from cooperative interactions between Z and other downstream elements such as the Servenius sequence, GGACCCT, located from -131 to -125 bp(28,38)."[146]
Sp1-box 1 (GGGGCT) and Sp1-box 2 (CTGCCC).[63]
"Sp3 has been shown to repress transcriptional activity of Sp1 [9]."[63]
Sp-1 (CCGCCCC).[140]
Sp1 (GCGGC).[131]
SP1 (GGGGCGGGCC).[44]
An apparent consensus sequence for Sp1 (GGGGCT), (CTGCCC) or (CCGCCCC) is 5'-(C/G)(C/G/T)G(C/G)C(C/T)-3'. Or, each must be considered separately.
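One way to weigh a combined consensus against considering each sequence separately is to count how many concrete hexamers the degenerate form admits. The sketch below enumerates them with `itertools.product`. The function name is hypothetical, and only the first six nucleotides of the seven-nucleotide CCGCCCC are compared.

```python
import re
from itertools import product

def expand(consensus):
    """Enumerate every concrete sequence allowed by a degenerate consensus."""
    tokens = re.findall(r"\(([ACGT/]+)\)|([ACGT])", consensus)
    choices = [alts.split("/") if alts else [single] for alts, single in tokens]
    return {"".join(combo) for combo in product(*choices)}

hexamers = expand("(C/G)(C/G/T)G(C/G)C(C/T)")
print(len(hexamers))                                         # 24
print(all(s in hexamers for s in ("GGGGCT", "CTGCCC", "CCGCCC")))  # True
```

The combined form admits 24 hexamers, of which only three correspond to the reported Sp1 boxes, so a search with it would be expected to return more hits than the three separate searches.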
Copying each of the apparent consensus sequences for Sp1, (GGGGCT), (CTGCCC) and (CCGCCCC), and putting it in "⌘F" finds none between ZSCAN22 and A1BG and four, two and none, respectively, between ZNF497 and A1BG, as can also be found by the computer programs.
A "homologous IFN-𝛄 activation site (GAS) element, having the consensus sequence TTC/ANNNG/TAA, is found in the promoters of several [interferon-stimulated genes] ISG.(37–40)"[147] Consensus sequences: STAT1 - TTCC(C/G)GGAA, STAT3 - TTCC(C/G)GGAA, STAT4 - TTCCGGAA, STAT5 - TTCNNNGAA and STAT6 - TTCNNNNGAA.[147]
"The GAS element is palindromic and the sequence TTCN(2-4)GAA defines the optimal binding site for all STATs, with the exception of STAT2 which appears to be defective in GAS-DNA binding [...]."[148]
The upstream activating sequence (UAS) for [Sterile 12 protein] Ste12p is 5'-TGAAAC-3'.[22]
Manual "scanning was also done to identify the presence of sugar responsive elements such as sucrose box (NNAATCA) (Chen et al., 2002; Fillion et al., 1999) and pyrimidine box (CCTTTT, TTTTTTCC) (Washio, 2003)."[137]
"A unique synaptic activity-responsive element (SARE) sequence, composed of the consensus binding sites for SRF, MEF2 and CREB, is necessary for control of transcriptional upregulation of the Arc gene in response to synaptic activity."[149]
"Within the cAMP-responsive element of the somatostatin gene, we observed an 8-base palindrome, 5'-TGACGTCA-3', which is highly conserved in many other genes whose expression is regulated by cAMP."[58]
The consensus sequence for the myocyte enhancer factor 2 (MEF2) is (C/T)TA(A/T)(A/T)(A/T)(A/T)TA(A/G).[116]
The SRE wild type (SREwt) contains the nucleotide sequence ACAGGATGTCCATATTAGGACATCTGC, of which CCATATTAGG is the CArG box, TTAGGACAT is the C/EBP box, and CATCTG is the E box.[145]
"A consensus sequence TACTAA(C/T) was derived for the branch site of Dictyostelium introns."[150]
The "heptamer consensus sequence CAGGTAG (i.e., the TAGteam) is overrepresented in regulatory regions of the earliest expressed zygotic genes [2]."[151]
Copying the consensus TAGteam sequence 5'-CAGGTAG-3' and putting it in "⌘F" finds one location between ZNF497 and A1BG and none between ZSCAN22 and A1BG, as can also be found by the computer programs.
The consensus sequence for the TAPETUM box is TCGTGT.[75]
"About 24% of human genes have a TATA-like element and their promoters are generally AT-rich; however, only ~10% of these TATA-containing promoters have the canonical TATA box (TATAWAWR)."[41]
Only an inverse and its complement occur between ZSCAN22 and A1BG: 5'-TACCTAT-3' at 2996 nts from ZSCAN22.
"The different inducing activities of Xbra, VegT and Eomesodermin suggest that the proteins might recognise different DNA target sequences. [...] All three proteins prove to recognise the same core sequence of TCACACCT with some differences in flanking nucleotides."[152]
"Most bZIP proteins show high binding affinity for the ACGT motifs, which include [...] AACGTT (T box) [...]."[7]
"Despite sequence variations within the Tbox DBD between family members, all members of the family appear to bind to the same DNA consensus sequence, TCACACCT. In several in vitro binding-site selection studies, members of the Tbox family were found to bind preferentially sequences containing two or more of these core motifs arranged in various orientations; however, the significance of such double sites in vivo is uncertain, as most Tbox target gene sites have been found to contain only a single consensus motif (18)."[153]
"The TEA/ATTS transcription factor family consists of mammalian, avian, nematode, insect and fungal members that share a conserved TEA domain. The TEA domain (Bürglin, 1991) represents a DNA‐binding region that is composed of 66–76 conserved amino acids (aa) in the N‐terminal section of the proteins."[154]
"The TEA consensus sequence (TCS) in a fungal TEA/ATTS transcription factor target promoter has been defined as 5′‐CATTCY‐3′ (Andrianopoulos and Timberlake, 1994)."[154]
The upstream activating sequence (UAS) for Tec1p is 5'-GAATGT-3'.[22]
In the nucleotides between ZSCAN22 and A1BG there are ten occurrences of 5'-TTAGGG-3', beginning about 300 nucleotides from ZSCAN22 and ending at about 3900 nts. There are two among the nucleotides between ZNF497 and A1BG as A1BG is approached from ZNF497.
Homo sapiens genes containing these are found using Homo sapiens "TRF (TTAGGG repeat binding factor)".[155]
"The arrangement of TREs within the promoter might regulate THR action by determining THR isoform binding, THR dimerization, and coregulators binding. In the classic view of how TH and its receptor stimulate gene expression, the gene promoter contains TREs consisting of a 6-bp consensus sequence (AGGTCA) organized as a direct repeat separated by 4 bp (DR4), a palindrome without spacing (PAL), or an inverted palindrome (LAP) separated by 4 to 6 bp (10–13)."[156]
Transcription factor 3 (E2A immunoglobulin enhancer-binding factors E12/E47) (TCF3), is a protein that in humans is encoded by the TCF3 gene.[157][158] TCF3 has been shown to directly enhance Hes1 (a well-known target of Notch signaling) expression.[159]
This gene encodes a member of the E protein (class I) family of helix-loop-helix transcription factors. The 9aaTAD transactivation domains of E proteins and MLL are very similar and both bind to the KIX domain of general transcriptional mediator CBP.[160][161] E proteins activate transcription by binding to regulatory E-box sequences on target genes as heterodimers or homodimers, and are inhibited by heterodimerization with inhibitor of DNA-binding (class IV) helix-loop-helix proteins. E proteins play a critical role in lymphopoiesis, and the encoded protein is required for B and T lymphocyte development.[162]
The consensus sequence found in the promoter of TUG1 is 5'-GTCTGGT-3'.[44]
"Maternal mRNAs are translationally regulated during early development. Zar1 and its closely related homolog, Zar2, are both crucial in early development. Xenopus laevis Zygote arrest 2 (Zar2) binds to the Translational Control Sequence (TCS) in maternal mRNAs and regulates translation."[163]
"Putative TCSs have been identified in Wee1, PCM-1 and Mos 3′ UTRs. The TCSs are slightly different: AUUAUCU (Wee1 TCS1), AUUGUCU (Wee1 TCS2) and UUUGUCU (Mos and PCM-1 TCS) [20]".[163]
X-box binding protein 1s "XBP1s binds to the [unfolded protein response] UPR element (UPRE) containing the consensus sequence TGACGTGG/A and regulates the transcription of target genes in a cell type- and condition-specific manner (Yamamoto et al., 2004)."[71]
The consensus sequence for UPRE is 5'-TGACGTG(G/A)-3'.[71]
"The helix-loop-helix transcription factor USF (upstream stimulating factor) binds to a regulatory sequence of the human insulin gene enhancer."[164]
"The regulation of insulin gene expression is dependent on sequences located upstream of the transcription start site (Clark and Docherty, 1992). Two important cis-acting elements, the insulin enhancer binding site 1 (IEBI) or NIR box and the IEB2 or FAR box, have been identified in the rat insulin I gene (Karlsson et al., 1987, 1989). Located at positions -104 (IEBI/NIR) and -233 (IEB2/FAR), these elements share an identical 8 bp sequence, GCCATCTG, which contains a consensus sequence, CANNTG, characteristic of E-box elements (Kingston, 1989). E boxes are present in enhancers from a variety of genes, including immunoglobulin and muscle-specific genes, where they interact with transcription factors containing a helix-loop-helix (HLH) dimerization domain (Murre et al., 1989)."[164]
"The IEB1 box is highly conserved among insulin genes, and is thus likely to play an important role in controlling transcription. The IEB2 site is not well conserved; in the rat insulin 2 gene the equivalent sequence is GCCACCCAGGAG, and in the human insulin gene the homologous sequence, which has been previously designated the GC2 box (Boam et al., 1990a), is GCCACCGG."[164]
"Confirmation that USF bound at the IEB2 site was obtained using an oligonucleotide containing the USF binding site from the adenovirus MLP."[164]
A general USF box consensus sequence may be 5'-GCC(A/T)NN(C/G/T)(A/G)-3'.
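The quoted insulin gene sequences can be checked against the E-box consensus CANNTG directly. In this minimal Python sketch the dictionary labels and the function name e_box_hits are hypothetical; only the sequences come from the quotations above.

```python
import re

# CANNTG reads as the same pattern on the opposite strand, so one strand suffices.
E_BOX = re.compile(r"CA[ACGT]{2}TG")

SITES = {
    "rat insulin I IEB1/IEB2": "GCCATCTG",
    "rat insulin 2 IEB2 equivalent": "GCCACCCAGGAG",
    "human insulin GC2 box": "GCCACCGG",
}

def e_box_hits(sequence: str) -> list:
    """Return (1-based start, matched text) for each E box in the sequence."""
    return [(m.start() + 1, m.group()) for m in E_BOX.finditer(sequence)]

e_box_results = {name: e_box_hits(seq) for name, seq in SITES.items()}
for name, hits in e_box_results.items():
    print(name, hits)
# rat insulin I IEB1/IEB2 [(3, 'CATCTG')]
# rat insulin 2 IEB2 equivalent []
# human insulin GC2 box []
```

Only the shared rat insulin I sequence contains a CANNTG match, consistent with the statement that the IEB2 site is not well conserved.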
"As VRI accumulates in the nucleus during the mid to late day, it binds VRI/PDP1ϵ binding sites (V/P-boxes) [consensus V box:A(/G)TTA(/T)T(/C), P box:GTAAT(/C)], to repress Clk and cry transcription (Hardin, 2004)."[134]
In the negative direction (from ZSCAN22 to A1BG) there are up to 81 V boxes, 28 to 4538 nts from ZSCAN22 with the apparent TSS at 4460 nts.
In the positive direction (from ZNF497 to A1BG) there are up to 21 V boxes, 23 to 4310 nts from ZNF497 with the known TSS at 4300 nts.
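Counts such as these depend on how the bracketed consensus is read and on whether both strands and overlapping matches are included. The following Python sketch assumes that A(/G) means A or G, so that the V box becomes [AG]TT[AT][TC] and the P box GTAA[TC]. The function names and toy sequence are hypothetical, and this is not the program that produced the counts above.

```python
import re

V_BOX = "[AG]TT[AT][TC]"  # assumed reading of A(/G)TTA(/T)T(/C)
P_BOX = "GTAA[TC]"        # assumed reading of GTAAT(/C)

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def count_hits(sequence: str, pattern: str) -> tuple:
    """Count overlapping matches on the given strand and on its reverse complement."""
    lookahead = re.compile("(?=(" + pattern + "))")  # zero-width, so overlaps count
    forward = len(lookahead.findall(sequence))
    reverse = len(lookahead.findall(reverse_complement(sequence)))
    return forward, reverse

# Hypothetical toy sequence, not taken from the A1BG region.
toy = "CCATTATGGGTAATCC"
v_counts = count_hits(toy, V_BOX)
p_counts = count_hits(toy, P_BOX)
print(v_counts)  # (1, 1)
print(p_counts)  # (1, 0)
```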
"Using the Jasper and Consite algorithms, the A/GGG/TTCAnnnA/GGG/TTCA and GA/GGTTCATnnnGTTCA sequences were considered as human and mouse VDRE consensus sequences, respectively, as previously shown.17, 18 Previous studies have suggested that regulatory VDREs could locate distally, i.e. > 1 Mb, to the transcription starting site.19 We analysed the entire genomic sequence of the human and murine HOTAIR and ANRIL genes as well as 5 kb upstream the transcription starting sites to include proximal promoter regions. Our analysis revealed two and three potential VDREs in the human HOTAIR and ANRIL genes, respectively, all of them were located within the intron 1 [...]."[165]
The "presence of WRKY TF binding sites (C/TTGACC/T, W boxes) in numerous co-regulated Arabidopsis defense gene promoters provided circumstantial evidence that zinc-finger-type WRKY factors play a broad and pivotal role in regulating defenses [10]."[166]
The W box is a DNA cis-regulatory element sequence, (T)TGAC(C/T), which is recognized by the family of WRKY transcription factors.[167][168]
"[T]he X gene core promoter element 1 ... is located between nucleotides -8 and +2 relative to the transcriptional start site (+1) and has a consensus sequence of G/A/T-G/C-G-T/C-G-G-G/A-A-G/C+1-A/C."[169]
The classical recognition motif of the AhR/ARNT complex, referred to as either the AhR-, dioxin- or xenobiotic-responsive element (AHRE, DRE or XRE), contains the core sequence 5'-GCGTG-3'[170] within the consensus sequence 5'-T/GNGCGTGA/CG/CA-3'[171][138] in the promoter region of AhR responsive genes. The AhR/ARNT heterodimer directly binds the AHRE/DRE/XRE core sequence in an asymmetric manner such that ARNT binds to 5'-GTG-3' and AhR binds 5'-TC/TGC-3'.[172][173][174] Recent research suggests that a second type of element termed AHRE-II, 5'-CATG(N6)C[T/A]TG-3', is capable of indirectly acting with the AhR/ARNT complex.[175][176]
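The XRE consensus above, like the W box and VDRE consensus sequences quoted earlier, uses bare slashes without parentheses. One possible reading, assumed here, is that each slash joins only the single bases on either side of it. A minimal Python sketch under that assumption, with the hypothetical function name slash_to_regex and a toy sequence:

```python
import re

def slash_to_regex(consensus: str) -> str:
    """Read X/Y as 'X or Y' for single bases and N (or n) as any base."""
    pattern = re.sub(
        r"[ACGT](?:/[ACGT])+",
        lambda m: "[" + m.group(0).replace("/", "") + "]",
        consensus.upper(),
    )
    return pattern.replace("N", "[ACGT]")

xre_regex = slash_to_regex("T/GNGCGTGA/CG/CA")
print(xre_regex)                    # [TG][ACGT]GCGTG[AC][GC]A
print(slash_to_regex("C/TTGACC/T")) # [CT]TGAC[CT]

# Hypothetical toy sequence containing the 5'-GCGTG-3' core.
xre_match = re.fullmatch(xre_regex, "TTGCGTGAGA")
print(xre_match is not None)        # True
```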
"Saccharomyces cerevisiae contains eight members of a novel and fungus-specific family of bZIP proteins that is defined by four atypical residues on the DNA-binding surface. Two of these proteins, Yap1 and Yap2, are transcriptional activators involved in pleiotropic drug resistance. Although initially described as AP-1 factors, at least four Yap proteins bind most efficiently to TTACTAA, a sequence that differs at position ±2 from the optimal AP-1 site (TGACTCA); further, a Yap-like derivative of the AP-1 factor Gcn4 (A239Q S242F) binds efficiently to the Yap recognition sequence."[177]
The upstream activating sequence (UAS) for Yap1p/2p is TTACTAA, which is found in genes GSH1, TRX2, YCF1, GLR1, induced by oxidative stress such as H2O2, for regulation of genes expressed in response to environmental changes.[22]
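The statement that the Yap site differs from the optimal AP-1 site at position ±2 can be checked by numbering both 7-nt sites from their central base. A minimal Python sketch follows; the function name centered_mismatches is hypothetical.

```python
def centered_mismatches(site_a: str, site_b: str) -> list:
    """Return (position, base_a, base_b) for each mismatch, with the center at 0."""
    if len(site_a) != len(site_b) or len(site_a) % 2 == 0:
        raise ValueError("sites must have the same odd length")
    center = len(site_a) // 2
    return [
        (i - center, a, b)
        for i, (a, b) in enumerate(zip(site_a, site_b))
        if a != b
    ]

YAP_SITE = "TTACTAA"
AP1_SITE = "TGACTCA"
yap_vs_ap1 = centered_mismatches(YAP_SITE, AP1_SITE)
print(yap_vs_ap1)  # [(-2, 'T', 'G'), (2, 'A', 'C')]
```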
The YY1 consensus sequences are 5'-CCATTTA-3' and 5'-CCATCTT-3'.[44]
"The HY5 protein interacts with both the G- (CACGTG) and Z- (ATACGTGT) boxes of the light-regulated promoter of RbcS1A (ribulose bisphosphate carboxylase small subunit) and the CHS (chalcone synthase) genes (Ang et al., 1998; Chattopadhyay et al., 1998; Yadav et al., 2002)."[24]
Z-boxes 1-3 contain 5'-AGGTG-3'.[178]