Promoter activity is a term that encompasses several meanings related to gene expression from regulatory sequences, namely promoters[2] and enhancers.[3] Gene expression is commonly characterized by how much, how fast, when and where it happens.[4] Promoters and enhancers are required for controlling where and when a specific gene is transcribed.[3]
Traditionally, measuring gene products (i.e. mRNA, proteins, etc.) has been the main approach to measuring promoter activity. However, this method faces two issues: the stochastic nature of gene expression[5] and the lack of a mechanistic interpretation of the thermodynamic process involved in promoter activation.[4]
Recent developments in metabolomics, a product of advances in next-generation sequencing technologies and molecular structural analysis, have enabled more accurate models of promoter activation (e.g. the sigma structure of the polymerase holoenzyme domains[6]) and a better understanding of the complexities of the regulatory factors involved.
Binding is central in determining the "strength" of a promoter, that is, a relative estimate of how "well" the promoter drives the expression of a gene under specific circumstances. Brewster et al.,[7] using a simple thermodynamic model based on the postulate that transcriptional activity is proportional to the probability of finding RNA polymerase bound at the promoter, obtained predictions of the scaling of the RNA polymerase binding energy. These models support the relationship between the probability of binding and the output of gene expression.[7]
The problem of gene regulation can be represented mathematically as the probability that n molecules (RNAP, activators, repressors and inducers) are bound to their target regions.[4][2]
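To compute the probability that the promoter is bound, it is necessary to sum the Boltzmann weights over all possible states of the polymerase molecules on DNA.[8] In this derivation, $P$ is the effective number of RNAP molecules available for binding to the promoter.
This approach is based on the statistical thermodynamics of two possible microscopic outcomes:[4] either the promoter is unoccupied and all $P$ polymerase molecules are distributed among the non-specific sites, or one polymerase occupies the promoter and the remaining $P-1$ molecules are distributed among the non-specific sites.
The statistical weight of the promoter-unoccupied state, $Z(P)$, is defined as:
$$Z(P) = \frac{N_{NS}!}{P!\,(N_{NS}-P)!}\; e^{-P\,\varepsilon^{NS}_{pd}/k_{B}T}$$
where the first term is the number of ways of distributing $P$ polymerase molecules among the $N_{NS}$ available non-specific sites, and the second term is the Boltzmann weight, in which $\varepsilon^{NS}_{pd}$ is the average binding energy of RNA polymerase to the genomic background (non-specific sites).
The total statistical weight, $Z_{tot}$, can then be written as the sum of the $Z(P)$ state and the state with RNA polymerase on the promoter:
$$Z_{tot} = Z(P) + Z(P-1)\, e^{-\varepsilon^{S}_{pd}/k_{B}T}$$
where $\varepsilon^{S}_{pd}$ is the binding energy of RNA polymerase on the promoter (the S stands for specific site).
Finally, to find the probability of RNA polymerase binding ($p_{bound}$) to a specific promoter, we divide the weight of the promoter-occupied state, $Z(P-1)\, e^{-\varepsilon^{S}_{pd}/k_{B}T}$, by $Z_{tot}$. Since $Z(P)/Z(P-1) = \frac{N_{NS}-P+1}{P}\, e^{-\varepsilon^{NS}_{pd}/k_{B}T} \approx \frac{N_{NS}}{P}\, e^{-\varepsilon^{NS}_{pd}/k_{B}T}$ when $N_{NS} \gg P$, this produces:
$$p_{bound} = \frac{1}{1 + \frac{N_{NS}}{P}\, e^{\Delta\varepsilon_{pd}/k_{B}T}}$$
where
$$\Delta\varepsilon_{pd} = \varepsilon^{S}_{pd} - \varepsilon^{NS}_{pd}$$
An important result of this model is that any transcription factor, regulator or perturbation can be introduced as a term multiplying $P$ in the probability of binding equation. This term, $F_{reg}$ (here called the regulation factor), modifies the probability of binding to:
$$p_{bound} = \frac{1}{1 + \frac{N_{NS}}{P\,F_{reg}}\, e^{\Delta\varepsilon_{pd}/k_{B}T}}$$
where $F_{reg}$ is the term for transcription factors, with $F_{reg} > 1$ for an increase and $F_{reg} < 1$ for a decrease in the number of RNA polymerase molecules available to bind.
This result is significant because all possible configurations of transcription factors can be represented mathematically by deriving different models to estimate $F_{reg}$ (for further developments, see also [4]).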
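As a numerical illustration of this model, the following Python sketch evaluates the probability of binding in two ways: directly from the two statistical weights, and from the closed form p_bound = 1 / (1 + N_NS / (P · F_reg) · exp(Δε / k_B T)), which assumes far more non-specific sites than polymerase molecules. The function names and all parameter values are assumptions chosen for illustration, not measurements from the cited works, and energies are expressed in units of k_B T.

```python
import math


def log_weight(p, n_ns, eps_ns):
    """Return ln Z(p): p polymerases spread over n_ns non-specific sites.

    Energies are in units of k_B*T. lgamma replaces the factorials so that
    large site counts do not overflow.
    """
    log_combinations = (
        math.lgamma(n_ns + 1) - math.lgamma(p + 1) - math.lgamma(n_ns - p + 1)
    )
    return log_combinations - p * eps_ns


def p_bound_exact(p, n_ns, eps_s, eps_ns):
    """Promoter occupancy computed directly from the two statistical weights."""
    log_empty = log_weight(p, n_ns, eps_ns)
    log_occupied = log_weight(p - 1, n_ns, eps_ns) - eps_s
    return 1.0 / (1.0 + math.exp(log_empty - log_occupied))


def p_bound(p, n_ns, delta_eps, f_reg=1.0):
    """Closed form for n_ns >> p; f_reg rescales the effective polymerase number."""
    return 1.0 / (1.0 + (n_ns / (p * f_reg)) * math.exp(delta_eps))


# Illustrative values only (assumptions, not measurements).
N_NS = 5_000_000   # number of non-specific sites
P = 1_000          # available RNA polymerase molecules
EPS_NS = -3.0      # non-specific binding energy, in k_B*T
EPS_S = -9.0       # promoter binding energy, in k_B*T

exact = p_bound_exact(P, N_NS, EPS_S, EPS_NS)
approx = p_bound(P, N_NS, EPS_S - EPS_NS)
print(f"exact: {exact:.6f}  closed form: {approx:.6f}")

# F_reg > 1 mimics activation, F_reg < 1 mimics repression.
for f_reg in (0.1, 1.0, 10.0):
    occupancy = p_bound(P, N_NS, EPS_S - EPS_NS, f_reg)
    print(f"F_reg = {f_reg:>4}: p_bound = {occupancy:.4f}")
```

With these inputs the two calculations should agree closely, because the closed form only replaces N_NS − P + 1 with N_NS. Working with logarithms of the weights is a design choice to keep the factorial terms within floating-point range.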
The process of activation and binding in eukaryotes differs from that in bacteria in the way that specific DNA elements bind the factors needed for a functional pre-initiation complex. In bacteria there is a single polymerase, containing catalytic subunits and a single regulatory subunit known as sigma, that transcribes the different types of genes.[9]
In eukaryotes, transcription is performed by three different RNA polymerases: RNA polymerase I for ribosomal RNAs (rRNAs), RNA polymerase II for messenger RNAs (mRNAs) and some small regulatory RNAs, and RNA polymerase III for small RNAs such as transfer RNAs (tRNAs). Positioning RNA polymerase II and the transcriptional machinery requires the recognition of a region known as the "core promoter".[9] The elements that can be found in the core promoter include the TATA element, the TFIIB recognition element (BRE), the initiator (Inr), and the downstream core promoter element (DPE).[10] Promoters in eukaryotes contain one or more of these core promoter elements (but none of them is absolutely essential for promoter function).[9] These elements are binding sites for subunits of the transcriptional machinery and are involved in the initiation of transcription, but they also have some specific enhancer functions.[10] In addition, promoter activity in eukaryotes involves further complexity in how signals from distal factors are integrated with the core promoter.[11]
Unlike in protein-coding regions, where the assumption of sequence conservation among functionally homologous genes has frequently been confirmed, there is no clear relationship between sequence conservation and function for regulatory regions.[12] Transcriptional promoter regions are under less stringent selection and therefore have higher substitution rates, allowing transcription factor binding sites to be replaced easily by new ones arising from random mutations.[12] Despite these sequence changes, the functions of regulatory sequences remain largely conserved.[12]
In recent years, with the increasing availability of genome sequences, phylogenetic footprinting has opened the possibility of identifying cis-elements and then studying their evolution. In this regard, Raijman et al.[13] and Dermitzakis et al.[14] have developed techniques for analyzing evolutionary processes in transcription factor regions in Saccharomyces species promoters and mammalian regulatory networks, respectively.
The basis for many of these evolutionary changes in nature is probably related to events within the cis-regulatory regions involved in gene expression.[15] Variation in regulatory regions is important for disease risk[14] because of its impact on the level of gene expression. Furthermore, perturbations in the binding properties of proteins encoded by regulatory genes have been linked to phenotypic effects such as duplicated structures, homeotic transformations and novel morphologies.[15]
The measurement of promoter activity has a broad meaning. Promoter activity can be measured for different situations or research questions,[4] such as how much, how fast, when and where a gene is expressed.
Methods to study promoter activity are commonly based on the expression of a reporter gene from the promoter of the gene of interest.[16][2][17] Mutations and deletions are made in a promoter region, and the resulting changes in the expression of the coupled reporter gene are measured.[18]
The most important reporter genes are fluorescent proteins such as GFP. These reporters make it possible to measure promoter activation as an increase in the fluorescent signal, and deactivation as a decrease in the rate of fluorescence.[19]
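One way to turn such fluorescence readings into a number is sketched below in Python. It assumes a convention that is not stated in the text above: promoter activity is taken as the rate of change of reporter fluorescence divided by culture density (optical density). The function name `promoter_activity` and the time series are hypothetical and serve only as an illustration.

```python
def promoter_activity(times, fluorescence, optical_density):
    """Estimate promoter activity on each interval between measurements.

    Assumed convention: activity = (change in fluorescence / change in time)
    divided by the mean optical density over the interval.
    """
    if not (len(times) == len(fluorescence) == len(optical_density)):
        raise ValueError("all input series must have the same length")
    if len(times) < 2:
        raise ValueError("at least two time points are required")

    activity = []
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        if dt <= 0:
            raise ValueError("time points must be strictly increasing")
        d_fluorescence = fluorescence[i + 1] - fluorescence[i]
        mean_od = 0.5 * (optical_density[i] + optical_density[i + 1])
        activity.append(d_fluorescence / dt / mean_od)
    return activity


# Made-up time series (minutes, arbitrary fluorescence units, OD).
times = [0, 10, 20, 30, 40]
fluorescence = [100, 150, 250, 400, 450]
optical_density = [0.10, 0.12, 0.15, 0.19, 0.22]

for start, value in zip(times, promoter_activity(times, fluorescence, optical_density)):
    print(f"from t = {start:>2} min: activity = {value:.1f}")
```

In this sketch, a rising value between consecutive intervals would be read as promoter activation and a falling value as deactivation. Real measurements usually also need background subtraction and smoothing, which are omitted here.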
The RNA world hypothesis assumes that very early in evolution, prior to the emergence of DNA as a genetic material and prior to the emergence of protein enzymes, RNA was the key player in the emergence of life.[20] A central idea in this hypothesis is an RNA replicase (ribozyme) that is capable of copying its own genome.[21] A holopolymerase ribozyme has been engineered that uses a sigma factor-like specificity primer to recognize an RNA promoter sequence.[22] In a second step, this ribozyme can then rearrange to a processive form that can polymerize from certain RNA promoters and not others.[22]