A single-nucleotide polymorphism (SNP, pronounced "snip") is a DNA sequence variation in which a single nucleotide (A, T, C, or G) in the genome (or other shared sequence) differs between members of a species (or between paired chromosomes in an individual). For example, two sequenced DNA fragments from different individuals, AAGCCTA and AAGCTTA, differ in a single nucleotide. In this case we say that there are two alleles: C and T. Almost all common SNPs have only two alleles.
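The fragment comparison above can be sketched in a few lines of Python. This is only an illustration: the function name `single_nucleotide_differences` is made up for this example, and the sketch assumes the two fragments are already aligned and of equal length.

```python
def single_nucleotide_differences(seq_a, seq_b):
    """Return (position, base_a, base_b) for every mismatching position.

    Assumes the two sequences are already aligned and have equal length;
    positions are reported 1-based.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# The two fragments from the text differ at one position, with alleles C and T.
print(single_nucleotide_differences("AAGCCTA", "AAGCTTA"))  # [(5, 'C', 'T')]
```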
Within a population, SNPs can be assigned a minor allele frequency: the lowest allele frequency at a locus that is observed in a particular population. For a SNP with two alleles, this is simply the lesser of the two allele frequencies.[1] Allele frequencies vary between human populations, so a SNP allele that is common in one geographical or ethnic group may be much rarer in another.
In the past, only variants with a minor allele frequency of greater than or equal to 1% (or 0.5%, etc.) were given the title "SNP".[1] Some used "mutation" to refer to variations with low allele frequency. With the advent of modern bioinformatics and a better understanding of evolution, this distinction is no longer necessary; for example, a database such as dbSNP includes "SNPs" that have an allele frequency lower than one percent.[2]
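As a rough illustration of the definition, the following Python sketch computes a minor allele frequency from a list of observed alleles at one locus. The function name and the sample data are hypothetical, invented for this example; the sketch follows the definition given here (the lowest observed allele frequency) and is not taken from any particular tool.

```python
from collections import Counter


def minor_allele_frequency(alleles):
    """Return the lowest allele frequency observed at a locus.

    `alleles` is an iterable of observed alleles, one entry per
    chromosome sampled (hypothetical input format). A locus with only
    one observed allele is reported as 0.0.
    """
    counts = Counter(alleles)
    if len(counts) < 2:
        return 0.0
    return min(counts.values()) / sum(counts.values())


# Made-up samples: the same SNP in two hypothetical populations.
population_a = ["C"] * 14 + ["T"] * 6   # T is the minor allele
population_b = ["C"] * 19 + ["T"] * 1   # T is much rarer here
print(minor_allele_frequency(population_a))  # 0.3
print(minor_allele_frequency(population_b))  # 0.05
```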
Single-nucleotide polymorphisms may fall within coding sequences of genes, non-coding regions of genes, or in the intergenic regions between genes. SNPs within a coding sequence will not necessarily change the amino acid sequence of the protein that is produced, due to degeneracy of the genetic code. A SNP in which both forms lead to the same polypeptide sequence is termed synonymous (sometimes called a silent mutation); if a different polypeptide sequence is produced, it is termed nonsynonymous. A nonsynonymous change may be either missense or nonsense: a missense change results in a different amino acid, while a nonsense change results in a premature stop codon. SNPs that are not in protein-coding regions may still have consequences for gene splicing, transcription factor binding, or the sequence of non-coding RNA.
Variations in the DNA sequences of humans can affect how humans develop diseases and respond to pathogens, chemicals, drugs, vaccines, and other agents. SNPs are also thought to be key enablers in realizing the concept of personalized medicine.[3] However, their greatest importance in biomedical research is for comparing regions of the genome between cohorts (such as matched cohorts with and without a disease).
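The synonymous, missense, and nonsense categories can be illustrated with a small Python sketch that translates a reference codon and a variant codon using the standard genetic code. The function name `classify_coding_snp` and the example codons are chosen for illustration only; the sketch assumes DNA codons in the coding-strand reading frame and a reference codon that is not itself a stop codon.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon, alt_codon):
    """Classify a single-base codon change as synonymous, missense, or nonsense."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if sum(a != b for a, b in zip(ref_codon, alt_codon)) != 1:
        raise ValueError("codons must differ at exactly one position")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == "*":
        # Simplification: changes to an existing stop codon are out of scope.
        raise ValueError("reference codon is a stop codon")
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"  # premature stop codon
    return "missense"      # different amino acid


print(classify_coding_snp("GAA", "GAG"))  # synonymous (Glu -> Glu)
print(classify_coding_snp("GAA", "GTA"))  # missense   (Glu -> Val)
print(classify_coding_snp("GAA", "TAA"))  # nonsense   (Glu -> stop)
```

The codon lookup table makes the degeneracy of the genetic code explicit: several codons map to the same amino acid, which is why some coding SNPs leave the protein unchanged.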
The study of single-nucleotide polymorphisms is also important in crop and livestock breeding programs (see genotyping). See SNP genotyping for details on the various methods used to identify SNPs.
Examples of SNPs are rs6311 and rs6313 in the HTR2A gene. A SNP in the F5 gene gives rise to the Factor V Leiden variant, which causes a hypercoagulability disorder. An example of a triallelic SNP is rs3091244.[4]
As with genes, there are bioinformatics databases for SNPs. dbSNP is a SNP database from the National Center for Biotechnology Information (NCBI). SNPedia is a wiki-style database from a private company. The OMIM database describes associations between polymorphisms and, for example, diseases.
The nomenclature for SNPs can be confusing: several variations can exist for an individual SNP and consensus has not yet been achieved. One approach is to write SNPs with a prefix, a period, and a greater-than sign separating the wild-type and altered nucleotide or amino acid; for example, c.76A>T.[5][6][7]
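To show how the prefix, period, and greater-than notation breaks down, here is a minimal Python sketch that parses a string such as c.76A>T into its parts. The pattern and the function name `parse_substitution` are assumptions made for this example; they cover only the simple substitution form shown above, not the full range of notations in use.

```python
import re

# prefix, period, position, wild-type symbol, '>', altered symbol
SUBSTITUTION = re.compile(
    r"^(?P<prefix>[a-z])\.(?P<position>\d+)(?P<ref>[A-Za-z]+)>(?P<alt>[A-Za-z]+)$"
)


def parse_substitution(text):
    """Split a simple substitution such as 'c.76A>T' into its components.

    Returns a dict, or None if the string does not match this simple form.
    """
    match = SUBSTITUTION.match(text.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["position"] = int(fields["position"])
    return fields


print(parse_substitution("c.76A>T"))
# {'prefix': 'c', 'position': 76, 'ref': 'A', 'alt': 'T'}
```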