In the field of genetic sequencing, genotyping by sequencing (GBS) is a method for discovering single nucleotide polymorphisms (SNPs) in order to perform genotyping studies, such as genome-wide association studies (GWAS).[1] GBS uses restriction enzymes to reduce genome complexity and genotype multiple DNA samples.[2] After digestion, PCR is performed to enlarge the fragment pool, and the GBS libraries are then sequenced using next-generation sequencing technologies, usually yielding single-end reads of about 100 bp.[3] It is relatively inexpensive and has been used in plant breeding.[2] Although GBS is similar in approach to restriction-site-associated DNA sequencing (RAD-seq), the two methods differ in some substantial ways.[4][5][6]
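The complexity reduction achieved by restriction digestion can be illustrated with a small in silico digest. The following is a minimal sketch, not part of any published GBS pipeline: it assumes the enzyme ApeKI (recognition site G^CWGC, where W is A or T), models only the top strand, and uses a made-up toy sequence. The function names and the length window are hypothetical; the window is only a rough stand-in for the fragment lengths that end up being amplified and sequenced.

```python
import re

# ApeKI recognises G^CWGC (W = A or T) and cuts after the first G.
# The lookahead lets overlapping sites (e.g. GCAGCAGC) all be found.
APEKI_SITE = re.compile(r"(?=GC[AT]GC)")


def digest(sequence):
    """Cut a top-strand sequence at every ApeKI site and return the fragments."""
    sequence = sequence.upper()
    cuts = [m.start() + 1 for m in APEKI_SITE.finditer(sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


def size_select(fragments, min_len, max_len):
    """Keep fragments inside a length window (hypothetical thresholds)."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


toy_genome = "AAAGCAGCTTTTGCTGCAAAA"  # made-up sequence, not real data
fragments = digest(toy_genome)
kept = size_select(fragments, min_len=5, max_len=10)
fraction = sum(len(f) for f in kept) / len(toy_genome)

print(fragments)
print(f"{fraction:.2f} of the toy genome retained")
```

On a real genome the same idea can be used to compare candidate enzymes by the number and length distribution of the fragments they would produce.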
GBS is a robust, simple, and affordable procedure for SNP discovery and mapping. Overall, the approach uses restriction enzymes (REs) to reduce genome complexity in species with large, highly diverse genomes, enabling efficient, high-throughput, highly multiplexed sequencing. By choosing appropriate REs, repetitive regions of the genome can be avoided and lower-copy regions targeted, which reduces alignment problems in genetically highly diverse species. The method was first described by Elshire et al. (2011).[1] In summary, high-molecular-weight DNA is extracted and digested with a specific RE selected in advance for its cutting frequency[7] in the major repetitive fraction of the genome. ApeKI is the most commonly used RE. Barcoded adapters are then ligated to the sticky ends and PCR amplification is performed. The library is then sequenced with next-generation sequencing technology, yielding single-end reads of about 100 bp. Raw sequence data are filtered and aligned to a reference genome, usually with the Burrows–Wheeler Aligner (BWA) or Bowtie 2. The next step is to identify SNPs from the aligned tags and score all discovered SNPs for various coverage, depth, and genotypic statistics. Once a large-scale, species-wide SNP production run has been completed, known SNPs can be called quickly in newly sequenced samples.[8]
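The filtering step that follows sequencing can be sketched as a simple demultiplexer. This is an illustrative sketch under stated assumptions, not the code of any specific GBS pipeline: it assumes that each read begins with a sample barcode immediately followed by the remnant of the ApeKI cut site (CAGC or CTGC), that barcodes may differ in length, and that matching is exact. The barcodes, sample names, and reads below are invented for the example.

```python
from collections import defaultdict

# What remains of the G^CWGC site at the start of a fragment after ApeKI cuts.
CUT_REMNANTS = ("CAGC", "CTGC")


def demultiplex(reads, barcode_to_sample):
    """Assign reads to samples by barcode and strip the barcode.

    Reads without a known barcode, or without a cut-site remnant right
    after the barcode, are discarded.
    """
    by_sample = defaultdict(list)
    # Try longer barcodes first so that a short barcode which happens to be
    # a prefix of a longer one does not claim the read.
    barcodes = sorted(barcode_to_sample, key=len, reverse=True)
    for read in reads:
        for barcode in barcodes:
            insert = read[len(barcode):]
            if read.startswith(barcode) and insert.startswith(CUT_REMNANTS):
                by_sample[barcode_to_sample[barcode]].append(insert)
                break
    return dict(by_sample)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "sample_A", "TTGCA": "sample_B"}
reads = [
    "ACGTCAGCTTAGGA",   # barcode ACGT + remnant CAGC: kept for sample_A
    "TTGCACTGCGGATCC",  # barcode TTGCA + remnant CTGC: kept for sample_B
    "ACGTGGGGTTTT",     # barcode found but no cut-site remnant: dropped
    "GGGGCAGCAAAA",     # no known barcode: dropped
]
tags = demultiplex(reads, barcodes)
print(tags)
```

Requiring the cut-site remnant after the barcode is a cheap sanity check that a read really starts at a restriction site; a production tool would typically also handle sequencing errors in the barcode and low-quality bases.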
When initially developed, the GBS approach was tested and validated in recombinant inbred lines (RILs) from a high-resolution maize mapping population (IBM) and in doubled haploid (DH) barley lines from the Oregon Wolfe Barley (OWB) mapping population. Up to 96 ApeKI-digested DNA samples were pooled and processed simultaneously during GBS library construction, and the resulting library was sequenced on a Genome Analyzer II (Illumina, Inc.). Overall, 25,185 biallelic tags were mapped in maize, while 24,186 sequence tags were mapped in barley. Validation of the barley GBS markers using a single DH line (OWB003) showed 99% agreement between the reference markers and the mapped GBS reads. Although barley lacked a complete genome sequence at the time, GBS does not require a reference genome for sequence tag mapping; the reference is developed during the process of genotyping the samples. In the absence of a reference genome, tags can also be treated as dominant markers for alternative genetic analyses. Beyond multiplexed GBS skimming, imputation of missing SNPs has the potential to reduce GBS costs further. GBS is a versatile and cost-effective procedure that allows the genome of any species to be mined without prior knowledge of its structure.[1]
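Treating tags as dominant markers amounts to scoring each tag as present or absent in each sample. The sketch below illustrates that idea only; it is not the scoring procedure of Elshire et al. The function name, the `min_count` threshold, and the example lines and tags are hypothetical.

```python
from collections import Counter


def presence_absence(tags_by_sample, min_count=1):
    """Score every tag as present (1) or absent (0) in every sample.

    Only polymorphic tags are returned, i.e. tags present in some samples
    but not all, since monomorphic tags carry no mapping information.
    """
    counts = {s: Counter(tags) for s, tags in tags_by_sample.items()}
    samples = sorted(counts)
    all_tags = sorted(set().union(*counts.values()))
    matrix = {
        tag: [1 if counts[s][tag] >= min_count else 0 for s in samples]
        for tag in all_tags
    }
    polymorphic = {
        tag: row for tag, row in matrix.items() if 0 < sum(row) < len(row)
    }
    return samples, polymorphic


# Invented tags for two hypothetical lines.
tags_by_sample = {
    "line1": ["CAGCAAA", "CAGCAAA", "CTGCTTT"],
    "line2": ["CAGCAAA", "CTGCGGG"],
}
samples, markers = presence_absence(tags_by_sample)
print(samples)
print(markers)
```

With low-coverage data an absent tag may simply not have been sampled, so a real analysis would need a more careful threshold or a probabilistic treatment than this 0/1 scoring.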