The Genomic Standards Consortium (GSC) is an initiative working towards richer descriptions of our collection of genomes, metagenomes and marker genes. Established in September 2005,[1] this international community includes representatives from a range of major sequencing and bioinformatics centres (including NCBI, EMBL, DDBJ, JCVI, JGI, EBI, Sanger, FIG) and research institutions. The goal of the GSC is to promote mechanisms for standardizing the description of (meta)genomes, including the exchange and integration of (meta)genomic data. The number and pace of genomic and metagenomic sequencing projects will only increase as ultra-high-throughput methods become commonplace, which makes standards vital to scientific progress and data sharing.
Community-driven standards have the best chance of success if developed under the auspices of international working groups. Participants in the GSC include biologists, computer scientists, those building genomic databases and conducting large-scale comparative genomic analyses, and those with experience of building community-based standards. The mission of the GSC is to work with the wider community towards this goal.
The GSC fulfils this mission by holding face-to-face meetings, forming working groups, and building consensus products that can be widely used in this community, and by bringing together investigators working in different systems to work on a common problem.[2]
The GSC has published a "Minimum Information about a (Meta)Genome Sequence" (MIGS/MIMS) specification and has also completed a "Minimum Information about an ENvironmental Sequence" (MIENS) specification, since published as MIMARKS (Minimum Information about a MARKer gene Sequence). MIGS/MIMS/MIMARKS provides an extension of the minimum information already captured by the primary nucleotide sequence archives (the INSDC: DDBJ, ENA and GenBank). The development of any checklist must be an open and iterative process that involves a balanced group of participants. Further, if a checklist is to be adopted as a tool for standardizing a particular area of knowledge, its development must be supported by mechanisms for achieving compliance. Work towards this goal has spawned a set of interlocking GSC projects. These include the Genomic Contextual Data Markup Language (GCDML), the Genomic Rosetta Stone (GRS) and Habitat-Lite. Newer projects include the M5 project.
The GSC is interested in building links with other communities. It is engaged in ontology development within the OBO Foundry. The GSC is also a founding member community of the Minimum Information for Biological and Biomedical Investigations (MIBBI) project, an umbrella community for supporting and co-ordinating the development of checklists describing Minimum Information Standards.
The GSC and the Earth Microbiome Project maintain the Biological Observation Matrix (BIOM) file format, an open JSON-based file format for representing arbitrary observation-by-sample contingency tables with associated sample and observation metadata.[3]
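To make the idea of an observation-by-sample contingency table concrete, the following is a minimal sketch of a JSON table of this kind, read with Python's standard library only. The field names (`rows`, `columns`, `matrix_type`, `shape`, `data`) follow the JSON layout of BIOM version 1.0 as commonly documented and should be checked against the format specification. The identifiers, metadata and counts are invented for illustration, and the helper functions `to_dense` and `counts_by_sample` are hypothetical names, not part of any BIOM library.

```python
import json

# A tiny BIOM 1.0-style table: 3 observations (rows) by 2 samples (columns).
# All identifiers, metadata values and counts are made up for illustration.
biom_text = json.dumps({
    "id": None,
    "format": "Biological Observation Matrix 1.0.0",
    "format_url": "http://biom-format.org",
    "type": "OTU table",
    "generated_by": "hand-written example",
    "date": "2024-01-01T00:00:00",
    "rows": [
        {"id": "OTU_1", "metadata": {"taxonomy": ["k__Bacteria"]}},
        {"id": "OTU_2", "metadata": None},
        {"id": "OTU_3", "metadata": None},
    ],
    "columns": [
        {"id": "Sample_A", "metadata": {"env": "soil"}},
        {"id": "Sample_B", "metadata": {"env": "marine"}},
    ],
    "matrix_type": "sparse",
    "matrix_element_type": "int",
    "shape": [3, 2],
    # Sparse encoding: each entry is [row_index, column_index, value].
    "data": [[0, 0, 5], [1, 1, 2], [2, 0, 7], [2, 1, 1]],
})


def to_dense(table):
    """Return the table's counts as a list of rows (observations x samples)."""
    n_rows, n_cols = table["shape"]
    if table["matrix_type"] == "dense":
        return [list(row) for row in table["data"]]
    dense = [[0] * n_cols for _ in range(n_rows)]
    for r, c, value in table["data"]:
        dense[r][c] = value
    return dense


def counts_by_sample(table):
    """Map each sample id to a dict of observation id -> count."""
    dense = to_dense(table)
    return {
        col["id"]: {row["id"]: dense[i][j] for i, row in enumerate(table["rows"])}
        for j, col in enumerate(table["columns"])
    }


table = json.loads(biom_text)
dense = to_dense(table)
per_sample = counts_by_sample(table)
print(dense)
print(per_sample["Sample_B"])
```

The sparse encoding stores only non-zero cells, which suits typical observation tables where most observations are absent from most samples. The per-row and per-column `metadata` entries are where sample and observation annotations travel with the counts.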
The GSC maintains a list of publications on its wiki (GSC Publications). This list includes reports from all workshops, articles from the special issue of the journal OMICS on data standards, and the publications describing the MIGS/MIMS and MIMARKS specifications in the journal Nature Biotechnology (May 2008 and May 2011 respectively). The GSC has also published a series of papers, "Genomic Standards Consortium and Beyond", in the journal GigaScience.[4][2]