The IUPAC International Chemical Identifier (InChI), developed by IUPAC and NIST, is a digital equivalent of the IUPAC name for any particular covalent compound. Chemical structures are expressed in terms of five layers of information: connectivity, tautomeric, isotopic, stereochemical, and electronic. The stated aim of the InChI is to provide a standard way to structure and encode molecular information.[1]
The InChI algorithm converts input structural information into the InChI identifier in a three-step process: normalization (to remove redundant information), canonicalization (to generate a unique set of atom labels), and serialization (to give a string of characters).
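The three steps can be illustrated with a deliberately simplified toy in Python. This is not the InChI algorithm: the function names, the brute-force canonical labelling, and the output format are all assumptions made for illustration, and the approach only works for very small molecules. It shows why canonicalization matters: two inputs that number the same atoms differently end up with the same string.

```python
from itertools import permutations


def normalize(bonds):
    """Remove redundant information: duplicate bonds, bond direction, self-loops."""
    return {frozenset(bond) for bond in bonds if bond[0] != bond[1]}


def canonicalize(elements, bonds):
    """Toy canonical labelling: try every atom ordering and keep the
    lexicographically smallest description. Feasible only for tiny graphs."""
    best = None
    for perm in permutations(range(len(elements))):
        # perm[new_index] == old_index
        new_of_old = {old: new for new, old in enumerate(perm)}
        labels = tuple(elements[old] for old in perm)
        edges = tuple(sorted(
            tuple(sorted(new_of_old[atom] for atom in bond)) for bond in bonds
        ))
        candidate = (labels, edges)
        if best is None or candidate < best:
            best = candidate
    return best


def serialize(canonical):
    """Turn the canonical description into a string (hypothetical format)."""
    labels, edges = canonical
    connections = ",".join(f"{a + 1}-{b + 1}" for a, b in edges)
    return "".join(labels) + "/" + connections


def toy_identifier(elements, bonds):
    return serialize(canonicalize(elements, normalize(bonds)))


# The heavy atoms of ethanol, entered with two different atom numberings
# (the second input also repeats one bond in the opposite direction).
first = toy_identifier(["C", "C", "O"], [(0, 1), (1, 2)])
second = toy_identifier(["O", "C", "C"], [(1, 0), (0, 1), (2, 1)])
print(first == second)  # prints True
```

The real algorithm uses far more efficient canonical labelling and handles hydrogens, charges, stereochemistry, and isotopes, none of which this sketch attempts.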
The InChIKey, sometimes referred to as a hashed InChI, is a fixed-length (25-character) condensed digital representation of the InChI. It was released in September 2007 to facilitate web searches for chemical compounds, since these were problematic with the full-length InChI.[2]
Ethanol (CH3CH2OH): InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3
L-ascorbic acid: InChI=1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-10H,1H2/t2-,5+/m0/s1
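There are six InChI layer types: the main layer, the charge layer, the stereochemical layer, the isotopic layer, the fixed-H layer, and the reconnected layer.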
Each layer can be split into sub-layers. For example, the main layer can be split up into three sub-layers: the chemical formula (no prefix), atom connections (prefix "c"), and hydrogen atoms (prefix "h").
Layers and sub-layers are both separated by the "/" delimiter. All layers and sub-layers (except for the chemical formula sub-layer of the main layer) start with a lower-case letter indicating the type of information held in that layer.
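The delimiter and prefix rules can be sketched as a small Python function. The name `split_inchi` is hypothetical, and the sketch assumes the string has a version and a chemical formula as its first two segments, which holds for the examples in this article but not for every valid InChI. Sub-layers are returned as a list of pairs because a prefix letter may occur more than once in a longer identifier.

```python
def split_inchi(inchi):
    """Split an InChI string into (version, formula, [(prefix, content), ...])."""
    head, sep, body = inchi.partition("=")
    if head != "InChI" or not sep:
        raise ValueError("not an InChI string: missing 'InChI=' prefix")
    parts = body.split("/")
    if len(parts) < 2:
        raise ValueError("expected at least a version and a formula")
    version, formula = parts[0], parts[1]
    # Every remaining segment starts with a lower-case letter naming its type.
    sublayers = [(segment[0], segment[1:]) for segment in parts[2:] if segment]
    return version, formula, sublayers


version, formula, sublayers = split_inchi("InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3")
print(version, formula)
for prefix, content in sublayers:
    print(prefix, content)
```

For ethanol this yields the formula C2H6O followed by a "c" (connections) segment and an "h" (hydrogens) segment.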
The condensed, 25-character InChIKey is a hashed version of the full InChI, designed to allow easy web searches for chemical compounds.[2] Up to 2007, most chemical structures on the Web were represented as GIF files, which are not searchable for chemical content. The full InChI turned out to be too lengthy for easy searching, and therefore the InChIKey was developed. There is a very small but nonzero chance of two different molecules having the same InChIKey; the probability of duplication of just the first 14 characters has been estimated as one duplication in 75 databases, each containing one billion unique structures. With all databases currently holding fewer than 50 million structures, such duplication appears unlikely at present.
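The duplication estimate can be checked with a back-of-the-envelope birthday calculation. The sketch below assumes that the first 14 characters encode a 65-bit hash and that hash values are uniformly distributed; neither assumption is stated in the text above, and the function name is hypothetical.

```python
def expected_collisions(n_structures, hash_bits):
    """Expected number of colliding pairs among n uniformly hashed items."""
    pairs = n_structures * (n_structures - 1) // 2
    return pairs / 2 ** hash_bits


# Assumption: the 14-character block carries 65 bits of hash.
per_database = expected_collisions(10 ** 9, 65)
databases_per_collision = 1 / per_database
print(round(databases_per_collision))
```

Under these assumptions the result is of the same order as the one-in-75 figure quoted above. Repeating the calculation with 50 million structures gives a far smaller collision expectation, since the number of pairs grows with the square of the database size.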
Morphine has the structure shown on the right.
The InChI for morphine is InChI=1/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3/t10-,11-,13-,16-,17-/m0/s1
whereas its InChIKey is simply BQJCRHHNABKAKU-XKUOQXLYBY.[3]
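Because the key has a fixed shape, a quick format check is easy to write. The pattern below is inferred from the morphine example alone (14 upper-case letters, a hyphen, then 10 upper-case letters, 25 characters in total) and should be treated as an assumption; it tests only the shape of the string, not whether the key corresponds to a real structure.

```python
import re

# Assumed shape of the 25-character key, inferred from the example above.
INCHIKEY_25 = re.compile(r"[A-Z]{14}-[A-Z]{10}")


def looks_like_inchikey(text):
    """Return True if text has the assumed 25-character InChIKey shape."""
    return INCHIKEY_25.fullmatch(text) is not None


print(looks_like_inchikey("BQJCRHHNABKAKU-XKUOQXLYBY"))
print(looks_like_inchikey("InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3"))
```

A check like this can help a search tool decide whether a query string should be treated as a key or as a full identifier.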