A foundation model is a large artificial intelligence model trained on a vast quantity of unlabeled data at scale (usually by self-supervised learning), resulting in a model that can be adapted to a wide range of downstream tasks.[1] Since their introduction in 2018, foundation models have driven a major transformation in how AI systems are built. Early examples of foundation models were large pre-trained language models such as BERT[2] and GPT-3. Subsequently, several multimodal foundation models have been produced, including DALL-E, Flamingo,[3] and Florence.[4] The term was popularized by the Stanford Institute for Human-Centered Artificial Intelligence's (HAI) Center for Research on Foundation Models (CRFM).[1]
CRFM described a foundation model as a "paradigm for building AI systems" in which a model trained on a large amount of unlabeled data can be adapted to many applications.[5][6]
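The pretrain-then-adapt workflow can be illustrated with a minimal sketch. The example below assumes the Hugging Face transformers library and the publicly released bert-base-uncased checkpoint; the two-example dataset and the hyperparameters are purely illustrative, not a recipe from any of the cited works. It adapts a pre-trained language model to a downstream sentiment-classification task by attaching a new classification head and fine-tuning on labeled data:

```python
# Minimal sketch of adapting a foundation model to a downstream task.
# Assumes the Hugging Face "transformers" library; dataset and
# hyperparameters below are illustrative placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load a model pre-trained on large unlabeled corpora via self-supervision.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,  # adaptation: a new two-class head for the downstream task
)

# A tiny, hypothetical labeled dataset for the downstream task.
texts = ["a delightful film", "a tedious mess"]
labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative
batch = tokenizer(texts, padding=True, return_tensors="pt")

# Fine-tune: a few gradient steps stand in for a full training loop.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

In practice, adaptation ranges from full fine-tuning as sketched here to lighter-weight approaches such as prompting or training only a small number of added parameters.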
History
An early concept of the foundation model is found in I. J. Good's 1965 treatise "Speculations Concerning the First Ultraintelligent Machine".[7][8] Stanley Kubrick's HAL 9000 supercomputer in his 1968 film 2001: A Space Odyssey was modeled after Good's ultraintelligent machine.[9]
Opportunities and risks
A 2021 arXiv report listed foundation models' capabilities with regard to "language, vision, robotics, reasoning, and human interaction"; technical principles, such as "model architectures, training procedures, data, systems, security, evaluation, and theory"; their applications, for example in law, healthcare, and education; and their potential impact on society, including "inequity, misuse, economic and environmental impact, legal and ethical considerations".[10]