An energy-based model (EBM) is a form of generative model (GM) imported directly from statistical physics to learning. GMs learn an underlying data distribution by analyzing a sample dataset. Once trained, a GM can produce other datasets that also match the data distribution.[1] EBMs provide a unified framework for many probabilistic and non-probabilistic approaches to such learning, particularly for training graphical and other structured models.[2]
An EBM learns the characteristics of a target dataset and generates a similar but larger dataset. EBMs detect the latent variables of a dataset and generate new datasets with a similar distribution.[2]
Target applications include natural language processing, robotics and computer vision.[2]
The term "energy-based models" was coined in a JMLR paper[3] in which the authors defined a generalisation of independent component analysis to the overcomplete setting using EBMs. Other early work on EBMs proposed models that represented energy as a composition of latent and observable variables. EBMs surfaced in 2003.[4]
EBMs capture dependencies by associating a scalar energy (an unnormalized measure of compatibility, with lower values indicating more plausible configurations) with each configuration of the observed and latent variables. Inference consists of finding the values of the latent variables that minimize the energy given the values of the observed variables. Similarly, the model learns a function that assigns low energies to correct values of the latent variables and higher energies to incorrect values.[2]
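The inference step described above can be illustrated with a minimal sketch. The quadratic energy function, the weight `w`, and the grid of candidate values are all hypothetical choices made for illustration; they are not taken from the cited sources.

```python
import numpy as np

def energy(x, y, w=2.0):
    """Hypothetical energy: low when the latent value y is close to w * x."""
    return (y - w * x) ** 2

def infer(x, candidates):
    """Return the candidate latent value with the lowest energy for observed x."""
    energies = np.array([energy(x, y) for y in candidates])
    return candidates[int(np.argmin(energies))]

# Assumed search space: a coarse grid of possible latent values.
candidates = np.linspace(-5.0, 5.0, 101)
y_hat = infer(1.5, candidates)  # picks the grid point closest to 2.0 * 1.5
```

An exhaustive search over a grid is only practical for very small problems; it is used here because it makes the "choose the lowest-energy configuration" rule explicit.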
Traditional EBMs rely on stochastic gradient descent (SGD) optimization methods that are typically hard to apply to high-dimensional datasets. In 2019, OpenAI publicized a variant that instead used Langevin dynamics (LD). LD is an iterative optimization algorithm that introduces noise to the estimator as part of learning an objective function. It can be used for Bayesian learning scenarios by producing samples from a posterior distribution.[2]
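As a rough illustration of noisy gradient updates of this kind, the following sketch applies a common unadjusted Langevin update to a toy one-dimensional energy. The energy E(x) = x²/2, the step size, the number of steps, and the function names are assumptions made for this example and do not reproduce the OpenAI implementation.

```python
import numpy as np

def grad_energy(x):
    # Gradient of the assumed toy energy E(x) = x**2 / 2.
    return x

def langevin_sample(x0, n_steps=200, step=0.1, seed=0):
    """Run independent Langevin chains, one per entry of x0."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # Gradient step toward lower energy plus Gaussian noise.
        x = x - 0.5 * step * grad_energy(x) + np.sqrt(step) * noise
    return x

# 5000 chains, all started far from the low-energy region.
samples = langevin_sample(np.full(5000, 3.0))
```

For this particular energy the chains should settle around zero with a variance close to one, up to a small bias introduced by the finite step size.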
EBMs do not require that energies be normalized as probabilities. In other words, energies do not need to sum to 1. Since there is no need to estimate the normalization constant, as probabilistic models must, certain forms of inference and learning with EBMs are more tractable and flexible.[2]
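One common way to relate energies to probabilities, assumed here for illustration rather than taken from the cited source, is the Gibbs (Boltzmann) form. Under that assumption the normalization constant Z cancels whenever two configurations are compared, which is why it need not be computed:

```latex
\begin{align*}
p(x) &= \frac{\exp\bigl(-E(x)\bigr)}{Z}, &
Z &= \int \exp\bigl(-E(x')\bigr)\,dx', \\
\frac{p(x_1)}{p(x_2)} &= \exp\bigl(E(x_2) - E(x_1)\bigr).
\end{align*}
```

The ratio depends only on the energy difference, so ranking configurations or finding the lowest-energy one never requires evaluating the integral.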
Samples are generated implicitly via a Markov chain Monte Carlo approach.[5] A replay buffer of past images is used with LD to initialize the optimization module.[2]
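The role of the replay buffer can be sketched as follows. This is a simplified illustration with a fixed toy energy: the reuse probability, buffer size, batch shape, and all function names are hypothetical, and in actual training the energy function would be updated between sampling rounds.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_energy(x):
    # Gradient of the assumed toy energy E(x) = ||x||**2 / 2.
    return x

def langevin(x, n_steps=20, step=0.1):
    """Refine a batch of samples with a few noisy gradient steps."""
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - 0.5 * step * grad_energy(x) + np.sqrt(step) * noise
    return x

def sample_with_buffer(buffer, batch_size=64, dim=2,
                       reuse_prob=0.95, max_size=1000):
    """Start chains mostly from stored samples, otherwise from fresh noise."""
    fresh = rng.uniform(-1.0, 1.0, size=(batch_size, dim))
    if len(buffer) > 0:
        stored = buffer[rng.integers(0, len(buffer), size=batch_size)]
        reuse = rng.random(batch_size) < reuse_prob
        init = np.where(reuse[:, None], stored, fresh)
    else:
        init = fresh
    samples = langevin(init)
    # Keep only the most recent max_size samples.
    buffer = np.concatenate([buffer, samples])[-max_size:]
    return samples, buffer

buffer = np.empty((0, 2))
for _ in range(30):
    samples, buffer = sample_with_buffer(buffer)
```

Restarting from stored samples lets each chain continue from a point that earlier rounds had already refined, so fewer steps are needed per round than when every chain starts from noise.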
EBMs demonstrate useful properties:[2]
On image datasets such as CIFAR-10 and ImageNet 32x32, an EBM generated high-quality images relatively quickly. It supported combining features learned from one type of image to generate other types of images. It was able to generalize to out-of-distribution datasets, outperforming flow-based and autoregressive models. When trained for classification, the EBM was relatively resistant to adversarial perturbations, behaving better than models explicitly trained against them.[2]
EBMs compete with techniques such as variational autoencoders (VAEs) and generative adversarial networks (GANs).[2]