Bayesian optimization is a sequential design strategy for global optimization of black-box functions[1][2][3] that does not assume any functional forms. It is usually employed to optimize expensive-to-evaluate functions.
The term is generally attributed to Jonas Mockus, who coined it in a series of publications on global optimization in the 1970s and 1980s.[4][5][1]
Strategy
Bayesian optimization of a function (black) with Gaussian processes (purple). Three acquisition functions (blue) are shown at the bottom.[6]
Bayesian optimization is typically used on problems of the form [math]\displaystyle{ \max_{x \in A} f(x) }[/math], where [math]\displaystyle{ A }[/math] is a set of points [math]\displaystyle{ x }[/math] of modest dimension (typically [math]\displaystyle{ \mathbb{R}^d }[/math] with [math]\displaystyle{ d \le 20 }[/math]) whose membership can easily be evaluated. Bayesian optimization is particularly advantageous for problems where [math]\displaystyle{ f(x) }[/math] is difficult to evaluate due to its computational cost. The objective function [math]\displaystyle{ f }[/math] is continuous but its structure is unknown, so it is referred to as a "black box": an evaluation returns only the value [math]\displaystyle{ f(x) }[/math], and no derivatives are observed.[7]
Since the objective function is unknown, the Bayesian strategy is to treat it as a random function and place a prior over it. The prior captures beliefs about the behavior of the function. After gathering the function evaluations, which are treated as data, the prior is updated to form the posterior distribution over the objective function. The posterior distribution, in turn, is used to construct an acquisition function (often also referred to as infill sampling criteria) that determines the next query point.
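When the prior over [math]\displaystyle{ f }[/math] is a zero-mean Gaussian process (the most common choice, discussed below) with kernel [math]\displaystyle{ k }[/math] and observation noise variance [math]\displaystyle{ \sigma_n^2 }[/math], this posterior update has a closed form. After observing [math]\displaystyle{ n }[/math] points [math]\displaystyle{ x_1, \dots, x_n }[/math] with values [math]\displaystyle{ \mathbf{y} }[/math], the posterior at a candidate point [math]\displaystyle{ x }[/math] is Gaussian with mean and variance
[math]\displaystyle{ \mu_n(x) = \mathbf{k}(x)^\top (K + \sigma_n^2 I)^{-1} \mathbf{y}, \qquad \sigma_n^2(x) = k(x, x) - \mathbf{k}(x)^\top (K + \sigma_n^2 I)^{-1} \mathbf{k}(x), }[/math]
where [math]\displaystyle{ K_{ij} = k(x_i, x_j) }[/math] and [math]\displaystyle{ \mathbf{k}(x)_i = k(x, x_i) }[/math]. The acquisition function is then computed from this posterior mean and variance.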
There are several methods used to define the prior/posterior distribution over the objective function. The most common approach uses Gaussian processes, in a method known as kriging. Another, less expensive, method uses a Tree-structured Parzen Estimator to construct two distributions, one for 'high' points and one for 'low' points, and then finds the location that maximizes the expected improvement.[8]
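The following is a minimal sketch of the resulting loop, assuming a Gaussian process surrogate fit with scikit-learn's GaussianProcessRegressor and an expected-improvement acquisition maximized over random candidate points; the objective function, bounds, and evaluation budget here are illustrative placeholders, not part of any standard implementation.

```python
# Sketch of a Gaussian-process-based Bayesian optimization loop.
# The objective below stands in for an expensive black-box function f(x).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    # Placeholder for the expensive black-box function.
    return -(x - 2.0) ** 2 + 10.0

bounds = (-5.0, 5.0)
rng = np.random.default_rng(0)

# Initial design: a few random evaluations of f.
X = rng.uniform(*bounds, size=(5, 1))
y = np.array([objective(x[0]) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

def expected_improvement(X_cand, gp, y_best):
    # Expected improvement for maximization, from the GP posterior mean/std.
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - y_best) / sigma
    return (mu - y_best) * norm.cdf(z) + sigma * norm.pdf(z)

for _ in range(20):
    gp.fit(X, y)                                   # posterior over f given the data so far
    X_cand = rng.uniform(*bounds, size=(1000, 1))  # random candidate query points
    ei = expected_improvement(X_cand, gp, y.max())
    x_next = X_cand[np.argmax(ei)]                 # point maximizing the acquisition
    y_next = objective(x_next[0])                  # one expensive evaluation
    X = np.vstack([X, [x_next]])
    y = np.append(y, y_next)

print("best x:", X[np.argmax(y)], "best f(x):", y.max())
```

In practice the acquisition function is usually maximized with a dedicated optimizer (for example, multi-start gradient ascent) rather than random candidate sampling, but the structure of the loop is the same: fit the surrogate, maximize the acquisition, evaluate the objective, and update the data.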
Standard Bayesian optimization relies upon each [math]\displaystyle{ x \in A }[/math] being easy to evaluate, and problems that deviate from this assumption are known as exotic Bayesian optimization problems. A problem becomes exotic when, for example, the evaluations are noisy, evaluations are performed in parallel, the quality of an evaluation involves a tradeoff between difficulty and accuracy, random environmental conditions affect the evaluations, or the evaluation yields derivative information.[7]
Acquisition functions
Examples of acquisition functions include
probability of improvement
expected improvement
Bayesian expected losses
upper confidence bounds (UCB) or lower confidence bounds
and hybrids of these.[9] All of them trade off exploration and exploitation so as to minimize the number of function queries. As such, Bayesian optimization is well suited for functions that are expensive to evaluate.
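For a surrogate whose posterior at [math]\displaystyle{ x }[/math] is Gaussian with mean [math]\displaystyle{ \mu(x) }[/math] and standard deviation [math]\displaystyle{ \sigma(x) }[/math], and with [math]\displaystyle{ f^* }[/math] the best value observed so far, standard textbook forms of these criteria (for maximization, omitting tunable exploration offsets) are the probability of improvement [math]\displaystyle{ \mathrm{PI}(x) = \Phi(Z) }[/math], the expected improvement [math]\displaystyle{ \mathrm{EI}(x) = (\mu(x) - f^*)\,\Phi(Z) + \sigma(x)\,\phi(Z) }[/math], and the upper confidence bound [math]\displaystyle{ \mathrm{UCB}(x) = \mu(x) + \kappa\,\sigma(x) }[/math], where [math]\displaystyle{ Z = (\mu(x) - f^*)/\sigma(x) }[/math], [math]\displaystyle{ \Phi }[/math] and [math]\displaystyle{ \phi }[/math] are the standard normal distribution and density functions, and [math]\displaystyle{ \kappa \gt 0 }[/math] controls the degree of exploration.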
Applications
Bayesian optimization has been applied in the field of facial recognition.[34] The performance of the Histogram of Oriented Gradients (HOG) algorithm, a popular feature extraction method, depends heavily on its parameter settings. Optimizing these parameters can be challenging but is crucial for achieving high accuracy.[34] An approach has been proposed for optimizing the HOG algorithm's parameters and the image size for facial recognition using Tree-structured Parzen Estimator (TPE) based Bayesian optimization.[34] The optimized approach has the potential to be adapted to other computer vision applications and contributes to the ongoing development of hand-crafted, parameter-based feature extraction algorithms in computer vision.[34]
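As a rough illustration of how such a TPE-based search might be set up with the hyperopt library: the parameter names, ranges, and the synthetic scoring function below are placeholders standing in for a real HOG-plus-classifier pipeline, and are not the configuration used in the cited study.

```python
# Sketch of Tree-structured Parzen Estimator (TPE) search over HOG-like
# hyperparameters using hyperopt. The search space and objective are illustrative.
import numpy as np
from hyperopt import fmin, tpe, hp, Trials

space = {
    "orientations": hp.choice("orientations", [6, 8, 9, 12]),
    "pixels_per_cell": hp.choice("pixels_per_cell", [4, 6, 8, 16]),
    "image_size": hp.choice("image_size", [64, 96, 128]),
}

def objective(params):
    # In a real experiment this function would resize the training images to
    # params["image_size"], extract HOG features with the given orientations
    # and cell size, train a face classifier, and return its validation error.
    # A synthetic penalty stands in here so the sketch runs without image data.
    penalty = (
        abs(params["orientations"] - 9) * 0.01
        + abs(params["pixels_per_cell"] - 8) * 0.005
        + abs(params["image_size"] - 96) * 0.001
    )
    return penalty + abs(np.random.default_rng().normal(0.0, 0.001))

trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=50, trials=trials)
print("Best configuration (indices into the choice lists):", best)
```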
References
Močkus, Jonas (1975). "On Bayesian Methods for Seeking the Extremum". Optimization Techniques: IFIP Technical Conference, Novosibirsk, July 1–7, 1974. Lecture Notes in Computer Science. 27. pp. 400–404. doi:10.1007/3-540-07165-2_55. ISBN 978-3-540-07165-5.
Močkus, Jonas (1977). "On Bayesian Methods for Seeking the Extremum and their Application". IFIP Congress: 195–200.
Hoffman, Matthew W.; Brochu, Eric; de Freitas, Nando (2011). "Portfolio Allocation for Bayesian Optimization". Uncertainty in Artificial Intelligence: 327–336.
Bouchene, Mohammed Mehdi (2023). "Bayesian Optimization of Histogram of Oriented Gradients (HOG) Parameters for Facial Recognition". SSRN.