Boosting is a machine learning meta-algorithm for performing supervised learning. Boosting is based on the question posed by Kearns[1]: can a set of weak learners create a single strong learner? A weak learner is defined to be a classifier which is only slightly correlated with the true classification. In contrast, a strong learner is a classifier that is arbitrarily well-correlated with the true classification.
The affirmative answer to Kearns question has significant ramifications in machine learning and statistics.
While boosting is not algorithmically constrained, most boosting algorithms follow a template. Typically boosting occurs in iterations, by incrementally adding weak learners to a final strong learner. At every iteration, a weak learner learns the training data with respect to a distribution. The weak learner is then added to the final strong learner. This is typically done by weighting the weak learner in some manner, which is typically related to the weak learner's accuracy. After the weak learner is added to the final strong learner, the data is reweighted: examples that are misclassified gain weight and examples that are classified correctly lose weight (some boosting algorithms will actually decrease the weight of repeatedly misclassified examples, e.g. boost by majority and BrownBoost). Thus, future weak learners will focus more on the examples that previous weak learners misclassified.
There are many boosting algorithms. The original algorithms proposed by Rob Schapire (a recursive majority gate formulation [2]) and Yoav Freund (boost by majority [3]) were not adaptive and could not take full advantage of the weak learners.
Some algorithms refer to themselves as "boosting algorithms," and these algorithms can be quite effective. However, in the probably approximately correct learning (PAC) model, only provable boosting algorithms should adopt the title "boosting." Algorithms that are similar to boosting in spirit, but not provably PAC-boosters, are sometimes called "leveraging algorithms." These can be quite effective machine learning algorithms [4].
The main variation between many boosting algorithms is their method of weighting training data points and hypotheses. AdaBoost is very popular and perhaps the most significant historically as it was the first algorithm that could adapt to the weak learners. However, there are many more recent algorithms such as LPBoost, TotalBoost, BrownBoost,MadaBoost, LogitBoost, and others. Many boosting algorithms fit into the AnyBoost framework[5], which shows that boosting performs gradient descent in function space using a convex cost function.