Statistics concerns the collection, organization, analysis, interpretation, and presentation of data, and is used to solve mathematical problems. Conclusions drawn from statistical analysis typically carry some uncertainty, because they represent the probability of an event occurring. Statistics is fundamental to scientific disciplines that involve predicting or classifying events based on large sets of data, and it is an integral part of fields such as machine learning, bioinformatics, genomics, and economics.
Statistics also encompasses the identification and study of statistical laws, which are statistical behaviors observed over a variety of datasets.[1] One common example is the Pareto Principle, which states that roughly 80% of effects are the result of 20% of causes, and is sometimes abbreviated as the 80/20 rule.[2]
Statistical inference addresses various issues, including Bayesian inference versus frequentist inference; the distinction between Fisher's "significance testing" and the Neyman-Pearson "hypothesis testing"; and whether the likelihood principle should be followed. Some of these issues have been subject to unresolved debate for up to two centuries.[3]
Bandyopadhyay & Forster[4] describe four statistical paradigms: classical statistics (or error statistics), Bayesian statistics, likelihood-based statistics, and the use of the Akaike Information Criterion as a statistical basis. More recently, Judea Pearl reintroduced formal mathematics for attributing causality in statistical systems that addresses fundamental limitations of both Bayesian and Neyman-Pearson methods.
During the second quarter of the 20th century, the development of classical statistics led to the emergence of two competing models for inductive statistical testing.[5][6] The merits of these models were extensively debated[7] for over 25 years until Fisher's passing. Although a hybrid approach combining elements of both methods is commonly taught and utilized, the philosophical questions raised during the debate remain unresolved.
Fisher played a significant role in popularizing significance testing through his publications, such as "Statistical Methods for Research Workers" in 1925 and "The Design of Experiments" in 1935.[8] His aim was to achieve scientific experimental outcomes without bias from prior opinions. Significance testing is a probabilistic form of deductive inference, akin to modus tollens. A simplified statement of the test can be described as follows: "If the evidence contradicts the hypothesis to a sufficient degree, the hypothesis is rejected." In practice, a statistic is computed from the experimental data, and the probability of obtaining a value at least as extreme as that statistic under a default or "null" model is compared to a predetermined threshold. This threshold represents the level of discord required (typically established by convention). One common application of this method is to determine whether a treatment has a noticeable effect based on a comparative experiment. In this case, the null hypothesis corresponds to the absence of a treatment effect, implying that the treated group and the control group are drawn from the same population. Statistical significance measures probability and does not address practical significance. It can be viewed as a criterion for the statistical signal-to-noise ratio. It is important to note that the test cannot prove the hypothesis (of no treatment effect), but it can provide evidence against it. The method relies on formulating an imaginary infinite population, representing the null hypothesis, within a specified statistical model.
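To make the procedure concrete, here is a minimal sketch of one way to compute such a probability: a one-sided permutation test for a treatment effect. It assumes the null model "treated and control values come from the same population", so group labels are exchangeable. The function name `permutation_p_value` and the measurements are hypothetical, invented for illustration, and the permutation test is only one of several ways to carry out a significance test.

```python
import random
from statistics import mean

def permutation_p_value(treated, control, n_resamples=10_000, seed=0):
    """One-sided permutation test of the null hypothesis of no treatment effect.

    Under the null, treated and control values come from the same population,
    so the group labels are exchangeable. We reshuffle the labels and count how
    often the shuffled difference in means is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_t = len(treated)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_t]) - mean(pooled[n_t:])
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            hits += 1
    # The +1 terms count the observed labelling itself, so the p-value is never zero.
    return (hits + 1) / (n_resamples + 1)

# Hypothetical measurements, invented for illustration.
treated = [12.1, 13.4, 11.8, 14.2, 13.9, 12.7, 14.5, 13.1]
control = [11.2, 10.9, 12.0, 11.5, 10.4, 11.8, 12.3, 11.1]

alpha = 0.05  # conventional threshold
p = permutation_p_value(treated, control)
print(f"p = {p:.4f}, reject the null at alpha = {alpha}: {p < alpha}")
```

A small p-value is evidence against "no treatment effect"; a large one does not prove the null.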
The Fisherian significance test involves a single hypothesis, but the choice of the test statistic requires an understanding of relevant directions of deviation from the hypothesized model.
Neyman and Pearson collaborated on a problem different from significance testing: selecting the most appropriate hypothesis based solely on experimental evidence. Their most renowned joint paper, published in 1933,[9] introduced the Neyman-Pearson lemma, which states that a ratio of probabilities (the likelihood ratio) serves as an effective criterion for choosing between hypotheses (with the choice of the threshold being arbitrary). The paper demonstrated an optimality property of Student's t-test, one of the significance tests. Neyman believed that hypothesis testing represented a generalization and improvement of significance testing. The rationale for their methods can be found in their collaborative papers.[10]
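The lemma's simplest case can be sketched under illustrative assumptions: two simple hypotheses about the mean of normal data with a known standard deviation. The likelihood ratio then increases with the sample mean, so thresholding the ratio is equivalent to thresholding the mean, and the threshold is fixed by the chosen level alpha. The function names and numbers below are hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist

def likelihood_ratio(xbar, mu0, mu1, sigma, n):
    """L(mu1) / L(mu0) for n i.i.d. normal observations; depends on the data only via xbar."""
    return exp(n * (mu1 - mu0) * xbar / sigma**2 - n * (mu1**2 - mu0**2) / (2 * sigma**2))

def np_test(mu0, mu1, sigma, n, alpha):
    """Level-alpha likelihood ratio test of H0: mu = mu0 against H1: mu = mu1 (mu1 > mu0).

    Because the ratio is increasing in xbar, "reject when the ratio is large"
    is the same as "reject when xbar exceeds c".
    """
    se = sigma / sqrt(n)
    c = mu0 + NormalDist().inv_cdf(1 - alpha) * se   # critical value for the sample mean
    power = 1 - NormalDist(mu1, se).cdf(c)           # P(reject H0 | H1 true)
    return c, power

c, power = np_test(mu0=0.0, mu1=1.0, sigma=1.0, n=4, alpha=0.05)
print(f"reject H0 when the sample mean exceeds {c:.3f}; power = {power:.3f}")
```

Changing alpha moves the threshold and trades type I errors against power, which is the arbitrary choice the lemma leaves open.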
Hypothesis testing involves considering multiple hypotheses and selecting one among them, akin to making a multiple-choice decision. The absence of evidence is not an immediate factor to be taken into account. The method is grounded in the assumption of repeated sampling from the same population (the classical frequentist assumption), although Fisher criticized this assumption (Rubin, 2020).[11]
The duration of the dispute allowed for a comprehensive discussion of various fundamental issues in the field of statistics.
Fisher's attack | Neyman's rebuttal | Discussion |
---|---|---|
Repeated sampling of the same population | Fisher's theory of fiducial inference is flawed | Fisher's attack based on frequentist probability failed but was not without result. He identified a specific case (2×2 table) where the two schools of testing reach different results. This case is one of several that are still troubling. Commentators believe that the "right" answer is context-dependent.[14] Fiducial probability has not fared well, being virtually without advocates, while frequentist probability remains a mainstream interpretation. |
Type II errors | A purely probabilistic theory of tests requires an alternative hypothesis | Fisher's attacks on type II errors have faded with time. In the intervening years, statistics has separated the exploratory from the confirmatory. In the current environment, the concept of type II errors is used in power calculations that determine the sample size for confirmatory hypothesis tests. |
Inductive behavior | | Fisher's attack on inductive behavior has been largely successful because he selected the field of battle. While operational decisions are routinely made on a variety of criteria (such as cost), scientific conclusions from experimentation are typically made based on probability alone. |
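Power calculations for sample size, where type II errors now play their main role, can be sketched as follows. This assumes a one-sided z-test with a known standard deviation; the effect size `delta`, `sigma`, and the helper names are illustrative assumptions, and real studies often use t-based or simulation-based calculations instead.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_one_sided(delta, sigma, alpha=0.05, power=0.80):
    """Smallest n for which a one-sided z-test detects a mean shift delta with the given power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # controls the type I error rate
    z_beta = NormalDist().inv_cdf(power)       # power = 1 - beta, beta = type II error rate
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

def achieved_power(n, delta, sigma, alpha=0.05):
    """Probability of rejecting H0: mu = 0 when the true mean is delta."""
    se = sigma / sqrt(n)
    cutoff = NormalDist().inv_cdf(1 - alpha) * se
    return 1 - NormalDist(delta, se).cdf(cutoff)

n = sample_size_one_sided(delta=0.5, sigma=1.0)
print(n, round(achieved_power(n, 0.5, 1.0), 3))
```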
During this exchange, Fisher also discussed the requirements for inductive inference, specifically criticizing cost functions that penalize erroneous judgments. Neyman countered by mentioning the use of such functions by Gauss and Laplace. These arguments occurred 15 years after textbooks began teaching a hybrid theory of statistical testing.
Fisher and Neyman held different perspectives on the foundations of statistics, though they both opposed the Bayesian viewpoint.[14]
Fisher and Neyman diverged in their attitudes and, perhaps, their language. Fisher was a scientist and an intuitive mathematician, and inductive reasoning came naturally to him. Neyman, on the other hand, was a rigorous mathematician who relied on deductive reasoning rather than probability calculations based on experiments.[5] Hence, there was an inherent clash between applied and theoretical approaches (between science and mathematics).
In 1938, Neyman relocated to the West Coast of the United States of America, effectively ending his collaboration with Pearson and their work on hypothesis testing.[5] Subsequent developments in the field were carried out by other researchers.
By 1940, textbooks began presenting a hybrid approach that combined elements of significance testing and hypothesis testing.[16] However, none of the main contributors were directly involved in the further development of the hybrid approach currently taught in introductory statistics.[6]
Statistics subsequently branched out into various directions, including decision theory, Bayesian statistics, exploratory data analysis, robust statistics, and non-parametric statistics. Neyman-Pearson hypothesis testing made significant contributions to decision theory, which is widely employed, particularly in statistical quality control. Hypothesis testing also extended its applicability to incorporate prior probabilities, giving it a Bayesian character. While Neyman-Pearson hypothesis testing has evolved into an abstract mathematical subject taught at the post-graduate level,[17] much of what is taught and used in undergraduate education under the umbrella of hypothesis testing can be attributed to Fisher.
There have been no major conflicts between the two classical schools of testing in recent decades, although occasional criticism and disputes persist. However, it is highly unlikely that one theory of statistical testing will completely supplant the other in the foreseeable future.
The hybrid approach, which combines elements from both competing schools of testing, can be interpreted in different ways. Some view it as an amalgamation of two mathematically complementary ideas,[14] while others see it as a flawed union of philosophically incompatible concepts.[18] Fisher's approach had certain philosophical advantages, while Neyman and Pearson emphasized rigorous mathematics. Hypothesis testing remains a subject of controversy for some users, but the most widely accepted alternative method, confidence intervals, is based on the same mathematical principles.
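The shared mathematics can be illustrated with a small sketch, assuming normal data with a known standard deviation: a two-sided z-test at level alpha rejects a hypothesized mean exactly when that mean falls outside the matching 1 - alpha confidence interval. The data and function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_interval(data, sigma, level=0.95):
    """Two-sided confidence interval for a normal mean with known sigma."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * sigma / sqrt(len(data))
    m = mean(data)
    return m - half, m + half

def z_test_rejects(data, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0."""
    stat = (mean(data) - mu0) / (sigma / sqrt(len(data)))
    p = 2 * (1 - NormalDist().cdf(abs(stat)))
    return p < alpha

data = [4.8, 5.6, 5.1, 6.0, 5.4, 4.9, 5.7, 5.3]  # hypothetical measurements
sigma = 0.5                                       # assumed known
lo, hi = z_interval(data, sigma)

# Scan candidate null values: the test rejects exactly those outside [lo, hi].
grid = [4.5 + k / 100 for k in range(171)]
print(all(z_test_rejects(data, m, sigma) == (not lo <= m <= hi) for m in grid))
```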
Due to the historical development of testing, there is no single authoritative source that fully encompasses the hybrid theory as it is commonly practiced in statistics. Additionally, the terminology used in this context may lack consistency. Empirical evidence indicates that individuals, including students and instructors in introductory statistics courses, often have a limited understanding of the meaning of hypothesis testing.[19]
Two distinct interpretations of probability have existed for a long time, one based on objective evidence and the other on subjective degrees of belief. Gauss and Laplace could have debated these alternatives more than 200 years ago, and two competing schools of statistics eventually grew out of them. Classical inferential statistics emerged primarily during the second quarter of the 20th century,[6] largely in response to the controversial principle of indifference used in Bayesian probability at that time. The later resurgence of Bayesian inference was a reaction to the limitations of frequentist probability, leading to further developments and reactions.
While the philosophical interpretations have a long history, the specific statistical terminology is relatively recent. The terms "Bayesian" and "frequentist" became standardized in the second half of the 20th century.[20] However, the terminology can be confusing, as the "classical" interpretation of probability aligns with Bayesian principles, while "classical" statistics follow the frequentist approach. Moreover, even within the term "frequentist," there are variations in interpretation, differing between philosophy and physics.
The intricate details of philosophical probability interpretations are explored elsewhere. In the field of statistics, these alternative interpretations allow for the analysis of different datasets using distinct methods based on various models, aiming to achieve slightly different objectives. When comparing the competing schools of thought in statistics, pragmatic criteria beyond philosophical considerations are taken into account.
Fisher and Neyman were significant figures in the development of frequentist (classical) methods.[5] While Fisher had a unique interpretation of probability that differed from Bayesian principles, Neyman adhered strictly to the frequentist approach. In the realm of Bayesian statistical philosophy, mathematics, and methods, de Finetti,[21] Jeffreys,[22] and Savage[23] emerged as notable contributors during the 20th century. Savage played a crucial role in popularizing de Finetti's ideas in English-speaking regions and establishing rigorous Bayesian mathematics. In 1965, Dennis Lindley's two-volume work titled "Introduction to Probability and Statistics from a Bayesian Viewpoint" played a vital role in introducing Bayesian methods to a wide audience. Over the course of three generations, statistics has progressed significantly, and the views of early contributors are not necessarily considered authoritative today.
The earlier description briefly highlights frequentist inference, which encompasses Fisher's "significance testing" and Neyman-Pearson's "hypothesis testing." Frequentist inference incorporates various perspectives and allows for scientific conclusions, operational decisions, and parameter estimation with or without confidence intervals.
A classical frequency distribution provides information about the probability of the observed data. By applying Bayes' theorem, a more abstract concept is introduced, which involves estimating the probability of a hypothesis (associated with a theory) given the data. This concept, formerly referred to as "inverse probability," is realized through Bayesian inference. Bayesian inference involves updating the probability estimate for a hypothesis as new evidence becomes available. It explicitly considers both the evidence and prior beliefs, enabling the incorporation of multiple sets of evidence.
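Bayesian updating over a handful of discrete hypotheses can be sketched in a few lines. The coin-bias hypotheses, the uniform prior, and the two batches of evidence are hypothetical; exact fractions are used so that updating on the batches in sequence can be checked against updating on the pooled evidence.

```python
from fractions import Fraction

def update(prior, likelihood):
    """Bayes' theorem: posterior(h) is proportional to prior(h) * P(data | h)."""
    unnormalized = {h: prior[h] * likelihood(h) for h in prior}
    total = sum(unnormalized.values())  # P(data), the normalizing constant
    return {h: w / total for h, w in unnormalized.items()}

def coin_likelihood(heads, tails):
    # P(observed sequence | bias h); a binomial coefficient would cancel on normalization.
    return lambda h: h**heads * (1 - h) ** tails

hypotheses = [Fraction(3, 10), Fraction(1, 2), Fraction(7, 10)]
prior = {h: Fraction(1, 3) for h in hypotheses}  # uniform prior belief

# Two hypothetical batches of evidence, incorporated one after the other.
after_first = update(prior, coin_likelihood(heads=7, tails=3))
after_second = update(after_first, coin_likelihood(heads=6, tails=4))

# Updating once on the pooled evidence gives the same posterior.
pooled = update(prior, coin_likelihood(heads=13, tails=7))
print({str(h): float(p) for h, p in after_second.items()})
```

Each posterior becomes the prior for the next batch, which is how multiple sets of evidence are combined.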
Frequentists and Bayesians employ distinct probability models. Frequentists typically view parameters as fixed but unknown, whereas Bayesians assign probability distributions to these parameters. As a result, Bayesians discuss probabilities that frequentists do not acknowledge. Bayesians consider the probability of a theory, whereas true frequentists can only assess the evidence's consistency with the theory. For instance, a frequentist does not claim a 95% probability that the true value of a parameter falls within a confidence interval; rather, they state that 95% of confidence intervals encompass the true value.
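The frequentist reading of a 95% interval is a statement about the procedure, and a simulation can check it. This sketch assumes normal data with a known sigma and a fixed, hypothetical true mean; it counts how often the interval computed from each fresh sample covers that mean.

```python
import random
from math import fsum, sqrt
from statistics import NormalDist

def coverage(true_mu=10.0, sigma=2.0, n=25, level=0.95, trials=10_000, seed=1):
    """Fraction of simulated z-intervals (known sigma) that contain true_mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        m = fsum(sample) / n
        if m - half_width <= true_mu <= m + half_width:
            hits += 1
    return hits / trials

print(coverage())
```

The long-run proportion is close to 0.95, while any single computed interval either contains the true mean or does not.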
Aspect | Bayesian | Frequentist |
---|---|---|
Basis | Belief (prior) | Behavior (method) |
Resulting Characteristic | Principled Philosophy | Opportunistic Methods |
Distributions | One distribution | Many distributions (bootstrap?) |
Ideal Application | Dynamic (repeated sampling) | Static (one sample) |
Target Audience | Individual (subjective) | Community (objective) |
Modeling Characteristic | Aggressive | Defensive |
Both the frequentist and Bayesian schools are subject to mathematical critique, and neither readily embraces such criticism. For instance, Stein's paradox highlights the intricacy of determining a "flat" or "uninformative" prior probability distribution in high-dimensional spaces.[3] While Bayesians perceive this as tangential to their fundamental philosophy, they find frequentism plagued with inconsistencies, paradoxes, and unfavorable mathematical behavior. Frequentists can account for most of these issues. Certain "problematic" scenarios, like estimating the weight variability of a herd of elephants based on a single measurement ("Basu's elephants"), exemplify extreme cases that defy statistical estimation. The principle of likelihood has been a contentious arena of debate.
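Stein's paradox can be seen in a short simulation. The sketch assumes p independent normal observations with unit variance, one per unknown mean, and compares the raw observations (which are also the posterior mean under a flat prior) with the positive-part James-Stein shrinkage estimator. The true means are hypothetical. For p of 3 or more, shrinkage achieves lower expected total squared error.

```python
import random

def james_stein(x):
    """Positive-part James-Stein estimate of a mean vector, assuming unit-variance noise."""
    p = len(x)
    norm_sq = sum(v * v for v in x)
    shrink = max(0.0, 1 - (p - 2) / norm_sq)
    return [shrink * v for v in x]

def total_squared_error(est, truth):
    return sum((e - t) ** 2 for e, t in zip(est, truth))

rng = random.Random(42)
p = 10
theta = [rng.uniform(-1, 1) for _ in range(p)]  # hypothetical true means
trials = 5_000
mse_mle = mse_js = 0.0
for _ in range(trials):
    x = [rng.gauss(t, 1.0) for t in theta]  # one noisy observation per mean
    mse_mle += total_squared_error(x, theta) / trials
    mse_js += total_squared_error(james_stein(x), theta) / trials

print(f"raw observations: {mse_mle:.2f}, James-Stein: {mse_js:.2f}")
```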
Both the frequentist and Bayesian schools have demonstrated notable accomplishments in addressing practical challenges. Classical statistics, with its reliance on mechanical calculators and specialized printed tables, boasts a longer history of obtaining results. Bayesian methods, on the other hand, have shown remarkable efficacy in analyzing sequentially sampled information, such as radar and sonar data. Several Bayesian techniques, as well as certain recent frequentist methods like the bootstrap, necessitate the computational capabilities that have become widely accessible in the past few decades. There is an ongoing discourse regarding the integration of Bayesian and frequentist approaches,[25] although concerns have been raised regarding the interpretation of results and the potential diminishment of methodological diversity.
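A minimal sketch of the nonparametric bootstrap, using the simple percentile method to build an interval for a median. The data, the function name `bootstrap_ci`, and the choice of the percentile method (one of several bootstrap interval constructions) are assumptions for illustration.

```python
import random
from statistics import median

def bootstrap_ci(data, stat=median, n_boot=5_000, level=0.95, seed=0):
    """Percentile bootstrap interval: resample with replacement, recompute stat, take quantiles."""
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return reps[lo_idx], reps[hi_idx]

data = [2.3, 3.1, 2.8, 4.5, 3.9, 2.2, 5.1, 3.3, 2.9, 3.6, 4.1, 2.7]  # hypothetical sample
print(bootstrap_ci(data))
```

The thousands of resampled statistics are what make the method computationally demanding by the standards of printed tables.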
Bayesians share a common stance against the limitations of frequentism, but they are divided into various philosophical camps (empirical, hierarchical, objective, personal, and subjective), each emphasizing different aspects. A philosopher of statistics from the frequentist perspective has observed a shift from the statistical domain to philosophical interpretations of probability over the past two generations.[27] Some perceive that the successes achieved with Bayesian applications do not sufficiently justify the associated philosophical framework.[28] Bayesian methods often develop practical models that deviate from traditional inference and have minimal reliance on philosophy.[29] Neither the frequentist nor the Bayesian philosophical interpretations of probability can be considered entirely robust. The frequentist view is criticized for being overly rigid and restrictive, while the Bayesian view can encompass both objective and subjective elements, among others.
In common usage, likelihood is often considered synonymous with probability. However, according to statistics, this is not the case. In statistics, probability refers to variable data given a fixed hypothesis, whereas likelihood refers to variable hypotheses given a fixed set of data. For instance, when making repeated measurements with a ruler under fixed conditions, each set of observations corresponds to a probability distribution, and the observations can be seen as a sample from that distribution, following the frequentist interpretation of probability. On the other hand, a set of observations can also arise from sampling various distributions based on different observational conditions. The probabilistic relationship between a fixed sample and a variable distribution stemming from a variable hypothesis is referred to as likelihood, representing the Bayesian view of probability. For instance, a set of length measurements may represent readings taken by observers with specific characteristics and conditions.
Likelihood is a concept that was introduced and developed by Fisher over a span of more than 40 years, although earlier references to the concept exist and Fisher's support for it was not wholehearted.[34] The concept was subsequently accepted and substantially revised by Jeffreys.[35] In 1962, Birnbaum "proved" the likelihood principle based on premises that were widely accepted among statisticians,[36] although his proof has been subject to dispute by statisticians and philosophers. Notably, by 1970, Birnbaum had rejected one of these premises (the conditionality principle) and had also abandoned the likelihood principle due to their incompatibility with the frequentist "confidence concept of statistical evidence."[37][38] The likelihood principle asserts that all the information in a sample is contained within the likelihood function, which is considered a valid probability distribution by Bayesians but not by frequentists.
Certain significance tests employed by frequentists are not consistent with the likelihood principle. Bayesians, on the other hand, embrace the principle as it aligns with their philosophical standpoint (perhaps in response to frequentists' discomfort). The likelihood approach is compatible with Bayesian statistical inference, where the posterior Bayes distribution for a parameter is derived by multiplying the prior distribution by the likelihood function using Bayes's Theorem.[34] Frequentists interpret the likelihood principle unfavorably, as it suggests a lack of concern for the reliability of evidence. The likelihood principle, according to Bayesian statistics, implies that information about the experimental design used to collect evidence does not factor into the statistical analysis of the data.[39] Some Bayesians, including Savage,[citation needed] acknowledge this implication as a vulnerability.
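A standard textbook illustration of this conflict, sketched here with hypothetical coin tosses: 9 heads and 3 tails give a likelihood proportional to p^9 (1 - p)^3 whether the experimenter fixed 12 tosses in advance or tossed until the third tail. The likelihood principle therefore treats the two designs identically, yet the one-sided p-values for H0: p = 1/2 differ and fall on opposite sides of 0.05. Exact fractions keep the arithmetic transparent.

```python
from fractions import Fraction
from math import comb

half = Fraction(1, 2)
heads, tails = 9, 3

# Design A: toss exactly 12 times. p-value = P(at least 9 heads | p = 1/2).
p_binomial = sum(comb(12, x) * half**12 for x in range(heads, 13))

# Design B: toss until the 3rd tail. p-value = P(at least 9 heads before the 3rd tail | p = 1/2),
# using P(H = h) = C(h + 2, 2) * (1/2)**(h + 3).
p_negbinomial = 1 - sum(comb(h + 2, 2) * half ** (h + 3) for h in range(heads))

# Likelihoods of the observed data under each design, as functions of p = P(heads).
def lik_binomial(p):
    return comb(12, heads) * p**heads * (1 - p) ** tails

def lik_negbinomial(p):
    return comb(heads + tails - 1, tails - 1) * p**heads * (1 - p) ** tails

print(float(p_binomial), float(p_negbinomial))
```

The two likelihood functions differ only by a constant factor, which is exactly the situation in which the likelihood principle says the inferences should agree.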
The likelihood principle's staunchest proponents argue that it provides a more solid foundation for statistics compared to the alternatives presented by Bayesian and frequentist approaches.[40] These supporters include some statisticians and philosophers of science.[41] While Bayesians recognize the importance of likelihood for calculations, they contend that the posterior probability distribution serves as the appropriate basis for inference.[42]
Inferential statistics relies on statistical models. Classical hypothesis testing, for instance, has often relied on the assumption of data normality. To reduce reliance on this assumption, robust and nonparametric statistics have been developed. Bayesian statistics, on the other hand, interpret new observations based on prior knowledge, assuming continuity between the past and present. The experimental design assumes some knowledge of the factors to be controlled, varied, randomized, and observed. Statisticians are aware of the challenges in establishing causation, often stating that "correlation does not imply causation," which is more of a limitation in modeling than a mathematical constraint.
As statistics and data sets have become more complex,[a][b] questions have arisen regarding the validity of models and the inferences drawn from them. There is a wide range of conflicting opinions on modeling.
Models can be based on scientific theory or ad hoc data analysis, each employing different methods. Advocates exist for each approach.[44] Model complexity is a trade-off and less subjective approaches such as the Akaike information criterion and Bayesian information criterion aim to strike a balance.[45]
Concerns have been raised even about simple regression models used in the social sciences, as a multitude of assumptions underlying model validity are often neither mentioned nor verified. In some cases, a favourable comparison between observations and the model is considered sufficient.[46]
Traditional observation-based models often fall short in addressing many significant problems, requiring the utilization of a broader range of models, including algorithmic ones. "If the model is a poor emulation of nature, the conclusions may be wrong."[47]
Modeling is frequently carried out inadequately, with improper methods employed, and the reporting of models is often subpar.[48]
Given the lack of a strong consensus on the philosophical review of statistical modeling, many statisticians adhere to the cautionary words of George Box: "All models are wrong, but some are useful."
For a concise introduction to the fundamentals of statistics, refer to Stuart, A.; Ord, J.K. (1994). "Ch. 8 – Probability and statistical inference" in Kendall's Advanced Theory of Statistics, Volume I: Distribution Theory (6th ed.), published by Edward Arnold.
In his book Statistics as Principled Argument, Robert P. Abelson presents the perspective that statistics serve as a standardized method for resolving disagreements among scientists, who could otherwise engage in endless debates about the merits of their respective positions. From this standpoint, statistics can be seen as a form of rhetoric. However, the effectiveness of statistical methods depends on the consensus among all involved parties regarding the chosen approach.[49]
Original source: https://en.wikipedia.org/wiki/Foundations of statistics.