A general theory for the processing and use of statistical observations. In a broader interpretation of the term, statistical decision theory is the theory of choosing an optimal non-deterministic behaviour in incompletely known situations.
Inverse problems of probability theory are a subject of mathematical statistics. Suppose that a random phenomenon occurs, described qualitatively by the measure space $(\Omega, \mathcal{A})$ of all its elementary events $\omega$ and quantitatively by a probability distribution $P$ of the events. The statistician knows only this qualitative description, and has only incomplete information on $P$ of the type $P \in \mathcal{P}$, where $\mathcal{P}$ is a family of probability distributions. By making one or more observations of the phenomenon and processing the data thus obtained, the statistician has to make a decision on $P$ and choose the most profitable way to proceed (in particular, it may be decided that insufficient material has been collected and that the set of observations has to be extended before final inferences are made). In classical problems of mathematical statistics, the number of independent observations (the size of the sample) was fixed and optimal estimators of the unknown distribution $P$ were sought.

The general modern conception of a statistical decision is attributed to A. Wald (see [2]). It is assumed that every experiment has a cost which has to be paid for, and the statistician must meet the loss of a wrong decision by paying the "fine" corresponding to his error. Therefore, from the statistician's point of view, a decision rule (procedure) $\Pi$ is optimal when it minimizes the risk $R = R(P, \Pi)$, the mathematical expectation of his total loss. This approach was proposed by Wald as the basis of statistical sequential analysis and led to the creation, in statistical quality control, of procedures which, with the same accuracy of inference, use on average almost half the number of observations of the classical decision rule. In the formulation described, any statistical decision problem can be seen as a two-player game in the sense of J. von Neumann, in which the statistician is one of the players and nature is the other (see [3]). However, as early as 1820, P. Laplace had likewise described a statistical estimation problem as a game of chance in which the statistician is defeated if his estimates are bad.
The value of the risk $R(P, \Pi)$ depends both on the decision rule $\Pi$ and on the probability distribution $P$ that governs the distribution of the results of the observed phenomenon. As this "true" value of $P$ is unknown, the entire risk function $P \mapsto R(P, \Pi)$, considered on the given family $\mathcal{P}$, has to be minimized with respect to $\Pi$.

A decision rule $\Pi_1$ is said to be uniformly better than $\Pi_2$ if $R(P, \Pi_1) \le R(P, \Pi_2)$ for all $P \in \mathcal{P}$ and $R(P, \Pi_1) < R(P, \Pi_2)$ for at least one $P \in \mathcal{P}$. A decision rule $\Pi$ is said to be admissible if no uniformly-better decision rule exists. A class $C$ of decision rules is said to be complete (essentially complete) if for any decision rule $\Pi \notin C$ there is a uniformly-better (not worse) decision rule $\Pi' \in C$.
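When both the family $\mathcal{P}$ and the set of candidate rules are finite, these order relations can be checked mechanically. The sketch below uses made-up risk values purely for illustration; a rule is admissible when no other candidate is uniformly better:

```python
# Admissibility check for finitely many decision rules over a finite family P.
# Rows of `risks` are rules, columns are the risks R(P, rule) at each P in P.
# (The numbers are illustrative, not taken from any real problem.)

def uniformly_better(r1, r2):
    """True if the rule with risks r1 is uniformly better than the one with r2:
    r1 <= r2 at every P, with strict inequality for at least one P."""
    return all(a <= b for a, b in zip(r1, r2)) and any(a < b for a, b in zip(r1, r2))

def admissible(risks):
    """Indices of rules not uniformly dominated by any other candidate rule."""
    return [i for i, ri in enumerate(risks)
            if not any(uniformly_better(rj, ri)
                       for j, rj in enumerate(risks) if j != i)]

risks = [
    [1.0, 4.0],   # rule 0: good at P1, bad at P2
    [2.0, 2.0],   # rule 1: balanced
    [4.0, 1.0],   # rule 2: good at P2, bad at P1
    [3.0, 3.0],   # rule 3: uniformly dominated by rule 1
]
print(admissible(risks))  # rule 3 is inadmissible
```

Note that rules 0, 1 and 2 are mutually incomparable in this order, which is exactly why an additional functional (maximum risk, Bayesian risk) is needed to single out one of them.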
The most important is a minimal complete class of decision rules, which coincides (when it exists) with the set of all admissible decision rules. If the minimal complete class contains precisely one decision rule, then that rule is optimal. Generally, the risk functions corresponding to admissible decision rules must also be compared by the value of some other functional, for example, the maximum risk. The optimal decision rule $\Pi^*$ in this sense,

$$\max_{P} R(P, \Pi^*) = \min_{\Pi} \max_{P} R(P, \Pi),$$

is called the minimax rule. Comparison using the Bayesian risk is also possible:

$$r(\mu, \Pi) = \int_{\mathcal{P}} R(P, \Pi) \, \mu(dP),$$

that is, averaging the risk over an a priori probability distribution $\mu$ on the family $\mathcal{P}$.
This choice of functional is natural, especially when sets of experiments are repeated with a fixed marginal distribution $P_k$ in the $k$-th set, whereas the $P_k$ prove to be a random series of measures with unknown distribution $\mu$ (see Bayesian approach). The optimal decision rule in this sense,

$$r(\mu, \Pi_\mu) = \min_{\Pi} r(\mu, \Pi),$$

is called the Bayesian decision rule with a priori distribution $\mu$.
Finally, an a priori distribution $\mu_0$ is said to be least favourable (for the given problem) if

$$\inf_{\Pi} r(\mu_0, \Pi) = \sup_{\mu} \inf_{\Pi} r(\mu, \Pi).$$

Under very general assumptions it has been proved that: 1) for any a priori distribution $\mu$, a Bayesian decision rule exists; 2) the totality of all Bayesian decision rules and their limits forms a complete class; and 3) minimax decision rules exist and are Bayesian rules relative to the least-favourable a priori distribution, with

$$\max_{P} R(P, \Pi^*) = \sup_{\mu} \inf_{\Pi} r(\mu, \Pi)$$

(see [4]). The concrete form of optimal decision rules depends essentially on the type of statistical problem. However, in classical problems of statistical estimation the optimal decision rule for large samples depends only weakly on the chosen method of comparing risk functions.
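For a finite set of states of nature and a finite list of rules, the minimax and Bayesian criteria reduce to elementary optimization, and the least-favourable prior can be found by search. A minimal sketch with illustrative risk values (the rule names and numbers are hypothetical):

```python
# Minimax and Bayes rules for a toy problem: two states of nature P1, P2
# and three decision rules with illustrative risk values.

risks = {            # rule -> (R(P1, rule), R(P2, rule))
    "a": (1.0, 4.0),
    "b": (2.0, 2.0),
    "c": (4.0, 1.0),
}

# Minimax rule: minimize the worst-case risk max_P R(P, rule).
minimax_rule = min(risks, key=lambda k: max(risks[k]))

def bayes_risk(mu, r):
    """Bayesian risk r(mu, rule) for a prior mu on P1 (so 1 - mu on P2)."""
    return mu * r[0] + (1 - mu) * r[1]

def bayes_rule(mu):
    """Bayesian decision rule for the prior mu."""
    return min(risks, key=lambda k: bayes_risk(mu, risks[k]))

# Least-favourable prior: maximize the minimal Bayes risk (grid search).
grid = [i / 1000 for i in range(1001)]
mu0 = max(grid, key=lambda m: min(bayes_risk(m, r) for r in risks.values()))

print(minimax_rule)     # "b": its worst-case risk 2 beats 4 for "a" and "c"
print(bayes_rule(0.9))  # a prior concentrated on P1 favours rule "a"
```

With these numbers the minimal Bayes risk at `mu0` equals the minimax risk 2, illustrating assertion 3); the least-favourable prior is not unique here (any prior between 1/3 and 2/3 works).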
Decision rules in problems of statistical decision theory can be deterministic or randomized. Deterministic rules are defined by functions, for example by a measurable mapping of the space $\Omega^n$ of all samples $(\omega_1, \ldots, \omega_n)$ of size $n$ into a measurable space $(\Delta, \mathcal{B})$ of decisions $\delta$. Randomized rules are defined by Markov transition probability distributions of the form $\Pi(d\delta \mid \omega_1, \ldots, \omega_n)$ from $\Omega^n$ into $(\Delta, \mathcal{B})$, which describe the probability distribution according to which the selected value $\delta$ must additionally be independently "chosen" (see Statistical experiments, method of; Monte-Carlo method). Allowing randomized procedures makes the set of decision rules of the problem convex, which greatly facilitates theoretical analysis. Moreover, there exist problems in which the optimal decision rule is randomized. Even so, statisticians try to avoid randomized rules in practice whenever possible, since the use of tables or other sources of random numbers for "determining" inferences complicates the work and may even seem unscientific.
A statistical decision rule is by definition a transition probability distribution from a certain measurable space $(\Omega, \mathcal{A})$ of results of the experiment into a measurable space $(\Delta, \mathcal{B})$ of decisions. Conversely, every transition probability distribution $\Pi$ can be interpreted as a decision rule in any statistical decision problem with a measurable space $(\Omega, \mathcal{A})$ of results and a measurable space $(\Delta, \mathcal{B})$ of inferences (it can also be interpreted as a memoryless communication channel with input alphabet $\Omega$ and output alphabet $\Delta$).
The statistical decision rules form an algebraic category whose objects are the totalities of all probability distributions on measurable spaces $(\Omega, \mathcal{A})$, and whose morphisms are the transition probability distributions. The invariants and equivariants of this category define many natural concepts and laws of mathematical statistics (see [5]). For example, an invariant Riemannian metric, unique up to a factor, exists on the objects of this category; it is defined by the Fisher information matrix. The morphisms of the category generate equivalence and order relations for parametrized families of probability distributions and for statistical decision problems, which permits one to give a natural definition of a sufficient statistic. The Kullback non-symmetrical information deviation $I(Q : P)$, which characterizes the dissimilarity of the probability distributions $Q$ and $P$ (see Information distance), is a monotone invariant in the category:

$$I(Q : P) \ge I(Q' : P') \quad \text{if} \quad (Q', P') = (\Pi Q, \Pi P),$$

i.e. if $Q' = \Pi Q$ and $P' = \Pi P$ for a certain morphism $\Pi$.
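The monotonicity is easy to verify numerically on finite spaces, where a morphism is a stochastic matrix acting on distributions. A minimal sketch (the distributions and the matrix are arbitrary illustrative choices):

```python
# Monotonicity of the Kullback deviation I(Q : P) under a morphism Pi:
# pushing both distributions through the same transition matrix cannot increase I.

from math import log

def kl(q, p):
    """Kullback non-symmetrical deviation I(Q : P) = sum_i q_i log(q_i / p_i)."""
    return sum(qi * log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def push(dist, channel):
    """Image Q' = Pi Q of a distribution under a transition matrix."""
    return [sum(dist[i] * channel[i][j] for i in range(len(dist)))
            for j in range(len(channel[0]))]

Q = [0.7, 0.2, 0.1]
P = [0.3, 0.3, 0.4]
Pi = [[0.9, 0.1],        # rows are the conditional distributions Pi(. | omega_i)
      [0.5, 0.5],
      [0.2, 0.8]]

before = kl(Q, P)
after = kl(push(Q, Pi), push(P, Pi))
print(before, after)     # the deviation shrinks under the morphism
assert after <= before
```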
If in the problem of statistical estimation from a sample of fixed size $n$ there is a need to estimate the actual marginal probability distribution $P$ of the results of observations, which belongs a priori to a smooth family $\mathcal{P}$, then, given the choice of $2 I(\hat{P} : P)$ for an invariant loss function for the decision $\hat{P}$, the minimax risk proved to be

$$R_n = n^{-1} \dim \mathcal{P} \, (1 + o(1)), \qquad n \to \infty,$$

where $\dim \mathcal{P}$ is the dimension of the family.
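The order of this asymptotics (though not the minimax property itself) can be illustrated on the one-dimensional Bernoulli family, where the expected loss of a reasonable estimator is close to $\dim \mathcal{P} / n = 1/n$. The slightly smoothed estimator $\hat{p} = (k + 1/2)/(n + 1)$ below is a hypothetical choice, used only to keep the deviation finite at the boundary outcomes $k = 0$ and $k = n$:

```python
# Exact expected loss 2*E[I(P_hat : P)] for the Bernoulli family (dim = 1),
# computed by summing over all binomial outcomes k = 0, ..., n.

from math import comb, log

def kl_bern(q, p):
    """I(Q : P) for two Bernoulli laws with success probabilities q and p."""
    return q * log(q / p) + (1 - q) * log((1 - q) / (1 - p))

def expected_loss(n, p):
    total = 0.0
    for k in range(n + 1):
        pmf = comb(n, k) * p**k * (1 - p)**(n - k)   # binomial probability of k
        p_hat = (k + 0.5) / (n + 1)                  # smoothed estimate, never 0 or 1
        total += pmf * 2 * kl_bern(p_hat, p)
    return total

n, p = 500, 0.3
risk = expected_loss(n, p)
print(risk, 1 / n)   # the expected loss is close to dim P / n = 0.002
```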
The logic of quantum events is not Aristotelean; random phenomena of microphysics are therefore not a subject of classical probability theory. The formalism designed to describe them accepts the existence of non-commuting random variables and contains the classical theory as a degenerate commutative scheme. In the corresponding interpretation, many problems of the theory of quantum-mechanical measurements become non-commutative analogues of problems of statistical decision theory (see [6]).
References
[1] A. Wald, "Sequential analysis", Wiley (1947)
[2] A. Wald, "Statistical decision functions", Wiley (1950)
[3] J. von Neumann, O. Morgenstern, "The theory of games and economic behavior", Princeton Univ. Press (1944)
[4] E.L. Lehmann, "Testing statistical hypotheses", Wiley (1986)
[5] N.N. Chentsov, "Statistical decision rules and optimal inference", Amer. Math. Soc. (1982) (Translated from Russian)
[6] A.S. Kholevo, "Probabilistic and statistical aspects of quantum theory", North-Holland (1982) (Translated from Russian)