Variance

in probability theory

2020 Mathematics Subject Classification: Primary: 60-01

The measure $\newcommand{\Var}{\operatorname{Var}} \newcommand{\Ex}{\mathop{\mathsf{E}}} \newcommand{\Prob}{\mathop{\mathsf{P}}} \Var X$ of the deviation of a random variable $X$ from its mathematical expectation $\Ex X$ defined by the equation: $$\begin{equation}\label{eq:1} \Var X = \Ex(X-\Ex X)^2. \end{equation}$$

The properties of the variance are: $$\begin{equation} \Var X = \Ex X^2 - (\Ex X)^2; \end{equation}$$ if $c$ is a real number, then $$\begin{equation} \Var (cX) = c^2\Var X, \end{equation}$$ in particular, $\Var(-X) = \Var X$.
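As an illustration of these two properties, the following minimal sketch compares the definition with the shortcut formula and checks the scaling rule in exact rational arithmetic. The fair die, the constant $c=-3$ and the helper names `expectation` and `variance` are assumptions chosen for the example, not part of the article.

```python
from fractions import Fraction

def expectation(dist, f=lambda x: x):
    """E f(X) for a finite discrete distribution given as {value: probability}."""
    return sum(f(x) * p for x, p in dist.items())

def variance(dist):
    """Var X = E (X - E X)^2, computed directly from the definition."""
    mean = expectation(dist)
    return expectation(dist, lambda x: (x - mean) ** 2)

# Hypothetical example: a fair six-sided die.
die = {k: Fraction(1, 6) for k in range(1, 7)}

var_definition = variance(die)                                             # E (X - E X)^2
var_shortcut = expectation(die, lambda x: x * x) - expectation(die) ** 2   # E X^2 - (E X)^2

# Distribution of cX for c = -3: same probabilities, scaled values.
c = -3
scaled = {c * x: p for x, p in die.items()}
var_scaled = variance(scaled)

print(var_definition, var_shortcut, var_scaled)  # 35/12 35/12 105/4
```

Exact fractions are used so that the two expressions for the variance can be compared with `==` rather than up to rounding error.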

In speaking of the variance of a random variable $X$, it is always assumed that its expectation $\Ex X$ exists; the variance $\Var X$ may exist (i.e. be finite) or may not (i.e. be infinite). In modern probability theory the expectation of a random variable is defined in terms of the Lebesgue integral over the sample space. However, formulas expressing the expectation of various functions of a random variable $X$ in terms of the distribution of this variable on the set of real numbers are of importance (cf. Mathematical expectation). For the variance $\Var X$ these formulas are

a) $$\begin{equation} \Var X = \sum_i(a_i-\Ex X)^2p_i, \end{equation}$$ for a discrete random variable $X$ which assumes at most a countable number of different values $a_i$ with probabilities $p_i=\Prob\{X=a_i\}$;

b) $$\begin{equation} \Var X = \int\limits_{-\infty}^{\infty}(x-\Ex X)^2p(x)\,dx, \end{equation}$$ for a random variable $X$ with a density $p$ of the probability distribution;

c) $$\begin{equation} \Var X = \int\limits_{-\infty}^{\infty}(x-\Ex X)^2\,dF(x), \end{equation}$$ in the general case; here $F$ is the distribution function of the random variable $X$, and the integral is understood in the sense of Lebesgue–Stieltjes or Riemann–Stieltjes.
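Formula b) can be evaluated numerically when the density is known. The sketch below is only an illustration: the midpoint rule, the helper names `integrate` and `variance_from_density`, and the triangular density $p(x)=2x$ on $[0,1]$ are assumptions chosen for the example, and the density is assumed to vanish outside the integration interval.

```python
def integrate(f, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def variance_from_density(p, lo, hi):
    """Formula b): integral of (x - E X)^2 p(x) dx, with p zero outside [lo, hi]."""
    mean = integrate(lambda x: x * p(x), lo, hi)
    return integrate(lambda x: (x - mean) ** 2 * p(x), lo, hi)

# Hypothetical example: triangular density p(x) = 2x on [0, 1].
# Exact values: E X = 2/3, E X^2 = 1/2, so Var X = 1/2 - 4/9 = 1/18.
var_triangular = variance_from_density(lambda x: 2 * x, 0.0, 1.0)
```

For a density with unbounded support the interval would have to be truncated, which introduces an additional error that this sketch does not control.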

The variance is not the only conceivable measure of the deviation of a random variable from its expectation. Other measures of the deviation, constructed on the same principle, e.g. $\Ex|X-\Ex X|$, $\Ex(X-\Ex X)^4$, etc., are also possible, as are measures of deviation based on quantiles (cf. Quantile). The importance of the variance is mainly due to the role played by this concept in limit theorems. Roughly speaking, one may say that if the expectation and variance of the sum of a large number of random variables are known, it is possible to describe completely the distribution law of this sum: it is (approximately) normal, with corresponding parameters (cf. Normal distribution). Thus, the most important properties of the variance are connected with the expression for the variance $\Var(X_1+\cdots+X_n)$ of the sum of random variables $X_1,\dots, X_n$:

$$ \Var(X_1+\cdots+X_n) = \sum_{i=1}^n \Var X_i + 2\sum_{i<j} \operatorname{cov}(X_i, X_j), $$

where

$$ \operatorname{cov}(X_i, X_j) = \Ex\{(X_i-\Ex X_i)(X_j-\Ex X_j)\} $$

denotes the covariance of the random variables $X_i$ and $X_j$. If the random variables $X_1,\dots,X_n$ are pairwise independent, then $\operatorname{cov}(X_i, X_j) = 0$ for $i \neq j$. Accordingly, the equation

$$ \tag{7} \Var(X_1+\cdots+X_n) = \Var X_1 + \cdots + \Var X_n $$

is valid for pairwise independent random variables. The converse proposition is not valid: (7) does not entail independence. Nevertheless, in applications (7) is usually justified by the independence of the random variables. Strictly speaking, a sufficient condition for the validity of (7) is that $\operatorname{cov}(X_i, X_j) = 0$ for all $i \neq j$, i.e. the random variables $X_1,\dots,X_n$ need only be pairwise uncorrelated.
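That (7) does not entail independence can be seen on a small example. In the sketch below, $X$ is uniform on $\{-1,0,1\}$ and $Y=X^2$, so that $Y$ is a function of $X$ while the covariance vanishes. The choice of distribution and the helper name `E` are assumptions made for illustration only.

```python
from fractions import Fraction

# Joint distribution of (X, Y) with X uniform on {-1, 0, 1} and Y = X^2.
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

def E(f):
    """Expectation of f(X, Y) under the joint distribution above."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

mean_x, mean_y = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: (x - mean_x) ** 2)
var_y = E(lambda x, y: (y - mean_y) ** 2)
cov_xy = E(lambda x, y: (x - mean_x) * (y - mean_y))
var_sum = E(lambda x, y: (x + y - mean_x - mean_y) ** 2)

# Dependence check: P{X=0, Y=0} differs from P{X=0} * P{Y=0}.
p_joint = joint.get((0, 0), Fraction(0))
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)

print(cov_xy, var_x + var_y, var_sum)  # 0 8/9 8/9
print(p_joint, p_x0 * p_y0)            # 1/3 1/9
```

Here the covariance is zero and the variances add, although the joint probability of $\{X=0, Y=0\}$ is not the product of the marginal probabilities.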

The applications of the concept of the variance have developed in two directions. The first is in the limit theorems of probability theory. If, for a sequence of random variables $X_1, X_2, \dots$, one has $\Var X_n \to 0$ as $n \to \infty$, then for any $\epsilon > 0$,

$$ \Prob\{|X_n - \Ex X_n| > \epsilon\} \to 0 $$

as $n \to \infty$ (cf. Chebyshev inequality in probability theory), i.e. if $n$ is large the random variable $X_n$ becomes practically identical with the non-random quantity $\Ex X_n$. The development of these concepts yields a proof of the law of large numbers, of the consistency of estimators (cf. Consistent estimator) in mathematical statistics, and also leads to other applications in which convergence in probability is established for random variables.

Another application to limit theorems is connected with the concept of normalization. Normalization of a random variable $X$ with $0 < \Var X < \infty$ is effected by subtracting the expectation and dividing by the square root of the variance $\sqrt{\Var X}$; in other words, the variable $Y = (X - \Ex X)/\sqrt{\Var X}$ is considered. Normalization of a sequence of random variables is usually necessary in order to obtain a convergent sequence of distribution laws, in particular, convergence to the normal law with parameters zero and one.

The second direction consists in the application of the concept of the variance in mathematical statistics to sample processing. If a random variable is considered as the result of a random experiment, an arbitrary change in the numerical scale converts the random variable $X$ to $Y = \sigma X + a$, where $a$ is an arbitrary real number and $\sigma$ is a positive number. It is accordingly meaningful, in many cases, to consider not the one theoretical distribution law $F(x)$ of the random variable $X$ alone, but rather the type of the law, i.e. the family of distribution laws of the form $F((x-a)/\sigma)$, which depends on at least two parameters $a$ and $\sigma$. If $\Ex X = 0$ and $\Var X = 1$, then $\Ex Y = a$ and $\Var Y = \sigma^2$. Accordingly, the meaning of the parameters in the theoretical law is $a = \Ex Y$ and $\sigma = \sqrt{\Var Y}$. This makes it possible to determine these parameters by sampling.
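The determination of $a$ and $\sigma$ by sampling, and the normalization of the resulting sample, can be sketched as follows. The standard normal law for $X$, the parameter values, the sample size and the variable names are assumptions chosen for illustration; only functions from the Python standard library are used.

```python
import random
import statistics

random.seed(0)  # fixed seed so that the sketch is reproducible

a, sigma = 5.0, 2.0   # hypothetical location and scale parameters
n = 20_000            # hypothetical sample size

# X has E X = 0 and Var X = 1 (standard normal); Y = sigma * X + a.
sample_y = [sigma * random.gauss(0.0, 1.0) + a for _ in range(n)]

a_hat = statistics.fmean(sample_y)       # estimates a = E Y
sigma_hat = statistics.stdev(sample_y)   # estimates sigma = sqrt(Var Y)

# Normalization: subtract the (sample) mean, divide by the (sample) standard deviation.
normalized = [(y - a_hat) / sigma_hat for y in sample_y]
```

By construction the normalized sample has sample mean zero and sample standard deviation one, up to rounding; the estimates `a_hat` and `sigma_hat` approach $a$ and $\sigma$ as the sample size grows.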

References

[G]	B.V. Gnedenko, "The theory of probability", Chelsea, reprint (1962) (Translated from Russian)
[F]	W. Feller, "An introduction to probability theory and its applications", 1–2, Wiley (1957–1971)
[C]	H. Cramér, "Mathematical methods of statistics", Princeton Univ. Press (1946) MR0016588 Zbl 0063.01014

Comments

Dispersion is usually termed variance in English, and one accordingly uses $\operatorname{Var} X$ instead of ${\mathsf D} X$.