Informant

The gradient of the logarithmic likelihood function. The concept of the informant arose in so-called parametric problems in mathematical statistics. Suppose one has the a priori information that an observed random phenomenon can be described by a probability distribution $ P ^ \theta ( d \omega ) $ from a family $ \{ {P ^ {t} } : {t \in \Theta } \} $, where $ t $ is a numerical or vector parameter, but that the true value $ \theta $ of the parameter is unknown. An observation (or a series of independent observations) is made, leading to the outcome $ \omega $ (or the series of outcomes $ \omega ^ {(1)} \dots \omega ^ {(N)} $). It is required to estimate $ \theta $ from the outcome(s). Suppose that the family $ \{ {P ^ {t} } : {t \in \Theta } \} $ is given by a family of densities $ p ( \omega ; t ) $ with respect to a measure $ \mu ( d \omega ) $ on the space $ \Omega $ of outcomes of observations. If $ \Omega $ is discrete, then the probabilities $ P ^ {t} ( \omega ) $ themselves can be taken for $ p ( \omega ; t ) $. For fixed $ \omega $, the function $ p ( \omega ; t ) $ of $ t = ( t _ {1} \dots t _ {m} ) $ is called the likelihood function, and its logarithm is called the logarithmic likelihood function.

For smooth families the informant can conveniently be introduced as the vector

$$ \mathop{\rm grad} _ {t} \mathop{\rm ln} p ( \omega ; t ) = \ \left ( \frac{1}{p ( \omega ; t ) } \frac{\partial p ( \omega ; t ) }{\partial t _ {1} } \dots \frac{1}{p ( \omega ; t ) } \frac{\partial p ( \omega ; t ) }{\partial t _ {m} } \right ) , $$

which, unlike the logarithmic likelihood function, does not depend on the choice of $ \mu $. The informant contains all the information essential for the problem of estimating $ \theta $, both that obtained from the observations and the a priori information. Moreover, it is additive: for independent observations, i.e. when

$$ p ( \omega ^ {(1)} \dots \omega ^ {(N)} ; t ) = \ \prod _ { k= 1} ^ { N } p _ {k} ( \omega ^ {(k)} ; t ) , $$

the informants are summed:

$$ { \mathop{\rm grad} \mathop{\rm ln} } p ( \omega ^ {(1)} \dots \omega ^ {(N)} ; t ) = \ \sum _ { k= 1} ^ { N } { \mathop{\rm grad} \mathop{\rm ln} } p _ {k} ( \omega ^ {(k)} ; t ) . $$
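The definition and the additivity property can be checked numerically. The following is a minimal sketch, assuming (purely for illustration; the family is not taken from the text above) the normal family with parameter $ t = ( \textrm{mean} , \textrm{variance} ) $ and density with respect to Lebesgue measure. The function names and the sample values are hypothetical.

```python
import math

def log_density(x, t):
    """Logarithmic likelihood of one observation x for t = (mean, variance)."""
    mean, var = t
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def informant(x, t):
    """Analytic gradient of log_density with respect to t = (mean, variance)."""
    mean, var = t
    d_mean = (x - mean) / var
    d_var = -0.5 / var + (x - mean) ** 2 / (2.0 * var ** 2)
    return (d_mean, d_var)

def numeric_gradient(logp, t, h=1e-6):
    """Central-difference gradient of logp at t, used only as a cross-check."""
    grad = []
    for i in range(len(t)):
        up = list(t)
        up[i] += h
        down = list(t)
        down[i] -= h
        grad.append((logp(up) - logp(down)) / (2.0 * h))
    return tuple(grad)

sample = [0.3, -1.2, 2.5, 0.9]   # hypothetical independent observations
t = (0.5, 2.0)                   # point at which the informant is evaluated

# Sum of the informants of the individual observations.
summed = tuple(sum(informant(x, t)[i] for x in sample) for i in range(2))

# Gradient of the joint logarithmic likelihood (log of the product density).
joint = numeric_gradient(lambda s: sum(log_density(x, s) for x in sample), t)

print(summed)
print(joint)
```

Under these assumptions the two printed vectors agree up to the finite-difference error, as the additivity formula predicts.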

In statistical estimation theory the properties of the informant as a vector function are important. Under the assumptions that the logarithmic likelihood function is regular (in particular, twice differentiable), that its derivatives are integrable, and that differentiation with respect to the parameter may be interchanged with integration with respect to the outcomes, one has

$$ {\mathsf E} _ {t} \frac{\partial \mathop{\rm ln} p ( \omega ; t ) }{\partial t _ {k} } = \int\limits _ \Omega \frac{\partial \mathop{\rm ln} p ( \omega ; t ) }{\partial t _ {k} } p ( \omega ; t ) d \mu = 0 ,\ \ \forall k ; $$

$$ - {\mathsf E} _ {t} \frac{\partial ^ {2} \mathop{\rm ln} p ( \omega ; \ t ) }{\partial t _ {j} \partial t _ {k} } = \ I _ {jk} ( t) = {\mathsf E} _ {t} \frac{\partial \mathop{\rm ln} p }{\partial t _ {j} } \frac{\partial \mathop{\rm ln} p }{\partial t _ {k} } ,\ \ \forall j , k . $$

The covariance matrix $ \| I _ {jk} ( t) \| _ {j,k= 1} ^ {m} $ of the informant is called the information matrix. An inequality bounding the accuracy of statistical estimators for $ \theta $ (the Rao–Cramér inequality) can be given in terms of this matrix.
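Both identities can be verified exactly when $ \Omega $ is finite, since the expectations become finite sums. The sketch below assumes, as an illustration not taken from the text, a family on three outcomes with probabilities $ ( t _ {1} , t _ {2} , 1 - t _ {1} - t _ {2} ) $ and counting measure as $ \mu $. All names are hypothetical.

```python
def probabilities(t):
    """Probabilities of the outcomes 0, 1, 2 for the parameter t = (t1, t2)."""
    t1, t2 = t
    return (t1, t2, 1.0 - t1 - t2)

def informant(w, t):
    """Gradient in t of ln p(w; t)."""
    p = probabilities(t)
    if w == 2:
        # ln p = ln(1 - t1 - t2): both partial derivatives equal -1 / p[2].
        return [-1.0 / p[2], -1.0 / p[2]]
    grad = [0.0, 0.0]
    grad[w] = 1.0 / p[w]          # ln p = ln t_w depends on one coordinate only
    return grad

def hessian(w, t):
    """Matrix of second derivatives in t of ln p(w; t)."""
    p = probabilities(t)
    if w == 2:
        v = -1.0 / p[2] ** 2
        return [[v, v], [v, v]]
    h = [[0.0, 0.0], [0.0, 0.0]]
    h[w][w] = -1.0 / p[w] ** 2
    return h

def expectation(f, t):
    """E_t f(w), computed as a finite sum over the three outcomes."""
    return sum(p * f(w) for w, p in enumerate(probabilities(t)))

t = (0.2, 0.5)
idx = range(2)

# First identity: the informant has zero mean.
mean_informant = [expectation(lambda w: informant(w, t)[j], t) for j in idx]

# Second identity: two expressions for the information matrix.
info_from_products = [
    [expectation(lambda w: informant(w, t)[j] * informant(w, t)[k], t) for k in idx]
    for j in idx
]
info_from_hessian = [
    [-expectation(lambda w: hessian(w, t)[j][k], t) for k in idx]
    for j in idx
]

determinant = (info_from_products[0][0] * info_from_products[1][1]
               - info_from_products[0][1] * info_from_products[1][0])

print(mean_informant)
print(info_from_products)
print(info_from_hessian)
print(determinant)
```

For this family the two matrices coincide and the determinant is positive, so the information matrix is non-degenerate at the chosen point.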

When estimating $ \theta $ by the maximum-likelihood method, one assigns to the observed outcome $ \omega $ (or series $ \omega ^ {(1)} \dots \omega ^ {(N)} $) the most likely value $ t = \theta ^ {*} ( \omega ) $, i.e. one maximizes the likelihood function or, equivalently, its logarithm. At an extremal point the informant must vanish. However, the likelihood equation that arises,

$$ { \mathop{\rm grad} \mathop{\rm ln} } p ( \omega ; t ) = 0 , $$

can have roots $ t = \theta ^ {*} $ that correspond to merely local maxima of the logarithmic likelihood function (or to minima); these must be discarded. If, in a neighbourhood of $ t = \theta $,

$$ \mathop{\rm det} \| I _ {jk} ( t) \| \neq 0 , $$

then the asymptotic optimality of the maximum-likelihood estimator $ \theta _ {N} ^ {*} $ follows from the listed properties of the informant, as the number $ N $ of independent observations used grows indefinitely.
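The need to discard spurious roots of the likelihood equation can be seen in a small example. The sketch below assumes, for illustration only, the Cauchy location family $ p ( x ; t ) = 1 / ( \pi ( 1 + ( x - t ) ^ {2} ) ) $ and a hypothetical sample of three observations. The root search (a scan for sign changes followed by bisection) is one simple choice and is not a method prescribed by the text.

```python
import math

def log_likelihood(t, sample):
    """Logarithmic likelihood of the Cauchy location family at t."""
    return sum(-math.log(math.pi * (1.0 + (x - t) ** 2)) for x in sample)

def informant(t, sample):
    """Derivative in t of the logarithmic likelihood (a sum over observations)."""
    return sum(2.0 * (x - t) / (1.0 + (x - t) ** 2) for x in sample)

def likelihood_roots(sample, lo, hi, steps=400):
    """Roots of the likelihood equation in [lo, hi], each with an is_maximum flag.

    A root is detected by a strict sign change between neighbouring grid
    points, so a root lying exactly on a grid point would be missed.
    """
    roots = []
    width = (hi - lo) / steps
    for i in range(steps):
        a, b = lo + i * width, lo + (i + 1) * width
        fa, fb = informant(a, sample), informant(b, sample)
        if fa * fb < 0:
            is_maximum = fa > 0       # informant changes sign from + to -
            for _ in range(60):       # bisection on the bracketing interval
                mid = 0.5 * (a + b)
                fm = informant(mid, sample)
                if fa * fm <= 0:
                    b = mid
                else:
                    a, fa = mid, fm
            roots.append((0.5 * (a + b), is_maximum))
    return roots

sample = [-3.0, 3.0, 3.5]             # hypothetical observations
roots = likelihood_roots(sample, -10.0, 10.0)

# Discard minima, then keep the maximum with the largest log-likelihood.
maxima = [t for t, is_maximum in roots if is_maximum]
estimate = max(maxima, key=lambda t: log_likelihood(t, sample))

for t, is_maximum in roots:
    print(t, "maximum" if is_maximum else "minimum", log_likelihood(t, sample))
print("estimate:", estimate)
```

For this sample the likelihood equation has several roots, including a local maximum near the isolated observation and a minimum between the clusters; only the root with the largest logarithmic likelihood is retained as the estimate.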

References

[1] S.S. Wilks, "Mathematical statistics", Wiley (1962) Zbl 0173.45805