In multivariate statistics, if [math]\displaystyle{ \varepsilon }[/math] is a vector of [math]\displaystyle{ n }[/math] random variables, and [math]\displaystyle{ \Lambda }[/math] is an [math]\displaystyle{ n\times n }[/math] symmetric matrix, then the scalar quantity [math]\displaystyle{ \varepsilon^T\Lambda\varepsilon }[/math] is known as a quadratic form in [math]\displaystyle{ \varepsilon }[/math].
It can be shown that[1]

[math]\displaystyle{ \operatorname{E}\left[\varepsilon^T\Lambda\varepsilon\right] = \operatorname{tr}\left(\Lambda\Sigma\right) + \mu^T\Lambda\mu, }[/math]
where [math]\displaystyle{ \mu }[/math] and [math]\displaystyle{ \Sigma }[/math] are the expected value and variance-covariance matrix of [math]\displaystyle{ \varepsilon }[/math], respectively, and tr denotes the trace of a matrix. This result only depends on the existence of [math]\displaystyle{ \mu }[/math] and [math]\displaystyle{ \Sigma }[/math]; in particular, normality of [math]\displaystyle{ \varepsilon }[/math] is not required.
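As a quick numerical illustration, the identity can be checked by Monte Carlo simulation (a minimal sketch using NumPy; the particular [math]\displaystyle{ \mu }[/math], [math]\displaystyle{ \Lambda }[/math], [math]\displaystyle{ \Sigma }[/math], and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary example: mean vector, symmetric Lambda, and a positive definite Sigma
mu = np.array([1.0, -2.0, 0.5])
Lam = np.array([[2.0, 0.3, 0.0],
                [0.3, 1.0, -0.2],
                [0.0, -0.2, 3.0]])
A = rng.standard_normal((3, 3))
Sigma = A @ A.T + np.eye(3)          # symmetric positive definite

# Theoretical expectation: tr(Lambda Sigma) + mu^T Lambda mu
theory = np.trace(Lam @ Sigma) + mu @ Lam @ mu

# Monte Carlo estimate (normality is convenient here but not required)
eps = rng.multivariate_normal(mu, Sigma, size=200_000)
estimate = np.mean(np.einsum('ij,jk,ik->i', eps, Lam, eps))

print(theory, estimate)              # the two values should agree closely
```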
A book-length treatment of the topic of quadratic forms in random variables is that of Mathai and Provost.[2]
Since the quadratic form is a scalar quantity, [math]\displaystyle{ \varepsilon^T\Lambda\varepsilon = \operatorname{tr}(\varepsilon^T\Lambda\varepsilon) }[/math].
Next, by the cyclic property of the trace operator,

[math]\displaystyle{ \operatorname{E}\left[\operatorname{tr}\left(\varepsilon^T\Lambda\varepsilon\right)\right] = \operatorname{E}\left[\operatorname{tr}\left(\Lambda\varepsilon\varepsilon^T\right)\right]. }[/math]
Since the trace operator is a linear combination of the components of the matrix, it therefore follows from the linearity of the expectation operator that

[math]\displaystyle{ \operatorname{E}\left[\operatorname{tr}\left(\Lambda\varepsilon\varepsilon^T\right)\right] = \operatorname{tr}\left(\Lambda\operatorname{E}\left[\varepsilon\varepsilon^T\right]\right). }[/math]
A standard property of variances then tells us that this is

[math]\displaystyle{ \operatorname{tr}\left(\Lambda\left(\Sigma + \mu\mu^T\right)\right). }[/math]
Applying the cyclic property of the trace operator again, we get

[math]\displaystyle{ \operatorname{tr}\left(\Lambda\Sigma\right) + \operatorname{tr}\left(\Lambda\mu\mu^T\right) = \operatorname{tr}\left(\Lambda\Sigma\right) + \operatorname{tr}\left(\mu^T\Lambda\mu\right) = \operatorname{tr}\left(\Lambda\Sigma\right) + \mu^T\Lambda\mu. }[/math]
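The individual steps of this derivation can be spot-checked numerically (a sketch with arbitrary NumPy inputs; the dimensions and seed are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
eps = rng.standard_normal(n)
Lam = rng.standard_normal((n, n))
Lam = (Lam + Lam.T) / 2                      # symmetric Lambda

# A scalar equals its own trace, and the trace is cyclic:
# eps^T Lam eps = tr(eps^T Lam eps) = tr(Lam eps eps^T)
q = eps @ Lam @ eps
assert np.isclose(q, np.trace(Lam @ np.outer(eps, eps)))

# Second-moment identity E[eps eps^T] = Sigma + mu mu^T, checked by simulation
mu = np.array([0.5, -1.0, 2.0])
A = rng.standard_normal((n, n))
Sigma = A @ A.T + np.eye(n)
draws = rng.multivariate_normal(mu, Sigma, size=500_000)
second_moment = draws.T @ draws / len(draws)
diff = np.max(np.abs(second_moment - (Sigma + np.outer(mu, mu))))
print(diff)                                  # small simulation error
```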
In general, the variance of a quadratic form depends greatly on the distribution of [math]\displaystyle{ \varepsilon }[/math]. However, if [math]\displaystyle{ \varepsilon }[/math] does follow a multivariate normal distribution, the variance of the quadratic form becomes particularly tractable. Assume for the moment that [math]\displaystyle{ \Lambda }[/math] is a symmetric matrix. Then,

[math]\displaystyle{ \operatorname{Var}\left[\varepsilon^T\Lambda\varepsilon\right] = 2\operatorname{tr}\left(\Lambda\Sigma\Lambda\Sigma\right) + 4\mu^T\Lambda\Sigma\Lambda\mu. }[/math]
In fact, this can be generalized to find the covariance between two quadratic forms on the same [math]\displaystyle{ \varepsilon }[/math] (once again, [math]\displaystyle{ \Lambda_1 }[/math] and [math]\displaystyle{ \Lambda_2 }[/math] must both be symmetric):

[math]\displaystyle{ \operatorname{Cov}\left[\varepsilon^T\Lambda_1\varepsilon, \varepsilon^T\Lambda_2\varepsilon\right] = 2\operatorname{tr}\left(\Lambda_1\Sigma\Lambda_2\Sigma\right) + 4\mu^T\Lambda_1\Sigma\Lambda_2\mu. }[/math]
In addition, a quadratic form such as this follows a generalized chi-squared distribution.
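In the diagonal case the variance formula can be verified exactly against a direct per-coordinate computation (a sketch; the particular values are arbitrary): for independent [math]\displaystyle{ \varepsilon_i \sim N(\mu_i, \sigma_i^2) }[/math] and diagonal [math]\displaystyle{ \Lambda }[/math], the quadratic form is a sum of independent terms [math]\displaystyle{ \lambda_i\varepsilon_i^2 }[/math], each with variance [math]\displaystyle{ \lambda_i^2\left(2\sigma_i^4 + 4\sigma_i^2\mu_i^2\right) }[/math]:

```python
import numpy as np

# Diagonal case: eps_i ~ N(mu_i, s2_i) independent, Lambda = diag(lam)
mu = np.array([1.0, -0.5, 2.0])
s2 = np.array([0.5, 2.0, 1.0])          # variances sigma_i^2
lam = np.array([3.0, -1.0, 0.5])

Sigma = np.diag(s2)
Lam = np.diag(lam)

# Matrix form: Var = 2 tr(Lam Sigma Lam Sigma) + 4 mu^T Lam Sigma Lam mu
var_matrix = 2 * np.trace(Lam @ Sigma @ Lam @ Sigma) + 4 * mu @ Lam @ Sigma @ Lam @ mu

# Direct per-coordinate form: Var(lam_i eps_i^2) = lam_i^2 (2 s2_i^2 + 4 s2_i mu_i^2)
var_direct = np.sum(lam**2 * (2 * s2**2 + 4 * s2 * mu**2))

print(var_matrix, var_direct)           # the two expressions coincide
```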
The case for general [math]\displaystyle{ \Lambda }[/math] can be derived by noting that

[math]\displaystyle{ \varepsilon^T\Lambda^T\varepsilon = \varepsilon^T\Lambda\varepsilon, }[/math]
so

[math]\displaystyle{ \varepsilon^T\Lambda\varepsilon = \varepsilon^T\left(\Lambda + \Lambda^T\right)\varepsilon/2 }[/math]
is a quadratic form in the symmetric matrix [math]\displaystyle{ \tilde{\Lambda}=\left(\Lambda+\Lambda^T\right)/2 }[/math], so the mean and variance expressions are the same, provided [math]\displaystyle{ \Lambda }[/math] is replaced by [math]\displaystyle{ \tilde{\Lambda} }[/math] therein.
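This symmetrization step can be illustrated numerically (a sketch with arbitrary NumPy inputs): the quadratic form and its mean expression are unchanged when [math]\displaystyle{ \Lambda }[/math] is replaced by its symmetric part.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
Lam = rng.standard_normal((n, n))            # general, non-symmetric Lambda
Lam_tilde = (Lam + Lam.T) / 2                # its symmetric part

# The quadratic form itself is unchanged by symmetrization
eps = rng.standard_normal(n)
assert np.isclose(eps @ Lam @ eps, eps @ Lam_tilde @ eps)

# So is the mean expression tr(Lam Sigma) + mu^T Lam mu,
# since tr(Lam^T Sigma) = tr(Lam Sigma) for symmetric Sigma
mu = rng.standard_normal(n)
A = rng.standard_normal((n, n))
Sigma = A @ A.T
mean_general = np.trace(Lam @ Sigma) + mu @ Lam @ mu
mean_symmetrized = np.trace(Lam_tilde @ Sigma) + mu @ Lam_tilde @ mu
print(mean_general, mean_symmetrized)        # equal
```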
In the setting where one has a set of observations [math]\displaystyle{ y }[/math] and an operator matrix [math]\displaystyle{ H }[/math], then the residual sum of squares can be written as a quadratic form in [math]\displaystyle{ y }[/math]:

[math]\displaystyle{ \textrm{RSS} = y^T\left(I - H\right)^T\left(I - H\right)y. }[/math]
For procedures where the matrix [math]\displaystyle{ H }[/math] is symmetric and idempotent, and the errors are Gaussian with covariance matrix [math]\displaystyle{ \sigma^2I }[/math], [math]\displaystyle{ \textrm{RSS}/\sigma^2 }[/math] has a noncentral chi-squared distribution with [math]\displaystyle{ k }[/math] degrees of freedom and noncentrality parameter [math]\displaystyle{ \lambda }[/math], where

[math]\displaystyle{ k = \operatorname{tr}\left[\left(I - H\right)^T\left(I - H\right)\right] }[/math]
[math]\displaystyle{ \lambda = \mu^T\left(I - H\right)^T\left(I - H\right)\mu/\sigma^2 }[/math]
may be found by matching the first two central moments of a noncentral chi-squared random variable to the expressions given in the first two sections. If [math]\displaystyle{ Hy }[/math] estimates [math]\displaystyle{ \mu }[/math] with no bias, then the noncentrality [math]\displaystyle{ \lambda }[/math] is zero and [math]\displaystyle{ \textrm{RSS}/\sigma^2 }[/math] follows a central chi-squared distribution.
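As a concrete sketch (assuming an ordinary least squares setting, where [math]\displaystyle{ H }[/math] is the hat matrix of a hypothetical design matrix [math]\displaystyle{ X }[/math]; any symmetric idempotent [math]\displaystyle{ H }[/math] would do), the degrees of freedom reduce to [math]\displaystyle{ n - p }[/math]:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical OLS design matrix X (illustrative choice of n and p)
n, p = 20, 3
X = rng.standard_normal((n, p))
H = X @ np.linalg.solve(X.T @ X, X.T)        # hat matrix: projection onto col(X)

M = np.eye(n) - H
assert np.allclose(M, M.T)                   # symmetric
assert np.allclose(M @ M, M)                 # idempotent

# Since (I-H) is idempotent, k = tr[(I-H)^T(I-H)] = tr(I-H) = n - p
k = np.trace(M.T @ M)
print(round(k))                              # 17, i.e. n - p
```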