The control variates method is a variance reduction technique used in Monte Carlo methods. It exploits information about the errors in estimates of known quantities to reduce the error of an estimate of an unknown quantity.[1][2][3]
Let the unknown parameter of interest be [math]\displaystyle{ \mu }[/math], and assume we have a statistic [math]\displaystyle{ m }[/math] whose expected value is [math]\displaystyle{ \mu }[/math]: [math]\displaystyle{ \mathbb{E}\left[m\right]=\mu }[/math], i.e. [math]\displaystyle{ m }[/math] is an unbiased estimator for [math]\displaystyle{ \mu }[/math]. Suppose we calculate another statistic [math]\displaystyle{ t }[/math] such that [math]\displaystyle{ \mathbb{E}\left[t\right]=\tau }[/math] is a known value. Then
[math]\displaystyle{ m^{\star} = m + c\left(t-\tau\right) }[/math]
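is also an unbiased estimator for [math]\displaystyle{ \mu }[/math] for any choice of the coefficient [math]\displaystyle{ c }[/math]. The variance of the resulting estimator [math]\displaystyle{ m^{\star} }[/math] is
[math]\displaystyle{ \textrm{Var}\left(m^{\star}\right) = \textrm{Var}\left(m\right) + c^2\,\textrm{Var}\left(t\right) + 2c\,\textrm{Cov}\left(m,t\right). }[/math]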
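By differentiating the above expression with respect to [math]\displaystyle{ c }[/math] and setting the derivative to zero, it can be shown that choosing the optimal coefficient
[math]\displaystyle{ c^{\star} = -\frac{\textrm{Cov}\left(m,t\right)}{\textrm{Var}\left(t\right)} }[/math]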
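minimizes the variance of [math]\displaystyle{ m^{\star} }[/math]. (Note that this coefficient is the same as the coefficient obtained from a linear regression.) With this choice,
[math]\displaystyle{ \textrm{Var}\left(m^{\star}\right) = \textrm{Var}\left(m\right) - \frac{\left[\textrm{Cov}\left(m,t\right)\right]^2}{\textrm{Var}\left(t\right)} = \left(1-\rho_{m,t}^2\right)\textrm{Var}\left(m\right) }[/math]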
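where
[math]\displaystyle{ \rho_{m,t} = \textrm{Corr}\left(m,t\right) = \frac{\textrm{Cov}\left(m,t\right)}{\sqrt{\textrm{Var}\left(m\right)\textrm{Var}\left(t\right)}} }[/math]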
is the correlation coefficient of [math]\displaystyle{ m }[/math] and [math]\displaystyle{ t }[/math]. The greater the value of [math]\displaystyle{ \vert\rho_{m,t}\vert }[/math], the greater the variance reduction achieved.
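The relationship between the coefficient, the variance, and the correlation can be checked numerically. The following is a minimal sketch, not a reference implementation: the function names and the moment values are hypothetical and chosen only for illustration.

```python
import math


def optimal_coefficient(cov_mt: float, var_t: float) -> float:
    """Return c* = -Cov(m, t) / Var(t), the minimizer of Var(m + c (t - tau))."""
    return -cov_mt / var_t


def controlled_variance(var_m: float, var_t: float, cov_mt: float, c: float) -> float:
    """Return Var(m + c (t - tau)) = Var(m) + c^2 Var(t) + 2 c Cov(m, t)."""
    return var_m + c * c * var_t + 2.0 * c * cov_mt


# Hypothetical second moments, assumed known here purely for illustration.
var_m, var_t, cov_mt = 4.0, 1.0, -1.6

c_star = optimal_coefficient(cov_mt, var_t)
rho = cov_mt / math.sqrt(var_m * var_t)
best = controlled_variance(var_m, var_t, cov_mt, c_star)

# The two printed variances should agree up to rounding.
print("c* =", c_star)
print("Var(m*) at c* =", best)
print("(1 - rho^2) Var(m) =", (1.0 - rho ** 2) * var_m)
```

Moving the coefficient away from `c_star` in either direction increases the value returned by `controlled_variance`, since the variance is a convex quadratic in the coefficient.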
In the case that [math]\displaystyle{ \textrm{Cov}\left(m,t\right) }[/math], [math]\displaystyle{ \textrm{Var}\left(t\right) }[/math], and/or [math]\displaystyle{ \rho_{m,t}\; }[/math] are unknown, they can be estimated across the Monte Carlo replicates. This is equivalent to solving a certain least squares system; therefore this technique is also known as regression sampling.
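As a rough illustration of estimating the coefficient across replicates, the sketch below computes the sample version of the optimal coefficient and compares it with the slope of an ordinary least squares fit of the replicates of m on the replicates of t. The synthetic data model (a linear relation with Gaussian noise), the seed, and all function names are assumptions made for this example only.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def sample_cov(xs, ys):
    """Unbiased sample covariance of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def estimate_coefficient(m_reps, t_reps):
    """Sample analogue of c* = -Cov(m, t) / Var(t)."""
    return -sample_cov(m_reps, t_reps) / sample_cov(t_reps, t_reps)


def least_squares_slope(xs, ys):
    """Slope b of the ordinary least squares fit ys ~ a + b * xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Synthetic replicates (assumed model): t ~ N(0, 1), m = 2 - 0.5 t + noise.
rng = random.Random(0)
t_reps = [rng.gauss(0.0, 1.0) for _ in range(2000)]
m_reps = [2.0 - 0.5 * t + rng.gauss(0.0, 0.2) for t in t_reps]

c_hat = estimate_coefficient(m_reps, t_reps)
slope = least_squares_slope(t_reps, m_reps)

# c_hat is the negative of the regression slope of m on t.
print("estimated coefficient:", c_hat)
print("regression slope:", slope)
```

Because the coefficient is estimated from the same replicates that form the estimator, the resulting estimator is in general no longer exactly unbiased, although the effect is typically small for large samples.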
When the expectation of the control variable, [math]\displaystyle{ \mathbb{E}\left[t\right]=\tau }[/math], is not known analytically, it is still possible to increase the precision in estimating [math]\displaystyle{ \mu }[/math] (for a given fixed simulation budget), provided that two conditions are met: 1) evaluating [math]\displaystyle{ t }[/math] is significantly cheaper than computing [math]\displaystyle{ m }[/math]; 2) the magnitude of the correlation coefficient [math]\displaystyle{ |\rho_{m,t}| }[/math] is close to unity.[3]
We would like to estimate
[math]\displaystyle{ I = \int_0^1 \frac{1}{1+x} \, \mathrm{d}x }[/math]
using Monte Carlo integration. This integral is the expected value of [math]\displaystyle{ f(U) }[/math], where
[math]\displaystyle{ f(U) = \frac{1}{1+U} }[/math]
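and [math]\displaystyle{ U }[/math] follows a uniform distribution on [0, 1]. Using a sample of size [math]\displaystyle{ n }[/math], denote the points in the sample as [math]\displaystyle{ u_1, \ldots, u_n }[/math]. Then the estimate is given by
[math]\displaystyle{ I \approx \frac{1}{n} \sum_{i} f(u_i). }[/math]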
Now we introduce [math]\displaystyle{ g(U) = 1+U }[/math] as a control variate with a known expected value [math]\displaystyle{ \mathbb{E}\left[g\left(U\right)\right]=\int_0^1 (1+x) \, \mathrm{d}x=\tfrac{3}{2} }[/math] and combine the two into a new estimate
[math]\displaystyle{ I \approx \frac{1}{n} \sum_{i} f(u_i) + c\left(\frac{1}{n}\sum_{i} g(u_i) - \frac{3}{2}\right). }[/math]
Using [math]\displaystyle{ n=1500 }[/math] realizations and an estimated optimal coefficient [math]\displaystyle{ c^\star \approx 0.4773 }[/math], we obtain the following results:
Method | Estimate | Variance |
Classical estimate | 0.69475 | 0.01947 |
Control variates | 0.69295 | 0.00060 |
The variance was significantly reduced after using the control variates technique. (The exact result is [math]\displaystyle{ I=\ln 2 \approx 0.69314718 }[/math].)
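The example can be reproduced along the following lines. This is a sketch under stated assumptions rather than the code behind the figures above: the seed, the function names, and the use of Python's standard `random` module are choices made here, so the exact numbers will differ from the table. The variances it reports are per-sample variances, which is assumed to be the scale used in the table because the exact value of Var(f(U)) is 1/2 − (ln 2)² ≈ 0.0195.

```python
import math
import random


def f(u):
    """Integrand: f(u) = 1 / (1 + u)."""
    return 1.0 / (1.0 + u)


def g(u):
    """Control variate: g(u) = 1 + u, with known mean 3/2 on [0, 1]."""
    return 1.0 + u


def control_variate_demo(n=1500, seed=12345):
    """Estimate the integral of 1/(1+x) on [0, 1] with and without the control variate."""
    rng = random.Random(seed)  # seed is an arbitrary choice for repeatability
    us = [rng.random() for _ in range(n)]
    fs = [f(u) for u in us]
    gs = [g(u) for u in us]

    f_bar = sum(fs) / n
    g_bar = sum(gs) / n

    # Estimate c* = -Cov(f, g) / Var(g) from the same sample.
    cov_fg = sum((a - f_bar) * (b - g_bar) for a, b in zip(fs, gs)) / (n - 1)
    var_g = sum((b - g_bar) ** 2 for b in gs) / (n - 1)
    c_hat = -cov_fg / var_g

    # Per-sample controlled values f(u_i) + c (g(u_i) - 3/2).
    hs = [a + c_hat * (b - 1.5) for a, b in zip(fs, gs)]
    h_bar = sum(hs) / n

    var_f = sum((a - f_bar) ** 2 for a in fs) / (n - 1)
    var_h = sum((h - h_bar) ** 2 for h in hs) / (n - 1)

    return {
        "c_hat": c_hat,
        "classical": f_bar,
        "control": h_bar,
        "var_classical": var_f,
        "var_control": var_h,
    }


result = control_variate_demo()
print("exact value ln 2 =", math.log(2.0))
for key, value in result.items():
    print(key, "=", value)
```

The estimated coefficient comes out positive because f and g are negatively correlated on [0, 1], which matches the sign of the coefficient quoted above.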
Original source: https://en.wikipedia.org/wiki/Control variates.