Surrogate data testing[1] (or the method of surrogate data) is a statistical proof by contradiction technique similar to permutation tests[2] and parametric bootstrapping. It is used to detect non-linearity in a time series.[3] The technique involves specifying a null hypothesis [math]\displaystyle{ H_0 }[/math] describing a linear process and then generating several surrogate data sets according to [math]\displaystyle{ H_0 }[/math] using Monte Carlo methods. A discriminating statistic is then calculated for the original time series and for each of the surrogate sets. If the value of the statistic for the original series differs significantly from its values for the surrogate sets, the null hypothesis is rejected and non-linearity is assumed.[3]
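The procedure described above can be sketched in Python. This is only an illustrative sketch under stated assumptions, not a reference implementation: the null hypothesis is taken to be a stationary Gaussian AR(1) process, the discriminating statistic is a simple time-reversal asymmetry measure, and the function names (`time_reversal_asymmetry`, `ar1_surrogate`, `surrogate_test`) are hypothetical names chosen for this example.

```python
import numpy as np

def time_reversal_asymmetry(x, lag=1):
    """Discriminating statistic: mean cubed lagged difference.
    Its expectation is zero for a time-reversible (e.g. linear Gaussian) process."""
    x = np.asarray(x, dtype=float)
    d = x[lag:] - x[:-lag]
    return np.mean(d ** 3)

def ar1_surrogate(x, rng):
    """Simulate one surrogate from a Gaussian AR(1) model fitted to x
    (assumed null hypothesis for this sketch)."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    z = x - mu
    # Least-squares estimate of the AR(1) coefficient.
    phi = np.dot(z[1:], z[:-1]) / np.dot(z[:-1], z[:-1])
    if abs(phi) >= 1.0:
        raise ValueError("fitted AR(1) model is not stationary")
    sigma = (z[1:] - phi * z[:-1]).std()
    noise = rng.normal(0.0, sigma, size=len(z))
    s = np.empty_like(z)
    # Start from the stationary distribution of the fitted model.
    s[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - phi ** 2))
    for t in range(1, len(z)):
        s[t] = phi * s[t - 1] + noise[t]
    return s + mu

def surrogate_test(x, make_surrogate, statistic, n_surrogates=99, seed=0):
    """Return the statistic of x, the surrogate statistics, and a
    two-sided Monte Carlo p-value based on ranks."""
    rng = np.random.default_rng(seed)
    t0 = statistic(x)
    ts = np.array([statistic(make_surrogate(x, rng)) for _ in range(n_surrogates)])
    # Count surrogates at least as far from the surrogate mean as the original.
    center = ts.mean()
    n_extreme = np.sum(np.abs(ts - center) >= np.abs(t0 - center))
    p_value = (n_extreme + 1) / (n_surrogates + 1)
    return t0, ts, p_value
```

Calling `surrogate_test(x, ar1_surrogate, time_reversal_asymmetry)` on a one-dimensional array `x` returns a p-value; a small value (for instance below 0.05) would lead to rejecting the assumed null hypothesis. With 99 surrogates the smallest attainable p-value is 0.01, which is why the number of surrogates bounds the achievable significance level.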
The particular surrogate data testing method to be used is directly related to the null hypothesis. Usually this is similar to the following: The data is a realization of a stationary linear system, whose output has possibly been measured by a monotonically increasing, possibly nonlinear (but static) function.[1] Here linear means that each value is linearly dependent on past values or on present and past values of some independent identically distributed (i.i.d.) process, usually also Gaussian. This is equivalent to saying that the process is of ARMA type. In the case of flows (continuous mappings), linearity of the system means that it can be expressed by a linear differential equation. In this hypothesis, the static measurement function is one which depends only on the present value of its argument, not on past ones.
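As an illustration of this null hypothesis, one common way to write it (the symbols here are notation assumed for this example, not taken from the cited sources) is [math]\displaystyle{ x_t = \sum_{i=1}^{p} a_i x_{t-i} + \sum_{j=0}^{q} b_j e_{t-j}, \qquad s_t = h(x_t) }[/math], where [math]\displaystyle{ e_t }[/math] is an i.i.d. (usually Gaussian) process, [math]\displaystyle{ a_i }[/math] and [math]\displaystyle{ b_j }[/math] are the ARMA coefficients, [math]\displaystyle{ x_t }[/math] is the unobserved linear process, [math]\displaystyle{ h }[/math] is the monotonically increasing static measurement function, and [math]\displaystyle{ s_t }[/math] is the observed series. Because [math]\displaystyle{ h }[/math] acts only on the present value [math]\displaystyle{ x_t }[/math], it changes the distribution of the observed values without altering their rank order in time.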
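Many algorithms to generate surrogate data have been proposed. They are usually classified in two groups: typical realizations, in which surrogate series are generated as outputs of a model fitted to the original data, and constrained realizations, in which surrogate series are created directly from the original data, generally by a suitable transformation of it.[4]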
Constrained realization methods do not depend on a particular model, nor on any parameters, and are therefore non-parametric. They are usually based on preserving the linear structure of the original series (for instance, by preserving the autocorrelation function, or equivalently the periodogram, an estimate of the sample spectrum).[5] Among constrained realization methods, the most widely used (which could thus be called the classical methods) are random shuffle, random phases (also called Fourier transform surrogates), the amplitude adjusted Fourier transform (AAFT) and the iterative amplitude adjusted Fourier transform (IAAFT).
Many other surrogate data methods have been proposed, some based on optimizations to achieve an autocorrelation close to the original one,[9][10][11] some based on the wavelet transform[12][13][14] and some capable of dealing with some types of non-stationary data.[15][16][17]
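To make the idea of preserving the periodogram concrete, the following Python sketch shows commonly described forms of two constrained realization algorithms: random-phase (Fourier transform) surrogates and amplitude adjusted Fourier transform (AAFT) surrogates. It is a minimal illustration under the assumption of a one-dimensional, evenly sampled series; the function names `phase_randomized_surrogate` and `aaft_surrogate` are hypothetical and chosen only for this example.

```python
import numpy as np

def phase_randomized_surrogate(x, rng):
    """Random-phase surrogate: keep the Fourier amplitudes of x
    (and hence its periodogram) but replace the phases by random ones."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.shape)
    new_spectrum = np.abs(spectrum) * np.exp(1j * phases)
    # The zero-frequency term (the mean) is kept unchanged.
    new_spectrum[0] = spectrum[0]
    # For even n the Nyquist term must stay real, so it is kept unchanged too.
    if n % 2 == 0:
        new_spectrum[-1] = spectrum[-1]
    return np.fft.irfft(new_spectrum, n=n)

def aaft_surrogate(x, rng):
    """AAFT surrogate: same values as x, reordered so that the
    linear correlations are approximately preserved."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Step 1: Gaussian series with the same rank ordering as x.
    ranks_x = np.argsort(np.argsort(x))
    gauss = np.sort(rng.normal(size=n))
    y = gauss[ranks_x]
    # Step 2: randomize the phases of the Gaussianized series.
    y_surr = phase_randomized_surrogate(y, rng)
    # Step 3: reorder the original values to follow the ranks of y_surr.
    ranks_s = np.argsort(np.argsort(y_surr))
    return np.sort(x)[ranks_s]
```

The random-phase surrogate reproduces the periodogram of the input exactly but generally changes its amplitude distribution, whereas the AAFT surrogate reproduces the amplitude distribution exactly and the periodogram only approximately. Either function can be passed as the surrogate generator to a test loop that takes a series and a `numpy.random.Generator`.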
The above-mentioned techniques are called linear surrogate methods, because they are based on a linear process and address a linear null hypothesis.[9] Broadly speaking, these methods are useful for data showing irregular fluctuations (short-term variability), and data with such behaviour abound in the real world. However, data with obvious periodicity are also often observed, for example annual sunspot numbers and electrocardiograms (ECG). Time series exhibiting strong periodicities are clearly not consistent with the linear null hypotheses. To tackle this case, some algorithms and null hypotheses have been proposed.[18][19][20]
Original source: https://en.wikipedia.org/wiki/Surrogate data testing.