Simple exponential smoothing example. Raw data: mean daily temperatures at the Paris-Montsouris weather station (France) from 1960/01/01 to 1960/02/29. Smoothed data with alpha factor = 0.1.
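Simple exponential smoothing, as in the caption above, replaces each observation with a weighted blend of the new value and the previous smoothed value, with the weight given by the factor alpha. A minimal sketch in Python (the temperature values below are invented for illustration, not the actual Montsouris data):

```python
def exponential_smoothing(data, alpha):
    """Simple exponential smoothing:
    s[0] = x[0], then s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]."""
    smoothed = [data[0]]
    for x in data[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

# Illustrative daily temperatures (degrees C), alpha = 0.1 as in the caption.
temps = [3.5, 4.1, 2.8, 1.9, 2.5, 3.0]
smooth = exponential_smoothing(temps, 0.1)
```

A small alpha such as 0.1 weights the history heavily, so the smoothed series reacts slowly to changes in the raw data.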
In statistics and image processing, to smooth a data set is to create an approximating function that attempts to capture important patterns in the data while leaving out noise and other fine-scale structures or rapid phenomena. In smoothing, the data points of a signal are modified so that individual points higher than their neighbours (presumably because of noise) are reduced, and points lower than their neighbours are increased, leading to a smoother signal.
Reducing noise by smoothing may aid in data analysis in two notable ways:
Help uncover more meaningful information from the underlying data, such as trends.[1]
Provide analyses that are both flexible and robust.[2]
Smoothing may be distinguished from the related and partially overlapping concept of curve fitting in the following ways:
curve fitting often involves the use of an explicit function form for the result, whereas the immediate results from smoothing are the "smoothed" values themselves, with no later use made of a functional form even if one exists;
the aim of smoothing is to give a general idea of relatively slow changes of value, with little attention paid to the close matching of individual data values, whereas curve fitting concentrates on achieving as close a match as possible;
smoothing methods often have an associated tuning parameter that controls the extent of smoothing, whereas curve fitting will adjust any number of parameters of the function to obtain the 'best' fit.
Linear smoothers
If the smoothed values can be written as a linear transformation of the observed values, the smoothing operation is known as a linear smoother, and the matrix representing the transformation is known as a smoother matrix or hat matrix. The operation of applying such a matrix transformation is called convolution; thus the matrix is also called a convolution matrix or a convolution kernel. In the case of a simple series of data points (rather than a multi-dimensional image), the convolution kernel is a one-dimensional vector.
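For a one-dimensional series, smoothing by convolution with a kernel vector can be sketched as follows (a minimal illustration; only the "valid" region, where the kernel fully overlaps the signal, is returned):

```python
def convolve_1d(signal, kernel):
    """Smooth a 1-D signal by sliding a kernel along it and taking
    weighted sums; returns only positions of full overlap."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# A length-3 averaging kernel: each output is the mean of 3 neighbours.
smoothed = convolve_1d([1.0, 2.0, 6.0, 2.0, 1.0], [1/3, 1/3, 1/3])
```

Strictly speaking, convolution reverses the kernel before sliding it, but for the symmetric kernels used in smoothing the distinction makes no difference.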
Algorithms
One of the most common algorithms is the "moving average", often used to try to capture important trends in repeated statistical surveys. In image processing and computer vision, smoothing ideas are used in scale space representations. The simplest smoothing algorithm is the "rectangular" or "unweighted sliding-average smooth". This method replaces each point in the signal with the average of "m" adjacent points, where "m" is a positive integer called the "smooth width". Usually m is an odd number. The triangular smooth is like the rectangular smooth except that it implements a weighted smoothing function.[3]
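The rectangular (unweighted sliding-average) smooth described above can be sketched as follows; edge handling varies between implementations, and this sketch simply leaves points unchanged where a full window does not fit:

```python
def sliding_average_smooth(signal, m):
    """Rectangular smooth: replace each interior point with the mean of
    the m points centred on it, for odd smooth width m."""
    assert m % 2 == 1, "smooth width m should be odd"
    half = m // 2
    out = list(signal)
    for i in range(half, len(signal) - half):
        out[i] = sum(signal[i - half : i + half + 1]) / m
    return out
```

A triangular smooth works the same way, but with unequal weights such as [1, 2, 3, 2, 1]/9 instead of equal weights, so points nearer the centre of the window count for more.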
Some specific smoothing and filter types, with their respective uses, pros and cons are:
Kalman filter
Uses a series of measurements observed over time, containing statistical noise and other inaccuracies, and estimates a joint probability distribution over the variables for each timeframe.
The estimates of unknown variables it produces tend to be more accurate than those based on a single measurement alone, when its assumptions are met.
Assumes, and therefore requires knowledge of, how the system generating the data points advances in time and how the measurements are acquired.
Kolmogorov–Zurbenko (KZ) filter
Uses a series of iterations of a moving average filter of length m, where m is a positive, odd integer.
Robust and nearly optimal.
Performs well in a missing-data environment, especially in multidimensional time and space, where missing data can cause problems arising from spatial sparseness.
The two parameters each have a clear interpretation, so the filter can be easily adopted by specialists in different areas.
Software implementations for time series, longitudinal and spatial data have been developed in the popular statistical package R, which facilitate the use of the KZ filter and its extensions in different areas.
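The core idea of the KZ filter, independent of the R implementations mentioned above, can be sketched as repeated application of a moving average; in this simplified sketch, near the edges (and, by extension, around missing data) the average is taken over whatever part of the window is available:

```python
def moving_average(signal, m):
    """One pass of a centred moving average of odd length m; near the
    edges, only the available part of the window is averaged."""
    half = m // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half) : i + half + 1]
        out.append(sum(window) / len(window))
    return out

def kz_filter(signal, m, k):
    """Kolmogorov-Zurbenko filter: k iterations of a length-m moving average."""
    for _ in range(k):
        signal = moving_average(signal, m)
    return signal
```

The two parameters are exactly those with the clear interpretations noted above: m sets the width of each averaging window and k the number of iterations.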
Local regression ("loess" or "lowess")
Generalizes the Savitzky–Golay smoothing filter to non-regular sampling instances.
Fits simple models to localized subsets of the data to build up a function that describes the deterministic part of the variation in the data, point by point.
One of the chief attractions of this method is that the data analyst is not required to specify a global function of any form to fit to the data, only to fit segments of the data.
Its main drawback is increased computation: because it is so computationally intensive, LOESS would have been practically impossible to use in the era when least-squares regression was being developed.
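The point-by-point fitting can be sketched as follows: to estimate the smoothed value at a point, fit a weighted least-squares line to the nearby data, with weights from the tricube function. This is a simplified sketch; production LOESS uses a nearest-neighbour span rather than the fixed-width neighbourhood assumed here, and the function name is illustrative:

```python
def loess_point(x, y, x0, span):
    """Smoothed value at x0 from a tricube-weighted linear fit to the
    points within `span` of x0 (a minimal local-regression sketch)."""
    pts = [(xi, yi) for xi, yi in zip(x, y) if abs(xi - x0) <= span]
    w = [(1 - (abs(xi - x0) / span) ** 3) ** 3 for xi, _ in pts]
    sw = sum(w)
    # Weighted means, then weighted simple linear regression in closed form.
    xbar = sum(wi * xi for wi, (xi, _) in zip(w, pts)) / sw
    ybar = sum(wi * yi for wi, (_, yi) in zip(w, pts)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, (xi, _) in zip(w, pts))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, (xi, yi) in zip(w, pts))
    slope = sxy / sxx if sxx else 0.0
    return ybar + slope * (x0 - xbar)
```

Repeating this at every evaluation point is what makes the method computationally intensive: each smoothed value requires its own weighted regression.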
Low-pass filter
A filter that passes signals with a frequency lower than a selected cutoff frequency and attenuates signals with frequencies higher than the cutoff frequency.
Used for continuous time realization and discrete time realization.
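A common discrete-time realization is the first-order recurrence obtained by discretizing an RC low-pass filter, in which the chosen cutoff frequency determines the smoothing factor (a minimal sketch):

```python
import math

def low_pass(signal, cutoff_hz, sample_rate_hz):
    """Discrete-time realization of a first-order low-pass filter:
    frequencies well below cutoff_hz pass, higher ones are attenuated."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2 * math.pi * cutoff_hz)   # RC time constant for the cutoff
    alpha = dt / (rc + dt)
    out = [signal[0]]
    for x in signal[1:]:
        out.append(out[-1] + alpha * (x - out[-1]))
    return out
```

Note the recurrence has the same form as simple exponential smoothing; here the smoothing factor is derived from a physical cutoff frequency rather than chosen directly.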
Ramer–Douglas–Peucker algorithm
Decimates a curve composed of line segments to a similar curve with fewer points.
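The decimation works by recursion: keep the two endpoints, find the intermediate point farthest from the chord joining them, and either drop all intermediate points (if the farthest is within a tolerance epsilon) or keep that point and recurse on both halves. A minimal sketch:

```python
def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    return abs(dy * px - dx * py + bx * ay - by * ax) / (dx * dx + dy * dy) ** 0.5

def rdp(points, epsilon):
    """Ramer-Douglas-Peucker: drop points whose removal keeps the
    simplified curve within epsilon of the original polyline."""
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_line_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax > epsilon:
        # The farthest point is significant: keep it and recurse.
        left = rdp(points[: index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right
    return [points[0], points[-1]]
```

Unlike the averaging filters above, this method smooths by removing points rather than adjusting their values, so the surviving points are unmodified samples of the original curve.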
Savitzky–Golay smoothing filter
Based on the least-squares fitting of polynomials to segments of the data.
A specific case of Local regression ("loess" or "lowess") when the sampling instances are regular.
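For regularly sampled data the least-squares polynomial fit reduces to a convolution with fixed weights; for a 5-point window and a quadratic (or cubic) polynomial the classic Savitzky–Golay weights are (-3, 12, 17, 12, -3)/35. A minimal sketch (edge points, where a full window does not fit, are left unchanged here):

```python
def savgol_5pt(signal):
    """Savitzky-Golay smoothing, 5-point window, quadratic/cubic fit,
    via the equivalent fixed convolution weights."""
    weights = [-3, 12, 17, 12, -3]  # classic 5-point quadratic coefficients
    out = list(signal)
    for i in range(2, len(signal) - 2):
        out[i] = sum(w * signal[i + j - 2] for j, w in enumerate(weights)) / 35
    return out
```

A useful property of these weights is that any signal that locally follows a quadratic is reproduced exactly, so peaks are flattened far less than with an unweighted moving average of the same width.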
Stretched grid method
A numerical technique for finding approximate solutions of various mathematical and engineering problems that can be related to an elastic grid behavior.
Meteorologists use the stretched grid method for weather prediction.
Engineers use the stretched grid method to design tents and other tensile structures.