A statistic is a function of a sample of observations of a random variable. Notionally, it is a calculation made on a set of numbers, typically obtained as a sample from some presumed underlying probability distribution, and it is usually used to estimate something about the distribution from which the sample is taken. The use of a statistic to characterize a set of observations is generally justified by its asymptotic behavior. A given statistic characterizes the underlying phenomenon only probabilistically (this consideration is the origin of confidence intervals in classical statistics), and it is regarded as accurate only in the limit as the number of observations increases without bound. Note, however, that confidence intervals are somewhat problematic, since their calculation rests on assumptions about the nature of the true underlying distribution, and those assumptions may or may not hold.
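The limiting behavior described above can be illustrated, for one simple case, with a short Python sketch. It draws samples of increasing size from a distribution whose true mean is known and reports how far the sample mean falls from it. The choice of Uniform(0, 1) (true mean 0.5) and the helper name `sample_mean_error` are assumptions made purely for illustration. The error of any single run is itself random; what shrinks as n grows is its typical size, roughly in proportion to 1/sqrt(n) for independent draws with finite variance.

```python
import random
import statistics


def sample_mean_error(n: int, seed: int = 0) -> float:
    """Absolute error of the sample mean of n draws from Uniform(0, 1).

    Hypothetical helper for illustration: the true mean of Uniform(0, 1)
    is 0.5, so the error can be measured exactly.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    draws = [rng.random() for _ in range(n)]
    return abs(statistics.fmean(draws) - 0.5)


# Larger samples tend to give sample means closer to the true value.
for n in (10, 1_000, 100_000):
    print(n, round(sample_mean_error(n), 4))
```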
For example, suppose a random sample of three children is chosen from a particular class and their heights are measured as 1.42 m, 1.54 m and 1.48 m; the arithmetic mean of these heights is 1.48 m. This value of 1.48 m might then be used to represent the average height of a child in that class.
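The arithmetic in this example can be reproduced with Python's standard `statistics` module; the variable names are chosen here for illustration only.

```python
from statistics import mean

heights_m = [1.42, 1.54, 1.48]  # the three sampled heights, in metres
sample_mean = mean(heights_m)   # (1.42 + 1.54 + 1.48) / 3 = 4.44 / 3
print(f"{sample_mean:.2f} m")   # prints "1.48 m"
```

The formatted print rounds away the tiny floating-point error that can arise when decimal values such as 1.42 are stored in binary.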
Clearly the validity and reliability of such estimates depend heavily on a range of factors, such as the type of distribution, the sample size, and the sampling method used.
Let X1, X2, ..., Xn be a random sample of size n from some distribution. A statistic is defined to be any function of the sample values X1, X2, ..., Xn that involves no unknown quantities [1].
The point of this definition is to ensure that, once the sample has been observed, the statistic can be evaluated to an actual numerical value, rather than remaining an expression that still contains unknown parameters.
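The distinction drawn by this definition can be sketched in Python. The first three functions below are statistics, because each needs nothing but the observed sample. The last one, `mean_deviation_from`, depends on the population mean mu; it is a statistic only if mu is a known constant, and when mu is unknown it cannot be evaluated from the data alone. All function names are hypothetical and chosen for illustration, and the sample reuses the three heights from the example above.

```python
import statistics
from typing import Sequence


# Each of these is a statistic: it uses only the observed sample values.
def sample_mean(xs: Sequence[float]) -> float:
    return statistics.fmean(xs)


def sample_variance(xs: Sequence[float]) -> float:
    # Sample variance with the (n - 1) denominator.
    return statistics.variance(xs)


def sample_range(xs: Sequence[float]) -> float:
    return max(xs) - min(xs)


# Not a statistic when mu is an unknown population parameter:
# the result cannot be computed without supplying mu from outside the data.
def mean_deviation_from(xs: Sequence[float], mu: float) -> float:
    return sample_mean(xs) - mu


sample = [1.42, 1.54, 1.48]  # heights in metres
print(sample_mean(sample), sample_variance(sample), sample_range(sample))
```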