Probability Axioms

The Kolmogorov axioms are the foundations of probability theory introduced by the Russian mathematician Andrey Kolmogorov in 1933.[1] These axioms remain central to probability theory and underpin its applications in mathematics, the physical sciences, and real-world reasoning about uncertainty.[2] An alternative approach to formalising probability, favoured by some Bayesians, is given by Cox's theorem.[3][4]

Axioms

The assumptions underlying the axioms can be summarised as follows: let [math]\displaystyle{ (\Omega, F, P) }[/math] be a measure space such that [math]\displaystyle{ P(E) }[/math] is the probability of some event E, and [math]\displaystyle{ P(\Omega) = 1 }[/math]. Then [math]\displaystyle{ (\Omega, F, P) }[/math] is a probability space, with sample space [math]\displaystyle{ \Omega }[/math], event space [math]\displaystyle{ F }[/math] and probability measure [math]\displaystyle{ P }[/math].[1]

First axiom

The probability of an event is a non-negative real number:

[math]\displaystyle{ P(E)\in\mathbb{R}, P(E)\geq 0 \qquad \forall E \in F }[/math]

where [math]\displaystyle{ F }[/math] is the event space. It follows that [math]\displaystyle{ P(E) }[/math] is always finite, in contrast with more general measure theory. Theories which assign negative probability relax the first axiom.

Second axiom

This is the assumption of unit measure: the probability that at least one of the elementary events in the entire sample space will occur is 1:

[math]\displaystyle{ P(\Omega) = 1. }[/math]

Third axiom

This is the assumption of σ-additivity:

Any countable sequence of pairwise disjoint sets (synonymous with mutually exclusive events) [math]\displaystyle{ E_1, E_2, \ldots }[/math] satisfies
[math]\displaystyle{ P\left(\bigcup_{i = 1}^\infty E_i\right) = \sum_{i=1}^\infty P(E_i). }[/math]

Some authors consider merely finitely additive probability spaces, in which case one just needs an algebra of sets, rather than a σ-algebra.[5] Quasiprobability distributions in general relax the third axiom.
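
For a finite sample space whose event space is the whole power set, the axioms can be checked directly by computation. The following Python sketch is illustrative only (the sample space, point masses, and function names are assumptions, not taken from the cited sources); it builds a probability measure from point masses and verifies non-negativity, unit measure, and additivity over pairwise disjoint events, which is what σ-additivity reduces to on a finite space.

```python
from itertools import chain, combinations

# A finite probability space: sample space Omega and point masses P({w}).
omega = {1, 2, 3, 4, 5, 6}              # illustrative: a fair six-sided die
masses = {w: 1 / 6 for w in omega}      # each outcome has probability 1/6

def prob(event):
    """P(E): the sum of the point masses of the outcomes in E."""
    return sum(masses[w] for w in event)

def power_set(s):
    """The event space F: all subsets of the finite sample space."""
    s = list(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

events = power_set(omega)

# First axiom: every event has a non-negative, finite, real probability.
assert all(prob(e) >= 0 for e in events)

# Second axiom: the entire sample space has probability 1.
assert abs(prob(omega) - 1) < 1e-12

# Third axiom (finite form): additivity over pairwise disjoint events.
e1, e2, e3 = {1, 2}, {5}, set()
assert abs(prob(e1 | e2 | e3) - (prob(e1) + prob(e2) + prob(e3))) < 1e-12
```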

Consequences

From the Kolmogorov axioms, one can deduce other useful rules for studying probabilities. The proofs[6][7][8] of these rules illustrate the power of the third axiom and its interaction with the other two axioms. Four of the immediate corollaries and their proofs are shown below:

Monotonicity

[math]\displaystyle{ \quad\text{if}\quad A\subseteq B\quad\text{then}\quad P(A)\leq P(B). }[/math]

If A is a subset of, or equal to, B, then the probability of A is less than or equal to the probability of B.

Proof of monotonicity[6]

In order to verify the monotonicity property, we set [math]\displaystyle{ E_1=A }[/math] and [math]\displaystyle{ E_2=B\setminus A }[/math], where [math]\displaystyle{ A\subseteq B }[/math] and [math]\displaystyle{ E_i=\varnothing }[/math] for [math]\displaystyle{ i\geq 3 }[/math]. From the properties of the empty set ([math]\displaystyle{ \varnothing }[/math]), it is easy to see that the sets [math]\displaystyle{ E_i }[/math] are pairwise disjoint and [math]\displaystyle{ E_1\cup E_2\cup\cdots=B }[/math]. Hence, we obtain from the third axiom that

[math]\displaystyle{ P(A)+P(B\setminus A)+\sum_{i=3}^\infty P(E_i)=P(B). }[/math]

Since, by the first axiom, the left-hand side of this equation is a series of non-negative numbers, and since it converges to [math]\displaystyle{ P(B) }[/math], which is finite, the infinitely many terms [math]\displaystyle{ P(E_i)=P(\varnothing) }[/math] for [math]\displaystyle{ i\geq 3 }[/math] can only sum to a finite value if [math]\displaystyle{ P(\varnothing)=0 }[/math]. Moreover, since every term in the sum is non-negative, [math]\displaystyle{ P(A)\leq P(B) }[/math]. We thus obtain both [math]\displaystyle{ P(A)\leq P(B) }[/math] and [math]\displaystyle{ P(\varnothing)=0 }[/math].
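
As a concrete illustration (a fair six-sided die, introduced here only as an example), take [math]\displaystyle{ A=\{2\} }[/math] and [math]\displaystyle{ B=\{2,4,6\} }[/math]; then

[math]\displaystyle{ P(A)=\tfrac{1}{6}\leq\tfrac{1}{2}=P(B), }[/math]

as the monotonicity property requires.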

The probability of the empty set

[math]\displaystyle{ P(\varnothing)=0. }[/math]

In many cases, [math]\displaystyle{ \varnothing }[/math] is not the only event with probability 0. For example, if a point is chosen uniformly at random from the interval [0, 1], every singleton event [math]\displaystyle{ \{x\} }[/math] has probability 0.

Proof of the probability of the empty set

[math]\displaystyle{ P(\varnothing \cup \varnothing) = P(\varnothing) }[/math] since [math]\displaystyle{ \varnothing \cup \varnothing = \varnothing }[/math],

[math]\displaystyle{ P(\varnothing)+P(\varnothing) = P(\varnothing) }[/math] by applying the third axiom to the left-hand side (note [math]\displaystyle{ \varnothing }[/math] is disjoint with itself), and so

[math]\displaystyle{ P(\varnothing) = 0 }[/math] by subtracting [math]\displaystyle{ P(\varnothing) }[/math] from each side of the equation.

The complement rule

[math]\displaystyle{ P\left(A^{c}\right) = P(\Omega\setminus A) = 1 - P(A) }[/math]

Proof of the complement rule

Given [math]\displaystyle{ A }[/math] and [math]\displaystyle{ A^{c} }[/math] are mutually exclusive and that [math]\displaystyle{ A \cup A^c = \Omega }[/math]:

[math]\displaystyle{ P(A \cup A^c)=P(A)+P(A^c) }[/math] ... (by axiom 3)

and, [math]\displaystyle{ P(A \cup A^c)=P(\Omega)=1 }[/math] ... (by axiom 2)

[math]\displaystyle{ \Rightarrow P(A)+P(A^c)=1 }[/math]

[math]\displaystyle{ \therefore P(A^c)=1-P(A) }[/math]
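
For example, for a fair six-sided die (an illustrative assumption, not from the cited sources), the probability of not rolling a 6 is

[math]\displaystyle{ P(\{6\}^c) = 1 - P(\{6\}) = 1 - \tfrac{1}{6} = \tfrac{5}{6}. }[/math]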

The numeric bound

It follows from the monotonicity property (since [math]\displaystyle{ E\subseteq\Omega }[/math]), or alternatively from the complement rule as shown below, that

[math]\displaystyle{ 0\leq P(E)\leq 1\qquad \forall E\in F. }[/math]

Proof of the numeric bound

Given the complement rule [math]\displaystyle{ P(E^c)=1-P(E) }[/math] and axiom 1 [math]\displaystyle{ P(E^c)\geq0 }[/math]:

[math]\displaystyle{ 1-P(E) \geq 0 }[/math]

[math]\displaystyle{ \Rightarrow 1 \geq P(E) }[/math]

[math]\displaystyle{ \therefore 0\leq P(E)\leq 1 }[/math]

Further consequences

Another important property is:

[math]\displaystyle{ P(A \cup B) = P(A) + P(B) - P(A \cap B). }[/math]

This is called the addition law of probability, or the sum rule. That is, the probability that A or B occurs is the sum of the probability of A and the probability of B, minus the probability of A and B both occurring. The proof of this is as follows:

Firstly,

[math]\displaystyle{ P(A\cup B) = P(A) + P(B\setminus A) }[/math] ... (by Axiom 3)

So,

[math]\displaystyle{ P(A \cup B) = P(A) + P(B\setminus (A \cap B)) }[/math] (by [math]\displaystyle{ B \setminus A = B\setminus (A \cap B) }[/math]).

Also,

[math]\displaystyle{ P(B) = P(B\setminus (A \cap B)) + P(A \cap B) }[/math]

and eliminating [math]\displaystyle{ P(B\setminus (A \cap B)) }[/math] from both equations gives us the desired result.

An extension of the addition law to any number of sets is the inclusion–exclusion principle.
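
Both the addition law and its three-set extension can be verified numerically on a finite space. The Python sketch below is illustrative only (the die and the events chosen are assumptions); exact fractions are used so the identities hold without rounding error.

```python
from fractions import Fraction

# Fair six-sided die: each outcome has probability 1/6.
omega = {1, 2, 3, 4, 5, 6}
masses = {w: Fraction(1, 6) for w in omega}

def prob(event):
    """P(E): the sum of the point masses of the outcomes in E."""
    return sum(masses[w] for w in event)

A = {2, 4, 6}   # the roll is even
B = {1, 2, 3}   # the roll is at most 3
C = {2, 3, 5}   # the roll is prime

# Addition law: P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)

# Inclusion–exclusion for three events.
assert prob(A | B | C) == (prob(A) + prob(B) + prob(C)
                           - prob(A & B) - prob(A & C) - prob(B & C)
                           + prob(A & B & C))
```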

Setting B to the complement Ac of A in the addition law gives

[math]\displaystyle{ P\left(A^{c}\right) = P(\Omega\setminus A) = 1 - P(A) }[/math]

That is, the probability that any event will not happen (or the event's complement) is 1 minus the probability that it will.

Simple example: coin toss

Consider a single coin-toss, and assume that the coin will either land heads (H) or tails (T) (but not both). No assumption is made as to whether the coin is fair.

We may define:

[math]\displaystyle{ \Omega = \{H,T\} }[/math]
[math]\displaystyle{ F = \{\varnothing, \{H\}, \{T\}, \{H,T\}\} }[/math]

Kolmogorov's axioms imply that:

[math]\displaystyle{ P(\varnothing) = 0 }[/math]

The probability of neither heads nor tails is 0.

[math]\displaystyle{ P(\{H,T\}) = 1 - P(\{H,T\}^c) = 1 - P(\varnothing) = 1 }[/math]

The probability of either heads or tails is 1.

[math]\displaystyle{ P(\{H\}) + P(\{T\}) = 1 }[/math]

The sum of the probability of heads and the probability of tails is 1.
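
A short Python sketch of this example (the bias p is an assumption added purely for illustration, since the article makes no assumption that the coin is fair) checks the three consequences listed above:

```python
from fractions import Fraction

p = Fraction(3, 5)                    # illustrative bias: P({H}) = 3/5
masses = {"H": p, "T": 1 - p}         # point masses on the sample space {H, T}

def prob(event):
    """P(E) for an event E contained in {H, T}."""
    return sum(masses[w] for w in event)

assert prob(set()) == 0                    # P(∅) = 0
assert prob({"H", "T"}) == 1               # P({H, T}) = 1, i.e. P({H, T}^c) = 0
assert prob({"H"}) + prob({"T"}) == 1      # P({H}) + P({T}) = 1
```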

See also

  • Conditional probability – Probability of an event occurring, given that another event has already occurred
  • Fully probabilistic design
  • Intuitive statistics
  • Set theory – Branch of mathematics that studies sets
  • σ-algebra

References

  1. Kolmogorov, Andrey (1950). Foundations of the theory of probability. New York, USA: Chelsea Publishing Company. https://archive.org/details/foundationsofthe00kolm. 
  2. Aldous, David. "What is the significance of the Kolmogorov axioms?". https://www.stat.berkeley.edu/~aldous/Real_World/kolmogorov.html. 
  3. Cox, R. T. (1946). "Probability, Frequency and Reasonable Expectation". American Journal of Physics 14 (1): 1–10. doi:10.1119/1.1990764. Bibcode: 1946AmJPh..14....1C. 
  4. Cox, R. T. (1961). The Algebra of Probable Inference. Baltimore, MD: Johns Hopkins University Press. 
  5. Hájek, Alan (August 28, 2019). "Interpretations of Probability". https://plato.stanford.edu/entries/probability-interpret/#KolProCal. 
  6. Ross, Sheldon M. (2014). A first course in probability (Ninth ed.). Upper Saddle River, New Jersey. pp. 27, 28. ISBN 978-0-321-79477-2. OCLC 827003384. 
  7. Gerard, David (December 9, 2017). "Proofs from axioms". https://dcgerard.github.io/stat234/11_proofs_from_axioms.pdf. 
  8. Jackson, Bill (2010). "Probability (Lecture Notes - Week 3)". http://www.maths.qmul.ac.uk/~bill/MTH4107/notesweek3_10.pdf. 

Further reading

  • DeGroot, Morris H. (1975). Probability and Statistics. Reading: Addison-Wesley. pp. 12–16. ISBN 0-201-01503-X. https://archive.org/details/probabilitystati0000degr/page/12. 
  • McCord, James R.; Moroney, Richard M. (1964). "Axiomatic Probability". Introduction to Probability Theory. New York: Macmillan. pp. 13–28. https://archive.org/details/introductiontopr00mcco. 
  • Formal definition of probability in the Mizar system, and the list of theorems formally proved about it.


