Notation | [math]\displaystyle{ \mathcal{CB}(\lambda) }[/math]
---|---
Parameters | [math]\displaystyle{ \lambda \in (0,1) }[/math]
Support | [math]\displaystyle{ x \in [0, 1] }[/math]
PDF | [math]\displaystyle{ C(\lambda) \lambda^x (1-\lambda)^{1-x}\! }[/math] where [math]\displaystyle{ C(\lambda) = \begin{cases} 2 &\text{if } \lambda = \frac{1}{2}\\ \frac{2 \tanh^{-1}(1-2\lambda)}{1-2\lambda} &\text{ otherwise} \end{cases} }[/math]
CDF | [math]\displaystyle{ \begin{cases} x &\text{ if } \lambda = \frac{1}{2} \\ \frac{\lambda^x (1-\lambda)^{1-x} + \lambda - 1}{2\lambda - 1} &\text{ otherwise} \end{cases}\! }[/math]
Mean | [math]\displaystyle{ \operatorname{E}[X] = \begin{cases} \frac{1}{2} &\text{ if } \lambda = \frac{1}{2} \\ \frac{\lambda}{2\lambda - 1} + \frac{1}{2 \tanh^{-1}(1-2\lambda)} &\text{ otherwise} \end{cases}\! }[/math]
Variance | [math]\displaystyle{ \operatorname{var}[X] = \begin{cases} \frac{1}{12} &\text{ if } \lambda = \frac{1}{2} \\ -\frac{(1-\lambda) \lambda}{(1-2\lambda)^2} + \frac{1}{(2 \tanh^{-1}(1-2\lambda))^2} &\text{ otherwise} \end{cases}\! }[/math]
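The closed-form mean and variance above can be verified numerically against the density. The following sketch (Python, midpoint-rule integration, restricted to the [math]\displaystyle{ \lambda \neq \frac{1}{2} }[/math] branch) compares the integrals with the formulas:

```python
import math

def cb_pdf(x, lam):
    """Continuous Bernoulli density for lam != 1/2."""
    C = 2 * math.atanh(1 - 2 * lam) / (1 - 2 * lam)  # normalizing constant C(lam)
    return C * lam ** x * (1 - lam) ** (1 - x)

lam = 0.3
n = 20000
xs = [(i + 0.5) / n for i in range(n)]             # midpoint rule on [0, 1]
total = sum(cb_pdf(x, lam) for x in xs) / n        # integral of the pdf, ~1
mean = sum(x * cb_pdf(x, lam) for x in xs) / n
var = sum(x * x * cb_pdf(x, lam) for x in xs) / n - mean ** 2

mean_formula = lam / (2 * lam - 1) + 1 / (2 * math.atanh(1 - 2 * lam))
var_formula = (-lam * (1 - lam) / (1 - 2 * lam) ** 2
               + 1 / (2 * math.atanh(1 - 2 * lam)) ** 2)
```

Note that [math]\displaystyle{ \tanh^{-1} }[/math] is available as `math.atanh`; the numerical integrals agree with the closed forms to well below the discretization error.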
In probability theory, statistics, and machine learning, the continuous Bernoulli distribution[1][2][3] is a family of continuous probability distributions parameterized by a single shape parameter [math]\displaystyle{ \lambda \in (0, 1) }[/math], defined on the unit interval [math]\displaystyle{ x \in [0, 1] }[/math], by:

[math]\displaystyle{ p(x | \lambda) \propto \lambda^x (1-\lambda)^{1-x}. }[/math]
The continuous Bernoulli distribution arises in deep learning and computer vision, specifically in the context of variational autoencoders,[4][5] for modeling the pixel intensities of natural images. As such, it defines a proper probabilistic counterpart for the commonly used binary cross entropy loss, which is often applied to continuous, [math]\displaystyle{ [0,1] }[/math]-valued data.[6][7][8][9] This practice amounts to ignoring the normalizing constant of the continuous Bernoulli distribution, since the binary cross entropy loss only defines a true log-likelihood for discrete, [math]\displaystyle{ \{0,1\} }[/math]-valued data.
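The relationship to binary cross entropy can be made concrete: the negative continuous Bernoulli log-likelihood is the binary cross entropy loss minus the log normalizing constant [math]\displaystyle{ \log C(\lambda) }[/math]. A minimal sketch (assuming [math]\displaystyle{ \lambda \neq \frac{1}{2} }[/math]; function names are illustrative):

```python
import math

def bce(x, lam):
    # binary cross entropy, applied here to a continuous target x in [0, 1]
    return -(x * math.log(lam) + (1 - x) * math.log(1 - lam))

def log_C(lam):
    # log normalizing constant of the continuous Bernoulli (lam != 1/2)
    return math.log(2 * math.atanh(1 - 2 * lam) / (1 - 2 * lam))

def cb_neg_log_likelihood(x, lam):
    # the proper negative log-likelihood subtracts log C(lam) from the BCE term
    return bce(x, lam) - log_C(lam)
```

Minimizing `bce` alone drops the [math]\displaystyle{ \log C(\lambda) }[/math] term, which depends on [math]\displaystyle{ \lambda }[/math] and therefore shifts the optimum.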
The continuous Bernoulli also defines an exponential family of distributions. Writing [math]\displaystyle{ \eta = \log\left(\lambda/(1-\lambda)\right) }[/math] for the natural parameter, the density can be rewritten in canonical form: [math]\displaystyle{ p(x | \eta) \propto \exp (\eta x) }[/math].
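The canonical form also yields a simple inverse-CDF sampler: because the CDF has a closed form, [math]\displaystyle{ e^{\eta x} }[/math] is linear in the uniform variate and can be solved for [math]\displaystyle{ x }[/math] directly. A sketch (function name illustrative; [math]\displaystyle{ \lambda \neq \frac{1}{2} }[/math] assumed):

```python
import math, random

def sample_cb(lam, rng=random):
    """Draw one continuous Bernoulli variate by inverting the CDF (lam != 1/2)."""
    u = rng.random()
    eta = math.log(lam / (1 - lam))                  # natural parameter
    # Invert F(x) = (lam^x (1-lam)^(1-x) + lam - 1) / (2*lam - 1),
    # using lam^x (1-lam)^(1-x) = (1-lam) * exp(eta * x):
    t = (u * (2 * lam - 1) + 1 - lam) / (1 - lam)    # equals exp(eta * x)
    return math.log(t) / eta
```

At [math]\displaystyle{ u = 0 }[/math] this returns 0 and at [math]\displaystyle{ u = 1 }[/math] it returns 1, as expected for a distribution supported on the unit interval.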
The continuous Bernoulli can be thought of as a continuous relaxation of the Bernoulli distribution, which is defined on the discrete set [math]\displaystyle{ \{0,1\} }[/math] by the probability mass function:

[math]\displaystyle{ p(x) = p^x (1-p)^{1-x}, }[/math]
where [math]\displaystyle{ p }[/math] is a scalar parameter between 0 and 1. Applying this same functional form on the continuous interval [math]\displaystyle{ [0,1] }[/math] results in the continuous Bernoulli probability density function, up to a normalizing constant.
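A quick consistency check of this relaxation: at the endpoints [math]\displaystyle{ x = 0 }[/math] and [math]\displaystyle{ x = 1 }[/math], the shared functional form reduces exactly to the Bernoulli probabilities [math]\displaystyle{ 1-p }[/math] and [math]\displaystyle{ p }[/math]:

```python
def unnormalized(x, p):
    # the functional form shared by the Bernoulli pmf and the
    # (unnormalized) continuous Bernoulli pdf
    return p ** x * (1 - p) ** (1 - x)

p = 0.25
at_zero = unnormalized(0, p)   # 1 - p
at_one = unnormalized(1, p)    # p
```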
The Beta distribution has the density function:

[math]\displaystyle{ p(x | \alpha, \beta) \propto x^{\alpha - 1} (1-x)^{\beta - 1}, }[/math]
which can be re-written as:

[math]\displaystyle{ p(x_1, x_2 | \alpha_1, \alpha_2) \propto x_1^{\alpha_1 - 1} x_2^{\alpha_2 - 1}, }[/math]
where [math]\displaystyle{ \alpha_1, \alpha_2 }[/math] are positive scalar parameters, and [math]\displaystyle{ (x_1, x_2) }[/math] represents an arbitrary point inside the 1-simplex, [math]\displaystyle{ \Delta^{1} = \{ (x_1, x_2): x_1 \gt 0, x_2 \gt 0, x_1 + x_2 = 1 \} }[/math]. Switching the role of the parameter and the argument in this density function, we obtain:

[math]\displaystyle{ p(x_1, x_2 | \alpha_1, \alpha_2) \propto \alpha_1^{x_1 - 1} \alpha_2^{x_2 - 1}. }[/math]
This family is only identifiable up to the linear constraint [math]\displaystyle{ \alpha_1 + \alpha_2 = 1 }[/math], whence we obtain:

[math]\displaystyle{ p(x_1, x_2 | \lambda) \propto \lambda^{x_1 - 1} (1-\lambda)^{x_2 - 1} \propto \lambda^{x_1} (1-\lambda)^{x_2}, }[/math]
corresponding exactly to the continuous Bernoulli density.
An exponential distribution restricted to the unit interval is equivalent to a continuous Bernoulli distribution: truncating an exponential with rate [math]\displaystyle{ \mu }[/math] to [math]\displaystyle{ [0,1] }[/math] gives a density proportional to [math]\displaystyle{ e^{-\mu x} }[/math], which matches the canonical form above with natural parameter [math]\displaystyle{ \eta = -\mu }[/math], i.e. [math]\displaystyle{ \lambda = 1/(1 + e^{\mu}) }[/math].
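This equivalence is easy to check pointwise. The sketch below assumes the matching parameter is [math]\displaystyle{ \lambda = 1/(1 + e^{\mu}) }[/math], as follows from the canonical form [math]\displaystyle{ p(x|\eta) \propto e^{\eta x} }[/math] with [math]\displaystyle{ \eta = -\mu }[/math]:

```python
import math

mu = 1.5                         # rate of the exponential
lam = 1 / (1 + math.exp(mu))     # assumed matching continuous Bernoulli parameter

def truncated_exp_pdf(x):
    # Exp(mu) density renormalized to the unit interval
    return mu * math.exp(-mu * x) / (1 - math.exp(-mu))

def cb_pdf(x):
    C = 2 * math.atanh(1 - 2 * lam) / (1 - 2 * lam)
    return C * lam ** x * (1 - lam) ** (1 - x)
```

The two densities coincide on all of [math]\displaystyle{ [0,1] }[/math] up to floating-point error.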
The multivariate generalization of the continuous Bernoulli is called the continuous-categorical.[10]
Original source: https://en.wikipedia.org/wiki/Continuous Bernoulli distribution.