Notation | [math]\displaystyle{ \mathcal{CB}(\lambda) }[/math]
---|---
Parameters | [math]\displaystyle{ \lambda \in (0,1) }[/math]
Support | [math]\displaystyle{ x \in [0, 1] }[/math]
PDF | [math]\displaystyle{ C(\lambda) \lambda^x (1-\lambda)^{1-x}\! }[/math] where [math]\displaystyle{ C(\lambda) = \begin{cases} 2 &\text{if } \lambda = \frac{1}{2}\\ \frac{2 \tanh^{-1}(1-2\lambda)}{1-2\lambda} &\text{ otherwise} \end{cases} }[/math]
CDF | [math]\displaystyle{ \begin{cases} x &\text{ if } \lambda = \frac{1}{2} \\ \frac{\lambda^x (1-\lambda)^{1-x} + \lambda - 1}{2\lambda - 1} &\text{ otherwise} \end{cases}\! }[/math]
Mean | [math]\displaystyle{ \operatorname{E}[X] = \begin{cases} \frac{1}{2} &\text{ if } \lambda = \frac{1}{2} \\ \frac{\lambda}{2\lambda - 1} + \frac{1}{2 \tanh^{-1}(1-2\lambda)} &\text{ otherwise} \end{cases}\! }[/math]
Variance | [math]\displaystyle{ \operatorname{var}[X] = \begin{cases} \frac{1}{12} &\text{ if } \lambda = \frac{1}{2} \\ -\frac{(1-\lambda) \lambda}{(1-2\lambda)^2} + \frac{1}{(2 \tanh^{-1}(1-2\lambda))^2} &\text{ otherwise} \end{cases}\! }[/math]
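The closed-form mean and variance above can be verified numerically against the density. The following sketch (Python, midpoint-rule integration, restricted to the [math]\displaystyle{ \lambda \neq \frac{1}{2} }[/math] branch) compares the integrals with the formulas:

```python
import math

def cb_pdf(x, lam):
    """Continuous Bernoulli density for lam != 1/2."""
    C = 2 * math.atanh(1 - 2 * lam) / (1 - 2 * lam)  # normalizing constant C(lam)
    return C * lam ** x * (1 - lam) ** (1 - x)

lam = 0.3
n = 20000
xs = [(i + 0.5) / n for i in range(n)]             # midpoint rule on [0, 1]
total = sum(cb_pdf(x, lam) for x in xs) / n        # integral of the pdf, ~1
mean = sum(x * cb_pdf(x, lam) for x in xs) / n
var = sum(x * x * cb_pdf(x, lam) for x in xs) / n - mean ** 2

mean_formula = lam / (2 * lam - 1) + 1 / (2 * math.atanh(1 - 2 * lam))
var_formula = (-lam * (1 - lam) / (1 - 2 * lam) ** 2
               + 1 / (2 * math.atanh(1 - 2 * lam)) ** 2)
```

Note that [math]\displaystyle{ \tanh^{-1} }[/math] is available as `math.atanh`; the numerical integrals agree with the closed forms to well below the discretization error.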
In probability theory, statistics, and machine learning, the continuous Bernoulli distribution[1][2][3] is a family of continuous probability distributions parameterized by a single shape parameter [math]\displaystyle{ \lambda \in (0, 1) }[/math], defined on the unit interval [math]\displaystyle{ x \in [0, 1] }[/math], by:

[math]\displaystyle{ p(x | \lambda) \propto \lambda^x (1-\lambda)^{1-x}. }[/math]
The continuous Bernoulli distribution arises in deep learning and computer vision, specifically in the context of variational autoencoders,[4][5] for modeling the pixel intensities of natural images. As such, it defines a proper probabilistic counterpart for the commonly used binary cross entropy loss, which is often applied to continuous, [math]\displaystyle{ [0,1] }[/math]-valued data.[6][7][8][9] This practice amounts to ignoring the normalizing constant of the continuous Bernoulli distribution, since the binary cross entropy loss only defines a true log-likelihood for discrete, [math]\displaystyle{ \{0,1\} }[/math]-valued data.
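The relationship to binary cross entropy can be made concrete: the negative continuous Bernoulli log-likelihood is the binary cross entropy loss minus the log normalizing constant [math]\displaystyle{ \log C(\lambda) }[/math]. A minimal sketch (assuming [math]\displaystyle{ \lambda \neq \frac{1}{2} }[/math]; function names are illustrative):

```python
import math

def bce(x, lam):
    # binary cross entropy, applied here to a continuous target x in [0, 1]
    return -(x * math.log(lam) + (1 - x) * math.log(1 - lam))

def log_C(lam):
    # log normalizing constant of the continuous Bernoulli (lam != 1/2)
    return math.log(2 * math.atanh(1 - 2 * lam) / (1 - 2 * lam))

def cb_neg_log_likelihood(x, lam):
    # the proper negative log-likelihood subtracts log C(lam) from the BCE term
    return bce(x, lam) - log_C(lam)
```

Minimizing `bce` alone drops the [math]\displaystyle{ \log C(\lambda) }[/math] term, which depends on [math]\displaystyle{ \lambda }[/math] and therefore shifts the optimum.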
The continuous Bernoulli also defines an exponential family of distributions. Writing [math]\displaystyle{ \eta = \log\left(\lambda/(1-\lambda)\right) }[/math] for the natural parameter, the density can be rewritten in canonical form: [math]\displaystyle{ p(x | \eta) \propto \exp (\eta x) }[/math].
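The canonical form also yields a simple inverse-CDF sampler: because the CDF has a closed form, [math]\displaystyle{ e^{\eta x} }[/math] is linear in the uniform variate and can be solved for [math]\displaystyle{ x }[/math] directly. A sketch (function name illustrative; [math]\displaystyle{ \lambda \neq \frac{1}{2} }[/math] assumed):

```python
import math, random

def sample_cb(lam, rng=random):
    """Draw one continuous Bernoulli variate by inverting the CDF (lam != 1/2)."""
    u = rng.random()
    eta = math.log(lam / (1 - lam))                  # natural parameter
    # Invert F(x) = (lam^x (1-lam)^(1-x) + lam - 1) / (2*lam - 1),
    # using lam^x (1-lam)^(1-x) = (1-lam) * exp(eta * x):
    t = (u * (2 * lam - 1) + 1 - lam) / (1 - lam)    # equals exp(eta * x)
    return math.log(t) / eta
```

At [math]\displaystyle{ u = 0 }[/math] this returns 0 and at [math]\displaystyle{ u = 1 }[/math] it returns 1, as expected for a distribution supported on the unit interval.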
The continuous Bernoulli can be thought of as a continuous relaxation of the Bernoulli distribution, which is defined on the discrete set [math]\displaystyle{ \{0,1\} }[/math] by the probability mass function:

[math]\displaystyle{ p(x) = p^x (1-p)^{1-x}, }[/math]
where [math]\displaystyle{ p }[/math] is a scalar parameter between 0 and 1. Applying this same functional form on the continuous interval [math]\displaystyle{ [0,1] }[/math] results in the continuous Bernoulli probability density function, up to a normalizing constant.
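A quick consistency check of this relaxation: at the endpoints [math]\displaystyle{ x = 0 }[/math] and [math]\displaystyle{ x = 1 }[/math], the shared functional form reduces exactly to the Bernoulli probabilities [math]\displaystyle{ 1-p }[/math] and [math]\displaystyle{ p }[/math]:

```python
def unnormalized(x, p):
    # the functional form shared by the Bernoulli pmf and the
    # (unnormalized) continuous Bernoulli pdf
    return p ** x * (1 - p) ** (1 - x)

p = 0.25
at_zero = unnormalized(0, p)   # 1 - p
at_one = unnormalized(1, p)    # p
```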
The Beta distribution has the density function:

[math]\displaystyle{ p(x | \alpha, \beta) \propto x^{\alpha - 1} (1-x)^{\beta - 1}, }[/math]
which can be re-written as:

[math]\displaystyle{ p(x_1, x_2 | \alpha_1, \alpha_2) \propto x_1^{\alpha_1 - 1} x_2^{\alpha_2 - 1}, }[/math]
where [math]\displaystyle{ \alpha_1, \alpha_2 }[/math] are positive scalar parameters, and [math]\displaystyle{ (x_1, x_2) }[/math] represents an arbitrary point inside the 1-simplex, [math]\displaystyle{ \Delta^{1} = \{ (x_1, x_2): x_1 \gt 0, x_2 \gt 0, x_1 + x_2 = 1 \} }[/math]. Switching the role of the parameter and the argument in this density function, we obtain:

[math]\displaystyle{ p(x_1, x_2 | \alpha_1, \alpha_2) \propto \alpha_1^{x_1 - 1} \alpha_2^{x_2 - 1}. }[/math]
This family is only identifiable up to the linear constraint [math]\displaystyle{ \alpha_1 + \alpha_2 = 1 }[/math], whence we obtain:

[math]\displaystyle{ p(x_1, x_2 | \lambda) \propto \lambda^{x_1 - 1} (1-\lambda)^{x_2 - 1} \propto \lambda^{x_1} (1-\lambda)^{x_2}, }[/math]
corresponding exactly to the continuous Bernoulli density.
An exponential distribution restricted to the unit interval is equivalent to a continuous Bernoulli distribution: truncating an exponential with rate [math]\displaystyle{ \mu }[/math] to [math]\displaystyle{ [0,1] }[/math] gives a density proportional to [math]\displaystyle{ e^{-\mu x} }[/math], which matches the canonical form above with natural parameter [math]\displaystyle{ \eta = -\mu }[/math], i.e. [math]\displaystyle{ \lambda = 1/(1 + e^{\mu}) }[/math].
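This equivalence is easy to check pointwise. The sketch below assumes the matching parameter is [math]\displaystyle{ \lambda = 1/(1 + e^{\mu}) }[/math], as follows from the canonical form [math]\displaystyle{ p(x|\eta) \propto e^{\eta x} }[/math] with [math]\displaystyle{ \eta = -\mu }[/math]:

```python
import math

mu = 1.5                         # rate of the exponential
lam = 1 / (1 + math.exp(mu))     # assumed matching continuous Bernoulli parameter

def truncated_exp_pdf(x):
    # Exp(mu) density renormalized to the unit interval
    return mu * math.exp(-mu * x) / (1 - math.exp(-mu))

def cb_pdf(x):
    C = 2 * math.atanh(1 - 2 * lam) / (1 - 2 * lam)
    return C * lam ** x * (1 - lam) ** (1 - x)
```

The two densities coincide on all of [math]\displaystyle{ [0,1] }[/math] up to floating-point error.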
The multivariate generalization of the continuous Bernoulli is called the continuous-categorical.[10]
Original source: https://en.wikipedia.org/wiki/Continuous Bernoulli distribution.