Directed information is an information theory measure that quantifies the information flow from the random string [math]\displaystyle{ X^n = (X_1,X_2,\dots,X_n) }[/math] to the random string [math]\displaystyle{ Y^n = (Y_1,Y_2,\dots,Y_n) }[/math]. The term directed information was coined by James Massey and is defined as[1]

[math]\displaystyle{ I(X^n\to Y^n) \triangleq \sum_{i=1}^n I(X^{i};Y_i|Y^{i-1}), }[/math]

where [math]\displaystyle{ I(X^{i};Y_i|Y^{i-1}) }[/math] is the conditional mutual information [math]\displaystyle{ I(X_1,X_2,...,X_{i};Y_i|Y_1,Y_2,...,Y_{i-1}) }[/math].
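For instance, for [math]\displaystyle{ n=3 }[/math] the definition expands to

[math]\displaystyle{ I(X^3\to Y^3) = I(X_1;Y_1) + I(X^2;Y_2|Y_1) + I(X^3;Y_3|Y^2), }[/math]

whereas the chain rule gives [math]\displaystyle{ I(X^3;Y^3) = I(X^3;Y_1) + I(X^3;Y_2|Y_1) + I(X^3;Y_3|Y^2) }[/math]; in each term of the directed information only the inputs up to the present time appear, which is what makes the measure directional.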
Directed information has applications to problems where causality plays an important role, such as the capacity of channels with feedback,[1][2][3][4] capacity of discrete memoryless networks,[5] capacity of networks with in-block memory,[6] gambling with causal side information,[7] compression with causal side information,[8] real-time control communication settings,[9][10] and statistical physics.[11]
Causal conditioning
The essence of directed information is causal conditioning. The probability of [math]\displaystyle{ x^n }[/math] causally conditioned on [math]\displaystyle{ y^n }[/math] is defined as[5]

[math]\displaystyle{ P(x^n||y^n) \triangleq \prod_{i=1}^n P(x_i|x^{i-1},y^{i}). }[/math]
This is similar to the chain rule for conventional conditioning [math]\displaystyle{ P(x^n|y^n) = \prod_{i=1}^n P(x_i|x^{i-1},y^{n}) }[/math], except that one conditions on "past" and "present" symbols [math]\displaystyle{ y^{i} }[/math] rather than on all symbols [math]\displaystyle{ y^{n} }[/math]. To include "past" symbols only, one can introduce a delay by prepending a constant symbol:

[math]\displaystyle{ P(x^n||(0,y^{n-1})) \triangleq \prod_{i=1}^n P(x_i|x^{i-1},y^{i-1}). }[/math]
It is common to abuse notation by writing [math]\displaystyle{ P(x^n||y^{n-1}) }[/math] for this expression, although formally all strings should have the same number of symbols.
One may also condition on multiple strings: [math]\displaystyle{ P(x^n||y^n,z^n) \triangleq \prod_{i=1}^n P(x_i|x^{i-1},y^{i},z^{i}) }[/math].
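For example, for [math]\displaystyle{ n=2 }[/math] one has [math]\displaystyle{ P(x^2||y^2) = P(x_1|y_1)\,P(x_2|x_1,y_1,y_2) }[/math], while conventional conditioning gives [math]\displaystyle{ P(x^2|y^2) = P(x_1|y_1,y_2)\,P(x_2|x_1,y_1,y_2) }[/math]; the two differ only in that the first factor of the causal version does not see the "future" symbol [math]\displaystyle{ y_2 }[/math]. With the one-symbol delay, [math]\displaystyle{ P(x^2||y^{1}) = P(x_1)\,P(x_2|x_1,y_1) }[/math].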
Causally conditioned entropy
The causally conditioned entropy is defined as[2]

[math]\displaystyle{ H(X^n||Y^n)=\mathbf E\left[ -\log {P(X^n||Y^n)} \right]=\sum_{i=1}^n H(X_i|X^{i-1},Y^{i}). }[/math]
Similarly, one may causally condition on multiple strings and write
[math]\displaystyle{ H(X^n || Y^n,Z^n)=\mathbf E\left[ -\log {P(X^n||Y^n,Z^n)} \right] }[/math].
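As an illustration of how causal and conventional conditioning can differ, suppose [math]\displaystyle{ X_1,X_2 }[/math] are independent uniform bits and [math]\displaystyle{ Y_1=X_2 }[/math], [math]\displaystyle{ Y_2=X_1 }[/math]. Then [math]\displaystyle{ H(X^2|Y^2)=0 }[/math], since [math]\displaystyle{ Y^2 }[/math] determines [math]\displaystyle{ X^2 }[/math], but [math]\displaystyle{ H(X^2||Y^2)=H(X_1|Y_1)+H(X_2|X_1,Y^2)=1+0=1 }[/math] bit, because at time 1 only [math]\displaystyle{ Y_1=X_2 }[/math] is available and it carries no information about [math]\displaystyle{ X_1 }[/math].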
Properties
A decomposition rule for causal conditioning[1] is

[math]\displaystyle{ P(x^n, y^n) = P(x^n||y^{n-1})\, P(y^n||x^n). }[/math]

This rule shows that any product of the form [math]\displaystyle{ P(x^n||y^{n-1})\, P(y^n||x^n) }[/math] gives a valid joint distribution [math]\displaystyle{ P(x^n, y^n) }[/math].
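For [math]\displaystyle{ n=2 }[/math], the decomposition is simply a reordering of the chain rule applied in the causal order [math]\displaystyle{ x_1,y_1,x_2,y_2 }[/math]:

[math]\displaystyle{ P(x^2,y^2) = \underbrace{P(x_1)P(x_2|x_1,y_1)}_{P(x^2||y^{1})}\;\underbrace{P(y_1|x_1)P(y_2|y_1,x^2)}_{P(y^2||x^2)}. }[/math]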
The causal conditioning probability [math]\displaystyle{ P(y^n||x^n) = \prod_{i=1}^n P(y_i|y^{i-1},x^{i}) }[/math] is a probability vector, i.e.,

[math]\displaystyle{ \sum_{y^n} P(y^n||x^n) = 1 \quad \text{for all } x^n. }[/math]
Directed information can be written in terms of causal conditioning:

[math]\displaystyle{ I(X^n\to Y^n) = \mathbf E\left[ \log \frac{P(Y^n||X^n)}{P(Y^n)} \right] = H(Y^n) - H(Y^n||X^n). }[/math]

The relation generalizes to three strings: the directed information flowing from [math]\displaystyle{ X^n }[/math] to [math]\displaystyle{ Y^n }[/math] causally conditioned on [math]\displaystyle{ Z^n }[/math] is

[math]\displaystyle{ I(X^n\to Y^n||Z^n) = H(Y^n||Z^n) - H(Y^n||X^n,Z^n). }[/math]
Conservation law of information
This law, established by James Massey and his son Peter Massey,[12] gives intuition by relating directed information and mutual information. The law states that for any [math]\displaystyle{ X^n, Y^n }[/math], the following equality holds:

[math]\displaystyle{ I(X^n;Y^n) = I(X^n\to Y^n) + I(Y^{n-1}\to X^n), }[/math]

where [math]\displaystyle{ I(Y^{n-1}\to X^n) \triangleq \sum_{i=1}^n I(Y^{i-1};X_i|X^{i-1}) }[/math] denotes the directed information from the delayed sequence [math]\displaystyle{ (0,Y_1,\dots,Y_{n-1}) }[/math] to [math]\displaystyle{ X^n }[/math].
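For [math]\displaystyle{ n=2 }[/math], both sides can be checked directly: the chain rule for mutual information gives [math]\displaystyle{ I(X^2;Y^2)=I(X_1;Y_1)+I(X_2;Y_1|X_1)+I(X^2;Y_2|Y_1) }[/math], which is exactly [math]\displaystyle{ I(X^2\to Y^2)+I(Y^{1}\to X^2) }[/math].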
Estimation and optimization
Estimating and optimizing the directed information is challenging because it is a sum of [math]\displaystyle{ n }[/math] terms, where [math]\displaystyle{ n }[/math] may be large. In many cases, one is interested in optimizing the limiting average, that is, the limit as [math]\displaystyle{ n }[/math] grows to infinity, termed a multi-letter expression.
Estimation
Estimating directed information from samples is a hard problem since the directed information is a functional of the joint distribution [math]\displaystyle{ \{P(x_i,y_i|x^{i-1},y^{i-1})\}_{i=1}^n }[/math], which may be unknown. Several estimation algorithms exist, based on context tree weighting,[14] empirical parametric distributions,[15] and long short-term memory networks.[16]
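As an illustrative sketch only (not one of the cited algorithms), the following Python code computes a naive plug-in estimate of the directed information rate between two symbol sequences under an assumed joint memory of length k; the function name, the memory parameter, and the finite-memory assumption are illustrative choices.

```python
import numpy as np
from collections import Counter

def plugin_directed_info_rate(x, y, k=1):
    """Naive plug-in estimate of the directed information rate from x to y.

    Assumes the processes have a joint memory of length k and approximates
    H(Y_i | Y_{i-k}^{i-1}) - H(Y_i | Y_{i-k}^{i-1}, X_{i-k}^{i})
    with empirical conditional distributions (result in bits per symbol).
    """
    n = len(x)
    joint_counts = Counter()   # key: (past outputs, past+present inputs, current output)
    marg_counts = Counter()    # key: (past outputs, current output)
    for i in range(k, n):
        y_past = tuple(y[i - k:i])
        x_ctx = tuple(x[i - k:i + 1])      # inputs up to and including time i
        joint_counts[(y_past, x_ctx, y[i])] += 1
        marg_counts[(y_past, y[i])] += 1
    total = n - k

    def cond_entropy(counts, context_of):
        """Empirical conditional entropy H(Y_i | context) in bits."""
        context_totals = Counter()
        for key, c in counts.items():
            context_totals[context_of(key)] += c
        h = 0.0
        for key, c in counts.items():
            p_joint = c / total
            p_cond = c / context_totals[context_of(key)]
            h -= p_joint * np.log2(p_cond)
        return h

    h_y = cond_entropy(marg_counts, lambda key: key[0])
    h_y_given_x = cond_entropy(joint_counts, lambda key: (key[0], key[1]))
    return h_y - h_y_given_x

# Sanity check: a noiseless channel Y_i = X_i with i.i.d. uniform input bits
# should give an estimate close to 1 bit per symbol.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=100_000)
print(plugin_directed_info_rate(x, x.copy(), k=1))   # approximately 1.0
```

Such a plug-in estimator is consistent only when the assumed memory captures the true dependence; the cited universal and neural estimators avoid fixing the memory in advance.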
Optimization
Maximizing directed information is a fundamental problem in information theory. For example, given the channel distributions [math]\displaystyle{ \{P(y_i|x^{i},y^{i-1})\}_{i=1}^n }[/math], the objective might be to maximize [math]\displaystyle{ I(X^n\to Y^n) }[/math] over the channel input distributions [math]\displaystyle{ \{P(x_i|x^{i-1},y^{i-1})\}_{i=1}^n }[/math].
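For instance, for a broad class of channels with feedback, such as the finite-state channels studied in [2], the feedback capacity is characterized by exactly this kind of multi-letter optimization:

[math]\displaystyle{ C_\text{FB} = \lim_{n \to \infty} \frac{1}{n} \max_{\{P(x_i|x^{i-1},y^{i-1})\}_{i=1}^n} I(X^n \to Y^n). }[/math]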
Marko's theory of bidirectional communication
Massey's directed information was motivated by Marko's early work (1966) on developing a theory of bidirectional communication.[25][26] Marko's definition of directed transinformation differs slightly from Massey's in that, at time [math]\displaystyle{ n }[/math], one conditions on past symbols [math]\displaystyle{ X^{n-1},Y^{n-1} }[/math] only and one takes limits:

[math]\displaystyle{ T_{12} = \lim_{n \to \infty} \mathbf E\left[ \log \frac{P(X_n|X^{n-1},Y^{n-1})}{P(X_n|X^{n-1})} \right], \qquad T_{21} = \lim_{n \to \infty} \mathbf E\left[ \log \frac{P(Y_n|Y^{n-1},X^{n-1})}{P(Y_n|Y^{n-1})} \right]. }[/math]

Marko also defined several related information quantities and developed the conservation law [math]\displaystyle{ F_{1}+F_{2} = R_{1}+R_{2}+K = H_{1}+H_{2}-K }[/math] and several bounds.
Relation to transfer entropy
Directed information is related to transfer entropy, which is a truncated version of Marko's directed transinformation [math]\displaystyle{ T_{21} }[/math].
The transfer entropy at time [math]\displaystyle{ i }[/math] and with memory [math]\displaystyle{ d }[/math] is

[math]\displaystyle{ T_{X \to Y} = I(X_{i-d}, \dots, X_{i-1}\,;\, Y_i \mid Y_{i-d}, \dots, Y_{i-1}), }[/math]
where one does not include the present symbol [math]\displaystyle{ X_i }[/math] or the past symbols [math]\displaystyle{ X^{i-d-1},Y^{i-d-1} }[/math] before time [math]\displaystyle{ i-d }[/math].
Transfer entropy usually assumes stationarity, i.e., [math]\displaystyle{ T_{X \to Y} }[/math] does not depend on the time [math]\displaystyle{ i }[/math].
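For instance, if [math]\displaystyle{ X_1,X_2,\dots }[/math] are i.i.d. uniform bits and [math]\displaystyle{ Y_i=X_i }[/math] (a noiseless channel without delay), the transfer entropy [math]\displaystyle{ I(X_{i-1};Y_i|Y_{i-1}) }[/math] is zero because the present input is excluded, whereas the per-symbol directed information [math]\displaystyle{ I(X^i;Y_i|Y^{i-1}) }[/math] equals one bit; with a unit-delay channel [math]\displaystyle{ Y_i=X_{i-1} }[/math] the two quantities coincide.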
References
Massey, James (1990). "Causality, Feedback And Directed Information".
Permuter, Haim Henry; Weissman, Tsachy; Goldsmith, Andrea J. (February 2009). "Finite State Channels With Time-Invariant Deterministic Feedback". IEEE Transactions on Information Theory 55 (2): 644–662. doi:10.1109/TIT.2008.2009849.
Kramer, G. (January 2003). "Capacity results for the discrete memoryless network". IEEE Transactions on Information Theory 49 (1): 4–21. doi:10.1109/TIT.2002.806135.
Kramer, Gerhard (April 2014). "Information Networks With In-Block Memory". IEEE Transactions on Information Theory 60 (4): 2105–2120. doi:10.1109/TIT.2014.2303120.
Permuter, Haim H.; Kim, Young-Han; Weissman, Tsachy (June 2011). "Interpretations of Directed Information in Portfolio Theory, Data Compression, and Hypothesis Testing". IEEE Transactions on Information Theory 57 (6): 3248–3259. doi:10.1109/TIT.2011.2136270.
Simeone, Osvaldo; Permuter, Haim Henri (June 2013). "Source Coding When the Side Information May Be Delayed". IEEE Transactions on Information Theory 59 (6): 3607–3618. doi:10.1109/TIT.2013.2248192.
Charalambous, Charalambos D.; Stavrou, Photios A. (August 2016). "Directed Information on Abstract Spaces: Properties and Variational Equalities". IEEE Transactions on Information Theory 62 (11): 6019–6052. doi:10.1109/TIT.2016.2604846.
Vinkler, Dror A.; Permuter, Haim H.; Merhav, Neri (20 April 2016). "Analogy between gambling and measurement-based work extraction". Journal of Statistical Mechanics: Theory and Experiment 2016 (4): 043403. doi:10.1088/1742-5468/2016/04/043403. Bibcode: 2016JSMTE..04.3403V.
Massey, J. L.; Massey, P. C. (September 2005). "Conservation of mutual and directed information". Proceedings. International Symposium on Information Theory, 2005. ISIT 2005. pp. 157–158. doi:10.1109/ISIT.2005.1523313. ISBN 0-7803-9151-9.
Amblard, Pierre-Olivier; Michel, Olivier (28 December 2012). "The Relation between Granger Causality and Directed Information Theory: A Review". Entropy 15 (1): 113–143. doi:10.3390/e15010113. Bibcode: 2012Entrp..15..113A.
Jiao, Jiantao; Permuter, Haim H.; Zhao, Lei; Kim, Young-Han; Weissman, Tsachy (October 2013). "Universal Estimation of Directed Information". IEEE Transactions on Information Theory 59 (10): 6220–6242. doi:10.1109/TIT.2013.2267934.
Quinn, Christopher J.; Kiyavash, Negar; Coleman, Todd P. (December 2015). "Directed Information Graphs". IEEE Transactions on Information Theory 61 (12): 6887–6909. doi:10.1109/TIT.2015.2478440.
Aharoni, Ziv; Tsur, Dor; Goldfeld, Ziv; Permuter, Haim Henry (June 2020). "Capacity of Continuous Channels with Memory via Directed Information Neural Estimator". 2020 IEEE International Symposium on Information Theory (ISIT). pp. 2014–2019. doi:10.1109/ISIT44484.2020.9174109. ISBN 978-1-7281-6432-8.
Naiss, Iddo; Permuter, Haim H. (January 2013). "Extension of the Blahut–Arimoto Algorithm for Maximizing Directed Information". IEEE Transactions on Information Theory 59 (1): 204–222. doi:10.1109/TIT.2012.2214202.
Permuter, Haim; Cuff, Paul; Van Roy, Benjamin; Weissman, Tsachy (July 2008). "Capacity of the Trapdoor Channel With Feedback". IEEE Transactions on Information Theory 54 (7): 3150–3165. doi:10.1109/TIT.2008.924681.
Elishco, Ohad; Permuter, Haim (September 2014). "Capacity and Coding for the Ising Channel With Feedback". IEEE Transactions on Information Theory 60 (9): 5138–5149. doi:10.1109/TIT.2014.2331951.
Sabag, Oron; Permuter, Haim H.; Kashyap, Navin (January 2016). "The Feedback Capacity of the Binary Erasure Channel With a No-Consecutive-Ones Input Constraint". IEEE Transactions on Information Theory 62 (1): 8–22. doi:10.1109/TIT.2015.2495239.
Peled, Ori; Sabag, Oron; Permuter, Haim H. (July 2019). "Feedback Capacity and Coding for the (0,k)-RLL Input-Constrained BEC". IEEE Transactions on Information Theory 65 (7): 4097–4114. doi:10.1109/TIT.2019.2903252.
Aharoni, Ziv; Sabag, Oron; Permuter, Haim Henri (18 August 2020). "Reinforcement Learning Evaluation and Solution for the Feedback Capacity of the Ising Channel with Large Alphabet". arXiv:2008.07983 [cs.IT].
Sabag, Oron; Permuter, Haim Henry; Pfister, Henry (March 2017). "A Single-Letter Upper Bound on the Feedback Capacity of Unifilar Finite-State Channels". IEEE Transactions on Information Theory 63 (3): 1392–1409. doi:10.1109/TIT.2016.2636851.
Sabag, Oron; Huleihel, Bashar; Permuter, Haim Henry (2020). "Graph-Based Encoders and their Performance for Finite-State Channels with Feedback". IEEE Transactions on Communications 68 (4): 2106–2117. doi:10.1109/TCOMM.2020.2965454.
Marko, H. (December 1973). "The Bidirectional Communication Theory--A Generalization of Information Theory". IEEE Transactions on Communications 21 (12): 1345–1351. doi:10.1109/TCOM.1973.1091610.
Original source: https://en.wikipedia.org/wiki/Directed information.