From Scholarpedia
Strong subadditivity of entropy (SSA) was long known and appreciated in classical probability theory and information theory. Its extension to quantum mechanical entropy (the von Neumann entropy) was conjectured by Robinson and Ruelle (1966) and Lanford III and Robinson (1968) and proved by Lieb and Ruskai (1973). It is a basic theorem in modern quantum information theory.
SSA concerns the relation between the entropies of various subsystems of a larger system consisting of three subsystems (or of one system with three degrees of freedom). The proof of this relation in the classical case is quite easy but the quantum case is difficult because of the non-commutativity of the density matrices describing the subsystems.
We will use the following notation throughout: A Hilbert space is denoted by \(\mathcal{H}\), and \( \mathcal{B}(\mathcal{H})\) denotes the bounded linear operators on \(\mathcal{H}\). Tensor products are denoted by superscripts, e.g., \(\mathcal{H}^{12}=\mathcal{H}^1\otimes \mathcal{H}^2\). The trace is denoted by \({\rm Tr}\).
A density matrix is a Hermitian, positive semi-definite matrix of trace one. It describes a quantum system in a mixed state. Density matrices on a tensor product are denoted by superscripts, e.g., \(\rho^{12}\) is a density matrix on \(\mathcal{H}^{12}\).
The von Neumann quantum entropy of a density matrix \(\rho\) is \[S(\rho):=-{\rm Tr}(\rho\log \rho).\]
Umegaki's quantum relative entropy of two density matrices \(\rho\) and \(\sigma\) is \[S(\rho||\sigma)={\rm Tr}(\rho\log\rho-\rho\log\sigma)\geq 0. \]
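The two definitions above can be evaluated numerically from an eigendecomposition. The following is a minimal sketch, assuming NumPy is available and that the matrices passed to `relative_entropy` are strictly positive; the function names and the test matrices are illustrative choices, not part of the source.

```python
import numpy as np

def von_neumann_entropy(rho, tol=1e-12):
    """S(rho) = -Tr(rho log rho), computed from the eigenvalues of rho."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > tol]            # convention: 0 log 0 = 0
    return float(-np.sum(evals * np.log(evals)))

def matrix_log(rho):
    """Logarithm of a strictly positive Hermitian matrix via eigendecomposition."""
    evals, vecs = np.linalg.eigh(rho)
    return (vecs * np.log(evals)) @ vecs.conj().T

def relative_entropy(rho, sigma):
    """S(rho||sigma) = Tr(rho log rho - rho log sigma); assumes rho, sigma > 0."""
    return float(np.real(np.trace(rho @ (matrix_log(rho) - matrix_log(sigma)))))

# Illustrative qubit states (assumed for this example only).
rho = np.array([[0.75, 0.25], [0.25, 0.25]])
sigma = np.eye(2) / 2                     # maximally mixed qubit

S_rho = von_neumann_entropy(rho)
D = relative_entropy(rho, sigma)
print(S_rho, D)
```

For the maximally mixed \(\sigma\) on a \(d\)-dimensional space the relative entropy reduces to \(\log d - S(\rho)\), which gives a quick consistency check of the two functions.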
A function \(g\) of two variables is said to be jointly concave if for any \( 0\leq \lambda\leq 1\) the following holds \[ g(\lambda A_1 + (1-\lambda)A_2,\lambda B_1 + (1-\lambda)B_2 ) \geq \lambda g(A_1, B_1) + (1 -\lambda)g(A_2, B_2). \]
Ordinary subadditivity, see Araki and Lieb (1970), concerns only two spaces, \(\mathcal{H}^{12}=\mathcal{H}^1\otimes\mathcal{H}^2\), and a density matrix \(\rho^{12}\). It states that \[ S(\rho^{12}) \leq S(\rho^1) +S(\rho^2). \] This inequality is true, of course, in classical probability theory, but the latter also contains the theorem that the conditional entropies \( S(\rho^{12} | \rho^1)= S(\rho^{12} )-S(\rho^1)\) and \( S(\rho^{12} | \rho^2)=S(\rho^{12} ) -S(\rho^2)\) are both non-negative. In the quantum case, however, both can be negative, e.g. \( S(\rho^{12}) \) can be zero while \( S(\rho^1) = S(\rho^2) >0\). Nevertheless, the subadditivity upper bound on \( S(\rho^{12}) \) continues to hold. The closest thing one has to \( S(\rho^{12})- S(\rho^1)\geq 0 \) is the Araki–Lieb triangle inequality \[ S(\rho^{12}) \geq |S(\rho^1) -S(\rho^2)|, \] which is derived by Araki and Lieb (1970) from subadditivity by a mathematical technique known as 'purification'.
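A pure entangled state illustrates the negative conditional entropy. The sketch below, assuming NumPy and using a two-qubit Bell state as an illustrative choice, computes the three entropies and checks subadditivity and the triangle inequality.

```python
import numpy as np

def entropy(rho, tol=1e-12):
    """Von Neumann entropy -Tr(rho log rho), with 0 log 0 = 0."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > tol]
    return float(-np.sum(evals * np.log(evals)))

# Bell state (|00> + |11>)/sqrt(2) on two qubits.
psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho12 = np.outer(psi, psi.conj())

# View rho12 as a tensor with indices (i1, i2, j1, j2) and trace out one factor.
r = rho12.reshape(2, 2, 2, 2)
rho1 = np.einsum('ikjk->ij', r)           # partial trace over the second qubit
rho2 = np.einsum('kikj->ij', r)           # partial trace over the first qubit

S12, S1, S2 = entropy(rho12), entropy(rho1), entropy(rho2)
conditional = S12 - S1                    # S(rho^12 | rho^1)
print(S12, S1, S2, conditional)
```

Here the joint state is pure while both marginals are maximally mixed, so the conditional entropy is negative although both inequalities in the text hold.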
Suppose that the Hilbert space of the system is a tensor product of three spaces\[\mathcal{H}=\mathcal{H}^1\otimes \mathcal{H}^2\otimes \mathcal{H}^3.\] Physically, these three spaces can be interpreted as the space of three different systems, or else as three parts or three degrees of freedom of one physical system.
Given a density matrix \(\rho^{123}\) on \(\mathcal{H}\), we define a density matrix \(\rho^{12}\) on \(\mathcal{H}^1\otimes \mathcal{H}^2\) as a partial trace \(\rho^{12}={\rm Tr}_{\mathcal{H}^3} \rho^{123}\). Similarly, we can define density matrices \(\rho^{23}\), \(\rho^{13}\), \(\rho^1\), \(\rho^2\), \(\rho^3\).
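The partial trace can be sketched numerically as follows. This is one possible implementation, assuming NumPy; the helper `partial_trace` and its 0-based subsystem indices (so that index 0 corresponds to \(\mathcal{H}^1\)) are conventions chosen for this example.

```python
import numpy as np

def partial_trace(rho, dims, keep):
    """Trace out every subsystem whose index is not listed in `keep`.

    rho  : density matrix on a tensor product with local dimensions `dims`
    keep : increasing list of 0-based subsystem indices to retain
    """
    n = len(dims)
    t = rho.reshape(list(dims) + list(dims))   # row indices first, then column indices
    kept = n                                   # number of subsystems still present
    for k in reversed(range(n)):               # last factor first, so earlier axes do not shift
        if k not in keep:
            t = np.trace(t, axis1=k, axis2=k + kept)
            kept -= 1
    d = int(np.prod([dims[k] for k in keep]))
    return t.reshape(d, d)

# Product state a (x) b (x) c on three qubits: its marginals are known exactly.
a = np.diag([0.7, 0.3])
b = np.array([[0.5, 0.2], [0.2, 0.5]])
c = np.diag([0.9, 0.1])
rho123 = np.kron(np.kron(a, b), c)

rho12 = partial_trace(rho123, (2, 2, 2), [0, 1])
rho13 = partial_trace(rho123, (2, 2, 2), [0, 2])
rho2 = partial_trace(rho123, (2, 2, 2), [1])
```

For a product state the reduced matrices are again products of the factors, which makes this a convenient test case.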
For any tri-partite state \(\rho^{123}\) the following holds \[S(\rho^{123})+S(\rho^2)\leq S(\rho^{12})+S(\rho^{23}),\] where \( S(\rho^{12})=-{\rm Tr}_{\mathcal{H}^{12}} \rho^{12} \log \rho^{12}\), for example.
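The inequality can be checked numerically on a sample state. The sketch below assumes NumPy and draws one random three-qubit density matrix with a fixed seed; it is an illustration on a single example, not a proof.

```python
import numpy as np

def entropy(rho, tol=1e-12):
    """Von Neumann entropy -Tr(rho log rho) from the eigenvalues of rho."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > tol]
    return float(-np.sum(evals * np.log(evals)))

rng = np.random.default_rng(0)             # fixed seed, arbitrary choice
G = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
rho123 = G @ G.conj().T                    # positive semi-definite by construction
rho123 /= np.trace(rho123).real            # normalise to trace one

# Tensor view with indices (a, b, c, a', b', c') for the three qubits.
r = rho123.reshape(2, 2, 2, 2, 2, 2)
rho12 = np.einsum('abcdec->abde', r).reshape(4, 4)   # trace over subsystem 3
rho23 = np.einsum('abcaef->bcef', r).reshape(4, 4)   # trace over subsystem 1
rho2 = np.einsum('abcaec->be', r)                    # trace over subsystems 1 and 3

# SSA says this quantity is non-negative.
gap = entropy(rho12) + entropy(rho23) - entropy(rho123) - entropy(rho2)
print(gap)
```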
This was improved in the following way by Carlen and Lieb (2012): \[S(\rho^{12})+S(\rho^{23})-S(\rho^{123})-S(\rho^2) \geq 2\max\{S(\rho^1)-S(\rho^{13}),\,S(\rho^3)-S(\rho^{13}),\,0\}, \] with the optimal constant \(2\).
As mentioned above, SSA was first proved by Lieb and Ruskai (1973), using Lieb's theorem that was proved by Lieb (1973). The extension from a Hilbert space setting to a von Neumann algebra setting, where states are not given by density matrices, was done by Narnhofer and Thirring (1975).
The theorem can also be obtained by proving numerous equivalent statements, some of which are summarized below.
Wigner and Yanase (1963) proposed a different definition of entropy, which was generalized by F.J. Dyson.
The Wigner–Yanase–Dyson \(p\)-skew information of a density matrix \(\rho\) with respect to an operator \(K\) is \[ I_p(\rho, K)=\frac{1}{2}{\rm Tr}[\rho^p, K^*][\rho^{1-p}, K],\] where \([A,B]=AB-BA\) is a commutator, \( K^* \) is the adjoint of \(K\) and \(0\leq p\leq 1\) is fixed.
It was conjectured by Wigner and Yanase (1964) that the \(p\)-skew information is concave as a function of a density matrix \(\rho\) for a fixed \(0\leq p\leq 1\).
Since the term \(-\tfrac{1}{2}{\rm Tr}\rho K^2\) is concave (it is linear), the conjecture reduces to the problem of concavity of \({\rm Tr}\rho^p K\rho^{1-p}K\). As noted in (Lieb, 1973), this conjecture (for all \( 0 \leq p \leq 1\)) implies SSA, and was proved for \( p= \tfrac{1}{2}\) in (Wigner, Yanase, 1964), and for all \( 0\leq p \leq 1 \) in (Lieb, 1973) in the following more general form: The function of two matrix variables \[\tag{1} A, B \mapsto {\rm Tr} A^{r}K^*B^pK \] is jointly concave in \( A\) and \( B\) when \(p\geq 0\), \(r\geq 0\) and \(p+r \leq 1\).
This theorem is an essential part of the proof of SSA in (Lieb, Ruskai, 1973).
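Joint concavity of the map in equation (1) can be probed numerically. The sketch below assumes NumPy; the exponents, the mixing weight, the random matrices and the helper names are all illustrative choices, and a single check of this kind is only a sanity test.

```python
import numpy as np

def mpow(A, s):
    """A**s for a positive definite Hermitian matrix, via eigendecomposition."""
    evals, vecs = np.linalg.eigh(A)
    return (vecs * evals**s) @ vecs.conj().T

def lieb(A, B, K, r, p):
    """The function (A, B) -> Tr A^r K^* B^p K from equation (1)."""
    return float(np.real(np.trace(mpow(A, r) @ K.conj().T @ mpow(B, p) @ K)))

def random_positive(rng, d):
    """Random positive definite matrix; the small shift keeps it strictly positive."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return G @ G.conj().T + 1e-3 * np.eye(d)

rng = np.random.default_rng(1)
d, r, p = 3, 0.5, 0.3                      # any r, p >= 0 with r + p <= 1
A1, A2, B1, B2 = (random_positive(rng, d) for _ in range(4))
K = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
lam = 0.4

# Value at the convex combination versus the convex combination of values.
mixed = lieb(lam * A1 + (1 - lam) * A2, lam * B1 + (1 - lam) * B2, K, r, p)
average = lam * lieb(A1, B1, K, r, p) + (1 - lam) * lieb(A2, B2, K, r, p)
print(mixed, average)
```

Joint concavity predicts that `mixed` is at least `average` for every such choice.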
In their paper Wigner and Yanase (1964) also conjectured the subadditivity of \(p\)-skew information for \(p=\tfrac{1}{2}\), which was disproved by Hansen (2007) by giving a counterexample.
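It was pointed out in (Araki, Lieb, 1970) that the first statement below is equivalent to SSA, and Uhlmann (1973) showed the equivalence between the second statement below and SSA.
(a) \(S(\rho^1)+S(\rho^3)-S(\rho^{12})-S(\rho^{23})\leq 0\).
(b) The map \(\rho^{12}\mapsto S(\rho^1)-S(\rho^{12})\) is convex.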
Both of these statements were proved directly by Lieb and Ruskai (1973).
As noted by Lindblad (1974) and Uhlmann (1977), if, in equation (1), one takes \( K=1\), \( r=1-p\), \(A=\rho\) and \(B=\sigma\) and differentiates in \( p\) at \(p=0\), one obtains the joint convexity of relative entropy, i.e., if \(\rho=\sum_k\lambda_k\rho_k\), and \(\sigma=\sum_k\lambda_k\sigma_k\), then \[ \tag{2} S\Bigl(\sum_k \lambda_k\rho_k||\sum_k\lambda_k \sigma_k \Bigr)\leq \sum_k\lambda_k S(\rho_k||\sigma_k),\] where \(\lambda_k\geq 0\) with \(\sum_k\lambda_k=1\).
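Inequality (2) can be illustrated on random states. The sketch below assumes NumPy and strictly positive density matrices (true with probability one for the random construction used); the weights, dimension and helper names are arbitrary choices for this example.

```python
import numpy as np

def mlog(rho):
    """Logarithm of a strictly positive Hermitian matrix."""
    evals, vecs = np.linalg.eigh(rho)
    return (vecs * np.log(evals)) @ vecs.conj().T

def rel_ent(rho, sigma):
    """Umegaki relative entropy S(rho||sigma) = Tr rho (log rho - log sigma)."""
    return float(np.real(np.trace(rho @ (mlog(rho) - mlog(sigma)))))

def random_state(rng, d):
    """Random full-rank density matrix of dimension d."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(2)
lam = [0.2, 0.5, 0.3]                      # weights lambda_k, summing to one
rhos = [random_state(rng, 3) for _ in lam]
sigmas = [random_state(rng, 3) for _ in lam]

rho_mix = sum(l * r for l, r in zip(lam, rhos))
sigma_mix = sum(l * s for l, s in zip(lam, sigmas))

lhs = rel_ent(rho_mix, sigma_mix)                                  # left side of (2)
rhs = sum(l * rel_ent(r, s) for l, r, s in zip(lam, rhos, sigmas)) # right side of (2)
print(lhs, rhs)
```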
The relative entropy decreases monotonically under certain operations on density matrices, the most important and basic of which is the following. Consider the map \(T\) from \( \mathcal{B}(\mathcal{H}^{12}) \rightarrow \mathcal{B}(\mathcal{H}^{12})\) given by \(T=1_{\mathcal{H}^1}\otimes {\rm Tr}_{\mathcal{H}^2}\). Then
\[ \tag{3} S(T\rho||T\sigma)\leq S(\rho||\sigma),\]
which is called Monotonicity of quantum relative entropy under partial trace.
To see how this follows from the joint convexity of relative entropy, observe that \( T\) can be written in Uhlmann's representation as \[ T(\rho^{12} ) = N^{-1} \sum_{j=1}^N (1_{\mathcal{H}^1}\otimes U_j) \rho^{12}(1_{\mathcal{H}^1}\otimes U_j^*), \] for some finite \( N\) and some collection of unitary matrices \(U_j\) on \( \mathcal{H}^2 \) (alternatively, integrate over Haar measure). Since the trace (and hence the relative entropy) is unitarily invariant, inequality (3) now follows from (2). This theorem is due to Lindblad (1974) and Uhlmann (1973), whose proof is the one given here.
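For a qubit \(\mathcal{H}^2\), one concrete collection of unitaries is the identity together with the three Pauli matrices, so \(N=4\). The sketch below assumes NumPy and this particular choice of \(U_j\), which is an illustration and not necessarily the one used in the cited proofs. It checks that the average equals the reduced state on \(\mathcal{H}^1\) tensored with the maximally mixed state on \(\mathcal{H}^2\).

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)
unitaries = [I2, X, Y, Z]                  # N = 4 unitaries acting on H^2

# Random two-qubit density matrix (fixed seed, arbitrary choice).
rng = np.random.default_rng(3)
G = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho12 = G @ G.conj().T
rho12 /= np.trace(rho12).real

# Average of (1 (x) U_j) rho12 (1 (x) U_j)^* over the four unitaries.
averaged = sum(
    np.kron(I2, U) @ rho12 @ np.kron(I2, U).conj().T for U in unitaries
) / len(unitaries)

# Reduced state on H^1, obtained by tracing out the second qubit.
rho1 = np.einsum('ikjk->ij', rho12.reshape(2, 2, 2, 2))
expected = np.kron(rho1, I2 / 2)           # rho^1 tensored with the maximally mixed qubit
```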
SSA is obtained from (3) with \( \mathcal{H}^1 \) replaced by \( \mathcal{H}^{12} \) and \( \mathcal{H}^2 \) replaced by \( \mathcal{H}^3 \). Take \( \rho = \rho^{123}, \sigma = \rho^1\otimes \rho^{23}, T= 1_{\mathcal{H}^{12}}\otimes {\rm Tr}_{\mathcal{H}^3}\). Then (3) becomes \[ S(\rho^{12}||\rho^1\otimes \rho^2)\leq S(\rho^{123}||\rho^1\otimes\rho^{23}).\]
Therefore, \[S(\rho^{123}||\rho^1\otimes\rho^{23})- S(\rho^{12}||\rho^1\otimes \rho^2)=S(\rho^{12})+S(\rho^{23})-S(\rho^{123})-S(\rho^2)\geq 0, \] which is SSA. Thus, the monotonicity of quantum relative entropy (which follows from (1)) implies SSA.
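The identity between the difference of relative entropies and the SSA expression can be verified numerically. The following sketch assumes NumPy and a random full-rank three-qubit state; helper names are illustrative.

```python
import numpy as np

def mlog(rho):
    """Logarithm of a strictly positive Hermitian matrix."""
    evals, vecs = np.linalg.eigh(rho)
    return (vecs * np.log(evals)) @ vecs.conj().T

def rel_ent(rho, sigma):
    """S(rho||sigma) = Tr rho (log rho - log sigma)."""
    return float(np.real(np.trace(rho @ (mlog(rho) - mlog(sigma)))))

def entropy(rho):
    """Von Neumann entropy of a strictly positive density matrix."""
    evals = np.linalg.eigvalsh(rho)
    return float(-np.sum(evals * np.log(evals)))

rng = np.random.default_rng(4)
G = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
rho123 = G @ G.conj().T
rho123 /= np.trace(rho123).real

r = rho123.reshape(2, 2, 2, 2, 2, 2)       # indices (a, b, c, a', b', c')
rho12 = np.einsum('abcdec->abde', r).reshape(4, 4)
rho23 = np.einsum('abcaef->bcef', r).reshape(4, 4)
rho1 = np.einsum('abcdbc->ad', r)
rho2 = np.einsum('abcaec->be', r)

# Left side: difference of the two relative entropies.
difference = (rel_ent(rho123, np.kron(rho1, rho23))
              - rel_ent(rho12, np.kron(rho1, rho2)))
# Right side: the SSA expression.
gap = entropy(rho12) + entropy(rho23) - entropy(rho123) - entropy(rho2)
print(difference, gap)
```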
Owing to the Stinespring factorization theorem, equation (3) is valid not only for partial traces but also when \(T\) is a quantum operation, i.e., a completely positive, trace preserving map. In this general case the inequality is called Monotonicity of quantum relative entropy.
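All of the above important inequalities are equivalent to each other, and can also be proved directly. The following are equivalent:
Monotonicity of quantum relative entropy (MONO);
Monotonicity of quantum relative entropy under partial trace (MPT);
Strong subadditivity (SSA);
Joint convexity of quantum relative entropy (JC).
The following implications show the equivalence between these inequalities.
MONO \(\Rightarrow\) MPT: MPT is the special case of MONO in which \(T\) is a partial trace;
MPT \(\Rightarrow\) MONO: by the Stinespring factorization theorem, a quantum operation can be written as an isometric embedding into a larger system followed by a partial trace over an auxiliary system, and the relative entropy is unchanged by the embedding;
MPT \(\Rightarrow\) SSA: apply MPT to the tri-partite choice \(\rho=\rho^{123}\), \(\sigma=\rho^1\otimes\rho^{23}\) described above;
SSA \(\Rightarrow\) MPT: SSA implies that the map \(\rho^{12}\mapsto S(\rho^1)-S(\rho^{12})\) is convex. In (Lieb, Ruskai, 1973) it was observed that this convexity yields MPT;
MPT \(\Rightarrow\) JC: choose \(\rho^{12}\) and \(\sigma^{12}\) to be block diagonal with blocks \(\lambda_k\rho_k\) and \(\lambda_k\sigma_k\), respectively. The partial trace over the block index sums the blocks, giving \(\sum_k\lambda_k\rho_k\) and \(\sum_k\lambda_k\sigma_k\), so MPT yields JC;
JC \(\Rightarrow\) SSA: by purifying \(\rho^{123}\) to \(\rho^{1234}\), SSA is seen to be equivalent to \[ S(\rho^4)+S(\rho^2)\leq S(\rho^{12})+S(\rho^{14}). \] Moreover, if \(\rho^{124}\) is pure, then \(S(\rho^2)=S(\rho^{14})\) and \(S(\rho^4)=S(\rho^{12})\), so equality holds in the above inequality. JC implies that the difference between the right and left sides is a concave function of \(\rho^{124}\). Since the extreme points of the convex set of density matrices are pure states, on which this difference vanishes, SSA follows from JC.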
See (Lieb, 1975), (Ruskai, 2002) for a discussion.
In (Petz, 1986), D. Petz showed that equality in the monotonicity relation holds only when there is a proper "recovery" channel:
For all states \(\rho\) and \(\sigma\) on a Hilbert space \(\mathcal{H}\) and all quantum operations \(T: \mathcal{B}(\mathcal{H})\rightarrow \mathcal{B}(\mathcal{K})\),
\[ S(T\rho||T\sigma)= S(\rho||\sigma), \]
if and only if there exists a quantum operation \(\hat{T}\) such that
\[ \hat{T}T\sigma=\sigma,\] and \(\hat{T}T\rho=\rho.\)
Moreover, \(\hat{T}\) can be given explicitly by the formula
\[ \hat{T}\omega=\sigma^{1/2}T^*\Bigl((T\sigma)^{-1/2}\omega(T\sigma)^{-1/2} \Bigr)\sigma^{1/2}, \]
where \(T^*\) is the adjoint map of \(T\).
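The explicit formula for \(\hat{T}\) can be tried out when \(T\) is the partial trace over the second factor of a two-qubit system, in which case \(T^*(X)=X\otimes 1\). The sketch below assumes NumPy, full-rank random states and illustrative helper names. It checks that \(\hat{T}T\sigma=\sigma\) and that \(\hat{T}\) maps \(T\rho\) to a density matrix; it does not test the equality condition itself.

```python
import numpy as np

def mpow(A, s):
    """A**s for a positive definite Hermitian matrix."""
    evals, vecs = np.linalg.eigh(A)
    return (vecs * evals**s) @ vecs.conj().T

def random_state(rng, d):
    """Random full-rank density matrix of dimension d."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def T(rho12):
    """Partial trace over the second qubit of a two-qubit state."""
    return np.einsum('ikjk->ij', rho12.reshape(2, 2, 2, 2))

def T_adj(X):
    """Adjoint of T with respect to the trace pairing: X -> X (x) 1."""
    return np.kron(X, np.eye(2))

def petz(omega, sigma):
    """sigma^{1/2} T^*((T sigma)^{-1/2} omega (T sigma)^{-1/2}) sigma^{1/2}."""
    s_half = mpow(sigma, 0.5)
    ts_inv_half = mpow(T(sigma), -0.5)
    return s_half @ T_adj(ts_inv_half @ omega @ ts_inv_half) @ s_half

rng = np.random.default_rng(5)
sigma = random_state(rng, 4)
rho = random_state(rng, 4)

recovered_sigma = petz(T(sigma), sigma)    # should reproduce sigma
recovered_rho = petz(T(rho), sigma)        # a state, in general different from rho
```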
Petz (1986) also gave another condition for equality in the monotonicity of quantum relative entropy: the first statement in the theorem below. Differentiating it at \(t=0\) gives the second condition. Moreover, M.B. Ruskai gave another proof of the second statement.
For all states \(\rho\) and \(\sigma\) on \(\mathcal{H}\) and all quantum operations \(T: \mathcal{B}(\mathcal{H})\rightarrow \mathcal{B}(\mathcal{K})\),
\[ S(T\rho||T\sigma)= S(\rho||\sigma),\]
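if and only if the following equivalent conditions are satisfied:
\(T^*\bigl((T\rho)^{it}(T\sigma)^{-it}\bigr)=\rho^{it}\sigma^{-it}\) for all real \(t\);
\(\log\rho-\log\sigma=T^*\bigl(\log T\rho-\log T\sigma\bigr)\),
where \(T^*\) is the adjoint map of \(T\).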
Hayden et al. (2003) described the states for which equality holds in SSA.
A state \(\rho^{ABC}\) on a Hilbert space \(\mathcal{H}^A\otimes\mathcal{H}^B\otimes\mathcal{H}^C\) satisfies strong subadditivity with equality if and only if there is a decomposition of the second system as \[ \mathcal{H}^B=\bigoplus_j \mathcal{H}^{B^L_j}\otimes \mathcal{H}^{B^R_j} \] into a direct sum of tensor products, such that \[ \rho^{ABC}=\bigoplus_j q_j\rho^{AB^L_j}\otimes\rho^{B^R_jC},\] with states \(\rho^{AB^L_j}\) on \(\mathcal{H}^A\otimes\mathcal{H}^{B^L_j}\) and \(\rho^{B^R_jC}\) on \(\mathcal{H}^{B^R_j}\otimes\mathcal{H}^C\), and a probability distribution \(\{q_j\}\).
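The simplest instance of this structure has a single summand, so that \(\mathcal{H}^B=\mathcal{H}^{B^L}\otimes\mathcal{H}^{B^R}\) and \(\rho^{ABC}=\rho^{AB^L}\otimes\rho^{B^RC}\). The sketch below assumes NumPy and takes all four factors to be qubits (an illustrative choice); it checks that the SSA expression vanishes for such a state.

```python
import numpy as np

def entropy(rho, tol=1e-12):
    """Von Neumann entropy -Tr(rho log rho), with 0 log 0 = 0."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > tol]
    return float(-np.sum(evals * np.log(evals)))

def random_state(rng, d):
    """Random density matrix of dimension d."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(6)
rho_A_BL = random_state(rng, 4)            # state on H^A (x) H^{B^L}
rho_BR_C = random_state(rng, 4)            # state on H^{B^R} (x) H^C
rho_ABC = np.kron(rho_A_BL, rho_BR_C)      # factor ordering: A, B^L, B^R, C

# Tensor view with indices (a, l, m, c, a', l', m', c'); B is the pair (l, m).
r = rho_ABC.reshape([2] * 8)
rho_AB = np.einsum('almcxyzc->almxyz', r).reshape(8, 8)   # trace over C
rho_BC = np.einsum('almcayzw->lmcyzw', r).reshape(8, 8)   # trace over A
rho_B = np.einsum('almcayzc->lmyz', r).reshape(4, 4)      # trace over A and C

gap = entropy(rho_AB) + entropy(rho_BC) - entropy(rho_ABC) - entropy(rho_B)
print(gap)
```

The entropies are additive over the two tensor factors, which is why every term cancels and the expression is zero up to rounding.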
E. H. Lieb and E. A. Carlen have found an explicit error term in the SSA inequality (Carlen, Lieb, 2012), namely, \[ S(\rho^{12})+S(\rho^{23})-S(\rho^{123})-S(\rho^2)\geq 2\max\{0, S(\rho^1)-S(\rho^{13}), S(\rho^3)-S(\rho^{13})\}.\] If \(-S(\rho^{13}|\rho^1)=S(\rho^1)-S(\rho^{13})\leq 0 \) and \(-S(\rho^{13}|\rho^3)=S(\rho^3)-S(\rho^{13})\leq 0 \), as is always the case for the classical Shannon entropy, this inequality has nothing to say. For the quantum entropy, on the other hand, it is quite possible that the conditional entropies satisfy \(-S(\rho^{13}|\rho^1)>0 \) or \(-S(\rho^{13}|\rho^3)> 0 \). Then, in this "highly quantum" regime, this inequality provides additional information.
The constant 2 is optimal, in the sense that for any constant larger than 2, one can find a state for which the inequality is violated with that constant.
In his paper Kim (2012) studied an operator extension of strong subadditivity, proving the following inequality:
For a tri-partite state (density matrix) \(\rho^{123}\) on \(\mathcal{H}^1\otimes \mathcal{H}^2\otimes\mathcal{H}^3\), \[ {\rm Tr}_{12}\Bigl(\rho^{123}(-\log(\rho^{12})-\log(\rho^{23})+\log(\rho^2)+\log(\rho^{123}))\Bigr) \geq 0.\]
The proof of this inequality is based on Effros's theorem, see (Effros, 2009), for which particular functions and operators are chosen to derive the inequality above. Ruskai (2012) describes this work in detail and discusses how to prove a large class of new matrix inequalities in the tri-partite and bi-partite cases by taking a partial trace over all but one of the spaces.