In mathematics, there are many kinds of inequalities connected with matrices and linear operators on Hilbert spaces. This article reviews some of the most important operator inequalities connected with traces of matrices.
Let \(\mathbf{H}_n\) denote the space of Hermitian \(n\times n\) matrices and \(\mathbf{H}_n^+\) denote the set consisting of positive semi-definite \(n\times n\) Hermitian matrices. For operators on an infinite dimensional Hilbert space we require that they be trace class and self-adjoint, in which case similar definitions apply, but we discuss only matrices, for simplicity.
For any real-valued function \(f\) on an interval \(I\subset \mathbb{R}\), one can define a matrix function \(f(A)\) for any \(A\in\mathbf{H}_n\) with eigenvalues \(\lambda_j\) in \(I\) by applying \(f\) to the eigenvalues in the spectral decomposition \(A=\sum_j\lambda_j P_j\), where the \(P_j\) are the corresponding spectral projectors: \(f(A)=\sum_j f(\lambda_j)P_j.\)
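As a minimal numerical illustration of this definition (not part of the article itself; the helper name is my own), the spectral recipe can be implemented directly with NumPy's Hermitian eigendecomposition:

```python
import numpy as np

def matrix_function(f, A):
    # Apply f to the eigenvalues in the spectral decomposition
    # A = sum_j lambda_j P_j, so that f(A) = sum_j f(lambda_j) P_j.
    eigvals, U = np.linalg.eigh(A)
    return U @ np.diag(f(eigvals)) @ U.conj().T

A = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 1 and 3
S = matrix_function(np.sqrt, A)          # the matrix square root
assert np.allclose(S @ S, A)
```

For Hermitian \(A\) the eigenvectors can be chosen orthonormal, so `U @ np.diag(...) @ U.conj().T` reconstructs \(f(A)\) exactly as in the definition above.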
A function \(f:I\rightarrow \mathbb{R}\) defined on an interval \(I\subset\mathbb{R}\) is said to be operator monotone if for all \(n\), and all \(A,B\in \mathbf{H}_n\) with eigenvalues in \(I\), the following holds: \[ A \geq B \Rightarrow f(A) \geq f(B), \] where the inequality \(A\geq B\) means that the operator \(A-B\geq 0 \) is positive semi-definite.
A function \(f: I \rightarrow \mathbb{R}\) is said to be operator convex if for all \(n\) and all \(A,B\in \mathbf{H}_n\) with eigenvalues in \(I\), and \(0 < \lambda < 1\), the following holds \[ f(\lambda A + (1-\lambda)B) \leq \lambda f(A) + (1 -\lambda)f(B) . \] Note that the operator \(\lambda A + (1-\lambda)B \) has eigenvalues in \(I\), since \( A\) and \(B \) have eigenvalues in \(I\).
A function \(f\) is operator concave if \(-f\) is operator convex, i.e. the inequality above for \(f\) is reversed.
A function \(g: I\times J \rightarrow \mathbb{R}\), defined on intervals \(I,J\subset \mathbb{R} \) is said to be jointly convex if for all \(n\) and all \(A_1, A_2\in \mathbf{H}_n\) with eigenvalues in \(I\) and all \(B_1,B_2\in \mathbf{H}_n\) with eigenvalues in \(J\), and any \( 0\leq \lambda\leq 1\) the following holds \[ g(\lambda A_1 + (1-\lambda)A_2,\lambda B_1 + (1-\lambda)B_2 ) \leq \lambda g(A_1, B_1) + (1 -\lambda)g(A_2, B_2). \]
A function \(g\) is jointly concave if \(-g\) is jointly convex, i.e. the inequality above for \(g\) is reversed.
Given a function \(f : \mathbb{R} \rightarrow \mathbb{R}\), the associated trace function on \(\mathbf{H}_n\) is given by \[ A\mapsto {\rm Tr}\, f(A)=\sum_j f(\lambda_j),\] where \(A\) has eigenvalues \(\lambda_j\) and \({\rm Tr}\) stands for the trace of the operator.
Let \(f : \mathbb{R} \rightarrow \mathbb{R}\) be continuous, and let \(n\) be any integer.
Then if \(t\mapsto f(t)\) is monotone increasing, so is \(A \mapsto {\rm Tr} f(A)\) on \(\mathbf{H}_n\).
Likewise, if \(t \mapsto f(t)\) is convex, so is \(A \mapsto {\rm Tr} f(A)\) on \(\mathbf{H}_n\), and
it is strictly convex if \(f\) is strictly convex.
See proof and discussion in Carlen (2009), for example.
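A quick numerical sanity check of the convexity statement (my own illustration, with \(f(t)=e^t\), which is convex): the lifted map \(A\mapsto{\rm Tr}\,e^A\) should satisfy the convexity inequality on random Hermitian matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_hermitian(n):
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (X + X.conj().T) / 2

def tr_f(f, A):
    # Tr f(A) = sum_j f(lambda_j), by the spectral definition above.
    return np.sum(f(np.linalg.eigvalsh(A)))

A, B = rand_hermitian(4), rand_hermitian(4)
lam = 0.3
lhs = tr_f(np.exp, lam * A + (1 - lam) * B)
rhs = lam * tr_f(np.exp, A) + (1 - lam) * tr_f(np.exp, B)
assert lhs <= rhs + 1e-12   # convexity of f lifts to Tr f(.)
```

This does not prove anything, but it exercises exactly the inequality the theorem asserts.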
For \(-1\leq p \leq 0\), the function \(f(t) = -t^p\) is operator monotone and operator concave.
For \(0 \leq p \leq 1\), the function \(f(t) = t^p\) is operator monotone and operator concave.
For \(1 \leq p \leq 2\), the function \(f(t) = t^p\) is operator convex.
Furthermore, \(f(t) = \log(t)\) is operator concave and operator monotone, while \(f(t) = t \log(t)\) is operator convex.
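It is worth stressing that operator monotonicity is strictly stronger than ordinary monotonicity: \(t\mapsto t^2\) is monotone on \([0,\infty)\) but, consistent with the ranges above, it is not operator monotone. A small explicit counterexample (my own, using a standard pair of \(2\times 2\) matrices):

```python
import numpy as np

def is_psd(M, tol=1e-10):
    return np.min(np.linalg.eigvalsh(M)) >= -tol

A = np.array([[2.0, 1.0], [1.0, 1.0]])
B = np.array([[1.0, 0.0], [0.0, 0.0]])
assert is_psd(A - B)                 # A >= B >= 0

# t -> t^2 is monotone on [0, inf) but NOT operator monotone:
assert not is_psd(A @ A - B @ B)

# t -> sqrt(t) (the case p = 1/2) IS operator monotone:
w, U = np.linalg.eigh(A)
sqrtA = U @ np.diag(np.sqrt(w)) @ U.T
assert is_psd(sqrtA - B)             # sqrt(B) = B since B is a projection
```

Here \(A-B\) is positive semi-definite while \(A^2-B^2\) has a negative eigenvalue, whereas \(\sqrt{A}\geq\sqrt{B}\) holds as the theorem predicts.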
The original proof of this theorem is due to Löwner (1934), who gave a necessary and sufficient condition for \(f\) to be operator monotone. An elementary proof of the theorem is discussed in Carlen (2009) and a more general version of it in Donoghue (1974).
For all Hermitian \(n\times n\) matrices \(A\) and \(B\) and all differentiable convex functions \(f : \mathbb{R} \rightarrow \mathbb{R}\) with derivative \(f' \), or for all positive-definite Hermitian \(n\times n\) matrices \(A\) and \(B\), and all differentiable convex functions \(f:(0,\infty)\rightarrow\mathbb{R}\) the following inequality holds \[ {\rm Tr}[f(A)- f(B)- (A - B)f'(B)] \geq 0.\]
In either case, if \(f\) is strictly convex, there is equality if and only if \(A = B\).
Let \(C = A - B\), so that for \(0 < t < 1\), \(B + tC = (1 -t)B + tA\). Define \(\phi(t) = {\rm Tr}[f(B + tC)]\). By convexity and monotonicity of trace functions, \(\phi\) is convex, and so for all \(0 < t < 1\), \[ \phi(1) - \phi(0) \geq \frac{\phi(t) - \phi(0)}{t},\]
and in fact the right hand side is monotone decreasing in \(t\). Taking the limit \(t \rightarrow 0\) yields Klein's inequality.
Note that if \(f\) is strictly convex and \(C \neq 0\), then \(\phi\) is strictly convex. The final assertion follows from this and the fact that \(\frac{\phi(t) -\phi(0)}{t}\) is monotone decreasing in \(t\).
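A numerical check of Klein's inequality (my own sketch, using the operator convex function \(f(t)=t\log t\) with \(f'(t)=\log t + 1\) on random positive definite matrices):

```python
import numpy as np

rng = np.random.default_rng(1)

def rand_pd(n):
    X = rng.standard_normal((n, n))
    return X @ X.T + n * np.eye(n)    # safely positive definite

def funm(f, M):
    # f(M) via the spectral theorem for symmetric M.
    w, U = np.linalg.eigh(M)
    return U @ np.diag(f(w)) @ U.T

A, B = rand_pd(3), rand_pd(3)
f  = lambda t: t * np.log(t)
fp = lambda t: np.log(t) + 1
# Klein: Tr[f(A) - f(B) - (A - B) f'(B)] >= 0
gap = np.trace(funm(f, A) - funm(f, B) - (A - B) @ funm(fp, B))
assert gap >= -1e-10
```

Since \(t\log t\) is strictly convex, the gap is strictly positive whenever \(A\neq B\).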
In 1965, Golden (1965) and Thompson (1965) independently discovered that
For any matrices \(A, B\in\mathbf{H}_n\), \[{\rm Tr}\, e^{A+B}\leq {\rm Tr}\, e^A e^B.\]
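The Golden–Thompson inequality is easy to test numerically (my own sketch; the matrix exponential is computed spectrally, so this applies to real symmetric matrices):

```python
import numpy as np

rng = np.random.default_rng(2)

def rand_sym(n):
    X = rng.standard_normal((n, n))
    return (X + X.T) / 2

def expm_h(M):
    # exp of a symmetric matrix via its eigendecomposition.
    w, U = np.linalg.eigh(M)
    return U @ np.diag(np.exp(w)) @ U.T

A, B = rand_sym(3), rand_sym(3)
lhs = np.trace(expm_h(A + B))        # Tr e^{A+B}
rhs = np.trace(expm_h(A) @ expm_h(B))  # Tr e^A e^B
assert lhs <= rhs + 1e-10
```

Equality holds when \(A\) and \(B\) commute, in which case \(e^{A+B}=e^Ae^B\).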
This inequality can be generalized to three operators Lieb (1973): for positive definite operators \(A, B, C\in\mathbf{H}_n^+\), \[{\rm Tr} \, e^{\ln A -\ln B+\ln C}\leq \int_0^\infty dt\, {\rm Tr}\, A(B+t)^{-1}C(B+t)^{-1}.\]
Let \(R, F\in \mathbf{H}_n\) be such that \({\rm Tr}\, e^R=1\). Define \(f={\rm Tr}\, Fe^R\), then \[{\rm Tr}\, e^{F}e^R \geq {\rm Tr}\, e^{F+R}\geq e^f.\]
The proof of this inequality follows from Klein's inequality. Take \(f(x)=e^x\), \(A=R+F\) and \(B=R+fI\). See Ruelle (1969).
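Both inequalities in the Peierls–Bogoliubov chain can be checked numerically (my own sketch; \(R\) is normalized so that \({\rm Tr}\,e^R=1\) by subtracting a multiple of the identity):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
S = rng.standard_normal((n, n)); S = (S + S.T) / 2
F = rng.standard_normal((n, n)); F = (F + F.T) / 2

def expm_h(M):
    w, U = np.linalg.eigh(M)
    return U @ np.diag(np.exp(w)) @ U.T

# Normalize so that Tr e^R = 1:
R = S - np.log(np.trace(expm_h(S))) * np.eye(n)
f = np.trace(F @ expm_h(R))
assert np.trace(expm_h(F) @ expm_h(R)) >= np.trace(expm_h(F + R)) - 1e-10
assert np.trace(expm_h(F + R)) >= np.exp(f) - 1e-10
```

The first assertion is Golden–Thompson; the second is the new content of the theorem.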
Let \(H\) be a self-adjoint operator such that \(e^{-H}\) is trace class. Then for any \(\gamma\geq 0 \) with \({\rm Tr}\,\gamma=1,\) \[{\rm Tr}\, \gamma H+{\rm Tr}\, \gamma\ln\gamma\geq -\ln {\rm Tr}\, e^{-H},\] with equality if and only if \(\gamma={\rm exp}(-H)/{\rm Tr}\, {\rm exp}(-H)\).
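In finite dimensions this variational principle, including the equality case at the Gibbs state, can be verified directly (my own sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3

def funm(f, M):
    w, U = np.linalg.eigh(M)
    return U @ np.diag(f(w)) @ U.T

H = rng.standard_normal((n, n)); H = (H + H.T) / 2

# A random density matrix (gamma >= 0, Tr gamma = 1):
G = rng.standard_normal((n, n)); G = G @ G.T
gamma = G / np.trace(G)

free_energy = -np.log(np.trace(funm(np.exp, -H)))
lhs = np.trace(gamma @ H) + np.trace(gamma @ funm(np.log, gamma))
assert lhs >= free_energy - 1e-10

# Equality at the Gibbs state gamma = exp(-H) / Tr exp(-H):
gibbs = funm(np.exp, -H) / np.trace(funm(np.exp, -H))
lhs_eq = np.trace(gibbs @ H) + np.trace(gibbs @ funm(np.log, gibbs))
assert abs(lhs_eq - free_energy) < 1e-8
```

The left-hand side is the free-energy functional \({\rm Tr}\,\gamma H - S(\gamma)\), minimized exactly by the Gibbs state.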
The following theorem was proved by Lieb (1973). It proves and generalizes a conjecture of Wigner, Yanase and Dyson (1964). Six years later other proofs were given by Ando (1979) and Simon (1979), and several more have been given since then.
For all \(m\times n\) matrices \(K\), and all \(q\) and \(r\) such that \(0 \leq q\leq 1\) and \(0\leq r \leq 1\), with \(q + r \leq 1\), the real valued map on \(\mathbf{H}^+_m \times \mathbf{H}^+_n\) given by \[ F(A,B,K) = {\rm Tr}(K^*A^qKB^r) \] is jointly concave in \((A,B)\).
Here \(K^* \) stands for the adjoint operator of \(K.\)
For a fixed Hermitian matrix \(L\in\mathbf{H}_n\), the function \[ f(A)={\rm Tr} \,\exp\{L+\ln A\} \] is concave on \(\mathbf{H}_n^+\).
The theorem and proof are due to Lieb (1973), Thm 6, where he obtains this theorem as a corollary of Lieb's concavity Theorem. The most direct proof is due to Epstein (1973); see Ruskai (2002), (2007) papers for a review of this argument.
Ando's proof (1979) of Lieb's concavity theorem led to the following significant complement to it:
For all \(m \times n\) matrices \(K\), and all \(1 \leq q \leq 2\) and \(0 \leq r \leq 1\) with \(q-r \geq 1\), the real valued map on \(\mathbf{H}^+_m \times \mathbf{H}^+_n\) given by \[ (A,B) \mapsto {\rm Tr}(K^*A^qKB^{-r})\] is convex.
For two operators \(A, B\in\mathbf{H}^+_n \) define the following map \[ R(A\|B):= {\rm Tr}(A\log A) - {\rm Tr}(A\log B).\]
For density matrices \(\rho\) and \(\sigma\), the map \(R(\rho\|\sigma)=S(\rho\|\sigma)\) is Umegaki's quantum relative entropy.
Note that the non-negativity of \(R(A\|B)\) follows from Klein's inequality with \(f(x)=x\log x\).
The map \(R(A\|B): \mathbf{H}^+_n \times \mathbf{H}^+_n \rightarrow \mathbb{R}\) is jointly convex.
For all \(0 < p < 1\), \((A,B) \mapsto {\rm Tr}(B^{1-p}A^p)\) is jointly concave, by Lieb's concavity theorem, and thus \[(A,B)\mapsto \frac{1}{p-1}({\rm Tr}(B^{1-p}A^p)-{\rm Tr}\, A)\] is jointly convex. But \[\lim_{p\rightarrow 1}\frac{1}{p-1}({\rm Tr}(B^{1-p}A^p)-{\rm Tr}\, A)=R(A\|B),\] and convexity is preserved in the limit.
The proof is due to Lindblad (1974).
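Joint convexity of \(R(A\|B)\) can be exercised numerically (my own sketch, on random positive definite matrices):

```python
import numpy as np

rng = np.random.default_rng(5)

def rand_pd(n):
    X = rng.standard_normal((n, n))
    return X @ X.T + 0.1 * np.eye(n)

def funm(f, M):
    w, U = np.linalg.eigh(M)
    return U @ np.diag(f(w)) @ U.T

def rel_entropy(A, B):
    # R(A||B) = Tr(A log A) - Tr(A log B)
    return np.trace(A @ (funm(np.log, A) - funm(np.log, B)))

n, lam = 3, 0.4
A1, A2, B1, B2 = (rand_pd(n) for _ in range(4))
lhs = rel_entropy(lam * A1 + (1 - lam) * A2, lam * B1 + (1 - lam) * B2)
rhs = lam * rel_entropy(A1, B1) + (1 - lam) * rel_entropy(A2, B2)
assert lhs <= rhs + 1e-10
```

For density matrices this is the joint convexity of the quantum relative entropy, the key ingredient in proofs of strong subadditivity.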
The operator version of Jensen's inequality is due to Davis (1957).
A continuous, real function \(f\) on an interval \(I\) is said to satisfy Jensen's Operator Inequality if \[ f\left(\sum_kA_k^*X_kA_k\right)\leq\sum_k A_k^*f(X_k)A_k \] holds for operators \(\{A_k\}_k\) with \(\sum_k A^*_kA_k=1\) and for self-adjoint operators \(\{X_k\}_k\) with spectra in \(I\).
See Hansen and Pedersen (2003), Davis (1957) for the proof of the following two theorems.
Let \(f\) be a continuous function defined on an interval \(I\) and let \(m\) and \(n\) be natural numbers. If \(f\) is convex, we then have the inequality \[ {\rm Tr}\Bigl(f\Bigl(\sum_{k=1}^nA_k^*X_kA_k\Bigr)\Bigr)\leq {\rm Tr}\Bigl(\sum_{k=1}^n A_k^*f(X_k)A_k\Bigr),\] for all self-adjoint \(m\times m\) matrices \((X_1, \ldots , X_n)\) with spectra contained in \(I\) and all \(m \times m\) matrices \((A_1, \ldots , A_n)\) with \(\sum_{k=1}^nA_k^*A_k=1\).
Conversely, if the above inequality is satisfied for some \(n\) and \(m\), where \(n > 1\), then \(f\) is convex.
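The trace version of the inequality can be checked on a concrete unital family (my own sketch; scaled orthogonal matrices give \(\sum_k A_k^*A_k=1\), and \(f(x)=x^2\) is convex):

```python
import numpy as np

rng = np.random.default_rng(6)
m = 3

def rand_sym(n):
    X = rng.standard_normal((n, n))
    return (X + X.T) / 2

def rand_orth(n):
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return Q

t = 0.3
# A_1, A_2 with A_1^T A_1 + A_2^T A_2 = 1 (scaled orthogonal matrices):
A1, A2 = np.sqrt(t) * rand_orth(m), np.sqrt(1 - t) * rand_orth(m)
X1, X2 = rand_sym(m), rand_sym(m)

# Trace Jensen inequality with f(x) = x^2, so Tr f(S) = Tr S^2:
S = A1.T @ X1 @ A1 + A2.T @ X2 @ A2
lhs = np.trace(S @ S)
rhs = np.trace(A1.T @ (X1 @ X1) @ A1 + A2.T @ (X2 @ X2) @ A2)
assert lhs <= rhs + 1e-10
```

Since \(x^2\) is in fact operator convex, the stronger operator inequality of the next theorem also holds for this family.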
For a continuous function \(f\) defined on an interval \(I\) the following conditions are equivalent:
\(f\) is operator convex.
\[ f\Bigl(\sum_{k=1}^nA_k^*X_kA_k\Bigr)\leq\sum_{k=1}^n A_k^*f(X_k)A_k, \] for all \((X_1, \ldots , X_n)\) bounded, self-adjoint operators on an arbitrary Hilbert space \(\mathcal{H}\) with spectra contained in \(I\) and all \((A_1, \ldots , A_n)\) on \(\mathcal{H}\) with \(\sum_{k=1}^n A^*_kA_k=1\).
\(f(V^*XV)\leq V^*f(X)V\) for every isometry \(V\) on an infinite-dimensional Hilbert space \(\mathcal{H}\) and every self-adjoint operator \(X\) with spectrum in \(I\).
Lieb and Thirring (1976) proved the following inequality: For any \( A\geq 0 \), \(B\geq 0 \) and \(r\geq 1, \) \[{\rm Tr} (B^{1/2}AB^{1/2})^r\leq {\rm Tr}\, B^{r/2}A^rB^{r/2}.\]
Araki (1990) generalized the above inequality to the following one: For any \(A\geq 0 \), \(B\geq 0 \) and \(q\geq 0, \) \[{\rm Tr}(B^{1/2}AB^{1/2})^{rq}\leq {\rm Tr}(B^{r/2}A^rB^{r/2})^q,\] for \(r\geq 1, \) and \[{\rm Tr}(B^{r/2}A^rB^{r/2})^q\leq {\rm Tr}(B^{1/2}AB^{1/2})^{rq},\] for \(0\leq r\leq 1. \)
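The \(r\geq 1\), \(q=1\) case of Araki's inequality can be tested on random positive semi-definite matrices (my own sketch; matrix powers are taken spectrally, with eigenvalues clipped at zero to absorb round-off):

```python
import numpy as np

rng = np.random.default_rng(7)

def rand_psd(n):
    X = rng.standard_normal((n, n))
    return X @ X.T

def powm(M, p):
    # M^p for M >= 0, via the eigendecomposition.
    w, U = np.linalg.eigh(M)
    return U @ np.diag(np.clip(w, 0, None) ** p) @ U.T

def tr_pow(M, p):
    # Tr M^p for M >= 0.
    return np.sum(np.clip(np.linalg.eigvalsh(M), 0, None) ** p)

A, B = rand_psd(3), rand_psd(3)
r = 3.0
# Tr (B^{1/2} A B^{1/2})^r  <=  Tr B^{r/2} A^r B^{r/2}   (r >= 1)
lhs = tr_pow(powm(B, 0.5) @ A @ powm(B, 0.5), r)
rhs = np.trace(powm(B, r / 2) @ powm(A, r) @ powm(B, r / 2))
assert lhs <= rhs * (1 + 1e-10) + 1e-8
```

For \(0\leq r\leq 1\) the inequality reverses, as stated above.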
Effros (2009) proved the following theorem.
If \(f(x)\) is an operator convex function, and \(L\) and \(R\) are commuting bounded positive linear operators, i.e. the commutator \([L,R]=LR-RL=0\), then the perspective \[g(L, R):=f(LR^{-1})R \] is jointly convex, i.e. if \(L=\lambda L_1+(1-\lambda)L_2\) and \(R=\lambda R_1+(1-\lambda)R_2\) with \([L_i, R_i]=0\) \((i=1,2)\) and \(0\leq\lambda\leq 1\), then \[g(L,R)\leq \lambda g(L_1,R_1)+(1-\lambda)g(L_2,R_2).\]
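For commuting operators the perspective reduces entrywise to the scalar perspective; with the operator convex choice \(f(x)=x^2\), so that \(g(L,R)=L^2R^{-1}\), joint convexity can be checked on diagonal (hence commuting) positive matrices (my own sketch):

```python
import numpy as np

rng = np.random.default_rng(8)
n, lam = 4, 0.6

# Commuting (here: diagonal, positive) operators L_i, R_i:
L1, L2 = (np.diag(rng.uniform(0.5, 2.0, n)) for _ in range(2))
R1, R2 = (np.diag(rng.uniform(0.5, 2.0, n)) for _ in range(2))

def perspective(L, R):
    # g(L, R) = f(L R^{-1}) R with f(x) = x^2, i.e. L^2 R^{-1}.
    return L @ L @ np.linalg.inv(R)

L = lam * L1 + (1 - lam) * L2
R = lam * R1 + (1 - lam) * R2
diff = lam * perspective(L1, R1) + (1 - lam) * perspective(L2, R2) \
       - perspective(L, R)
assert np.min(np.linalg.eigvalsh(diff)) >= -1e-10   # diff >= 0
```

In the diagonal case this is exactly the joint convexity of \((x,y)\mapsto x^2/y\) applied entrywise.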