Consider the objective function $f = \log(\log(|\mathbf{X}|))$, where $|\mathbf{X}|$ denotes the determinant of $\mathbf{X}$.
The goal is to find its gradient with respect to the variable $\mathbf{X}$:

$$
\nabla f = \frac{\partial \log(\log(|\mathbf{X}|))}{\partial \mathbf{X}} \tag{1}
$$
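Here $\mathbf{X}$ is a square matrix. For $f$ to be defined we need $|\mathbf{X}| > 1$ (so that $\log(|\mathbf{X}|) > 0$), which also guarantees that $\mathbf{X}$ is invertible.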
By the definition of the differential, we have:

$$
\mathrm{d}f = \mathrm{tr}\big((\nabla f)^T \mathrm{d}\mathbf{X}\big) \tag{2}
$$
Note that $f$ here can be replaced by any differentiable scalar function of $\mathbf{X}$, and (2) still holds. This article continues to use $\log(\log(|\mathbf{X}|))$ as the worked example.
Equation (2) suggests a recipe: if we can bring $\mathrm{d}f$ into the form $\mathrm{d}f = \mathrm{tr}(\mathbf{G}\,\mathrm{d}\mathbf{X})$, then we can read off $\nabla f = \mathbf{G}^T$.
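The reason the transpose appears is that $\mathrm{tr}(\mathbf{G}\,\mathrm{d}\mathbf{X})$ is the elementwise inner product of $\mathbf{G}^T$ with $\mathrm{d}\mathbf{X}$. The following is a minimal NumPy sketch of that identity; the matrices `G` and `dX` are arbitrary random placeholders chosen for illustration, not quantities from the derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((3, 3))   # hypothetical coefficient matrix
dX = rng.standard_normal((3, 3))  # hypothetical perturbation

# tr(G dX) = sum_{i,j} G[i, j] * dX[j, i]
trace_form = np.trace(G @ dX)

# Elementwise inner product of G^T with dX: sum_{i,j} G^T[i, j] * dX[i, j]
inner_form = np.sum(G.T * dX)

print(np.isclose(trace_form, inner_form))
```

Because the two quantities agree for any `G` and `dX`, the coefficient that multiplies each entry of $\mathrm{d}\mathbf{X}$ is the corresponding entry of $\mathbf{G}^T$, which is what identifies $\mathbf{G}^T$ as the gradient.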
For $f = \log(\log(|\mathbf{X}|))$, we have:
$$
\begin{aligned}
\mathrm{d}\log(\log(|\mathbf{X}|))
&\overset{(a)}{=} (\log(|\mathbf{X}|))^{-1}\,\mathrm{d}(\log(|\mathbf{X}|)) \\
&\overset{(b)}{=} (\log(|\mathbf{X}|))^{-1}\,|\mathbf{X}|^{-1}\,\mathrm{d}(|\mathbf{X}|) \\
&\overset{(c)}{=} (\log(|\mathbf{X}|))^{-1}\,|\mathbf{X}|^{-1}\,|\mathbf{X}|\,\mathrm{tr}(\mathbf{X}^{-1}\mathrm{d}\mathbf{X}) \\
&= (\log(|\mathbf{X}|))^{-1}\,\mathrm{tr}(\mathbf{X}^{-1}\mathrm{d}\mathbf{X}) \\
&\overset{(e)}{=} \mathrm{tr}\big((\log(|\mathbf{X}|))^{-1}\,\mathbf{X}^{-1}\mathrm{d}\mathbf{X}\big)
\end{aligned}
\tag{3}
$$
- (a), (b): for a positive scalar $t$, $\mathrm{d}(\log t) = t^{-1}\,\mathrm{d}t$. In (a), take $t = \log(|\mathbf{X}|)$; in (b), take $t = |\mathbf{X}|$.
- (c): $\mathrm{d}(|\mathbf{X}|) = |\mathbf{X}|\,\mathrm{tr}(\mathbf{X}^{-1}\mathrm{d}\mathbf{X})$, which is Jacobi's formula for an invertible $\mathbf{X}$.
- (e): note that $(\log(|\mathbf{X}|))^{-1}$ is a scalar, and for any scalar $a$ we have $a\,\mathrm{tr}(\mathbf{X}) = \mathrm{tr}(a\mathbf{X})$, so it can be moved inside the trace.
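Step (c) carries most of the weight in the derivation, so it is worth a quick numerical sanity check. The sketch below compares the actual change in the determinant under a small perturbation with the first-order prediction $|\mathbf{X}|\,\mathrm{tr}(\mathbf{X}^{-1}\mathrm{d}\mathbf{X})$. The test matrix, its size, and the perturbation scale are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Hypothetical test matrix: diagonally dominant, so it is safely invertible.
X = 3.0 * np.eye(n) + 0.1 * rng.standard_normal((n, n))
# Small perturbation standing in for dX.
dX = 1e-6 * rng.standard_normal((n, n))

# Actual change in the determinant.
actual = np.linalg.det(X + dX) - np.linalg.det(X)

# First-order prediction from step (c): |X| * tr(X^{-1} dX).
# np.linalg.solve(X, dX) computes X^{-1} dX without forming the inverse.
predicted = np.linalg.det(X) * np.trace(np.linalg.solve(X, dX))

print(actual, predicted)
```

The two printed values should agree up to terms of second order in the size of `dX`, which is what a first-order differential promises.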
Taking the final result of (3),

$$
\mathrm{d}\log(\log(|\mathbf{X}|)) = \mathrm{tr}\big((\log(|\mathbf{X}|))^{-1}\,\mathbf{X}^{-1}\mathrm{d}\mathbf{X}\big)
$$
and comparing it with (2), we obtain:

$$
\nabla f = \big((\log(|\mathbf{X}|))^{-1}\,\mathbf{X}^{-1}\big)^T = \frac{\mathbf{X}^{-T}}{\log(|\mathbf{X}|)}
$$
This completes the derivation.
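The closed-form gradient can be checked against central finite differences. The following is a minimal NumPy sketch, not a definitive implementation: the helper names `f`, `grad_f`, and `numerical_grad`, the test matrix, and the step size `eps` are all assumptions introduced here. The test matrix is deliberately non-symmetric so that a missing transpose would show up as a mismatch, and its determinant exceeds 1 so that $f$ is defined.

```python
import numpy as np

def f(X):
    # f(X) = log(log(det(X))); only defined when det(X) > 1.
    return np.log(np.log(np.linalg.det(X)))

def grad_f(X):
    # Closed form from the derivation: (log|X|)^{-1} * X^{-T}.
    return np.linalg.inv(X).T / np.log(np.linalg.det(X))

def numerical_grad(func, X, eps=1e-6):
    # Central finite differences, one matrix entry at a time.
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            G[i, j] = (func(X + E) - func(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
# Hypothetical non-symmetric test matrix with determinant well above 1.
X = 3.0 * np.eye(4) + 0.1 * rng.standard_normal((4, 4))

max_abs_err = np.max(np.abs(grad_f(X) - numerical_grad(f, X)))
print(max_abs_err)
```

A small `max_abs_err` indicates that the analytic expression and the numerical estimate agree entry by entry for this particular matrix; it is a spot check, not a proof.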
