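Linear Algebra: Matrix Derivatives (2), Basic Rules and Formulas
- Preface
- Conventions
- Derivative of a scalar with respect to a vector
  - Basic rules
  - Formulas
- Derivative of a scalar with respect to a matrix
  - Basic rules
  - Formulas
- Afterword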
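Preface
The previous post, Matrix Derivatives (1), settled the layout question and covered the most basic way of taking matrix derivatives. This post moves on to the core of the topic: the basic differentiation rules and the basic formulas.
Conventions
This post covers only derivatives of a scalar with respect to a vector or a matrix. Vectors are column vectors by default.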
Derivative of a scalar with respect to a vector
Basic rules
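Derivative of a constant:
$$
\frac{\partial c_0}{\partial x} = 0_{n\times 1}
$$
Here $x$ is an $n\times 1$ vector and $0_{n\times 1}$ is the $n\times 1$ zero vector. This rule is immediate, so no proof is given.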
Linearity (linear combination):
$$
\frac{\partial (c_1 f(x) + c_2 g(x))}{\partial x} = c_1\frac{\partial f}{\partial x} + c_2\frac{\partial g}{\partial x}
$$
Proof:
$$
\begin{aligned}
\frac{\partial (c_1 f(x) + c_2 g(x))}{\partial x}
&= \begin{bmatrix}
\frac{\partial (c_1 f(x) + c_2 g(x))}{\partial x_1} \\
\frac{\partial (c_1 f(x) + c_2 g(x))}{\partial x_2} \\
\vdots \\
\frac{\partial (c_1 f(x) + c_2 g(x))}{\partial x_n}
\end{bmatrix} \\
&= \begin{bmatrix}
c_1\frac{\partial f(x)}{\partial x_1} \\
c_1\frac{\partial f(x)}{\partial x_2} \\
\vdots \\
c_1\frac{\partial f(x)}{\partial x_n}
\end{bmatrix}
+ \begin{bmatrix}
c_2\frac{\partial g(x)}{\partial x_1} \\
c_2\frac{\partial g(x)}{\partial x_2} \\
\vdots \\
c_2\frac{\partial g(x)}{\partial x_n}
\end{bmatrix} \\
&= c_1\frac{\partial f}{\partial x} + c_2\frac{\partial g}{\partial x}
\end{aligned}
$$
Addition and subtraction are not treated separately. They are the special cases $c_1 = 1$, $c_2 = \pm 1$, they behave exactly as for ordinary functions, and they are just as easy to prove.
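The linearity rule can be sanity-checked numerically. The sketch below is only an illustration: the functions `f` and `g`, the constants, and the test point are hypothetical choices, and `numerical_gradient` is a helper written here (central differences in plain Python), not something from the original post.

```python
import math

def numerical_gradient(func, x, h=1e-6):
    """Central-difference estimate of d func / d x, returned as a list."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((func(x_plus) - func(x_minus)) / (2 * h))
    return grad

# Hypothetical example functions, chosen only for illustration.
def f(x):
    return x[0] ** 2 + x[1] * x[2]

def g(x):
    return math.sin(x[0]) + x[2] ** 2

c1, c2 = 2.0, -3.0
x0 = [0.5, -1.0, 2.0]

# Left-hand side: gradient of the combination c1*f + c2*g.
lhs = numerical_gradient(lambda x: c1 * f(x) + c2 * g(x), x0)

# Right-hand side: the same combination of the individual gradients.
grad_f = numerical_gradient(f, x0)
grad_g = numerical_gradient(g, x0)
rhs = [c1 * df + c2 * dg for df, dg in zip(grad_f, grad_g)]

max_error = max(abs(l - r) for l, r in zip(lhs, rhs))
print("max |lhs - rhs| =", max_error)
```

The two sides should agree up to floating-point rounding. A finite-difference check like this does not replace the proof, but it is a cheap way to catch sign or index mistakes.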
Product:
$$
\frac{\partial (f(x)g(x))}{\partial x} = \frac{\partial f(x)}{\partial x}g(x) + f(x)\frac{\partial g(x)}{\partial x}
$$
Proof:
$$
\begin{aligned}
\frac{\partial (f(x)g(x))}{\partial x}
&= \begin{bmatrix}
\frac{\partial (fg)}{\partial x_1} \\
\frac{\partial (fg)}{\partial x_2} \\
\vdots \\
\frac{\partial (fg)}{\partial x_n}
\end{bmatrix}
= \begin{bmatrix}
\frac{\partial f}{\partial x_1}g + f\frac{\partial g}{\partial x_1} \\
\frac{\partial f}{\partial x_2}g + f\frac{\partial g}{\partial x_2} \\
\vdots \\
\frac{\partial f}{\partial x_n}g + f\frac{\partial g}{\partial x_n}
\end{bmatrix} \\
&= \begin{bmatrix}
\frac{\partial f}{\partial x_1} \\
\frac{\partial f}{\partial x_2} \\
\vdots \\
\frac{\partial f}{\partial x_n}
\end{bmatrix} g
+ f \begin{bmatrix}
\frac{\partial g}{\partial x_1} \\
\frac{\partial g}{\partial x_2} \\
\vdots \\
\frac{\partial g}{\partial x_n}
\end{bmatrix} \\
&= \frac{\partial f(x)}{\partial x}g(x) + f(x)\frac{\partial g(x)}{\partial x}
\end{aligned}
$$
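As a quick numerical illustration of the product rule, the sketch below compares a finite-difference gradient of $f(x)g(x)$ with the right-hand side of the rule. The functions `f` and `g`, their hand-written gradients, and the test point are hypothetical examples picked for this check. `numerical_gradient` is a small helper defined here, not part of the original post.

```python
def numerical_gradient(func, x, h=1e-6):
    """Central-difference estimate of d func / d x, returned as a list."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((func(x_plus) - func(x_minus)) / (2 * h))
    return grad

# Hypothetical example: f(x) = x1 * x2, g(x) = x1 + x3^2.
def f(x):
    return x[0] * x[1]

def grad_f(x):
    return [x[1], x[0], 0.0]

def g(x):
    return x[0] + x[2] ** 2

def grad_g(x):
    return [1.0, 0.0, 2.0 * x[2]]

x0 = [1.5, -2.0, 0.5]

# Right-hand side of the product rule: (df/dx) * g + f * (dg/dx).
product_rule = [df * g(x0) + f(x0) * dg
                for df, dg in zip(grad_f(x0), grad_g(x0))]

# Left-hand side, estimated numerically from f(x) * g(x) directly.
numeric = numerical_gradient(lambda x: f(x) * g(x), x0)

max_error = max(abs(p - q) for p, q in zip(product_rule, numeric))
print(product_rule, max_error)
```

Note that $g(x)$ and $f(x)$ are scalars here, so the order of the factors in each term does not matter.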
Quotient (for $g(x) \neq 0$):
$$
\frac{\partial \frac{f(x)}{g(x)}}{\partial x} = \frac{\frac{\partial f(x)}{\partial x}g(x) - f(x)\frac{\partial g(x)}{\partial x}}{g(x)^2}
$$
The proof follows the same steps as for the product. The only difference is that $\partial (f/g)/\partial x$ takes a different form from $\partial (fg)/\partial x$, so it is omitted here.
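For readers who want the omitted step, here is a sketch of the argument, assuming $g(x) \neq 0$ and applying the ordinary single-variable quotient rule to each coordinate $x_i$:

```latex
\begin{aligned}
\frac{\partial}{\partial x}\frac{f(x)}{g(x)}
&= \begin{bmatrix}
\frac{\partial (f/g)}{\partial x_1} \\
\vdots \\
\frac{\partial (f/g)}{\partial x_n}
\end{bmatrix}
= \begin{bmatrix}
\frac{\frac{\partial f}{\partial x_1} g - f \frac{\partial g}{\partial x_1}}{g^2} \\
\vdots \\
\frac{\frac{\partial f}{\partial x_n} g - f \frac{\partial g}{\partial x_n}}{g^2}
\end{bmatrix} \\
&= \frac{1}{g^2}\left(
\begin{bmatrix}
\frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n}
\end{bmatrix} g
- f \begin{bmatrix}
\frac{\partial g}{\partial x_1} \\ \vdots \\ \frac{\partial g}{\partial x_n}
\end{bmatrix}
\right)
= \frac{\frac{\partial f(x)}{\partial x} g(x) - f(x) \frac{\partial g(x)}{\partial x}}{g(x)^2}
\end{aligned}
```

The scalar $1/g^2$ is common to every row, which is why it can be pulled out in front of the stacked partial derivatives.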
Formulas
Formula 1
$$
\frac{\partial a^T x}{\partial x} = \frac{\partial x^T a}{\partial x} = a
$$
Here $a$ is a constant $n\times 1$ vector that does not depend on $x$.
Proof:
$$
\begin{aligned}
\frac{\partial a^T x}{\partial x}
&= \frac{\partial (a_1x_1 + a_2x_2 + \dots + a_nx_n)}{\partial x}
= \frac{\partial x^T a}{\partial x} \\
&= \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} = a
\end{aligned}
$$
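Formula 1 is easy to check numerically. In this minimal sketch the vector `a` and the point `x0` are arbitrary illustrative values, and `numerical_gradient` is a helper defined here for the check.

```python
def numerical_gradient(func, x, h=1e-6):
    """Central-difference estimate of d func / d x, returned as a list."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((func(x_plus) - func(x_minus)) / (2 * h))
    return grad

a = [2.0, -1.0, 3.0]     # constant vector (illustrative values)
x0 = [0.3, 1.2, -0.7]    # point at which the gradient is evaluated

# a^T x and x^T a are the same scalar, written in two orders.
grad_ax = numerical_gradient(lambda x: sum(ai * xi for ai, xi in zip(a, x)), x0)
grad_xa = numerical_gradient(lambda x: sum(xi * ai for xi, ai in zip(x, a)), x0)

error_ax = max(abs(gi - ai) for gi, ai in zip(grad_ax, a))
error_xa = max(abs(gi - ai) for gi, ai in zip(grad_xa, a))
print(error_ax, error_xa)
```

Because $a^T x$ is linear in $x$, the gradient is the same at every point, so changing `x0` should not change the result.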
Formula 2
With the scalar function $f(x) = x^T x$:
$$
\frac{\partial (x^T x)}{\partial x} = 2x, \qquad
\frac{\partial (x^T x)}{\partial x^T} = 2x^T
$$
Proof:
$$
\begin{aligned}
\frac{\partial (x^T x)}{\partial x}
&= \frac{\partial (x_1^2 + x_2^2 + \dots + x_n^2)}{\partial x} \\
&= \begin{bmatrix} 2x_1 \\ 2x_2 \\ \vdots \\ 2x_n \end{bmatrix} = 2x
\end{aligned}
$$
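The same kind of finite-difference check works for Formula 2. The point `x0` below is an arbitrary illustrative value and `numerical_gradient` is a helper written for this sketch.

```python
def numerical_gradient(func, x, h=1e-6):
    """Central-difference estimate of d func / d x, returned as a list."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((func(x_plus) - func(x_minus)) / (2 * h))
    return grad

def squared_norm(x):
    """Scalar x^T x."""
    return sum(xi * xi for xi in x)

x0 = [1.0, -2.0, 3.0]

analytic = [2.0 * xi for xi in x0]            # Formula 2: 2x
numeric = numerical_gradient(squared_norm, x0)

max_error = max(abs(p - q) for p, q in zip(analytic, numeric))
print(analytic, max_error)
```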
Formula 3
For a constant $n\times n$ matrix $A$:
$$
\frac{\partial (x^T A x)}{\partial x} = Ax + A^T x
$$
Proof: write $x^T A x = (A^T x)^T x$ and expand.
$$
\begin{aligned}
\frac{\partial (x^T A x)}{\partial x}
&= \frac{\partial}{\partial x}\left(
\begin{bmatrix}
a_{11}x_1 + a_{21}x_2 + \dots + a_{n1}x_n \\
a_{12}x_1 + a_{22}x_2 + \dots + a_{n2}x_n \\
\vdots \\
a_{1n}x_1 + a_{2n}x_2 + \dots + a_{nn}x_n
\end{bmatrix}^T x \right) \\
&= \frac{\partial}{\partial x}\big(
a_{11}x_1^2 + a_{21}x_2x_1 + \dots + a_{n1}x_nx_1 \\
&\qquad + a_{12}x_1x_2 + a_{22}x_2^2 + \dots + a_{n2}x_nx_2 \\
&\qquad + \dots \\
&\qquad + a_{1n}x_1x_n + a_{2n}x_2x_n + \dots + a_{nn}x_n^2 \big) \\
&= \begin{bmatrix}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n + a_{11}x_1 + a_{21}x_2 + \dots + a_{n1}x_n \\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n + a_{12}x_1 + a_{22}x_2 + \dots + a_{n2}x_n \\
\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \dots + a_{nn}x_n + a_{1n}x_1 + a_{2n}x_2 + \dots + a_{nn}x_n
\end{bmatrix} \\
&= \begin{bmatrix}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n \\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n \\
\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \dots + a_{nn}x_n
\end{bmatrix}
+ \begin{bmatrix}
a_{11}x_1 + a_{21}x_2 + \dots + a_{n1}x_n \\
a_{12}x_1 + a_{22}x_2 + \dots + a_{n2}x_n \\
\vdots \\
a_{1n}x_1 + a_{2n}x_2 + \dots + a_{nn}x_n
\end{bmatrix} \\
&= Ax + A^T x
\end{aligned}
$$
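Because the expansion above is index-heavy, a numerical check is a useful safeguard. The sketch below uses a deliberately non-symmetric matrix so that $Ax$ and $A^T x$ differ. The matrix `A`, the point `x0`, and the helper functions are all illustrative assumptions, written in plain Python.

```python
def numerical_gradient(func, x, h=1e-6):
    """Central-difference estimate of d func / d x, returned as a list."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((func(x_plus) - func(x_minus)) / (2 * h))
    return grad

def mat_vec(M, v):
    """Matrix-vector product M v for a list-of-rows matrix."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def quadratic_form(M, x):
    """Scalar x^T M x."""
    return sum(xi * yi for xi, yi in zip(x, mat_vec(M, x)))

# Non-symmetric example matrix (illustrative values).
A = [[1.0, 2.0, 0.0],
     [-1.0, 3.0, 4.0],
     [5.0, 0.0, 2.0]]
x0 = [1.0, -2.0, 0.5]

# Formula 3: A x + A^T x.
analytic = [p + q for p, q in zip(mat_vec(A, x0), mat_vec(transpose(A), x0))]
numeric = numerical_gradient(lambda x: quadratic_form(A, x), x0)

max_error = max(abs(p - q) for p, q in zip(analytic, numeric))
print(analytic, max_error)
```

When $A$ is symmetric, $A^T = A$ and the formula reduces to $2Ax$. Formula 2 is the special case $A = I$.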
Formula 4
$$
\frac{\partial (a^T x x^T b)}{\partial x} = ab^T x + ba^T x
$$
Proof: since $a^T x$ and $x^T b$ are scalars, $a^T x = x^T a$ and $x^T b = b^T x$. Applying Formula 3 with $A = ab^T$ gives
$$
\begin{aligned}
\frac{\partial (a^T x x^T b)}{\partial x}
&= \frac{\partial (x^T a b^T x)}{\partial x} \\
&= ab^T x + (ab^T)^T x = ab^T x + ba^T x
\end{aligned}
$$
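Formula 4 can be checked the same way. Since $b^T x$ and $a^T x$ are scalars, the right-hand side is just $a\,(b^T x) + b\,(a^T x)$, which is how the sketch below computes it. The vectors and the test point are illustrative values, and `numerical_gradient` and `dot` are helpers defined here.

```python
def numerical_gradient(func, x, h=1e-6):
    """Central-difference estimate of d func / d x, returned as a list."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((func(x_plus) - func(x_minus)) / (2 * h))
    return grad

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

a = [1.0, 2.0, -1.0]
b = [0.5, -1.0, 3.0]
x0 = [2.0, 1.0, -1.0]

# Scalar function a^T x x^T b = (a . x) * (x . b).
def scalar_func(x):
    return dot(a, x) * dot(x, b)

# Formula 4: a b^T x + b a^T x = a * (b . x) + b * (a . x).
ax, bx = dot(a, x0), dot(b, x0)
analytic = [ai * bx + bi * ax for ai, bi in zip(a, b)]
numeric = numerical_gradient(scalar_func, x0)

max_error = max(abs(p - q) for p, q in zip(analytic, numeric))
print(analytic, max_error)
```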
Derivative of a scalar with respect to a matrix
Basic rules
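Derivative of a constant:
$$
\frac{\partial c_0}{\partial X} = 0_{m\times n}
$$
Here $X$ is an $m\times n$ matrix and $0_{m\times n}$ is the $m\times n$ zero matrix. This rule is immediate, so no proof is given.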
Linearity (linear combination):
$$
\frac{\partial (c_1 f(X) + c_2 g(X))}{\partial X} = c_1\frac{\partial f(X)}{\partial X} + c_2\frac{\partial g(X)}{\partial X}
$$
The proof is the same as for the linearity rule in the scalar-by-vector case.
Product:
$$
\frac{\partial (f(X)g(X))}{\partial X} = \frac{\partial f(X)}{\partial X}g(X) + f(X)\frac{\partial g(X)}{\partial X}
$$
The proof is the same as for the product rule in the scalar-by-vector case.
Quotient (for $g(X) \neq 0$):
$$
\frac{\partial \frac{f(X)}{g(X)}}{\partial X} = \frac{\frac{\partial f(X)}{\partial X}g(X) - f(X)\frac{\partial g(X)}{\partial X}}{g(X)^2}
$$
The proof is the same as for the quotient rule in the scalar-by-vector case.
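Since these matrix rules are stated without proof, a numerical spot check may be reassuring. The sketch below tests the product rule for a $2\times 3$ matrix. The functions `f` and `g` and the matrix `X0` are hypothetical examples, and `numerical_matrix_gradient` is a helper defined here that perturbs one entry of $X$ at a time.

```python
def numerical_matrix_gradient(func, X, h=1e-6):
    """Central-difference estimate of d func / d X, same shape as X."""
    grad = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            grad[i][j] = (func(X_plus) - func(X_minus)) / (2 * h)
    return grad

# Hypothetical scalar functions of a matrix.
def f(X):
    return sum(v * v for row in X for v in row)   # sum of squared entries

def g(X):
    return X[0][0] * X[1][2] + X[0][1]

X0 = [[1.0, -2.0, 0.5],
      [3.0, 0.0, -1.0]]

# Left-hand side: gradient of the product f * g.
lhs = numerical_matrix_gradient(lambda X: f(X) * g(X), X0)

# Right-hand side: (df/dX) * g + f * (dg/dX), entry by entry.
grad_f = numerical_matrix_gradient(f, X0)
grad_g = numerical_matrix_gradient(g, X0)
f0, g0 = f(X0), g(X0)
rhs = [[df * g0 + f0 * dg for df, dg in zip(row_f, row_g)]
       for row_f, row_g in zip(grad_f, grad_g)]

max_error = max(abs(l - r)
                for row_l, row_r in zip(lhs, rhs)
                for l, r in zip(row_l, row_r))
print(max_error)
```

The result has the same shape as $X$, which matches the convention used by the formulas in this section.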
Formulas
Formula 1
$$
\frac{\partial a^T X b}{\partial X} = ab^T
$$
Proof: writing out the case where $X$ is $n\times n$ (the general $m\times n$ case is identical),
$$
\begin{aligned}
a^T X b &= a_1b_1x_{11} + a_2b_1x_{21} + \dots + a_nb_1x_{n1} \\
&\quad + a_1b_2x_{12} + a_2b_2x_{22} + \dots + a_nb_2x_{n2} \\
&\quad + \dots \\
&\quad + a_1b_nx_{1n} + a_2b_nx_{2n} + \dots + a_nb_nx_{nn}
\end{aligned}
$$
The coefficient of $x_{ij}$ is $a_ib_j$, so
$$
\frac{\partial a^T X b}{\partial X} =
\begin{bmatrix}
a_1b_1 & a_1b_2 & \dots & a_1b_n \\
a_2b_1 & a_2b_2 & \dots & a_2b_n \\
\vdots & \vdots & \ddots & \vdots \\
a_nb_1 & a_nb_2 & \dots & a_nb_n
\end{bmatrix}
= ab^T
$$
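The formula can also be checked on a non-square example. This sketch assumes $X$ is $2\times 3$, so `a` has 2 entries and `b` has 3. All values are illustrative, and `numerical_matrix_gradient` is a helper defined for the check.

```python
def numerical_matrix_gradient(func, X, h=1e-6):
    """Central-difference estimate of d func / d X, same shape as X."""
    grad = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            grad[i][j] = (func(X_plus) - func(X_minus)) / (2 * h)
    return grad

a = [1.0, -2.0]          # length m = 2
b = [0.5, 3.0, -1.0]     # length n = 3
X0 = [[0.2, 1.0, -0.5],
      [2.0, 0.0, 4.0]]   # m x n = 2 x 3

def bilinear(X):
    """Scalar a^T X b = sum_i sum_j a_i * X_ij * b_j."""
    return sum(a[i] * X[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))

# Formula 1: the outer product a b^T, an m x n matrix.
analytic = [[ai * bj for bj in b] for ai in a]
numeric = numerical_matrix_gradient(bilinear, X0)

max_error = max(abs(p - q)
                for row_p, row_q in zip(analytic, numeric)
                for p, q in zip(row_p, row_q))
print(analytic, max_error)
```

Since $a^T X b$ is linear in $X$, the gradient does not depend on the values in `X0`.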
Formula 2
$$
\frac{\partial a^T X^T b}{\partial X} = ba^T
$$
Proof: again writing out the case where $X$ is $n\times n$,
$$
\begin{aligned}
a^T X^T b &= a_1b_1x_{11} + a_2b_1x_{12} + \dots + a_nb_1x_{1n} \\
&\quad + a_1b_2x_{21} + a_2b_2x_{22} + \dots + a_nb_2x_{2n} \\
&\quad + \dots \\
&\quad + a_1b_nx_{n1} + a_2b_nx_{n2} + \dots + a_nb_nx_{nn}
\end{aligned}
$$
The coefficient of $x_{ij}$ is $b_ia_j$, so
$$
\frac{\partial a^T X^T b}{\partial X} =
\begin{bmatrix}
a_1b_1 & a_2b_1 & \dots & a_nb_1 \\
a_1b_2 & a_2b_2 & \dots & a_nb_2 \\
\vdots & \vdots & \ddots & \vdots \\
a_1b_n & a_2b_n & \dots & a_nb_n
\end{bmatrix}
= ba^T
$$
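A numerical check makes the transposed roles of $a$ and $b$ concrete. In this sketch $X$ is assumed to be $2\times 3$, so $X^T$ is $3\times 2$, `a` has 3 entries and `b` has 2. The values and the helper `numerical_matrix_gradient` are illustrative choices.

```python
def numerical_matrix_gradient(func, X, h=1e-6):
    """Central-difference estimate of d func / d X, same shape as X."""
    grad = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            grad[i][j] = (func(X_plus) - func(X_minus)) / (2 * h)
    return grad

a = [0.5, 3.0, -1.0]     # length n = 3 (multiplies X^T from the left)
b = [1.0, -2.0]          # length m = 2
X0 = [[0.2, 1.0, -0.5],
      [2.0, 0.0, 4.0]]   # m x n = 2 x 3

def bilinear_transposed(X):
    """Scalar a^T X^T b = sum_i sum_j a_i * X_ji * b_j."""
    return sum(a[i] * X[j][i] * b[j]
               for i in range(len(a)) for j in range(len(b)))

# Formula 2: the outer product b a^T, which has the shape of X (m x n).
analytic = [[bj * ai for ai in a] for bj in b]
numeric = numerical_matrix_gradient(bilinear_transposed, X0)

max_error = max(abs(p - q)
                for row_p, row_q in zip(analytic, numeric)
                for p, q in zip(row_p, row_q))
print(analytic, max_error)
```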
Formula 3
$$
\frac{\partial a^T X X^T b}{\partial X} = ab^T X + ba^T X
$$
The proof is similar to that of Formula 3 in the scalar-by-vector case, but expanding $a^T X X^T b$ entry by entry is very tedious, so it is omitted here.
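One way to avoid the full expansion is index notation. The sketch below is not the author's proof. It writes $x_{ij}$ for the entries of $X$, takes the partial derivative with respect to a single entry $x_{pq}$, and uses the scalar product rule on $x_{ik}x_{jk}$.

```latex
\begin{aligned}
a^T X X^T b &= \sum_{i}\sum_{j}\sum_{k} a_i \, x_{ik} \, x_{jk} \, b_j \\
\frac{\partial (a^T X X^T b)}{\partial x_{pq}}
&= \sum_{j} a_p \, x_{jq} \, b_j + \sum_{i} a_i \, x_{iq} \, b_p \\
&= a_p \, (b^T X)_q + b_p \, (a^T X)_q \\
&= (a b^T X)_{pq} + (b a^T X)_{pq}
\end{aligned}
```

The first sum collects the terms where $x_{pq}$ appears as $x_{ik}$, and the second those where it appears as $x_{jk}$. Collecting the entries over all $p, q$ gives $ab^TX + ba^TX$.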
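Afterword
This post was tedious to write, and typesetting the proofs in KaTeX was sheer torture. The next post will cover the properties of the matrix trace.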
