- Introduction
- About the Learning Problem
- Scenario
- Notation for the Concepts
- Notation for the Model Parameters
- Special Properties of the HMM
- Solving for the Model Parameters $\lambda$
- Solving for the Model Parameters with the EM Algorithm
- The E-Step
- The M-Step
The previous two sections introduced the forward algorithm and the backward algorithm for solving the evaluation problem. This section introduces the learning problem and solves it with the EM algorithm.
About the Learning Problem

The learning problem is nothing new: in essence, given a data set $\mathcal X$, we solve for the model's parameters $\theta$ from $\mathcal X$. Among the models introduced previously:
- Maximum likelihood estimation (MLE), as used in linear regression and in linear classification algorithms such as linear discriminant analysis (LDA) and logistic regression: $\hat {\theta}_{MLE} = \mathop{\arg\max}\limits_{\theta} P(\mathcal X \mid \theta)$
- The narrow-sense EM algorithm, so far introduced only for the Gaussian mixture model (GMM): $\theta^{(t+1)} = \mathop{\arg\max}\limits_{\theta} \int_{\mathcal Z} \log P(\mathcal X,\mathcal Z \mid \theta)\cdot P(\mathcal Z \mid \mathcal X,\theta^{(t)}) \, d\mathcal Z$
The hidden Markov model discussed in this section can likewise be solved with the narrow-sense EM algorithm.
Scenario

Notation for the Concepts

For the hidden Markov model, we first introduce the scenario and the related notation:
- The variables of a hidden Markov model consist of a state sequence $\mathcal I$ and an observation sequence $\mathcal O$. Assuming the observation sequence has length $T$, the two sequences are written as
$$\mathcal I = \{i_1,i_2,\cdots,i_T\} \qquad \mathcal O = \{o_1,o_2,\cdots,o_T\}$$
- Every state variable $i_t \in \mathcal I$ of the state sequence takes a discrete value: $\forall i_t \in \mathcal I, \ i_t \in \mathcal Q = \{q_1,q_2,\cdots,q_{\mathcal K}\}$. We likewise define every observation variable $o_t \in \mathcal O$ to take a discrete value: $\forall o_t \in \mathcal O, \ o_t \in \mathcal V = \{v_1,v_2,\cdots,v_{\mathcal M}\}$. We call $\mathcal Q$ the set of state values and $\mathcal V$ the set of observation values.
Notation for the Model Parameters

The parameters of the hidden Markov model are denoted $\lambda$ and consist of three parts:
- The initial probability distribution $\pi$: the vector whose entries are the probabilities that the state variable $i_1$ takes each state value $q_k \ (k \in \{1,2,\cdots,\mathcal K\})$:
$$\pi = \begin{bmatrix}P(i_1 = q_1) \\ P(i_1 = q_2) \\ \vdots \\ P(i_1 = q_{\mathcal K})\end{bmatrix}_{\mathcal K \times 1} \quad \sum_{k=1}^{\mathcal K} P(i_1 = q_{k}) = 1$$
- The state transition matrix $\mathcal A$: each element $a_{ij}$ of $\mathcal A$ is the conditional probability that the state variable at the next time is $i_{t+1} = q_j$, given that the state variable at the current time is $i_t = q_i$:
$$\mathcal A = [a_{ij}]_{\mathcal K \times \mathcal K},\quad a_{ij} = P(i_{t+1}=q_j \mid i_t = q_i)$$
- The emission matrix $\mathcal B$: each element $b_{j}(k)$ is the conditional probability that the observation variable at time $t$ is $o_t = v_k$, given that the state variable at the same time is $i_t = q_j$:
$$\mathcal B = [b_j(k)]_{\mathcal K \times \mathcal M}, \quad b_j(k) = P(o_t = v_k \mid i_t = q_j)$$
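To make the three parameter blocks concrete, here is a minimal numpy sketch, assuming a toy model with $\mathcal K = 3$ state values and $\mathcal M = 2$ observation values (all numbers are invented for illustration):

```python
import numpy as np

# Toy model (made-up numbers): K = 3 hidden states, M = 2 observation values.
K, M = 3, 2

# Initial distribution pi: length-K vector, entries sum to 1.
pi = np.array([0.5, 0.3, 0.2])

# Transition matrix A: A[i, j] = P(i_{t+1} = q_j | i_t = q_i); each row sums to 1.
A = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])

# Emission matrix B: B[j, k] = P(o_t = v_k | i_t = q_j); each row sums to 1.
B = np.array([[0.7, 0.3],
              [0.4, 0.6],
              [0.1, 0.9]])

# Sanity checks on the probability constraints.
assert np.isclose(pi.sum(), 1.0)
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
```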
Special Properties of the HMM

- The homogeneous Markov assumption: the conditional probability of the state variable $i_t$ at any time depends only on the state variable $i_{t-1}$ at the previous time, and on no other variable:
$$P(i_t \mid i_{t-1},\cdots,i_{1},o_{t-1},\cdots,o_1) = P(i_t \mid i_{t-1})$$
- The observation independence assumption: the conditional probability of the observation variable $o_t$ at any time depends only on the state variable $i_t$ at the same time, and on no other variable:
$$P(o_t \mid i_t,i_{t-1},\cdots,i_1,o_{t-1},\cdots,o_1) = P(o_t \mid i_t)$$
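Taken together, the two assumptions define the HMM's generative process: draw $i_1$ from $\pi$, then alternately emit $o_t$ from row $b_{i_t}(\cdot)$ of $\mathcal B$ and transition via row $a_{i_t,\cdot}$ of $\mathcal A$. A minimal sampling sketch (the helper `sample_hmm` is ours, introduced only for illustration):

```python
import numpy as np

# Toy parameters from the previous sketch.
pi = np.array([0.5, 0.3, 0.2])
A  = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.4, 0.5]])
B  = np.array([[0.7, 0.3], [0.4, 0.6], [0.1, 0.9]])

rng = np.random.default_rng(0)

def sample_hmm(pi, A, B, T):
    """Hypothetical helper: sample (I, O) of length T from the generative process.

    Homogeneous Markov assumption: i_{t+1} is drawn from row A[i_t] only.
    Observation independence assumption: o_t is drawn from row B[i_t] only.
    """
    I, O = [], []
    state = rng.choice(len(pi), p=pi)                 # i_1 ~ pi
    for _ in range(T):
        I.append(state)
        O.append(rng.choice(B.shape[1], p=B[state])) # o_t ~ B[i_t]
        state = rng.choice(A.shape[1], p=A[state])   # i_{t+1} ~ A[i_t]
    return I, O

I_seq, O_seq = sample_hmm(pi, A, B, T=5)
```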
Solving for the Model Parameters $\lambda$

As introduced with the forward algorithm, solving for $P(\mathcal O \mid \lambda)$ directly yields the expression
$$\begin{aligned} P(\mathcal O \mid \lambda) & = \sum_{\mathcal I}P(\mathcal O,\mathcal I \mid \lambda) \\ & = \sum_{i_1}\cdots\sum_{i_T} \left[\pi_{i_1} \cdot \prod_{t=2}^{T} a_{i_{t-1},i_{t}} \cdot \prod_{t=1}^{T} b_{i_t}(o_t)\right] \end{aligned}$$
Applying maximum likelihood estimation to $P(\mathcal O \mid \lambda)$ directly, i.e.
$$\hat {\lambda}_{MLE} = \mathop{\arg\max}\limits_{\lambda} P(\mathcal O \mid \lambda)$$
runs into several problems:

- First, the observation variables $o_1, \cdots,o_T$ of the observation sequence $\mathcal O$ are not independent and identically distributed, so we cannot solve $\mathop{\arg\max}\limits_{\lambda}\prod_{i=1}^{T} P(o_i \mid \lambda)$ instead;
- Moreover, computing $P(\mathcal O \mid \lambda)$ by direct enumeration has time complexity $O(T \cdot \mathcal K^{T})$ (a brute-force sketch illustrating this appears after this list), so the direct approach is extremely expensive. This section therefore solves for the model parameters $\lambda$ iteratively with the EM algorithm.
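The summation above can be written out as a brute-force enumeration over all $\mathcal K^T$ state sequences, which makes the exponential cost visible. A sketch, reusing the toy parameters and a made-up observation sequence `O` of indices into $\mathcal V$ (the helper `evaluate_brute_force` is ours):

```python
import itertools
import numpy as np

pi = np.array([0.5, 0.3, 0.2])
A  = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.4, 0.5]])
B  = np.array([[0.7, 0.3], [0.4, 0.6], [0.1, 0.9]])

def evaluate_brute_force(O, pi, A, B):
    """P(O | lambda) = sum over all K^T state sequences I of P(O, I | lambda)."""
    K, T = len(pi), len(O)
    total = 0.0
    for I in itertools.product(range(K), repeat=T):   # K^T sequences
        p = pi[I[0]] * B[I[0], O[0]]                  # pi_{i_1} * b_{i_1}(o_1)
        for t in range(1, T):
            p *= A[I[t - 1], I[t]] * B[I[t], O[t]]    # a_{i_{t-1}, i_t} * b_{i_t}(o_t)
        total += p
    return total

O = [0, 1, 1, 0]   # made-up observation indices; already 3^4 = 81 terms for T = 4
print(evaluate_brute_force(O, pi, A, B))
```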
Solving for the Model Parameters with the EM Algorithm

The EM algorithm is written as
$$\theta^{(t+1)} = \mathop{\arg\max}\limits_{\theta} \int_{\mathcal Z} \log P(\mathcal X,\mathcal Z \mid \theta)\cdot P(\mathcal Z \mid \mathcal X,\theta^{(t)}) \, d\mathcal Z$$
Mapping the variables appearing in the EM algorithm onto the concepts of the hidden Markov model:

- $\mathcal X$: the observed data $\to$ the observation sequence $\mathcal O$;
- $\mathcal Z$: the latent variables $\to$ the state sequence $\mathcal I$;
- $\theta$: the parameters $\to$ the model parameters $\lambda$.
After this mapping, the EM algorithm for the hidden Markov model reads as follows. Since the state sequence $\mathcal I$ is discrete, the integral becomes a sum $\sum$:
$$\lambda^{(t+1)} = \mathop{\arg\max}\limits_{\lambda} \sum_{\mathcal I} \left[\log P(\mathcal O,\mathcal I \mid \lambda) \cdot P(\mathcal I \mid \mathcal O, \lambda^{(t)})\right]$$
To simplify the computation, transform this formula using the definition of conditional probability:
$$\lambda^{(t+1)} = \mathop{\arg\max}\limits_{\lambda} \sum_{\mathcal I} \left[\log P(\mathcal O,\mathcal I \mid \lambda) \cdot \frac{P(\mathcal I,\mathcal O \mid \lambda^{(t)})}{P(\mathcal O \mid \lambda^{(t)})}\right]$$

Look at the denominator $P(\mathcal O \mid \lambda^{(t)})$ of the second factor:

- $\lambda^{(t)}$ is the parameter result of the previous iteration step and is therefore known;
- the observation sequence $\mathcal O$ is simply the sample data and does not depend on the value of $\lambda$.

The denominator is thus a constant with respect to $\lambda$ and can be dropped from the $\arg\max$. The EM update then simplifies to the following form (note that after this simplification the objective is no longer an expectation):
$$\lambda^{(t+1)} = \mathop{\arg\max}\limits_{\lambda} \sum_{\mathcal I} \left[\log P(\mathcal I,\mathcal O\mid \lambda) \cdot P(\mathcal I,\mathcal O \mid \lambda^{(t)})\right]$$

Moreover, $\lambda^{(t)}$, and likewise the updated $\lambda^{(t+1)}$, is not a single parameter but consists of three parameters:
$$\lambda^{(t)} = (\pi^{(t)}, \mathcal A^{(t)},\mathcal B^{(t)}); \quad \lambda^{(t+1)} = (\pi^{(t+1)},\mathcal A^{(t+1)},\mathcal B^{(t+1)})$$

Write the objective part of the EM update as a function of $\lambda$ and $\lambda^{(t)}$:
$$\mathcal Q(\lambda,\lambda^{(t)}) = \sum_{\mathcal I} \left[\log P(\mathcal I,\mathcal O\mid \lambda) \cdot P(\mathcal I,\mathcal O \mid \lambda^{(t)})\right]$$
Substitute the joint probability obtained by direct computation,
$$P(\mathcal O,\mathcal I \mid \lambda) = \pi_{i_1} \cdot \prod_{t=2}^{T} a_{i_{t-1},i_{t}} \cdot \prod_{t=1}^{T} b_{i_t}(o_t)$$
into $\mathcal Q(\lambda,\lambda^{(t)})$:
$$\begin{aligned} \mathcal Q(\lambda,\lambda^{(t)}) & = \sum_{\mathcal I} \left[\log \left(\pi_{i_1} \cdot \prod_{t=2}^{T} a_{i_{t-1},i_{t}} \cdot \prod_{t=1}^{T} b_{i_t}(o_t)\right) \cdot P(\mathcal I,\mathcal O \mid \lambda^{(t)})\right] \\ & = \sum_{\mathcal I}\left[\left( \log \pi_{i_1} + \log \prod_{t=2}^T a_{i_{t-1},i_{t}} + \log \prod_{t=1}^T b_{i_t}(o_t) \right) \cdot P(\mathcal I,\mathcal O \mid \lambda^{(t)})\right] \\ & = \sum_{\mathcal I} \left[\left(\log \pi_{i_1} + \sum_{t=2}^T \log a_{i_{t-1},i_{t}} + \sum_{t=1}^T \log b_{i_t}(o_t) \right) \cdot P(\mathcal I,\mathcal O \mid \lambda^{(t)})\right] \end{aligned}$$

Here we take $\pi^{(t+1)}$ as an example and solve for it. Looking at the parenthesized part of the expression above, only the $\log \pi_{i_1}$ term is related to $\pi$; the remaining two terms do not involve $\pi$ and can be treated as constants. The part of $\mathcal Q(\lambda,\lambda^{(t)})$ relevant to solving for $\pi^{(t+1)}$ is therefore:
$$\begin{aligned} \pi^{(t+1)} & = \mathop{\arg\max}\limits_{\pi} \mathcal Q(\lambda,\lambda^{(t)}) \\ & = \mathop{\arg\max}\limits_{\pi} \sum_{\mathcal I} \left[\log \pi_{i_1} \cdot P(\mathcal O,\mathcal I \mid \lambda^{(t)})\right] \\ & = \mathop{\arg\max}\limits_{\pi} \sum_{i_1} \cdots \sum_{i_T}\left[\log \pi_{i_1} \cdot P(\mathcal O,i_1,\cdots,i_T \mid \lambda^{(t)})\right] \end{aligned}$$

Looking at this expansion: by definition, the initial probability distribution $\pi$ of the state sequence $\mathcal I$ involves only the first state variable $i_1$ and no other variable, so summing out $i_2,\cdots,i_T$ leaves the marginal $P(\mathcal O,i_1 \mid \lambda^{(t)})$ and the expression simplifies further. In addition, $i_1$ takes discrete values, each an element of the state value set $\mathcal Q$, so we replace $\sum_{i_1}$ with $\sum_{k=1}^{\mathcal K}$:
$$\begin{aligned} \pi^{(t+1)} & = \mathop{\arg\max}\limits_{\pi} \sum_{i_1} \left[\log \pi_{i_1} \cdot P(\mathcal O,i_1 \mid \lambda^{(t)})\right] \\ & = \mathop{\arg\max}\limits_{\pi} \sum_{k=1}^{\mathcal K}\left[\log P(i_1 = q_k) \cdot P(\mathcal O,i_1=q_k \mid \lambda^{(t)})\right] \end{aligned}$$

Note that writing $\pi$ as a vector of probabilities carries a constraint:
$$\sum_{k=1}^{\mathcal K} P(i_1 = q_k) = 1$$
At this point, solving for $\pi^{(t+1)}$ has been turned into an optimization problem with a single constraint:
$$\begin{cases} \mathop{\arg\max}\limits_{\pi} \sum_{k=1}^{\mathcal K}\left[\log P(i_1 = q_k) \cdot P(\mathcal O,i_1=q_k \mid \lambda^{(t)})\right] \\ \text{s.t.} \quad \sum_{k=1}^{\mathcal K} P(i_1 = q_k) = 1 \end{cases}$$
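Before the analytic solution, note that this problem has the generic form "maximize $\sum_k w_k \log \pi_k$ subject to $\sum_k \pi_k = 1$", where $w_k$ stands for $P(\mathcal O,i_1=q_k \mid \lambda^{(t)})$. As a sanity check on the derivation that follows, here is a small numerical sketch with scipy (the weights `w` are made up for illustration):

```python
import numpy as np
from scipy.optimize import minimize

# Made-up weights standing in for w_k = P(O, i_1 = q_k | lambda^(t)).
w = np.array([0.2, 0.5, 0.3])

# Maximize sum_k w_k * log(pi_k)  <=>  minimize the negative of it.
objective = lambda p: -np.sum(w * np.log(p))
constraint = {"type": "eq", "fun": lambda p: p.sum() - 1.0}
x0 = np.full(len(w), 1.0 / len(w))
res = minimize(objective, x0, bounds=[(1e-9, 1.0)] * len(w), constraints=[constraint])

print(res.x)        # numerically close to w / w.sum()
print(w / w.sum())  # the closed form derived below
```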
We solve this problem with the method of Lagrange multipliers. Define the Lagrangian $\mathcal L(\pi,\eta)$ as follows (in which order $\sum_{k=1}^{\mathcal K} P(i_1 = q_k)$ and $1$ are subtracted in the constraint term can be chosen to suit the situation):
$$\mathcal L(\pi,\eta) = \sum_{k=1}^{\mathcal K} \left[\log P(i_1 = q_k) \cdot P(\mathcal O,i_1=q_k \mid \lambda^{(t)})\right] + \eta \left(\sum_{k=1}^{\mathcal K} P(i_1 = q_k) - 1\right)$$
Let $\pi_k = P(i_1 = q_k)$, i.e. one component of the vector $\pi$, and take the partial derivative of $\mathcal L(\pi,\eta)$ with respect to $\pi_k$:
$$\frac{\partial \mathcal L}{\partial \pi_k} = \frac{1}{\pi_k} P(\mathcal O,i_1 = q_k \mid \lambda^{(t)}) + \eta$$
Setting $\frac{\partial \mathcal L}{\partial \pi_k} \triangleq 0$ and multiplying through by $\pi_k$ gives:
$$P(\mathcal O,i_1 = q_k\mid \lambda^{(t)}) + \pi_k \cdot\eta = 0$$
Based on this equation, we then have:
$$\sum_{k=1}^{\mathcal K} \left[P(\mathcal O,i_1 = q_k\mid \lambda^{(t)}) + \pi_k \cdot \eta \right] = 0$$
This equation can be explained from two angles:
- From the summation (integration) angle: sum both sides of the previous equation over $i_1$; the sum of the constant $0$ is still $0$.
- From the partial-derivative angle: take the partial derivative with respect to the probability of each possible value of $i_1$ and set each of them to $0$, so that
$$\sum_{k=1}^{\mathcal K} \left[P(\mathcal O,i_1 = q_k \mid \lambda^{(t)}) + \pi_k \cdot \eta \right] = 0+ 0 + \cdots + 0 = 0$$
Expand $\sum_{k=1}^{\mathcal K} \left[P(\mathcal O,i_1 = q_k\mid \lambda^{(t)}) + \pi_k \cdot \eta \right] = 0$ term by term, using the marginal probability distribution for the first term and the constraint for the second:

- $\sum_{k=1}^{\mathcal K} P(\mathcal O,i_1 = q_k\mid \lambda^{(t)}) = \sum_{i_1}P(\mathcal O,i_1 \mid \lambda^{(t)}) = P(\mathcal O \mid \lambda^{(t)})$
- $\sum_{k=1}^{\mathcal K} \pi_{k} \cdot \eta = \eta \cdot \left(\sum_{k=1}^{\mathcal K} \pi_k\right) = \eta \cdot 1 = \eta$
The final result is:
$$P(\mathcal O \mid \lambda^{(t)}) + \eta = 0 \ \to \ \eta =-P(\mathcal O \mid \lambda^{(t)})$$
Substituting $\eta =-P(\mathcal O \mid \lambda^{(t)})$ back into $P(\mathcal O,i_1 = q_k\mid \lambda^{(t)}) + \pi_k \cdot\eta = 0$ yields the final result for $\pi_k$:
$$\pi_k^{(t+1)} = \hat P(i_1 = q_k)=\frac{P(\mathcal O,i_1 = q_k \mid \lambda^{(t)})}{P(\mathcal O \mid \lambda^{(t)})}$$
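In practice, $P(\mathcal O,i_1 = q_k \mid \lambda^{(t)})$ is not computed by enumeration: with the forward variable $\alpha_1(k)$ and backward variable $\beta_1(k)$ from the previous two sections, $P(\mathcal O,i_1 = q_k \mid \lambda^{(t)}) = \alpha_1(k) \cdot \beta_1(k)$. A minimal sketch of the $\pi$ update under that identity, with the toy parameters playing the role of $\lambda^{(t)}$ (the helpers `forward` and `backward` are ours):

```python
import numpy as np

pi = np.array([0.5, 0.3, 0.2])
A  = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.4, 0.5]])
B  = np.array([[0.7, 0.3], [0.4, 0.6], [0.1, 0.9]])
O  = [0, 1, 1, 0]   # made-up observation indices

def forward(O, pi, A, B):
    """alpha[t, k] = P(o_1..o_{t+1}, i_{t+1} = q_k | lambda), 0-based t."""
    T, K = len(O), len(pi)
    alpha = np.zeros((T, K))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    return alpha

def backward(O, pi, A, B):
    """beta[t, k] = P(o_{t+2}..o_T | i_{t+1} = q_k, lambda), 0-based t."""
    T, K = len(O), len(pi)
    beta = np.zeros((T, K))
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return beta

alpha, beta = forward(O, pi, A, B), backward(O, pi, A, B)
joint = alpha[0] * beta[0]      # P(O, i_1 = q_k | lambda^(t)) for each k
pi_next = joint / joint.sum()   # pi^{(t+1)}: divide by P(O | lambda^(t))
```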
In the same way one can solve for the updates of the other components, obtaining the updated initial probability distribution $\pi^{(t+1)}$:
$$\pi^{(t+1)} = (\pi_1^{(t+1)},\pi_2^{(t+1)},\cdots,\pi_{\mathcal K}^{(t+1)})^{T}$$
This establishes how $\pi^{(t+1)}$ is obtained from $\lambda^{(t)}$; $\mathcal A^{(t+1)}$ and $\mathcal B^{(t+1)}$ are obtained in a similar way, as shown in the reference formulas below.
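For reference, applying the same Lagrange-multiplier argument to the $\sum_{t=2}^T \log a_{i_{t-1},i_t}$ and $\sum_{t=1}^T \log b_{i_t}(o_t)$ terms of $\mathcal Q(\lambda,\lambda^{(t)})$ yields the standard Baum-Welch updates, stated here without the derivation ($s$ is the time index, kept distinct from the iteration superscript $(t)$; $\mathbb 1(\cdot)$ is the indicator function):

$$a_{ij}^{(t+1)} = \frac{\sum_{s=2}^{T} P(\mathcal O, i_{s-1} = q_i, i_{s} = q_j \mid \lambda^{(t)})}{\sum_{s=2}^{T} P(\mathcal O, i_{s-1} = q_i \mid \lambda^{(t)})} \qquad b_j(k)^{(t+1)} = \frac{\sum_{s=1}^{T} P(\mathcal O, i_s = q_j \mid \lambda^{(t)}) \cdot \mathbb 1(o_s = v_k)}{\sum_{s=1}^{T} P(\mathcal O, i_s = q_j \mid \lambda^{(t)})}$$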
The next section will introduce the decoding problem. Reference: 机器学习-隐马尔可夫模型5-Learning问题-Baum Welch算法(EM)