
Machine Learning Notes on Hidden Markov Models (5): The Learning Problem — the EM Algorithm

静静的喝酒 · Published 2022-09-13 13:43:30

Hidden Markov Models: Solving the Learning Problem with the EM Algorithm
  • Introduction
    • About the learning problem
    • Setting
      • Notation for the basic concepts
      • Notation for the model parameters
      • The defining assumptions of the HMM
    • Solving for the model parameters $\lambda$
      • Solving the model parameters with the EM algorithm
        • The E step
        • The M step

Introduction

The previous two posts used the forward algorithm (Forward Algorithm) and the backward algorithm (Backward Algorithm) to solve the evaluation problem (Evaluation). This post turns to the learning problem (Learning) and solves it with the EM algorithm.

About the learning problem

The learning problem (Learning) is a familiar one: in essence, given a data set $\mathcal X$, estimate the model parameters $\theta$ from $\mathcal X$. Among the models introduced so far:

  • Maximum likelihood estimation (Maximum Likelihood Estimate, MLE), as used in linear regression (Linear Regression) and in linear classification (Linear Classification) algorithms such as linear discriminant analysis (Linear Discriminant Analysis, LDA) and logistic regression (Logistic Regression):
$$\hat {\theta}_{MLE} = \mathop{\arg\max}\limits_{\theta} P(\mathcal X \mid \theta)$$
  • The classical EM algorithm (Expectation Maximization, EM); so far only one instance, the Gaussian mixture model (Gaussian Mixture Model, GMM), has been covered:
$$\theta^{(t+1)} = \mathop{\arg\max}\limits_{\theta} \int_{\mathcal Z} \log P(\mathcal X,\mathcal Z \mid \theta)\cdot P(\mathcal Z \mid \mathcal X,\theta^{(t)}) \, d\mathcal Z$$

The hidden Markov model discussed in this post can likewise be estimated with the classical EM algorithm.
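To fix ideas before the derivation, here is a minimal generic EM loop in Python (the language used for all code sketches in this post). It is only a sketch: `e_step` and `m_step` are hypothetical callables standing in for the model-specific computations derived below, and the parameters are assumed to be a flat tuple of floats.

```python
def em(params, data, e_step, m_step, max_iter=100, tol=1e-6):
    """Generic EM iteration: alternate E and M steps until convergence.

    `e_step` and `m_step` are hypothetical placeholders; for the HMM they
    correspond to computing P(I | O, lambda^(t)) and maximizing Q(lambda, lambda^(t)).
    """
    for _ in range(max_iter):
        posterior = e_step(params, data)       # E step: P(Z | X, theta^(t))
        new_params = m_step(posterior, data)   # M step: argmax_theta of Q
        if all(abs(a - b) < tol for a, b in zip(new_params, params)):
            break                              # parameters stopped moving
        params = new_params
    return params
```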

Setting

Notation for the basic concepts

For the hidden Markov model, the setting and the associated notation are as follows:

  • The variables of a hidden Markov model consist of the state sequence $\mathcal I$ and the observation sequence $\mathcal O$. With sequence length $T$, they are written:
$$\mathcal I = \{i_1,i_2,\cdots,i_T\}, \qquad \mathcal O = \{o_1,o_2,\cdots,o_T\}$$
  • Every state variable $i_t \in \mathcal I$ in the state sequence takes values in a discrete set:
$$\forall i_t \in \mathcal I, \quad i_t \in \mathcal Q = \{q_1,q_2,\cdots,q_{\mathcal K}\}$$
Likewise, every observation variable $o_t \in \mathcal O$ in the observation sequence is defined to be discrete:
$$\forall o_t \in \mathcal O, \quad o_t \in \mathcal V = \{v_1,v_2,\cdots,v_{\mathcal M}\}$$
We call $\mathcal Q$ the set of state values and $\mathcal V$ the set of observation values.
Notation for the model parameters

The parameters of a hidden Markov model are denoted $\lambda$ and consist of three parts (a toy instantiation in code follows the list):

  • The initial distribution $\pi$: the vector formed by the probabilities of the first state variable $i_1$ taking each state value $q_k$ $(k \in \{1,2,\cdots,\mathcal K\})$:
$$\pi = \begin{bmatrix}P(i_1 = q_1) \\ P(i_1 = q_2) \\ \vdots \\ P(i_1 = q_{\mathcal K})\end{bmatrix}_{\mathcal K \times 1}, \qquad \sum_{k=1}^{\mathcal K} P(i_1 = q_{k}) = 1$$
  • The state-transition matrix $\mathcal A$: entry $a_{ij}$ is the conditional probability that the next state variable is $i_{t+1} = q_j$ given the current state variable $i_t = q_i$:
$$\mathcal A = [a_{ij}]_{\mathcal K \times \mathcal K},\qquad a_{ij} = P(i_{t+1}=q_j \mid i_t = q_i)$$
  • The emission matrix $\mathcal B$: entry $b_{j}(k)$ is the conditional probability of the observation variable $o_t = v_k$ given the state variable $i_t = q_j$ at the same time step:
$$\mathcal B = [b_j(k)]_{\mathcal K \times \mathcal M}, \qquad b_j(k) = P(o_t = v_k \mid i_t = q_j)$$
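As a concrete toy instantiation (a sketch with made-up numbers, not from the original post), the three parameters can be stored as NumPy arrays whose probability constraints are easy to check:

```python
import numpy as np

K, M = 3, 2  # K state values, M observation values (toy sizes)

# Initial distribution pi: pi[k] = P(i_1 = q_k); entries sum to 1.
pi = np.array([0.5, 0.3, 0.2])

# Transition matrix A: A[i, j] = P(i_{t+1} = q_j | i_t = q_i); each row sums to 1.
A = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])

# Emission matrix B: B[j, k] = P(o_t = v_k | i_t = q_j); each row sums to 1.
B = np.array([[0.9, 0.1],
              [0.4, 0.6],
              [0.1, 0.9]])

assert np.isclose(pi.sum(), 1.0)
assert np.allclose(A.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)
```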
The defining assumptions of the HMM

  • Homogeneous Markov assumption: the conditional probability of the state variable $i_t$ at any time depends only on the state variable $i_{t-1}$ at the previous time, and on nothing else:
$$P(i_t \mid i_{t-1},\cdots,i_{1},o_{t-1},\cdots,o_1) = P(i_t \mid i_{t-1})$$
  • Observation-independence assumption: the conditional probability of the observation variable $o_t$ at any time depends only on the state variable $i_t$ at the same time, and on nothing else:
$$P(o_t \mid i_t,i_{t-1},\cdots,i_1,o_{t-1},\cdots,o_1) = P(o_t \mid i_t)$$
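These two assumptions are exactly what makes ancestral sampling from an HMM a single left-to-right pass: each state is drawn from the previous state's transition row, each observation from the current state's emission row. A short sketch, assuming the pi, A, B arrays defined above:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hmm(pi, A, B, T):
    """Draw (states, observations) of length T from the HMM (pi, A, B)."""
    states, obs = [], []
    state = rng.choice(len(pi), p=pi)                    # i_1 ~ pi
    for _ in range(T):
        states.append(state)
        obs.append(rng.choice(B.shape[1], p=B[state]))   # o_t depends only on i_t
        state = rng.choice(len(pi), p=A[state])          # i_{t+1} depends only on i_t
    return states, obs

states, obs = sample_hmm(pi, A, B, T=5)
```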
Solving for the model parameters $\lambda$

As shown in the post on the forward algorithm (Forward Algorithm), computing $P(\mathcal O \mid \lambda)$ directly yields:
$$\begin{aligned} P(\mathcal O \mid \lambda) & = \sum_{\mathcal I}P(\mathcal O,\mathcal I \mid \lambda) \\ & = \sum_{i_1}\cdots\sum_{i_T} \left[\pi_{i_1} \cdot \prod_{t=2}^{T} a_{i_{t-1},i_{t}} \cdot \prod_{t=1}^{T} b_{i_t}(o_t)\right] \end{aligned}$$
Applying maximum likelihood estimation to $P(\mathcal O \mid \lambda)$ directly, i.e.
$$\hat {\lambda}_{MLE} = \mathop{\arg\max}\limits_{\lambda} P(\mathcal O \mid \lambda),$$
runs into several problems:

  • First, the observation variables $o_1, \cdots,o_T$ in the observation sequence $\mathcal O$ are not independent and identically distributed, so the likelihood cannot be factorized and maximized as $\mathop{\arg\max}\limits_{\lambda}\prod_{t=1}^{T} P(o_t \mid \lambda)$;
  • Second, the direct summation over all state sequences costs $\mathcal O(\mathcal K^T)$ time (it is the forward/backward recursions that reduce a single evaluation to $\mathcal O(\mathcal K^2 T)$), and the resulting optimization has no closed form. This post therefore estimates the model parameters $\lambda$ iteratively with the EM algorithm. A brute-force sketch of the exponential summation follows this list.
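To see where the exponential cost comes from, here is a brute-force evaluation of $P(\mathcal O \mid \lambda)$ that literally enumerates all $\mathcal K^T$ state sequences; a sketch for toy sizes only, reusing the pi, A, B arrays above:

```python
from itertools import product

def joint_prob(seq, obs, pi, A, B):
    """P(O, I | lambda) = pi_{i_1} * prod_t a_{i_{t-1}, i_t} * prod_t b_{i_t}(o_t)."""
    p = pi[seq[0]] * B[seq[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[seq[t - 1], seq[t]] * B[seq[t], obs[t]]
    return p

def evaluate_naive(obs, pi, A, B):
    """P(O | lambda) by summing the joint over all K**T state sequences."""
    K = len(pi)
    return sum(joint_prob(seq, obs, pi, A, B)
               for seq in product(range(K), repeat=len(obs)))

print(evaluate_naive([0, 1, 1], pi, A, B))  # K**T = 27 terms for T = 3
```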
Solving the model parameters with the EM algorithm

The E step

The EM update reads:
$$\theta^{(t+1)} = \mathop{\arg\max}\limits_{\theta} \int_{\mathcal Z} \log P(\mathcal X,\mathcal Z \mid \theta)\cdot P(\mathcal Z \mid \mathcal X,\theta^{(t)}) \, d\mathcal Z$$
Map the quantities appearing in the EM algorithm onto the concepts of the hidden Markov model:

  • $\mathcal X$: observed data (Observed Data) $\to$ observation sequence $\mathcal O$;
  • $\mathcal Z$: latent variables (Latent Data) $\to$ state sequence $\mathcal I$;
  • $\theta$: parameters (Parameter) $\to$ model parameters $\lambda$.

After the mapping, the EM update for the hidden Markov model reads (the state sequence $\mathcal I$ is discrete, so the integral becomes a sum $\sum$):
$$\lambda^{(t+1)} = \mathop{\arg\max}\limits_{\lambda} \sum_{\mathcal I} \left[\log P(\mathcal O,\mathcal I \mid \lambda) \cdot P(\mathcal I \mid \mathcal O, \lambda^{(t)})\right]$$
To simplify the computation, rewrite the posterior using the definition of conditional probability:
$$\lambda^{(t+1)} = \mathop{\arg\max}\limits_{\lambda} \sum_{\mathcal I} \left[\log P(\mathcal O,\mathcal I \mid \lambda) \cdot \frac{P(\mathcal I,\mathcal O \mid \lambda^{(t)})}{P(\mathcal O \mid \lambda^{(t)})}\right]$$

Look at the denominator $P(\mathcal O \mid \lambda^{(t)})$ of the second factor:

  • $\lambda^{(t)}$ is the parameter estimate from the previous iteration and is therefore known;
  • the observation sequence $\mathcal O$ is simply the sample data and does not depend on the optimization variable $\lambda$;

so the denominator is a positive constant with respect to $\lambda$ and can be dropped from the $\arg\max$. The EM update thus simplifies to (note: after this simplification the objective is no longer literally an expectation, but the maximizer is unchanged):
$$\lambda^{(t+1)} = \mathop{\arg\max}\limits_{\lambda} \sum_{\mathcal I} \left[\log P(\mathcal I,\mathcal O\mid \lambda) \cdot P(\mathcal I,\mathcal O \mid \lambda^{(t)})\right]$$
Moreover, $\lambda^{(t)}$ and the updated $\lambda^{(t+1)}$ are not single parameters but triples:
$$\lambda^{(t)} = (\pi^{(t)}, \mathcal A^{(t)},\mathcal B^{(t)}), \qquad \lambda^{(t+1)} = (\pi^{(t+1)},\mathcal A^{(t+1)},\mathcal B^{(t+1)})$$
Write the objective as a function $\mathcal Q(\lambda,\lambda^{(t)})$ of $\lambda$ and $\lambda^{(t)}$:
$$\mathcal Q(\lambda,\lambda^{(t)}) = \sum_{\mathcal I} \left[\log P(\mathcal I,\mathcal O\mid \lambda) \cdot P(\mathcal I,\mathcal O \mid \lambda^{(t)})\right]$$
Substituting the joint probability obtained earlier, $P(\mathcal O,\mathcal I \mid \lambda) = \pi_{i_1} \cdot \prod_{t=2}^{T} a_{i_{t-1},i_{t}} \cdot \prod_{t=1}^{T} b_{i_t}(o_t)$, into $\mathcal Q(\lambda,\lambda^{(t)})$:
$$\begin{aligned} \mathcal Q(\lambda,\lambda^{(t)}) & = \sum_{\mathcal I} \left[\log \left(\pi_{i_1} \cdot \prod_{t=2}^{T} a_{i_{t-1},i_{t}} \cdot \prod_{t=1}^{T} b_{i_t}(o_t)\right) \cdot P(\mathcal I,\mathcal O \mid \lambda^{(t)})\right] \\ & = \sum_{\mathcal I} \left[\left(\log \pi_{i_1} + \sum_{t=2}^T \log a_{i_{t-1},i_{t}} + \sum_{t=1}^T \log b_{i_t}(o_t) \right) \cdot P(\mathcal I,\mathcal O \mid \lambda^{(t)})\right] \end{aligned}$$
Take the update of $\pi^{(t+1)}$ as the worked example. Inside the parentheses, only $\log \pi_{i_1}$ depends on $\pi$; the other two terms are unrelated to $\pi$ and act as constants for this subproblem. The $\pi$-part of $\mathcal Q(\lambda,\lambda^{(t)})$ is therefore:
$$\begin{aligned} \pi^{(t+1)} & = \mathop{\arg\max}\limits_{\pi} \mathcal Q(\lambda,\lambda^{(t)}) \\ & = \mathop{\arg\max}\limits_{\pi} \sum_{\mathcal I} \left[\log \pi_{i_1} \cdot P(\mathcal O,\mathcal I \mid \lambda^{(t)})\right] \\ & = \mathop{\arg\max}\limits_{\pi} \sum_{i_1} \cdots \sum_{i_T}\left[\log \pi_{i_1} \cdot P(\mathcal O,i_1,\cdots,i_T \mid \lambda^{(t)})\right] \end{aligned}$$
By definition, the initial distribution $\pi$ involves only the first state variable $i_1$ and no other variable, so all the remaining state variables can be marginalized out. Since $i_1$ takes values in the discrete state set $\mathcal Q$, replace $\sum_{i_1}$ by $\sum_{k=1}^{\mathcal K}$:
$$\begin{aligned} \pi^{(t+1)} & = \mathop{\arg\max}\limits_{\pi} \sum_{i_1} \left[\log \pi_{i_1} \cdot P(\mathcal O,i_1 \mid \lambda^{(t)})\right] \\ & = \mathop{\arg\max}\limits_{\pi} \sum_{k=1}^{\mathcal K}\left[\log P(i_1 = q_k) \cdot P(\mathcal O,i_1=q_k \mid \lambda^{(t)})\right] \end{aligned}$$
Remember that the entries of the probability vector $\pi$ are constrained:
$$\sum_{k=1}^{\mathcal K} P(i_1 = q_k) = 1$$
The computation of $\pi^{(t+1)}$ thus becomes an optimization problem with one constraint:
$$\begin{cases} \mathop{\arg\max}\limits_{\pi} \sum_{k=1}^{\mathcal K}\left[\log P(i_1 = q_k) \cdot P(\mathcal O,i_1=q_k \mid \lambda^{(t)})\right] \\ \text{s.t.} \quad \sum_{k=1}^{\mathcal K} P(i_1 = q_k) = 1 \end{cases}$$
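For toy problems, the weights $P(\mathcal O, i_1 = q_k \mid \lambda^{(t)})$ appearing in this objective can be obtained by brute force from the same joint probability; a sketch reusing joint_prob and product from the enumeration example above:

```python
def weight_i1(obs, pi, A, B, k):
    """P(O, i_1 = q_k | lambda): sum the joint over all sequences with i_1 = q_k."""
    K = len(pi)
    return sum(joint_prob((k,) + rest, obs, pi, A, B)
               for rest in product(range(K), repeat=len(obs) - 1))
```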

The M step

Solve the constrained problem with the method of Lagrange multipliers. Define the Lagrangian $\mathcal L(\pi,\eta)$ as follows (whether the constraint term is written $\sum_{k=1}^{\mathcal K} P(i_1 = q_k) - 1$ or $1 - \sum_{k=1}^{\mathcal K} P(i_1 = q_k)$ is a matter of convention; it merely flips the sign of $\eta$):
$$\mathcal L(\pi,\eta) = \sum_{k=1}^{\mathcal K} \left[\log P(i_1 = q_k) \cdot P(\mathcal O,i_1=q_k \mid \lambda^{(t)})\right] + \eta \left(\sum_{k=1}^{\mathcal K} P(i_1 = q_k) - 1\right)$$
Let $\pi_k = P(i_1 = q_k)$ denote one component of the vector $\pi$, and take the partial derivative of $\mathcal L(\pi,\eta)$ with respect to $\pi_k$:
$$\frac{\partial \mathcal L}{\partial \pi_k} = \frac{1}{\pi_k} P(\mathcal O,i_1 = q_k \mid \lambda^{(t)}) + \eta$$
Setting $\frac{\partial \mathcal L}{\partial \pi_k} \triangleq 0$ and multiplying through by $\pi_k$:
$$P(\mathcal O,i_1 = q_k\mid \lambda^{(t)}) + \pi_k \cdot\eta = 0$$
Summing this condition over $k$:
$$\sum_{k=1}^{\mathcal K} \left[P(\mathcal O,i_1 = q_k\mid \lambda^{(t)}) + \pi_k \cdot \eta \right] = 0$$
This equation can be read from two angles:

  • From the summation angle: sum both sides of the stationarity condition over all values of $i_1$; the sum of the constant $0$ is still $0$.
  • From the partial-derivative angle: the derivative with respect to each possible value of $i_1$ is set to $0$, so
$$\sum_{k=1}^{\mathcal K} \left[P(\mathcal O,i_1 = q_k \mid \lambda^{(t)}) + \pi_k \cdot \eta \right] = 0+ 0 + \cdots + 0 = 0$$

Expand $\sum_{k=1}^{\mathcal K} \left[P(\mathcal O,i_1 = q_k\mid \lambda^{(t)}) + \pi_k \cdot \eta \right] = 0$ term by term, using marginalization for the first term and the constraint for the second:

  • $\sum_{k=1}^{\mathcal K} P(\mathcal O,i_1 = q_k\mid \lambda^{(t)}) = P(\mathcal O \mid \lambda^{(t)})$ (marginalizing $i_1$ out of the joint distribution);
  • $\sum_{k=1}^{\mathcal K} \pi_{k} \cdot \eta = \eta \cdot \left(\sum_{k=1}^{\mathcal K} \pi_k\right) = \eta \cdot 1 = \eta$.

This gives:
$$P(\mathcal O \mid \lambda^{(t)}) + \eta = 0 \;\Rightarrow\; \eta = -P(\mathcal O \mid \lambda^{(t)})$$

Substituting $\eta = -P(\mathcal O \mid \lambda^{(t)})$ back into $P(\mathcal O,i_1 = q_k\mid \lambda^{(t)}) + \pi_k \cdot\eta = 0$ gives the final result for $\pi_k$:
$$\pi_k^{(t+1)} = \hat P(i_1 = q_k)=\frac{P(\mathcal O,i_1 = q_k \mid \lambda^{(t)})}{P(\mathcal O \mid \lambda^{(t)})}$$
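The algebra can be checked symbolically. The following sketch uses sympy with $\mathcal K = 3$, writing symbolic weights $w_k$ for $P(\mathcal O, i_1 = q_k \mid \lambda^{(t)})$; it recovers both $\eta = -\sum_k w_k = -P(\mathcal O \mid \lambda^{(t)})$ and $\pi_k \propto w_k$:

```python
import sympy as sp

w = sp.symbols('w1:4', positive=True)    # w_k = P(O, i_1 = q_k | lambda^(t))
p = sp.symbols('pi1:4', positive=True)   # pi_k = P(i_1 = q_k), the unknowns
eta = sp.Symbol('eta')

# Lagrangian: sum_k log(pi_k) * w_k + eta * (sum_k pi_k - 1)
L = sum(sp.log(p[k]) * w[k] for k in range(3)) + eta * (sum(p) - 1)

stationarity = [sp.Eq(sp.diff(L, p[k]), 0) for k in range(3)]
constraint = [sp.Eq(sum(p), 1)]
sol = sp.solve(stationarity + constraint, list(p) + [eta], dict=True)[0]

print(sol[eta])   # -w1 - w2 - w3, i.e. eta = -P(O | lambda^(t))
print(sol[p[0]])  # w1/(w1 + w2 + w3), matching pi_k^(t+1) above
```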

The remaining components are obtained in the same way, yielding the updated initial distribution $\pi^{(t+1)}$:
$$\pi^{(t+1)} = \left(\pi_1^{(t+1)},\pi_2^{(t+1)},\cdots,\pi_{\mathcal K}^{(t+1)}\right)^{T}$$
This establishes how $\pi^{(t+1)}$ is obtained from $\lambda^{(t)}$; the updates for $\mathcal A^{(t+1)}$ and $\mathcal B^{(t+1)}$ follow by the same approach. A forward-backward sketch of the $\pi$ update appears below.
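In practice one never enumerates state sequences: both the numerator and the denominator of the update come directly from the forward and backward passes introduced in the previous two posts, since $\alpha_1(k)\,\beta_1(k) = P(\mathcal O, i_1 = q_k \mid \lambda^{(t)})$. A sketch, assuming the same NumPy arrays as above:

```python
import numpy as np

def forward(obs, pi, A, B):
    """alpha[t, k] = P(o_1..o_{t+1}, i_{t+1} = q_k | lambda) (0-indexed in t)."""
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

def backward(obs, pi, A, B):
    """beta[t, k] = P(o_{t+2}..o_T | i_{t+1} = q_k, lambda) (0-indexed in t)."""
    T, K = len(obs), len(pi)
    beta = np.ones((T, K))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def update_pi(obs, pi, A, B):
    """pi_k^(t+1) = P(O, i_1 = q_k | lambda^(t)) / P(O | lambda^(t))."""
    alpha, beta = forward(obs, pi, A, B), backward(obs, pi, A, B)
    num = alpha[0] * beta[0]        # P(O, i_1 = q_k | lambda^(t)) for each k
    return num / num.sum()          # num.sum() = P(O | lambda^(t))
```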

The next post will cover the decoding problem (Decoding).

Reference: Machine Learning - Hidden Markov Model (5) - the Learning Problem - Baum-Welch Algorithm (EM)
