
Machine Learning Notes on Gaussian Mixture Models (3): Solving the Gaussian Mixture Model with the EM Algorithm (E-Step)

静静的喝酒 · Published 2022-09-09 22:16:03

Machine Learning Notes: Gaussian Mixture Models — Solving the GMM with the EM Algorithm (E-Step)
  • Introduction
    • Review: the Gaussian mixture model and its parameters
    • Review: the (narrow-sense) EM algorithm
    • Solving for the GMM parameters with the EM algorithm
      • Setting up the notation
      • Derivation (the E-step)

Introduction

The previous post attempted to solve for the parameters of a Gaussian mixture model by maximum likelihood estimation, but no closed-form solution exists. This post solves for the model parameters using the EM algorithm.

Review: the Gaussian Mixture Model and Its Parameters

Let $\mathcal X$ denote the observed data, consisting of $N$ samples assumed to be mutually independent and identically distributed:
$$\mathcal X = \{x^{(1)},x^{(2)},\cdots,x^{(N)}\}, \qquad x^{(i)} \overset{\text{i.i.d.}}{\sim} x^{(j)} \quad (x^{(i)},x^{(j)} \in \mathcal X;\ i \neq j)$$
Each sample $x^{(i)}$ is paired with a latent variable $z^{(i)}$. Viewed sample by sample, the latent-variable collection $\mathcal Z$ is:
$$\mathcal Z = \{z^{(1)},z^{(2)},\cdots,z^{(N)}\}$$
The pair $(\mathcal X,\mathcal Z)$ is called the complete data:
$$(\mathcal X,\mathcal Z) = \{(x^{(1)},z^{(1)}),\cdots,(x^{(N)},z^{(N)})\}$$
Viewed as a distribution, the latent variable $\mathcal Z$ is discrete over $\mathcal K$ values, with the following probabilities:
$$\begin{array}{c|cccc} \mathcal Z & z_1 & z_2 & \cdots & z_{\mathcal K} \\ \hline P(\mathcal Z) & p_1 & p_2 & \cdots & p_{\mathcal K} \end{array}$$
subject to:
$$\sum_{k=1}^{\mathcal K} p_k = 1$$
Each value $z_j \in \mathcal Z$ corresponds to exactly one Gaussian distribution; in other words, conditioned on the latent label $z_j$, a sample $x$ carrying that label follows a Gaussian. There are therefore $\mathcal K$ Gaussian components:
$$\begin{array}{c|cccc} \mathcal Z & z_1 & z_2 & \cdots & z_{\mathcal K} \\ \hline P(\mathcal X \mid \mathcal Z) & \mathcal N(\mu_1,\Sigma_1) & \mathcal N(\mu_2,\Sigma_2) & \cdots & \mathcal N(\mu_{\mathcal K},\Sigma_{\mathcal K}) \end{array}$$
In symbols:
$$P(\mathcal X \mid \mathcal Z = z_k) \sim \mathcal N(\mathcal X \mid \mu_k,\Sigma_k) \quad (k=1,2,\cdots,\mathcal K)$$
The probability model of the Gaussian mixture, $P(\mathcal X)$, is therefore:
$$\begin{aligned}P(\mathcal X) & = \sum_{\mathcal Z} P(\mathcal X \mid \mathcal Z = z_k)\,P(\mathcal Z = z_k) \\ & = \sum_{k=1}^{\mathcal K} \mathcal N(\mathcal X \mid \mu_k,\Sigma_k)\cdot p_k \end{aligned}$$
and its parameter set $\theta$ is:
$$\theta = \{p_1,\cdots,p_{\mathcal K},\ \mu_1,\cdots,\mu_{\mathcal K},\ \Sigma_1,\cdots,\Sigma_{\mathcal K}\}$$
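As a quick sanity check of this definition, the mixture density $P(x) = \sum_{k=1}^{\mathcal K} p_k\,\mathcal N(x \mid \mu_k,\Sigma_k)$ can be evaluated directly. The weights, means, and covariances below are illustrative assumptions, not values from the post:

```python
import numpy as np
from scipy.stats import multivariate_normal

# A minimal sketch: evaluate P(x) = sum_k p_k * N(x | mu_k, Sigma_k)
# for K = 2 two-dimensional components (all parameter values are assumed).
p = np.array([0.4, 0.6])                              # mixing weights p_k, sum to 1
mu = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]     # component means mu_k
Sigma = [np.eye(2), 2.0 * np.eye(2)]                  # component covariances Sigma_k

def gmm_density(x):
    """P(x): weighted sum of the K component Gaussian densities."""
    return sum(p[k] * multivariate_normal.pdf(x, mean=mu[k], cov=Sigma[k])
               for k in range(len(p)))

print(gmm_density(np.array([1.0, 1.0])))
```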

Review: the (Narrow-Sense) EM Algorithm

The EM algorithm is a method for estimating the parameters of a probability model $P(\mathcal X \mid \theta)$; at its core it is maximum likelihood estimation. Its iterative update is:
$$\begin{aligned} \theta^{(t+1)} & = \mathop{\arg\max}\limits_{\theta} \int_{\mathcal Z} \log P(\mathcal X,\mathcal Z \mid \theta) \cdot P(\mathcal Z \mid \mathcal X,\theta^{(t)})\, d\mathcal Z \\ & = \mathop{\arg\max}\limits_{\theta} \mathbb E_{\mathcal Z \mid \mathcal X,\theta^{(t)}} \left[\log P(\mathcal X,\mathcal Z \mid \theta)\right] \end{aligned}$$
This update splits into two steps:

  • E-step (Expectation step): write $\mathbb E_{\mathcal Z \mid \mathcal X,\theta^{(t)}} \left[\log P(\mathcal X,\mathcal Z \mid \theta)\right]$ as a function of $\theta$ and $\theta^{(t)}$:
$$\mathcal L(\theta,\theta^{(t)}) = \int_{\mathcal Z} \log P(\mathcal X,\mathcal Z \mid \theta) \cdot P(\mathcal Z \mid \mathcal X,\theta^{(t)})\, d\mathcal Z$$
  • M-step (Maximization step): given the E-step result, choose the $\theta$ that maximizes $\mathcal L(\theta,\theta^{(t)})$:
$$\theta^{(t+1)} = \mathop{\arg\max}\limits_{\theta} \mathcal L(\theta,\theta^{(t)})$$

The E-step and M-step alternate until the iteration converges to an optimal solution (at least a local optimum).
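The alternating scheme above can be sketched for a small one-dimensional Gaussian mixture. This is a minimal illustration under assumed data and initial values; the closed-form M-step updates used here are the standard GMM ones (this post only derives the E-step):

```python
import numpy as np
from scipy.stats import norm

# Sketch of the E-step / M-step loop for a 1-D, K=2 GMM (assumed toy data).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])  # N = 400
K = 2
p = np.full(K, 1.0 / K)          # mixing weights p_k
mu = np.array([-1.0, 1.0])       # component means mu_k (initial guesses)
var = np.ones(K)                 # component variances Sigma_k

for _ in range(50):
    # E-step: posterior responsibilities gamma[i,k] = P(z^(i)=k | x^(i), theta^(t))
    dens = np.stack([p[k] * norm.pdf(x, mu[k], np.sqrt(var[k]))
                     for k in range(K)], axis=1)
    gamma = dens / dens.sum(axis=1, keepdims=True)
    # M-step: standard closed-form updates maximizing L(theta, theta^(t))
    Nk = gamma.sum(axis=0)
    p = Nk / len(x)
    mu = (gamma * x[:, None]).sum(axis=0) / Nk
    var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nk

print(p, mu)
```

With the two true means at -2 and 3, the iteration recovers both components from the rough initial guesses.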

Solving for the GMM Parameters with the EM Algorithm

Setting up the notation

Matching the symbols of the EM algorithm against those of the Gaussian mixture model (the left side of each equation uses EM-algorithm notation; the right side uses GMM notation):
$$\begin{aligned}P(\mathcal X,\mathcal Z \mid \theta) & = P(\mathcal Z)\cdot P(\mathcal X \mid \mathcal Z) \\ & = p_{\mathcal Z} \cdot \mathcal N(\mathcal X \mid \mu_{\mathcal Z},\Sigma_{\mathcal Z}) \\ & = \prod_{i=1}^N p_{z^{(i)}} \cdot \mathcal N(x^{(i)} \mid \mu_{z^{(i)}},\Sigma_{z^{(i)}})\\ P(\mathcal Z \mid \mathcal X,\theta) & = \frac{P(\mathcal X,\mathcal Z)}{P(\mathcal X)} \\ & = \frac{\prod_{i=1}^N p_{z^{(i)}} \cdot \mathcal N(x^{(i)} \mid \mu_{z^{(i)}},\Sigma_{z^{(i)}})}{\prod_{i=1}^N \sum_{k=1}^{\mathcal K} p_k\cdot \mathcal N(x^{(i)} \mid \mu_k,\Sigma_k)} \end{aligned}$$
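The per-sample posterior $P(z^{(i)} \mid x^{(i)},\theta)$ implied by the last line can be checked numerically: it is the $k$-th term $p_k \cdot \mathcal N(x^{(i)} \mid \mu_k,\Sigma_k)$ divided by the sum over all components. All parameter values below are made up for illustration:

```python
import numpy as np
from scipy.stats import norm

# Posterior over the latent label for a single sample x^(i) (assumed toy
# parameters, 1-D Gaussians for simplicity).
p = np.array([0.3, 0.7])          # p_k
mu = np.array([0.0, 4.0])         # mu_k
sd = np.array([1.0, 1.5])         # sqrt(Sigma_k)

x_i = 1.0
joint = p * norm.pdf(x_i, mu, sd)   # p_k * N(x^(i) | mu_k, Sigma_k), shape (K,)
posterior = joint / joint.sum()     # P(z^(i) = k | x^(i), theta)

print(posterior)
```

Since $x^{(i)} = 1$ lies much closer to the first component's mean, the posterior puts most of its mass on component 1.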

Derivation (the E-step)

  • The function $\mathcal L(\theta,\theta^{(t)})$ is:
$$\mathcal L(\theta,\theta^{(t)}) = \int_{\mathcal Z} \log P(\mathcal X,\mathcal Z \mid \theta) \cdot P(\mathcal Z \mid \mathcal X,\theta^{(t)})\, d\mathcal Z$$

  • Substitute $P(\mathcal X,\mathcal Z \mid \theta)$ and $P(\mathcal Z \mid \mathcal X,\theta)$ into this expression. Because the latent variable $\mathcal Z$ of the Gaussian mixture model is discrete, the $\int$ becomes a $\sum$; and since the samples are i.i.d., both distributions factorize:
$$\sum_{\mathcal Z} \log \prod_{i=1}^N P(x^{(i)},z^{(i)} \mid \theta) \cdot \prod_{i=1}^N P(z^{(i)} \mid x^{(i)},\theta^{(t)})$$

  • Turn $\log \prod_{i=1}^N P(x^{(i)},z^{(i)} \mid \theta)$ into a sum of logs, and expand $\sum_{\mathcal Z}$:
$$\sum_{z^{(1)},z^{(2)},\cdots,z^{(N)}} \sum_{i=1}^N \log P(x^{(i)},z^{(i)} \mid \theta) \cdot \prod_{i=1}^N P(z^{(i)} \mid x^{(i)},\theta^{(t)})$$
    The form $\sum_{z^{(1)},z^{(2)},\cdots,z^{(N)}}$ deserves a comment. So far we have not discussed what $z^{(i)}\ (i=1,2,\cdots,N)$ actually is, only that each sample $x^{(i)}$ is paired with a latent variable $z^{(i)}$.

    $z^{(i)}$ is not a single number but a vector: the vector of "Gaussian components that sample $x^{(i)}$ may belong to". For example, still assuming the sample space contains $\mathcal K$ Gaussian components, the latent variable $z^{(1)}$ paired with sample $x^{(1)}$ is:
$$z^{(1)} = (z_1^{(1)},z_2^{(1)},\cdots,z_{\mathcal K}^{(1)})^{T}$$
    where $z_k^{(1)}\ (k=1,2,\cdots,\mathcal K)$ means "sample $x^{(1)}$ may belong to the Gaussian component numbered $k$". Note that $z_k^{(1)}$ is only a component label (a discrete value); it does not represent a probability. The associated probabilities are:
$$\begin{array}{c|cccc} z^{(1)} & z_1^{(1)} & z_2^{(1)} & \cdots & z_{\mathcal K}^{(1)} \\ \hline P(z^{(1)}) & p_1^{(1)} & p_2^{(1)} & \cdots & p_{\mathcal K}^{(1)} \end{array}$$
    Here $p_j^{(i)}$ is the probability that sample $x^{(i)}$ belongs to the Gaussian component $\mathcal N(\mu_j,\Sigma_j)$ labeled $z_j$, and $P(z^{(i)})$ is the vector of those $\mathcal K$ probabilities:
$$p_j^{(i)} = P(x^{(i)} \to z_j) = P(x^{(i)} \in \mathcal N(\mu_j,\Sigma_j)), \qquad P(z^{(i)}) = (p_1^{(i)},p_2^{(i)},\cdots,p_{\mathcal K}^{(i)})^{T}$$
    The same vector view applies not only to the probabilities but also to the means and covariances:

    • $\mu_{z^{(i)}}$ denotes the vector of means of the $\mathcal K$ Gaussian components associated with sample $x^{(i)}$: $\mu_{z^{(i)}} = (\mu_1^{(i)},\mu_2^{(i)},\cdots,\mu_{\mathcal K}^{(i)})^{T}$
    • $\Sigma_{z^{(i)}}$ denotes the corresponding vector of covariances: $\Sigma_{z^{(i)}} = (\Sigma_1^{(i)},\Sigma_2^{(i)},\cdots,\Sigma_{\mathcal K}^{(i)})^{T}$

    Because the latent variables in $\sum_{i=1}^N \log P(x^{(i)},z^{(i)} \mid \theta)$ appear as $z^{(i)}\ (i=1,2,\cdots,N)$ rather than as $z_j\ (j=1,2,\cdots,\mathcal K)$, the expansion of $\sum_{\mathcal Z}$ is $\sum_{z^{(1)},z^{(2)},\cdots,z^{(N)}}$, not $\sum_{z_1,z_2,\cdots,z_{\mathcal K}}$.

  • Expand $\sum_{i=1}^N \log P(x^{(i)},z^{(i)} \mid \theta)$ term by term:
$$\begin{aligned} \mathcal L(\theta,\theta^{(t)}) & = \sum_{z^{(1)},z^{(2)},\cdots,z^{(N)}} \left[\log P(x^{(1)},z^{(1)} \mid \theta) + \log P(x^{(2)},z^{(2)} \mid \theta) + \cdots + \log P(x^{(N)},z^{(N)} \mid \theta)\right] \cdot \prod_{i=1}^N P(z^{(i)} \mid x^{(i)},\theta^{(t)}) \\ & = \left[\sum_{z^{(1)},\cdots,z^{(N)}} \log P(x^{(1)},z^{(1)} \mid \theta) \prod_{i=1}^N P(z^{(i)} \mid x^{(i)},\theta^{(t)})\right] + \cdots + \left[\sum_{z^{(1)},\cdots,z^{(N)}} \log P(x^{(N)},z^{(N)} \mid \theta) \prod_{i=1}^N P(z^{(i)} \mid x^{(i)},\theta^{(t)})\right] \end{aligned}$$

  • Consider only the first of these terms:
$$\sum_{z^{(1)},z^{(2)},\cdots,z^{(N)}} \log P(x^{(1)},z^{(1)} \mid \theta) \prod_{i=1}^N P(z^{(i)} \mid x^{(i)},\theta^{(t)})$$
    In the product $\prod_{i=1}^N P(z^{(i)} \mid x^{(i)},\theta^{(t)})$, only the first factor $P(z^{(1)} \mid x^{(1)},\theta^{(t)})$ depends on $z^{(1)}$, so the term can be rewritten as:
$$\begin{aligned} & \sum_{z^{(1)},z^{(2)},\cdots,z^{(N)}} \log P(x^{(1)},z^{(1)} \mid \theta) \cdot P(z^{(1)} \mid x^{(1)},\theta^{(t)}) \prod_{i=2}^N P(z^{(i)} \mid x^{(i)},\theta^{(t)}) \\ & = \sum_{z^{(1)}} \left[\log P(x^{(1)},z^{(1)} \mid \theta) \cdot P(z^{(1)} \mid x^{(1)},\theta^{(t)})\right] \cdot \sum_{z^{(2)},\cdots,z^{(N)}} \left[\prod_{i=2}^N P(z^{(i)} \mid x^{(i)},\theta^{(t)})\right] \end{aligned}$$
    The factor $\sum_{z^{(2)},\cdots,z^{(N)}} \left[\prod_{i=2}^N P(z^{(i)} \mid x^{(i)},\theta^{(t)})\right]$ expands as:
$$\begin{aligned} \sum_{z^{(2)},\cdots,z^{(N)}} \left[\prod_{i=2}^N P(z^{(i)} \mid x^{(i)},\theta^{(t)})\right] & = \sum_{z^{(2)},\cdots,z^{(N)}} \left[P(z^{(2)} \mid x^{(2)},\theta^{(t)}) \times \cdots \times P(z^{(N)} \mid x^{(N)},\theta^{(t)})\right] \\ & = \sum_{z^{(2)}} P(z^{(2)} \mid x^{(2)},\theta^{(t)}) \times \cdots \times \sum_{z^{(N)}} P(z^{(N)} \mid x^{(N)},\theta^{(t)}) \end{aligned}$$
    Each factor $\sum_{z^{(j)}} P(z^{(j)} \mid x^{(j)},\theta^{(t)})\ (j=2,\cdots,N)$ sums a discrete probability distribution over all of its values, so:
$$\sum_{z^{(2)}} P(z^{(2)} \mid x^{(2)},\theta^{(t)}) = \cdots = \sum_{z^{(N)}} P(z^{(N)} \mid x^{(N)},\theta^{(t)}) = 1, \qquad \sum_{z^{(2)},\cdots,z^{(N)}} \left[\prod_{i=2}^N P(z^{(i)} \mid x^{(i)},\theta^{(t)})\right] = 1 \times \cdots \times 1 = 1$$
    The first term therefore reduces to:
$$\begin{aligned} & \sum_{z^{(1)},z^{(2)},\cdots,z^{(N)}} \log P(x^{(1)},z^{(1)} \mid \theta) \prod_{i=1}^N P(z^{(i)} \mid x^{(i)},\theta^{(t)}) \\ & = \sum_{z^{(1)}} \left[\log P(x^{(1)},z^{(1)} \mid \theta) \cdot P(z^{(1)} \mid x^{(1)},\theta^{(t)})\right] \cdot 1 \\ & = \sum_{z^{(1)}} \left[\log P(x^{(1)},z^{(1)} \mid \theta) \cdot P(z^{(1)} \mid x^{(1)},\theta^{(t)})\right] \end{aligned}$$

  • Applying the same reduction to every term, $\mathcal L(\theta,\theta^{(t)})$ becomes:
$$\begin{aligned} \mathcal L(\theta,\theta^{(t)}) & = \sum_{z^{(1)}} \left[\log P(x^{(1)},z^{(1)} \mid \theta) \cdot P(z^{(1)} \mid x^{(1)},\theta^{(t)})\right] + \cdots + \sum_{z^{(N)}} \left[\log P(x^{(N)},z^{(N)} \mid \theta) \cdot P(z^{(N)} \mid x^{(N)},\theta^{(t)})\right] \\ & = \sum_{i=1}^N \sum_{z^{(i)}} \left[\log P(x^{(i)},z^{(i)} \mid \theta) \cdot P(z^{(i)} \mid x^{(i)},\theta^{(t)})\right] \end{aligned}$$
    Substituting the distributions from the notation section ($p_{z^{(i)}}$, $\mu_{z^{(i)}}$, $\Sigma_{z^{(i)}}$ are as explained above; the second factor comes from $P(z^{(i)} \mid x^{(i)},\theta^{(t)})$ and therefore uses the iteration-$t$ parameters):
$$\mathcal L(\theta,\theta^{(t)}) = \sum_{i=1}^N \sum_{z^{(i)}} \log \left[p_{z^{(i)}} \cdot \mathcal N(x^{(i)} \mid \mu_{z^{(i)}},\Sigma_{z^{(i)}})\right] \cdot \frac{p_{z^{(i)}}^{(t)} \cdot \mathcal N(x^{(i)} \mid \mu_{z^{(i)}}^{(t)},\Sigma_{z^{(i)}}^{(t)})}{\sum_{k=1}^{\mathcal K} p_k^{(t)} \cdot \mathcal N(x^{(i)} \mid \mu_k^{(t)},\Sigma_k^{(t)})}$$
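The collapse from a sum over all $\mathcal K^N$ latent configurations to a per-sample double sum can be verified by brute force on a toy problem. All values below are assumptions ($N=3$, $\mathcal K=2$), and for simplicity the check evaluates both factors at the same parameters, i.e. $\theta = \theta^{(t)}$:

```python
import itertools
import numpy as np
from scipy.stats import norm

# Brute-force check that
#   sum over (z^(1),...,z^(N)) of [sum_i log P(x_i, z_i)] * prod_i P(z_i | x_i)
# equals the collapsed form
#   sum_i sum_{z^(i)} log P(x_i, z_i) * P(z_i | x_i).
x = np.array([0.5, -1.0, 2.0])                     # assumed samples, N = 3
p = np.array([0.4, 0.6])                           # p_k
mu = np.array([0.0, 2.0])                          # mu_k (1-D Gaussians)
sd = np.array([1.0, 1.0])                          # sqrt(Sigma_k)
N, K = len(x), len(p)

def log_joint(i, k):
    """log [ p_k * N(x^(i) | mu_k, Sigma_k) ]"""
    return np.log(p[k]) + norm.logpdf(x[i], mu[k], sd[k])

def post(i, k):
    """P(z^(i) = k | x^(i), theta)"""
    joint = p * norm.pdf(x[i], mu, sd)
    return joint[k] / joint.sum()

# Full expectation over all K^N latent configurations.
full = sum(sum(log_joint(i, z[i]) for i in range(N))
           * np.prod([post(i, z[i]) for i in range(N)])
           for z in itertools.product(range(K), repeat=N))

# Collapsed per-sample form derived above.
collapsed = sum(log_joint(i, k) * post(i, k)
                for i in range(N) for k in range(K))

print(full, collapsed)
```

The two numbers agree, which is exactly the "all other sums marginalize to 1" argument made above.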

This completes the E-step of solving the Gaussian mixture model with the EM algorithm; the next post covers the M-step.

P.S. The notation in this video lecture really is intricate; it rewards repeated thought.

Related reference: Machine Learning — Gaussian Mixture Model (3): EM solution, E-step
