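- Introduction
- Review: the exponential family
- The link between the sufficient statistic and the model parameter, viewed through the probability density integral
- The link between the sufficient statistic and the model parameter, viewed through maximum likelihood estimation
- Summary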
 
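The introduction to the exponential family mentioned the concept of a sufficient statistic, and noted that once the sufficient statistic of an exponential-family distribution is known, the complete form of the distribution can be obtained from it. This section shows, from the viewpoints of the probability density integral and of maximum likelihood estimation, how the sufficient statistic $\phi(x)$ determines the model parameter $\eta$ of the distribution $P(x \mid \eta)$.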
Review: the exponential family

The general form of an exponential-family distribution is:

$$P(x \mid \eta) = h(x) e^{\eta^{T}\phi(x) - A(\eta)}$$

Here $\eta$ is the parameter of the probability model (distribution) $P(x \mid \eta)$; $\phi(x)$ is the sufficient statistic of the sample, which is simply a function of the sample $x$; and $A(\eta)$ is the log-partition function.
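To make the general form concrete, here is a minimal sketch that writes the Bernoulli distribution in exponential-family form. The choice of the Bernoulli distribution and the function names `bernoulli_pmf` and `bernoulli_expfam` are assumptions made for illustration; they do not come from the original text. For this distribution $h(x) = 1$, $\phi(x) = x$, $\eta = \log\frac{p}{1-p}$ and $A(\eta) = \log(1 + e^{\eta})$.

```python
import math

def bernoulli_pmf(x: int, p: float) -> float:
    """Standard Bernoulli pmf: p if x == 1, otherwise 1 - p."""
    return p if x == 1 else 1.0 - p

def bernoulli_expfam(x: int, eta: float) -> float:
    """The same pmf written as h(x) * exp(eta * phi(x) - A(eta))."""
    h = 1.0                                # base measure h(x)
    phi = float(x)                         # sufficient statistic phi(x) = x
    A = math.log(1.0 + math.exp(eta))      # log-partition function A(eta)
    return h * math.exp(eta * phi - A)

p = 0.3
eta = math.log(p / (1.0 - p))  # natural parameter: the log-odds of p
for x in (0, 1):
    # The two columns should agree up to floating-point rounding.
    print(x, bernoulli_pmf(x, p), bernoulli_expfam(x, eta))
```

Both parameterizations describe the same distribution; the exponential-family form simply makes $\phi(x)$ and $A(\eta)$ explicit.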
The link between the sufficient statistic and the model parameter, viewed through the probability density integral

Looking at the general form, $P(x \mid \eta)$ is a probability distribution over the sample $x$, so its density integrates to 1:

$$\int_{x} P(x \mid \eta) dx = 1$$

Substituting the general form of the exponential family gives:

$$\begin{aligned} \int_{x} h(x) e^{\eta^{T}\phi(x) - A(\eta)}dx & = 1 \\ \int_{x} \frac{h(x) e^{\eta^{T}\phi(x)}}{e^{A(\eta)}}dx & = 1 \end{aligned}$$

Because $e^{A(\eta)}$ does not contain $x$, it can be taken out of the integral:

$$\begin{aligned} \frac{\int_{x}h(x) e^{\eta^{T}\phi(x)} dx}{e^{A(\eta)}} & = 1 \\ e^{A(\eta)} & = \int_{x}h(x)e^{\eta^{T}\phi(x)}dx \end{aligned}$$
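The identity $e^{A(\eta)} = \int_{x}h(x)e^{\eta^{T}\phi(x)}dx$ can be checked numerically. The sketch below assumes the exponential distribution $p(x \mid \lambda) = \lambda e^{-\lambda x}$ on $x \ge 0$ as an example (not taken from the original text), for which $\eta = -\lambda$, $h(x) = 1$, $\phi(x) = x$ and $A(\eta) = -\log(-\eta)$. The helper `integrate` is a hypothetical midpoint-rule routine written for this illustration.

```python
import math

def integrate(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / n
    return width * sum(f(a + (k + 0.5) * width) for k in range(n))

eta = -2.0              # natural parameter, eta = -lambda < 0
A = -math.log(-eta)     # log-partition function of the exponential distribution

# Right-hand side: integral of h(x) * exp(eta * phi(x)) with h(x) = 1, phi(x) = x.
# The upper limit 40 truncates the infinite range; the tail beyond it is negligible here.
rhs = integrate(lambda x: math.exp(eta * x), 0.0, 40.0)
lhs = math.exp(A)
print(lhs, rhs)  # the two values should agree closely
```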
We now use this identity to examine the relationship between the log-partition function $A(\eta)$ and the sufficient statistic $\phi(x)$. Differentiate both sides of $e^{A(\eta)} = \int_{x}h(x)e^{\eta^{T}\phi(x)}dx$ with respect to $\eta$:
- Left-hand side, by the chain rule:
  $$\frac{\partial e^{A(\eta)}}{\partial \eta} = e^{A(\eta)}\cdot A'(\eta)$$
- Right-hand side: by the Leibniz integral rule (differentiation under the integral sign, assuming the interchange is valid), the partial derivative can be moved inside the integral; within the integrand only $\eta^{T}\phi(x)$ depends on $\eta$:
  $$\frac{\partial \int_{x}h(x)e^{\eta^{T}\phi(x)}dx}{\partial \eta} = \int_{x}h(x)e^{\eta^{T}\phi(x)}\cdot\phi(x)dx$$

Equating the two sides gives:

$$\begin{aligned} e^{A(\eta)}\cdot A'(\eta) & = \int_{x}h(x)e^{\eta^{T}\phi(x)}\cdot\phi(x)dx \\ A'(\eta) & = \frac{\int_{x}h(x)e^{\eta^{T}\phi(x)}\cdot\phi(x)dx}{e^{A(\eta)}} \end{aligned}$$
Since $e^{A(\eta)}$ does not depend on $x$, the factor $\frac{1}{e^{A(\eta)}}$ is a constant with respect to the integral over $x$ and can be moved inside the integral sign:

$$\begin{aligned} A'(\eta) & = \int_{x} \frac{1}{e^{A(\eta)}}\cdot h(x)e^{\eta^{T}\phi(x)}\cdot\phi(x)dx \\ & = \int_{x} h(x) e^{\eta^{T}\phi(x) - A(\eta)}\cdot\phi(x)dx \end{aligned}$$
In this expression, the factor $h(x) e^{\eta^{T}\phi(x) - A(\eta)}$ inside the integral is exactly the general form of the distribution $P(x \mid \eta)$. Substituting $P(x \mid \eta)$ for it:

$$A'(\eta) = \int_{x} P(x \mid \eta)\cdot \phi(x) dx$$

which can be written as an expectation:

$$A'(\eta) = \mathbb E_{P(x\mid \eta)}[\phi(x)]$$
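The result $A'(\eta) = \mathbb E_{P(x \mid \eta)}[\phi(x)]$ can also be checked numerically. The sketch below assumes the Poisson distribution as an example (not taken from the original text), with $h(x) = \frac{1}{x!}$, $\phi(x) = x$, $\eta = \log\lambda$ and $A(\eta) = e^{\eta}$. Because $x$ is discrete, a sum plays the role of the integral. It compares a finite-difference estimate of $A'(\eta)$ with the directly computed expectation.

```python
import math

def A(eta):
    """Log-partition function of the Poisson distribution: A(eta) = exp(eta)."""
    return math.exp(eta)

def poisson_pmf(x, eta):
    """Poisson pmf in exponential-family form: (1/x!) * exp(eta * x - A(eta))."""
    return math.exp(eta * x - A(eta) - math.lgamma(x + 1))

eta = math.log(3.5)   # natural parameter, eta = log(lambda)

# Left-hand side: central finite difference approximating A'(eta).
step = 1e-5
dA = (A(eta + step) - A(eta - step)) / (2 * step)

# Right-hand side: E[phi(x)] with phi(x) = x, truncating the infinite sum at 200 terms.
mean_phi = sum(x * poisson_pmf(x, eta) for x in range(200))
print(dA, mean_phi)  # both should be close to lambda = 3.5
```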
At this point we have found the relationship between the first derivative of the log-partition function and the sufficient statistic. In fact, this already gives the link between the model parameter $\eta$ of the probability model $P(x \mid \eta)$ and the sufficient statistic $\phi(x)$. Writing $A'^{(-1)}$ for the inverse function of $A'(\eta)$ (assuming it exists), applying it to both sides of $A'(\eta) = \mathbb E_{P(x \mid \eta)}[\phi(x)]$ gives:

$$\eta = A'^{(-1)}\left(\mathbb E_{P(x \mid \eta)}[\phi(x)]\right)$$
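As a worked illustration of inverting $A'$, consider the Bernoulli distribution. This example is an assumption added for clarity and is not part of the original derivation; $\mu$ denotes the mean $\mathbb E_{P(x \mid \eta)}[x]$.

$$
\begin{aligned}
P(x \mid \eta) &= e^{\eta x - \log(1 + e^{\eta})}, \quad x \in \{0, 1\}, \quad h(x) = 1,\ \phi(x) = x,\ A(\eta) = \log(1 + e^{\eta}) \\
A'(\eta) &= \frac{e^{\eta}}{1 + e^{\eta}} = \mathbb E_{P(x \mid \eta)}[x] = \mu \\
\eta &= A'^{(-1)}(\mu) = \log\frac{\mu}{1 - \mu}
\end{aligned}
$$

Here $A'$ is the logistic sigmoid and its inverse is the logit function, so knowing the expectation of the sufficient statistic fixes $\eta$.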
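Next, from the viewpoint of maximum likelihood estimation on a sample, we examine the link between the likelihood-maximizing model parameter $\eta_{MLE}$ and the sufficient statistic $\phi(x)$.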
The link between the sufficient statistic and the model parameter, viewed through maximum likelihood estimation

- Notation: suppose the dataset $\mathcal X$ contains $N$ samples: $\mathcal X = \{x^{(1)},x^{(2)},\cdots,x^{(N)}\}$
By the definition of maximum likelihood estimation, the optimal model parameter $\eta_{MLE}$ is given below. Here $P$ denotes a probability distribution and $p$ a probability density function, and the samples are assumed independent and identically distributed, so the joint likelihood factorizes into a product:

$$\begin{aligned} \eta_{MLE} & = \mathop{\arg\max}\limits_{\eta} \log P(\mathcal X \mid \eta) \\ & = \mathop{\arg\max}\limits_{\eta} \log \prod_{x^{(i)} \in \mathcal X} p(x^{(i)} \mid \eta) \\ & = \mathop{\arg\max}\limits_{\eta} \sum_{x^{(i)} \in \mathcal X} \log p(x^{(i)} \mid \eta) \end{aligned}$$
Substituting the general form of the exponential family:

$$\mathop{\arg\max}\limits_{\eta} \sum_{x^{(i)} \in \mathcal X}\log \left[h(x^{(i)}) e^{\eta^{T} \phi(x^{(i)}) -A(\eta)}\right]$$

Expanding the expression by taking the $\log$ inside:

$$\mathop{\arg\max}\limits_{\eta} \sum_{x^{(i)} \in \mathcal X}\left[\log h(x^{(i)}) + \eta^{T}\phi(x^{(i)}) - A(\eta)\right]$$

We are optimizing over $\eta$, and $\log h(x^{(i)})$ does not depend on $\eta$, so it can be dropped without changing the maximizer. The objective simplifies to:

$$\eta_{MLE} =\mathop{\arg\max}\limits_{\eta} \sum_{x^{(i)} \in \mathcal X}\left[\eta^{T}\phi(x^{(i)}) - A(\eta)\right]$$
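The simplified objective can be evaluated directly. The sketch below assumes a Poisson model ($\phi(x) = x$, $A(\eta) = e^{\eta}$) and a small made-up dataset, neither of which comes from the original text. It locates the maximizer by a coarse grid search, purely to illustrate that the objective depends on the data only through $\phi(x^{(i)})$.

```python
import math

samples = [2, 4, 3, 5, 1, 3, 4, 2]   # hypothetical count data

def objective(eta):
    """Sum over samples of eta * phi(x_i) - A(eta), with phi(x) = x and A(eta) = exp(eta)."""
    return sum(eta * x - math.exp(eta) for x in samples)

# Coarse grid search over eta in [0, 2] with step 0.001.
grid = [k / 1000 for k in range(2001)]
best_eta = max(grid, key=objective)
print(best_eta, math.log(sum(samples) / len(samples)))  # grid maximizer vs. log of the sample mean
```

A grid search is used here only because it is easy to read; the derivation that follows gives the maximizer in closed form.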
To solve for the optimum $\eta_{MLE}$, differentiate the objective with respect to $\eta$. Because the sum is finite, the derivative can be taken term by term (linearity of differentiation), and the derivative of $\eta^{T}\phi(x^{(i)})$ with respect to $\eta$ is $\phi(x^{(i)})$:

$$\begin{aligned} \frac{\partial \sum_{x^{(i)} \in \mathcal X}\left[\eta^{T}\phi(x^{(i)}) - A(\eta)\right]}{\partial \eta} & = \sum_{x^{(i)} \in \mathcal X} \frac{\partial [\eta^{T}\phi(x^{(i)}) - A(\eta)]}{\partial \eta} \\ & = \sum_{x^{(i)} \in \mathcal X}\phi(x^{(i)}) - \sum_{x^{(i)} \in \mathcal X}A'(\eta) \end{aligned}$$
Since $A'(\eta)$ does not depend on $i$, summing it over the $N$ samples gives $N \cdot A'(\eta)$, and the derivative becomes:

$$\sum_{x^{(i)} \in \mathcal X}\phi(x^{(i)}) - N\cdot A'(\eta)$$
Setting $\frac{\partial \sum_{x^{(i)} \in \mathcal X}\left[\eta^{T}\phi(x^{(i)}) - A(\eta)\right]}{\partial \eta} \triangleq 0$ gives:

$$\begin{aligned} A'(\eta_{MLE}) & = \frac{1}{N}\sum_{x^{(i)} \in \mathcal X} \phi(x^{(i)}) \\ \eta_{MLE} & = A'^{(-1)}\left(\frac{1}{N}\sum_{x^{(i)} \in \mathcal X} \phi(x^{(i)})\right) \end{aligned}$$

where $A'^{(-1)}$ denotes the inverse function of $A'(\eta)$. In other words, $\eta_{MLE}$ is the parameter at which the expected sufficient statistic $A'(\eta)$ matches its sample average.
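The last step, inverting $A'$ at the sample average of the sufficient statistic, can be sketched numerically. The code below assumes a Poisson model ($\phi(x) = x$, $A'(\eta) = e^{\eta}$) and a small made-up dataset; both are illustrative assumptions, and `solve_eta` is a hypothetical helper name. It inverts $A'$ by bisection, which relies on $A'$ being increasing; this holds because $A''(\eta)$ is the variance of $\phi(x)$ and so is non-negative.

```python
import math

def solve_eta(a_prime, target, lo=-20.0, hi=20.0, iters=200):
    """Invert an increasing function a_prime by bisection: find eta with a_prime(eta) = target."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if a_prime(mid) < target:
            lo = mid      # root lies to the right of mid
        else:
            hi = mid      # root lies at or to the left of mid
    return 0.5 * (lo + hi)

samples = [2, 4, 3, 5, 1, 3, 4, 2]        # hypothetical count data
mean_phi = sum(samples) / len(samples)    # (1/N) * sum of phi(x_i), with phi(x) = x

# Poisson: A(eta) = exp(eta), so A'(eta) = exp(eta).
eta_mle = solve_eta(math.exp, mean_phi)
print(eta_mle, math.log(mean_phi))        # numeric inverse vs. closed form log(mean)
```

For the Poisson case the inverse is available in closed form, $\eta_{MLE} = \log\left(\frac{1}{N}\sum_{i} x^{(i)}\right)$; the bisection routine is included only to show the approach when no closed form is at hand.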
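Summary

Whether we examine the relationship between $A'(\eta)$ and $\phi(x)$ directly through the probability density integral, or solve for the optimal model parameter $\eta_{MLE}$ through maximum likelihood estimation, the key ingredient in solving for $\eta$ is the sufficient statistic.
This further confirms that, for an exponential-family distribution, knowing the sufficient statistic is enough to estimate the distribution completely.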
Related reference: 机器学习-白板推导系列(八)-指数族分布 (Exponential Family Distribution)

 
                 
    