These are my notes on *Amortized Inference and Learning in Latent Conditional Random Fields for Weakly-Supervised Semantic Image Segmentation*.

## Introduction

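CRFs are discriminative graphical models, typically used to label or analyze sequential data. Around 2014, CRFs were also applied to image segmentation. With the rise of deep learning, however, the CRF was gradually reduced to a post-processing tool for refining results. The usual approach is to take the pixel-level class probability distributions output by the segmentation network as the CRF's unary potentials, and then set the pairwise potentials following the method in *Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials*.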
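The unary side of that post-processing recipe can be sketched in a few lines. This is only an illustration under stated assumptions: the unary potential is taken to be the negative log-probability (a common convention, not something these notes specify), and `logits` is a made-up network output rather than real data.

```python
import numpy as np

def unary_from_softmax(probs, eps=1e-8):
    """Turn per-pixel class probabilities of shape (C, H, W) into unary energies.

    Assumption: unary energy = -log(probability), so a confident
    prediction gets a low energy.
    """
    probs = np.clip(probs, eps, 1.0)  # avoid log(0)
    return -np.log(probs)

# Hypothetical segmentation-network output: 3 classes on a 2x2 image.
logits = np.random.default_rng(0).normal(size=(3, 2, 2))
probs = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)  # softmax over classes
unary = unary_from_softmax(probs)
```

The lowest-energy class at each pixel is the most probable class under the network, so a CRF with only unary terms would reproduce the network's argmax prediction; the pairwise terms are what change the result.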

## The proposed model

| Symbol | Description | Note |
| --- | --- | --- |
| $x^{(i)}$ | The $i$-th image | $1\le i \le n$ |
| $y^{(i)}$ | Image-level label of the $i$-th image | Each $y^{(i)}$ is a boolean vector whose length equals the number of classes used for training. |
| $z^{(i)}=(z^{(i)}_j)$ | Pixel-level labels of the $i$-th image | $1\le j\le m$; each $z^{(i)}_j$ is one-hot encoded. |

$$p(y|x) = \sum_z p(z|x)p(y|z,x)$$

$$p(z|x) \propto \exp \left(-\sum_{j<j'} k(t_j, t_{j'})\,\mu (z_j, z_{j'})\right)$$
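To make the exponent concrete, here is a minimal sketch that evaluates $\sum_{j<j'} k(t_j, t_{j'})\,\mu(z_j, z_{j'})$ for a tiny example. Several things are assumptions not stated in these notes: $t_j$ is treated as a feature vector for pixel $j$ (for example position and colour), $k$ is a single Gaussian kernel with a hypothetical `bandwidth` parameter, and $\mu$ is the Potts compatibility (1 if the labels differ, 0 otherwise). Labels are stored as integers instead of one-hot vectors for brevity.

```python
import numpy as np

def gaussian_kernel(t, bandwidth=1.0):
    # Assumed form: k(t_j, t_j') = exp(-||t_j - t_j'||^2 / (2 * bandwidth^2))
    diff = t[:, None, :] - t[None, :, :]
    return np.exp(-(diff ** 2).sum(axis=-1) / (2.0 * bandwidth ** 2))

def pairwise_energy(t, z, bandwidth=1.0):
    """Sum over j < j' of k(t_j, t_j') * mu(z_j, z_j') with a Potts mu (assumed)."""
    kern = gaussian_kernel(t, bandwidth)
    mu = (z[:, None] != z[None, :]).astype(float)  # 1 where labels differ
    upper = np.triu(np.ones_like(kern), k=1)       # keep each pair j < j' once
    return float((kern * mu * upper).sum())

# Three "pixels": two close together, one far away.
t = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
z_smooth = np.array([0, 0, 1])  # nearby pixels share a label
z_noisy = np.array([0, 1, 0])   # nearby pixels disagree

e_smooth = pairwise_energy(t, z_smooth)
e_noisy = pairwise_energy(t, z_noisy)
```

Under these assumptions `e_smooth` is smaller than `e_noisy`, so the smooth labelling receives the higher unnormalized probability $\exp(-\text{energy})$, which is the smoothing behaviour the prior is meant to encode.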

### Variational Lower Bound

\begin{align}
\log p(y|x) & = \log \sum_z p(z|x)\, p(y|z,x) \\
& = \log \sum_z q(z|y,x) \frac{p(y|z,x)\,p(z|x)}{q(z|y,x)} \\
& \ge \sum_z q(z|y,x) \log \frac{p(y|z,x)\,p(z|x)}{q(z|y,x)} \\
& = -\mathrm{KL}\left(q(z|y,x)\,\|\,p(z|x)\right) + \mathbb{E}_{q(z|y,x)} \log p(y|z,x)
\end{align}
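The bound can be checked numerically on a toy latent space small enough to enumerate. The sketch below is not the paper's model: `prior`, `lik` and `q` are random stand-ins for $p(z|x)$, $p(y|z,x)$ and $q(z|y,x)$ over four hypothetical configurations of $z$, with $x$ and $y$ held fixed.

```python
import numpy as np

rng = np.random.default_rng(0)
n_configs = 4  # e.g. 2 pixels x 2 classes, enumerated explicitly

prior = rng.dirichlet(np.ones(n_configs))      # stand-in for p(z|x)
lik = rng.uniform(0.05, 0.95, size=n_configs)  # stand-in for p(y|z,x), fixed y
q = rng.dirichlet(np.ones(n_configs))          # arbitrary q(z|y,x)

def elbo(q, prior, lik):
    # -KL(q || prior) + E_q[log lik], the last line of the derivation
    kl = np.sum(q * (np.log(q) - np.log(prior)))
    return -kl + np.sum(q * np.log(lik))

log_evidence = np.log(np.sum(prior * lik))     # log p(y|x) by exact summation
posterior = prior * lik / np.sum(prior * lik)  # exact p(z|y,x)

gap_arbitrary = log_evidence - elbo(q, prior, lik)
gap_posterior = log_evidence - elbo(posterior, prior, lik)
```

`gap_arbitrary` is non-negative for any valid `q`, and `gap_posterior` vanishes up to floating-point error, because the inequality comes from Jensen's inequality and is tight exactly when $q(z|y,x)$ equals the true posterior $p(z|y,x)$.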