Latent variable models
e.g. a mixture of Gaussians:
$p(x|z)$ is conditionally normal (one Gaussian per cluster)
$p(z)$ is the prior over $z$ (categorical here; with continuous latent variables it is typically a standard normal)
$z$ is a categorical discrete RV (1-3) corresponding to the clusters
$x$ is a continuous 2D RV whose mean and variance are given by $p(x|z)$
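As a concrete instance of the example above, a minimal sketch of the mixture density in numpy; the weights, means, and variances are made-up illustration values:

```python
import numpy as np

pi = np.array([0.5, 0.3, 0.2])                        # p(z): weights of the 3 clusters
mu = np.array([[0.0, 0.0], [3.0, 3.0], [-3.0, 2.0]])  # per-cluster 2D means
var = np.array([[1.0, 1.0], [0.5, 0.5], [2.0, 1.0]])  # per-cluster diagonal variances

def log_gauss(x, m, v):
    """Log-density of a diagonal Gaussian N(x; m, diag(v))."""
    return -0.5 * np.sum((x - m) ** 2 / v + np.log(2 * np.pi * v))

def log_p_x(x):
    """log p(x) = log sum_z p(z) p(x|z), computed stably via log-sum-exp."""
    logs = np.log(pi) + np.array([log_gauss(x, mu[k], var[k]) for k in range(3)])
    m = logs.max()
    return m + np.log(np.exp(logs - m).sum())

print(log_p_x(np.array([1.0, 1.0])))  # log-density of one point under the mixture
```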
(1) Reason we can't just maximize the likelihood directly:
Since each gradient step requires evaluating the integral (or sum) over every mixture element inside the log, $\nabla_\theta \log \int p_\theta(x \mid z)\, p(z)\, dz$, direct maximum likelihood is intractable in general, and the resulting log-of-a-sum objective has bad numerical properties.
Guess which group $z$ the data point $x_i$ is in, and pretend it’s the right one
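In equation form, this "guess and pretend" idea is the expected log-likelihood objective (written here under the assumption that we could sample from the true posterior $p(z \mid x_i)$):

$$\theta \leftarrow \arg\max_\theta \frac{1}{N} \sum_{i} \mathbb{E}_{z \sim p(z \mid x_i)}\big[\log p_\theta(x_i, z)\big]$$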
Law of Total Probability
integral over z → expectation
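Written out, the marginal likelihood is an expectation under the prior:

$$p_\theta(x) = \int p_\theta(x \mid z)\, p(z)\, dz = \mathbb{E}_{z \sim p(z)}\big[p_\theta(x \mid z)\big]$$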
We wish to maximize the log-likelihood $\log p_\theta(x_i)$. However, this is intractable in latent variable models, so we maximize the evidence lower bound (ELBO) instead
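Concretely, for a data point $x_i$ and an approximate posterior $q_i(z)$, Jensen's inequality gives the bound we maximize:

$$\log p_\theta(x_i) \ge \mathbb{E}_{z \sim q_i(z)}\big[\log p_\theta(x_i \mid z) + \log p(z)\big] + \mathcal{H}(q_i) = \mathcal{L}_i(p, q_i)$$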
sample $z$ from $q_i(z)$ and use the samples to estimate the gradient of the ELBO with respect to $\theta$
then update $q_i$ to maximize the ELBO
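A minimal sketch of the $\theta$-step in PyTorch; `decoder` (returning $\log p_\theta(x_i \mid z)$ per sample) and the Gaussian $q_i$ parameters are hypothetical stand-ins:

```python
import torch

def theta_step(x_i, q_mu_i, q_sigma_i, decoder, opt, n_samples=8):
    """One gradient ascent step on theta using samples z ~ q_i(z).

    The expectation is over q_i, which does not depend on theta, so
    grad_theta E_{z~q_i}[log p_theta(x_i|z)] = E_{z~q_i}[grad_theta log p_theta(x_i|z)].
    """
    with torch.no_grad():                 # z is just a fixed sample for the theta-step
        z = q_mu_i + q_sigma_i * torch.randn(n_samples, q_mu_i.shape[-1])
    loss = -decoder(x_i, z).mean()        # ascend the sampled reconstruction term
    opt.zero_grad()
    loss.backward()
    opt.step()
```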
Updating each $q_i$ (approximated by a Gaussian) is difficult since the number of parameters scales with the dataset size: a separate mean and variance per data point ⬇️
Solution: amortize by learning a single inference network $q_\phi(z|x)$ that outputs the parameters of the approximate posterior for any $x$
We will use a policy-gradient-style (score function) estimator to take the gradient of the expectation portion of the ELBO with respect to $\phi$
Problem: It has very high variance
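That estimator is $\nabla_\phi \mathbb{E}_{z \sim q_\phi}[r(x,z)] = \mathbb{E}_{z \sim q_\phi}\big[\nabla_\phi \log q_\phi(z \mid x)\, r(x,z)\big]$. A minimal sketch, assuming $q_\phi$ is a diagonal Gaussian with parameters `(mu, log_sigma)` and `r(x, z)` is a hypothetical stand-in for the term inside the ELBO expectation:

```python
import math
import torch

def log_q(z, mu, log_sigma):
    """log q_phi(z|x) for a diagonal Gaussian, summed over latent dimensions."""
    return (-0.5 * ((z - mu) / log_sigma.exp()) ** 2
            - log_sigma - 0.5 * math.log(2 * math.pi)).sum(-1)

def score_function_grad(x, mu, log_sigma, r, n_samples=64):
    """grad_phi E_{z~q}[r] = E_{z~q}[grad_phi log q_phi(z|x) * r(x,z)]."""
    with torch.no_grad():                  # samples and rewards are held fixed
        z = mu + log_sigma.exp() * torch.randn(n_samples, mu.shape[-1])
        rewards = r(x, z)                  # r enters only as a per-sample scalar weight
    surrogate = (log_q(z, mu, log_sigma) * rewards).mean()
    return torch.autograd.grad(surrogate, (mu, log_sigma))
```

Here `mu` and `log_sigma` must be tensors with `requires_grad=True`. Note that $r$ is never differentiated: it only rescales $\nabla_\phi \log q_\phi$, which is exactly why the variance is high.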
Solution: reparameterize. Write $z = \mu_\phi(x) + \varepsilon\, \sigma_\phi(x)$ with $\varepsilon \sim \mathcal{N}(0, I)$, so the expectation is taken over $\varepsilon$, which does not depend on $\phi$
Now the gradient actually uses the derivatives of $r$ with respect to $z$ (the chain rule passes through $z$), so we get a much lower-variance estimator
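For comparison, a minimal sketch of the reparameterized estimator under the same hypothetical setup; the difference is that `r` must now be differentiable in $z$ (e.g. built from torch ops):

```python
import torch

def reparam_grad(x, mu, log_sigma, r, n_samples=64):
    """grad_phi E_{eps~N(0,I)}[r(x, mu + sigma*eps)].

    The chain rule now passes through r via z, i.e. the estimator uses
    dr/dz directly, which is what lowers the variance.
    """
    eps = torch.randn(n_samples, mu.shape[-1])
    z = mu + log_sigma.exp() * eps        # differentiable in (mu, log_sigma)
    return torch.autograd.grad(r(x, z).mean(), (mu, log_sigma))
```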