Latent variable models
e.g. a mixture of Gaussians:
$p(x|z)$ is conditionally normal (one Gaussian per cluster)
$p(z)$ is the prior over $z$ (categorical here; with continuous latent variables it is typically a standard normal)
$z$ is a categorical discrete RV (1-3) corresponding to the clusters
$x$ is a continuous 2D RV whose mean and variance are given by $p(x|z)$
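As a concrete instance of the example above, a minimal sketch of the mixture density in numpy; the weights, means, and variances are made-up illustration values:

```python
import numpy as np

pi = np.array([0.5, 0.3, 0.2])                        # p(z): weights of the 3 clusters
mu = np.array([[0.0, 0.0], [3.0, 3.0], [-3.0, 2.0]])  # per-cluster 2D means
var = np.array([[1.0, 1.0], [0.5, 0.5], [2.0, 1.0]])  # per-cluster diagonal variances

def log_gauss(x, m, v):
    """Log-density of a diagonal Gaussian N(x; m, diag(v))."""
    return -0.5 * np.sum((x - m) ** 2 / v + np.log(2 * np.pi * v))

def log_p_x(x):
    """log p(x) = log sum_z p(z) p(x|z), computed stably via log-sum-exp."""
    logs = np.log(pi) + np.array([log_gauss(x, mu[k], var[k]) for k in range(3)])
    m = logs.max()
    return m + np.log(np.exp(logs - m).sum())

print(log_p_x(np.array([1.0, 1.0])))  # log-density of one point under the mixture
```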
(1) Reason we can't just maximize the likelihood directly:
Since each gradient step requires evaluating the integral (or sum) over every mixture element inside the log, $\nabla_\theta \log \int p_\theta(x \mid z)\, p(z)\, dz$, direct maximum likelihood is intractable in general, and the resulting log-of-a-sum objective has bad numerical properties.
Guess which group $z$ the data point $x_i$ is in, and pretend it’s the right one
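In equation form, this "guess and pretend" idea is the expected log-likelihood objective (written here under the assumption that we could sample from the true posterior $p(z \mid x_i)$):

$$\theta \leftarrow \arg\max_\theta \frac{1}{N} \sum_{i} \mathbb{E}_{z \sim p(z \mid x_i)}\big[\log p_\theta(x_i, z)\big]$$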
Law of Total Probability
integral over z → expectation
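Written out, the marginal likelihood is an expectation under the prior:

$$p_\theta(x) = \int p_\theta(x \mid z)\, p(z)\, dz = \mathbb{E}_{z \sim p(z)}\big[p_\theta(x \mid z)\big]$$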
We wish to maximize the log-likelihood $\log p_\theta(x_i)$. However, this is intractable in latent variable models, so we maximize the evidence lower bound (ELBO) instead
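Concretely, for a data point $x_i$ and an approximate posterior $q_i(z)$, Jensen's inequality gives the bound we maximize:

$$\log p_\theta(x_i) \ge \mathbb{E}_{z \sim q_i(z)}\big[\log p_\theta(x_i \mid z) + \log p(z)\big] + \mathcal{H}(q_i) = \mathcal{L}_i(p, q_i)$$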
sample $z$ from $q_i(z)$ and use the samples to estimate the gradient of the ELBO with respect to $\theta$
then update $q_i$ to maximize the ELBO
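A minimal sketch of the $\theta$-step in PyTorch; `decoder` (returning $\log p_\theta(x_i \mid z)$ per sample) and the Gaussian $q_i$ parameters are hypothetical stand-ins:

```python
import torch

def theta_step(x_i, q_mu_i, q_sigma_i, decoder, opt, n_samples=8):
    """One gradient ascent step on theta using samples z ~ q_i(z).

    The expectation is over q_i, which does not depend on theta, so
    grad_theta E_{z~q_i}[log p_theta(x_i|z)] = E_{z~q_i}[grad_theta log p_theta(x_i|z)].
    """
    with torch.no_grad():                 # z is just a fixed sample for the theta-step
        z = q_mu_i + q_sigma_i * torch.randn(n_samples, q_mu_i.shape[-1])
    loss = -decoder(x_i, z).mean()        # ascend the sampled reconstruction term
    opt.zero_grad()
    loss.backward()
    opt.step()
```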
Updating each $q_i$ (approximated by a Gaussian) is difficult since the number of parameters scales with the dataset size: a separate mean and variance per data point ⬇️
Solution: amortize by learning a single inference network $q_\phi(z|x)$ that outputs the parameters of the approximate posterior for any $x$
We will use a policy-gradient-style (score function) estimator to take the gradient of the expectation portion of the ELBO with respect to $\phi$
Problem: It has very high variance
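That estimator is $\nabla_\phi \mathbb{E}_{z \sim q_\phi}[r(x,z)] = \mathbb{E}_{z \sim q_\phi}\big[\nabla_\phi \log q_\phi(z \mid x)\, r(x,z)\big]$. A minimal sketch, assuming $q_\phi$ is a diagonal Gaussian with parameters `(mu, log_sigma)` and `r(x, z)` is a hypothetical stand-in for the term inside the ELBO expectation:

```python
import math
import torch

def log_q(z, mu, log_sigma):
    """log q_phi(z|x) for a diagonal Gaussian, summed over latent dimensions."""
    return (-0.5 * ((z - mu) / log_sigma.exp()) ** 2
            - log_sigma - 0.5 * math.log(2 * math.pi)).sum(-1)

def score_function_grad(x, mu, log_sigma, r, n_samples=64):
    """grad_phi E_{z~q}[r] = E_{z~q}[grad_phi log q_phi(z|x) * r(x,z)]."""
    with torch.no_grad():                  # samples and rewards are held fixed
        z = mu + log_sigma.exp() * torch.randn(n_samples, mu.shape[-1])
        rewards = r(x, z)                  # r enters only as a per-sample scalar weight
    surrogate = (log_q(z, mu, log_sigma) * rewards).mean()
    return torch.autograd.grad(surrogate, (mu, log_sigma))
```

Here `mu` and `log_sigma` must be tensors with `requires_grad=True`. Note that $r$ is never differentiated: it only rescales $\nabla_\phi \log q_\phi$, which is exactly why the variance is high.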
Solution: reparameterize. Write $z = \mu_\phi(x) + \varepsilon\, \sigma_\phi(x)$ with $\varepsilon \sim \mathcal{N}(0, I)$, so the expectation is taken over $\varepsilon$, which does not depend on $\phi$
Now the gradient actually uses the derivatives of $r$ with respect to $z$ (the chain rule passes through $z$), so we get a much lower-variance estimator
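For comparison, a minimal sketch of the reparameterized estimator under the same hypothetical setup; the difference is that `r` must now be differentiable in $z$ (e.g. built from torch ops):

```python
import torch

def reparam_grad(x, mu, log_sigma, r, n_samples=64):
    """grad_phi E_{eps~N(0,I)}[r(x, mu + sigma*eps)].

    The chain rule now passes through r via z, i.e. the estimator uses
    dr/dz directly, which is what lowers the variance.
    """
    eps = torch.randn(n_samples, mu.shape[-1])
    z = mu + log_sigma.exp() * eps        # differentiable in (mu, log_sigma)
    return torch.autograd.grad(r(x, z).mean(), (mu, log_sigma))
```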