Stanford CS236: Deep Generative Models I 2023 I Lecture 17 - Discrete Latent Variable Models
Score-Based Diffusion Models (SDMs)
- SDMs are closely connected to denoising diffusion probabilistic models (DDPMs).
- DDPMs can be interpreted as a VAE where the encoder adds noise to the data and the decoder denoises it.
- Optimizing the evidence lower bound in DDPMs corresponds to learning a sequence of denoisers, similar to noise conditional score models.
- The continuous-time (diffusion) version of DDPMs considers a continuous spectrum of noise levels rather than a fixed, finite set, which allows for more efficient sampling and likelihood evaluation.
- The process of adding noise is described by a stochastic differential equation (SDE).
- The drift term in the SDE becomes important when reversing the direction of time.
- The reverse-time SDE has a drift term that is determined by the score of the corresponding perturbed data density at time t.
- Both the forward and reverse SDEs describe the same kind of trajectories, and the only difference is the direction of time.
- Score-based models can be used to learn generative models by estimating score functions using a neural network.
- The score-based MCMC method uses Langevin dynamics to generate samples from a density corresponding to a given time.
- Discretizing the time axis of the reverse SDE introduces numerical errors, which can be reduced by running additional Langevin dynamics steps as a corrector at each time step.
- Score-based models can be converted into flow models by removing the noise injected at every step, which yields a deterministic, infinitely deep, continuous-time normalizing flow.
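For reference, the forward SDE, its time reversal, and the deterministic flow mentioned above are commonly written as follows. The symbols f (drift), g (diffusion coefficient), w and \bar{w} (forward and reverse-time Brownian motion), and p_t (perturbed data density at time t) are notation assumed here; they do not appear in the summary itself.

```latex
\begin{aligned}
\text{forward SDE:}\quad & \mathrm{d}x = f(x,t)\,\mathrm{d}t + g(t)\,\mathrm{d}w \\
\text{reverse SDE:}\quad & \mathrm{d}x = \left[f(x,t) - g(t)^2\,\nabla_x \log p_t(x)\right]\mathrm{d}t + g(t)\,\mathrm{d}\bar{w} \\
\text{flow (ODE):}\quad & \frac{\mathrm{d}x}{\mathrm{d}t} = f(x,t) - \tfrac{1}{2}\,g(t)^2\,\nabla_x \log p_t(x)
\end{aligned}
```

In practice the unknown score \nabla_x \log p_t(x) is replaced by a neural network estimate trained by score matching.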
Efficient Sampling Techniques
- The sampling process in SDMs can be reinterpreted as solving an ODE, where the dynamics of the ODE are defined by the score function of the diffusion model.
- This perspective allows for leveraging techniques from numerical analysis and scientific computing to improve sampling efficiency and generate higher-quality samples.
- Consistency models are neural networks that directly output the solution of the ODE, enabling fast sampling procedures.
- Parallel-in-time methods can further accelerate the sampling process by leveraging multiple GPUs to compute the solution of the ODE in parallel.
- Distillation techniques can be used to train student models that can approximate the solution of the ODE in fewer steps, leading to even faster sampling.
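The ODE view of sampling can be illustrated with a minimal sketch. It assumes a toy 1-D setting that is not from the lecture: the data distribution is a Gaussian N(MU, S0^2), noise is added with variance growing linearly in time (so g(t) = 1 and there is no forward drift), and the exact score stands in for a trained score network. A plain Euler solver is used here; any more accurate ODE solver could be swapped in.

```python
import math

MU, S0 = 2.0, 0.5   # hypothetical 1-D Gaussian data distribution N(MU, S0^2)
T = 25.0            # final time; the perturbed density is p_t = N(MU, S0^2 + t)


def score(x, t):
    """Exact score of p_t; a stand-in for a trained score network."""
    return -(x - MU) / (S0 ** 2 + t)


def probability_flow_sample(x_T, n_steps=1000):
    """Integrate dx/dt = -0.5 * g(t)^2 * score(x, t) from t = T down to t = 0.

    Uses fixed-size Euler steps with g(t) = 1 (an assumption of this sketch).
    """
    dt = T / n_steps
    x, t = x_T, T
    for _ in range(n_steps):
        drift = -0.5 * score(x, t)
        x = x - drift * dt   # step backward in time
        t -= dt
    return x


def exact_solution(x_T):
    """Closed-form solution of the same ODE, available only in this toy case."""
    return MU + (x_T - MU) * S0 / math.sqrt(S0 ** 2 + T)


if __name__ == "__main__":
    x_T = 7.0
    print(probability_flow_sample(x_T), exact_solution(x_T))
```

Because the map from x_T to the final sample is deterministic, the same starting noise always yields the same output, which is what makes solver improvements, distillation, and consistency-style shortcuts possible.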
Stable Diffusion
- Stable Diffusion uses a latent diffusion model, which wraps the diffusion model with an extra encoder at the input and a decoder at the output.
- The diffusion model then operates on low-resolution images or low-dimensional latent representations, which makes training faster.
- Stable Diffusion pre-trains the outer encoder and decoder and then keeps them fixed while training the diffusion model over the latent space.
- To incorporate text into the model, a pre-trained language model is used to map the text to a vector representation, which is then fed into the neural network architecture.
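The data flow described above can be sketched in a few lines. Everything here is a hypothetical stand-in chosen for illustration: `encode`, `decode`, and `embed_text` are toy functions, not Stable Diffusion's actual components, and `make_training_example` only shows where the noise is added and what would be passed to the denoising network.

```python
import random


def encode(image):
    """Toy stand-in for the frozen, pre-trained encoder: average adjacent pairs."""
    return [(image[i] + image[i + 1]) / 2 for i in range(0, len(image) - 1, 2)]


def decode(latent):
    """Toy stand-in for the frozen decoder: repeat each latent value twice."""
    return [v for v in latent for _ in range(2)]


def embed_text(caption, dim=4):
    """Toy stand-in for a pre-trained text encoder: one fixed vector per caption."""
    rng = random.Random(caption)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def make_training_example(image, caption, sigma, rng):
    """Build one denoising training example in latent space."""
    z = encode(image)                        # the encoder stays fixed during diffusion training
    eps = [rng.gauss(0.0, 1.0) for _ in z]   # noise is added to the latent, not to the pixels
    z_noisy = [zi + sigma * ei for zi, ei in zip(z, eps)]
    cond = embed_text(caption)               # conditioning vector fed to the denoiser
    return z_noisy, cond, eps


if __name__ == "__main__":
    image = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 0.9, 0.7]
    z_noisy, cond, eps = make_training_example(image, "a cat", 0.5, random.Random(0))
    print(len(image), len(z_noisy), len(cond))
```

The point of the sketch is the dimensionality: the denoiser sees a latent half the size of the input, and the decoder is applied only once, after sampling has finished.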
Conditional Generation
- To control the generation process without training a different model, the prior distribution given by the generative model is combined with a classifier's likelihood through Bayes' rule, which gives the conditional distribution of images given a specific label.
- Computing the denominator of the posterior distribution is intractable, making it difficult to directly sample from the posterior.
- Working at the level of scores simplifies the computation of the posterior score, allowing for easy incorporation of pre-trained models and classifiers.
- By modifying the drift in the SDE or ODE to include the score of the classifier, one can steer the generative process towards images that are consistent with a desired class or caption.
- Classifier-free guidance is a technique that avoids explicit classifier training by taking the difference between the scores of two diffusion models, one conditioned on the side information and the other unconditional; this difference plays the role of the classifier's score.
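The reason scores make conditioning easy is that the intractable denominator p(y) does not depend on x, so its gradient vanishes: the posterior score is the prior score plus the classifier score. The sketch below checks this on a toy 1-D example that is an assumption of this note, not part of the lecture: the prior is an equal mixture of N(-2, 1) and N(2, 1), the label y = 1 selects the component centered at 2, and the exact classifier is p(y = 1 | x) = sigmoid(4x). Noise levels are ignored for simplicity, and the guidance weight `w` is a commonly used extra knob that the summary does not mention.

```python
import math


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def prior_score(x):
    """Score of the unconditional mixture 0.5*N(-2, 1) + 0.5*N(2, 1)."""
    return -x + 2.0 * math.tanh(2.0 * x)


def classifier_score(x):
    """d/dx log p(y=1 | x) for the exact classifier p(y=1 | x) = sigmoid(4x)."""
    return 4.0 * (1.0 - sigmoid(4.0 * x))


def classifier_guided_score(x):
    """Posterior score assembled from a prior score and a classifier score."""
    return prior_score(x) + classifier_score(x)


def conditional_score(x):
    """Score of p(x | y=1) = N(2, 1), known in closed form in this toy case."""
    return -(x - 2.0)


def classifier_free_score(x, w):
    """(1 + w) * conditional - w * unconditional; w is an assumed guidance weight."""
    return (1.0 + w) * conditional_score(x) - w * prior_score(x)


if __name__ == "__main__":
    for x in (-3.0, 0.0, 1.5):
        print(x, classifier_guided_score(x), conditional_score(x))
```

With `w = 0` the classifier-free form reduces to the conditional score; larger values of `w` push samples further toward the region the condition favors.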