# Stanford CS236: Deep Generative Models I 2023 I Lecture 17 - Discrete Latent Variable Models


## Score-Based Diffusion Models (SDMs)

• SDMs are closely connected to denoising diffusion probabilistic models (DDPMs).
• DDPMs can be interpreted as a VAE where the encoder adds noise to the data and the decoder denoises it.
• Optimizing the evidence lower bound in DDPMs corresponds to learning a sequence of denoisers, similar to noise conditional score models.
• The continuous-time (SDE) formulation of DDPMs considers a continuous spectrum of noise levels, allowing for more efficient sampling and likelihood evaluation.
• The process of adding noise is described by a stochastic differential equation (SDE).
• The drift term in the SDE becomes important when reversing the direction of time.
• The reverse SDE has a drift term involving the score of the perturbed data density at each time t.
• Both the forward and reverse SDEs describe the same kind of trajectories, and the only difference is the direction of time.
• Score-based models can be used to learn generative models by estimating score functions using a neural network.
• Samples from the perturbed density at a given time t can also be drawn with score-based MCMC, using Langevin dynamics driven by the estimated score.
• Discretizing the time axis of the reverse SDE introduces numerical errors, which can be reduced by interleaving Langevin dynamics corrector steps (predictor-corrector sampling).
• Score-based models can be converted into flow models by removing the noise injection at every step, yielding the probability flow ODE, an infinitely deep continuous-time normalizing flow model.
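As a concrete sketch of score-driven sampling, the toy example below runs unadjusted Langevin dynamics on a target whose score is known in closed form (a standard Gaussian). The analytic score is a stand-in for what would normally be a learned score network; all names here are illustrative, not from the lecture.

```python
import numpy as np

def langevin_sample(score, x0, step=0.01, n_steps=2000, rng=None):
    """Unadjusted Langevin dynamics: x <- x + step*score(x) + sqrt(2*step)*z."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        z = rng.standard_normal(x.shape)
        x = x + step * score(x) + np.sqrt(2 * step) * z
    return x

# Toy target: standard Gaussian, whose score is available analytically.
score = lambda x: -x               # grad log N(0, I) = -x
x0 = np.full(10_000, 5.0)          # start all particles far from the mode
samples = langevin_sample(score, x0)
print(samples.mean(), samples.std())  # both drift toward the target's 0 and 1
```

Only the score (the gradient of the log-density) is needed, never the normalized density itself, which is exactly why score estimation suffices for sampling.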

## Efficient Sampling Techniques

• The sampling process in SDMs can be reinterpreted as solving an ODE, where the dynamics of the ODE are defined by the score function of the diffusion model.
• This perspective allows for leveraging techniques from numerical analysis and scientific computing to improve sampling efficiency and generate higher-quality samples.
• Consistency models are neural networks that directly output the solution of the ODE, enabling fast sampling procedures.
• Parallel-in-time methods can further accelerate the sampling process by leveraging multiple GPUs to compute the solution of the ODE in parallel.
• Distillation techniques can be used to train student models that can approximate the solution of the ODE in fewer steps, leading to even faster sampling.
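The ODE reinterpretation can be made concrete in a setting where everything is analytic. Below, the forward SDE is simply dx = dw (so g(t) = 1 and the perturbed density at time t is N(0, 1 + t) for unit-Gaussian data), and the probability flow ODE dx/dt = -0.5 g(t)^2 * score is integrated backward with plain Euler steps; this is a minimal sketch, not the solvers used in practice.

```python
import numpy as np

def probability_flow_ode_sample(score, x_T, T=9.0, n_steps=1000):
    """Deterministic sampling: integrate dx/dt = -0.5 * g(t)^2 * score(x, t)
    backward in time with Euler steps (forward SDE dx = dw, so g(t) = 1)."""
    dt = T / n_steps
    x = np.array(x_T, dtype=float)
    t = T
    for _ in range(n_steps):
        x = x - dt * (-0.5 * score(x, t))   # one Euler step backward in time
        t -= dt
    return x

# Toy setup: data ~ N(0, 1); after time t of dx = dw, p_t = N(0, 1 + t),
# so the score is known in closed form.
score = lambda x, t: -x / (1.0 + t)
rng = np.random.default_rng(0)
T = 9.0
x_T = rng.standard_normal(50_000) * np.sqrt(1.0 + T)   # sample from the prior p_T
x_0 = probability_flow_ode_sample(score, x_T, T=T)
print(x_0.std())   # approaches the data standard deviation of 1
```

Because the trajectory is deterministic, a higher-order ODE solver (or a consistency model or distilled student that predicts the endpoint directly) can replace the many small Euler steps.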

## Stable Diffusion

• Stable Diffusion is a latent diffusion model: a pre-trained encoder maps images into a lower-dimensional latent space, and a decoder maps latents back to pixel space.
• This allows faster training, since the diffusion model operates on low-resolution latent representations rather than full-resolution pixels.
• Stable Diffusion pre-trains the outer autoencoder and then keeps it fixed while training the diffusion model over the latent space.
• To incorporate text into the model, a pre-trained language model is used to map the text to a vector representation, which is then fed into the neural network architecture.
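The latent-diffusion pipeline can be sketched with hypothetical stand-ins for the frozen autoencoder; here a 2x average-pool plays the encoder and nearest-neighbor upsampling the decoder (real systems use a learned VAE, and the shapes below are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x):
    """Hypothetical frozen encoder: 2x average-pool, (H, W) -> (H//2, W//2)."""
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))

def decode(z):
    """Hypothetical frozen decoder: nearest-neighbor upsample, (h, w) -> (2h, 2w)."""
    return np.repeat(np.repeat(z, 2, axis=0), 2, axis=1)

# The diffusion model only ever sees latents, which are 4x smaller here.
image = rng.standard_normal((64, 64))
z = encode(image)                                   # (32, 32): cheaper to diffuse
z_noised = z + 0.1 * rng.standard_normal(z.shape)   # one forward-noising step
# ... a denoiser network would be trained on (z_noised -> z) pairs,
#     with the text embedding fed in as extra conditioning ...
reconstruction = decode(z)                          # map latents back to pixels
print(z.shape, reconstruction.shape)
```

The key design choice is that the expensive diffusion training loop touches only the small latent tensors; the fixed encoder/decoder pair handles the translation to and from pixel space once.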

## Conditional Generation

• To control the generation process without training a different model, the prior distribution of the generative model is combined with a classifier's likelihood to sample from the conditional distribution of images given a specific label.
• Computing the denominator of the posterior distribution is intractable, making it difficult to directly sample from the posterior.
• Working at the level of scores simplifies the computation of the posterior score, allowing for easy incorporation of pre-trained models and classifiers.
• By modifying the drift in the SDE or ODE to include the score of the classifier, one can steer the generative process towards images that are consistent with a desired class or caption.
• Classifier-free guidance is a technique that avoids explicit classifier training by taking the difference of two diffusion models, one conditioned on side information and the other not.
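Both guidance schemes are simple arithmetic on scores, which the toy example below spells out with closed-form Gaussian stand-ins (in practice every score here would be a neural network; the function names are illustrative). Classifier guidance adds the classifier's log-likelihood gradient to the prior score, which is Bayes' rule at the level of scores: the intractable normalizing constant differentiates away. Classifier-free guidance instead extrapolates from the unconditional score toward the conditional one.

```python
import numpy as np

# Toy closed-form scores, purely to illustrate the arithmetic.
prior_score = lambda x, t: -x                # grad_x log p_t(x) for N(0, I)
classifier_grad = lambda x, t, y: y - x      # grad_x log p(y|x) for N(y; x, I)

def classifier_guided_score(x, t, y):
    """Posterior score via Bayes: grad log p(x|y) = grad log p(x) + grad log p(y|x)."""
    return prior_score(x, t) + classifier_grad(x, t, y)

def cfg_score(x, t, y, w=2.0):
    """Classifier-free guidance: extrapolate from the unconditional score
    toward the conditional one; w = 1 recovers the plain conditional model,
    w > 1 strengthens the conditioning."""
    s_uncond = prior_score(x, t)
    s_cond = prior_score(x, t) + classifier_grad(x, t, y)  # stand-in conditional score
    return s_uncond + w * (s_cond - s_uncond)

x, y = np.zeros(3), np.ones(3)
print(classifier_guided_score(x, 0.5, y))   # pulled toward the class mean y
print(cfg_score(x, 0.5, y, w=2.0))          # pulled harder, since w > 1
```

Either combined score can then be plugged into the drift of the reverse SDE or probability flow ODE to steer generation toward the desired class or caption.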