Stanford CS236: Deep Generative Models I 2023 I Lecture 13 - Score Based Models

Score-based models

  • Score-based models, also known as diffusion models, are a state-of-the-art class of generative models for continuous data modalities like images, videos, speech, and audio.
  • Unlike likelihood-based models that work with probability density functions, score-based models focus on the gradient of the log density, known as the score function.
  • Score-based models offer an alternative interpretation of probability distributions by representing them as vector fields or gradients, which can be computationally advantageous.
  • Score-based models address the challenge of normalization by modeling data using the score instead of the density, allowing for more flexible parameterizations without strict normalization constraints.
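
To make the normalization point concrete, here is a short derivation. It assumes the model is written as an energy-based density with an unnormalized log-density f_θ and a partition function Z(θ); both symbols are introduced here for illustration and do not come from the summary above. Because Z(θ) does not depend on x, it drops out of the score:

```latex
p_\theta(x) = \frac{\exp\big(f_\theta(x)\big)}{Z(\theta)}, \qquad
s_\theta(x) = \nabla_x \log p_\theta(x)
            = \nabla_x f_\theta(x) - \nabla_x \log Z(\theta)
            = \nabla_x f_\theta(x).
```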

Score matching

  • Score matching is a technique used to train energy-based models by fitting the model's score function to match the score function of the data distribution.
  • The Fisher divergence measures the difference between two probability distributions by comparing their score functions. It can be rewritten so that it depends only on the model's score, enabling efficient optimization without computing the partition function.
  • Score matching can be applied to a wide range of model families beyond energy-based models, as long as the gradient of the log density with respect to the input can be computed.
  • Score matching directly models the gradients (scores) rather than the likelihood, and it does not involve a normalization constant or latent variables.
  • In the literature, the gradient of the log density is called the "score" (the term also appears in the Fisher score), hence the name "score matching."
  • Score matching aims to estimate the gradient of the log data density in order to model the data.
  • The Fisher divergence is used to measure the difference between the true and estimated vector fields of gradients.
  • Minimizing the Fisher divergence with respect to the model parameters θ is a reasonable learning objective.
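
As a minimal sketch of the objective, the example below uses the integration-by-parts form of the Fisher divergence, E[0.5 * s(x)^2 + ds/dx] in one dimension, which needs only the model's score and its derivative. The one-dimensional Gaussian model, the function name `score_matching_loss`, and the synthetic data are assumptions chosen for illustration.

```python
import numpy as np

def score_matching_loss(x, mu, sigma):
    """Score matching objective for a 1-D Gaussian model N(mu, sigma^2)."""
    s = -(x - mu) / sigma**2      # model score: d/dx log p_theta(x)
    ds_dx = -1.0 / sigma**2       # derivative of the score (1-D trace of the Jacobian)
    return np.mean(0.5 * s**2 + ds_dx)

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=10_000)   # synthetic data (assumption)

# For this model the objective is minimized by the sample mean and the
# (biased) sample standard deviation, so nearby parameters score worse.
mu_hat, sigma_hat = x.mean(), x.std()
loss_at_fit = score_matching_loss(x, mu_hat, sigma_hat)
loss_off_mean = score_matching_loss(x, mu_hat + 1.0, sigma_hat)
loss_off_scale = score_matching_loss(x, mu_hat, 2.0 * sigma_hat)
print(loss_at_fit, loss_off_mean, loss_off_scale)
```

Note that the loss never evaluates a normalization constant or the unknown data score; it only uses samples and the model's own score.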

Denoising score matching

  • Denoising score matching addresses the computational challenges of score matching by estimating the score of the data distribution after it has been perturbed with noise.
  • Denoising score matching is computationally more efficient than exact score matching, and it is a good approximation when the noise level is relatively small.
  • The speaker introduces a method to approximate the score of the data density perturbed with noise, denoted q_σ.
  • The approximation replaces the Fisher divergence between the model and the data with the Fisher divergence between the model and the noise-perturbed data density.
  • The key idea is that when the noise level σ is small, the noise-perturbed data density q_σ is close to the original data density, so the estimated scores are similar.
  • The resulting algorithm involves sampling data points, adding Gaussian noise, and estimating the denoising score matching loss based on the mini-batch.
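
The steps above (sample a mini-batch, add Gaussian noise, evaluate the loss) can be sketched as follows. This is an illustrative NumPy version rather than the lecture's implementation: the helper `dsm_loss`, the one-parameter model s(x) = -a * x, and the standard-normal toy data are all assumptions.

```python
import numpy as np

def dsm_loss(score_fn, x, sigma, rng):
    """Mini-batch estimate of the denoising score matching loss.

    x has shape (n, d); score_fn maps an (n, d) array to an (n, d) array.
    """
    z = rng.standard_normal(x.shape)       # Gaussian noise
    x_tilde = x + sigma * z                # noise-perturbed data
    target = -(x_tilde - x) / sigma**2     # score of N(x_tilde; x, sigma^2 I)
    diff = score_fn(x_tilde) - target
    return 0.5 * np.mean(np.sum(diff**2, axis=1))

# Toy check (assumption): data ~ N(0, 1), so the perturbed density is
# N(0, 1 + sigma^2) and its true score is -x / (1 + sigma^2).
sigma = 0.5
x = np.random.default_rng(0).standard_normal((100_000, 1))
a_true = 1.0 / (1.0 + sigma**2)

def loss_for_slope(a):
    # Hypothetical model s(x) = -a * x. The noise seed is fixed so every
    # slope is evaluated on the same perturbations.
    return dsm_loss(lambda u: -a * u, x, sigma, np.random.default_rng(1))

loss_opt = loss_for_slope(a_true)
loss_low = loss_for_slope(a_true - 0.3)
loss_high = loss_for_slope(a_true + 0.3)
print(loss_opt, loss_low, loss_high)
```

In this toy setting the loss is lowest near the slope of the perturbed density's true score, not the clean data's score, which is the approximation the bullets describe.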

Denoising score matching (continued)

  • Denoising score matching is a technique for training score-based generative models.
  • It adds noise to data points and trains a model to estimate the noise that was added.
  • The goal is to minimize the loss between the estimated noise and the actual noise.
  • This approach is scalable and easier to implement than directly modeling the distribution of clean data.
  • Minimizing the denoising objective is equivalent to minimizing the Fisher divergence to the noise-perturbed data density, up to a constant that does not depend on the model parameters.
  • The optimal denoising strategy follows the gradient of the log of the noise-perturbed density.
  • The technique applies to various noise distributions, as long as the gradient of the log noise density can be computed.
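
The link between scores, noise estimates, and denoising can be illustrated with a small example. For Gaussian noise, the posterior mean of the clean point is the noisy point plus σ² times the perturbed score (often called Tweedie's formula), and the matching noise estimate is -σ times that score. The toy data distribution N(0, 1) and the closed-form `perturbed_score` below are assumptions that stand in for a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5
x = rng.standard_normal(100_000)     # clean data, assumed N(0, 1)
z = rng.standard_normal(100_000)     # the noise that was actually added
x_tilde = x + sigma * z              # noisy data ~ N(0, 1 + sigma^2)

def perturbed_score(u):
    # Score of N(0, 1 + sigma^2); known in closed form for this toy case.
    return -u / (1.0 + sigma**2)

# Denoise by taking one step along the score, scaled by sigma^2.
denoised = x_tilde + sigma**2 * perturbed_score(x_tilde)

# Equivalent noise estimate: eps_hat = -sigma * score.
eps_hat = -sigma * perturbed_score(x_tilde)

mse_noisy = np.mean((x_tilde - x) ** 2)
mse_denoised = np.mean((denoised - x) ** 2)
print(mse_noisy, mse_denoised)
```

Subtracting `sigma * eps_hat` from the noisy point gives the same denoised value, which is why predicting the noise and predicting the score are interchangeable parameterizations in this Gaussian setting.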

Sliced score matching

  • Random projections can be used to efficiently approximate the original score matching loss.
  • The sliced Fisher divergence is a variant of the Fisher divergence that involves Jacobian-vector products, which can be computed efficiently using backpropagation.
  • The projection is a dot product: the data and model gradients are each projected along a random direction and compared along that direction.
  • Biasing the projections towards certain directions does not seem to make a significant difference in practice.
  • The cost of sliced score matching is constant with respect to the data dimension, and it performs similarly to exact score matching.
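
A minimal sketch of the sliced objective follows, assuming a hypothetical linear score model s(x) = -A x whose Jacobian-vector product is available in closed form. In a neural network the same product would come from automatic differentiation; here it is written by hand so the example stays self-contained. The sliced loss averages v^T J v + 0.5 * (v^T s(x))^2 over random directions v and is compared with the exact loss 0.5 * ||s(x)||^2 + tr(J).

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 50_000
A = np.diag(np.linspace(0.5, 2.0, d))   # hypothetical model parameter
x = rng.standard_normal((n, d))         # toy data (assumption)

def score(x):
    # Model score s(x) = -A x, evaluated row by row.
    return -x @ A.T

def jvp(x, v):
    # The Jacobian of s is -A everywhere, so J v = -A v (x is unused here).
    return -v @ A.T

# Exact score matching loss: needs the full trace of the Jacobian.
exact = np.mean(0.5 * np.sum(score(x) ** 2, axis=1)) + np.trace(-A)

# Sliced loss: one random direction per data point, one JVP each.
v = rng.standard_normal((n, d))
vjv = np.sum(v * jvp(x, v), axis=1)     # v^T J v
vs = np.sum(v * score(x), axis=1)       # v^T s(x)
sliced = np.mean(vjv + 0.5 * vs**2)
print(exact, sliced)
```

The two estimates agree up to Monte Carlo noise because the random directions satisfy E[v v^T] = I, while the sliced version never forms the d-by-d Jacobian.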

Inference in diffusion models

  • Inference in diffusion models can be done by following the gradient of the log probability density or using Markov Chain Monte Carlo (MCMC) methods.
  • Langevin dynamics sampling is a method for generating samples from a density using the estimated gradient.
  • Langevin dynamics sampling is a valid MCMC procedure in the limit of small step sizes and an infinite number of steps.
  • Real-world data tends to lie on low-dimensional manifolds, which can cause problems for Langevin dynamics sampling.
  • Diffusion models provide a way to fix this problem by estimating these scores more accurately all over the space and getting better guidance.
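
The sampling procedure described above (Langevin dynamics) can be sketched as repeated noisy gradient steps, x ← x + ε·s(x) + sqrt(2ε)·z. The example below is illustrative only: it uses the exact score of a one-dimensional Gaussian in place of a learned score, and the function name, step size, and number of steps are assumptions.

```python
import numpy as np

def langevin_sample(score_fn, x0, step_size, n_steps, rng):
    """Unadjusted Langevin dynamics with a fixed step size."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        z = rng.standard_normal(x.shape)
        # Follow the score, then inject fresh Gaussian noise.
        x = x + step_size * score_fn(x) + np.sqrt(2.0 * step_size) * z
    return x

# Target density (assumption): N(mu, sd^2), whose score is -(x - mu) / sd^2.
mu, sd = 3.0, 2.0

def score(u):
    return -(u - mu) / sd**2

rng = np.random.default_rng(0)
x0 = rng.uniform(-10.0, 10.0, size=5_000)   # arbitrary starting points
samples = langevin_sample(score, x0, step_size=0.05, n_steps=2_000, rng=rng)
print(samples.mean(), samples.std())
```

With a finite step size the chain is only approximately correct, which matches the caveat that the procedure is valid in the limit of small steps and many iterations.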
