# Stanford CS236: Deep Generative Models I 2023 I Lecture 13 - Score Based Models

06 May 2024

## Score-based models

• Score-based models, also known as diffusion models, are a state-of-the-art class of generative models for continuous data modalities like images, videos, speech, and audio.
• Unlike likelihood-based models that work with probability density functions, score-based models focus on the gradient of the log density, known as the score function.
• Score-based models offer an alternative interpretation of probability distributions by representing them as vector fields or gradients, which can be computationally advantageous.
• Score-based models address the challenge of normalization by modeling data using the score instead of the density, allowing for more flexible parameterizations without strict normalization constraints.

## Score matching

• Score matching is a technique used to train energy-based models by fitting the model's score function to match the score function of the data distribution.
• The Fisher divergence, which compares two distributions through their score functions, can be rewritten (via integration by parts) so the unknown data score is not needed, enabling optimization without computing the partition function.
• Score matching can be applied to a wide range of model families beyond energy-based models, as long as the gradient of the log density with respect to the input can be computed.
• Score matching directly models the gradients (scores) rather than the likelihood, and it does not involve a normalization constant or latent variables.
• The term "score" comes from statistics, where the Fisher score denotes the gradient of the log density (there, with respect to the parameters; here, with respect to the input), hence the name "score matching."
• Score matching aims to estimate the gradient of the data distribution to model the data.
• The Fisher divergence is used to measure the difference between the true and estimated vector fields of gradients.
• Minimizing the Fisher divergence as a function of theta is a reasonable learning objective.
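A toy sketch (not from the lecture) of the integration-by-parts form of the score matching objective, E[½‖s_θ(x)‖² + tr ∇ₓ s_θ(x)], for a unit-variance 1D Gaussian model whose score is analytic; minimizing it recovers the data mean:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=5000)  # samples from N(2, 1)

# Model: unit-variance Gaussian with unknown mean theta.
# Its score is s_theta(x) = -(x - theta), with input derivative s'(x) = -1.
def score_matching_loss(theta, x):
    s = -(x - theta)
    ds_dx = -1.0  # trace of the Jacobian (a scalar in 1D)
    # Hyvarinen's objective: E[ 0.5 * s(x)^2 + s'(x) ]
    return np.mean(0.5 * s**2 + ds_dx)

# Minimize over theta by gradient descent.
# d/dtheta of E[0.5 * (x - theta)^2 - 1] is theta - E[x].
theta = 0.0
for _ in range(200):
    grad = theta - data.mean()
    theta -= 0.1 * grad

# The minimizer matches the data mean, without ever needing a
# normalization constant for the model density.
```

Note the trick: the data score never appears; only the model's score and its derivative with respect to the input are evaluated at data samples.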

## Denoising score matching

• Denoising score matching is an approach to address the computational challenges of score matching by estimating the gradient of data perturbed with noise.
• Denoising score matching avoids computing second derivatives, making it computationally more efficient; when the noise level is small, it approximates the original objective well.
• The speaker introduces a method to approximate the score of the data density perturbed with noise, denoted q_σ.
• This approximation is achieved by replacing the Fisher divergence between the model and the data with the Fisher divergence between the model and the noise-perturbed data density q_σ.
• The key idea is that when the noise level σ is small, the noise-perturbed density q_σ is close to the original data density, so the estimated scores are similar.
• The resulting algorithm involves sampling data points, adding Gaussian noise, and estimating the denoising score matching loss based on the mini-batch.

## Denoising score matching (continued)

• In this formulation, noise is added to data points and a model is trained to estimate the added noise.
• The goal is to minimize the loss between the estimated noise and the actual noise.
• This approach is scalable and easier to implement compared to directly modeling the distribution of clean data.
• This noise-prediction objective is equivalent to the original score matching loss up to a constant that does not depend on the model parameters.
• The optimal denoising strategy follows the gradient of the log density of the perturbed data.
• The technique is applicable to various noise distributions as long as the gradient can be computed.

## Sliced score matching

• Random projections can be used to efficiently approximate the original score matching loss.
• Sliced Fisher Divergence is a variant of the Fisher Divergence that involves Jacobian Vector products, which can be efficiently estimated using backpropagation.
• The projection operation is a dot product between the data and model gradients projected along a random direction.
• Biasing the projections towards certain directions does not seem to make a significant difference in practice.
• Sliced score matching requires a constant number of backpropagation passes regardless of the data dimension and performs similarly to exact score matching.

## Inference in diffusion models

• Inference in diffusion models can be done by following the gradient of the log probability density or using Markov Chain Monte Carlo (MCMC) methods.
• Langevin dynamics sampling is a method for generating samples from a density using the estimated score (the gradient of the log density).
• Langevin dynamics is a valid Markov chain Monte Carlo (MCMC) procedure in the limit of small step sizes and an infinite number of steps.
• Real-world data tends to lie on low-dimensional manifolds, which can cause problems for Langevin dynamics sampling.
• Diffusion models provide a way to fix this problem: by perturbing the data with noise, the scores can be estimated accurately over the whole space, giving the sampler better guidance.
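The Langevin update can be sketched in a few lines. A toy example of my own (not from the lecture) using a standard normal target, whose score −x is known in closed form; in a real model the learned score network would replace it:

```python
import numpy as np

rng = np.random.default_rng(0)

def score(x):
    # Score of a standard normal target: grad_x log p(x) = -x.
    # In practice this would be the trained score model s_theta(x).
    return -x

# Langevin dynamics: x <- x + (step / 2) * score(x) + sqrt(step) * z,
# with z ~ N(0, I) fresh at every iteration.
step = 0.01
x = rng.uniform(-5, 5, size=10_000)   # arbitrary initialization
for _ in range(2000):
    x = x + 0.5 * step * score(x) + np.sqrt(step) * rng.normal(size=x.shape)

# After many small steps the samples approximate the target N(0, 1),
# regardless of where the chains started.
```

Note that only the score is needed to run the chain; the normalization constant of the target density never appears, which is exactly why score-based models can sample without it.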