# Stanford CS236: Deep Generative Models I 2023 I Lecture 13 - Score Based Models

## Score-based models

- Score-based models, also known as diffusion models, are a state-of-the-art class of generative models for continuous data modalities like images, videos, speech, and audio.
- Unlike likelihood-based models that work with probability density functions, score-based models focus on the gradient of the log density, known as the score function.
- Score-based models offer an alternative interpretation of probability distributions by representing them as vector fields or gradients, which can be computationally advantageous.
- Score-based models address the challenge of normalization by modeling data using the score instead of the density, allowing for more flexible parameterizations without strict normalization constraints.
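The normalization point above can be checked numerically. The sketch below (with an illustrative 1-D Gaussian; `mu`, `sigma`, and the `scale` factor are assumptions for the demo) shows that the score of a density is unchanged when the density is multiplied by an arbitrary constant, which is why modeling the score sidesteps the partition function:

```python
# The score of N(mu, sigma^2) is -(x - mu) / sigma^2, and it is unchanged if
# the density is scaled by any constant: the normalizer drops out when taking
# the gradient of the log.
import numpy as np

def log_unnormalized_density(x, mu=1.0, sigma=2.0, scale=1.0):
    # log of scale * exp(-(x - mu)^2 / (2 sigma^2)); 'scale' mimics an
    # unknown normalization constant.
    return np.log(scale) - (x - mu) ** 2 / (2 * sigma ** 2)

def numerical_score(x, h=1e-5, **kw):
    # central finite difference of the log density w.r.t. x
    return (log_unnormalized_density(x + h, **kw)
            - log_unnormalized_density(x - h, **kw)) / (2 * h)

x = 0.5
analytic = -(x - 1.0) / 2.0 ** 2          # score of N(1, 4) at x = 0.5
assert abs(numerical_score(x) - analytic) < 1e-6
# scaling the unnormalized density leaves the score unchanged
assert abs(numerical_score(x, scale=123.0) - analytic) < 1e-6
```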

## Score matching

- Score matching is a technique used to train energy-based models by fitting the model's score function to match the score function of the data distribution.
- The Fisher divergence, which measures the difference between two probability distributions via their score functions, can be rewritten (using integration by parts) so that it depends only on the model's score and its derivatives, enabling optimization without computing the partition function or knowing the data score.
- Score matching can be applied to a wide range of model families beyond energy-based models, as long as the gradient of the log density with respect to the input can be computed.
- Score matching directly models the gradients (scores) rather than the likelihood, and it does not involve a normalization constant or latent variables.
- The term "score" comes from statistics, where the gradient of the log-likelihood is called the Fisher score, hence the name "score matching."
- Score matching aims to estimate the gradient of the data distribution to model the data.
- The Fisher divergence is used to measure the difference between the true and estimated vector fields of gradients.
- Minimizing the Fisher divergence as a function of theta is a reasonable learning objective.
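The objective above can be made concrete in 1-D. The sketch below is a minimal toy example, assuming a linear model score s_theta(x) = -theta * x (the score of a zero-mean Gaussian with precision theta); after integration by parts the Fisher divergence becomes E[0.5 * s_theta(x)^2 + s_theta'(x)] + const, which involves only the model:

```python
# Hyvarinen score matching in 1-D with a Gaussian model score
# s_theta(x) = -theta * x. Minimizing the integration-by-parts objective
# recovers the data variance without ever normalizing a density.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.5, size=200_000)   # true variance is 2.25

def score_matching_loss(theta, x):
    s = -theta * x      # model score
    ds_dx = -theta      # derivative of the model score w.r.t. x
    return np.mean(0.5 * s ** 2 + ds_dx)

thetas = np.linspace(0.05, 2.0, 400)
best = thetas[np.argmin([score_matching_loss(t, data) for t in thetas])]
print(1.0 / best)   # recovered variance, close to the true value 2.25
```

The grid search stands in for gradient descent; the point is that the loss needs only samples from the data and the model's score.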

## Denoising score matching

- Denoising score matching is an approach to address the computational challenges of score matching by estimating the gradient of data perturbed with noise.
- Denoising score matching is computationally much more efficient than vanilla score matching, and it approximates the original objective well when the noise level is relatively small.
- The speaker introduces a method to approximate the score of a data density perturbed with noise, denoted q_σ.
- This approximation is achieved by replacing the Fisher Divergence between the model and the data with the Fisher Divergence between the model and the noise-perturbed data density.
- The key idea is that when the noise level σ is small, the noise-perturbed data density q_σ is close to the original data density, making the estimated scores similar.
- The resulting algorithm involves sampling data points, adding Gaussian noise, and estimating the denoising score matching loss based on the mini-batch.
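The resulting algorithm can be sketched end to end in a toy setting. Below, the linear score model s_theta(x) = -theta * x and the grid search are illustrative assumptions; the regression target for a perturbed point x_tilde = x + sigma * eps is the score of the Gaussian kernel, -(x_tilde - x)/sigma^2 = -eps/sigma:

```python
# Denoising score matching with Gaussian noise: sample data, add noise, and
# regress the model score onto the score of the perturbation kernel.
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5
x = rng.normal(0.0, 1.0, size=200_000)      # clean data ~ N(0, 1)
eps = rng.normal(0.0, 1.0, size=x.shape)
x_tilde = x + sigma * eps                   # noise-perturbed data

def dsm_loss(theta):
    target = -eps / sigma                   # score of the perturbation kernel
    return np.mean((-theta * x_tilde - target) ** 2)

thetas = np.linspace(0.1, 2.0, 400)
best = thetas[np.argmin([dsm_loss(t) for t in thetas])]
# For N(0, 1) data, q_sigma = N(0, 1 + sigma^2), whose score is
# -x / (1 + sigma^2), so the fit should land near 1 / 1.25 = 0.8.
print(best)
```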

## Denoising score matching as noise estimation

- Denoising score matching can equivalently be viewed as noise estimation: noise is added to data points, and a model is trained to predict that noise.
- The goal is to minimize the loss between the estimated noise and the actual noise.
- This approach is scalable and easier to implement than directly modeling the score of the clean data distribution.
- The noise-estimation objective is equivalent to the original score matching loss up to a constant that does not depend on the model parameters.
- The optimal denoising strategy involves following the gradient of the log of the noise-perturbed density.
- The technique is applicable to noise distributions beyond Gaussian, as long as the gradient of the log of the perturbation kernel can be computed.
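The equivalence between predicting the noise and predicting the score can be verified numerically. In the sketch below (an illustrative linear score model, not the lecture's network), the noise predictor is defined as eps_hat = -sigma * s_theta(x_tilde), and the two losses differ exactly by a factor of sigma^2:

```python
# Check that the noise-prediction loss equals sigma^2 times the
# score-matching loss under the reparameterization eps_hat = -sigma * s_theta.
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5
x = rng.normal(size=10_000)
eps = rng.normal(size=10_000)
x_tilde = x + sigma * eps

def s_theta(x_tilde, theta=0.7):     # arbitrary linear score model
    return -theta * x_tilde

score_loss = np.mean((s_theta(x_tilde) - (-eps / sigma)) ** 2)
noise_loss = np.mean((-sigma * s_theta(x_tilde) - eps) ** 2)
assert np.isclose(noise_loss, sigma ** 2 * score_loss)
```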

## Sliced score matching

- Random projections can be used to efficiently approximate the original score matching loss.
- The sliced Fisher divergence is a variant of the Fisher divergence that involves Jacobian-vector products, which can be efficiently computed using backpropagation.
- The projection operation takes the dot product of the data and model score vectors with a random direction, comparing the two vector fields one random slice at a time.
- Biasing the projections towards certain directions does not seem to make a significant difference in practice.
- The computational cost of sliced score matching stays roughly constant as the data dimension grows (one Jacobian-vector product per sample), and it performs comparably to exact score matching.
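The sliced objective can be compared against the exact one on a small example. Below, a 2-D linear score s(x) = -Theta @ x with diagonal Theta is an illustrative assumption (its Jacobian is constant, so no autodiff is needed); the random directions are Rademacher vectors:

```python
# Sliced score matching: replace the trace of the score Jacobian with random
# projections, E_v[ v^T (grad s(x)) v + 0.5 * (v^T s(x))^2 ].
import numpy as np

rng = np.random.default_rng(0)
Theta = np.diag([0.8, 1.5])
x = rng.normal(size=(50_000, 2))

s = -x @ Theta                      # model scores, shape (n, 2)
jac = -Theta                        # Jacobian of s w.r.t. x, constant here

# exact Hyvarinen objective: tr(Jacobian) + 0.5 * ||s||^2, averaged over data
exact = np.trace(jac) + 0.5 * np.mean(np.sum(s ** 2, axis=1))

# sliced estimate with one Rademacher projection per sample
v = rng.choice([-1.0, 1.0], size=x.shape)
sliced = np.mean(np.einsum('ni,ij,nj->n', v, jac, v)
                 + 0.5 * np.einsum('ni,ni->n', v, s) ** 2)

print(exact, sliced)    # the two estimates agree closely
```

Note that only projected quantities are ever formed, which is what keeps the cost per sample independent of the dimension.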

## Inference in diffusion models

- Inference in diffusion models can be done by following the gradient of the log probability density or using Markov Chain Monte Carlo (MCMC) methods.
- Langevin dynamics sampling is a method for generating samples from a density using only the estimated gradient (score).
- Langevin dynamics sampling is a valid Markov chain Monte Carlo (MCMC) procedure in the limit of small step sizes and an infinite number of steps.
- Real-world data tends to lie on low-dimensional manifolds, so score estimates are inaccurate in low-density regions, which causes problems for Langevin dynamics sampling.
- Diffusion models provide a way to fix this problem: by perturbing the data with noise at multiple scales, the scores are estimated more accurately all over the space, giving better guidance.
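A minimal Langevin sampler makes the update rule concrete: x ← x + (step/2) * s(x) + sqrt(step) * z with z ~ N(0, I). In this sketch the exact score of N(0, 1), s(x) = -x, stands in for a learned score network (chain count, step size, and step count are illustrative choices):

```python
# Langevin dynamics sampling from a 1-D standard normal using its score.
import numpy as np

rng = np.random.default_rng(0)

def langevin_sample(score, n_chains=5000, n_steps=1000, step=0.01):
    x = rng.normal(size=n_chains) * 3.0      # arbitrary initialization
    for _ in range(n_steps):
        x = x + 0.5 * step * score(x) + np.sqrt(step) * rng.normal(size=n_chains)
    return x

samples = langevin_sample(lambda x: -x)
print(samples.mean(), samples.var())   # both should be close to 0 and 1
```

With small steps and many iterations the chain's stationary distribution approaches the target, matching the MCMC validity condition noted above.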