# Stanford CS236: Deep Generative Models | 2023 | Lecture 12 - Energy-Based Models

## Energy-Based Models (EBMs)

- EBMs represent the probability distribution of data through an energy function: the density is proportional to the exponential of the negative energy, so the partition function (normalization constant) is left implicit and is generally intractable.
- Sampling from EBMs (e.g. via MCMC) is challenging, especially in high dimensions, which makes likelihood-based training computationally expensive.
- This motivates alternative training methods for EBMs that do not require sampling during training.
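A minimal 1-D sketch of the setup, using a hypothetical double-well energy of my own choosing: the model is defined only up to the partition function Z, and in one dimension Z can still be estimated on a grid, which is exactly the step that becomes intractable in high dimensions.

```python
import numpy as np

# Hypothetical 1-D energy function: a double-well energy E(x) = (x^2 - 1)^2.
def energy(x):
    return (x**2 - 1) ** 2

# Unnormalized model density: p(x) is proportional to exp(-E(x)).
def unnormalized_density(x):
    return np.exp(-energy(x))

# In 1-D the partition function Z can be approximated by a Riemann sum
# over a grid; in high dimensions this integral is what becomes intractable.
xs = np.linspace(-4.0, 4.0, 10_001)
dx = xs[1] - xs[0]
Z = unnormalized_density(xs).sum() * dx

def density(x):
    return unnormalized_density(x) / Z

total_mass = density(xs).sum() * dx  # approximately 1 after normalizing
```

Evaluating the energy, and hence the *ratio* of densities at two points, never requires Z; only absolute likelihoods and sampling do.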

## Score Matching

- The score function, the gradient of the log-density with respect to the input, provides an alternative view of the model: working with gradients rather than likelihoods makes the intractable partition function drop out, since it contributes only an additive constant to the log-density.
- The Fisher divergence between two probability densities can be used as a loss function for training EBMs.
- The Fisher divergence can be expressed in terms of the difference between the gradients of the log data density and the log model density.
- This results in a loss function that can be evaluated and optimized as a function of the model parameters.
- The loss function encourages data points to lie near local maxima of the model's log-likelihood (score near zero, negative curvature), ensuring a good fit of the model to the data.
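A toy numerical sketch of the points above, assuming a 1-D Gaussian model with unknown mean and fixed unit variance (my choice of example). Using the integration-by-parts form of the Fisher divergence, the loss needs only the model's score and its derivative, never the partition function, and its minimizer recovers the data mean.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=10_000)  # samples from the data distribution

# Hypothetical model: Gaussian with unknown mean mu and fixed sigma = 1.
# Its score is s(x) = d/dx log p(x) = -(x - mu), with derivative ds/dx = -1.
def score_matching_loss(mu, x):
    score = -(x - mu)
    score_grad = -1.0
    # Integration-by-parts form of the Fisher divergence (up to a constant
    # that does not depend on mu): E_data[ ds/dx + 0.5 * s(x)^2 ]
    return np.mean(score_grad + 0.5 * score**2)

# Grid search over the model parameter; the minimizer approximately
# recovers the sample mean, with no normalization constant evaluated.
mus = np.linspace(0.0, 4.0, 401)
losses = [score_matching_loss(m, data) for m in mus]
best_mu = mus[np.argmin(losses)]
```

In practice the derivative-of-score (Hessian trace) term is the expensive part in high dimensions, which is what motivates variants such as denoising and sliced score matching.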

## Contrastive Learning

- An alternative training method for EBMs involves contrasting data to samples from a noise distribution rather than directly to samples from the model.
- By parameterizing the discriminator in terms of an energy-based model, the optimal discriminator will force the energy-based model to match the data distribution.
- Contrastive learning with EBMs involves distinguishing between real data and fake samples generated from a fixed noise distribution.
- The noise distribution should be close to the data distribution for effective learning.
- Sampling during inference is not necessary as the trained model can be used as an energy-based model.
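The claim about the optimal discriminator can be made precise. Assuming equal numbers of real and noise samples, the Bayes-optimal discriminator against a noise density $p_n$ is

$$
D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_n(x)},
$$

so if the discriminator is parameterized as $D_\theta(x) = p_\theta(x)\,/\,\big(p_\theta(x) + p_n(x)\big)$ with $p_\theta$ an energy-based model, reaching the optimum forces $p_\theta(x) = p_{\text{data}}(x)$ wherever $p_n(x) > 0$.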

## Noise Contrastive Estimation (NCE)

- NCE is similar to a GAN in that it uses a binary cross-entropy loss and is a likelihood-free method.
- Unlike a GAN, NCE does not involve a minimax optimization and is therefore more stable to train.
- NCE requires the ability to evaluate the likelihood of the contrastive (noise) samples, while a GAN only requires the ability to sample from the generator.
- In NCE, the discriminator is trained to distinguish real samples from noise samples, and the energy function derived from the discriminator defines an energy-based model.

## Flow Contrastive Estimation (FCE)

- FCE is a variant of NCE where the noise distribution is defined by a normalizing flow model.
- The flow model is trained adversarially to confuse the discriminator, making the classification problem harder and the noise distribution closer to the data distribution.
- FCE provides both an energy-based model and a flow model, with the choice of which to use depending on the specific task.
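A sketch of the FCE objective in my own notation, with $\theta$ the EBM parameters and $\alpha$ the flow parameters:

$$
V(\theta, \alpha) = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D_{\theta,\alpha}(x)\big] + \mathbb{E}_{x \sim q_\alpha}\big[\log\big(1 - D_{\theta,\alpha}(x)\big)\big],
\qquad
D_{\theta,\alpha}(x) = \frac{p_\theta(x)}{p_\theta(x) + q_\alpha(x)},
$$

where the EBM $p_\theta$ is trained to maximize $V$ (classify well, as in NCE) while the flow $q_\alpha$ is trained to minimize it (make classification harder). This restores a minimax game, but unlike a GAN both players have evaluable densities.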