Radon-Nikodym, Girsanov, and Diffusion Flow Cheat Sheet
This note is for the common situation where the same random object is being described by two different probability laws. In physics language, think of two ensembles on the same configuration space. The Radon-Nikodym derivative is the pointwise reweighting factor between them. Girsanov’s theorem is the path-space version for diffusion processes: it tells us how much probability weight changes when we change the drift of an SDE but keep the noise strength fixed.
The goal here is not measure-theory elegance. The goal is to make the object computable, while keeping just enough rigor to avoid misleading pictures. Then we connect it to the formulas that appear in score-based diffusion, DDIM/probability-flow ODEs, flow matching, Fokker-Planck equations, continuity equations, and reverse-time SDEs.
1. Radon-Nikodym Derivative
Suppose two probability laws $P$ and $Q$ live on the same measurable space $(\Omega,\mathcal F)$. Here $\Omega$ is the sample space and $\mathcal F$ is the collection of events we are allowed to assign probabilities to.
The Radon-Nikodym derivative
$$\frac{\mathrm dP}{\mathrm dQ}(x)$$
is the local density ratio: how much more strongly $P$ weights the point $x$ compared with $Q$.
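The upright $\mathrm d$ is the convention many books use for measure/integration notation:
$$\mathbb E_P[f]=\int_\Omega f(\omega)\,\mathrm dP(\omega)=\int_\Omega f(\omega)\,P(\mathrm d\omega).$$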
So $\mathrm dP/\mathrm dQ$ should not be read as an ordinary derivative with respect to a coordinate. It means “the density of the measure $P$ relative to the measure $Q$.” In informal writing people often type $dP/dQ$, but the upright $\mathrm d$ version is the cleaner notation.
If both laws have ordinary densities $p(x)$ and $q(x)$ with respect to the same base measure $\mathrm dx$, then nothing mysterious is happening:
$$\frac{\mathrm dP}{\mathrm dQ}(x)=\frac{p(x)}{q(x)}.$$
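The extra condition is that $Q$ must not assign zero probability to regions where $P$ assigns positive probability. In symbols,
$$P\ll Q:\qquad Q(A)=0\ \Longrightarrow\ P(A)=0\quad\text{for all }A\in\mathcal F,$$
which for densities means $q(x)=0\Rightarrow p(x)=0$.
In computational terms: do not try to reweight samples from $Q$ if $Q$ never visits the states that matter under $P$.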
Reweighting Identity
The main use is expectation conversion:
$$\mathbb E_P[f(X)]=\mathbb E_Q\!\left[f(X)\,\frac{\mathrm dP}{\mathrm dQ}(X)\right].$$
This is importance sampling in its cleanest form.
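More explicitly, with densities and $N$ samples from $Q$,
$$\int f(x)\,p(x)\,\mathrm dx=\int f(x)\,\frac{p(x)}{q(x)}\,q(x)\,\mathrm dx\approx\frac1N\sum_{i=1}^N f(x_i)\,\frac{p(x_i)}{q(x_i)},\qquad x_i\sim q.$$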
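A minimal sketch of the reweighting identity, assuming a 1D toy problem chosen only for illustration: target $P=\mathcal N(1,1)$, proposal $Q=\mathcal N(0,1.5^2)$, and $f(x)=x^2$. The exact answer is $\mathbb E_P[X^2]=1+1=2$. The helper names `normal_logpdf` and `importance_estimate` are hypothetical. The weights are computed in log space, which is the usual way to avoid overflow.

```python
import math
import random

def normal_logpdf(x, mean, std):
    """Log density of a 1D Gaussian N(mean, std^2)."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)

def importance_estimate(f, n, seed=0):
    """Estimate E_P[f(X)] from samples of Q, weighted by dP/dQ = p(x)/q(x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.5)                                    # x ~ Q = N(0, 1.5^2)
        log_w = normal_logpdf(x, 1.0, 1.0) - normal_logpdf(x, 0.0, 1.5)
        total += f(x) * math.exp(log_w)                            # f(x) * dP/dQ(x)
    return total / n

estimate = importance_estimate(lambda x: x * x, n=200_000)
print(estimate)  # should be close to the exact value E_P[X^2] = 2
```

The proposal is deliberately wider than the target. A proposal narrower than $P$ would make the weights heavy-tailed and the estimate unreliable.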
Physics Picture
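For canonical ensembles with energies $E_0(x)$ and $E_1(x)$ at inverse temperature $\beta$,
$$p_i(x)=\frac{e^{-\beta E_i(x)}}{Z_i},\qquad Z_i=\int e^{-\beta E_i(x)}\,\mathrm dx,\qquad i=0,1.$$
Then
$$\frac{\mathrm dP_1}{\mathrm dP_0}(x)=\frac{Z_0}{Z_1}\,e^{-\beta\,[E_1(x)-E_0(x)]}.$$
So the Radon-Nikodym derivative is the Boltzmann reweighting factor times a normalization constant. If the two ensembles barely overlap, the ratio has huge variance, and Monte Carlo becomes painful.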
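Because $\mathbb E_{P_0}[\mathrm dP_1/\mathrm dP_0]=1$, the same formula gives the free-energy-perturbation identity $Z_1/Z_0=\mathbb E_{P_0}\big[e^{-\beta(E_1-E_0)}\big]$. The sketch below assumes two harmonic ensembles with $\beta=1$, $E_0=x^2/2$, and $E_1=kx^2/2$, so the exact ratio is $1/\sqrt k$. The function name `partition_ratio` and the stiffness values are illustrative. As $k$ grows, the overlap shrinks and the relative variance of the weights grows.

```python
import math
import random

def partition_ratio(k, beta=1.0, n=100_000, seed=1):
    """Estimate Z_1/Z_0 = E_{P_0}[exp(-beta*(E_1 - E_0))] for E_0 = x^2/2, E_1 = k x^2/2."""
    rng = random.Random(seed)
    std0 = 1.0 / math.sqrt(beta)             # P_0 ∝ exp(-beta x^2 / 2) is N(0, 1/beta)
    acc = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, std0)
        delta_e = 0.5 * (k - 1.0) * x * x    # E_1(x) - E_0(x)
        acc += math.exp(-beta * delta_e)
    return acc / n

for k in (2.0, 10.0, 100.0):
    # compare the reweighting estimate with the exact ratio 1/sqrt(k)
    print(k, partition_ratio(k), 1.0 / math.sqrt(k))
```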
Log Form
Most computations use the log ratio:
$$\log\frac{\mathrm dP}{\mathrm dQ}(x)=\log p(x)-\log q(x).$$
This is the quantity that appears in likelihood-ratio estimators, variational bounds, KL divergence, and path-probability ratios.
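For example, $\mathrm{KL}(P\|Q)=\mathbb E_P\big[\log \mathrm dP/\mathrm dQ\big]$. The sketch below assumes two 1D Gaussians, so the Monte Carlo average of the log ratio can be checked against the closed form $\log(s_Q/s_P)+\big(s_P^2+(m_P-m_Q)^2\big)/(2s_Q^2)-\tfrac12$. The function names are hypothetical.

```python
import math
import random

def log_ratio(x, mp, sp, mq, sq):
    """log dP/dQ(x) for P = N(mp, sp^2), Q = N(mq, sq^2); the 2*pi terms cancel."""
    log_p = -0.5 * ((x - mp) / sp) ** 2 - math.log(sp)
    log_q = -0.5 * ((x - mq) / sq) ** 2 - math.log(sq)
    return log_p - log_q

def kl_monte_carlo(mp, sp, mq, sq, n=100_000, seed=2):
    """KL(P || Q) = E_P[log dP/dQ], estimated with samples drawn from P."""
    rng = random.Random(seed)
    return sum(log_ratio(rng.gauss(mp, sp), mp, sp, mq, sq) for _ in range(n)) / n

def kl_exact(mp, sp, mq, sq):
    """Closed-form KL divergence between two 1D Gaussians."""
    return math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2 * sq ** 2) - 0.5

print(kl_monte_carlo(0.0, 1.0, 1.0, 2.0), kl_exact(0.0, 1.0, 1.0, 2.0))
```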
2. The Radon-Nikodym Theorem
The theorem says:
If $P$ is absolutely continuous with respect to $Q$, meaning $Q(A)=0$ implies $P(A)=0$, then there exists a nonnegative measurable function $r(x)$ such that
$$P(A)=\int_A r(x)\,Q(\mathrm dx)\qquad\text{for every }A\in\mathcal F.$$
That function is unique up to $Q$-almost everywhere equality, and is written
$$r=\frac{\mathrm dP}{\mathrm dQ}.$$
The phrase “$Q$-almost everywhere” just means that changing $r$ on a set with $Q$-probability zero does not change any integral against $Q$. Intuitively: if $P$ never puts probability mass where $Q$ sees nothing, then $P$ can be built by locally reweighting $Q$.
Computable Versions
- Discrete states: $\mathrm dP/\mathrm dQ(x)=P(x)/Q(x)$.
- Continuous densities: $\mathrm dP/\mathrm dQ(x)=p(x)/q(x)$.
- Path measures: $\mathrm dP/\mathrm dQ(\omega)$ is a likelihood ratio for the whole trajectory $\omega=\{X_t:0\le t\le T\}$.
- Monte Carlo: sample from $Q$, weight by $\mathrm dP/\mathrm dQ$.
- Training objective: log-likelihood ratios are usually easier than raw likelihood ratios.
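The discrete-state case can be made fully concrete. The sketch below uses exact fractions and a hypothetical helper, `radon_nikodym_discrete`. It checks absolute continuity first, then returns $r(x)=P(x)/Q(x)$ on the support of $Q$. A second hypothetical helper, `prob_via_density`, verifies the defining property $P(A)=\sum_{x\in A} r(x)\,Q(x)$.

```python
from fractions import Fraction

def radon_nikodym_discrete(p, q):
    """Return r(x) = P(x)/Q(x) on supp(Q); raise if P is not absolutely continuous w.r.t. Q."""
    for x, px in p.items():
        if px > 0 and q.get(x, 0) == 0:
            raise ValueError(f"P not << Q: P({x!r}) > 0 but Q({x!r}) = 0")
    return {x: Fraction(p.get(x, 0)) / Fraction(qx) for x, qx in q.items() if qx > 0}

def prob_via_density(r, q, event):
    """P(A) = sum over x in A of r(x) Q(x), the defining property of dP/dQ."""
    return sum(r[x] * q[x] for x in event if x in r)

q = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
p = {"a": Fraction(1, 4), "b": Fraction(3, 4)}      # P puts no mass on "c"
r = radon_nikodym_discrete(p, q)                    # r = {a: 1/2, b: 3, c: 0}
print(prob_via_density(r, q, {"a", "b"}))           # -> 1
```

Swapping the roles, as in `radon_nikodym_discrete(q, p)`, raises `ValueError`: $Q$ gives "c" positive mass, but $P$ never visits it.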
3. Girsanov Theorem
Girsanov is the Radon-Nikodym theorem for changing the drift of a diffusion.
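Start with a reference process
$$dX_t=b_0(X_t,t)\,dt+\sigma(X_t,t)\,dW_t\qquad(\text{law }P_0).$$
Now compare it with another process with the same diffusion matrix but different drift:
$$dX_t=b_1(X_t,t)\,dt+\sigma(X_t,t)\,dW_t\qquad(\text{law }P_1).$$
Let
$$u_t=\sigma(X_t,t)^{-1}\big[b_1(X_t,t)-b_0(X_t,t)\big],$$
so that the drift change is $b_1-b_0=\sigma u_t$.
Then, under standard regularity/integrability conditions (for example Novikov's condition) and with the same initial law, the path probability ratio is
$$\frac{\mathrm dP_1}{\mathrm dP_0}\bigg|_{[0,T]}=\exp\!\left(\int_0^T u_t^\top\,dW_t-\frac12\int_0^T\|u_t\|^2\,dt\right),\qquad \sigma\,dW_t=dX_t-b_0\,dt\ \text{ under }P_0.$$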
Here $P_0$ and $P_1$ are probability measures on path space, not just densities at a single time. The random variable inside the exponential depends on the whole trajectory. That is the main conceptual jump from ordinary density ratios to stochastic-process density ratios.
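The important interpretation: under $P_1$, the process $\tilde W_t=W_t-\int_0^t u_s\,ds$ is a Brownian motion. Adding the drift $\sigma u_t$ and reweighting $P_0$-paths by the exponential above are therefore the same operation.
This is how the two theorems fit together. Radon-Nikodym gives the language of density ratios between probability laws, and Girsanov gives the explicit density ratio between two diffusion path measures.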
Discrete-Time Intuition
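Euler discretize with step $\Delta t$:
$$X_{k+1}=X_k+b_i(X_k,t_k)\,\Delta t+\sigma(X_k,t_k)\sqrt{\Delta t}\,\xi_k,\qquad \xi_k\sim\mathcal N(0,I),\quad i\in\{0,1\}.$$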
Changing $b_0$ to $b_1$ shifts the mean of each Gaussian transition. The full trajectory likelihood ratio is the product of Gaussian transition-density ratios. Taking the continuous-time limit gives the Girsanov exponential.
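The claim that Gaussian transition ratios reproduce the Girsanov exponential can be checked directly. The sketch assumes 1D, $\sigma=1$, and the hypothetical drifts $b_0(x)=-x$ and $b_1(x)=-x+0.5$. With drifts evaluated at the left endpoint (Itô convention), the product of transition ratios and the discretized $\int u\,dW-\tfrac12\int u^2dt$ agree up to round-off. Averaging the weight over $P_0$-paths should give about 1, as any Radon-Nikodym derivative must.

```python
import math
import random

SIGMA, DT, STEPS = 1.0, 0.01, 100   # constant noise, step size, number of steps (T = 1)

def b0(x):
    return -x                        # reference drift (hypothetical choice)

def b1(x):
    return -x + 0.5                  # changed drift (hypothetical choice)

def simulate_p0(rng, x0=0.0):
    """Euler-Maruyama path under the reference process P_0."""
    xs = [x0]
    for _ in range(STEPS):
        x = xs[-1]
        xs.append(x + b0(x) * DT + SIGMA * math.sqrt(DT) * rng.gauss(0.0, 1.0))
    return xs

def log_ratio_transitions(xs):
    """Sum over steps of log N(dx; b1 dt, s^2 dt) - log N(dx; b0 dt, s^2 dt)."""
    var = SIGMA ** 2 * DT
    total = 0.0
    for x, x_next in zip(xs[:-1], xs[1:]):
        dx = x_next - x
        total += ((dx - b0(x) * DT) ** 2 - (dx - b1(x) * DT) ** 2) / (2.0 * var)
    return total

def log_ratio_girsanov(xs):
    """Ito sum of u dW - u^2 dt / 2 with u = (b1 - b0)/sigma and sigma dW = dx - b0 dt."""
    total = 0.0
    for x, x_next in zip(xs[:-1], xs[1:]):
        u = (b1(x) - b0(x)) / SIGMA
        dw = (x_next - x - b0(x) * DT) / SIGMA
        total += u * dw - 0.5 * u * u * DT
    return total

rng = random.Random(3)
paths = [simulate_p0(rng) for _ in range(5000)]
max_gap = max(abs(log_ratio_transitions(p) - log_ratio_girsanov(p)) for p in paths)
mean_weight = sum(math.exp(log_ratio_girsanov(p)) for p in paths) / len(paths)
print(max_gap, mean_weight)  # gap at round-off level; mean weight close to 1
```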
Why Same Noise Matters
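Girsanov changes the drift while keeping the quadratic variation fixed. In physics language, you can bias the force field, but you cannot secretly change the temperature/noise amplitude and expect the same theorem to apply in the same simple form. In fact, if the noise amplitudes differ, the two path measures are mutually singular: the quadratic variation of a single path already reveals which diffusion produced it, so no density ratio $\mathrm dP_1/\mathrm dP_0$ exists.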
4. Score-Based Diffusion Cheat Sheet
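Let $p_t(x)$ be the marginal density at time $t$, and define the score
$$s(x,t)=\nabla_x\log p_t(x).$$
Forward SDE:
$$dx=f(x,t)\,dt+g(t)\,dW_t.$$
Fokker-Planck equation:
$$\partial_t p_t(x)=-\nabla\cdot\big(f(x,t)\,p_t(x)\big)+\tfrac12 g(t)^2\,\Delta p_t(x).$$
Reverse-time SDE:
$$dx=\big[f(x,t)-g(t)^2\,\nabla_x\log p_t(x)\big]\,dt+g(t)\,d\bar W_t,$$
where $\bar W_t$ is a Brownian motion running backward in time.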
Here $dt<0$ if we parameterize the solver backward in the original time variable. Many codebases instead introduce a new positive reverse time variable.
Probability-flow ODE:
$$\frac{dx}{dt}=f(x,t)-\tfrac12 g(t)^2\,\nabla_x\log p_t(x).$$
Same marginals as the forward/reverse SDE, but deterministic trajectories.
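Score matching denoising target, common VP-style form ($\bar\alpha_t\in(0,1]$ is the signal level at time $t$):
$$x_t=\sqrt{\bar\alpha_t}\,x_0+\sqrt{1-\bar\alpha_t}\,\varepsilon,\quad \varepsilon\sim\mathcal N(0,I)\qquad\Longrightarrow\qquad \nabla_{x_t}\log p_t(x_t\mid x_0)=-\frac{x_t-\sqrt{\bar\alpha_t}\,x_0}{1-\bar\alpha_t}=-\frac{\varepsilon}{\sqrt{1-\bar\alpha_t}}.$$
Denoising score matching loss:
$$\mathcal L_{\rm DSM}(\theta)=\mathbb E_{t,\,x_0,\,\varepsilon}\Big[\lambda(t)\,\Big\|s_\theta(x_t,t)+\frac{\varepsilon}{\sqrt{1-\bar\alpha_t}}\Big\|^2\Big].$$
Noise-prediction parameterization:
$$s_\theta(x,t)=-\frac{\varepsilon_\theta(x,t)}{\sqrt{1-\bar\alpha_t}},\qquad \mathcal L_{\rm simple}(\theta)=\mathbb E_{t,\,x_0,\,\varepsilon}\big\|\varepsilon_\theta(x_t,t)-\varepsilon\big\|^2.$$
$x_0$-prediction relation:
$$\hat x_0(x_t,t)=\frac{x_t-\sqrt{1-\bar\alpha_t}\,\varepsilon_\theta(x_t,t)}{\sqrt{\bar\alpha_t}}=\frac{x_t+(1-\bar\alpha_t)\,s_\theta(x_t,t)}{\sqrt{\bar\alpha_t}}.$$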
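Why does regressing onto the noisy conditional target $-\varepsilon/\sqrt{1-\bar\alpha_t}$ learn the marginal score? Its conditional mean given $x_t$ is $\nabla\log p_t(x_t)$. The sketch below checks this numerically for toy Gaussian data $x_0\sim\mathcal N(0,s^2)$, where the true score is $-x/(\bar\alpha_t s^2+1-\bar\alpha_t)$. A least-squares fit through the origin, standing in for a trained $s_\theta$, should recover that slope. The function name `dsm_slope` and the values of $\bar\alpha_t$ and $s$ are illustrative assumptions.

```python
import math
import random

def dsm_slope(alpha_bar, s=2.0, n=200_000, seed=4):
    """Least-squares slope (through the origin) of the DSM target against x_t."""
    rng = random.Random(seed)
    sxy = sxx = 0.0
    for _ in range(n):
        x0 = rng.gauss(0.0, s)                          # toy data distribution N(0, s^2)
        eps = rng.gauss(0.0, 1.0)
        xt = math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * eps
        target = -eps / math.sqrt(1.0 - alpha_bar)      # score of p_t(x_t | x_0)
        sxy += xt * target
        sxx += xt * xt
    return sxy / sxx

alpha_bar, s = 0.5, 2.0
true_slope = -1.0 / (alpha_bar * s ** 2 + 1.0 - alpha_bar)  # marginal p_t = N(0, v)
estimate = dsm_slope(alpha_bar, s)
print(estimate, true_slope)
```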
5. VP, VE, and Sub-VP Common Forms
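Variance-preserving SDE:
$$dx=-\tfrac12\beta(t)\,x\,dt+\sqrt{\beta(t)}\,dW_t.$$
VP marginal:
$$x_t\mid x_0\sim\mathcal N\!\Big(e^{-\frac12\int_0^t\beta(s)\,ds}\,x_0,\ \big(1-e^{-\int_0^t\beta(s)\,ds}\big)I\Big).$$
VP reverse SDE:
$$dx=\Big[-\tfrac12\beta(t)\,x-\beta(t)\,\nabla_x\log p_t(x)\Big]dt+\sqrt{\beta(t)}\,d\bar W_t.$$
VP probability-flow ODE:
$$\frac{dx}{dt}=-\tfrac12\beta(t)\big[x+\nabla_x\log p_t(x)\big].$$
Variance-exploding SDE:
$$dx=\sqrt{\frac{d[\sigma^2(t)]}{dt}}\,dW_t,\qquad x_t\mid x_0\sim\mathcal N\big(x_0,\ [\sigma^2(t)-\sigma^2(0)]\,I\big).$$
VE reverse SDE:
$$dx=-\frac{d[\sigma^2(t)]}{dt}\,\nabla_x\log p_t(x)\,dt+\sqrt{\frac{d[\sigma^2(t)]}{dt}}\,d\bar W_t.$$
VE probability-flow ODE:
$$\frac{dx}{dt}=-\frac12\,\frac{d[\sigma^2(t)]}{dt}\,\nabla_x\log p_t(x).$$
Sub-VP SDE, one common form:
$$dx=-\tfrac12\beta(t)\,x\,dt+\sqrt{\beta(t)\Big(1-e^{-2\int_0^t\beta(s)\,ds}\Big)}\,dW_t.$$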
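The VP marginal formula can be sanity-checked by simulation. The sketch assumes a constant $\beta(t)=\beta$ (hypothetical value 2.0), a fixed start $x_0=3$, and Euler-Maruyama with a small step. The empirical mean and variance at time $t$ should be near $x_0e^{-\beta t/2}$ and $1-e^{-\beta t}$, up to Monte Carlo noise and a small $O(\Delta t)$ discretization bias. `simulate_vp` is a hypothetical helper.

```python
import math
import random

def simulate_vp(x0, beta, t_end, steps, n_paths, seed=5):
    """Euler-Maruyama for dx = -0.5*beta*x dt + sqrt(beta) dW with constant beta."""
    rng = random.Random(seed)
    dt = t_end / steps
    finals = []
    for _ in range(n_paths):
        x = x0
        for _ in range(steps):
            x += -0.5 * beta * x * dt + math.sqrt(beta * dt) * rng.gauss(0.0, 1.0)
        finals.append(x)
    mean = sum(finals) / n_paths
    var = sum((x - mean) ** 2 for x in finals) / (n_paths - 1)
    return mean, var

x0, beta, t_end = 3.0, 2.0, 1.0
mean, var = simulate_vp(x0, beta, t_end, steps=100, n_paths=10_000)
print(mean, x0 * math.exp(-0.5 * beta * t_end))   # VP marginal mean
print(var, 1.0 - math.exp(-beta * t_end))         # VP marginal variance
```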
6. DDPM and DDIM Relations
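Discrete forward noising:
$$q(x_t\mid x_{t-1})=\mathcal N\big(x_t;\sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big),\qquad \alpha_t=1-\beta_t,\quad \bar\alpha_t=\prod_{s=1}^t\alpha_s.$$
Noise form:
$$x_t=\sqrt{\bar\alpha_t}\,x_0+\sqrt{1-\bar\alpha_t}\,\varepsilon,\qquad \varepsilon\sim\mathcal N(0,I).$$
Predict clean sample:
$$\hat x_0=\frac{x_t-\sqrt{1-\bar\alpha_t}\,\varepsilon_\theta(x_t,t)}{\sqrt{\bar\alpha_t}}.$$
DDPM mean:
$$\mu_\theta(x_t,t)=\frac{1}{\sqrt{\alpha_t}}\Big(x_t-\frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\varepsilon_\theta(x_t,t)\Big).$$
DDPM sampling:
$$x_{t-1}=\mu_\theta(x_t,t)+\sigma_t z,\qquad z\sim\mathcal N(0,I),\qquad \sigma_t^2=\beta_t\ \text{ or }\ \tilde\beta_t=\frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\,\beta_t.$$
DDIM deterministic update:
$$x_{t-1}=\sqrt{\bar\alpha_{t-1}}\,\hat x_0+\sqrt{1-\bar\alpha_{t-1}}\,\varepsilon_\theta(x_t,t).$$
DDIM with stochasticity parameter $\eta$:
$$x_{t-1}=\sqrt{\bar\alpha_{t-1}}\,\hat x_0+\sqrt{1-\bar\alpha_{t-1}-\sigma_t^2}\,\varepsilon_\theta(x_t,t)+\sigma_t z,\qquad \sigma_t=\eta\sqrt{\frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}}\sqrt{1-\frac{\bar\alpha_t}{\bar\alpha_{t-1}}},$$
so $\eta=0$ recovers deterministic DDIM and $\eta=1$ gives a DDPM-like sampler with $\sigma_t^2=\tilde\beta_t$.
The deterministic DDIM update ($\eta=0$) is derived from a non-Markovian forward process that has the same marginals $q(x_t\mid x_0)$ as DDPM, and it can be read as a first-order discretization of the probability-flow ODE. It transports samples without injecting fresh noise at every step.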
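A sketch of one DDIM step for scalar $x$, implementing $x_{t-1}=\sqrt{\bar\alpha_{t-1}}\,\hat x_0+\sqrt{1-\bar\alpha_{t-1}-\sigma_t^2}\,\varepsilon_\theta+\sigma_t z$ with $\sigma_t$ set by $\eta$. It assumes an oracle noise predictor that returns the true $\varepsilon$, a hypothetical stand-in for $\varepsilon_\theta$. With $\eta=0$ and exact $\varepsilon$, the step lands exactly on $\sqrt{\bar\alpha_{t-1}}x_0+\sqrt{1-\bar\alpha_{t-1}}\varepsilon$. In other words, deterministic DDIM keeps moving along the same noise direction instead of redrawing it. `ddim_step` and the $\bar\alpha$ values are illustrative.

```python
import math
import random

def ddim_step(x_t, eps_pred, ab_t, ab_prev, eta=0.0, rng=None):
    """One DDIM update from alpha_bar_t to alpha_bar_{t-1}; eta = 0 is deterministic."""
    x0_hat = (x_t - math.sqrt(1.0 - ab_t) * eps_pred) / math.sqrt(ab_t)
    sigma = eta * math.sqrt((1.0 - ab_prev) / (1.0 - ab_t)) * math.sqrt(1.0 - ab_t / ab_prev)
    direction = math.sqrt(1.0 - ab_prev - sigma ** 2) * eps_pred
    noise = sigma * rng.gauss(0.0, 1.0) if sigma > 0 else 0.0
    return math.sqrt(ab_prev) * x0_hat + direction + noise

x0, eps = 1.5, -0.7
ab_t, ab_prev = 0.3, 0.6                    # alpha_bar decreases with t, so ab_prev > ab_t
x_t = math.sqrt(ab_t) * x0 + math.sqrt(1.0 - ab_t) * eps
x_prev = ddim_step(x_t, eps, ab_t, ab_prev) # oracle: eps_pred is the true noise
print(x_prev, math.sqrt(ab_prev) * x0 + math.sqrt(1.0 - ab_prev) * eps)
```

Passing `eta=1.0` with a `random.Random` instance gives the stochastic variant. There, part of the noise direction is replaced by freshly drawn noise of variance $\sigma_t^2$.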
7. Continuity Equation and Flow Matching
For a deterministic flow
$$\frac{dx}{dt}=v(x,t),$$
the density evolves by the continuity equation
$$\partial_t p_t(x)+\nabla\cdot\big(p_t(x)\,v(x,t)\big)=0.$$
This is the zero-diffusion version of Fokker-Planck.
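Flow matching trains a velocity field $v_\theta(x,t)$ to match a target conditional velocity $u_t(x\mid x_0,x_1)$ along a chosen interpolation path:
$$\mathcal L_{\rm FM}(\theta)=\mathbb E_{t\sim\mathcal U[0,1],\,(x_0,x_1),\,x\sim p_t(\cdot\mid x_0,x_1)}\big\|v_\theta(x,t)-u_t(x\mid x_0,x_1)\big\|^2.$$
Linear interpolation path:
$$x_t=(1-t)\,x_0+t\,x_1,\qquad x_0\sim p_0\ (\text{noise}),\quad x_1\sim p_1\ (\text{data}).$$
Gaussian conditional path:
$$p_t(x\mid x_0,x_1)=\mathcal N\big(x;\ \mu_t,\ \sigma_t^2 I\big),\qquad \mu_t=(1-t)\,x_0+t\,x_1.$$
Conditional velocity from the path:
$$u_t(x\mid x_0,x_1)=\dot\mu_t+\frac{\dot\sigma_t}{\sigma_t}\,(x-\mu_t),$$
which equals $x_1-x_0$ whenever $\sigma_t$ is constant, including the deterministic limit $\sigma_t\to0$.
Sampling ODE:
$$\frac{dx}{dt}=v_\theta(x,t),\qquad x(0)\sim p_0,\ \text{integrated from }t=0\text{ to }t=1.$$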
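A sketch of the sampling ODE on a case where the answer is known. It assumes the linear path $x_t=(1-t)x_0+t\,x_1$ and a point-mass target $x_1=c$ (a toy assumption). In that case the marginal velocity has the closed form $v(x,t)=(c-x)/(1-t)$, which stands in for a trained $v_\theta$. Evaluated on the path, it equals the conditional velocity $x_1-x_0$. Euler integration from $t=0$ to $t=1$ carries every noise sample to $c$. All function names are hypothetical.

```python
import random

def linear_path(x0, x1, t):
    """x_t = (1 - t) x0 + t x1."""
    return (1.0 - t) * x0 + t * x1

def conditional_velocity(x0, x1):
    """u_t(x | x0, x1) = d/dt x_t = x1 - x0 for the linear path."""
    return x1 - x0

def sample_ode(velocity, x_init, steps=50):
    """Forward Euler for dx/dt = velocity(x, t) on [0, 1)."""
    x, dt = x_init, 1.0 / steps
    for k in range(steps):
        x += dt * velocity(x, k * dt)
    return x

c = 2.0                                  # point-mass target (toy assumption)

def v_marginal(x, t):
    return (c - x) / (1.0 - t)           # exact marginal velocity for this toy case

x0, t = 0.3, 0.4
print(v_marginal(linear_path(x0, c, t), t), conditional_velocity(x0, c))  # should agree
rng = random.Random(6)
finals = [sample_ode(v_marginal, rng.gauss(0.0, 1.0)) for _ in range(5)]
print(finals)  # every sample ends at c, up to round-off
```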
8. Fokker-Planck, Current, and Physics Notation
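General Itô SDE:
$$dX_t=b(X_t,t)\,dt+\sigma(X_t,t)\,dW_t.$$
Diffusion tensor:
$$D(x,t)=\tfrac12\,\sigma(x,t)\,\sigma(x,t)^\top.$$
Fokker-Planck:
$$\partial_t p=-\nabla\cdot(b\,p)+\sum_{i,j}\partial_i\partial_j\big(D_{ij}\,p\big).$$
If $D$ is constant:
$$\partial_t p=-\nabla\cdot(b\,p)+\nabla\cdot(D\,\nabla p).$$
Probability current form:
$$\partial_t p=-\nabla\cdot J,\qquad J=b\,p-D\,\nabla p\quad(D\text{ constant}).$$
Using the score:
$$J=p\,\big(b-D\,\nabla\log p\big).$$
So the diffusion term acts like an extra drift $-D\nabla\log p$: a score-driven current that moves probability down density gradients. The velocity $b-D\nabla\log p$ is exactly the probability-flow ODE velocity; with $D=\tfrac12 g^2$ it becomes $f-\tfrac12 g^2\nabla\log p_t$.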
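A sketch checking the current formula, assuming a 1D Ornstein-Uhlenbeck process $dX=-\theta X\,dt+g\,dW$ with hypothetical values $\theta=1.5$ and $g=0.8$. Then $b=-\theta x$, $D=g^2/2$, and the stationary density is $\mathcal N(0,g^2/(2\theta))$. At stationarity the current must vanish. The score form $p\,(b-D\,\partial_x\log p)$ and the finite-difference form $b\,p-D\,\partial_x p$ should both be numerically zero on a grid.

```python
import math

THETA, G = 1.5, 0.8                  # OU parameters (hypothetical values)
D = 0.5 * G * G                      # D = sigma sigma^T / 2 in 1D
VAR = G * G / (2.0 * THETA)          # stationary variance of the OU process

def p(x):
    """Stationary density N(0, VAR)."""
    return math.exp(-0.5 * x * x / VAR) / math.sqrt(2.0 * math.pi * VAR)

def score(x):
    return -x / VAR                  # d/dx log p(x)

def current_score_form(x):
    return p(x) * (-THETA * x - D * score(x))

def current_fd_form(x, h=1e-5):
    dp = (p(x + h) - p(x - h)) / (2.0 * h)   # central difference for dp/dx
    return -THETA * x * p(x) - D * dp

grid = [i / 10.0 for i in range(-30, 31)]
print(max(abs(current_score_form(x)) for x in grid))   # ~0
print(max(abs(current_fd_form(x)) for x in grid))      # ~0, up to finite-difference error
```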
9. Reverse SDE and Probability-Flow ODE Side by Side
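Forward:
$$dx=f(x,t)\,dt+g(t)\,dW_t,\qquad x_0\sim p_0.$$
Reverse SDE:
$$dx=\big[f(x,t)-g(t)^2\,\nabla_x\log p_t(x)\big]\,dt+g(t)\,d\bar W_t,\qquad x_T\sim p_T.$$
Probability-flow ODE:
$$\frac{dx}{dt}=v(x,t):=f(x,t)-\tfrac12 g(t)^2\,\nabla_x\log p_t(x).$$
Same one-time marginals: at every $t$, all three dynamics give $x_t\sim p_t$.
Different trajectories: the SDEs inject fresh noise, so their paths stay random even given an endpoint, while the probability-flow ODE maps each starting point to a single smooth path. The joint laws across several times therefore differ.
Likelihood from probability-flow ODE:
$$\frac{d}{dt}\log p_t(x_t)=-\nabla\cdot v(x_t,t)\qquad\text{along}\qquad\frac{dx_t}{dt}=v(x_t,t).$$
Therefore
$$\log p_0(x_0)=\log p_T(x_T)+\int_0^T\nabla\cdot v(x_t,t)\,dt,$$
with sign depending on whether the ODE is integrated forward or backward. As written, $x_T$ is obtained by integrating forward from $x_0$.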
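A sketch of this change-of-variables formula, $\log p_0(x_0)=\log p_T(x_T)+\int_0^T\nabla\cdot v\,dt$. It assumes a linear toy velocity $v(x,t)=a\,x$ and $p_0=\mathcal N(0,1)$, so that $p_T=\mathcal N(0,e^{2aT})$ is known exactly. The divergence is estimated by a central finite difference, standing in for the exact or stochastic trace estimates used in practice. The trajectory is integrated with RK4. All names and constants are illustrative.

```python
import math

A, T, STEPS = -0.7, 1.0, 200

def velocity(x, t):
    return A * x                          # toy linear velocity field (assumption)

def divergence(x, t, h=1e-5):
    """Central finite-difference estimate of dv/dx (the 1D divergence)."""
    return (velocity(x + h, t) - velocity(x - h, t)) / (2.0 * h)

def gauss_logpdf(x, var):
    return -0.5 * x * x / var - 0.5 * math.log(2.0 * math.pi * var)

def integrate(x0):
    """RK4 for dx/dt = v(x, t); trapezoid rule for the integral of div v along the path."""
    x, dt, div_int = x0, T / STEPS, 0.0
    for k in range(STEPS):
        t = k * dt
        k1 = velocity(x, t)
        k2 = velocity(x + 0.5 * dt * k1, t + 0.5 * dt)
        k3 = velocity(x + 0.5 * dt * k2, t + 0.5 * dt)
        k4 = velocity(x + dt * k3, t + dt)
        x_next = x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        div_int += 0.5 * dt * (divergence(x, t) + divergence(x_next, t + dt))
        x = x_next
    return x, div_int

x0 = 0.9
xT, div_int = integrate(x0)
log_p0_via_ode = gauss_logpdf(xT, math.exp(2 * A * T)) + div_int   # log p_T(x_T) + ∫ div v dt
print(log_p0_via_ode, gauss_logpdf(x0, 1.0))                        # should agree
```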
10. Where Radon-Nikodym and Girsanov Enter Diffusion Models
- Score models learn $\nabla\log p_t$, not the full density $p_t$.
- Reverse SDE formulas come from comparing forward and backward path measures.
- Girsanov gives the likelihood ratio when one drift is replaced by another drift.
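- Classifier guidance and control terms can often be interpreted as drift changes. For example, replacing $\nabla\log p_t(x)$ with $\nabla\log p_t(x)+\nabla\log p_t(y\mid x)$ in the reverse SDE is a drift change of size $g(t)^2\nabla\log p_t(y\mid x)$.
- Path-space KLs between controlled and uncontrolled diffusions are quadratic control costs. If the controlled process has extra drift $\sigma u_t$, then $\mathrm{KL}(P^u\,\|\,P^0)=\mathbb E_{P^u}\big[\tfrac12\int_0^T\|u_t\|^2\,dt\big]$, up to initial/terminal density terms and convention choices.
- Schrödinger bridge methods use this idea directly. Among diffusions that transform one marginal distribution into another, they look for the one closest in path-space KL to a reference diffusion. For a fixed initial law, this is the drift control with the smallest expected quadratic cost.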
- Probability-flow ODEs turn stochastic density evolution into deterministic transport with the same marginals.
- Flow matching starts from the deterministic transport side and learns the velocity field directly.
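The quadratic-control-cost statement in the list above, $\mathrm{KL}(P^u\|P^0)=\mathbb E_{P^u}[\tfrac12\int_0^T\|u_t\|^2dt]$, can be checked on Euler paths. The sketch assumes $\sigma=1$, reference drift $b_0=0$, a start at $x=0$, and a hypothetical feedback control $u(x)=-x+0.5$. Along paths drawn from the controlled process, the average Girsanov log ratio $\log \mathrm dP^u/\mathrm dP^0$ and the average quadratic cost should agree up to Monte Carlo error.

```python
import math
import random

DT, STEPS, N_PATHS = 0.01, 100, 5000

def control(x):
    return -x + 0.5          # hypothetical feedback control u(x); sigma = 1, b0 = 0

rng = random.Random(7)
log_ratio_sum = cost_sum = 0.0
for _ in range(N_PATHS):
    x, log_ratio, cost = 0.0, 0.0, 0.0
    for _ in range(STEPS):
        u = control(x)
        dx = u * DT + math.sqrt(DT) * rng.gauss(0.0, 1.0)   # controlled Euler step
        log_ratio += u * dx - 0.5 * u * u * DT             # Girsanov: dW^0 = dx when b0 = 0, sigma = 1
        cost += 0.5 * u * u * DT                           # quadratic control cost
        x += dx
    log_ratio_sum += log_ratio
    cost_sum += cost

kl_estimate = log_ratio_sum / N_PATHS      # E_{P^u}[log dP^u/dP^0]
cost_estimate = cost_sum / N_PATHS         # E_{P^u}[(1/2) ∫ u^2 dt]
print(kl_estimate, cost_estimate)
```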
Big Picture
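Radon-Nikodym derivative: $\mathrm dP/\mathrm dQ$ is a pointwise reweighting, $\mathbb E_P[f]=\mathbb E_Q[f\,\mathrm dP/\mathrm dQ]$.
Girsanov theorem: changing the drift by $\sigma u_t$ at fixed noise reweights whole paths by $\exp\big(\int u_t^\top dW_t-\tfrac12\int\|u_t\|^2dt\big)$.
Score-based diffusion: learn $\nabla\log p_t$, then sample with the reverse SDE or the probability-flow ODE.
Flow matching: learn the velocity $v_\theta$ directly, then sample with $dx/dt=v_\theta(x,t)$; the density obeys the continuity equation.
Physics translation: Radon-Nikodym derivatives are Boltzmann reweighting factors, Girsanov factors are path-probability ratios between force fields at the same temperature, and Fokker-Planck is a conservation law for the probability current $J=p\,(b-D\nabla\log p)$.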