Time-Dependent NNQMC, LVM, and Canonical AFQMC
This note is a method-first tutorial. The model can be a spin system, a fermion lattice Hamiltonian, or a continuum many-electron problem after choosing a basis. The common goal is to compute either real-time dynamics or the corresponding canonical thermal benchmark.
The three algorithms play different roles. Time-dependent NNQMC represents $\lvert\psi(t)\rangle$ by a neural state $\lvert\psi_{\theta(t)}\rangle$ and evolves $\theta(t)$ by a time-dependent variational principle. The Linear Variational Method (LVM) represents $\lvert\psi(t)\rangle=\sum_k c_k(t)\lvert\phi_k\rangle$ in a fixed subspace and solves projected dynamics. Canonical AFQMC samples finite-temperature equilibrium in a fixed particle-number or symmetry sector.
Notation
Let $x$ be a computational-basis configuration and $\psi_\theta(x)=\langle x\vert\psi_\theta\rangle$. Neural quantum Monte Carlo samples
For any operator $A$, the local estimator is
The practical primitive is simple: for each sampled $x$, enumerate all connected configurations $y$ with $A_{xy}\ne0$, compute $\psi_\theta(y)/\psi_\theta(x)$, multiply by $A_{xy}$, and sum. For fermions, this is where operator-ordering signs enter.
1. Time-Dependent NNQMC
Exact real-time evolution obeys
TD-NNQMC restricts the state to a variational manifold $\lvert\psi_{\theta(t)}\rangle$ and chooses $\dot\theta$ so that $\sum_j\dot\theta_j\lvert\partial_j\psi_\theta\rangle$ is as close as possible to $-iH\lvert\psi_\theta\rangle$.
Log-Derivative Geometry
Define logarithmic derivatives $O_j(x)=\partial_{\theta_j}\log\psi_\theta(x)$. Then $\partial_{\theta_j}\psi_\theta(x)=O_j(x)\psi_\theta(x)$. Remove norm/phase redundancy by centering: $\Delta O_j(x)=O_j(x)-E[O_j]$ and $\Delta E(x)=E_{\rm loc}(x)-E[E_{\rm loc}]$.
To see where the TDVP matrices come from, write the variational time derivative as $\dot\psi_\theta=\sum_j\dot\theta_j\partial_j\psi_\theta$. TDVP chooses $\dot\theta$ by minimizing the residual between this tangent-space velocity and the exact Schrodinger velocity:
At the minimum, the residual is orthogonal to every tangent vector $\lvert\partial_i\psi_\theta\rangle$. This gives the projected linear system
Now use $\partial_i\psi_\theta(x)=O_i(x)\psi_\theta(x)$. The tangent-vector overlap becomes an average of two log derivatives, and the Hamiltonian projection becomes an average with the local energy. After removing the component parallel to the state by centering, the TDVP equations are
For real parameters, split complex quantities into real and imaginary parts; many implementations use ${\rm Re}(S)\dot\theta={\rm Im}(F)$, up to sign conventions.
Equivalently, $\dot\theta$ minimizes the residual $r(x)=\sum_j\dot\theta_j\Delta O_j(x)+i\Delta E(x)$ in mean square. This residual is the most useful interpretation: TDVP projects the Schrodinger vector field onto the neural tangent space.
Monte Carlo Estimators
Draw $M$ samples $x_a\sim p_\theta$. Estimate $\bar O_j=M^{-1}\sum_a O_j(x_a)$ and $\bar E=M^{-1}\sum_a E_{\rm loc}(x_a)$. Then
Directly forming and inverting $S$ is expensive because $S$ is $P\times P$, where $P$ is the number of neural-network parameters.
Sample-Space Solve
Let $Y_{aj}=(O_j(x_a)-\bar O_j)/\sqrt M$ and $e_a=(E_{\rm loc}(x_a)-\bar E)/\sqrt M$. Then $S=Y^\dagger Y$ and $F=Y^\dagger e$.
Instead of solving in parameter space, use the identity
This is the key computational trick. The matrix $YY^\dagger$ is $M\times M$, the empirical neural tangent kernel on the Monte Carlo batch. Usually $M\ll P$, so this solve is feasible even for large networks.
Regularization can be a diagonal shift $\lambda I$ or a spectral cutoff: diagonalize $YY^\dagger$, discard eigenvalues smaller than $r_{\rm cond}\lambda_{\max}$, and invert only the retained modes.
Time Integration
TDVP gives an ODE $\dot\theta=f(\theta,t)$. A robust low-cost integrator is Heun:
k1 = f(theta_n, t_n)
theta_pred = theta_n + dt*k1
k2 = f(theta_pred, t_n + dt)
theta_{n+1} = theta_n + dt*(k1 + k2)/2
For a time-dependent Hamiltonian, a common midpoint variant holds $H$ fixed at $H(t_n+dt/2)$ during the step:
tH = t_n + dt/2
k1 = f(theta_n, tH)
theta_pred = theta_n + dt*k1
k2 = f(theta_pred, tH)
theta_{n+1} = theta_n + dt*(k1 + k2)/2
Choose $dt$ so the largest local energy scale times $dt$ is small. In a static Hamiltonian, energy drift is a direct check of time-step and variational error.
Sampling and Measurements
Use a sampler that stays in the intended Hilbert-space sector. For spin systems, this may mean magnetization-preserving spin exchanges. For fermions, propose particle moves from occupied to empty orbitals while preserving $N_\uparrow$ and $N_\downarrow$.
At each measurement time, sample from $\lvert\psi_\theta\rvert^2$, evaluate $A_{\rm loc}(x)$, and average. For Hermitian $A$, keep the real part but monitor the imaginary part as a noise diagnostic.
The TDVP residual should be estimated on an independent validation batch:
Using the same batch as the linear solve can underestimate the error because the solve may fit sampling noise.
TD-NNQMC Algorithm
logpsi(theta,x), Hamiltonian connections, theta0, time grid, sector-preserving sampler.- Set
theta = theta0. - At each time step, sample configurations
x_afrom|psi_theta(x)|^2. - Compute local energies
E_loc(x_a)and log derivativesO_j(x_a), or equivalent JVP/VJP objects. - Center
O_jandE_loc, then build the sample-space kernelK = Y Y^dagger. - Solve
alpha = (K + lambda I)^(-1) eand recoverdot_theta = -i Y^dagger alpha. - Advance
thetawith Heun or RK4. - Measure observables and the validation residual on fresh samples.
- Save selected snapshots
theta(t)for later LVM use.
For large networks, avoid explicitly storing the full Jacobian if possible. The solve only needs $YY^\dagger$ and $Y^\dagger\alpha$, which can be obtained through vector-Jacobian and Jacobian-vector products.
2. Linear Variational Method
LVM approximates dynamics in a fixed basis:
The basis can be TD-NNQMC snapshots, Krylov vectors, Lanczos vectors, Chebyshev vectors, or separately optimized variational states.
Define the overlap matrix $S_{ij}=\langle\phi_i\vert\phi_j\rangle$ and projected Hamiltonian $H_{ij}=\langle\phi_i\vert H\vert\phi_j\rangle$. Projecting Schrodinger evolution gives
In an orthonormal basis this reduces to $\dot c=-iHc$, but neural snapshots are generally nonorthogonal.
Matrix Elements by Monte Carlo
For neural basis states, choose a sampling distribution with support over all basis states, for example $\Pi(x)=\sum_k w_k\lvert\phi_k(x)\rvert^2$ with $w_k>0$. Then
For an operator $A$, define the transition local estimator $A_{\rm loc}^{(j)}(x)=\langle x\vert A\vert\phi_j\rangle/\phi_j(x)$. Then $A_{ij}=Z_\Pi E_\Pi[\phi_i^\ast(x)\phi_j(x)A_{\rm loc}^{(j)}(x)/\Pi(x)]$.
Use the same logic for $H_{ij}$. After Monte Carlo estimation, symmetrize Hermitian matrices: $S\leftarrow(S+S^\dagger)/2$, $H\leftarrow(H+H^\dagger)/2$, and $A\leftarrow(A+A^\dagger)/2$.
Orthonormalization
Do not invert $S$ blindly. Diagonalize $S=UsU^\dagger$, discard directions with $s_\alpha<\epsilon_Ss_{\max}$, and define $B=U_{\rm kept}s_{\rm kept}^{-1/2}$.
The retained orthonormal basis and transformed matrices are
If $\lvert\psi\rangle=\sum_i c_i\lvert\phi_i\rangle$, its coefficients in the orthonormal retained basis are $d=B^\dagger Sc$. Conversely, within the retained subspace, $c=Bd$.
Finite-Time and Infinite-Time Dynamics
For time-independent $H$, evolve $d(t)=e^{-iH_ot}d(0)$. Observables are $\langle A(t)\rangle=d^\dagger(t)A_od(t)/d^\dagger(t)d(t)$.
For the infinite-time average, diagonalize $H_ou_\alpha=E_\alpha u_\alpha$ and expand $d(0)=\sum_\alpha a_\alpha u_\alpha$. If there are no degeneracies,
With degeneracies, keep all pairs satisfying $E_\alpha=E_\beta$: $\bar A_\infty=\sum_{\alpha,\beta:E_\alpha=E_\beta}a_\alpha^\ast a_\beta u_\alpha^\dagger A_ou_\beta$.
In finite precision, group energies using a tolerance. The tolerance should exceed numerical diagonalization noise but not merge physically distinct frequencies.
LVM Algorithm
phi_k, Hamiltonian connections, observable A, initial state.- Choose a mixture distribution
Pi(x) = sum_k w_k |phi_k(x)|^2. - Sample configurations
x_afromPi. - Evaluate all basis amplitudes
phi_k(x_a). - Estimate
S_ij,H_ij, andA_ijby reweighting. - Hermitize
S,H, andA. - Diagonalize
S, discard small-overlap directions, and buildB = U_kept s_kept^(-1/2). - Transform
H_o = B^dagger H BandA_o = B^dagger A B. - Project the initial state into the retained basis.
- Evolve
d(t), or diagonalizeH_ofor infinite-time averages. - Repeat with larger bases and independent Monte Carlo samples.
For dynamics generated from time snapshots, a natural sequence is $\lvert\phi_k\rangle\approx(e^{-iH\Delta t})^k\lvert\psi_0\rangle$. Increasing the final snapshot time or the number of basis states controls the subspace approximation.
3. Canonical AFQMC
AFQMC is an imaginary-time finite-temperature method. Grand-canonical AFQMC samples ${\rm Tr}\,e^{-\beta(H-\mu N)}$, while canonical AFQMC samples fixed-particle-number traces such as ${\rm Tr}^{(N)}e^{-\beta H}$, or fixed up/down particle-number sectors. Canonical sampling is the right benchmark when the real-time dynamics conserves particle number or magnetization.
BSS Structure
Write the Hamiltonian as a one-body part plus interactions that can be Hubbard-Stratonovich decoupled. Discretize $\beta=L_\tau\Delta\tau$ and use a symmetric Trotter step. After HS decoupling, each time slice is a one-body propagator $B_\ell(s_\ell)$, and a full auxiliary-field configuration $s$ gives $U_s=B_{L_\tau}(s_{L_\tau})\cdots B_1(s_1)$.
For one species, the grand-canonical field weight is $W_{\rm GC}(s)=\det(I+U_s)$. For spin species, multiply the determinants over spin sectors. If $W$ is not positive, sample $\lvert W\rvert$ and reweight by the sign or phase.
Exact Canonical Projection
Introduce a fugacity $z$. For fixed fields,
For two spin sectors, $W_{N_\uparrow,N_\downarrow}(s)=[z_\uparrow^{N_\uparrow}]\det(I+z_\uparrow U_{s,\uparrow})[z_\downarrow^{N_\downarrow}]\det(I+z_\downarrow U_{s,\downarrow})$.
One way to compute the coefficient is Fourier projection:
Choose $N_{\phi}\ge N_{\rm orb}+1$ for exact projection in exact arithmetic.
Another way is to use eigenvalues $\lambda_a$ of $U_s$: $\det(I+zU_s)=\prod_a(1+z\lambda_a)$. The coefficient of $z^N$ is the degree-$N$ elementary symmetric polynomial in ${\lambda_a}$, computed recursively by updating coefficients in descending order.
At low temperature, $U_s$ is ill-conditioned, so practical codes use QR/SVD stabilization and stabilized coefficient extraction.
Canonical Green’s Functions
At fugacity $z$, the grand-canonical one-body density matrix for fixed fields is $\rho(z)=zU_s(I+zU_s)^{-1}$. The canonical one-body density matrix is obtained by projecting the numerator and denominator separately:
Using Fourier projection, this is the weighted sum over phases $z=e^{i\phi_m}$. Two-body canonical estimators can be derived by adding sources and differentiating the canonical generating function, or by library-provided canonical measurement routines. Do not blindly use grand-canonical Wick contractions after fixing $N$.
Penalty-Enforced Canonical Sampling
A practical alternative is to simulate an augmented Hamiltonian
Large penalties exponentially suppress unwanted sectors while allowing reuse of grand-canonical AFQMC machinery.
The procedure is empirical but controlled: increase $\lambda_N$ and $\lambda_S$ until $\langle(\hat N-N_0)^2\rangle$ and $\langle(\hat S_z-S_{z,0})^2\rangle$ vanish within error bars. Then measure observables in the effectively canonical ensemble.
Updates, Stabilization, and Measurements
A local update proposes changing one auxiliary field. The acceptance ratio is the ratio of canonical weights, $R=W_{N}(s_{\rm new})/W_{N}(s)$, or the corresponding penalty-Hamiltonian grand-canonical ratio. Accept with probability $\min(1,\lvert R\rvert)$ and reweight signs if needed.
The product $U_s=B_{L_\tau}\cdots B_1$ must be stabilized by QR or SVD factorizations. Store products in factorized form, track log determinants, and periodically recompute Green’s functions from stabilized matrices.
Measurements use $\langle A\rangle^{(N)}=\sum_s W^{(N)}(s)A^{(N)}(s)/\sum_s W^{(N)}(s)$. With signs, accumulate the reweighted ratio $\langle A\rangle^{(N)}=\langle A^{(N)}(s)\,{\rm sgn}^{(N)}(s)\rangle/\langle{\rm sgn}^{(N)}(s)\rangle$. Use blocking or jackknife because Markov-chain samples are correlated.
Canonical AFQMC Algorithm
K, interaction channels, target sector, beta, Delta tau, HS decomposition.- Initialize the auxiliary fields
s. - Propose local or global field updates.
- Compute the canonical weight ratio by coefficient projection.
- Accept or reject using
min(1, |R|). - Periodically stabilize the product
B_L ... B_1. - After warmup, compute the canonical Green's function.
- Measure observables and accumulate sign-reweighted statistics.
- Postprocess with blocking/jackknife errors,
Delta tauextrapolation, and temperature interpolation.
For a thermal benchmark to a real-time calculation, compute the thermal energy $E_{\rm th}(T)$ over a temperature grid, fit or interpolate it, solve $E_{\rm th}(T_{\rm eff})=E_{\rm dyn}$, and evaluate the observable at $T_{\rm eff}$ in the same canonical sector.
How They Fit Together
A common pipeline is:
- Choose the conserved sector.
- Prepare the initial state.
- Run TD-NNQMC and save snapshots.
- Build an LVM basis from snapshots.
- Extract finite-time or infinite-time observables.
- Run canonical AFQMC in the same sector.
- Match the temperature by conserved energy.
- Compare dynamical and thermal observables.
TD-NNQMC and LVM approximate unitary real-time dynamics. AFQMC samples imaginary-time equilibrium. AFQMC does not show that a system thermalizes; it gives the value that real-time dynamics should approach if thermalization holds.
References
- Time-dependent VMC for correlated electrons: Ido, Ohgoe, and Imada
- TDVP and variational quantum simulation: Yuan et al.
- Large-scale stochastic reconfiguration identity: Rende et al.
- Time-dependent Neural Galerkin / LVM: Sinibaldi et al.
- Canonical finite-temperature AFQMC: Wang, Assaad, and Parisen Toldin
- BSS AFQMC: Blankenbecler, Scalapino, and Sugar
- ALF AFQMC documentation: Assaad et al.