
Variational Quantum Algorithms

Theory of variational quantum algorithms including the variational principle, VQE, QAOA, barren plateaus, and gradient computation methods. This guide connects the theoretical foundations to pyqpanda3's variational quantum circuit (VQC) framework.


The Variational Principle

Rayleigh-Ritz Variational Principle

The foundation of all variational quantum algorithms is the Rayleigh-Ritz variational principle:

For any parameterized quantum state |ψ(θ)⟩ and any Hamiltonian H, the expectation value of the energy provides an upper bound on the ground state energy E_0:

⟨ψ(θ)|H|ψ(θ)⟩ ≥ E_0

Equality holds if and only if |ψ(θ)⟩ is the ground state of H.

Energy as a Cost Function

The variational principle allows us to formulate the ground state problem as an optimization:

E_0 ≤ min_θ ⟨ψ(θ)|H|ψ(θ)⟩

with equality when the ansatz is expressive enough to represent the ground state.

The loss function (energy) is:

L(θ) = ⟨ψ(θ)|H|ψ(θ)⟩

This loss function has important properties:

  • Lower bounded: L(θ) ≥ E_0 = λ_min(H)
  • Smooth: L is a smooth function of θ for analytic ansatzes
  • Observable: L(θ) can be estimated from quantum measurements
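These properties can be checked concretely on a toy example. The following is an illustrative numpy sketch (not pyqpanda3's API): for H = Z and the one-parameter trial state |ψ(θ)⟩ = RY(θ)|0⟩, the loss L(θ) = cos θ stays above E_0 = −1 everywhere and reaches it at θ = π.

```python
import numpy as np

# Toy check of the variational bound: H = Z has ground-state energy E_0 = -1,
# and the trial state is |psi(theta)> = RY(theta)|0>.
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def ry(theta):
    # RY(theta) = exp(-i * theta * Y / 2) as a 2x2 matrix
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def loss(theta):
    # L(theta) = <psi(theta)| H |psi(theta)>; here it equals cos(theta)
    psi = ry(theta) @ np.array([1, 0], dtype=complex)
    return float(np.real(psi.conj() @ Z @ psi))

E0 = -1.0
thetas = np.linspace(0.0, 2 * np.pi, 201)
energies = [loss(t) for t in thetas]
assert all(e >= E0 - 1e-12 for e in energies)  # upper-bound property holds everywhere
assert abs(min(energies) - E0) < 1e-12         # minimum reached near theta = pi
```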

Variational Quantum Circuits as Trial States

In pyqpanda3, the trial state |ψ(θ) is prepared by a variational quantum circuit (VQC) — a parameterized quantum circuit:

|ψ(θ)⟩ = U(θ)|0⟩^⊗n

where U(θ) is a unitary circuit parameterized by θ = (θ_1, θ_2, …, θ_p).

The VQC is constructed using pyqpanda3's VQCircuit class:

python
from pyqpanda3.vqcircuit import VQCircuit
from pyqpanda3.core import RY, CNOT

# Build a 2-qubit hardware-efficient ansatz with 4 parameters
vqc = VQCircuit()
vqc.set_Param([4], ["theta"])  # declare a 1-D parameter array of length 4
vqc << RY(0, vqc.Param([0], "theta_0"))
vqc << RY(1, vqc.Param([1], "theta_1"))
vqc << CNOT(0, 1)
vqc << RY(0, vqc.Param([2], "theta_2"))
vqc << RY(1, vqc.Param([3], "theta_3"))

Expressibility

The expressibility of a VQC measures how well it can explore the Hilbert space. A perfectly expressible ansatz generates the Haar-uniform distribution of unitaries. Less expressible ansatzes may not be able to reach the optimal solution.

Expressibility is measured by the deviation from the Haar distribution:

E = D_KL(P_ansatz || P_Haar)

A well-designed ansatz balances expressibility against trainability (avoiding barren plateaus).


VQE — Variational Quantum Eigensolver

Algorithm Overview

The Variational Quantum Eigensolver (VQE) is a hybrid quantum-classical algorithm for finding the ground state energy of a Hamiltonian. Each iteration runs a simple loop:

  1. Prepare the trial state |ψ(θ)⟩ on the quantum processor
  2. Measure the energy ⟨ψ(θ)|H|ψ(θ)⟩ term by term
  3. Pass the measured energy to a classical optimizer, which proposes updated parameters θ
  4. Repeat until the energy converges

Hamiltonian Decomposition

To measure the energy on a quantum computer, the Hamiltonian must be decomposed into measurable Pauli strings:

H = Σ_i h_i P_i

where P_i ∈ {I, X, Y, Z}^⊗n are Pauli strings and h_i are real coefficients.

The energy expectation is then:

⟨H⟩ = Σ_i h_i ⟨P_i⟩

Each ⟨P_i⟩ is estimated from quantum measurements. Pauli strings that commute can be measured simultaneously (grouped into the same measurement circuit).

pyqpanda3 provides:

  • expval_hamiltonian(prog, hamiltonian, qvm, shots) — direct expectation value computation
  • expval_pauli_operator(prog, pauli_op, qvm, shots) — Pauli operator expectation
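The term-by-term estimate ⟨H⟩ = Σ_i h_i ⟨P_i⟩ can be verified against a direct matrix computation. A self-contained numpy sketch, illustrative only (the Hamiltonian and state below are arbitrary choices, not pyqpanda3 calls):

```python
import numpy as np
from functools import reduce

# Evaluate <H> term by term for H = sum_i h_i P_i and check it against the
# expectation value of the full matrix.
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def pauli_string(ops):
    """Tensor product of single-qubit Paulis, e.g. [Z, Z] -> Z (x) Z."""
    return reduce(np.kron, ops)

# H = 0.5 * Z(x)Z + 0.3 * X(x)I on two qubits
terms = [(0.5, [Z, Z]), (0.3, [X, I])]

# Trial state: the Bell state (|00> + |11>) / sqrt(2)
psi = np.zeros(4, dtype=complex)
psi[0] = psi[3] = 1 / np.sqrt(2)

def expval(op, state):
    return float(np.real(state.conj() @ op @ state))

# Term-by-term sum: <H> = sum_i h_i <P_i>
e_terms = sum(h * expval(pauli_string(p), psi) for h, p in terms)

# Direct computation with the full H matrix
H = sum(h * pauli_string(p) for h, p in terms)
e_direct = expval(H, psi)
assert abs(e_terms - e_direct) < 1e-12
```

For the Bell state, ⟨Z⊗Z⟩ = 1 and ⟨X⊗I⟩ = 0, so both routes give 0.5.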

Ansatz Design Strategies

Hardware-Efficient Ansatz:

Designed to match the native gate set and connectivity of the target hardware:

U(θ) = ∏_{l=1}^{L} [ ⊗_i RY(θ_{l,i}) ] · ENTANGLE

where ENTANGLE is a layer of CNOT gates matching the hardware topology.

  • Pros: Shallow depth, native gates, low SWAP overhead
  • Cons: May require many layers, susceptible to barren plateaus

Chemically-Inspired Ansatz (Unitary Coupled Cluster):

Based on the unitary coupled cluster singles and doubles (UCCSD):

U(θ) = e^{T(θ) − T†(θ)}

where T(θ) = Σ_{i,a} θ_i^a a_a† a_i + Σ_{i<j, a<b} θ_ij^ab a_a† a_b† a_j a_i

  • Pros: Physically motivated, few parameters for small molecules
  • Cons: Deep circuits, requires Trotterization

Adaptive Ansatz (ADAPT-VQE):

Grow the ansatz iteratively by adding the operator with the largest gradient:

U_{k+1} = e^{θ_{k+1} A_{k+1}} U_k

where A_{k+1} = argmax_{A ∈ pool} |∂E/∂θ_A|_{θ_A = 0}

  • Pros: Compact ansatz, systematic construction
  • Cons: Requires gradient pool evaluation at each step
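The selection step can be sketched numerically: for U = e^{−iθA}, the energy gradient at θ = 0 is i⟨ψ|[A, H]|ψ⟩, so ADAPT picks the pool operator with the largest commutator expectation. A numpy illustration with a toy Hamiltonian and a hypothetical four-operator pool (both are arbitrary choices for demonstration):

```python
import numpy as np

# ADAPT-style operator selection: for U = exp(-i*theta*A), the gradient at
# theta = 0 is dE/dtheta = i <psi| [A, H] |psi>.  Pick the pool operator with
# the largest magnitude.
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

H = np.kron(Z, Z) + 0.5 * np.kron(X, I)          # toy Hamiltonian
psi = np.zeros(4, dtype=complex); psi[0] = 1.0   # reference state |00>

pool = {"XI": np.kron(X, I), "YI": np.kron(Y, I),
        "IX": np.kron(I, X), "IY": np.kron(I, Y)}

def gradient_at_zero(A, H, psi):
    comm = A @ H - H @ A
    return float(np.real(1j * (psi.conj() @ comm @ psi)))

grads = {name: abs(gradient_at_zero(A, H, psi)) for name, A in pool.items()}
best = max(grads, key=grads.get)   # the operator ADAPT would append next
```

Here only the Y rotation on the first qubit has a nonzero gradient, so it is selected.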

Classical Optimization

The classical optimizer updates parameters to minimize the energy:

θ_{k+1} = θ_k − η ∇L(θ_k)

Common optimizers used with VQE:

| Optimizer | Type | Pros | Cons |
| --- | --- | --- | --- |
| Gradient Descent | First-order | Simple, stable | Slow convergence |
| Adam | First-order (adaptive) | Fast, robust | May oscillate |
| L-BFGS-B | Quasi-Newton | Superlinear convergence | Sensitive to noisy gradients |
| COBYLA | Derivative-free | Handles noise | Slow for many parameters |
| SPSA | Stochastic | Noise-tolerant, O(1) measurements per step | Slow convergence |
| SNOBFIT | Surrogate model | Good for noisy landscapes | Expensive per iteration |
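SPSA's key property, two cost evaluations per iteration regardless of the parameter count, is easy to see in code. A minimal sketch on a classical stand-in cost function (the gain schedules and constants below are illustrative choices, not tuned recommendations):

```python
import numpy as np

# Minimal SPSA: estimate the full gradient from two cost evaluations using a
# random simultaneous perturbation.  Toy separable landscape
# f(theta) = sum_i cos(theta_i), global minimum -4 at theta_i = pi (mod 2*pi).
rng = np.random.default_rng(0)

def cost(theta):
    return float(np.sum(np.cos(theta)))

theta = rng.uniform(1.0, 2.0, size=4)    # start away from the minimum
a, c = 0.2, 0.2                          # step-size and perturbation scales
for k in range(1, 501):
    ak, ck = a / k**0.602, c / k**0.101  # standard SPSA gain schedules
    delta = rng.choice([-1.0, 1.0], size=theta.shape)  # Rademacher perturbation
    # two evaluations give a simultaneous estimate of every component
    g_est = (cost(theta + ck * delta) - cost(theta - ck * delta)) / (2 * ck) * delta
    theta -= ak * g_est

assert cost(theta) < -3.5   # close to the global minimum of -4
```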

Shot Noise and Statistical Error

The energy estimate has statistical uncertainty from finite measurement shots:

σ_E = √( Σ_i h_i² Var(P_i) / N_shots )

where Var(P_i) = 1 − ⟨P_i⟩² for Pauli observables.

Rule of thumb: doubling the accuracy requires 4× more shots, since σ_E ∝ 1/√N_shots (the standard quantum limit).
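The scaling can be checked by simulating ±1 measurement outcomes. An illustrative numpy sketch (the coefficients and expectation values below are arbitrary choices):

```python
import numpy as np

# Check the shot-noise formula sigma_E = sqrt(sum_i h_i^2 Var(P_i) / N) by
# sampling binary (+1/-1) Pauli measurement outcomes.
rng = np.random.default_rng(1)

h = np.array([0.5, 0.3])            # Pauli coefficients h_i
p_expect = np.array([0.8, -0.2])    # true <P_i> values
var_p = 1 - p_expect**2             # Var(P_i) = 1 - <P_i>^2 for +-1 outcomes
n_shots = 4096

# Predicted statistical error of the energy estimate
sigma_pred = float(np.sqrt(np.sum(h**2 * var_p) / n_shots))

# Empirical spread over many repeated energy estimates
estimates = []
for _ in range(1000):
    e = 0.0
    for hi, pi in zip(h, p_expect):
        # sample n_shots outcomes in {-1, +1} with mean pi
        outcomes = rng.choice([1.0, -1.0], size=n_shots,
                              p=[(1 + pi) / 2, (1 - pi) / 2])
        e += hi * outcomes.mean()
    estimates.append(e)
sigma_emp = float(np.std(estimates))
assert abs(sigma_emp - sigma_pred) / sigma_pred < 0.15
```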

Convergence Analysis

VQE convergence depends on:

  1. Ansatz quality: Can the ansatz reach the ground state?
  2. Optimizer efficiency: How quickly does the optimizer find the minimum?
  3. Noise resilience: How much does noise affect the energy estimate?

VQE is generally considered noise-resilient because:

  • The variational principle still holds (noisy energy is still an upper bound)
  • The optimizer can partially compensate for systematic noise
  • Error bars from shot noise can guide the optimizer

QAOA — Quantum Approximate Optimization Algorithm

Algorithm Overview

QAOA is designed for combinatorial optimization problems. Given a cost function C(x) on bitstrings x{0,1}n, QAOA prepares a state that approximately minimizes C:

|γ, β⟩ = ∏_{p=1}^{P} e^{−iβ_p H_M} e^{−iγ_p H_C} |+⟩^⊗n

where:

  • H_C = Σ_x C(x)|x⟩⟨x| is the cost Hamiltonian (diagonal in the computational basis)
  • H_M = Σ_i X_i is the mixer Hamiltonian
  • γ = (γ_1, …, γ_P) and β = (β_1, …, β_P) are the variational parameters
  • P is the QAOA depth (number of alternating layers)

MaxCut Example

The canonical QAOA application is MaxCut. Given a graph G=(V,E):

Cost Hamiltonian:

H_C = Σ_{(i,j)∈E} (1/2)(I − Z_i Z_j)

Each edge term has eigenvalue 1 when its endpoints take different Z values (the edge is cut) and 0 when they lie in the same partition, so ⟨H_C⟩ counts the cut edges and solving MaxCut means maximizing it.

Mixer Hamiltonian:

H_M = Σ_{i∈V} X_i

QAOA circuit construction:

Each cost layer e^{−iγ_p H_C} decomposes into two-qubit ZZ rotations (one RZZ gate per edge):

e^{−iγ_p H_C} = ∏_{(i,j)∈E} e^{iγ_p Z_i Z_j / 2}   (up to a global phase)

Each mixer layer e^{−iβ_p H_M} decomposes into single-qubit RX gates:

e^{−iβ_p H_M} = ∏_i e^{−iβ_p X_i} = ∏_i RX(2β_p)
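Putting the pieces together, depth-1 QAOA for MaxCut on a triangle graph can be simulated with a dense statevector in a few lines. This is a pure-numpy sketch, not the pyqpanda3 workflow, and the grid-search resolution is an arbitrary choice:

```python
import numpy as np
from itertools import product

# Depth-1 QAOA for MaxCut on a triangle.  The cost Hamiltonian is diagonal,
# so the cost layer is a phase; the mixer is RX(2*beta) on every qubit.
edges = [(0, 1), (1, 2), (0, 2)]
n = 3

# Diagonal of H_C = sum_{(i,j)} 0.5 * (1 - z_i * z_j), with z in {+1, -1}
diag = np.zeros(2**n)
for idx, bits in enumerate(product([0, 1], repeat=n)):
    z = [1 - 2 * b for b in bits]
    diag[idx] = sum(0.5 * (1 - z[i] * z[j]) for i, j in edges)

def qaoa_state(gamma, beta):
    psi = np.full(2**n, 1 / np.sqrt(2**n), dtype=complex)  # |+>^n
    psi = np.exp(-1j * gamma * diag) * psi                  # cost layer (diagonal)
    # mixer layer: RX(2*beta) = exp(-i*beta*X) applied to each qubit
    rx = np.array([[np.cos(beta), -1j * np.sin(beta)],
                   [-1j * np.sin(beta), np.cos(beta)]])
    psi = psi.reshape([2] * n)
    for q in range(n):
        psi = np.moveaxis(np.tensordot(rx, psi, axes=([1], [q])), 0, q)
    return psi.reshape(-1)

def expected_cut(gamma, beta):
    psi = qaoa_state(gamma, beta)
    return float(np.real(np.sum(np.abs(psi)**2 * diag)))

# Coarse grid search over (gamma, beta)
best = max(expected_cut(g, b)
           for g in np.linspace(0, np.pi, 31)
           for b in np.linspace(0, np.pi, 31))
ratio = best / 2.0          # the maximum cut of a triangle is 2
assert ratio > 0.9          # even P = 1 does very well on this tiny instance
```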

Approximation Ratio

The approximation ratio measures QAOA quality:

α = ⟨γ, β|H_C|γ, β⟩ / C_max

  • P = 1: α ≥ 0.6924 for MaxCut on 3-regular graphs (guaranteed by QAOA theory)
  • P → ∞: α → 1 (QAOA converges to the optimal solution)
  • In practice, P = 5 to 20 often achieves α > 0.9

QAOA Variants

| Variant | Description | Advantage |
| --- | --- | --- |
| Standard QAOA | Fixed depth P, optimize γ, β | Theoretical guarantees |
| Warm-Start QAOA | Initialize from a classical solution | Better initial point |
| Recursive QAOA | Fix qubits iteratively | Reduces problem size |
| Multi-angle QAOA | Different angles per gate | More expressive |
| QAOA with constraints | Penalty terms in H_C | Handles constrained problems |

Barren Plateaus

The Vanishing Gradient Problem

Barren plateaus occur when the gradient of the cost function vanishes exponentially with the number of qubits:

Var[∂L/∂θ_i] ∈ O(1/2^n)

This means that for a system of n qubits, the gradient is exponentially small, making optimization intractable.

Causes of Barren Plateaus

1. Expressibility-induced: Highly expressible ansatzes that cover the entire Hilbert space tend to concentrate cost function values around the mean, flattening the landscape.

2. Entanglement-induced: Deep entangling circuits produce global states where local parameter changes have exponentially small effects.

3. Noise-induced: Hardware noise flattens the cost landscape by driving all outputs toward the maximally mixed state.

4. Global cost functions: Cost functions that depend on all qubits simultaneously (e.g., global fidelity) exhibit barren plateaus even for shallow circuits.

Mitigation Strategies

1. Local Cost Functions:

Replace global cost functions with local ones:

L_global = 1 − ⟨ψ|U_target† O U_target|ψ⟩   →   L_local = (1/n) Σ_{i=1}^{n} (1 − ⟨ψ|U_target† O_i U_target|ψ⟩)

Local cost functions have gradients that vanish polynomially rather than exponentially.

2. Structured Ansatz:

Use ansatzes with built-in structure (e.g., symmetry-preserving, problem-inspired):

  • QAOA ansatz for combinatorial optimization
  • UCCSD ansatz for chemistry
  • Hardware-efficient ansatz with limited depth

3. Parameter Initialization:

  • Identity initialization: Start with parameters close to the identity circuit (θ ≈ 0)
  • Layer-by-layer training: Train one layer at a time, adding layers incrementally
  • Classical pre-training: Use classical optimization to find a good initial point

4. Gradient Preserving Architectures:

  • Use circuits where the gradient magnitude is provably lower-bounded
  • Examples: quantum convolutional neural networks (QCNN), certain hierarchical circuits

Gradient Computation Methods

Overview

Computing gradients of the expectation value Hθ with respect to parameters θ is central to optimizing variational quantum algorithms. Several methods exist with different trade-offs:

Parameter-Shift Rule

The parameter-shift rule provides an exact gradient using two circuit evaluations per parameter:

∂⟨H⟩/∂θ_i = [⟨H⟩(θ_i + s) − ⟨H⟩(θ_i − s)] / (2 sin s)

where s is the shift value (typically s=π/2).

For a gate e^{−iθ_i G} whose generator G has eigenvalues ±r/2 (e.g., half a Pauli word, giving r = 1):

∂⟨H⟩/∂θ_i = (r/2) [⟨H⟩(θ_i + π/(2r)) − ⟨H⟩(θ_i − π/(2r))]

Cost: 2p circuit evaluations for p parameters.

Advantages:

  • Exact gradient (no approximation error)
  • Hardware-compatible (only needs forward evaluations)
  • Works with shot noise (statistical error only)
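A quick check of the rule on a single RY rotation, where ⟨H⟩(θ) = cos θ can be differentiated by hand (pure-numpy sketch):

```python
import numpy as np

# Parameter-shift check: for |psi(t)> = RY(t)|0> and H = Z, <H>(t) = cos(t),
# so the exact derivative is -sin(t).  The shift rule with s = pi/2
# reproduces it exactly from two evaluations.
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def expval(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    psi = np.array([c, s], dtype=complex)       # RY(theta)|0>
    return float(np.real(psi.conj() @ Z @ psi))

def parameter_shift(theta, s=np.pi / 2):
    return (expval(theta + s) - expval(theta - s)) / (2 * np.sin(s))

theta = 0.7
exact = -np.sin(theta)
assert abs(parameter_shift(theta) - exact) < 1e-12
```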

Adjoint Differentiation

Adjoint differentiation computes the gradient in O(1) circuit evaluations regardless of the number of parameters. This is the method used by pyqpanda3's ADJOINT_DIFF:

∂⟨H⟩/∂θ_i = 2 Re ⟨ϕ_L| ∂U/∂θ_i |ϕ_R⟩

where |ϕ_R⟩ is the initial state propagated forward through the gates preceding parameter θ_i, and |ϕ_L⟩ is H U(θ)|0⟩ propagated backward through the gates following it.

Algorithm (Jones & Gacon, 2020):

  1. Forward pass: compute |ϕR=U(θ)|0
  2. Backward pass: apply gates in reverse, computing inner products
  3. Total: 2 circuit evaluations (one forward, one backward) regardless of parameter count

Cost: 2 circuit evaluations total.

Advantages:

  • Dramatically faster for large parameter counts: O(1) vs O(2p)
  • Exact gradient
  • Well-suited for simulation-based optimization

Limitations:

  • Requires storing intermediate state vectors (memory: O(2n))
  • Only applicable in statevector simulation, not hardware measurements
  • Implementation-dependent on the simulator
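The two-sweep structure can be sketched in numpy for a small single-qubit circuit and verified against finite differences. This is a simplified illustration of the adjoint idea, not pyqpanda3's implementation:

```python
import numpy as np

# Adjoint-differentiation sketch for U = RY(t2) RZ(t1) RY(t0) and H = X:
# one forward sweep builds the final state, one backward sweep yields every
# gradient from inner products with the gate derivatives.
X = np.array([[0, 1], [1, 0]], dtype=complex)

def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(t):
    return np.diag([np.exp(-1j * t / 2), np.exp(1j * t / 2)])

def dry(t):  # d RY / dt
    c, s = np.cos(t / 2), np.sin(t / 2)
    return 0.5 * np.array([[-s, -c], [c, -s]], dtype=complex)

def drz(t):  # d RZ / dt
    return np.diag([-0.5j * np.exp(-1j * t / 2), 0.5j * np.exp(1j * t / 2)])

def adjoint_gradients(thetas, H):
    gates = [ry(thetas[0]), rz(thetas[1]), ry(thetas[2])]
    dgates = [dry(thetas[0]), drz(thetas[1]), dry(thetas[2])]
    phi = np.array([1, 0], dtype=complex)
    for g in gates:                      # forward sweep
        phi = g @ phi
    lam = H @ phi                        # seed for the backward sweep
    grads = [0.0] * len(gates)
    for k in reversed(range(len(gates))):
        phi = gates[k].conj().T @ phi    # state just before gate k
        grads[k] = 2 * float(np.real(lam.conj() @ dgates[k] @ phi))
        lam = gates[k].conj().T @ lam    # back-propagate the observable state
    return np.array(grads)

thetas = [0.3, 0.8, -0.5]
grads = adjoint_gradients(thetas, X)

# Cross-check against central finite differences
def energy(ts):
    psi = ry(ts[2]) @ rz(ts[1]) @ ry(ts[0]) @ np.array([1, 0], dtype=complex)
    return float(np.real(psi.conj() @ X @ psi))

h = 1e-6
for i in range(3):
    tp = list(thetas); tp[i] += h
    tm = list(thetas); tm[i] -= h
    assert abs(grads[i] - (energy(tp) - energy(tm)) / (2 * h)) < 1e-6
```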

Usage in pyqpanda3:

python
from pyqpanda3.vqcircuit import VQCircuit, DiffMethod
from pyqpanda3.hamiltonian import Hamiltonian
import numpy as np

vqc = VQCircuit()
# ... build ansatz ...

params = np.array([0.1, 0.2, 0.3, 0.4])
observable = Hamiltonian(...)

# Get gradients using adjoint differentiation
gradients = vqc.get_gradients(params, observable, DiffMethod.ADJOINT_DIFF)

# Get both gradients and expectation value simultaneously
result = vqc.get_gradients_and_expectation(params, observable, DiffMethod.ADJOINT_DIFF)
expectation = result.expectation_val()
grads = result.gradients()

Finite Differences

The simplest gradient approximation:

∂⟨H⟩/∂θ_i ≈ [⟨H⟩(θ_i + h) − ⟨H⟩(θ_i − h)] / (2h)

Cost: 2p circuit evaluations for p parameters.

Disadvantages:

  • Approximation error O(h^2)
  • Choosing h: too small → noise dominated; too large → approximation error
  • Not recommended for quantum computing due to shot noise sensitivity
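The O(h^2) truncation error is easy to confirm on a smooth, noiseless stand-in for ⟨H⟩(θ): halving h should reduce the error by roughly 4×.

```python
import numpy as np

# Central finite differences have O(h^2) truncation error: halving h cuts the
# error by about 4x on a smooth function.  f(t) = cos(t) stands in for a
# noiseless expectation value <H>(t).
def f(t):
    return np.cos(t)

def fd(t, h):
    return (f(t + h) - f(t - h)) / (2 * h)

t = 0.9
exact = -np.sin(t)
err_big = abs(fd(t, 1e-2) - exact)
err_small = abs(fd(t, 5e-3) - exact)
ratio = err_big / err_small
assert 3.5 < ratio < 4.5    # consistent with O(h^2) scaling
```

With shot noise added on top, the same cancellation that makes h small also amplifies statistical error, which is why the method is discouraged on hardware.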

Linear Combination of Unitaries (LCU)

An advanced method using ancilla qubits:

∂⟨H⟩/∂θ_i = Re[⟨0|V_i† U† H U V_i|0⟩]

Can be implemented with Hadamard tests or iterative QPE, but requires ancilla qubits and controlled versions of gates.

Comparison Table

| Method | Circuit Evals | Exact? | Hardware? | Memory |
| --- | --- | --- | --- | --- |
| Parameter-Shift | 2p | Yes | Yes | O(n) qubits |
| Adjoint (ADJOINT_DIFF) | 2 | Yes | No (simulation only) | O(2^n) statevector |
| Finite Differences | 2p | No (O(h^2)) | Yes | O(n) qubits |
| LCU | O(1) + ancilla | Yes | Yes (with ancilla) | O(n+k) qubits |

For pyqpanda3 simulations, ADJOINT_DIFF is the recommended method due to its O(1) scaling. For hardware experiments, the parameter-shift rule is the standard choice.


Parameter Management in pyqpanda3

Multi-Dimensional Parameters

pyqpanda3 supports multi-dimensional parameter arrays through the set_Param() and Param() methods:

python
vqc = VQCircuit()

# Set parameter dimensions: e.g., 3 layers × 4 parameters
vqc.set_Param([3, 4], ["layer", "param"])

# Access a specific parameter element
theta_00 = vqc.Param([0, 0], "theta_00")  # Layer 0, param 0
theta_12 = vqc.Param([1, 2], "theta_12")  # Layer 1, param 2

This is useful for:

  • Batch optimization: Evaluate gradients for multiple parameter sets simultaneously
  • Structured ansatzes: Map parameters to physical structure (e.g., layer × qubit)
  • Transfer learning: Share parameters between circuit blocks

Batch Gradient Computation

For N sets of parameters, pyqpanda3 provides batch gradient computation:

python
# Single parameter set
grads = vqc.get_gradients(params_1d, observable, DiffMethod.ADJOINT_DIFF)
# Returns: ResGradients (1 set of gradients)

# N parameter sets (flat array, row-major order)
grads_n = vqc.get_gradients(params_flat, observable, N, DiffMethod.ADJOINT_DIFF)
# Returns: ResNGradients (N sets of gradients)

Result Classes

| Class | Method | Returns |
| --- | --- | --- |
| ResGradients | get_gradients(params, H, diff) | Gradients for 1 parameter set |
| ResNGradients | get_gradients(params, H, N, diff) | Gradients for N parameter sets |
| ResGradientsAndExpectation | get_gradients_and_expectation(params, H, diff) | Gradients + expectation for 1 set |
| ResNResGradientsAndExpectation | get_gradients_and_expectation(params, H, N, diff) | Gradients + expectation for N sets |

Practical Considerations

Ansatz Selection Guide

| Problem Type | Recommended Ansatz | Rationale |
| --- | --- | --- |
| Molecular ground state | UCCSD / k-UpCCGSD | Chemically motivated, systematic improvability |
| Combinatorial optimization | QAOA | Theoretical guarantees, structured |
| General optimization | Hardware-efficient | Shallow depth, hardware-native |
| Quantum machine learning | Layered rotations + entanglement | Expressive, trainable |
| Dynamical simulation | Trotterized evolution | Physically motivated |

Optimizer Selection Guide

| Scenario | Recommended Optimizer |
| --- | --- |
| Simulation (exact gradients) | L-BFGS-B with adjoint differentiation |
| Noisy simulation | Adam with learning-rate scheduling |
| Hardware (shot noise) | SPSA or parameter-shift + Adam |
| Few parameters | COBYLA (derivative-free) |
| Many parameters | Adam or natural-gradient methods |

Performance Tips

  1. Use adjoint differentiation when simulating: O(1) vs O(p) gradient evaluations
  2. Group Pauli measurements: Commuting Pauli strings can be measured simultaneously
  3. Use get_gradients_and_expectation: Computes both in a single call, more efficient than separate computations
  4. Warm-start from classical solutions: Initialize parameters from classically solvable approximations
  5. Monitor convergence: Track both energy and gradient norm; stop when gradient norm is below threshold
