
Variational Quantum Algorithms

Theory of variational quantum algorithms including the variational principle, VQE, QAOA, barren plateaus, and gradient computation methods. This guide connects the theoretical foundations to pyqpanda3's variational quantum circuit (VQC) framework.


The Variational Principle

Rayleigh-Ritz Variational Principle

The foundation of all variational quantum algorithms is the Rayleigh-Ritz variational principle:

For any parameterized quantum state |ψ(θ)⟩ and any Hamiltonian H, the expectation value of the energy provides an upper bound on the ground state energy E_0:

⟨ψ(θ)|H|ψ(θ)⟩ ≥ E_0

Equality holds if and only if |ψ(θ)⟩ is the ground state of H.

Energy as a Cost Function

The variational principle allows us to formulate the ground state problem as an optimization:

E_0 ≤ min_θ ⟨ψ(θ)|H|ψ(θ)⟩

with equality when the ansatz is expressive enough to represent the ground state.

The loss function (energy) is:

L(θ) = ⟨ψ(θ)|H|ψ(θ)⟩

This loss function has important properties:

  • Lower bounded: L(θ) ≥ E_0 = λ_min(H)
  • Smooth: L is a smooth function of θ for analytic ansatzes
  • Observable: L(θ) can be estimated from quantum measurements
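These properties can be checked concretely on a toy example. The following is an illustrative numpy sketch (not pyqpanda3's API): for H = Z and the one-parameter trial state |ψ(θ)⟩ = RY(θ)|0⟩, the loss L(θ) = cos θ stays above E_0 = −1 everywhere and reaches it at θ = π.

```python
import numpy as np

# Toy check of the variational bound: H = Z has ground-state energy E_0 = -1,
# and the trial state is |psi(theta)> = RY(theta)|0>.
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def ry(theta):
    # RY(theta) = exp(-i * theta * Y / 2) as a 2x2 matrix
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def loss(theta):
    # L(theta) = <psi(theta)| H |psi(theta)>; here it equals cos(theta)
    psi = ry(theta) @ np.array([1, 0], dtype=complex)
    return float(np.real(psi.conj() @ Z @ psi))

E0 = -1.0
thetas = np.linspace(0.0, 2 * np.pi, 201)
energies = [loss(t) for t in thetas]
assert all(e >= E0 - 1e-12 for e in energies)  # upper-bound property holds everywhere
assert abs(min(energies) - E0) < 1e-12         # minimum reached near theta = pi
```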

Variational Quantum Circuits as Trial States

In pyqpanda3, the trial state |ψ(θ) is prepared by a variational quantum circuit (VQC) — a parameterized quantum circuit:

|ψ(θ)⟩ = U(θ)|0⟩^⊗n

where U(θ) is a unitary circuit parameterized by θ = (θ_1, θ_2, …, θ_p).

The VQC is constructed using pyqpanda3's VQCircuit class:

python
from pyqpanda3.vqcircuit import VQCircuit
from pyqpanda3.core import RY, CNOT

# Build a 2-qubit hardware-efficient ansatz with 4 parameters
vqc = VQCircuit()
vqc.set_Param([4], ["theta"])  # declare a 1-D parameter array of length 4
vqc << RY(0, vqc.Param([0], "theta_0"))
vqc << RY(1, vqc.Param([1], "theta_1"))
vqc << CNOT(0, 1)
vqc << RY(0, vqc.Param([2], "theta_2"))
vqc << RY(1, vqc.Param([3], "theta_3"))

Expressibility

The expressibility of a VQC measures how well it can explore the Hilbert space. A perfectly expressible ansatz generates the Haar-uniform distribution of unitaries. Less expressible ansatzes may not be able to reach the optimal solution.

Expressibility is measured by the deviation from the Haar distribution:

E = D_KL(P_ansatz || P_Haar)

A well-designed ansatz balances expressibility against trainability (avoiding barren plateaus).


VQE — Variational Quantum Eigensolver

Algorithm Overview

The Variational Quantum Eigensolver (VQE) is a hybrid quantum-classical algorithm for finding the ground state energy of a Hamiltonian. Each iteration runs a simple loop:

  1. Prepare the trial state |ψ(θ)⟩ on the quantum processor
  2. Measure the energy ⟨ψ(θ)|H|ψ(θ)⟩ term by term
  3. Pass the measured energy to a classical optimizer, which proposes updated parameters θ
  4. Repeat until the energy converges

Hamiltonian Decomposition

To measure the energy on a quantum computer, the Hamiltonian must be decomposed into measurable Pauli strings:

H = Σ_i h_i P_i

where P_i ∈ {I, X, Y, Z}^⊗n are Pauli strings and h_i are real coefficients.

The energy expectation is then:

⟨H⟩ = Σ_i h_i ⟨P_i⟩

Each ⟨P_i⟩ is estimated from quantum measurements. Pauli strings that commute can be measured simultaneously (grouped into the same measurement circuit).

pyqpanda3 provides:

  • expval_hamiltonian(prog, hamiltonian, qvm, shots) — direct expectation value computation
  • expval_pauli_operator(prog, pauli_op, qvm, shots) — Pauli operator expectation
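The term-by-term estimate ⟨H⟩ = Σ_i h_i ⟨P_i⟩ can be verified against a direct matrix computation. A self-contained numpy sketch, illustrative only (the Hamiltonian and state below are arbitrary choices, not pyqpanda3 calls):

```python
import numpy as np
from functools import reduce

# Evaluate <H> term by term for H = sum_i h_i P_i and check it against the
# expectation value of the full matrix.
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def pauli_string(ops):
    """Tensor product of single-qubit Paulis, e.g. [Z, Z] -> Z (x) Z."""
    return reduce(np.kron, ops)

# H = 0.5 * Z(x)Z + 0.3 * X(x)I on two qubits
terms = [(0.5, [Z, Z]), (0.3, [X, I])]

# Trial state: the Bell state (|00> + |11>) / sqrt(2)
psi = np.zeros(4, dtype=complex)
psi[0] = psi[3] = 1 / np.sqrt(2)

def expval(op, state):
    return float(np.real(state.conj() @ op @ state))

# Term-by-term sum: <H> = sum_i h_i <P_i>
e_terms = sum(h * expval(pauli_string(p), psi) for h, p in terms)

# Direct computation with the full H matrix
H = sum(h * pauli_string(p) for h, p in terms)
e_direct = expval(H, psi)
assert abs(e_terms - e_direct) < 1e-12
```

For the Bell state, ⟨Z⊗Z⟩ = 1 and ⟨X⊗I⟩ = 0, so both routes give 0.5.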

Ansatz Design Strategies

Hardware-Efficient Ansatz:

Designed to match the native gate set and connectivity of the target hardware:

U(θ) = ∏_{l=1}^{L} [ ⊗_i RY(θ_{l,i}) ] · ENTANGLE

where ENTANGLE is a layer of CNOT gates matching the hardware topology.

  • Pros: Shallow depth, native gates, low SWAP overhead
  • Cons: May require many layers, susceptible to barren plateaus

Chemically-Inspired Ansatz (Unitary Coupled Cluster):

Based on the unitary coupled cluster singles and doubles (UCCSD):

U(θ) = e^{T(θ) − T†(θ)}

where T(θ) = Σ_{i,a} θ_i^a a_a† a_i + Σ_{i<j, a<b} θ_ij^ab a_a† a_b† a_j a_i

  • Pros: Physically motivated, few parameters for small molecules
  • Cons: Deep circuits, requires Trotterization

Adaptive Ansatz (ADAPT-VQE):

Grow the ansatz iteratively by adding the operator with the largest gradient:

U_{k+1} = e^{θ_{k+1} A_{k+1}} U_k

where A_{k+1} = argmax_{A ∈ pool} |∂E/∂θ_A|_{θ_A = 0}

  • Pros: Compact ansatz, systematic construction
  • Cons: Requires gradient pool evaluation at each step
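The selection step can be sketched numerically: for U = e^{−iθA}, the energy gradient at θ = 0 is i⟨ψ|[A, H]|ψ⟩, so ADAPT picks the pool operator with the largest commutator expectation. A numpy illustration with a toy Hamiltonian and a hypothetical four-operator pool (both are arbitrary choices for demonstration):

```python
import numpy as np

# ADAPT-style operator selection: for U = exp(-i*theta*A), the gradient at
# theta = 0 is dE/dtheta = i <psi| [A, H] |psi>.  Pick the pool operator with
# the largest magnitude.
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

H = np.kron(Z, Z) + 0.5 * np.kron(X, I)          # toy Hamiltonian
psi = np.zeros(4, dtype=complex); psi[0] = 1.0   # reference state |00>

pool = {"XI": np.kron(X, I), "YI": np.kron(Y, I),
        "IX": np.kron(I, X), "IY": np.kron(I, Y)}

def gradient_at_zero(A, H, psi):
    comm = A @ H - H @ A
    return float(np.real(1j * (psi.conj() @ comm @ psi)))

grads = {name: abs(gradient_at_zero(A, H, psi)) for name, A in pool.items()}
best = max(grads, key=grads.get)   # the operator ADAPT would append next
```

Here only the Y rotation on the first qubit has a nonzero gradient, so it is selected.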

Classical Optimization

The classical optimizer updates parameters to minimize the energy:

θ_{k+1} = θ_k − η ∇L(θ_k)

Common optimizers used with VQE:

| Optimizer | Type | Pros | Cons |
| --- | --- | --- | --- |
| Gradient Descent | First-order | Simple, stable | Slow convergence |
| Adam | First-order (adaptive) | Fast, robust | May oscillate |
| L-BFGS-B | Quasi-Newton | Superlinear convergence | Sensitive to noisy gradients |
| COBYLA | Derivative-free | Handles noise | Slow for many parameters |
| SPSA | Stochastic | Noise-tolerant, O(1) measurements per step | Slow convergence |
| SNOBFIT | Surrogate model | Good for noisy landscapes | Expensive per iteration |
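SPSA's key property, two cost evaluations per iteration regardless of the parameter count, is easy to see in code. A minimal sketch on a classical stand-in cost function (the gain schedules and constants below are illustrative choices, not tuned recommendations):

```python
import numpy as np

# Minimal SPSA: estimate the full gradient from two cost evaluations using a
# random simultaneous perturbation.  Toy separable landscape
# f(theta) = sum_i cos(theta_i), global minimum -4 at theta_i = pi (mod 2*pi).
rng = np.random.default_rng(0)

def cost(theta):
    return float(np.sum(np.cos(theta)))

theta = rng.uniform(1.0, 2.0, size=4)    # start away from the minimum
a, c = 0.2, 0.2                          # step-size and perturbation scales
for k in range(1, 501):
    ak, ck = a / k**0.602, c / k**0.101  # standard SPSA gain schedules
    delta = rng.choice([-1.0, 1.0], size=theta.shape)  # Rademacher perturbation
    # two evaluations give a simultaneous estimate of every component
    g_est = (cost(theta + ck * delta) - cost(theta - ck * delta)) / (2 * ck) * delta
    theta -= ak * g_est

assert cost(theta) < -3.5   # close to the global minimum of -4
```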

Shot Noise and Statistical Error

The energy estimate has statistical uncertainty from finite measurement shots:

σ_E = √( Σ_i h_i² Var(P_i) / N_shots )

where Var(P_i) = 1 − ⟨P_i⟩² for Pauli observables.

Rule of thumb: doubling the accuracy requires 4× more shots, since σ_E ∝ 1/√N_shots (the standard quantum limit).
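The scaling can be checked by simulating ±1 measurement outcomes. An illustrative numpy sketch (the coefficients and expectation values below are arbitrary choices):

```python
import numpy as np

# Check the shot-noise formula sigma_E = sqrt(sum_i h_i^2 Var(P_i) / N) by
# sampling binary (+1/-1) Pauli measurement outcomes.
rng = np.random.default_rng(1)

h = np.array([0.5, 0.3])            # Pauli coefficients h_i
p_expect = np.array([0.8, -0.2])    # true <P_i> values
var_p = 1 - p_expect**2             # Var(P_i) = 1 - <P_i>^2 for +-1 outcomes
n_shots = 4096

# Predicted statistical error of the energy estimate
sigma_pred = float(np.sqrt(np.sum(h**2 * var_p) / n_shots))

# Empirical spread over many repeated energy estimates
estimates = []
for _ in range(1000):
    e = 0.0
    for hi, pi in zip(h, p_expect):
        # sample n_shots outcomes in {-1, +1} with mean pi
        outcomes = rng.choice([1.0, -1.0], size=n_shots,
                              p=[(1 + pi) / 2, (1 - pi) / 2])
        e += hi * outcomes.mean()
    estimates.append(e)
sigma_emp = float(np.std(estimates))
assert abs(sigma_emp - sigma_pred) / sigma_pred < 0.15
```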

Convergence Analysis

VQE convergence depends on:

  1. Ansatz quality: Can the ansatz reach the ground state?
  2. Optimizer efficiency: How quickly does the optimizer find the minimum?
  3. Noise resilience: How much does noise affect the energy estimate?

VQE is generally considered noise-resilient because:

  • The variational principle still holds (noisy energy is still an upper bound)
  • The optimizer can partially compensate for systematic noise
  • Error bars from shot noise can guide the optimizer

QAOA — Quantum Approximate Optimization Algorithm

Algorithm Overview

QAOA is designed for combinatorial optimization problems. Given a cost function C(x) on bitstrings x{0,1}n, QAOA prepares a state that approximately minimizes C:

|γ, β⟩ = ∏_{p=1}^{P} e^{−iβ_p H_M} e^{−iγ_p H_C} |+⟩^⊗n

where:

  • H_C = Σ_x C(x)|x⟩⟨x| is the cost Hamiltonian (diagonal in the computational basis)
  • H_M = Σ_i X_i is the mixer Hamiltonian
  • γ = (γ_1, …, γ_P) and β = (β_1, …, β_P) are the variational parameters
  • P is the QAOA depth (number of alternating layers)

MaxCut Example

The canonical QAOA application is MaxCut. Given a graph G=(V,E):

Cost Hamiltonian:

H_C = Σ_{(i,j)∈E} (1/2)(I − Z_i Z_j)

Each edge term has eigenvalue 1 when its endpoints take different Z values (the edge is cut) and 0 when they lie in the same partition, so ⟨H_C⟩ counts the cut edges and solving MaxCut means maximizing it.

Mixer Hamiltonian:

H_M = Σ_{i∈V} X_i

QAOA circuit construction:

Each cost layer e^{−iγ_p H_C} decomposes into two-qubit ZZ rotations (one RZZ gate per edge):

e^{−iγ_p H_C} = ∏_{(i,j)∈E} e^{iγ_p Z_i Z_j / 2}   (up to a global phase)

Each mixer layer e^{−iβ_p H_M} decomposes into single-qubit RX gates:

e^{−iβ_p H_M} = ∏_i e^{−iβ_p X_i} = ∏_i RX(2β_p)
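Putting the pieces together, depth-1 QAOA for MaxCut on a triangle graph can be simulated with a dense statevector in a few lines. This is a pure-numpy sketch, not the pyqpanda3 workflow, and the grid-search resolution is an arbitrary choice:

```python
import numpy as np
from itertools import product

# Depth-1 QAOA for MaxCut on a triangle.  The cost Hamiltonian is diagonal,
# so the cost layer is a phase; the mixer is RX(2*beta) on every qubit.
edges = [(0, 1), (1, 2), (0, 2)]
n = 3

# Diagonal of H_C = sum_{(i,j)} 0.5 * (1 - z_i * z_j), with z in {+1, -1}
diag = np.zeros(2**n)
for idx, bits in enumerate(product([0, 1], repeat=n)):
    z = [1 - 2 * b for b in bits]
    diag[idx] = sum(0.5 * (1 - z[i] * z[j]) for i, j in edges)

def qaoa_state(gamma, beta):
    psi = np.full(2**n, 1 / np.sqrt(2**n), dtype=complex)  # |+>^n
    psi = np.exp(-1j * gamma * diag) * psi                  # cost layer (diagonal)
    # mixer layer: RX(2*beta) = exp(-i*beta*X) applied to each qubit
    rx = np.array([[np.cos(beta), -1j * np.sin(beta)],
                   [-1j * np.sin(beta), np.cos(beta)]])
    psi = psi.reshape([2] * n)
    for q in range(n):
        psi = np.moveaxis(np.tensordot(rx, psi, axes=([1], [q])), 0, q)
    return psi.reshape(-1)

def expected_cut(gamma, beta):
    psi = qaoa_state(gamma, beta)
    return float(np.real(np.sum(np.abs(psi)**2 * diag)))

# Coarse grid search over (gamma, beta)
best = max(expected_cut(g, b)
           for g in np.linspace(0, np.pi, 31)
           for b in np.linspace(0, np.pi, 31))
ratio = best / 2.0          # the maximum cut of a triangle is 2
assert ratio > 0.9          # even P = 1 does very well on this tiny instance
```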

Approximation Ratio

The approximation ratio measures QAOA quality:

α = ⟨γ, β|H_C|γ, β⟩ / C_max

  • P = 1: α ≥ 0.6924 for MaxCut on 3-regular graphs (guaranteed by QAOA theory)
  • P → ∞: α → 1 (QAOA converges to the optimal solution)
  • In practice, P = 5 to 20 often achieves α > 0.9

QAOA Variants

| Variant | Description | Advantage |
| --- | --- | --- |
| Standard QAOA | Fixed depth P, optimize γ, β | Theoretical guarantees |
| Warm-Start QAOA | Initialize from a classical solution | Better initial point |
| Recursive QAOA | Fix qubits iteratively | Reduces problem size |
| Multi-angle QAOA | Different angles per gate | More expressive |
| QAOA with constraints | Penalty terms in H_C | Handles constrained problems |

Barren Plateaus

The Vanishing Gradient Problem

Barren plateaus occur when the gradient of the cost function vanishes exponentially with the number of qubits:

Var[∂L/∂θ_i] ∈ O(1/2^n)

This means that for a system of n qubits, the gradient is exponentially small, making optimization intractable.

Causes of Barren Plateaus

1. Expressibility-induced: Highly expressible ansatzes that cover the entire Hilbert space tend to concentrate cost function values around the mean, flattening the landscape.

2. Entanglement-induced: Deep entangling circuits produce global states where local parameter changes have exponentially small effects.

3. Noise-induced: Hardware noise flattens the cost landscape by driving all outputs toward the maximally mixed state.

4. Global cost functions: Cost functions that depend on all qubits simultaneously (e.g., global fidelity) exhibit barren plateaus even for shallow circuits.

Mitigation Strategies

1. Local Cost Functions:

Replace global cost functions with local ones:

L_global = 1 − ⟨ψ|U_target† O U_target|ψ⟩   →   L_local = (1/n) Σ_{i=1}^{n} (1 − ⟨ψ|U_target† O_i U_target|ψ⟩)

Local cost functions have gradients that vanish polynomially rather than exponentially.

2. Structured Ansatz:

Use ansatzes with built-in structure (e.g., symmetry-preserving, problem-inspired):

  • QAOA ansatz for combinatorial optimization
  • UCCSD ansatz for chemistry
  • Hardware-efficient ansatz with limited depth

3. Parameter Initialization:

  • Identity initialization: Start with parameters close to the identity circuit (θ ≈ 0)
  • Layer-by-layer training: Train one layer at a time, adding layers incrementally
  • Classical pre-training: Use classical optimization to find a good initial point

4. Gradient Preserving Architectures:

  • Use circuits where the gradient magnitude is provably lower-bounded
  • Examples: quantum convolutional neural networks (QCNN), certain hierarchical circuits

Gradient Computation Methods

Overview

Computing gradients of the expectation value Hθ with respect to parameters θ is central to optimizing variational quantum algorithms. Several methods exist with different trade-offs:

Parameter-Shift Rule

The parameter-shift rule provides an exact gradient using two circuit evaluations per parameter:

∂⟨H⟩/∂θ_i = [⟨H⟩(θ_i + s) − ⟨H⟩(θ_i − s)] / (2 sin s)

where s is the shift value (typically s=π/2).

For a gate e^{−iθ_i G} whose generator G has eigenvalues ±r/2 (e.g., half a Pauli word, giving r = 1):

∂⟨H⟩/∂θ_i = (r/2) [⟨H⟩(θ_i + π/(2r)) − ⟨H⟩(θ_i − π/(2r))]

Cost: 2p circuit evaluations for p parameters.

Advantages:

  • Exact gradient (no approximation error)
  • Hardware-compatible (only needs forward evaluations)
  • Works with shot noise (statistical error only)
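A quick check of the rule on a single RY rotation, where ⟨H⟩(θ) = cos θ can be differentiated by hand (pure-numpy sketch):

```python
import numpy as np

# Parameter-shift check: for |psi(t)> = RY(t)|0> and H = Z, <H>(t) = cos(t),
# so the exact derivative is -sin(t).  The shift rule with s = pi/2
# reproduces it exactly from two evaluations.
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def expval(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    psi = np.array([c, s], dtype=complex)       # RY(theta)|0>
    return float(np.real(psi.conj() @ Z @ psi))

def parameter_shift(theta, s=np.pi / 2):
    return (expval(theta + s) - expval(theta - s)) / (2 * np.sin(s))

theta = 0.7
exact = -np.sin(theta)
assert abs(parameter_shift(theta) - exact) < 1e-12
```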

Adjoint Differentiation

Adjoint differentiation computes the gradient in O(1) circuit evaluations regardless of the number of parameters. This is the method used by pyqpanda3's ADJOINT_DIFF:

∂⟨H⟩/∂θ_i = 2 Re ⟨ϕ_L| ∂U/∂θ_i |ϕ_R⟩

where |ϕ_R⟩ is the initial state propagated forward through the gates preceding parameter θ_i, and |ϕ_L⟩ is H U(θ)|0⟩ propagated backward through the gates following it.

Algorithm (Jones & Gacon, 2020):

  1. Forward pass: compute |ϕR=U(θ)|0
  2. Backward pass: apply gates in reverse, computing inner products
  3. Total: 2 circuit evaluations (one forward, one backward) regardless of parameter count

Cost: 2 circuit evaluations total.

Advantages:

  • Dramatically faster for large parameter counts: O(1) vs O(2p)
  • Exact gradient
  • Well-suited for simulation-based optimization

Limitations:

  • Requires storing intermediate state vectors (memory: O(2n))
  • Only applicable in statevector simulation, not hardware measurements
  • Implementation-dependent on the simulator
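The two-sweep structure can be sketched in numpy for a small single-qubit circuit and verified against finite differences. This is a simplified illustration of the adjoint idea, not pyqpanda3's implementation:

```python
import numpy as np

# Adjoint-differentiation sketch for U = RY(t2) RZ(t1) RY(t0) and H = X:
# one forward sweep builds the final state, one backward sweep yields every
# gradient from inner products with the gate derivatives.
X = np.array([[0, 1], [1, 0]], dtype=complex)

def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(t):
    return np.diag([np.exp(-1j * t / 2), np.exp(1j * t / 2)])

def dry(t):  # d RY / dt
    c, s = np.cos(t / 2), np.sin(t / 2)
    return 0.5 * np.array([[-s, -c], [c, -s]], dtype=complex)

def drz(t):  # d RZ / dt
    return np.diag([-0.5j * np.exp(-1j * t / 2), 0.5j * np.exp(1j * t / 2)])

def adjoint_gradients(thetas, H):
    gates = [ry(thetas[0]), rz(thetas[1]), ry(thetas[2])]
    dgates = [dry(thetas[0]), drz(thetas[1]), dry(thetas[2])]
    phi = np.array([1, 0], dtype=complex)
    for g in gates:                      # forward sweep
        phi = g @ phi
    lam = H @ phi                        # seed for the backward sweep
    grads = [0.0] * len(gates)
    for k in reversed(range(len(gates))):
        phi = gates[k].conj().T @ phi    # state just before gate k
        grads[k] = 2 * float(np.real(lam.conj() @ dgates[k] @ phi))
        lam = gates[k].conj().T @ lam    # back-propagate the observable state
    return np.array(grads)

thetas = [0.3, 0.8, -0.5]
grads = adjoint_gradients(thetas, X)

# Cross-check against central finite differences
def energy(ts):
    psi = ry(ts[2]) @ rz(ts[1]) @ ry(ts[0]) @ np.array([1, 0], dtype=complex)
    return float(np.real(psi.conj() @ X @ psi))

h = 1e-6
for i in range(3):
    tp = list(thetas); tp[i] += h
    tm = list(thetas); tm[i] -= h
    assert abs(grads[i] - (energy(tp) - energy(tm)) / (2 * h)) < 1e-6
```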

Usage in pyqpanda3:

python
from pyqpanda3.vqcircuit import VQCircuit, DiffMethod
from pyqpanda3.hamiltonian import Hamiltonian
import numpy as np

vqc = VQCircuit()
# ... build ansatz ...

params = np.array([0.1, 0.2, 0.3, 0.4])
observable = Hamiltonian(...)

# Get gradients using adjoint differentiation
gradients = vqc.get_gradients(params, observable, DiffMethod.ADJOINT_DIFF)

# Get both gradients and expectation value simultaneously
result = vqc.get_gradients_and_expectation(params, observable, DiffMethod.ADJOINT_DIFF)
expectation = result.expectation_val()
grads = result.gradients()

Finite Differences

The simplest gradient approximation:

∂⟨H⟩/∂θ_i ≈ [⟨H⟩(θ_i + h) − ⟨H⟩(θ_i − h)] / (2h)

Cost: 2p circuit evaluations for p parameters.

Disadvantages:

  • Approximation error O(h^2)
  • Choosing h: too small → noise dominated; too large → approximation error
  • Not recommended for quantum computing due to shot noise sensitivity
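The O(h^2) truncation error is easy to confirm on a smooth, noiseless stand-in for ⟨H⟩(θ): halving h should reduce the error by roughly 4×.

```python
import numpy as np

# Central finite differences have O(h^2) truncation error: halving h cuts the
# error by about 4x on a smooth function.  f(t) = cos(t) stands in for a
# noiseless expectation value <H>(t).
def f(t):
    return np.cos(t)

def fd(t, h):
    return (f(t + h) - f(t - h)) / (2 * h)

t = 0.9
exact = -np.sin(t)
err_big = abs(fd(t, 1e-2) - exact)
err_small = abs(fd(t, 5e-3) - exact)
ratio = err_big / err_small
assert 3.5 < ratio < 4.5    # consistent with O(h^2) scaling
```

With shot noise added on top, the same cancellation that makes h small also amplifies statistical error, which is why the method is discouraged on hardware.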

Linear Combination of Unitaries (LCU)

An advanced method using ancilla qubits:

∂⟨H⟩/∂θ_i = Re[⟨0|V_i† U† H U V_i|0⟩]

Can be implemented with Hadamard tests or iterative QPE, but requires ancilla qubits and controlled versions of gates.

Comparison Table

| Method | Circuit Evals | Exact? | Hardware? | Memory |
| --- | --- | --- | --- | --- |
| Parameter-Shift | 2p | Yes | Yes | O(n) qubits |
| Adjoint (ADJOINT_DIFF) | 2 | Yes | No (simulation only) | O(2^n) statevector |
| Finite Differences | 2p | No (O(h^2)) | Yes | O(n) qubits |
| LCU | O(1) + ancilla | Yes | Yes (with ancilla) | O(n+k) qubits |

For pyqpanda3 simulations, ADJOINT_DIFF is the recommended method due to its O(1) scaling. For hardware experiments, the parameter-shift rule is the standard choice.


Parameter Management in pyqpanda3

Multi-Dimensional Parameters

pyqpanda3 supports multi-dimensional parameter arrays through the set_Param() and Param() methods:

python
vqc = VQCircuit()

# Set parameter dimensions: e.g., 3 layers × 4 parameters
vqc.set_Param([3, 4], ["layer", "param"])

# Access a specific parameter element
theta_00 = vqc.Param([0, 0], "theta_00")  # Layer 0, param 0
theta_12 = vqc.Param([1, 2], "theta_12")  # Layer 1, param 2

This is useful for:

  • Batch optimization: Evaluate gradients for multiple parameter sets simultaneously
  • Structured ansatzes: Map parameters to physical structure (e.g., layer × qubit)
  • Transfer learning: Share parameters between circuit blocks

Batch Gradient Computation

For N sets of parameters, pyqpanda3 provides batch gradient computation:

python
# Single parameter set
grads = vqc.get_gradients(params_1d, observable, DiffMethod.ADJOINT_DIFF)
# Returns: ResGradients (1 set of gradients)

# N parameter sets (flat array, row-major order)
grads_n = vqc.get_gradients(params_flat, observable, N, DiffMethod.ADJOINT_DIFF)
# Returns: ResNGradients (N sets of gradients)

Result Classes

| Class | Method | Returns |
| --- | --- | --- |
| ResGradients | get_gradients(params, H, diff) | Gradients for 1 parameter set |
| ResNGradients | get_gradients(params, H, N, diff) | Gradients for N parameter sets |
| ResGradientsAndExpectation | get_gradients_and_expectation(params, H, diff) | Gradients + expectation for 1 set |
| ResNResGradientsAndExpectation | get_gradients_and_expectation(params, H, N, diff) | Gradients + expectation for N sets |

Practical Considerations

Ansatz Selection Guide

| Problem Type | Recommended Ansatz | Rationale |
| --- | --- | --- |
| Molecular ground state | UCCSD / k-UpCCGSD | Chemically motivated, systematic improvability |
| Combinatorial optimization | QAOA | Theoretical guarantees, structured |
| General optimization | Hardware-efficient | Shallow depth, hardware-native |
| Quantum machine learning | Layered rotations + entanglement | Expressive, trainable |
| Dynamical simulation | Trotterized evolution | Physically motivated |

Optimizer Selection Guide

| Scenario | Recommended Optimizer |
| --- | --- |
| Simulation (exact gradients) | L-BFGS-B with adjoint differentiation |
| Noisy simulation | Adam with learning-rate scheduling |
| Hardware (shot noise) | SPSA or parameter-shift + Adam |
| Few parameters | COBYLA (derivative-free) |
| Many parameters | Adam or natural-gradient methods |

Performance Tips

  1. Use adjoint differentiation when simulating: O(1) vs O(p) gradient evaluations
  2. Group Pauli measurements: Commuting Pauli strings can be measured simultaneously
  3. Use get_gradients_and_expectation: Computes both in a single call, more efficient than separate computations
  4. Warm-start from classical solutions: Initialize parameters from classically solvable approximations
  5. Monitor convergence: Track both energy and gradient norm; stop when gradient norm is below threshold
