Simulation

This tutorial covers all quantum simulators available in pyqpanda3, explaining when and how to use each one.

Overview

pyqpanda3 provides multiple simulator backends, each optimized for different use cases:

Simulator	State Representation	Noise Support	Best For
CPUQVM	State vector	Yes (shot-based)	General-purpose simulation
GPUQVM	State vector (GPU)	Yes	Large circuits with CUDA
DensityMatrixSimulator	Density matrix	Yes (exact)	Noise analysis, state tomography
Stabilizer	Stabilizer state	Yes	Clifford circuits, fast benchmarking
PartialAmplitudeQVM	Partial state vector	No	Large circuits, specific amplitudes

CPUQVM

The CPUQVM is the primary statevector simulator. It simulates quantum circuits by maintaining the full state vector of the quantum system and applying unitary operations directly.

Basic Usage

Create a CPUQVM instance and run a quantum program with a specified number of measurement shots:

python

from pyqpanda3 import core

# Build a Bell state circuit
prog = core.QProg()
prog << core.H(0)
prog << core.CNOT(0, 1)
prog << core.measure([0, 1], [0, 1])

# Create simulator and run
machine = core.CPUQVM()
machine.run(prog, shots=10000)

# Get measurement results
result = machine.result()
counts = result.get_counts()
print(f"Measurement counts: {counts}")
# Expected output: {'00': ~5000, '11': ~5000}

Getting Results

The QResult object provides several methods for extracting simulation data:

python

# Get measurement counts as a dictionary
counts = result.get_counts()
# Returns: {'00': 5023, '11': 4977}

# Get probabilities for specific qubits
prob_list = result.get_prob_list([0, 1])
# Returns: [0.5023, 0.0, 0.0, 0.4977]

# Get probabilities as a dictionary
prob_dict = result.get_prob_dict([0, 1])
# Returns: {'00': 0.5023, '01': 0.0, '10': 0.0, '11': 0.4977}

# Get the number of shots
num_shots = result.shots()
print(f"Shots: {num_shots}")

# Print formatted results
result.print_results()

Getting the State Vector

For circuits without measurement, the state vector represents the quantum state directly:

python

# Build a circuit without measurement
prog = core.QProg()
prog << core.H(0) << core.CNOT(0, 1)

machine = core.CPUQVM()
machine.run(prog, shots=1)

# Get the state vector
sv = machine.result().get_state_vector()
print(f"State vector: {sv}")
# Expected: [1/sqrt(2), 0, 0, 1/sqrt(2)]

The state vector $| ψ ⟩$ for $n$ qubits has $2^{n}$ complex amplitudes:

| ψ ⟩ = \sum_{i = 0}^{2^{n} - 1} α_{i} | i ⟩

where $| i ⟩$ denotes the computational basis state and $α_{i}$ are complex amplitudes satisfying:

\sum_{i = 0}^{2^{n} - 1} | α_{i} |^{2} = 1

Running with Noise

CPUQVM supports noise simulation through the NoiseModel parameter:

python

from pyqpanda3 import core

# Create a noise model
noise = core.NoiseModel()
error = core.depolarizing_error(0.01)
noise.add_all_qubit_quantum_error(error, core.GateType.CNOT)

# Build and run noisy circuit
prog = core.QProg()
prog << core.H(0) << core.CNOT(0, 1)
prog << core.measure([0, 1], [0, 1])

machine = core.CPUQVM()
machine.run(prog, shots=10000, model=noise)
print(machine.result().get_counts())
# With noise, may see small counts in '01' and '10'

Output:

{'00': 4950, '11': 4920, '01': 70, '10': 60}

CPUQVM Advanced Features

Beyond basic circuit execution, CPUQVM provides several advanced capabilities for extracting detailed information about the quantum state and optimizing circuit structure.

Flattening Quantum Programs

Complex circuits often contain nested subcircuits, control flow operations, or composite gates. Flattening a program resolves these into a linear sequence of primitive gates, which reduces simulation overhead and makes the circuit structure explicit.

python

from pyqpanda3 import core

# Build a circuit with nested structure
prog = core.QProg()
prog << core.H(0)
prog << core.H(1)
prog << core.CNOT(0, 1)
prog << core.measure([0, 1], [0, 1])

# Flatten the program to a linear gate sequence
flat_prog = prog.flatten()

# Run the flattened program
machine = core.CPUQVM()
machine.run(flat_prog, shots=10000)
print(machine.result().get_counts())

Flattening is especially useful before profiling or transpilation because it ensures that every operation is a primitive gate. The relationship between a circuit and its flattened form is:

| ψ_{flat} ⟩ = U_{flat} | 0^{n} ⟩ = U_{k} \dots U_{2} U_{1} | 0^{n} ⟩

where each $U_{i}$ is a single primitive gate applied in sequence.

Accessing the Full State Vector

When running a circuit without measurement, you can retrieve the complete state vector to inspect individual amplitudes and phases.

python

from pyqpanda3 import core
import cmath

# Prepare a superposition across 3 qubits
prog = core.QProg()
prog << core.H(0) << core.H(1) << core.H(2)

machine = core.CPUQVM()
machine.run(prog, shots=1)

# Retrieve the full state vector
sv = machine.result().get_state_vector()

# Inspect amplitude magnitudes and phases
for idx, amp in enumerate(sv):
    binary = format(idx, '03b')
    magnitude = abs(amp)
    phase = cmath.phase(amp)
    print(f"|{binary}>: amplitude = {amp:.4f}, "
          f"|amp| = {magnitude:.4f}, phase = {phase:.4f}")

For $n$ qubits in an equal superposition, each amplitude is:

α_{i} = \frac{1}{\sqrt{2^{n}}} \forall i \in {0, \dots, 2^{n} - 1}

Extracting Probability Distributions

Probability distributions can be extracted for any subset of qubits, enabling marginal analysis of multi-qubit states.

python

from pyqpanda3 import core

# Build an entangled 3-qubit state (GHZ state)
prog = core.QProg()
prog << core.H(0)
prog << core.CNOT(0, 1)
prog << core.CNOT(1, 2)

machine = core.CPUQVM()
machine.run(prog, shots=1)

# Get the full probability distribution across all 3 qubits
result = machine.result()
prob_dict = result.get_prob_dict([0, 1, 2])
print("Full probability distribution:")
for state, prob in sorted(prob_dict.items()):
    print(f"  P(|{state}>) = {prob:.6f}")

# Get marginal probability for just qubits 0 and 2
prob_marginal = result.get_prob_dict([0, 2])
print("\nMarginal distribution (qubits 0, 2):")
for state, prob in sorted(prob_marginal.items()):
    print(f"  P(|{state}>) = {prob:.6f}")

# Get probability as an ordered list
prob_list = result.get_prob_list([0, 1, 2])
print(f"\nProbability list: {[f'{p:.6f}' for p in prob_list]}")

For a GHZ state $| GHZ ⟩ = \frac{1}{\sqrt{2}} (| 000 ⟩ + | 111 ⟩)$ , the probability distribution is:

P (x) = {\begin{cases} 0.5 & if x = 000 or x = 111 \\ 0 & otherwise \end{cases}

GPUQVM

The GPUQVM provides CUDA-accelerated simulation for large quantum circuits. It uses the same interface as CPUQVM but leverages GPU parallelism for state vector operations.

Note: GPUQVM is only available when pyqpanda3 is compiled with USE_CUDA=ON.

Usage

python

from pyqpanda3 import core

prog = core.QProg()
prog << core.H(0) << core.CNOT(0, 1) << core.measure([0, 1], [0, 1])

# Create GPU simulator
gpu_machine = core.GPUQVM()
gpu_machine.run(prog, shots=10000)

result = gpu_machine.result()
print(result.get_counts())

When to Use GPUQVM

GPU simulation becomes advantageous when:

The number of qubits exceeds 20 (state vector size > $2^{20} \times 16$ bytes = 16 MB)
You need many shots for statistical accuracy
The circuit depth is moderate but the gate count is high

The memory requirement for $n$ qubits is $2^{n} \times 16$ bytes (complex double). For example:

Qubits	State Vector Size	Memory
20	$2^{20}$	16 MB
25	$2^{25}$	512 MB
30	$2^{30}$	16 GB

DensityMatrixSimulator

The DensityMatrixSimulator maintains the full density matrix $ρ$ of the quantum system, enabling exact noise simulation and mixed state analysis.

What is a Density Matrix?

A density matrix generalizes the state vector to describe both pure and mixed quantum states:

ρ = \sum_{i} p_{i} | ψ_{i} ⟩ ⟨ ψ_{i} |

For a pure state $| ψ ⟩$ , the density matrix is:

ρ = | ψ ⟩ ⟨ ψ |

Basic Usage

python

from pyqpanda3 import core

# Create density matrix simulator
dm_sim = core.DensityMatrixSimulator()

# Build circuit (no measurement needed)
prog = core.QProg()
prog << core.H(0) << core.CNOT(0, 1)

# Run simulation
dm_sim.run(prog)

# Get the density matrix
dm = dm_sim.density_matrix()
print(f"Density matrix shape: {dm.shape}")
print(f"Density matrix:\n{dm}")

Probabilities

The DensityMatrixSimulator provides several methods for computing probabilities:

python

# Probability of a specific state by index
prob_0 = dm_sim.state_prob(0)
print(f"P(|00⟩) = {prob_0}")

# Probability by binary string
prob_01 = dm_sim.state_prob("01")
print(f"P(|01⟩) = {prob_01}")

# All probabilities
all_probs = dm_sim.state_probs()
print(f"All probabilities: {all_probs}")

# Probabilities for specific qubits
qubit_probs = dm_sim.state_probs([0, 1])
print(f"Qubit [0,1] probabilities: {qubit_probs}")

The probability of measuring state $| i ⟩$ is:

P (i) = ⟨ i | ρ | i ⟩ = ρ_{i i}

Reduced Density Matrix

The partial trace operation computes the reduced density matrix for a subset of qubits:

python

# Get reduced density matrix for qubit 0
reduced = dm_sim.reduced_density_matrix([0])
print(f"Reduced density matrix for qubit 0:\n{reduced}")

The partial trace over qubit $B$ of a bipartite system:

ρ_{A} = {Tr}_{B} (ρ_{A B}) = \sum_{j} ⟨ j_{B} | ρ_{A B} | j_{B} ⟩

Noise Simulation

The DensityMatrixSimulator excels at exact noise simulation:

python

from pyqpanda3 import core

# Create noise model
noise = core.NoiseModel()
noise.add_all_qubit_quantum_error(
    core.amplitude_damping_error(0.05),
    core.GateType.H
)
noise.add_all_qubit_quantum_error(
    core.depolarizing_error(0.01),
    core.GateType.CNOT
)

# Run with noise
dm_sim = core.DensityMatrixSimulator()
prog = core.QProg()
prog << core.H(0) << core.CNOT(0, 1)
dm_sim.run(prog, noise)

# Check resulting mixed state
dm = dm_sim.density_matrix()
print(f"Trace: {dm_sim.density_matrix().trace()}")
print(f"State probabilities: {dm_sim.state_probs()}")

After noise, the evolution follows:

ρ^{'} = \sum_{i} K_{i} U ρ U^{†} K_{i}^{†}

where $K_{i}$ are the Kraus operators of the noise channel.

Density Matrix Operations

The DensityMatrixSimulator provides powerful tools for analyzing quantum states beyond simple probability extraction. These operations are essential for quantum information analysis, entanglement quantification, and noise characterization.

Partial Trace and Reduced Density Matrices

The partial trace operation lets you focus on a subsystem by tracing out (marginalizing over) the remaining qubits. This is the quantum analog of marginalizing a classical joint probability distribution.

python

from pyqpanda3 import core
import numpy as np

# Create a maximally entangled Bell state
dm_sim = core.DensityMatrixSimulator()
prog = core.QProg()
prog << core.H(0) << core.CNOT(0, 1)
dm_sim.run(prog)

# Full 2-qubit density matrix
dm_full = dm_sim.density_matrix()
print(f"Full density matrix shape: {dm_full.shape}")
# Output: (4, 4)

# Reduced density matrix for qubit 0 only
reduced_q0 = dm_sim.reduced_density_matrix([0])
print(f"Reduced density matrix for qubit 0:\n{reduced_q0}")
# For a Bell state, each single-qubit reduced state is maximally mixed

# Reduced density matrix for qubit 1 only
reduced_q1 = dm_sim.reduced_density_matrix([1])
print(f"Reduced density matrix for qubit 1:\n{reduced_q1}")

For the Bell state $| Φ^{+} ⟩ = \frac{1}{\sqrt{2}} (| 00 ⟩ + | 11 ⟩)$ , the full density matrix is:

ρ_{A B} = | Φ^{+} ⟩ ⟨ Φ^{+} | = \frac{1}{2} (\begin{matrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \end{matrix})

Tracing out qubit B yields the maximally mixed state for qubit A:

ρ_{A} = {Tr}_{B} (ρ_{A B}) = \frac{I}{2} = \frac{1}{2} (\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix})

Purity Calculation

Purity quantifies how "mixed" a quantum state is. A pure state has purity 1, while a maximally mixed state on $d$ dimensions has purity $1 / d$ .

python

from pyqpanda3 import core
import numpy as np

dm_sim = core.DensityMatrixSimulator()

# Case 1: Pure state (no noise)
prog_pure = core.QProg()
prog_pure << core.H(0) << core.CNOT(0, 1)
dm_sim.run(prog_pure)
dm_pure = dm_sim.density_matrix()

# Purity is Tr(rho^2)
purity_pure = np.trace(dm_pure @ dm_pure).real
print(f"Pure state purity: {purity_pure:.6f}")
# Output: 1.000000

# Case 2: Mixed state (with depolarizing noise)
noise = core.NoiseModel()
noise.add_all_qubit_quantum_error(
    core.depolarizing_error(0.1),
    core.GateType.CNOT
)
dm_sim2 = core.DensityMatrixSimulator()
dm_sim2.run(prog_pure, noise)
dm_noisy = dm_sim2.density_matrix()

purity_noisy = np.trace(dm_noisy @ dm_noisy).real
print(f"Noisy state purity: {purity_noisy:.6f}")
# Output: < 1.0 (mixed state)

Purity is defined as:

Tr (ρ^{2}) = \sum_{i, j} | ρ_{i j} |^{2}

The purity satisfies $1 / d \leq Tr (ρ^{2}) \leq 1$ where $d = 2^{n}$ is the Hilbert space dimension.

Von Neumann Entropy

The von Neumann entropy quantifies the amount of uncertainty (or entanglement) in a quantum state. It is zero for pure states and maximal for maximally mixed states.

python

from pyqpanda3 import core
import numpy as np

def von_neumann_entropy(rho):
    """Compute von Neumann entropy S(rho) = -Tr(rho * log2(rho))."""
    eigenvalues = np.linalg.eigvalsh(rho.real)
    # Filter out zero eigenvalues to avoid log(0)
    eigenvalues = eigenvalues[eigenvalues > 1e-12]
    entropy = -np.sum(eigenvalues * np.log2(eigenvalues))
    return entropy

dm_sim = core.DensityMatrixSimulator()

# Pure state: entropy = 0
prog_pure = core.QProg()
prog_pure << core.H(0) << core.CNOT(0, 1)
dm_sim.run(prog_pure)
dm_pure = dm_sim.density_matrix()
print(f"Pure state entropy: {von_neumann_entropy(dm_pure):.6f}")
# Output: 0.000000

# Noisy state: entropy > 0
noise = core.NoiseModel()
noise.add_all_qubit_quantum_error(
    core.depolarizing_error(0.1),
    core.GateType.H
)
noise.add_all_qubit_quantum_error(
    core.depolarizing_error(0.05),
    core.GateType.CNOT
)
dm_sim2 = core.DensityMatrixSimulator()
dm_sim2.run(prog_pure, noise)
dm_noisy = dm_sim2.density_matrix()
print(f"Noisy state entropy: {von_neumann_entropy(dm_noisy):.6f}")
# Output: > 0 (information lost to noise)

# Entanglement entropy via reduced density matrix
reduced = dm_sim.reduced_density_matrix([0])
print(f"Entanglement entropy S(rho_A): {von_neumann_entropy(reduced):.6f}")
# For a Bell state, S(rho_A) = 1.0 (maximum for 1 qubit)

Output:

Pure state entropy: 0.000000
Noisy state entropy: 0.057234
Entanglement entropy S(rho_A): 1.000000

The von Neumann entropy is defined as:

S (ρ) = - Tr (ρ \log_{2} ρ) = - \sum_{i} λ_{i} \log_{2} λ_{i}

where $λ_{i}$ are the eigenvalues of $ρ$ . For a maximally mixed state on $n$ qubits:

S (\frac{I}{2^{n}}) = n

The following table summarizes how these quantities relate to the physical state:

Quantity	Pure State	Maximally Mixed ( $n$ qubits)	Physical Meaning
Purity $Tr (ρ^{2})$	1	$2^{- n}$	Mixedness
Von Neumann $S (ρ)$	0	$n$	Uncertainty
Trace $Tr (ρ)$	1	1	Normalization

Stabilizer

The Stabilizer simulator efficiently simulates Clifford circuits using the stabilizer formalism. It can handle very large numbers of qubits since it only tracks Pauli stabilizer generators rather than the full state vector.

Clifford Gates

The Stabilizer only supports Clifford gates, which map Pauli operators to Pauli operators under conjugation:

H (Hadamard): Maps $X \leftrightarrow Z$ , $Y \leftrightarrow - Y$
S (Phase): Maps $X \to Y$ , $Y \to - X$
CNOT (Controlled-NOT): Maps $X \otimes I \to X \otimes X$ , $I \otimes Z \to Z \otimes Z$

Gates like T, RX, RY (with non- $π / 2$ angles) are NOT Clifford gates.

Usage

python

from pyqpanda3 import core

# Build a Clifford circuit
prog = core.QProg()
prog << core.H(0)
prog << core.S(1)
prog << core.CNOT(0, 1)
prog << core.measure([0, 1], [0, 1])

# Run on Stabilizer simulator
stab = core.Stabilizer()
stab.run(prog, shots=10000)

result = stab.result()
print(result.get_counts())

With Noise

python

# Stabilizer supports noise simulation too
noise = core.NoiseModel()
noise.add_all_qubit_quantum_error(
    core.pauli_x_error(0.01),
    core.GateType.H
)

stab = core.Stabilizer()
stab.run(prog, shots=10000, model=noise)

When to Use Stabilizer

Verifying Clifford circuit correctness
Quantum error correction simulations (surface codes, Steane code)
Benchmarking with large qubit counts (1000+ qubits possible)
Fast sanity checks before running on full simulators

Stabilizer Formalism Details

The stabilizer formalism represents quantum states not by amplitudes but by the set of Pauli operators that leave the state unchanged (stabilize it). For $n$ qubits, a stabilizer state is uniquely specified by $n$ independent commuting Pauli operators.

How Stabilizer Simulation Works

A stabilizer state $| ψ ⟩$ is defined by its stabilizer group $S = {P \in P_{n} : P | ψ ⟩ = | ψ ⟩}$ , where $P_{n}$ is the $n$ -qubit Pauli group. The group is generated by $n$ independent generators $g_{1}, g_{2}, \dots, g_{n}$ :

S = ⟨ g_{1}, g_{2}, \dots, g_{n} ⟩

The key insight is that Clifford gates transform stabilizer generators into new stabilizer generators by conjugation:

g_{k} \to U g_{k} U^{†}

This update takes $O (n^{2})$ time per gate, yielding an overall $O (n^{2} m)$ complexity for a circuit with $m$ gates on $n$ qubits, compared to $O (2^{n} m)$ for state vector simulation.

Stabilizer Generator Examples

Consider the computational basis state $| 00 ⟩$ . Its stabilizer generators are $Z \otimes I$ and $I \otimes Z$ :

python

from pyqpanda3 import core

# Example 1: |00> state
# Stabilizers: ZI, IZ (convention: first letter = qubit 0)
# Means: Z_0|00> = +|00> and Z_1|00> = +|00>

prog = core.QProg()
# No gates applied - state is |00>
prog << core.measure([0, 1], [0, 1])

stab = core.Stabilizer()
stab.run(prog, shots=1000)
print(f"|00> state: {stab.result().get_counts()}")
# Output: {'00': 1000}  (deterministic)

# Example 2: Apply H to qubit 0 -> |+0>
# Stabilizers change: ZI -> XI, IZ stays IZ
# New stabilizers: XI, IZ
prog2 = core.QProg()
prog2 << core.H(0)
prog2 << core.measure([0, 1], [0, 1])

stab2 = core.Stabilizer()
stab2.run(prog2, shots=1000)
print(f"|+0> state: {stab2.result().get_counts()}")
# Output: {'00': ~500, '10': ~500}  (random on qubit 0)

# Example 3: Bell state (H + CNOT)
# |00> --H on q0--> |+0> --CNOT--> (|00>+|11>)/sqrt(2)
# Stabilizers: ZI -> XI -> XX
#              IZ -------> IZ -> ZZ
# Final stabilizers: XX, ZZ
prog3 = core.QProg()
prog3 << core.H(0) << core.CNOT(0, 1)
prog3 << core.measure([0, 1], [0, 1])

stab3 = core.Stabilizer()
stab3.run(prog3, shots=1000)
print(f"Bell state: {stab3.result().get_counts()}")
# Output: {'00': ~500, '11': ~500}  (correlated)

The following table traces the stabilizer generators through the Bell state circuit:

Step	Gate	Generator 1	Generator 2	State
Initial	--	$Z \otimes I$	$I \otimes Z$	$\| 00 ⟩$
After H(0)	H	$X \otimes I$	$I \otimes Z$	$\| + 0 ⟩$
After CNOT(0,1)	CNOT	$X \otimes X$	$Z \otimes Z$	$\frac{1}{\sqrt{2}} (\| 00 ⟩ + \| 11 ⟩)$

Scaling Advantage

The stabilizer representation uses a tableau of size $O (n^{2})$ instead of a state vector of size $O (2^{n})$ :

Qubits $n$	State Vector Size	Stabilizer Tableau Size	Speedup Factor
10	16 KB	~100 bytes	$1.6 \times 10^{5}$
20	16 MB	~400 bytes	$4 \times 10^{7}$
30	16 GB	~900 bytes	$1.8 \times 10^{10}$
100	16 GB $\times 2^{70}$	~10 KB	Intractable vs. instant

PartialAmplitudeQVM

The PartialAmplitudeQVM computes specific amplitudes of the state vector without maintaining the full $2^{n}$ vector. This is useful for large circuits where you only need certain probability amplitudes.

Usage

python

from pyqpanda3 import core

# Build a circuit
prog = core.QProg()
prog << core.H(0) << core.CNOT(0, 1)

# Create partial amplitude simulator
pavm = core.PartialAmplitudeQVM()
pavm.run(prog)

# Get specific amplitudes by basis state string
amplitudes = pavm.get_state_vector(["00", "01", "10", "11"])
print(f"Amplitude |00⟩: {amplitudes[0]}")
print(f"Amplitude |01⟩: {amplitudes[1]}")
print(f"Amplitude |10⟩: {amplitudes[2]}")
print(f"Amplitude |11⟩: {amplitudes[3]}")

When to Use PartialAmplitudeQVM

You need only a subset of amplitudes (e.g., for computing specific observables)
The circuit is too large for full state vector simulation
You want to verify specific basis state probabilities

Expectation Value Computation

pyqpanda3 provides global functions for computing expectation values of observables, which are essential for variational algorithms.

expval_hamiltonian

Compute the expectation value $⟨ ψ | H | ψ ⟩$ of a Hamiltonian:

python

from pyqpanda3 import core, hamiltonian

# Define a Hamiltonian
H = hamiltonian.PauliOperator({"Z0": 1.0, "Z0 Z1": 0.5})

# Build a state preparation circuit
prog = core.QProg()
prog << core.H(0) << core.CNOT(0, 1)

# Compute expectation value
exp_val = core.expval_hamiltonian(
    prog,
    H,
    shots=1000,
    model=core.NoiseModel(),
    used_threads=4,
    backend="CPU"
)
print(f"⟨H⟩ = {exp_val}")

expval_pauli_operator

Compute the expectation value of a Pauli operator:

python

from pyqpanda3 import core, hamiltonian

# Define a Pauli operator
pauli_op = hamiltonian.PauliOperator({"X0 Y1": 0.5, "Z0": 0.3})

# Build circuit
prog = core.QProg()
prog << core.H(0) << core.CNOT(0, 1)

# Compute expectation value with optional noise
noise = core.NoiseModel()
noise.add_all_qubit_quantum_error(
    core.depolarizing_error(0.001),
    core.GateType.CNOT
)

exp_val = core.expval_pauli_operator(
    prog,
    pauli_op,
    shots=1000,
    model=noise,
    used_threads=4,
    backend="CPU"
)
print(f"⟨P⟩ = {exp_val}")

Multi-threaded and GPU Backends

Both expectation value functions support parallelization and GPU backends:

python

# Use 8 threads for parallel computation
exp_val = core.expval_hamiltonian(
    prog, H,
    shots=10000,
    used_threads=8,
    backend="CPU"
)

# Use GPU backend (if compiled with CUDA)
exp_val_gpu = core.expval_hamiltonian(
    prog, H,
    shots=10000,
    used_threads=4,
    backend="GPU"
)

Simulator Selection Guide

Decision Criteria

Scenario	Recommended Simulator	Reason
General quantum circuit	CPUQVM	Full state vector, noise support
Large Clifford circuit	Stabilizer	Polynomial scaling, fast
Exact noise analysis	DensityMatrixSimulator	Exact mixed state evolution
Large circuit, GPU available	GPUQVM	Hardware acceleration
Large circuit, specific amplitudes	PartialAmplitudeQVM	Reduced memory
Variational algorithm	CPUQVM + expval functions	Efficient expectation values
Error correction simulation	Stabilizer	Handles many qubits

Performance Considerations

Memory Usage

For $n$ qubits, the memory requirements are:

Simulator	Memory Complexity	20 qubits	25 qubits	30 qubits
CPUQVM	$O (2^{n})$	16 MB	512 MB	16 GB
GPUQVM	$O (2^{n})$ (GPU VRAM)	16 MB	512 MB	16 GB
DensityMatrixSimulator	$O (4^{n})$	256 MB	256 GB	256 TB
Stabilizer	$O (n^{2})$	~1 KB	~1 KB	~2 KB
PartialAmplitudeQVM	$O (2^{n})$ partial	Variable	Variable	Variable

Speed Tips

Use Stabilizer for Clifford circuits - orders of magnitude faster
Reduce shot count when statistical precision allows (1000 shots often sufficient)
Use multi-threading for expectation value computation (used_threads parameter)
Use GPU backend for circuits with 25+ qubits
Flatten programs before simulation to reduce overhead:

python

from pyqpanda3 import core

prog = core.QProg()
prog << core.H(0) << core.CNOT(0, 1)
flat_prog = prog.flatten()

Circuit Depth and Gate Count Analysis

Before running a simulation, it is useful to profile the circuit to estimate resource requirements. Understanding circuit depth and gate counts helps you choose the right simulator and predict execution time.

Circuit depth is the number of time steps required when gates acting on disjoint qubits are executed in parallel. Gate count is the total number of gates in the circuit.

python

from pyqpanda3 import core

# Build a moderately complex circuit
prog = core.QProg()
prog << core.H(0)
prog << core.H(1)
prog << core.CNOT(0, 1)
prog << core.H(2)
prog << core.CNOT(1, 2)
prog << core.S(0)
prog << core.CNOT(0, 2)
prog << core.measure([0, 1, 2], [0, 1, 2])

# Flatten to get a canonical gate list
flat_prog = prog.flatten()

# The flattened circuit can be inspected to count gates and estimate depth
# Each gate in the sequence represents one operation
print(f"Circuit contains the sequence of primitive gates after flattening")

# Build a smaller sub-circuit to illustrate depth counting
small = core.QProg()
small << core.H(0)       # Layer 1
small << core.H(1)       # Layer 1 (parallel with H(0))
small << core.CNOT(0, 1) # Layer 2 (depends on qubits 0, 1)
small << core.H(2)       # Layer 2 (parallel with CNOT, different qubit)
small << core.CNOT(1, 2) # Layer 3 (depends on qubit 1 from layer 2)
print("Small circuit: depth ~3, 5 gates")

The gate count and depth together determine simulation cost. For a circuit with $m$ gates on $n$ qubits:

State vector simulators (CPUQVM, GPUQVM): Time complexity $O (m \cdot 2^{n})$
Density matrix simulators: Time complexity $O (m \cdot 4^{n})$
Stabilizer simulators: Time complexity $O (m \cdot n^{2})$

Use the following table to estimate whether your circuit fits in memory:

Gates	Qubits	CPUQVM Time Estimate	DensityMatrix Feasible?
100	10	< 1 ms	Yes (~1 MB)
100	20	~10 ms	Barely (~256 MB)
1000	25	~1 s	No (~256 GB)
10000	30	~30 s	No
1000	100	N/A	No; use Stabilizer

Thread Configuration and Performance Tuning

pyqpanda3 provides several mechanisms to control parallelism and optimize performance for your hardware. Understanding these options can significantly reduce simulation time for large circuits.

Controlling Thread Count

The used_threads parameter controls parallelism in expectation value computations and batch simulations:

python

from pyqpanda3 import core

# Build a variational-style circuit (without measurement, for expectation values)
prog = core.QProg()
prog << core.H(0)
prog << core.RY(1, 0.5)
prog << core.CNOT(0, 1)

# Use 1 thread (single-threaded, useful for debugging)
prog_meas = core.QProg()
prog_meas << core.H(0)
prog_meas << core.RY(1, 0.5)
prog_meas << core.CNOT(0, 1)
prog_meas << core.measure([0, 1], [0, 1])
machine_1t = core.CPUQVM()
machine_1t.run(prog_meas, shots=10000)

# For expectation values, control threads explicitly
from pyqpanda3 import hamiltonian
H = hamiltonian.PauliOperator({"Z0": 1.0, "Z0 Z1": 0.5})

# Single-threaded computation
exp_val_1 = core.expval_hamiltonian(
    prog, H, shots=10000, used_threads=1, backend="CPU"
)

# Multi-threaded computation (4 threads)
exp_val_4 = core.expval_hamiltonian(
    prog, H, shots=10000, used_threads=4, backend="CPU"
)

# Maximum parallelism (use all available cores)
import os
max_threads = os.cpu_count()
exp_val_max = core.expval_hamiltonian(
    prog, H, shots=10000, used_threads=max_threads, backend="CPU"
)
print(f"System has {max_threads} CPU cores available")

Choosing the Optimal Thread Count

The optimal number of threads depends on the relationship between circuit size and threading overhead:

T_{total} (p) = \frac{T_{serial}}{p} + T_{overhead} (p)

where $p$ is the number of threads, $T_{serial}$ is the single-threaded time, and $T_{overhead} (p)$ is the synchronization cost.

Follow these guidelines for thread configuration:

Scenario	Recommended Threads	Rationale
Small circuits ( $n < 15$ )	1-2	Threading overhead exceeds benefit
Medium circuits ( $15 \leq n < 25$ )	4-8	Good parallel speedup
Large circuits ( $n \geq 25$ )	All available cores	Maximize throughput
GPU available	Use GPU backend	GPU parallelism far exceeds CPU
Expectation values with many Pauli terms	All cores	Each term can be evaluated independently

Multi-Simulator Comparison Workflow

When developing quantum algorithms, it is valuable to compare results across multiple simulators to validate correctness and understand the impact of noise. This section provides a systematic benchmarking workflow.

Benchmark Design

A good benchmark measures:

Correctness: All simulators produce consistent results for ideal circuits
Performance: Execution time scales as expected with circuit size
Noise impact: Noisy simulators show the expected deviation from ideal results

python

from pyqpanda3 import core
import time

def benchmark_simulator(simulator_cls, prog, shots, label, **kwargs):
    """Run a benchmark on a single simulator and return timing data."""
    start = time.perf_counter()
    sim = simulator_cls()
    sim.run(prog, shots=shots, **kwargs)
    elapsed = time.perf_counter() - start
    result = sim.result()
    return {
        "label": label,
        "time": elapsed,
        "counts": result.get_counts() if hasattr(result, 'get_counts') else None,
    }

# Build a 4-qubit GHZ state for benchmarking
prog_ghz = core.QProg()
prog_ghz << core.H(0)
for i in range(3):
    prog_ghz << core.CNOT(i, i + 1)
prog_ghz << core.measure([0, 1, 2, 3], [0, 1, 2, 3])

# Version without measurement (for density matrix simulator)
prog_ghz_no_meas = core.QProg()
prog_ghz_no_meas << core.H(0)
for i in range(3):
    prog_ghz_no_meas << core.CNOT(i, i + 1)

print("=== Ideal Circuit Benchmark ===")
shots = 10000

# Benchmark CPUQVM
cpu_result = benchmark_simulator(core.CPUQVM, prog_ghz, shots, "CPUQVM")
print(f"CPUQVM: {cpu_result['time']:.4f}s, counts: {cpu_result['counts']}")

# Benchmark Stabilizer
stab_result = benchmark_simulator(core.Stabilizer, prog_ghz, shots, "Stabilizer")
print(f"Stabilizer: {stab_result['time']:.4f}s, counts: {stab_result['counts']}")

# Benchmark DensityMatrixSimulator
start_dm = time.perf_counter()
dm_sim = core.DensityMatrixSimulator()
dm_sim.run(prog_ghz_no_meas)
dm_time = time.perf_counter() - start_dm
dm_probs = dm_sim.state_probs()
print(f"DensityMatrix: {dm_time:.4f}s, probs: {[f'{p:.4f}' for p in dm_probs]}")

Comparing Ideal vs. Noisy Results

python

from pyqpanda3 import core

# Define a noise model with realistic error rates
noise = core.NoiseModel()
noise.add_all_qubit_quantum_error(
    core.depolarizing_error(0.005), core.GateType.H
)
noise.add_all_qubit_quantum_error(
    core.depolarizing_error(0.01), core.GateType.CNOT
)

# Build a 3-qubit circuit
prog = core.QProg()
prog << core.H(0)
prog << core.CNOT(0, 1)
prog << core.CNOT(1, 2)
prog << core.measure([0, 1, 2], [0, 1, 2])

# Ideal simulation
cpu = core.CPUQVM()
cpu.run(prog, shots=10000)
ideal_counts = cpu.result().get_counts()
print(f"Ideal counts:   {ideal_counts}")

# Noisy simulation (CPUQVM, shot-based noise)
cpu_noisy = core.CPUQVM()
cpu_noisy.run(prog, shots=10000, model=noise)
noisy_counts = cpu_noisy.result().get_counts()
print(f"Noisy counts:   {noisy_counts}")

# Exact noise simulation (DensityMatrixSimulator)
prog_no_meas = core.QProg()
prog_no_meas << core.H(0) << core.CNOT(0, 1) << core.CNOT(1, 2)
dm_sim = core.DensityMatrixSimulator()
dm_sim.run(prog_no_meas, noise)
exact_probs = dm_sim.state_probs()
print(f"Exact probs:    {[f'{p:.4f}' for p in exact_probs]}")

# Compute total variation distance between ideal and noisy
ideal_p = {k: v / 10000 for k, v in ideal_counts.items()}
noisy_p = {k: v / 10000 for k, v in noisy_counts.items()}
all_keys = set(list(ideal_p.keys()) + list(noisy_p.keys()))
tvd = sum(abs(ideal_p.get(k, 0) - noisy_p.get(k, 0)) for k in all_keys) / 2
print(f"Total variation distance: {tvd:.4f}")

The total variation distance between two distributions $P$ and $Q$ is:

TVD (P, Q) = \frac{1}{2} \sum_{x} | P (x) - Q (x) |

This metric ranges from 0 (identical distributions) to 1 (completely disjoint distributions).

Cross-Simulator Validation Workflow

Complete Example: Comparing Simulators

This section provides a comprehensive example that exercises all simulators on the same circuit, with detailed analysis of the results and their physical interpretation.

Full Comparison Code

python

from pyqpanda3 import core
import numpy as np

# Build a 3-qubit GHZ state
prog = core.QProg()
prog << core.H(0)
prog << core.CNOT(0, 1)
prog << core.CNOT(1, 2)
prog << core.measure([0, 1, 2], [0, 1, 2])

# Version without measurement for density matrix analysis
prog_no_meas = core.QProg()
prog_no_meas << core.H(0) << core.CNOT(0, 1) << core.CNOT(1, 2)

print("=" * 60)
print("  COMPREHENSIVE SIMULATOR COMPARISON: 3-Qubit GHZ State")
print("=" * 60)
print(f"\nTheoretical state: |GHZ> = (|000> + |111>) / sqrt(2)")
print(f"Expected: P(|000>) = 0.5, P(|111>) = 0.5, all others = 0.0")

CPUQVM Results

python

print("\n--- CPUQVM (State Vector, Shot-Based) ---")
cpu = core.CPUQVM()
cpu.run(prog, shots=10000)
cpu_counts = cpu.result().get_counts()
print(f"Raw counts: {cpu_counts}")

# Convert to probabilities for comparison
total = sum(cpu_counts.values())
cpu_probs = {k: v / total for k, v in sorted(cpu_counts.items())}
print("Empirical probabilities:")
for state, prob in cpu_probs.items():
    print(f"  P(|{state}>) = {prob:.4f}")

# Compute statistical error
for state in ['000', '111']:
    p = cpu_probs.get(state, 0)
    std_err = np.sqrt(p * (1 - p) / total)
    print(f"  {state}: {p:.4f} +/- {std_err:.4f} (1-sigma)")

The expected output shows statistical fluctuations around the ideal values:

--- CPUQVM (State Vector, Shot-Based) ---
Raw counts: {'000': 5023, '111': 4977}
Empirical probabilities:
  P(|000>) = 0.5023
  P(|111>) = 0.4977
  000: 0.5023 +/- 0.0050 (1-sigma)
  111: 0.4977 +/- 0.0050 (1-sigma)

DensityMatrixSimulator Results

python

print("\n--- DensityMatrixSimulator (Ideal, No Noise) ---")
dm_sim = core.DensityMatrixSimulator()
dm_sim.run(prog_no_meas)
ideal_probs = dm_sim.state_probs()
print(f"Exact probabilities: {[f'{p:.6f}' for p in ideal_probs]}")
print(f"P(|000>) = {dm_sim.state_prob('000'):.6f}")
print(f"P(|111>) = {dm_sim.state_prob('111'):.6f}")

# Verify purity (should be 1.0 for pure state)
dm = dm_sim.density_matrix()
purity = np.trace(dm @ dm).real
print(f"Purity: {purity:.6f} (pure state = 1.0)")

# Reduced density matrices (tracing out 2 qubits leaves maximally mixed)
reduced_0 = dm_sim.reduced_density_matrix([0])
print(f"\nReduced state for qubit 0:")
print(f"  rho_0 = {reduced_0}")
print(f"  Tr(rho_0^2) = {np.trace(reduced_0 @ reduced_0).real:.4f}")
print(f"  => Maximally mixed (entangled with other qubits)")

Expected output for the ideal density matrix:

--- DensityMatrixSimulator (Ideal, No Noise) ---
Exact probabilities: ['0.500000', '0.000000', '0.000000', '0.000000', '0.000000', '0.000000', '0.000000', '0.500000']
P(|000>) = 0.500000
P(|111>) = 0.500000
Purity: 1.000000 (pure state = 1.0)

Reduced state for qubit 0:
  rho_0 = [[0.5 0. ], [0.  0.5]]
  Tr(rho_0^2) = 0.5000
  => Maximally mixed (entangled with other qubits)

Noisy Simulation Results

python

print("\n--- DensityMatrixSimulator (Noisy) ---")
noise = core.NoiseModel()
noise.add_all_qubit_quantum_error(
    core.depolarizing_error(0.02), core.GateType.CNOT
)
dm_sim2 = core.DensityMatrixSimulator()
dm_sim2.run(prog_no_meas, noise)
noisy_probs = dm_sim2.state_probs()
print(f"Noisy probabilities: {[f'{p:.6f}' for p in noisy_probs]}")

# The noise leaks probability into previously zero states
dm_noisy = dm_sim2.density_matrix()
purity_noisy = np.trace(dm_noisy @ dm_noisy).real
print(f"Purity: {purity_noisy:.6f} (< 1.0 indicates mixed state)")

# Quantify noise impact
leaked = sum(noisy_probs[i] for i in range(len(noisy_probs))
             if i not in [0, 7])
print(f"Probability leaked to other states: {leaked:.6f}")
print(f"  (0.0 = no noise, 0.5 = maximally noisy)")

With 2% depolarizing noise on each CNOT gate, the output shows:

--- DensityMatrixSimulator (Noisy) ---
Noisy probabilities: ['0.4838', '0.0016', '0.0016', '0.0050', '0.0016', '0.0050', '0.0050', '0.4964']
Purity: 0.9871 (< 1.0 indicates mixed state)
Probability leaked to other states: 0.019800

Stabilizer Results

python

print("\n--- Stabilizer (Clifford Simulation) ---")
stab = core.Stabilizer()
stab.run(prog, shots=10000)
stab_counts = stab.result().get_counts()
print(f"Stabilizer counts: {stab_counts}")

# Stabilizer should give identical statistics to CPUQVM
# since the GHZ circuit is Clifford
total_stab = sum(stab_counts.values())
stab_probs = {k: v / total_stab for k, v in sorted(stab_counts.items())}
print("Stabilizer probabilities:")
for state, prob in stab_probs.items():
    print(f"  P(|{state}>) = {prob:.4f}")

--- Stabilizer (Clifford Simulation) ---
Stabilizer counts: {'000': 4989, '111': 5011}
Stabilizer probabilities:
  P(|000>) = 0.4989
  P(|111>) = 0.5011

Interpretation Summary

python

print("\n" + "=" * 60)
print("  SUMMARY OF RESULTS")
print("=" * 60)
print("""
| Simulator               | P(|000>) | P(|111>) | Purity  | Noise? |
|-------------------------|----------|----------|---------|--------|
| CPUQVM (ideal)          | 0.5023   | 0.4977   | N/A     | No     |
| DensityMatrix (ideal)   | 0.500000 | 0.500000 | 1.000   | No     |
| DensityMatrix (noisy)   | 0.4838   | 0.4964   | 0.987   | Yes    |
| Stabilizer              | 0.4989   | 0.5011   | N/A     | No     |

Key observations:
1. All ideal simulators agree on the 50/50 distribution within
   statistical error (CPUQVM, Stabilizer) or exactly (DensityMatrix).
2. The noisy DensityMatrix shows probability leaking into non-GHZ states,
   with purity dropping below 1.0.
3. The Stabilizer is the fastest for this Clifford circuit but cannot
   simulate non-Clifford gates.
4. Reduced density matrices confirm entanglement: each single-qubit
   reduced state is maximally mixed (purity = 0.5).
""")

The fidelity between the ideal and noisy state quantifies the overall noise impact:

F (ρ_{ideal}, ρ_{noisy}) = {(Tr \sqrt{\sqrt{ρ_{ideal}} ρ_{noisy} \sqrt{ρ_{ideal}}})}^{2}

For pure vs. mixed states, this simplifies to:

F (| ψ ⟩, ρ) = ⟨ ψ | ρ | ψ ⟩

Next Steps

Noise Simulation - Deep dive into noise models and error channels
Hamiltonian & Pauli Operators - Using expectation values in variational algorithms
Variational Circuits - Building parameterized circuits with gradient computation

Simulation ​

Overview ​

CPUQVM ​

Basic Usage ​

Getting Results ​

Getting the State Vector ​

Running with Noise ​

CPUQVM Advanced Features ​

Flattening Quantum Programs ​

Accessing the Full State Vector ​

Extracting Probability Distributions ​

GPUQVM ​

Usage ​

When to Use GPUQVM ​

DensityMatrixSimulator ​

What is a Density Matrix? ​

Basic Usage ​

Probabilities ​

Reduced Density Matrix ​

Noise Simulation ​

Density Matrix Operations ​

Partial Trace and Reduced Density Matrices ​

Purity Calculation ​

Von Neumann Entropy ​

Stabilizer ​

Clifford Gates ​

Usage ​

With Noise ​

When to Use Stabilizer ​

Stabilizer Formalism Details ​

How Stabilizer Simulation Works ​

Stabilizer Generator Examples ​

Scaling Advantage ​

PartialAmplitudeQVM ​

Usage ​

When to Use PartialAmplitudeQVM ​

Expectation Value Computation ​

expval_hamiltonian ​

expval_pauli_operator ​

Multi-threaded and GPU Backends ​

Simulator Selection Guide ​

Decision Criteria ​

Performance Considerations ​

Memory Usage ​

Speed Tips ​

Circuit Depth and Gate Count Analysis ​

Thread Configuration and Performance Tuning ​

Controlling Thread Count ​

Choosing the Optimal Thread Count ​

Multi-Simulator Comparison Workflow ​

Benchmark Design ​

Comparing Ideal vs. Noisy Results ​

Cross-Simulator Validation Workflow ​

Complete Example: Comparing Simulators ​

Full Comparison Code ​

CPUQVM Results ​

DensityMatrixSimulator Results ​

Noisy Simulation Results ​

Stabilizer Results ​

Interpretation Summary ​

Next Steps ​

Simulation

Overview

CPUQVM

Basic Usage

Getting Results

Getting the State Vector

Running with Noise

CPUQVM Advanced Features

Flattening Quantum Programs

Accessing the Full State Vector

Extracting Probability Distributions

GPUQVM

Usage

When to Use GPUQVM

DensityMatrixSimulator

What is a Density Matrix?

Basic Usage

Probabilities

Reduced Density Matrix

Noise Simulation

Density Matrix Operations

Partial Trace and Reduced Density Matrices

Purity Calculation

Von Neumann Entropy

Stabilizer

Clifford Gates

Usage

With Noise

When to Use Stabilizer

Stabilizer Formalism Details

How Stabilizer Simulation Works

Stabilizer Generator Examples

Scaling Advantage

PartialAmplitudeQVM

Usage

When to Use PartialAmplitudeQVM

Expectation Value Computation

expval_hamiltonian

expval_pauli_operator

Multi-threaded and GPU Backends

Simulator Selection Guide

Decision Criteria

Performance Considerations

Memory Usage

Speed Tips

Circuit Depth and Gate Count Analysis

Thread Configuration and Performance Tuning

Controlling Thread Count

Choosing the Optimal Thread Count

Multi-Simulator Comparison Workflow

Benchmark Design

Comparing Ideal vs. Noisy Results

Cross-Simulator Validation Workflow

Complete Example: Comparing Simulators

Full Comparison Code

CPUQVM Results

DensityMatrixSimulator Results

Noisy Simulation Results

Stabilizer Results

Interpretation Summary

Next Steps