Quantum Volume Benchmark

Measure holistic quantum computer performance using the Quantum Volume (QV) metric with pyqpanda3.

Problem

Quantum computers are often marketed by qubit count alone, but the number of qubits tells only part of the story. A 100-qubit machine with high error rates and limited connectivity may solve fewer useful circuits than a 20-qubit machine with high-fidelity gates and full connectivity. Engineers need a single-number metric that captures the combined effect of gate fidelity, qubit connectivity, compiler efficiency, and circuit depth.

Quantum Volume (QV) is a holistic benchmark introduced by IBM in 2019 that addresses this gap. Rather than measuring individual hardware properties in isolation, QV tests whether a quantum computer can successfully implement random circuits of a given size and produce correct results with high probability. The metric is defined as:

V_Q = 2^(n_Q)

where n_Q is the largest number of qubits for which the device can reliably execute depth-n_Q random circuits with a heavy output probability exceeding 2/3.

The key insight is that QV is not just a qubit count. A device with 50 qubits but a QV of 2^5 = 32 can only reliably handle circuits equivalent to a perfect 5-qubit machine. The bottleneck could be any combination of:

  • Gate fidelity: two-qubit gate errors that accumulate with circuit depth
  • Connectivity: overhead from SWAP operations when the hardware topology does not match the circuit's required interactions
  • Coherence time: decoherence that limits how deep a circuit can run before the quantum state is lost
  • Compilation quality: how efficiently the compiler maps abstract circuits to native hardware gates
  • Measurement errors: readout errors that corrupt the final result

This makes QV a practical predictor of real-world performance: if your application requires circuits of width and depth n, you need a machine with QV of at least 2^n.

Solution

The Quantum Volume benchmark works by testing random SU(4) circuits of increasing width. For each width n, the protocol is:

  1. Generate random circuits. Create n-qubit circuits of depth n where each layer consists of random SU(4) unitaries applied to pairs of qubits. The qubit pairing in each layer is randomly permuted.

  2. Run on a classical simulator. Compute the ideal output probability distribution p(x) over all 2^n bit strings. Identify the heavy outputs -- those bit strings whose ideal probability exceeds the median:

Heavy outputs = { x : p(x) > median({ p(x) : x ∈ {0,1}^n }) }
  3. Run on the target device. Execute the same circuit many times (shots) and count how often heavy outputs appear. The heavy output fraction is:

h_n = (shots producing heavy outputs) / (total shots)

  4. Check the threshold. The device passes width n if h_n > 2/3 with sufficient statistical confidence (at least 2σ above 1/2, since a random guess would produce heavy outputs half the time).

  5. Find the maximum. The Quantum Volume is the largest 2^n for which the device passes.

pyqpanda3 provides two key functions for this workflow:

  • core.QV(num_qubit, depth, seed) generates a QV circuit (a QCircuit) with random SU(4) oracle layers, following the standard protocol. Each layer randomly permutes the qubits, groups them into n/2 pairs, and applies a random SU(4) unitary to each pair via QOracle.
  • core.random_qcircuit(qubits, depth, gate_type) generates a general random circuit from a specified gate set, useful for custom benchmarking beyond the standard QV protocol.

Code

Running a basic Quantum Volume test

Use core.QV() to generate a standard QV circuit and evaluate it on the ideal simulator.

python
"""Basic Quantum Volume test on an ideal simulator."""
from pyqpanda3 import core

# Parameters for the QV test
num_qubits = 4
depth = num_qubits  # QV uses depth == width
seed = 42

# Generate a QV circuit: random SU(4) unitaries on permuted qubit pairs
qv_circuit = core.QV(num_qubits, depth, seed)

# Wrap in a program and add measurements
prog = core.QProg()
prog.append(qv_circuit)
for q in range(num_qubits):
    prog.append(core.measure(q, q))

# Run on the ideal CPU simulator
qvm = core.CPUQVM()
shots = 10000
qvm.run(prog, shots)
result = qvm.result()

# Get measurement probabilities
prob_dict = result.get_prob_dict()
print("Ideal output distribution (top 10 outcomes):")
sorted_probs = sorted(prob_dict.items(), key=lambda x: -x[1])[:10]
for bitstring, prob in sorted_probs:
    print(f"  {bitstring}: {prob:.6f}")

Computing the heavy output fraction manually

The heavy output fraction is the core statistic of the QV test. Here is how to compute it step by step.

python
"""Compute the heavy output fraction for a QV circuit."""
from pyqpanda3 import core
import numpy as np


def compute_heavy_outputs(prob_dict: dict) -> set:
    """Identify heavy outputs: bit strings above the median probability.

    Args:
        prob_dict: Dictionary mapping bit strings to ideal probabilities.

    Returns:
        Set of heavy output bit strings.
    """
    probabilities = np.array(list(prob_dict.values()))
    median_prob = np.median(probabilities)
    heavy = {
        bitstring
        for bitstring, prob in prob_dict.items()
        if prob > median_prob
    }
    return heavy


def heavy_output_fraction(prob_dict: dict, counts_dict: dict) -> float:
    """Compute the heavy output fraction from ideal and sampled distributions.

    Args:
        prob_dict: Ideal probability distribution.
        counts_dict: Sampled measurement counts.

    Returns:
        Fraction of shots that produced heavy outputs.
    """
    heavy = compute_heavy_outputs(prob_dict)
    total_shots = sum(counts_dict.values())
    heavy_count = sum(
        count for bitstring, count in counts_dict.items()
        if bitstring in heavy
    )
    return heavy_count / total_shots


# Run QV test
num_qubits = 4
qv_circuit = core.QV(num_qubits, num_qubits, seed=123)

prog = core.QProg()
prog.append(qv_circuit)
for q in range(num_qubits):
    prog.append(core.measure(q, q))

# Get ideal distribution (high shot count for accurate probabilities)
qvm = core.CPUQVM()
qvm.run(prog, 100000)
ideal_probs = qvm.result().get_prob_dict()

# Get sampled distribution (fewer shots simulates a real device)
qvm.run(prog, 1000)
sampled_counts = qvm.result().get_counts()

# Compute heavy output fraction
hof = heavy_output_fraction(ideal_probs, sampled_counts)
print(f"QV-{num_qubits} heavy output fraction: {hof:.4f}")
print(f"Threshold for passing: 0.6667")
print(f"Result: {'PASS' if hof > 2 / 3 else 'FAIL'}")

Sweeping QV width to find the maximum

In practice, you test increasing widths until the device fails. On an ideal simulator, every width should pass.

python
"""Sweep QV widths to determine the maximum Quantum Volume on an ideal simulator."""
from pyqpanda3 import core
import numpy as np


def compute_heavy_outputs(prob_dict: dict) -> set:
    probabilities = np.array(list(prob_dict.values()))
    median_prob = np.median(probabilities)
    return {
        bs for bs, p in prob_dict.items() if p > median_prob
    }


def run_qv_test(num_qubits: int, shots: int = 10000, seed: int = 0) -> float:
    """Run a single QV test and return the heavy output fraction."""
    qv_circuit = core.QV(num_qubits, num_qubits, seed)
    prog = core.QProg()
    prog.append(qv_circuit)
    for q in range(num_qubits):
        prog.append(core.measure(q, q))

    qvm = core.CPUQVM()

    # Ideal probabilities for identifying heavy outputs
    qvm.run(prog, 100000)
    ideal_probs = qvm.result().get_prob_dict()
    heavy = compute_heavy_outputs(ideal_probs)

    # Sampled distribution
    qvm.run(prog, shots)
    counts = qvm.result().get_counts()

    total = sum(counts.values())
    heavy_count = sum(c for bs, c in counts.items() if bs in heavy)
    return heavy_count / total


# Sweep widths from 2 to 7
print(f"{'Width':>6} {'Depth':>6} {'HOF':>8} {'Pass?':>6} {'QV':>8}")
print("-" * 40)

max_passing_width = 0
for n in range(2, 8):
    # Use multiple seeds for statistical robustness
    hofs = []
    for seed in range(10):
        hof = run_qv_test(n, shots=5000, seed=seed * 100 + n)
        hofs.append(hof)
    avg_hof = np.mean(hofs)
    passed = avg_hof > 2 / 3
    if passed:
        max_passing_width = n
    print(f"{n:>6} {n:>6} {avg_hof:>8.4f} {'PASS' if passed else 'FAIL':>6} {2**n:>8}")

quantum_volume = 2 ** max_passing_width if max_passing_width > 0 else 1
print(f"\nQuantum Volume (ideal simulator): {quantum_volume}")

Generating custom random circuits with random_qcircuit

For benchmarking beyond the standard QV protocol, use core.random_qcircuit() to generate circuits from a specific gate set. The supported gate type strings are: "X", "Y", "Z", "H", "S", "T", "RX", "RY", "RZ", "U1", "U2", "U3", "U4", "P", "I", "ISWAP", "SQISWAP", "CPHASE", "RPHI", "CU", "SWAP", "X1", "Y1", "Z1", "RZZ", "RYY", "RXX", "RZX", "ECHO", "IDLE", "CNOT", "CZ", "MS". When the gate type list is empty, all types are used by default.

python
"""Generate random circuits with a custom gate set."""
from pyqpanda3 import core

# Define qubits and a custom gate set
qubits = list(range(5))
depth = 20
gate_set = ["H", "X", "RX", "RY", "RZ", "CNOT", "SWAP"]

# Generate a random circuit
circuit = core.random_qcircuit(qubits, depth, gate_set)

# Build a program with measurements
prog = core.QProg()
prog.append(circuit)
for q in qubits:
    prog.append(core.measure(q, q))

# Run and get results
qvm = core.CPUQVM()
qvm.run(prog, 5000)
result = qvm.result()
prob_dict = result.get_prob_dict()
print(f"Number of possible outcomes: {len(prob_dict)}")

Running QV with noise simulation

Real devices are noisy. Simulate the effect of noise on QV performance by applying depolarizing errors to all gates.

python
"""Quantum Volume test with a depolarizing noise model."""
from pyqpanda3 import core
import numpy as np


def compute_heavy_outputs(prob_dict: dict) -> set:
    probabilities = np.array(list(prob_dict.values()))
    median_prob = np.median(probabilities)
    return {bs for bs, p in prob_dict.items() if p > median_prob}


# --- Build QV circuit ---
num_qubits = 4
qv_circuit = core.QV(num_qubits, num_qubits, seed=42)

prog = core.QProg()
prog.append(qv_circuit)
for q in range(num_qubits):
    prog.append(core.measure(q, q))

# --- Ideal reference (no noise) ---
qvm_ideal = core.CPUQVM()
qvm_ideal.run(prog, 100000)
ideal_probs = qvm_ideal.result().get_prob_dict()
heavy = compute_heavy_outputs(ideal_probs)

# --- Noisy simulation ---
error_rate = 0.02  # 2% depolarizing error per gate
noise_model = core.NoiseModel()
dep_error = core.depolarizing_error(error_rate)
noise_model.add_all_qubit_quantum_error(dep_error, core.GateType.CNOT)

# Single-qubit gate noise
dep_error_1q = core.depolarizing_error(error_rate * 0.5)
noise_model.add_all_qubit_quantum_error(dep_error_1q, core.GateType.H)
noise_model.add_all_qubit_quantum_error(dep_error_1q, core.GateType.X)

qvm_noisy = core.CPUQVM()
qvm_noisy.run(prog, 10000, noise_model)
noisy_counts = qvm_noisy.result().get_counts()

total = sum(noisy_counts.values())
heavy_count = sum(c for bs, c in noisy_counts.items() if bs in heavy)
hof_noisy = heavy_count / total

print(f"Error rate:    {error_rate:.1%}")
print(f"HOF (noisy):   {hof_noisy:.4f}")
print(f"Threshold:     0.6667")
print(f"Result:        {'PASS' if hof_noisy > 2 / 3 else 'FAIL'}")

Sweeping noise levels to find the QV failure threshold

Determine the maximum tolerable error rate before the QV test fails. This is critical for hardware engineering: it tells you the gate fidelity targets your device must meet.

python
"""Sweep noise levels to find the QV failure threshold."""
from pyqpanda3 import core
import numpy as np


def compute_heavy_outputs(prob_dict: dict) -> set:
    probabilities = np.array(list(prob_dict.values()))
    median_prob = np.median(probabilities)
    return {bs for bs, p in prob_dict.items() if p > median_prob}


def qv_hof_with_noise(num_qubits: int, error_rate: float, seed: int) -> float:
    """Run a QV test with depolarizing noise and return HOF."""
    qv_circuit = core.QV(num_qubits, num_qubits, seed)
    prog = core.QProg()
    prog.append(qv_circuit)
    for q in range(num_qubits):
        prog.append(core.measure(q, q))

    qvm = core.CPUQVM()

    # Ideal reference
    qvm.run(prog, 100000)
    heavy = compute_heavy_outputs(qvm.result().get_prob_dict())

    # Noisy execution
    noise = core.NoiseModel()
    if error_rate > 0:
        dep_2q = core.depolarizing_error(error_rate)
        noise.add_all_qubit_quantum_error(dep_2q, core.GateType.CNOT)
        dep_1q = core.depolarizing_error(error_rate * 0.5)
        noise.add_all_qubit_quantum_error(dep_1q, core.GateType.H)

    qvm.run(prog, 10000, noise)
    counts = qvm.result().get_counts()

    total = sum(counts.values())
    return sum(c for bs, c in counts.items() if bs in heavy) / total


# Sweep error rates for QV-4
num_qubits = 4
error_rates = np.arange(0, 0.12, 0.01)
num_trials = 5

print(f"QV-{num_qubits} noise threshold sweep")
print(f"{'Error Rate':>12} {'Avg HOF':>10} {'Std Dev':>10} {'Pass?':>6}")
print("-" * 44)

threshold = None
for rate in error_rates:
    hofs = [
        qv_hof_with_noise(num_qubits, rate, seed=seed * 7 + num_qubits)
        for seed in range(num_trials)
    ]
    avg = np.mean(hofs)
    std = np.std(hofs)
    passed = avg > 2 / 3
    if not passed and threshold is None:
        threshold = rate
    print(f"{rate:>11.2%} {avg:>10.4f} {std:>10.4f} {'PASS' if passed else 'FAIL':>6}")

if threshold is not None:
    print(f"\nNoise threshold: ~{threshold:.2%} depolarizing error")
else:
    print("\nDevice passes at all tested noise levels.")

Explanation

Mathematical definition of Quantum Volume

The formal definition, as established by Cross et al. (IBM, 2019), is:

log2(V_Q) = argmax_n min(n, d(n))   such that   h̃_n > 2/3

where:

  • n is the number of qubits (circuit width)
  • d(n) is the achievable circuit depth at width n
  • h̃_n is the heavy output probability, estimated with at least 2σ confidence above 1/2

In the standard protocol, d(n) = n (depth equals width), so the formula simplifies to:

V_Q = 2^(n_Q),   where   n_Q = max{ n : h_n > 2/3 }

The choice of 2/3 as the threshold is deliberate. Since the heavy output set contains about half of all possible outcomes (those strictly above the median), a device producing uniformly random outputs would achieve h ≈ 1/2. The threshold of 2/3 provides sufficient separation from random behavior while accounting for finite-sample statistical fluctuations.
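The 1/2 baseline can be checked numerically. The sketch below uses plain Python only (no pyqpanda3 calls); the "ideal" distribution is made up purely for illustration. A uniform sampler lands in the heavy set about half the time:

```python
"""Uniform sampling lands in the heavy set about half the time."""
import random

random.seed(0)
n = 4
num_outcomes = 2 ** n

# A made-up "ideal" distribution: random weights, normalized (illustrative only)
weights = [random.random() for _ in range(num_outcomes)]
total = sum(weights)
probs = [w / total for w in weights]

# Heavy outputs: outcomes whose probability exceeds the median
ordered = sorted(probs)
median = (ordered[num_outcomes // 2 - 1] + ordered[num_outcomes // 2]) / 2
heavy = {i for i, p in enumerate(probs) if p > median}

# A device that guesses uniformly hits the heavy set ~50% of the time
shots = 100_000
hits = sum(1 for _ in range(shots) if random.randrange(num_outcomes) in heavy)
hof_uniform = hits / shots
print(f"uniform-sampler heavy output fraction: {hof_uniform:.3f}")
```

Any device whose heavy output fraction is statistically indistinguishable from this baseline is doing no better than guessing.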

The Heavy Output Generation (HOG) problem

The QV benchmark is connected to a computational complexity problem called the Heavy Output Generation (HOG) problem. Informally:

Given a random quantum circuit, produce an output that is more likely than average (a "heavy" output).

Classically, sampling heavy outputs from a random quantum circuit is believed to be hard -- it requires simulating the circuit, which takes exponential time in the number of qubits. A quantum device that can reliably produce heavy outputs is demonstrating a computational advantage over straightforward classical simulation.

The connection to computational complexity gives QV a stronger foundation than purely empirical benchmarks. When a quantum computer achieves V_Q = 2^n, it is not merely running circuits of size n; it is solving a classically hard problem at that scale.
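A quick back-of-envelope calculation shows why straightforward classical simulation becomes infeasible: a full statevector holds 2^n complex amplitudes, so memory doubles with every qubit.

```python
"""Statevector memory doubles with every qubit -- why HOG resists brute force."""
for n in (10, 20, 30, 40, 50):
    amplitudes = 2 ** n
    gib = amplitudes * 16 / 2 ** 30  # one complex128 amplitude = 16 bytes
    print(f"{n:>2} qubits: 2^{n} amplitudes, {gib:,.6g} GiB")
```

Around 50 qubits the statevector alone exceeds the RAM of any existing machine, which is why heavy-output sampling at that scale is taken as evidence of quantum advantage.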

Why QV captures more than qubit count

A device's Quantum Volume can be limited by any of several bottlenecks, which is precisely what makes it a useful holistic metric.

Gate fidelity bottleneck. Consider a 20-qubit device with 98% two-qubit gate fidelity. A QV circuit of width 5 contains 5 × 2 = 10 two-qubit SU(4) blocks, giving cumulative success 0.98^10 ≈ 0.82 -- still passing. At width 8, 32 blocks give 0.98^32 ≈ 0.52, which fails. The QV would be 2^6 = 64 despite the 20 physical qubits.

Connectivity bottleneck. On a linear-chain device, implementing arbitrary SU(4) pairs requires SWAP gates, each adding 3 extra CNOTs and accumulating errors. A fully connected device executes the same circuit with far fewer gates.

Compiler bottleneck. Different compilers decompose SU(4) unitaries into native gates with varying efficiency. Better compilation produces shorter circuits, directly improving the heavy output fraction. QV therefore measures software quality as well as hardware capability.
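The width-limit arithmetic in the gate-fidelity example generalizes to a back-of-envelope model. The sketch below is illustrative only: it counts n layers of ⌊n/2⌋ SU(4) blocks per width-n circuit and treats each block as a single two-qubit gate of the given fidelity.

```python
"""Back-of-envelope fidelity model behind the width-limit arithmetic above."""


def cumulative_success(num_qubits: int, gate_fidelity: float) -> float:
    """Probability that no two-qubit operation errs in a width-n QV circuit.

    Counts n layers of floor(n/2) SU(4) blocks and treats each block as one
    two-qubit gate of the given fidelity (a deliberate simplification).
    """
    num_gates = num_qubits * (num_qubits // 2)
    return gate_fidelity ** num_gates


for n in range(2, 9):
    p = cumulative_success(n, 0.98)
    print(f"width {n}: {n * (n // 2):>2} gates, success {p:.2f} "
          f"-> {'PASS' if p > 2 / 3 else 'FAIL'}")
```

Under this model, widths up to 6 keep the cumulative success above 2/3 while widths 7 and 8 fall below it, matching the QV = 2^6 estimate in the text.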

Relationship to circuit layer fidelity

A useful way to understand QV is through the layer fidelity e^(−λ), where λ is the total error per layer. For a QV circuit of width n and depth n, the expected heavy output fraction is approximately:

E[h_n] ≈ (1/2) (1 + e^(−nλ))

Setting E[h_n] = 2/3 and solving:

2/3 = (1/2) (1 + e^(−nλ))   ⟹   e^(−nλ) = 1/3   ⟹   nλ = ln 3

This means a device passes QV-n when its effective error rate per layer satisfies λ < ln(3)/n. The tolerable error rate decreases inversely with circuit width, which explains why QV is such a demanding benchmark.
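The bound λ < ln(3)/n can be tabulated directly; a few lines show how quickly the per-layer error budget shrinks with width:

```python
"""Per-layer error budget lambda < ln(3)/n required to pass QV-n."""
import math

for n in (2, 4, 8, 16, 32):
    lam_max = math.log(3) / n
    print(f"QV-{n:<3} passes only if per-layer error lambda < {lam_max:.4f}")
```

Doubling the width halves the tolerable per-layer error, so each additional QV doubling demands substantially better gates.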

The QV circuit structure

The core.QV(num_qubit, depth, seed) function generates a circuit with a specific structure defined by the IBM QV protocol:

Each layer:

  1. Randomly permutes the n qubits using seed-controlled shuffling
  2. Groups qubits into n/2 pairs: (perm[0],perm[1]), (perm[2],perm[3]), etc.
  3. Applies a random SU(4) unitary (a 4×4 special unitary matrix) to each pair via QOracle

If n is odd, the last qubit in each layer is left idle. The random SU(4) matrix is generated by:

  1. Drawing a random 4×4 complex matrix A
  2. Computing its singular value decomposition: A = UΣV†
  3. Taking U as the random unitary
  4. Normalizing the determinant to 1 (ensuring SU(4), not just U(4))
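The four steps above can be sketched in NumPy. This is an illustrative reconstruction of the described procedure, not the pyqpanda3 internals:

```python
"""Illustrative reconstruction of the SVD-based random SU(4) draw."""
import numpy as np


def random_su4(rng: np.random.Generator) -> np.ndarray:
    # 1. Draw a random 4x4 complex matrix
    a = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    # 2.-3. SVD: A = U @ diag(s) @ Vh; the left factor U is unitary
    u, _, _ = np.linalg.svd(a)
    # 4. Rescale so det = 1 (SU(4) rather than just U(4))
    return u / np.linalg.det(u) ** 0.25


rng = np.random.default_rng(7)
u = random_su4(rng)
print("unitary: ", np.allclose(u @ u.conj().T, np.eye(4)))
print("det == 1:", np.isclose(np.linalg.det(u), 1.0))
```

Dividing by the fourth root of the determinant rescales all four columns uniformly, so unitarity is preserved while the determinant is normalized to 1.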

Practical interpretation of QV numbers

| QV | Meaning |
| --- | --- |
| 2^1 = 2 | Can reliably run 1-qubit circuits of depth 1 |
| 2^3 = 8 | Equivalent to a perfect 3-qubit device |
| 2^5 = 32 | Can handle circuits comparable to a perfect 5-qubit machine |
| 2^10 = 1024 | State-of-the-art superconducting devices (2023-2024) |
| 2^20 = 1,048,576 | Requires extremely high-fidelity gates at scale |

When choosing hardware for your application:

  • Optimization algorithms (QAOA): Need QV of at least 2^n, where n is the problem size. A 10-variable QAOA needs QV ≥ 2^10 = 1024.
  • Variational algorithms (VQE): More forgiving because shallow circuits are used, but higher QV still improves solution quality.
  • Quantum error correction: Requires QV well above the logical qubit count, since QEC circuits involve many ancilla qubits and deep circuits.

Multiple circuit trials and statistical confidence

In an official QV certification, you run multiple circuits (typically 100-200) per width and compute the one-sided confidence interval on the mean heavy output fraction. The device passes if the lower bound of the 2σ confidence interval exceeds 1/2:

h̄ − 2 σ_h / √K > 1/2

where K is the number of circuits and σ_h is the standard deviation of the heavy output fractions across circuits. This ensures that passing is not due to random luck on a single circuit.

python
"""Multi-trial QV test with confidence intervals."""
from pyqpanda3 import core
import numpy as np


def compute_heavy_outputs(prob_dict: dict) -> set:
    probabilities = np.array(list(prob_dict.values()))
    median_prob = np.median(probabilities)
    return {bs for bs, p in prob_dict.items() if p > median_prob}


def single_qv_hof(num_qubits: int, seed: int, shots: int = 10000) -> float:
    qv_circuit = core.QV(num_qubits, num_qubits, seed)
    prog = core.QProg()
    prog.append(qv_circuit)
    for q in range(num_qubits):
        prog.append(core.measure(q, q))

    qvm = core.CPUQVM()
    qvm.run(prog, 100000)
    heavy = compute_heavy_outputs(qvm.result().get_prob_dict())

    qvm.run(prog, shots)
    counts = qvm.result().get_counts()
    total = sum(counts.values())
    return sum(c for bs, c in counts.items() if bs in heavy) / total


# Official-style QV test with confidence intervals
num_qubits = 4
num_circuits = 50
seeds = [i * 31 + num_qubits for i in range(num_circuits)]

hofs = [single_qv_hof(num_qubits, s) for s in seeds]
mean_hof = np.mean(hofs)
std_hof = np.std(hofs, ddof=1)
ci_lower = mean_hof - 2 * std_hof / np.sqrt(num_circuits)

print(f"QV-{num_qubits} over {num_circuits} circuits:")
print(f"  Mean HOF:    {mean_hof:.4f}")
print(f"  Std dev:     {std_hof:.4f}")
print(f"  2-sigma CI:  [{ci_lower:.4f}, {mean_hof + 2 * std_hof / np.sqrt(num_circuits):.4f}]")
print(f"  CI lower > 0.5? {'PASS' if ci_lower > 0.5 else 'FAIL'}")

Comparing compilers

Quantum Volume is not just a hardware metric. It also measures compiler quality, because the same logical circuit can be decomposed into native gates in many different ways. A better compiler produces fewer physical gates, reducing cumulative error and improving the heavy output fraction.

You can use QV to quantify compiler efficiency by generating the same logical QV circuit and comparing results under different transpilation strategies. The key insight is that the logical circuit (a sequence of SU(4) unitaries) is fixed -- only the decomposition into native gates changes.

Consider three compilation strategies:

  • Naive decomposition: Each SU(4) is decomposed using a fixed gate sequence (e.g., 3 CNOTs plus single-qubit rotations). This is simple but may not be optimal for the target topology.
  • Topology-aware mapping: The compiler inserts SWAP gates to match the circuit connectivity to the hardware coupling map. Fewer SWAPs means fewer physical gates.
  • Optimized synthesis: Advanced compilers use approximate synthesis, gate cancellation, and commutativity-aware routing to minimize the total gate count.
python
"""Compare compiler efficiency using QV circuits under different transpilation strategies."""
from pyqpanda3 import core
import numpy as np


def compute_heavy_outputs(prob_dict: dict) -> set:
    probabilities = np.array(list(prob_dict.values()))
    median_prob = np.median(probabilities)
    return {bs for bs, p in prob_dict.items() if p > median_prob}


def count_gates(circuit) -> dict:
    """Walk the circuit and count gate types.

    Returns a dictionary mapping gate type names to occurrence counts.
    Shown for reference; it is not called in the comparison below.
    """
    gate_counts = {}
    # Assumes the circuit exposes a count_ops()-style summary; if your
    # pyqpanda3 version lacks one, iterate the circuit's gates instead.
    info = circuit.count_ops()
    for gate_name, count in info.items():
        gate_counts[gate_name] = gate_counts.get(gate_name, 0) + count
    return gate_counts


def run_qv_with_noise(num_qubits: int, seed: int,
                      two_qubit_error: float, one_qubit_error: float,
                      shots: int = 10000) -> float:
    """Run a QV test with specified noise levels and return HOF."""
    qv_circuit = core.QV(num_qubits, num_qubits, seed)
    prog = core.QProg()
    prog.append(qv_circuit)
    for q in range(num_qubits):
        prog.append(core.measure(q, q))

    qvm = core.CPUQVM()

    # Ideal reference for heavy output identification
    qvm.run(prog, 100000)
    heavy = compute_heavy_outputs(qvm.result().get_prob_dict())

    # Build noise model
    noise = core.NoiseModel()
    if two_qubit_error > 0:
        dep_2q = core.depolarizing_error(two_qubit_error)
        noise.add_all_qubit_quantum_error(dep_2q, core.GateType.CNOT)
    if one_qubit_error > 0:
        dep_1q = core.depolarizing_error(one_qubit_error)
        noise.add_all_qubit_quantum_error(dep_1q, core.GateType.H)

    qvm.run(prog, shots, noise)
    counts = qvm.result().get_counts()
    total = sum(counts.values())
    return sum(c for bs, c in counts.items() if bs in heavy) / total


# Compare three simulated "compiler" strategies by varying effective
# two-qubit gate counts through different noise scaling.
# Strategy A: baseline (1x two-qubit gate cost)
# Strategy B: improved routing (0.7x two-qubit gate cost via fewer SWAPs)
# Strategy C: optimized synthesis (0.5x two-qubit gate cost)
num_qubits = 5
base_2q_error = 0.03
base_1q_error = 0.005
seeds = [10, 20, 30, 40, 50]

strategies = {
    "Naive decomposition": (base_2q_error, base_1q_error),
    "Topology-aware routing": (base_2q_error * 0.7, base_1q_error * 0.8),
    "Optimized synthesis": (base_2q_error * 0.5, base_1q_error * 0.6),
}

print(f"Compiler comparison for QV-{num_qubits} (base 2Q error: {base_2q_error:.1%})")
print(f"{'Strategy':<25} {'Eff. 2Q Error':>14} {'Avg HOF':>10} {'Pass?':>6}")
print("-" * 60)

for name, (err_2q, err_1q) in strategies.items():
    hofs = [
        run_qv_with_noise(num_qubits, s, err_2q, err_1q)
        for s in seeds
    ]
    avg_hof = np.mean(hofs)
    print(f"{name:<25} {err_2q:>13.2%} {avg_hof:>10.4f} {'PASS' if avg_hof > 2/3 else 'FAIL':>6}")

The output demonstrates that compiler improvements directly translate to QV performance gains. A 50% reduction in effective two-qubit gate error (from optimized synthesis) can shift a device from failing to passing a given QV level. This is why quantum software teams invest heavily in compiler optimization.

Key observations for compiler benchmarking with QV:

  1. Gate count matters more than circuit depth. A wider but shallower circuit can outperform a narrow deep one if the total gate count is lower. QV captures this because cumulative gate error is what determines the heavy output fraction.

  2. SWAP overhead is the dominant cost. On limited-connectivity hardware, routing can add 3-10× more two-qubit gates than the logical circuit requires. Measuring QV with and without SWAP-aware compilation quantifies this overhead directly.

  3. Single-qubit gate cancellation provides diminishing returns. Most modern compilers already eliminate adjacent inverse gates. The remaining gains come from cross-layer optimization, which QV can quantify by comparing HOF before and after.

  4. Use multiple seeds. A single QV circuit may accidentally favor one compiler over another. Average over at least 10-20 random circuits to get a reliable comparison.
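Observation 2 can be made concrete with a rough count. The helper below is hypothetical (not a pyqpanda3 API): it assumes each SWAP costs 3 CNOTs, each SU(4) block costs 3 CNOTs, and ignores any swaps needed to restore the original layout.

```python
"""Rough SWAP-routing cost on a linear chain (hypothetical helper)."""


def routed_cnots(distance: int) -> int:
    """CNOT cost of one SU(4) block between qubits `distance` apart on a
    line: (distance - 1) SWAPs at 3 CNOTs each to bring them adjacent,
    plus the 3 CNOTs of the block itself."""
    return 3 * (distance - 1) + 3


for d in (1, 2, 3, 5):
    print(f"distance {d}: {routed_cnots(d)} CNOTs vs 3 on all-to-all hardware")
```

Even at distance 3 the routed cost triples, which is why QV circuits (whose random permutations frequently pair distant qubits) punish sparse topologies so heavily.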

Application-specific benchmarking

The standard QV protocol uses random SU(4) unitaries, which exercise the full gate set uniformly. However, real quantum applications often use specific gate patterns: variational algorithms rely on parameterized rotations, error correction circuits are Clifford-heavy, and quantum chemistry uses many controlled rotations. You can use core.random_qcircuit() to benchmark gate sets that match your actual workload.

This approach answers a different question than standard QV. Instead of "what is the maximum general-purpose circuit this device can handle?", it asks "how well does this device run the types of circuits my application actually uses?"

Common application-specific gate sets include:

  • Clifford circuits: ["H", "S", "CNOT"] -- relevant for error correction, state preparation, and classical simulation benchmarks. Clifford circuits can be efficiently simulated classically (Gottesman-Knill theorem), so they test hardware without quantum advantage claims.
  • NISQ variational circuits: ["H", "RX", "RY", "RZ", "CNOT"] -- the bread and butter of VQE and QAOA. These mix parameterized rotations with entangling gates.
  • T-gate heavy circuits: ["H", "T", "CNOT"] -- relevant for fault-tolerant compilation, where T-gate count determines resource requirements.
  • Hardware-native gates: Use the gate set that matches your device's native operations (e.g., ["RZ", "SX", "CNOT"] for IBM, ["RX", "RZ", "ISWAP"] for Google) to measure raw hardware capability without compilation overhead.
python
"""Benchmark different gate sets using custom random circuits."""
from pyqpanda3 import core
import numpy as np


def compute_heavy_outputs(prob_dict: dict) -> set:
    probabilities = np.array(list(prob_dict.values()))
    median_prob = np.median(probabilities)
    return {bs for bs, p in prob_dict.items() if p > median_prob}


def benchmark_gate_set(qubits: list, depth: int, gate_set: list,
                       model=None, num_trials: int = 10,
                       shots: int = 10000) -> dict:
    """Run random circuit benchmarking for a specific gate set.

    Args:
        qubits: List of qubit indices to use.
        depth: Circuit depth.
        gate_set: List of gate type strings.
        model: Optional NoiseModel for realistic simulation.
        num_trials: Number of random circuits to average over.
        shots: Measurement shots per circuit.

    Returns:
        Dictionary with benchmark results.
    """
    hofs = []

    # random_qcircuit takes no seed parameter, so each trial draws a
    # fresh random circuit.
    for _ in range(num_trials):
        circuit = core.random_qcircuit(qubits, depth, gate_set)

        prog = core.QProg()
        prog.append(circuit)
        for q in qubits:
            prog.append(core.measure(q, q))

        qvm = core.CPUQVM()

        # Ideal reference
        qvm.run(prog, 100000)
        ideal_probs = qvm.result().get_prob_dict()
        heavy = compute_heavy_outputs(ideal_probs)

        # Run with or without noise
        if model is not None:
            qvm.run(prog, shots, model)
        else:
            qvm.run(prog, shots)
        counts = qvm.result().get_counts()

        total = sum(counts.values())
        hof = sum(c for bs, c in counts.items() if bs in heavy) / total
        hofs.append(hof)

    return {
        "gate_set": gate_set,
        "num_qubits": len(qubits),
        "depth": depth,
        "mean_hof": np.mean(hofs),
        "std_hof": np.std(hofs),
        "min_hof": np.min(hofs),
        "max_hof": np.max(hofs),
    }


# Define gate sets representing different application domains
gate_sets = {
    "Clifford": ["H", "S", "CNOT"],
    "Variational (NISQ)": ["H", "RX", "RY", "RZ", "CNOT"],
    "T-gate heavy": ["H", "T", "CNOT"],
    "Full rotation set": ["H", "RX", "RY", "RZ", "CNOT", "SWAP"],
    "Hardware-native (IBM-like)": ["RZ", "H", "CNOT"],
}

# Benchmark parameters
num_qubits = 4
depth = 20
num_trials = 15

# Create a noise model to see differentiation between gate sets
noise = core.NoiseModel()
dep_2q = core.depolarizing_error(0.02)
noise.add_all_qubit_quantum_error(dep_2q, core.GateType.CNOT)
dep_1q = core.depolarizing_error(0.005)
noise.add_all_qubit_quantum_error(dep_1q, core.GateType.H)
noise.add_all_qubit_quantum_error(dep_1q, core.GateType.RX)

qubits = list(range(num_qubits))

print(f"Application-specific benchmarking ({num_qubits} qubits, depth {depth})")
print(f"With depolarizing noise: 2Q=2.0%, 1Q=0.5%")
print(f"{'Gate Set':<28} {'Mean HOF':>10} {'Std':>8} {'Min':>8} {'Max':>8}")
print("-" * 68)

for name, gate_set in gate_sets.items():
    result = benchmark_gate_set(
        qubits, depth, gate_set,
        model=noise, num_trials=num_trials
    )
    print(f"{name:<28} {result['mean_hof']:>10.4f} {result['std_hof']:>8.4f} "
          f"{result['min_hof']:>8.4f} {result['max_hof']:>8.4f}")

# Also run without noise to confirm all gate sets produce valid heavy outputs
print(f"\n--- Ideal (no noise) ---")
print(f"{'Gate Set':<28} {'Mean HOF':>10} {'Std':>8}")
print("-" * 48)

for name, gate_set in gate_sets.items():
    result = benchmark_gate_set(
        qubits, depth, gate_set,
        model=None, num_trials=num_trials
    )
    print(f"{name:<28} {result['mean_hof']:>10.4f} {result['std_hof']:>8.4f}")

Interpreting application-specific benchmark results:

  • Clifford circuits tend to produce sharper (lower-entropy) output distributions because they preserve stabilizer states. This means the heavy output set is more concentrated, and the HOF is typically higher than for general random circuits at the same depth and noise level. If your application is Clifford-based, your device may perform better than the standard QV number suggests.

  • T-gate heavy circuits are sensitive to single-qubit gate fidelity because T gates require precise rotation angles. A device with good two-qubit gates but poor single-qubit gates will show lower HOF on T-heavy benchmarks compared to the standard QV result.

  • Variational gate sets (with parameterized rotations) are the most representative of NISQ algorithm performance. They exercise both single-qubit and two-qubit gates, making the HOF a good predictor of variational algorithm convergence quality.

  • Hardware-native gate sets remove compilation overhead entirely. Comparing the HOF of hardware-native gates against compiled gates reveals the cost of compilation. If the gap is small, your compiler is efficient; if large, there is room for improvement.

Summary

The Quantum Volume benchmark provides a single, hardware-independent metric that reflects the real-world capability of a quantum computer. By using core.QV() to generate standard test circuits and core.random_qcircuit() for custom benchmarks, you can systematically evaluate quantum hardware and identify the bottlenecks that limit performance.

Released under the MIT License.