Quantum Volume Benchmark
Measure holistic quantum computer performance using the Quantum Volume (QV) metric with pyqpanda3.
Problem
Quantum computers are often marketed by qubit count alone, but the number of qubits tells only part of the story. A 100-qubit machine with high error rates and limited connectivity may solve fewer useful circuits than a 20-qubit machine with high-fidelity gates and full connectivity. Engineers need a single-number metric that captures the combined effect of gate fidelity, qubit connectivity, compiler efficiency, and circuit depth.
Quantum Volume (QV) is a holistic benchmark introduced by IBM in 2019 that addresses this gap. Rather than measuring individual hardware properties in isolation, QV tests whether a quantum computer can successfully implement random circuits of a given size and produce correct results with high probability. The metric is defined as:

$$\mathrm{QV} = 2^n$$

where $n$ is the largest circuit width at which the device passes the heavy-output test on square circuits (width = depth = $n$).

The key insight is that QV is not just a qubit count. A device with 50 qubits but a much smaller QV is effectively limited to small circuits, because its performance is constrained by:
- Gate fidelity: two-qubit gate errors that accumulate with circuit depth
- Connectivity: overhead from SWAP operations when the hardware topology does not match the circuit's required interactions
- Coherence time: decoherence that limits how deep a circuit can run before the quantum state is lost
- Compilation quality: how efficiently the compiler maps abstract circuits to native hardware gates
- Measurement errors: readout errors that corrupt the final result
This makes QV a practical predictor of real-world performance: if your application requires circuits of width $m$ and depth $d$, a device with $\mathrm{QV} \geq 2^{\max(m,\,d)}$ can plausibly run them with reliable results.
Solution
The Quantum Volume benchmark works by testing random SU(4) circuits of increasing width. For each width $n$, the protocol runs square circuits (depth $d = n$):

- Generate random circuits. Create $n$-qubit circuits of depth $n$ where each layer consists of random SU(4) unitaries applied to pairs of qubits. The qubit pairing in each layer is randomly permuted.
- Run on a classical simulator. Compute the ideal output probability distribution $p_{\text{ideal}}(x)$ over all $2^n$ bit strings. Identify the heavy outputs -- those bit strings whose ideal probability exceeds the median: $H = \{\,x : p_{\text{ideal}}(x) > p_{\text{med}}\,\}$.
- Run on the target device. Execute the same circuit many times (shots) and count how often heavy outputs appear. The heavy output fraction is $h = n_{\text{heavy}} / n_{\text{shots}}$.
- Check the threshold. The device passes width $n$ if $h > 2/3$ with sufficient statistical confidence (at least two sigma above $1/2$, since a random guess would produce heavy outputs half the time).
- Find the maximum. The Quantum Volume is $2^n$ for the largest $n$ for which the device passes.
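The heavy-output bookkeeping in these steps can be illustrated with a toy 2-qubit distribution (plain Python/NumPy, independent of pyqpanda3; the numbers below are made up for illustration):

```python
import numpy as np

# Toy ideal distribution for a 2-qubit circuit (probabilities sum to 1)
ideal = {"00": 0.42, "01": 0.08, "10": 0.31, "11": 0.19}

# Heavy outputs: bit strings whose ideal probability exceeds the median
median = np.median(list(ideal.values()))              # (0.19 + 0.31) / 2 = 0.25
heavy = {bs for bs, p in ideal.items() if p > median}  # {"00", "10"}

# Suppose the device returned these counts over 1000 shots
counts = {"00": 390, "01": 110, "10": 280, "11": 220}
hof = sum(c for bs, c in counts.items() if bs in heavy) / sum(counts.values())
print(hof)  # (390 + 280) / 1000 = 0.67, just above the 2/3 threshold
```

This HOF of 0.67 would pass only marginally; the statistical-confidence requirement exists precisely to rule out such borderline results arising by chance.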
pyqpanda3 provides two key functions for this workflow:
- `core.QV(num_qubit, depth, seed)` generates a QV circuit (a `QCircuit`) with random SU(4) oracle layers, following the standard protocol. Each layer randomly permutes the qubits, groups them into $\lfloor n/2 \rfloor$ pairs, and applies a random SU(4) unitary to each pair via `QOracle`.
- `core.random_qcircuit(qubits, depth, gate_type)` generates a general random circuit from a specified gate set, useful for custom benchmarking beyond the standard QV protocol.
Code
Running a basic Quantum Volume test
Use core.QV() to generate a standard QV circuit and evaluate it on the ideal simulator.
"""Basic Quantum Volume test on an ideal simulator."""
from pyqpanda3 import core
# Parameters for the QV test
num_qubits = 4
depth = num_qubits # QV uses depth == width
seed = 42
# Generate a QV circuit: random SU(4) unitaries on permuted qubit pairs
qv_circuit = core.QV(num_qubits, depth, seed)
# Wrap in a program and add measurements
prog = core.QProg()
prog.append(qv_circuit)
for q in range(num_qubits):
prog.append(core.measure(q, q))
# Run on the ideal CPU simulator
qvm = core.CPUQVM()
shots = 10000
qvm.run(prog, shots)
result = qvm.result()
# Get measurement probabilities
prob_dict = result.get_prob_dict()
print("Ideal output distribution (top 10 outcomes):")
sorted_probs = sorted(prob_dict.items(), key=lambda x: -x[1])[:10]
for bitstring, prob in sorted_probs:
print(f" {bitstring}: {prob:.6f}")

Computing the heavy output fraction manually
The heavy output fraction is the core statistic of the QV test. Here is how to compute it step by step.
"""Compute the heavy output fraction for a QV circuit."""
from pyqpanda3 import core
import numpy as np
def compute_heavy_outputs(prob_dict: dict) -> set:
"""Identify heavy outputs: bit strings above the median probability.
Args:
prob_dict: Dictionary mapping bit strings to ideal probabilities.
Returns:
Set of heavy output bit strings.
"""
probabilities = np.array(list(prob_dict.values()))
median_prob = np.median(probabilities)
heavy = {
bitstring
for bitstring, prob in prob_dict.items()
if prob > median_prob
}
return heavy
def heavy_output_fraction(prob_dict: dict, counts_dict: dict) -> float:
"""Compute the heavy output fraction from ideal and sampled distributions.
Args:
prob_dict: Ideal probability distribution.
counts_dict: Sampled measurement counts.
Returns:
Fraction of shots that produced heavy outputs.
"""
heavy = compute_heavy_outputs(prob_dict)
total_shots = sum(counts_dict.values())
heavy_count = sum(
count for bitstring, count in counts_dict.items()
if bitstring in heavy
)
return heavy_count / total_shots
# Run QV test
num_qubits = 4
qv_circuit = core.QV(num_qubits, num_qubits, seed=123)
prog = core.QProg()
prog.append(qv_circuit)
for q in range(num_qubits):
prog.append(core.measure(q, q))
# Get ideal distribution (high shot count for accurate probabilities)
qvm = core.CPUQVM()
qvm.run(prog, 100000)
ideal_probs = qvm.result().get_prob_dict()
# Get sampled distribution (fewer shots simulates a real device)
qvm.run(prog, 1000)
sampled_counts = qvm.result().get_counts()
# Compute heavy output fraction
hof = heavy_output_fraction(ideal_probs, sampled_counts)
print(f"QV-{num_qubits} heavy output fraction: {hof:.4f}")
print(f"Threshold for passing: 0.6667")
print(f"Result: {'PASS' if hof > 2 / 3 else 'FAIL'}")

Sweeping QV width to find the maximum
In practice, you test increasing widths until the device fails. On an ideal simulator, every width should pass.
"""Sweep QV widths to determine the maximum Quantum Volume on an ideal simulator."""
from pyqpanda3 import core
import numpy as np
def compute_heavy_outputs(prob_dict: dict) -> set:
probabilities = np.array(list(prob_dict.values()))
median_prob = np.median(probabilities)
return {
bs for bs, p in prob_dict.items() if p > median_prob
}
def run_qv_test(num_qubits: int, shots: int = 10000, seed: int = 0) -> float:
"""Run a single QV test and return the heavy output fraction."""
qv_circuit = core.QV(num_qubits, num_qubits, seed)
prog = core.QProg()
prog.append(qv_circuit)
for q in range(num_qubits):
prog.append(core.measure(q, q))
qvm = core.CPUQVM()
# Ideal probabilities for identifying heavy outputs
qvm.run(prog, 100000)
ideal_probs = qvm.result().get_prob_dict()
heavy = compute_heavy_outputs(ideal_probs)
# Sampled distribution
qvm.run(prog, shots)
counts = qvm.result().get_counts()
total = sum(counts.values())
heavy_count = sum(c for bs, c in counts.items() if bs in heavy)
return heavy_count / total
# Sweep widths from 2 to 7
print(f"{'Width':>6} {'Depth':>6} {'HOF':>8} {'Pass?':>6} {'QV':>8}")
print("-" * 40)
max_passing_width = 0
for n in range(2, 8):
# Use multiple seeds for statistical robustness
hofs = []
for seed in range(10):
hof = run_qv_test(n, shots=5000, seed=seed * 100 + n)
hofs.append(hof)
avg_hof = np.mean(hofs)
passed = avg_hof > 2 / 3
if passed:
max_passing_width = n
print(f"{n:>6} {n:>6} {avg_hof:>8.4f} {'PASS' if passed else 'FAIL':>6} {2**n:>8}")
quantum_volume = 2 ** max_passing_width if max_passing_width > 0 else 1
print(f"\nQuantum Volume (ideal simulator): {quantum_volume}")

Generating custom random circuits with random_qcircuit
For benchmarking beyond the standard QV protocol, use core.random_qcircuit() to generate circuits from a specific gate set. The supported gate type strings are: "X", "Y", "Z", "H", "S", "T", "RX", "RY", "RZ", "U1", "U2", "U3", "U4", "P", "I", "ISWAP", "SQISWAP", "CPHASE", "RPHI", "CU", "SWAP", "X1", "Y1", "Z1", "RZZ", "RYY", "RXX", "RZX", "ECHO", "IDLE", "CNOT", "CZ", "MS". When the gate type list is empty, all types are used by default.
"""Generate random circuits with a custom gate set."""
from pyqpanda3 import core
# Define qubits and a custom gate set
qubits = list(range(5))
depth = 20
gate_set = ["H", "X", "RX", "RY", "RZ", "CNOT", "SWAP"]
# Generate a random circuit
circuit = core.random_qcircuit(qubits, depth, gate_set)
# Build a program with measurements
prog = core.QProg()
prog.append(circuit)
for q in qubits:
prog.append(core.measure(q, q))
# Run and get results
qvm = core.CPUQVM()
qvm.run(prog, 5000)
result = qvm.result()
prob_dict = result.get_prob_dict()
print(f"Number of possible outcomes: {len(prob_dict)}")

Running QV with noise simulation
Real devices are noisy. Simulate the effect of noise on QV performance by applying depolarizing errors to all gates.
"""Quantum Volume test with a depolarizing noise model."""
from pyqpanda3 import core
import numpy as np
def compute_heavy_outputs(prob_dict: dict) -> set:
probabilities = np.array(list(prob_dict.values()))
median_prob = np.median(probabilities)
return {bs for bs, p in prob_dict.items() if p > median_prob}
# --- Build QV circuit ---
num_qubits = 4
qv_circuit = core.QV(num_qubits, num_qubits, seed=42)
prog = core.QProg()
prog.append(qv_circuit)
for q in range(num_qubits):
prog.append(core.measure(q, q))
# --- Ideal reference (no noise) ---
qvm_ideal = core.CPUQVM()
qvm_ideal.run(prog, 100000)
ideal_probs = qvm_ideal.result().get_prob_dict()
heavy = compute_heavy_outputs(ideal_probs)
# --- Noisy simulation ---
error_rate = 0.02 # 2% depolarizing error per gate
noise_model = core.NoiseModel()
dep_error = core.depolarizing_error(error_rate)
noise_model.add_all_qubit_quantum_error(dep_error, core.GateType.CNOT)
# Single-qubit gate noise
dep_error_1q = core.depolarizing_error(error_rate * 0.5)
noise_model.add_all_qubit_quantum_error(dep_error_1q, core.GateType.H)
noise_model.add_all_qubit_quantum_error(dep_error_1q, core.GateType.X)
qvm_noisy = core.CPUQVM()
qvm_noisy.run(prog, 10000, noise_model)
noisy_counts = qvm_noisy.result().get_counts()
total = sum(noisy_counts.values())
heavy_count = sum(c for bs, c in noisy_counts.items() if bs in heavy)
hof_noisy = heavy_count / total
print(f"Error rate: {error_rate:.1%}")
print(f"HOF (noisy): {hof_noisy:.4f}")
print(f"Threshold: 0.6667")
print(f"Result: {'PASS' if hof_noisy > 2 / 3 else 'FAIL'}")

Sweeping noise levels to find the QV failure threshold
Determine the maximum tolerable error rate before the QV test fails. This is critical for hardware engineering: it tells you the gate fidelity targets your device must meet.
"""Sweep noise levels to find the QV failure threshold."""
from pyqpanda3 import core
import numpy as np
def compute_heavy_outputs(prob_dict: dict) -> set:
probabilities = np.array(list(prob_dict.values()))
median_prob = np.median(probabilities)
return {bs for bs, p in prob_dict.items() if p > median_prob}
def qv_hof_with_noise(num_qubits: int, error_rate: float, seed: int) -> float:
"""Run a QV test with depolarizing noise and return HOF."""
qv_circuit = core.QV(num_qubits, num_qubits, seed)
prog = core.QProg()
prog.append(qv_circuit)
for q in range(num_qubits):
prog.append(core.measure(q, q))
qvm = core.CPUQVM()
# Ideal reference
qvm.run(prog, 100000)
heavy = compute_heavy_outputs(qvm.result().get_prob_dict())
# Noisy execution
noise = core.NoiseModel()
if error_rate > 0:
dep_2q = core.depolarizing_error(error_rate)
noise.add_all_qubit_quantum_error(dep_2q, core.GateType.CNOT)
dep_1q = core.depolarizing_error(error_rate * 0.5)
noise.add_all_qubit_quantum_error(dep_1q, core.GateType.H)
qvm.run(prog, 10000, noise)
counts = qvm.result().get_counts()
total = sum(counts.values())
return sum(c for bs, c in counts.items() if bs in heavy) / total
# Sweep error rates for QV-4
num_qubits = 4
error_rates = np.arange(0, 0.12, 0.01)
num_trials = 5
print(f"QV-{num_qubits} noise threshold sweep")
print(f"{'Error Rate':>12} {'Avg HOF':>10} {'Std Dev':>10} {'Pass?':>6}")
print("-" * 44)
threshold = None
for rate in error_rates:
hofs = [
qv_hof_with_noise(num_qubits, rate, seed=seed * 7 + num_qubits)
for seed in range(num_trials)
]
avg = np.mean(hofs)
std = np.std(hofs)
passed = avg > 2 / 3
if not passed and threshold is None:
threshold = rate
print(f"{rate:>11.2%} {avg:>10.4f} {std:>10.4f} {'PASS' if passed else 'FAIL':>6}")
if threshold is not None:
print(f"\nNoise threshold: ~{threshold:.2%} depolarizing error")
else:
print("\nDevice passes at all tested noise levels.")

Explanation
Mathematical definition of Quantum Volume
The formal definition, as established by Cross et al. (IBM, 2019), is:

$$\log_2 \mathrm{QV} = \max_{n} \min\big(n, d(n)\big)$$

where:

- $n$ is the number of qubits (circuit width)
- $d(n)$ is the achievable circuit depth at width $n$
- $h$ is the heavy output probability, estimated with at least two-sigma confidence above the $2/3$ threshold

In the standard protocol, depth equals width, so the test reduces to finding the largest square circuit the device can run reliably, giving $\mathrm{QV} = 2^n$.

The choice of $2/3$ as the threshold sits between the asymptotic heavy output probability of an ideal device, $(1 + \ln 2)/2 \approx 0.85$, and the $1/2$ achieved by random guessing.
The Heavy Output Generation (HOG) problem
The QV benchmark is connected to a computational complexity problem called the Heavy Output Generation (HOG) problem. Informally:
Given a random quantum circuit, produce an output that is more likely than average (a "heavy" output).
Classically, sampling heavy outputs from a random quantum circuit is believed to be hard -- it requires simulating the circuit, which takes exponential time in the number of qubits. A quantum device that can reliably produce heavy outputs is demonstrating a computational advantage over straightforward classical simulation.
The connection to computational complexity gives QV a stronger foundation than purely empirical benchmarks. When a quantum computer achieves a high QV, it is reliably solving HOG instances whose classical simulation cost grows exponentially with circuit width.
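A back-of-the-envelope sketch makes the exponential cost concrete: a dense statevector simulation stores $2^n$ complex amplitudes (16 bytes each in double precision), so memory alone becomes prohibitive well before 50 qubits.

```python
# Memory for a dense statevector of n qubits: 2**n amplitudes x 16 bytes
for n in (20, 30, 40, 50):
    gib = (2 ** n) * 16 / 2 ** 30
    print(f"{n} qubits: {gib:g} GiB")
# 20 qubits fit in ~16 MiB; 30 need ~16 GiB; 40 need ~16 TiB; 50 need ~16 PiB
```

Time cost scales similarly, since each gate application touches a constant fraction of the statevector.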
Why QV captures more than qubit count
A device's Quantum Volume can be limited by any of several bottlenecks, which is precisely what makes it a useful holistic metric.
Gate fidelity bottleneck. Consider a 20-qubit device whose two-qubit gates each fail with probability around 1%. A width-$n$ QV circuit contains on the order of $n^2$ two-qubit gates after decomposition, so cumulative error grows rapidly with width, and such a device can fail the heavy-output test long before $n = 20$.
Connectivity bottleneck. On a linear-chain device, implementing arbitrary SU(4) pairs requires SWAP gates, each adding 3 extra CNOTs and accumulating errors. A fully connected device executes the same circuit with far fewer gates.
Compiler bottleneck. Different compilers decompose SU(4) unitaries into native gates with varying efficiency. Better compilation produces shorter circuits, directly improving the heavy output fraction. QV therefore measures software quality as well as hardware capability.
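The connectivity overhead described above can be sketched with rough arithmetic. The gate-count model below is illustrative, not a transpiler: it assumes each SU(4) block decomposes into 3 CNOTs and that routing a pair at average qubit distance $d$ costs $(d-1)$ SWAPs each way, with 3 CNOTs per SWAP.

```python
def estimated_cnots(n: int, avg_distance: float, swap_cnots: int = 3) -> int:
    """Rough CNOT count for a width-n QV circuit on limited connectivity.

    Illustrative assumptions: 3 CNOTs per SU(4) block; (d - 1) SWAPs each
    way to bring a pair at average distance d together and back.
    """
    su4_blocks = n * (n // 2)          # n layers, floor(n/2) pairs per layer
    logical = 3 * su4_blocks           # CNOTs in the logical circuit
    routing = int(2 * (avg_distance - 1) * swap_cnots * su4_blocks)
    return logical + routing

full = estimated_cnots(6, avg_distance=1.0)   # fully connected: no routing
line = estimated_cnots(6, avg_distance=2.5)   # linear chain: SWAP overhead
print(full, line)  # 54 vs 216: routing quadruples the two-qubit gate count
```

Even this crude model shows why a fully connected device can pass a QV width that a linear-chain device with identical gate fidelity fails.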
Relationship to circuit layer fidelity
A useful way to understand QV is through the concept of layer fidelity $F_L$: the probability that one full layer of the QV circuit executes without error. After $n$ layers, the circuit fidelity is roughly $F_L^n$, and the heavy output fraction decays approximately as $h \approx \tfrac{1}{2} + (h_{\text{ideal}} - \tfrac{1}{2})\,F_L^n$, where $h_{\text{ideal}} \approx 0.85$ is the ideal value for random circuits.

Setting $h = 2/3$ gives the passing condition $F_L^n \gtrsim 0.48$.

This means a device passes QV-$n$ roughly when the fidelity of the full $n$-layer circuit still exceeds about one half -- so each increment in passing width demands a substantial improvement in per-layer fidelity.
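Under this approximate decay model (an illustrative sketch, not a substitute for running the actual benchmark), the maximum passing width for a given layer fidelity can be read off numerically:

```python
import numpy as np

H_IDEAL = (1 + np.log(2)) / 2   # ~0.847: asymptotic ideal heavy-output probability

def approx_hof(layer_fidelity: float, n: int) -> float:
    """Approximate heavy output fraction of a width-n QV circuit (toy model)."""
    return 0.5 + (H_IDEAL - 0.5) * layer_fidelity ** n

for f in (0.99, 0.95, 0.90):
    # Largest width whose modeled HOF still clears the 2/3 threshold
    max_n = max((n for n in range(2, 200) if approx_hof(f, n) > 2 / 3), default=0)
    print(f"layer fidelity {f:.2f}: passes up to width {max_n} (QV = 2**{max_n})")
```

The steep sensitivity is the point: dropping layer fidelity from 0.95 to 0.90 roughly halves the achievable width.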
The QV circuit structure
The core.QV(num_qubit, depth, seed) function generates a circuit with a specific structure defined by the IBM QV protocol:
Each layer:
- Randomly permutes the $n$ qubits using seed-controlled shuffling
- Groups qubits into $\lfloor n/2 \rfloor$ pairs: $(q_0, q_1)$, $(q_2, q_3)$, etc.
- Applies a random SU(4) unitary (a $4 \times 4$ special unitary matrix) to each pair via `QOracle`
If $n$ is odd, one qubit sits idle in each layer. Each random SU(4) unitary is obtained by:

- Drawing a random $4 \times 4$ complex matrix
- Computing its singular value decomposition: $M = U \Sigma V^\dagger$
- Taking $U V^\dagger$ as the random unitary
- Normalizing the determinant to 1 (ensuring SU(4), not just U(4))
Practical interpretation of QV numbers
| QV | Meaning |
|---|---|
| $2^1 = 2$ | Can reliably run 1-qubit circuits of depth 1 |
| $2^3 = 8$ | Equivalent to a perfect 3-qubit device |
| $2^5 = 32$ | Can handle circuits comparable to a perfect 5-qubit machine |
| $2^7$-$2^9$ | State-of-the-art superconducting devices (2023-2024) |
| $2^{10}$ and beyond | Requires extremely high-fidelity gates at scale |
When choosing hardware for your application:
- Optimization algorithms (QAOA): need QV of at least $2^n$, where $n$ is the problem size. A 10-variable QAOA needs QV $\geq 2^{10} = 1024$.
- Variational algorithms (VQE): more forgiving because shallow circuits are used, but higher QV still improves solution quality.
- Quantum error correction: requires QV well above the logical qubit count, since QEC circuits involve many ancilla qubits and deep circuits.
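The QAOA rule of thumb above is simple arithmetic, sketched here as a hypothetical helper (the exponential growth is the takeaway):

```python
def required_qv(problem_size: int) -> int:
    """QV needed to run width-n, depth-n circuits for an n-variable problem."""
    return 2 ** problem_size

for n in (4, 10, 20):
    print(f"{n}-variable problem: QV >= {required_qv(n)}")
# Requirements double with every added variable: 16, 1024, 1048576
```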
Multiple circuit trials and statistical confidence
In an official QV certification, you run multiple circuits (typically 100-200) per width and compute a one-sided confidence interval on the mean heavy output fraction. The device passes if the lower bound of the two-sigma confidence interval exceeds $2/3$:

$$\bar{h} - \frac{2\sigma}{\sqrt{n_c}} > \frac{2}{3}$$

where $\bar{h}$ is the mean heavy output fraction over $n_c$ circuits and $\sigma$ is the sample standard deviation.
"""Multi-trial QV test with confidence intervals."""
from pyqpanda3 import core
import numpy as np
def compute_heavy_outputs(prob_dict: dict) -> set:
probabilities = np.array(list(prob_dict.values()))
median_prob = np.median(probabilities)
return {bs for bs, p in prob_dict.items() if p > median_prob}
def single_qv_hof(num_qubits: int, seed: int, shots: int = 10000) -> float:
qv_circuit = core.QV(num_qubits, num_qubits, seed)
prog = core.QProg()
prog.append(qv_circuit)
for q in range(num_qubits):
prog.append(core.measure(q, q))
qvm = core.CPUQVM()
qvm.run(prog, 100000)
heavy = compute_heavy_outputs(qvm.result().get_prob_dict())
qvm.run(prog, shots)
counts = qvm.result().get_counts()
total = sum(counts.values())
return sum(c for bs, c in counts.items() if bs in heavy) / total
# Official-style QV test with confidence intervals
num_qubits = 4
num_circuits = 50
seeds = [i * 31 + num_qubits for i in range(num_circuits)]
hofs = [single_qv_hof(num_qubits, s) for s in seeds]
mean_hof = np.mean(hofs)
std_hof = np.std(hofs, ddof=1)
ci_lower = mean_hof - 2 * std_hof / np.sqrt(num_circuits)
print(f"QV-{num_qubits} over {num_circuits} circuits:")
print(f" Mean HOF: {mean_hof:.4f}")
print(f" Std dev: {std_hof:.4f}")
print(f" 2-sigma CI: [{ci_lower:.4f}, {mean_hof + 2 * std_hof / np.sqrt(num_circuits):.4f}]")
print(f" CI lower > 2/3? {'PASS' if ci_lower > 2 / 3 else 'FAIL'}")

Comparing Compilers
Quantum Volume is not just a hardware metric. It also measures compiler quality, because the same logical circuit can be decomposed into native gates in many different ways. A better compiler produces fewer physical gates, reducing cumulative error and improving the heavy output fraction.
You can use QV to quantify compiler efficiency by generating the same logical QV circuit and comparing results under different transpilation strategies. The key insight is that the logical circuit (a sequence of SU(4) unitaries) is fixed -- only the decomposition into native gates changes.
Consider three compilation strategies:
- Naive decomposition: Each SU(4) is decomposed using a fixed gate sequence (e.g., 3 CNOTs plus single-qubit rotations). This is simple but may not be optimal for the target topology.
- Topology-aware mapping: The compiler inserts SWAP gates to match the circuit connectivity to the hardware coupling map. Fewer SWAPs means fewer physical gates.
- Optimized synthesis: Advanced compilers use approximate synthesis, gate cancellation, and commutativity-aware routing to minimize the total gate count.
"""Compare compiler efficiency using QV circuits under different transpilation strategies."""
from pyqpanda3 import core
import numpy as np
def compute_heavy_outputs(prob_dict: dict) -> set:
probabilities = np.array(list(prob_dict.values()))
median_prob = np.median(probabilities)
return {bs for bs, p in prob_dict.items() if p > median_prob}
def count_gates(circuit) -> dict:
"""Walk the circuit and count gate types.
Returns a dictionary mapping gate type names to occurrence counts.
"""
gate_counts = {}
    # Assumes the circuit object exposes count_ops(), returning a
    # mapping of gate names to occurrence counts
info = circuit.count_ops()
for gate_name, count in info.items():
gate_counts[gate_name] = gate_counts.get(gate_name, 0) + count
return gate_counts
def run_qv_with_noise(num_qubits: int, seed: int,
two_qubit_error: float, one_qubit_error: float,
shots: int = 10000) -> float:
"""Run a QV test with specified noise levels and return HOF."""
qv_circuit = core.QV(num_qubits, num_qubits, seed)
prog = core.QProg()
prog.append(qv_circuit)
for q in range(num_qubits):
prog.append(core.measure(q, q))
qvm = core.CPUQVM()
# Ideal reference for heavy output identification
qvm.run(prog, 100000)
heavy = compute_heavy_outputs(qvm.result().get_prob_dict())
# Build noise model
noise = core.NoiseModel()
if two_qubit_error > 0:
dep_2q = core.depolarizing_error(two_qubit_error)
noise.add_all_qubit_quantum_error(dep_2q, core.GateType.CNOT)
if one_qubit_error > 0:
dep_1q = core.depolarizing_error(one_qubit_error)
noise.add_all_qubit_quantum_error(dep_1q, core.GateType.H)
qvm.run(prog, shots, noise)
counts = qvm.result().get_counts()
total = sum(counts.values())
return sum(c for bs, c in counts.items() if bs in heavy) / total
# Compare three simulated "compiler" strategies by varying effective
# two-qubit gate counts through different noise scaling.
# Strategy A: baseline (1x two-qubit gate cost)
# Strategy B: improved routing (0.7x two-qubit gate cost via fewer SWAPs)
# Strategy C: optimized synthesis (0.5x two-qubit gate cost)
num_qubits = 5
base_2q_error = 0.03
base_1q_error = 0.005
seeds = [10, 20, 30, 40, 50]
strategies = {
"Naive decomposition": (base_2q_error, base_1q_error),
"Topology-aware routing": (base_2q_error * 0.7, base_1q_error * 0.8),
"Optimized synthesis": (base_2q_error * 0.5, base_1q_error * 0.6),
}
print(f"Compiler comparison for QV-{num_qubits} (base 2Q error: {base_2q_error:.1%})")
print(f"{'Strategy':<25} {'Eff. 2Q Error':>14} {'Avg HOF':>10} {'Pass?':>6}")
print("-" * 60)
for name, (err_2q, err_1q) in strategies.items():
hofs = [
run_qv_with_noise(num_qubits, s, err_2q, err_1q)
for s in seeds
]
avg_hof = np.mean(hofs)
print(f"{name:<25} {err_2q:>13.2%} {avg_hof:>10.4f} {'PASS' if avg_hof > 2/3 else 'FAIL':>6}")

The output demonstrates that compiler improvements directly translate to QV performance gains. A 50% reduction in effective two-qubit gate error (from optimized synthesis) can shift a device from failing to passing a given QV level. This is why quantum software teams invest heavily in compiler optimization.
Key observations for compiler benchmarking with QV:
Gate count matters more than circuit depth. A wider but shallower circuit can outperform a narrow deep one if the total gate count is lower. QV captures this because cumulative gate error is what determines the heavy output fraction.
SWAP overhead is the dominant cost. On limited-connectivity hardware, routing can add 3-10x more two-qubit gates than the logical circuit requires. Measuring QV with and without SWAP-aware compilation quantifies this overhead directly.
Single-qubit gate cancellation provides diminishing returns. Most modern compilers already eliminate adjacent inverse gates. The remaining gains come from cross-layer optimization, which QV can quantify by comparing HOF before and after.
Use multiple seeds. A single QV circuit may accidentally favor one compiler over another. Average over at least 10-20 random circuits to get a reliable comparison.
Application-Specific Benchmarking
The standard QV protocol uses random SU(4) unitaries, which exercise the full gate set uniformly. However, real quantum applications often use specific gate patterns: variational algorithms rely on parameterized rotations, error correction circuits are Clifford-heavy, and quantum chemistry uses many controlled rotations. You can use core.random_qcircuit() to benchmark gate sets that match your actual workload.
This approach answers a different question than standard QV. Instead of "what is the maximum general-purpose circuit this device can handle?", it asks "how well does this device run the types of circuits my application actually uses?"
Common application-specific gate sets include:
- Clifford circuits:
["H", "S", "CNOT"]-- relevant for error correction, state preparation, and classical simulation benchmarks. Clifford circuits can be efficiently simulated classically (Gottesman-Knill theorem), so they test hardware without quantum advantage claims. - NISQ variational circuits:
["H", "RX", "RY", "RZ", "CNOT"]-- the bread and butter of VQE and QAOA. These mix parameterized rotations with entangling gates. - T-gate heavy circuits:
["H", "T", "CNOT"]-- relevant for fault-tolerant compilation, where T-gate count determines resource requirements. - Hardware-native gates: Use the gate set that matches your device's native operations (e.g.,
["RZ", "SX", "CNOT"]for IBM,["RX", "RZ", "ISWAP"]for Google) to measure raw hardware capability without compilation overhead.
"""Benchmark different gate sets using custom random circuits."""
from pyqpanda3 import core
import numpy as np
def compute_heavy_outputs(prob_dict: dict) -> set:
probabilities = np.array(list(prob_dict.values()))
median_prob = np.median(probabilities)
return {bs for bs, p in prob_dict.items() if p > median_prob}
def benchmark_gate_set(qubits: list, depth: int, gate_set: list,
model=None, num_trials: int = 10,
shots: int = 10000) -> dict:
"""Run random circuit benchmarking for a specific gate set.
Args:
qubits: List of qubit indices to use.
depth: Circuit depth.
gate_set: List of gate type strings.
        model: Optional noise model for realistic simulation.
num_trials: Number of random circuits to average over.
shots: Measurement shots per circuit.
Returns:
Dictionary with benchmark results.
"""
hofs = []
for seed in range(num_trials):
circuit = core.random_qcircuit(qubits, depth, gate_set)
prog = core.QProg()
prog.append(circuit)
for q in qubits:
prog.append(core.measure(q, q))
qvm = core.CPUQVM()
# Ideal reference
qvm.run(prog, 100000)
ideal_probs = qvm.result().get_prob_dict()
heavy = compute_heavy_outputs(ideal_probs)
        # Run with or without noise (parameter is `model`, matching the signature)
        if model is not None:
            qvm.run(prog, shots, model)
        else:
            qvm.run(prog, shots)
counts = qvm.result().get_counts()
total = sum(counts.values())
hof = sum(c for bs, c in counts.items() if bs in heavy) / total
hofs.append(hof)
return {
"gate_set": gate_set,
"num_qubits": len(qubits),
"depth": depth,
"mean_hof": np.mean(hofs),
"std_hof": np.std(hofs),
"min_hof": np.min(hofs),
"max_hof": np.max(hofs),
}
# Define gate sets representing different application domains
gate_sets = {
"Clifford": ["H", "S", "CNOT"],
"Variational (NISQ)": ["H", "RX", "RY", "RZ", "CNOT"],
"T-gate heavy": ["H", "T", "CNOT"],
"Full rotation set": ["H", "RX", "RY", "RZ", "CNOT", "SWAP"],
"Hardware-native (IBM-like)": ["RZ", "H", "CNOT"],
}
# Benchmark parameters
num_qubits = 4
depth = 20
num_trials = 15
# Create a noise model to see differentiation between gate sets
noise = core.NoiseModel()
dep_2q = core.depolarizing_error(0.02)
noise.add_all_qubit_quantum_error(dep_2q, core.GateType.CNOT)
dep_1q = core.depolarizing_error(0.005)
noise.add_all_qubit_quantum_error(dep_1q, core.GateType.H)
noise.add_all_qubit_quantum_error(dep_1q, core.GateType.RX)
qubits = list(range(num_qubits))
print(f"Application-specific benchmarking ({num_qubits} qubits, depth {depth})")
print(f"With depolarizing noise: 2Q=2.0%, 1Q=0.5%")
print(f"{'Gate Set':<28} {'Mean HOF':>10} {'Std':>8} {'Min':>8} {'Max':>8}")
print("-" * 68)
for name, gate_set in gate_sets.items():
result = benchmark_gate_set(
qubits, depth, gate_set,
model=noise, num_trials=num_trials
)
print(f"{name:<28} {result['mean_hof']:>10.4f} {result['std_hof']:>8.4f} "
f"{result['min_hof']:>8.4f} {result['max_hof']:>8.4f}")
# Also run without noise to confirm all gate sets produce valid heavy outputs
print(f"\n--- Ideal (no noise) ---")
print(f"{'Gate Set':<28} {'Mean HOF':>10} {'Std':>8}")
print("-" * 48)
for name, gate_set in gate_sets.items():
result = benchmark_gate_set(
qubits, depth, gate_set,
model=None, num_trials=num_trials
)
print(f"{name:<28} {result['mean_hof']:>10.4f} {result['std_hof']:>8.4f}")

Interpreting application-specific benchmark results:
Clifford circuits tend to produce sharper (lower-entropy) output distributions because they preserve stabilizer states. This means the heavy output set is more concentrated, and the HOF is typically higher than for general random circuits at the same depth and noise level. If your application is Clifford-based, your device may perform better than the standard QV number suggests.
T-gate heavy circuits are sensitive to single-qubit gate fidelity because T gates require precise rotation angles. A device with good two-qubit gates but poor single-qubit gates will show lower HOF on T-heavy benchmarks compared to the standard QV result.
Variational gate sets (with parameterized rotations) are the most representative of NISQ algorithm performance. They exercise both single-qubit and two-qubit gates, making the HOF a good predictor of variational algorithm convergence quality.
Hardware-native gate sets remove compilation overhead entirely. Comparing the HOF of hardware-native gates against compiled gates reveals the cost of compilation. If the gap is small, your compiler is efficient; if large, there is room for improvement.
Summary
The Quantum Volume benchmark provides a single, hardware-independent metric that reflects the real-world capability of a quantum computer. By using core.QV() to generate standard test circuits and core.random_qcircuit() for custom benchmarks, you can systematically evaluate quantum hardware and identify the bottlenecks that limit performance.