Transpilation Theory

Quantum circuit transpilation is the process of transforming an abstract, hardware-independent quantum program into a form that can be faithfully executed on a specific quantum device. This involves four deeply interconnected problems: optimizing the circuit to reduce gate count and depth, mapping logical qubits onto physical qubits that satisfy connectivity constraints, synthesizing arbitrary unitary operations from a limited native gate set, and routing qubits through SWAP insertions when connectivity requirements are violated.

pyqpanda3 provides a complete transpilation pipeline through the Transpiler class, the decompose function for gate synthesis, and generate_topology for constructing chip coupling maps. This guide explains the theoretical foundations behind each stage.

The Transpilation Pipeline

Before diving into individual topics, it is helpful to understand how the transpilation stages relate to one another. pyqpanda3 applies a sequence of passes, each responsible for one aspect of the transformation.

Each pass reads from and writes to a DAG (Directed Acyclic Graph) representation of the circuit. The DAG captures gate dependencies: a gate can only execute once all of its predecessors on the same qubit wire have completed. The default pass ordering, defined in DefaultTranspilationPasses, is: Init, Layout, Routing, Translation, Optimization (pre- and post-), and Scheduling. The optimization_level parameter controls how aggressively the optimization and routing passes operate.

Circuit Optimization

Why Optimization Matters

Quantum gates are imperfect. Every gate applied introduces error, and two-qubit gates such as CNOT or CZ are typically one to two orders of magnitude noisier than single-qubit gates. Circuit depth is equally critical because qubits suffer decoherence over time. Reducing gate count and circuit depth directly improves the fidelity of computation results.

The goal of circuit optimization is to produce a functionally equivalent circuit that is shorter, shallower, and uses fewer expensive gates, all while preserving the unitary transformation that the original circuit implements.

Optimization Levels in pyqpanda3

The Transpiler accepts an optimization_level parameter that controls the aggressiveness of optimization:

Level	Behavior
0	No optimization. The circuit is translated and routed but not optimized.
1	Basic optimization. Gate merging and simple cancellation are applied.
2	Aggressive optimization. Includes commutativity-aware optimization, peephole analysis, and the decay-based SABRE routing heuristic.

At level 2, the transpiler additionally uses a decay-based cost function during routing (see the Sabre Algorithm section) and applies more thorough gate merging during the translation phase.

Gate Cancellation

The simplest optimization is gate cancellation: when two adjacent gates are inverses of each other, both can be removed. For example, applying an $H$ gate followed immediately by another $H$ gate yields the identity:

H \cdot H = I

Similarly, $X \cdot X = I$ , $C N O T \cdot C N O T = I$ (on the same qubit pair), and $R_{z} (θ) \cdot R_{z} (- θ) = I$ .

pyqpanda3 implements this through the OptimizationPass methods remove_h and remove_cx, which scan the DAG for cancelable pairs. The cancellation is applied iteratively because removing one pair may expose new cancelable pairs.

Gate Merging

Consecutive rotation gates about the same axis on the same qubit can be merged into a single rotation. For example:

R_{x} (α) \cdot R_{x} (β) = R_{x} (α + β)

This generalizes to any rotation axis. pyqpanda3 implements this through merge_rx, merge_ry, merge_rz, and the general merge_q1_gate method. The merging process:

Scans each qubit wire in the DAG sequentially.
When two consecutive single-qubit gates act on the same axis, combines their parameters.
If the combined angle is zero (or below a numerical tolerance), removes the gate entirely.

The merging is particularly effective after routing, because SWAP decomposition introduces sequences of CNOT gates that can cascade into redundant single-qubit operations on adjacent wires.

Commutativity-Aware Optimization

Not all gate pairs that appear adjacent in a circuit actually need to be adjacent. Two gates $G_{1}$ and $G_{2}$ commute if $G_{1} \cdot G_{2} = G_{2} \cdot G_{1}$ . When gates commute, the optimizer can reorder them to create new cancellation or merging opportunities.

pyqpanda3 implements commutativity analysis through the commutation_cancel and check_commutation methods in OptimizationPass. The analysis works on the DAG representation:

For each pair of gates that share a qubit wire, check whether they commute.
If they do, allow the DAG to be restructured so that cancelable or mergeable gates become adjacent.
Apply cancellation and merging on the restructured DAG.

This is especially powerful for circuits that mix single-qubit and two-qubit gates, because many single-qubit rotations commute with the control leg of a CNOT gate. For instance, $R_{z}$ on the control qubit commutes with CNOT:

R_{z} (θ) \otimes I \cdot CNOT = CNOT \cdot R_{z} (θ) \otimes I

Peephole Optimization

Peephole optimization examines small windows (sub-circuits of typically 2-3 gates) and replaces them with equivalent but shorter sequences. This is analogous to the peephole optimization technique used in classical compilers.

In pyqpanda3, peephole optimization is integrated into the BasicTranslationPass. During the merage_and_translation step, small sub-circuits are matched against a library of known equivalences (the EquivalenceLibrary). When a match is found, the sub-circuit is replaced with the more efficient equivalent.

For example, the sequence $H \cdot R_{z} (θ) \cdot H$ can be replaced with $R_{x} (θ)$ :

H \cdot R_{z} (θ) \cdot H = R_{x} (θ)

Template Matching

Template matching generalizes peephole optimization by maintaining a library of circuit identities (templates). Each template is a pair of equivalent circuits: a pattern to search for and a replacement to substitute. The template library includes identities such as:

Gate decomposition templates: $S = R_{z} (π / 2)$ , $T = R_{z} (π / 4)$
Gate equivalence templates: $CNOT = (I \otimes H) \cdot CZ \cdot (I \otimes H)$
Rotation simplification templates: $R_{z} (θ_{1}) \cdot R_{z} (θ_{2}) = R_{z} (θ_{1} + θ_{2})$

The EquivalenceLibrary used by BasicTranslationPass stores these templates and performs efficient matching during the translation phase.

Unitary Synthesis Optimization

The unitary_synthesis method in OptimizationPass takes a different approach: it extracts small sub-circuits (typically 1-2 qubits), computes their combined unitary matrix, and then re-synthesizes the unitary using the minimum number of native gates. This can be more powerful than peephole optimization because it is not limited to known templates; it can find optimal decompositions for arbitrary sub-circuits.

The process:

Identify a sub-circuit on 1 or 2 qubits in the DAG.
Compute the combined unitary $U = G_{n} \dots G_{2} \cdot G_{1}$ .
Decompose $U$ using the KAK or Euler decomposition (see Gate Synthesis).
Replace the original sub-circuit with the optimized decomposition.

This approach can reduce multi-gate sub-circuits to just a few native gates, even when no individual cancellation or merging is apparent.

Optimization Pass Flow

In pyqpanda3, optimization is applied at multiple points in the transpilation pipeline. The DefaultTranspilationPasses class defines the pass ordering, which includes both a pre-optimization pass and a post-optimization pass:

Pre-optimization (OptimizationPass with "pre" mode): Applied before routing to simplify the circuit and reduce the number of gates that the routing algorithm must process. This includes gate cancellation, merging, and oracle decomposition via decompose_oracle.
Post-optimization: Applied after routing and translation to clean up redundant gates introduced by SWAP decomposition. This is where commutativity-aware optimization is most effective, because SWAP insertion often creates patterns like:

CNOT (q_{a}, q_{b}) \cdot R_{z} (θ) (q_{b}) \cdot CNOT (q_{a}, q_{b})

which can be simplified if the $R_{z}$ commutes through the CNOT.

Translation-phase optimization: The BasicTranslationPass applies gate merging during the translation from circuit gates to native gates. The merage_and_translation method combines consecutive single-qubit gates and chooses the most efficient decomposition for each two-qubit gate.

The interaction between these optimization stages creates a compounding effect: pre-optimization reduces circuit size, which speeds up routing; routing produces fewer SWAPs on smaller circuits; and post-optimization cleans up any remaining redundancy.

Cost Model for Optimization Decisions

Optimization decisions are guided by a cost model that weights different gate types. Two-qubit gates are significantly more expensive than single-qubit gates on real hardware:

cost (CNOT) \approx 10 \times cost (R_{z})

cost (CZ) \approx 10 \times cost (R_{z})

This cost asymmetry motivates the optimizer to prefer transformations that reduce two-qubit gate count even at the expense of adding single-qubit gates. For example, the optimizer might replace a CNOT with two single-qubit gates if the overall circuit becomes more faithful.

When a ChipBackend is provided to the transpiler, the cost model can incorporate device-specific calibration data. The compensate_angle_map in ChipBackend stores per-edge angle corrections for two-qubit gates, and the compensate_pulse flag enables pulse-level compensation during the translation phase.

Topology Mapping

The Qubit Mapping Problem

Real quantum processors do not have all-to-all connectivity. A superconducting quantum chip typically arranges qubits in a 2D grid or similar topology where each qubit is directly connected to only a few neighbors. A two-qubit gate can only be applied to a pair of qubits that are physically adjacent in this connectivity graph.

The qubit mapping problem is the task of assigning logical (virtual) qubits from the circuit to physical qubits on the chip such that every two-qubit gate in the circuit can be executed on physically adjacent qubits. When this is not possible with the current assignment, SWAP gates must be inserted to move qubit states to adjacent positions.

Coupling Maps

A coupling map (or topology graph) describes the physical connectivity of a quantum chip. It is represented as a graph $G = (V, E)$ where $V$ is the set of physical qubits and $E$ is the set of edges, each indicating that a two-qubit gate can be directly applied between the connected qubits.

pyqpanda3 represents coupling maps as a list of edges, where each edge is a pair [u, v]:

python

# A linear topology for 4 qubits: 0 -- 1 -- 2 -- 3
topology = [[0, 1], [1, 2], [2, 3]]

The generate_topology function creates common topology patterns:

Topology Type	Description	Example (4 qubits)
`"linear"`	Nearest-neighbor chain	`[[0,1], [1,2], [2,3]]`
`"circular"`	Ring (adds wraparound edge)	`[[0,1], [1,2], [2,3], [3,0]]`
`"square"`	2D grid	Grid connectivity

Why Topology Matters

Consider a simple circuit that applies a CNOT between qubits 0 and 3 on a linear topology 0 -- 1 -- 2 -- 3. Qubits 0 and 3 are not directly connected, so the CNOT cannot be executed as-is. The transpiler must insert SWAP gates to bring the logical states onto adjacent physical qubits.

A SWAP gate itself decomposes into three CNOT gates:

SWAP (q_{i}, q_{j}) = CNOT (q_{i}, q_{j}) \cdot CNOT (q_{j}, q_{i}) \cdot CNOT (q_{i}, q_{j})

Each inserted SWAP adds three CNOT gates to the circuit, increasing both depth and error rate. Therefore, finding a good initial mapping and minimizing the number of SWAP insertions is critical for producing efficient hardware-compatible circuits.

Distance Matrix

Given a coupling map $G$ , the transpiler computes a distance matrix $D$ where $D [i] [j]$ is the shortest-path distance between physical qubits $i$ and $j$ in the topology graph. This matrix is computed using Dijkstra's algorithm when the TranspilationParameter is constructed:

D [i] [j] = min_{paths} | p (i \to j) |

The distance matrix is used extensively by the routing algorithm to evaluate how far the current qubit mapping is from satisfying each gate's connectivity requirement. A gate $g$ acting on logical qubits $q_{a}$ and $q_{b}$ is executable if and only if $D [π (q_{a})] [π (q_{b})] = 1$ , where $π$ is the current mapping from logical to physical qubits.

The distance matrix also plays a role in the initial layout. The transpiler uses DAG::get_largest_path() to find the longest shortest path in the topology graph. Circuit qubits are then mapped along this path to maximize the number of directly adjacent pairs available before any SWAP insertion.

Adjacency and Connectivity Properties

The coupling map has several important properties that affect transpilation:

Connectedness: The topology graph must be connected. If the graph has disconnected components, the transpiler detects this via largest_connected_component() and throws an error if the circuit requires qubit interactions across components.

Largest path: The number of qubits in the circuit cannot exceed the length of the longest shortest path in the topology graph. pyqpanda3 validates this constraint:

| Q_{circuit} | \leq | p_{longest} (G) |

If this is violated, the transpiler raises an error, because no mapping can satisfy all connectivity requirements without exceeding the available physical qubits.

Edge direction: The coupling map is treated as undirected. An edge [u, v] means a two-qubit gate can be applied between qubits $u$ and $v$ in either orientation (control or target).

Initial Mapping Strategies

The quality of the initial mapping has a large impact on the number of SWAP gates needed during routing. pyqpanda3 supports several strategies:

Trivial mapping: Logical qubit $i$ maps to physical qubit $i$ . Simple but often suboptimal.
Largest-path mapping: Circuit qubits are mapped to the longest path in the topology graph. This maximizes the number of directly connected qubit pairs available.
SabrePreLayout: A pre-layout optimization that uses graph matching to find an initial mapping minimizing the number of extra edges needed. It works by:
- Extracting the interaction graph of the circuit (which pairs of qubits share two-qubit gates).
- Finding a subgraph isomorphism between the circuit's interaction graph and the topology graph.
- Using the minimize_extra_edges method to refine the mapping.
User-specified mapping: The caller can provide init_mapping as a dictionary {virtual_qubit: physical_qubit} to the transpile() method.

When no initial mapping is provided, pyqpanda3 runs SABRE's forward-backward-forward traversal (see below) to discover a good mapping automatically.

Routing Algorithms Overview

Routing is the process of inserting SWAP gates to satisfy connectivity constraints. Several approaches exist:

Algorithm	Strategy	Complexity	Quality
Basic SWAP	Greedy: insert SWAPs to satisfy one gate at a time	$O (\| G \|^{2})$	Low
Stochastic SWAP	Randomized SWAP insertion with simulated annealing	$O (\| G \|^{2} \cdot k)$	Medium
SABRE	Heuristic bidirectional search with lookahead	$O (\| G \|^{2})$	High

pyqpanda3 uses the SABRE algorithm as its default routing method. SABRE is described in detail in the Sabre Algorithm section below.

Gate Synthesis

The Native Gate Set Problem

Every quantum processor supports a finite set of native gates. For superconducting qubits, this is typically a set like ${R_{z}, X_{π / 2}, CZ}$ or ${U_{3}, CNOT}$ . However, quantum algorithms are designed using a richer gate set that may include arbitrary rotations, controlled operations, and multi-qubit unitaries.

Gate synthesis is the process of decomposing an arbitrary unitary operation into a sequence of gates from the target native gate set. The decompose function in pyqpanda3 performs this decomposition.

Euler Decomposition for Single-Qubit Gates

Any single-qubit unitary $U \in S U (2)$ can be decomposed into three rotations. The most common decompositions are:

ZYZ Decomposition

U = e^{i γ} R_{z} (ϕ) \cdot R_{y} (θ) \cdot R_{z} (λ)

where $γ$ is a global phase, and $θ$ , $ϕ$ , $λ$ are rotation angles determined by the matrix entries:

θ = 2 \arctan (\frac{| U_{10} |}{| U_{00} |})

ϕ = \arg (U_{11}) + \arg (U_{10}) - \arg (det U)

λ = \arg (U_{11}) - \arg (U_{10})

γ = \frac{1}{2} \arg (det U)

pyqpanda3 implements this in params_zyz_inner within the KAK synthesis module.

ZXZ Decomposition

Similarly, any single-qubit unitary can be written as:

U = e^{i γ} R_{z} (α) \cdot R_{x} (β) \cdot R_{z} (γ)

The ZXZ decomposition is useful when the target gate set prefers $R_{x}$ rotations over $R_{y}$ rotations.

The choice of decomposition affects the number of native gates needed. For a gate set ${R_{z}, X_{π / 2}}$ where $R_{x} (θ) = R_{z} (- π / 2) \cdot R_{y} (θ) \cdot R_{z} (π / 2)$ , the ZYZ decomposition is more efficient because it uses $R_{y}$ rotations directly, which can be converted to $R_{z}$ and $X_{π / 2}$ sequences with minimal overhead.

KAK Decomposition for Two-Qubit Gates

Any two-qubit unitary $U \in S U (4)$ can be decomposed using the KAK decomposition (also known as the Cartan or Weyl decomposition):

U = (K_{1}^{L} \otimes K_{1}^{R}) \cdot U_{d} (a, b, c) \cdot (K_{2}^{L} \otimes K_{2}^{R})

where:

$K_{1}^{L}, K_{1}^{R}, K_{2}^{L}, K_{2}^{R} \in S U (2)$ are single-qubit unitaries.
$U_{d} (a, b, c)$ is the non-local part parameterized by three real numbers $a \geq b \geq | c | \geq 0$ .

The non-local interaction matrix is:

U_{d} (a, b, c) = \exp [i (c \cdot I \otimes Z + b \cdot Z \otimes I + a \cdot Z \otimes Z)]

More explicitly:

U_{d} (a, b, c) = (\begin{matrix} e^{i c} \cos (a - b) & 0 & 0 & i e^{i c} \sin (a - b) \\ 0 & e^{- i c} \cos (a + b) & i e^{- i c} \sin (a + b) & 0 \\ 0 & i e^{- i c} \sin (a + b) & e^{- i c} \cos (a + b) & 0 \\ i e^{i c} \sin (a - b) & 0 & 0 & e^{i c} \cos (a - b) \end{matrix})

The triple $(a, b, c)$ lies in the Weyl chamber, a tetrahedral region that classifies all equivalence classes of two-qubit entangling operations.

Weyl Chamber Classification

The values of $(a, b, c)$ determine the entangling power of the operation:

Region	$(a, b, c)$	Gate Type
Identity-like	$(0, 0, 0)$	Product (no entanglement)
CNOT-like	$(π / 4, 0, 0)$	Locally equivalent to CNOT
SWAP-like	$(π / 2, 0, 0)$	Locally equivalent to SWAP
Double-CNOT	$(π / 4, π / 4, 0)$	Locally equivalent to two CNOTs
General	$(a, b, c)$	Requires up to 3 CNOT + CZ gates

pyqpanda3 implements the TwoQubitWeylDecomposition class and the TwoQubitBasisDecomposer singleton. The decomposer uses the Weyl decomposition to determine the minimum number of basis gates needed:

0 basis gates: The target is a product operation (no entanglement).
1 basis gate: The target is locally equivalent to the basis gate.
2 basis gates: The target can be reached using two applications of the basis gate (for supercontrolled basis gates like CNOT or CZ).
3 basis gates: The most general case, requiring three applications.

Specialization

pyqpanda3 further optimizes by specializing the decomposition for specific gate types. The TwoQubitWeylDecomposition class includes specialized methods:

IdEquivSpecialize: For near-identity operations (0 basis gates).
SWAPEquivSpecialize: For operations near SWAP.
ControlledEquivSpecialize: For operations near controlled gates.
fSimaabEquivSpecialize, fSimabbEquivSpecialize, fSimabmbEquivSpecialize: For fSim-type operations.
GeneralSpecialize: For the general case.

Quantum Shannon Decomposition (QSD)

For unitaries on more than two qubits, pyqpanda3 uses Quantum Shannon Decomposition (QSD). QSD recursively decomposes an $n$ -qubit unitary into smaller unitaries, controlled gates, and multiplexed rotations.

The key identity underlying QSD is the Cosine-Sine Decomposition (CSD). Given a $2^{n} \times 2^{n}$ unitary $U$ , it can be partitioned as:

U = (\begin{matrix} U_{1} & 0 \\ 0 & U_{2} \end{matrix}) (\begin{matrix} \cos Θ & - \sin Θ \\ \sin Θ & \cos Θ \end{matrix}) (\begin{matrix} V_{1} & 0 \\ 0 & V_{2} \end{matrix})

where $U_{1}, U_{2}, V_{1}, V_{2}$ are $2^{n - 1} \times 2^{n - 1}$ unitaries and $Θ$ is a diagonal matrix of angles. The CSD is implemented in _cosine_sine_decomposition.

The recursive structure of QSD is:

Apply CSD to decompose the $n$ -qubit unitary into two $(n - 1)$ -qubit unitaries, a multiplexed rotation, and two more $(n - 1)$ -qubit unitaries.
Recursively decompose each $(n - 1)$ -qubit unitary.
Decompose the multiplexed rotation using _demultiplex (or _qs_demultiplex / _cs_demultiplex).
The base case is a 2-qubit unitary, handled by the KAK decomposition.

pyqpanda3 supports several decomposition modes via the DecompositionMode enum:

Mode	Description
`QSD`	Quantum Shannon Decomposition (default, most general)
`CSD`	Cosine-Sine Decomposition
`QR`	QR-based decomposition
`HOUSEHOLDER_QR`	Householder QR-based decomposition

The decompose function for matrices uses QSD internally:

python

from pyqpanda3.transpilation import decompose
import numpy as np

# Decompose a 2-qubit unitary into native gates
U = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=complex)

circuit = decompose(U, qubits=[0, 1], basic_gates=["rx", "ry", "rz", "cx"])

Isometry Decomposition

When the input is not a square unitary but an isometry (a matrix with orthonormal columns), pyqpanda3 provides the IsometryDecomposition class. This is useful for state preparation, where the goal is to prepare a specific quantum state from $| 0 ⟩^{\otimes n}$ .

Two schemes are supported:

CCD (Column-by-Column Decomposition): Decomposes the isometry one column at a time using multi-controlled gates.
Knill: Uses Knill's theorem to decompose the isometry by extending it to a full unitary and then applying QSD.

Solovay-Kitaev Approximation

For gate sets that are not universal in the exact sense (e.g., the Clifford+T set), the Solovay-Kitaev theorem guarantees that any single-qubit unitary can be approximated to precision $ϵ$ using $O (\log^{c} (1 / ϵ))$ gates from the set, where $c \approx 3.97$ . While pyqpanda3 focuses on exact synthesis for continuous gate sets like ${R_{z}, R_{x}, CZ}$ , the Solovay-Kitaev approach becomes relevant when targeting discrete gate sets.

The `basic_gates` Parameter

Both the Transpiler and the decompose function accept a basic_gates parameter specifying the target gate set. For example:

python

# Decompose into the RX, RY, RZ, CX gate set
result = decompose(prog, basic_gates=["rx", "ry", "rz", "cx"])

# Transpile targeting CZ, X1, RZ (typical superconducting gate set)
result = transpiler.transpile(prog, chip_topology_edges=topo,
                              basic_gates=["cz", "x1", "rz"])

When basic_gates is empty, pyqpanda3 uses the default gate set ${CZ, X_{π / 2}, R_{z}}$ , which corresponds to the native gate set of typical superconducting quantum processors.

Gate Count and Circuit Depth

The quality of gate synthesis directly impacts the final circuit metrics. The decompose function for QCircuit applies an optimization pass (OptimizationPass with "pre" mode) before translation to minimize the output gate count. The synthesis pipeline for circuits follows these steps:

Convert the circuit to a DAG representation.
Apply the InitPass to normalize the circuit structure.
Apply the OptimizationPass to cancel and merge gates.
Extract the gate list and collect gate type statistics.
Apply the BasicTranslationPass to translate each gate into the target gate set using the equivalence library and merge consecutive single-qubit gates.

The resulting circuit's gate count depends on both the input unitary and the target gate set. Some gate sets allow more efficient representations than others. For instance, the gate set ${U_{3}, CX}$ can represent any single-qubit operation with a single $U_{3}$ gate, while ${R_{z}, R_{x}, CX}$ may require up to three rotation gates per single-qubit operation.

Matrix Decomposition Validation

The decompose function for matrices performs several validation checks before synthesis:

Square matrix: The input must be a square matrix ( $m = n$ ).
Power of 2: The dimension must be a positive power of 2, i.e., $n = 2^{k}$ for some integer $k \geq 1$ .
Qubit count: If qubits is provided, its length must match $k = \log_{2} (n)$ . If not provided, qubits are auto-assigned as $[0, 1, \dots, k - 1]$ .
Gate name validity: Each name in basic_gates must correspond to a supported gate type. Unknown gate names raise a runtime error.

Sabre Algorithm

Overview

The SABRE (SWAP-based BidiREctional heuristic search) algorithm, introduced by Li et al. in 2019, is the primary routing algorithm used in pyqpanda3. It is a scalable, heuristic approach to the qubit routing problem that combines forward-backward circuit traversal with a lookahead cost function to find good SWAP insertion patterns.

SABRE is the default routing method because it produces high-quality results with polynomial time complexity and scales well to circuits with hundreds of qubits and thousands of gates.

Problem Formulation

Given:

A quantum circuit $C = {g_{1}, g_{2}, \dots, g_{N}}$ where each gate $g_{i}$ acts on qubits from a set $Q$ .
A coupling graph $G = (V, E)$ with $| V | \geq | Q |$ .
A mapping $π : Q \to V$ from logical to physical qubits.

Find a sequence of SWAP operations such that for every two-qubit gate $g_{i}$ acting on qubits $q_{a}, q_{b}$ :

(π (q_{a}), π (q_{b})) \in E

at the time $g_{i}$ is executed, while minimizing the total number of SWAP gates inserted.

The Front Layer

SABRE operates on the concept of a front layer $F$ , which is the set of gates that can be executed immediately (all their predecessors in the DAG have been executed). The algorithm iterates:

Remove from $F$ all gates whose qubit pairs are adjacent in the coupling graph (they can be executed now).
Add new gates to $F$ that become executable after the removed gates.
If $F$ is non-empty, select and apply a SWAP gate that minimizes a heuristic cost function.
Repeat until $F$ is empty.

The Heuristic Cost Function

The core of SABRE is its heuristic cost function for evaluating candidate SWAP gates. pyqpanda3 implements three variants:

Basic Cost Function

The basic cost measures the change in total distance between front-layer gate qubits:

H_{basic} (SWAP (q_{1}, q_{2})) = δ \cdot \sum_{g \in F} [D (π^{'} (f (g)), π^{'} (w (g))) - D (π (f (g)), π (w (g)))]

where $π^{'}$ is the mapping after applying the SWAP, $f (g)$ and $w (g)$ are the first and second qubits of gate $g$ , and $D$ is the distance matrix. The coefficient $δ$ is the maximum decay factor of the two SWAP qubits:

δ = max (decay [q_{1}], decay [q_{2}])

Lookahead Cost Function

The lookahead cost augments the basic cost with information from the extended set $E$ , which contains gates that are not yet in the front layer but will be soon:

H (SWAP) = \frac{1}{| F |} \sum_{g \in F} D (π^{'} (f (g)), π^{'} (w (g))) + W \cdot \frac{1}{| E |} \sum_{g \in E} D (π^{'} (f (g)), π^{'} (w (g)))

where $W$ is the extended set weight (0.1 in pyqpanda3). This lookahead mechanism prevents the algorithm from making locally greedy choices that lead to poor global outcomes.

Decay Cost Function

The decay cost function (used at optimization_level >= 2) adds a decay mechanism to prevent the same qubits from being swapped repeatedly:

H_{decay} (SWAP) = max (decay [q_{1}], decay [q_{2}]) \cdot H_{lookahead} (SWAP)

Each time a SWAP involves qubit $q_{i}$ , its decay factor is multiplied by $decay_rate$ :

decay [q_{i}] \leftarrow decay [q_{i}] + decay_rate

The decay values are reset periodically (every 5 SWAP iterations in pyqpanda3, controlled by DECAY_RESET_INTERVAL) to prevent permanent penalization.

The Extended Set

The extended set is a bounded lookahead window that includes upcoming two-qubit gates. pyqpanda3 constructs it using the get_extended_set method, which:

Examines the dependency layers of each qubit wire.
Collects upcoming two-qubit gate partners up to a maximum of EXTEND_SET_SIZE (20 in pyqpanda3) entries.
Computes the total distance $E_{sum}$ between qubit pairs in the extended set under the current mapping.

The extended set enables the routing algorithm to "see ahead" and avoid SWAP choices that would help the current front layer but create problems for upcoming gates.

Bidirectional Traversal

A key innovation of SABRE is the bidirectional (forward-backward-forward) traversal:

Forward pass: Route the circuit from beginning to end. This produces a mapping $π_{fwd}$ .
Backward pass: Route the reversed circuit from end to beginning using $π_{fwd}$ as the initial mapping. This produces $π_{bwd}$ .
Final forward pass: Route the circuit again from beginning to end using $π_{bwd}$ as the initial mapping.

The idea is that the forward pass generates a rough mapping, the backward pass refines it by considering the circuit from the opposite direction, and the final forward pass produces the routed circuit with the improved mapping. This three-pass approach often finds significantly better mappings than a single pass.

Why Bidirectional Traversal Works

Consider a circuit where the last gate requires qubits 0 and 3 to be adjacent, but the initial mapping places them far apart. A single forward pass would gradually move qubits into position, but the early gates would "waste" SWAPs that do not help the later gates. By first traversing backward, the algorithm learns that qubits 0 and 3 should be adjacent at the end, and the final forward pass can plan SWAPs earlier in the circuit to satisfy both early and late requirements.

In pyqpanda3, this bidirectional traversal is implemented in RoutingPass::operate. When no initial mapping is provided, the algorithm:

Assigns circuit qubits to the longest path in the topology graph.
Runs SABRE forward then backward to learn a good initial mapping.
Runs SABRE forward one final time to produce the routed circuit.

Worked Example

Consider routing a simple circuit on a 4-qubit linear topology 0 -- 1 -- 2 -- 3:

Circuit: CX(0,2) --> CX(1,3) --> CX(0,3)

Initial mapping: $π = {0 \to 0, 1 \to 1, 2 \to 2, 3 \to 3}$

Step 1: Front layer $F = {CX (0, 2)}$ .

Distance $D [0] [2] = 2 > 1$ , so this gate is not executable.
SWAP candidates: $(0, 1)$ and $(1, 2)$ (edges adjacent to qubits 0 and 2).
Evaluate SWAP $(0, 1)$ : new distance $D [1] [2] = 1$ . Score decreases.
Evaluate SWAP $(1, 2)$ : new distance $D [0] [1] = 1$ . Score also decreases.
Select best SWAP. Result: SWAP $(0, 1)$ applied. Mapping becomes $π = {0 \to 1, 1 \to 0, 2 \to 2, 3 \to 3}$ .

Step 2: Front layer $F = {CX (0, 2)}$ .

Distance $D [π (0)] [π (2)] = D [1] [2] = 1$ . Gate is executable.
Execute $CX (0, 2)$ . Add $CX (1, 3)$ to $F$ .

Step 3: Front layer $F = {CX (1, 3)}$ .

Distance $D [π (1)] [π (3)] = D [0] [3] = 3 > 1$ . Not executable.
SWAP candidates include $(0, 1), (2, 3)$ (edges adjacent to qubits $π (1) = 0$ and $π (3) = 3$ ).
The algorithm selects SWAPs that move qubits 0 and 3 closer together.

This process continues until all gates are executed. The total number of SWAPs inserted depends on the cost function used and the initial mapping quality.

Parallel Execution

pyqpanda3 parallelizes the routing search by running multiple SABRE instances with different initial mappings simultaneously. By default, 2 parallel workers (PARALLEL_SLOVER_CORE = 2) are used:

Worker 0 starts with circuit qubits mapped to the longest path in forward order.
Worker 1 starts with circuit qubits mapped to the longest path in reverse order.

Each worker runs the forward-backward-forward traversal independently. The first worker to complete provides the result, and the other workers are stopped early via an atomic flag.

Detailed SABRE Pseudocode

The following pseudocode describes the SABRE routing algorithm as implemented in pyqpanda3:

Algorithm: SABRE Routing
Input:  Circuit gates G, coupling graph topology T,
        initial mapping pi, routing method m
Output: Routed circuit C'

1.  Compute distance matrix D from T using Dijkstra
2.  F <-- front layer of G (gates with no predecessors)
3.  decay[i] <-- 1 for all physical qubits i
4.  C' <-- empty circuit

5.  while F is not empty:
6.      executable <-- {g in F : D[pi(q_a)][pi(q_b)] == 1}
7.      for each g in executable:
8.          Append g to C'
9.          Remove g from F
10.         Add successors of g to F (if all predecessors done)

11.     if F is not empty:
12.         // Compute extended set E for lookahead
13.         (E, E_sum) <-- get_extended_set(F, pi, D)
14.
15.         // Generate SWAP candidates
16.         candidates <-- all SWAPs on edges adjacent to
17.                        qubits in F
18.
19.         // Evaluate each candidate
20.         best_swap <-- null
21.         best_score <-- +infinity
22.         for each (q1, q2) in candidates:
23.             pi' <-- pi with q1 and q2 swapped
24.             if method == "basic":
25.                 score <-- basic_cost(q1, q2, D, F)
26.             else if method == "lookahead":
27.                 score <-- lookahead_cost(q1, q2, D, F, E)
28.             else if method == "decay":
29.                 score <-- decay_cost(q1, q2, D, F, E, decay)
30.             if score < best_score:
31.                 best_score <-- score
32.                 best_swap <-- (q1, q2)
33.
34.         // Apply the best SWAP
35.         Append SWAP(best_swap) to C'
36.         Update pi by swapping q1 and q2
37.         Update decay for q1 and q2
38.         Reset decay every DECAY_RESET_INTERVAL iterations

39. return C'

Comparison with Other Routing Algorithms

Feature	SABRE	Stochastic SWAP	Basic Greedy
Determinism	Heuristic (deterministic given same initial mapping)	Randomized	Deterministic
Lookahead	Yes (extended set)	No	No
Bidirectional	Yes (fwd-bwd-fwd)	No	No
Decay mechanism	Yes (prevents cycling)	No	No
Parallelism	Yes (multiple initial mappings)	Yes (multiple trials)	No
SWAP count	Low	Medium	High
Runtime	$O (\| G \|^{2} \cdot \| V \|)$	$O (\| G \|^{2} \cdot \| V \| \cdot k)$	$O (\| G \|^{2} \cdot \| V \|)$
Scalability	Good to 100+ qubits	Moderate	Good

SABRE's advantages come from its combination of:

Lookahead via the extended set, preventing myopic SWAP choices.
Bidirectional traversal, finding better initial mappings.
Decay mechanism, preventing repetitive SWAP patterns.
Parallel execution, exploring multiple initial mapping candidates.

Routing in pyqpanda3

The routing is invoked automatically when calling Transpiler.transpile() with a non-empty chip_topology_edges parameter. When the topology is empty, routing is skipped entirely:

python

from pyqpanda3.transpilation import Transpiler, generate_topology

topo = generate_topology(5, "linear")
transpiler = Transpiler()

# Routing is applied because topology is non-empty
result = transpiler.transpile(prog, chip_topology_edges=topo,
                              optimization_level=2,
                              basic_gates=["RX", "RY", "RZ", "CNOT"])

# Routing is skipped because no topology is specified
result = transpiler.transpile(prog, optimization_level=1)

The optimization_level parameter influences routing:

At level 0 or 1, the default SABRE routing method is used.
At level 2, the "decay" routing method is used, which adds the decay cost function to prevent suboptimal SWAP patterns.

Handling Dynamic Circuits

pyqpanda3's transpiler supports dynamic circuits containing classical control flow (if-else branches, while loops). When a conditional branch is encountered, the transpiler:

Processes the main sequence of gates before the branch.
Recursively transpiles each branch of the conditional independently, preserving the current qubit mapping across branches.
Continues with the remaining gates after the branch.

This ensures that qubit mapping consistency is maintained across all execution paths of the dynamic circuit.

Putting It All Together

The following diagram shows how all four components of transpilation theory work together in pyqpanda3:

Practical Example

A complete transpilation workflow using pyqpanda3:

python

import numpy as np
from pyqpanda3.core import QProg, H, CNOT, RX
from pyqpanda3.transpilation import Transpiler, generate_topology, decompose

# Build a quantum program
prog = QProg()
prog << H(0) << CNOT(0, 1) << RX(1, np.pi/4) << CNOT(1, 2) << H(2)

# Inspect the original circuit
print("Original depth:", prog.depth())
print("Original gate counts:", prog.count_ops())

# Generate a chip topology
topology = generate_topology(5, "linear")

# Transpile with optimization level 2 and target gate set
transpiler = Transpiler()
result = transpiler.transpile(
    prog,
    chip_topology_edges=topology,
    optimization_level=2,
    basic_gates=["RX", "RY", "RZ", "CNOT"]
)

# Inspect the transpiled circuit
print("Transpiled depth:", result.depth())
print("Transpiled gate counts:", result.count_ops())
circ = result.to_circuit()
print("Two-qubit gates:", circ.num_2q_gate())

Inspecting Intermediate Results

pyqpanda3 provides several methods for analyzing circuits before and after transpilation:

Method	Available On	Description
`depth(include_barrier)`	QProg, QCircuit	Circuit depth (longest path in the DAG)
`count_ops()`	QProg, QCircuit	Dictionary of gate type counts
`gate_operations()`	QProg	List of all gate operations
`num_2q_gate()`	QCircuit	Number of two-qubit gates
`flatten()`	QProg	Flatten nested circuit structures

These can be used to compare the original and transpiled circuits, verifying that the transpilation reduced gate count and depth while preserving correctness.

References

Li, G., Ding, Y., & Xie, Y. (2019). Tackling the Qubit Mapping Problem for NISQ-Era Quantum Devices. ASPLOS 2019. (SABRE algorithm)
Shende, T. S., Bullock, S. S., & Markov, I. L. (2006). Synthesis of Quantum Logic Circuits. IEEE Transactions on CAD. (Quantum Shannon Decomposition)
Khaneja, N., & Glaser, S. J. (2001). Cartan Decomposition of SU(2^n) and Control of Spin Systems. Chemical Physics. (KAK decomposition)
Dawson, C. M., & Nielsen, M. A. (2006). The Solovay-Kitaev Algorithm. Quantum Information and Computation.
Iten, R., et al. (2016). Quantum Circuits for Isometries. Physical Review A. (Isometry decomposition)

Transpilation Theory ​

The Transpilation Pipeline ​

Circuit Optimization ​

Why Optimization Matters ​

Optimization Levels in pyqpanda3 ​

Gate Cancellation ​

Gate Merging ​

Commutativity-Aware Optimization ​

Peephole Optimization ​

Template Matching ​

Unitary Synthesis Optimization ​

Optimization Pass Flow ​

Cost Model for Optimization Decisions ​

Topology Mapping ​

The Qubit Mapping Problem ​

Coupling Maps ​

Why Topology Matters ​

Distance Matrix ​

Adjacency and Connectivity Properties ​

Initial Mapping Strategies ​

Routing Algorithms Overview ​

Gate Synthesis ​

The Native Gate Set Problem ​

Euler Decomposition for Single-Qubit Gates ​

ZYZ Decomposition ​

ZXZ Decomposition ​

KAK Decomposition for Two-Qubit Gates ​

Weyl Chamber Classification ​

Specialization ​

Quantum Shannon Decomposition (QSD) ​

Isometry Decomposition ​

Solovay-Kitaev Approximation ​

The basic_gates Parameter ​

Gate Count and Circuit Depth ​

Matrix Decomposition Validation ​

Sabre Algorithm ​

Overview ​

Problem Formulation ​

The Front Layer ​

The Heuristic Cost Function ​

Basic Cost Function ​

Lookahead Cost Function ​

Decay Cost Function ​

The Extended Set ​

Bidirectional Traversal ​

Why Bidirectional Traversal Works ​

Worked Example ​

Parallel Execution ​

Detailed SABRE Pseudocode ​

Comparison with Other Routing Algorithms ​

Routing in pyqpanda3 ​

Handling Dynamic Circuits ​

Putting It All Together ​

Practical Example ​

Inspecting Intermediate Results ​

References ​

See Also ​

Transpilation Theory

The Transpilation Pipeline

Circuit Optimization

Why Optimization Matters

Optimization Levels in pyqpanda3

Gate Cancellation

Gate Merging

Commutativity-Aware Optimization

Peephole Optimization

Template Matching

Unitary Synthesis Optimization

Optimization Pass Flow

Cost Model for Optimization Decisions

Topology Mapping

The Qubit Mapping Problem

Coupling Maps

Why Topology Matters

Distance Matrix

Adjacency and Connectivity Properties

Initial Mapping Strategies

Routing Algorithms Overview

Gate Synthesis

The Native Gate Set Problem

Euler Decomposition for Single-Qubit Gates

ZYZ Decomposition

ZXZ Decomposition

KAK Decomposition for Two-Qubit Gates

Weyl Chamber Classification

Specialization

Quantum Shannon Decomposition (QSD)

Isometry Decomposition

Solovay-Kitaev Approximation

The `basic_gates` Parameter

Gate Count and Circuit Depth

Matrix Decomposition Validation

Sabre Algorithm

Overview

Problem Formulation

The Front Layer

The Heuristic Cost Function

Basic Cost Function

Lookahead Cost Function

Decay Cost Function

The Extended Set

Bidirectional Traversal

Why Bidirectional Traversal Works

Worked Example

Parallel Execution

Detailed SABRE Pseudocode

Comparison with Other Routing Algorithms

Routing in pyqpanda3

Handling Dynamic Circuits

Putting It All Together

Practical Example

Inspecting Intermediate Results

References

See Also