Warm-starting QAOA with the Optimization Mapper Qiskit addon
Usage estimate: 9 minutes on a Heron r3 (NOTE: This is an estimate only. Your runtime may vary.)
Learning outcomes
After completing this tutorial, you can expect to understand the following information:
- How to map a max-cut problem to a quantum Quadratic Unconstrained Binary Optimization (QUBO) formulation using
qiskit-addon-opt-mapper - How to implement and run standard QAOA on a simulator
- How to apply WS-QAOA by computing the quadratic program (QP) relaxation and building the warm-start circuit
- How to compare energy convergence and solution quality between standard QAOA and WS-QAOA
Prerequisites
It is recommended that you familiarize yourself with these topics:
Background
The Quantum Approximate Optimization Algorithm (QAOA) is a hybrid quantum-classical algorithm designed to solve combinatorial optimization problems such as max-cut and general QUBO formulations. For a foundational introduction to QAOA in Qiskit, see the QAOA tutorial; for more advanced circuit-building techniques, see the advanced QAOA tutorial.
In standard QAOA:
- The initial state is the uniform superposition .
- Variational parameters are randomly initialized.
- A classical optimizer searches for parameters that minimize the cost function.
However, for practical problem sizes and noisy quantum hardware, random initialization can lead to slow convergence, poor local minima, and increased optimization cost.
Warm-start QAOA (WS-QAOA) improves this by incorporating classical optimization insights directly into the quantum circuit. This tutorial follows the methods introduced by Egger, Mareček, and Woerner in Warm-starting quantum optimization. The key idea is to:
- Solve a continuous relaxation of the original binary problem (a quadratic program over instead of ).
- Encode the relaxed solution into a custom initial state by using -rotation angles , so that qubit starts in a state whose probability of measuring is .
- Replace the standard -mixer with a custom mixer whose ground state is the warm-start initial state, ensuring the algorithm starts near the classical solution and can explore the neighborhood.
A regularization parameter clips away from 0 and 1 to avoid reachability issues; qubits initialized in or cannot be moved by the cost Hamiltonian. At , WS-QAOA reduces exactly to standard QAOA.
Problem modeling uses the qiskit-addon-opt-mapper package, which provides the OptimizationProblem class and converters for mapping combinatorial problems to quantum Hamiltonians.
Requirements
Before starting this tutorial, be sure you have the following installed:
- Qiskit SDK v2.0 or later, with visualization support
- Qiskit Runtime v0.43 or later (
pip install qiskit-ibm-runtime) - Optimization Mapper Qiskit addon (
pip install qiskit-addon-opt-mapper) - SciPy (
pip install scipy) - NetworkX (
pip install networkx)
Setup
Import all required libraries and define helper functions used throughout this tutorial.
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx
from scipy.optimize import minimize
from qiskit.circuit import QuantumCircuit, ParameterVector
from qiskit.quantum_info import Statevector
from qiskit.primitives import StatevectorEstimator, StatevectorSampler
from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager
from qiskit_ibm_runtime import (
QiskitRuntimeService,
Session,
EstimatorOptions,
EstimatorV2 as Estimator,
SamplerV2 as Sampler,
)
from qiskit_addon_opt_mapper import OptimizationProblem
from qiskit_addon_opt_mapper.converters import OptimizationProblemToQubo
from qiskit_addon_opt_mapper.translators import to_isingSmall-scale simulator example
We use a small max-cut problem on a weighted graph as our running example. Max-cut asks: given a graph with edge weights , find a partition of the vertices into two sets and that maximizes the total weight of edges crossing the cut.
As a QUBO minimization problem, max-cut can be written as:
We work with a four-node graph for tractability on a simulator.
Step 1: Map classical inputs to a quantum problem
We define the max-cut problem using the OptimizationProblem class from qiskit-addon-opt-mapper, convert it to a QUBO, and translate it to an Ising Hamiltonian (SparsePauliOp) suitable for QAOA. We also solve the continuous relaxation of the QUBO — replacing the binary constraint with — to obtain the warm-start initial point .
# Define a 4-node weighted graph for the max-cut problem
n_nodes = 4
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0), (1, 3, 1.0), (2, 3, 1.0)]
G = nx.Graph()
G.add_nodes_from(range(n_nodes))
G.add_weighted_edges_from(edges)
pos = nx.spring_layout(G, seed=42)
edge_labels = {(u, v): d["weight"] for u, v, d in G.edges(data=True)}
fig, ax = plt.subplots(figsize=(4, 3))
nx.draw(G, pos, with_labels=True, node_color="lightblue", ax=ax)
nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels, ax=ax)
ax.set_title("Max-Cut graph")
plt.tight_layout()
plt.show()Output:
The graph has five edges. The optimal max-cut partitions the nodes into and (or its complement), severing four of the five edges for a cut value of 4.
# Build the max-cut QUBO using OptimizationProblem
#
# Max-Cut as a minimization QUBO:
# minimize -sum_{(i,j) in E} w_ij * (x_i + x_j - 2*x_i*x_j)
#
# Expanding: each edge contributes -w*(x_i + x_j) to the linear terms
# and +2w*x_i*x_j to the quadratic terms.
prob = OptimizationProblem("max_cut")
x_vars = [prob.binary_var(name=f"x{i}") for i in range(n_nodes)]
linear_terms = {}
quadratic_terms = {}
for u, v, w in edges:
linear_terms[f"x{u}"] = linear_terms.get(f"x{u}", 0.0) - w
linear_terms[f"x{v}"] = linear_terms.get(f"x{v}", 0.0) - w
quadratic_terms[(f"x{u}", f"x{v}")] = (
quadratic_terms.get((f"x{u}", f"x{v}"), 0.0) + 2 * w
)
prob.minimize(linear=linear_terms, quadratic=quadratic_terms)
print(prob.prettyprint())Output:
Problem name: max_cut
Minimize
2*x0*x1 + 2*x0*x2 + 2*x1*x2 + 2*x1*x3 + 2*x2*x3 - 2*x0 - 3*x1 - 3*x2 - 2*x3
Subject to
No constraints
Binary variables (4)
x0 x1 x2 x3
The OptimizationProblem class from qiskit-addon-opt-mapper provides a flexible model for binary, integer, continuous, and spin variables. Here all variables are binary and the problem is already in unconstrained form (QUBO), so no constraint-to-penalty conversion is needed. The printed objective shows each variable's linear coefficient (how much it individually contributes to the cut) and each cross-term's quadratic coefficient (the penalty for putting two adjacent nodes on the same side).
# Convert the OptimizationProblem to a QUBO, then translate to an Ising Hamiltonian
#
# The substitution x_i = (1 - z_i)/2 maps binary variables to spin operators,
# yielding a Hamiltonian H_C = sum_i h_i Z_i + sum_{i<j} J_ij Z_i Z_j + constant.
# QAOA minimizes <H_C> to find the ground state, which encodes the optimal cut.
converter = OptimizationProblemToQubo()
qubo = converter.convert(prob)
cost_operator, offset = to_ising(qubo)
n_qubits = cost_operator.num_qubits
print(f"Cost Hamiltonian H_C ({n_qubits} qubits):")
print(cost_operator)
print(f"\nOffset (constant shift): {offset}")
print(" QUBO value = Ising energy + offset")Output:
Cost Hamiltonian H_C (4 qubits):
SparsePauliOp(['IIZZ', 'IZIZ', 'IZZI', 'ZIZI', 'ZZII'],
coeffs=[0.5+0.j, 0.5+0.j, 0.5+0.j, 0.5+0.j, 0.5+0.j])
Offset (constant shift): -2.5
QUBO value = Ising energy + offset
The to_ising translator returns a SparsePauliOp representing and a scalar offset such that . For this max-cut problem with all unit weights, for all qubits (the graph is symmetric in linear terms after the substitution), and each edge contributes a coupling of strength . The minimum eigenvalue of corresponds to the maximum cut.
# Solve the continuous (QP) relaxation to obtain the warm-start point c*
#
# The QP relaxation replaces the binary constraint x_i in {0,1} with x_i in [0,1]
# and minimizes the same quadratic objective. Its solution c*_i gives the
# probability that variable i should be 1 according to the classical relaxation.
#
# The max-cut QUBO has a non-convex quadratic matrix (negative eigenvalues),
# so the relaxed problem has multiple local minima. A naive single start from
# [0.5,...,0.5] converges to the symmetric saddle point c* = [0.5,...,0.5],
# which carries no useful structural information about the problem.
# Multi-start optimization is used to reliably find the global minimum.
Q = qubo.objective.quadratic.to_array(symmetric=True)
mu = qubo.objective.linear.to_array()
def qp_objective(x_cont):
"""Continuous relaxation of the QUBO objective."""
return x_cont @ Q @ x_cont + mu @ x_cont + qubo.objective.constant
bounds = [(0.0, 1.0)] * n_qubits
rng = np.random.default_rng(42)
best_val = np.inf
c_star = None
for _ in range(200):
x0 = rng.uniform(0.0, 1.0, n_qubits)
result = minimize(qp_objective, x0, method="L-BFGS-B", bounds=bounds)
if result.fun < best_val:
best_val = result.fun
c_star = result.x
print(f"QP relaxation solution c* = {np.round(c_star, 4)}")
print(f"QP objective value = {best_val:.4f}")Output:
QP relaxation solution c* = [1. 0. 0. 1.]
QP objective value = -4.0000
The multi-start solver finds (or its complement ), which is the actual optimal binary solution. For this problem the QP relaxation is tight, the continuous minimum coincides with the integer optimum, meaning the relaxation immediately identifies the best cut. After regularization with in Step 2, this solution will be encoded into the warm-start initial state.
Step 2: Optimize problem for quantum hardware execution
We build two QAOA circuits and prepare the warm-start angles from the QP solution.
Standard QAOA uses the uniform superposition as the initial state and the standard -mixer , implemented as per layer.
Warm-start QAOA (WS-QAOA) from [1] makes two structural changes per qubit :
- Initial state: with , so the probability of measuring equals .
- Custom mixer: , which has as its ground state. This means WS-QAOA starts in the ground state of its own mixer, the same property that standard QAOA satisfies with and the -mixer.
Note on layers: At p=1(a single QAOA layer), standard QAOA is analytically limited to ~49% of the optimal energy on graphs containing triangles (this graph has the triangle 0-1-2). The warm start bypasses this limitation by encoding prior knowledge of the solution directly into the initial state.
# Number of QAOA layers (each layer = one cost unitary + one mixer unitary)
p = 1
# Regularization: clip c* to [epsilon, 1-epsilon] so no qubit is initialized
# in |0> or |1>, which would freeze it under the cost Hamiltonian.
epsilon = 0.25
c_clipped = np.clip(c_star, epsilon, 1 - epsilon)
thetas = 2 * np.arcsin(np.sqrt(c_clipped))
print(f"Continuous relaxation c* = {np.round(c_star, 4)}")
print(f"After regularization = {np.round(c_clipped, 4)}")
print(f"Warm-start angles theta = {np.round(thetas, 4)} radians")
print()
print("Angle interpretation:")
print(" theta = 0 <-> c* = 0 (qubit points toward |0>)")
print(
" theta = pi/2 <-> c* = 0.5 (qubit in equal superposition, like |+>)"
)
print(" theta = pi <-> c* = 1 (qubit points toward |1>)")Output:
Continuous relaxation c* = [1. 0. 0. 1.]
After regularization = [0.75 0.25 0.25 0.75]
Warm-start angles theta = [2.0944 1.0472 1.0472 2.0944] radians
Angle interpretation:
theta = 0 <-> c* = 0 (qubit points toward |0>)
theta = pi/2 <-> c* = 0.5 (qubit in equal superposition, like |+>)
theta = pi <-> c* = 1 (qubit points toward |1>)
After clipping, becomes and becomes . The resulting angles radians rotate qubits 0 and 3 strongly toward and qubits 1 and 2 toward , directly encoding the structure of the optimal cut into the initial quantum state.
def apply_cost_unitary(qc, cost_op, gamma):
"""Apply exp(-i * gamma * H_C) to the circuit.
Each Pauli term in H_C contributes a rotation gate:
- Single-Z term h_i * Z_i -> RZ(2 * gamma * h_i) on qubit i
- Two-Z term J_ij * Z_i Z_j -> CNOT, RZ(2 * gamma * J_ij), CNOT
"""
for pauli_term, coeff in zip(cost_op.paulis, cost_op.coeffs):
indices = [
j for j, q in enumerate(pauli_term.to_label()[::-1]) if q == "Z"
]
if len(indices) == 1:
qc.rz(2 * gamma * coeff.real, indices[0])
elif len(indices) == 2:
qc.cx(indices[0], indices[1])
qc.rz(2 * gamma * coeff.real, indices[1])
qc.cx(indices[0], indices[1])
def build_standard_qaoa(cost_op, n_layers, n_qubits):
"""Standard QAOA: uniform superposition initial state + X-mixer."""
gammas = ParameterVector("γ", n_layers)
betas = ParameterVector("β", n_layers)
qc = QuantumCircuit(n_qubits)
qc.h(range(n_qubits)) # |+>^n
for k in range(n_layers):
apply_cost_unitary(qc, cost_op, gammas[k])
# H_M = -sum X_i => exp(-i*beta*H_M) = prod_i RX(-2*beta)
qc.rx(-2 * betas[k], range(n_qubits))
return qc, gammas, betas
def build_ws_qaoa(cost_op, n_layers, n_qubits, thetas):
"""WS-QAOA: warm-start initial state + custom per-qubit mixer.
Per Egger et al. (2021) Eq. (1)-(2):
Initial state per qubit i: R_Y(theta_i) |0>
Mixer gate per qubit i: R_Y(theta_i) R_Z(-2*beta) R_Y(-theta_i)
"""
gammas = ParameterVector("γ", n_layers)
betas = ParameterVector("β", n_layers)
qc = QuantumCircuit(n_qubits)
for i, theta in enumerate(thetas):
qc.ry(theta, i) # warm-start initial state
for k in range(n_layers):
apply_cost_unitary(qc, cost_op, gammas[k])
for i, theta in enumerate(thetas):
qc.ry(theta, i)
qc.rz(-2 * betas[k], i)
qc.ry(-theta, i)
return qc, gammas, betas
std_qc, std_gammas, std_betas = build_standard_qaoa(
cost_operator, p, n_qubits
)
ws_qc, ws_gammas, ws_betas = build_ws_qaoa(cost_operator, p, n_qubits, thetas)The apply_cost_unitary helper reads directly from the SparsePauliOp Hamiltonian, so it automatically handles any QUBO problem without manual circuit construction. The WS-QAOA mixer applies three rotations per qubit per layer (, , ), compared to a single in the standard mixer — a modest overhead that is constant in the number of qubits.
print("Standard QAOA circuit (p=1):")
std_qc.draw("mpl", fold=-1)Output:
Standard QAOA circuit (p=1):
print("\nWS-QAOA circuit (p=1):")
ws_qc.draw("mpl", fold=-1)Output:
WS-QAOA circuit (p=1):
Both circuits follow the same structure: an initial state preparation layer, then alternating cost-unitary and mixer-unitary layers. In the WS-QAOA circuit, the opening gates encode , and the mixer replaces each with a conjugated –– triplet. The circuit depth difference between the two grows linearly with , but remains manageable at low depth.
Step 3: Execute using Qiskit primitives
We use StatevectorEstimator for exact, noiseless simulation. The minimize function from SciPy with the COBYLA optimizer drives the variational loop, calling the estimator at each iteration to evaluate for a given parameter set .
The two algorithms use different initial parameters that reflect what each knows before optimization:
- Standard QAOA: Random initialization in — appropriate since no structural information is available.
- WS-QAOA: , — at the cost unitary is the identity, so the very first circuit evaluation samples directly from the warm-start initial state. This gives COBYLA a strong starting signal aligned with the classical solution.
estimator = StatevectorEstimator()
def make_cost_fn(circuit, param_order, cost_op, estimator, history):
"""Return a scalar cost function compatible with scipy.optimize.minimize."""
def cost_fn(params):
bound = circuit.assign_parameters(dict(zip(param_order, params)))
job = estimator.run([(bound, cost_op)])
energy = job.result()[0].data.evs.real
history.append(energy)
return energy
return cost_fn
# Standard QAOA: random initialization
np.random.seed(42)
std_params0 = np.random.uniform(0, np.pi, 2 * p)
std_history = []
std_param_order = list(std_gammas) + list(std_betas)
std_result = minimize(
make_cost_fn(
std_qc, std_param_order, cost_operator, estimator, std_history
),
std_params0,
method="COBYLA",
options={"maxiter": 300, "rhobeg": 0.5},
)
print(f"Standard QAOA optimal energy : {std_result.fun:.4f}")
print(
f" optimal params: gamma={std_result.x[:p].round(4)}, beta={std_result.x[p:].round(4)}"
)
print(f" optimizer calls: {len(std_history)}")
# WS-QAOA: informed initialization
ws_params0 = np.concatenate([np.zeros(p), np.full(p, np.pi / 4)])
ws_history = []
ws_param_order = list(ws_gammas) + list(ws_betas)
ws_result = minimize(
make_cost_fn(ws_qc, ws_param_order, cost_operator, estimator, ws_history),
ws_params0,
method="COBYLA",
options={"maxiter": 300, "rhobeg": 0.5},
)
print(f"\nWS-QAOA optimal energy : {ws_result.fun:.4f}")
print(
f" optimal params: gamma={ws_result.x[:p].round(4)}, beta={ws_result.x[p:].round(4)}"
)
print(f" optimizer calls: {len(ws_history)}")Output:
Standard QAOA optimal energy : -0.5859
optimal params: gamma=[2.0534], beta=[2.4612]
optimizer calls: 36
WS-QAOA optimal energy : -1.5000
optimal params: gamma=[-0.0001], beta=[1.5708]
optimizer calls: 42
WS-QAOA's informed starting point means COBYLA begins with a meaningful energy value near the warm-start solution, while standard QAOA starts from an essentially random point on the energy landscape. This difference in starting quality is the main driver of the convergence gap visible in Step 4.
# Compute the exact optimal energy by brute-force over all 2^n bitstrings
all_energies = [
Statevector.from_label(format(k, f"0{n_qubits}b"))
.expectation_value(cost_operator)
.real
for k in range(2**n_qubits)
]
optimal_energy = min(all_energies)
print(f"Exact optimal energy : {optimal_energy:.4f}")
print(f"Standard QAOA approx. ratio : {std_result.fun / optimal_energy:.4f}")
print(f"WS-QAOA approx. ratio : {ws_result.fun / optimal_energy:.4f}")Output:
Exact optimal energy : -1.5000
Standard QAOA approx. ratio : 0.3906
WS-QAOA approx. ratio : 1.0000
The approximation ratio is defined as . For minimization problems where , a ratio closer to 1 means the algorithm found a lower energy (a better solution). The brute-force search over all basis states is only feasible for small and serves as a ground-truth reference.
Step 4: Post-process and return result in desired classical format
We visualize convergence, sample the optimized circuits for bitstring solutions, decode those bitstrings back to max-cut partitions, and summarize the final results.
fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(std_history, label="Standard QAOA", alpha=0.85)
ax.plot(ws_history, label="WS-QAOA", alpha=0.85)
ax.axhline(
optimal_energy,
color="k",
linestyle="--",
label=f"Exact optimal ({optimal_energy:.2f})",
)
ax.set_xlabel("Optimizer call")
ax.set_ylabel(r"$\langle H_C \rangle$")
ax.set_title("Convergence: Standard QAOA vs. WS-QAOA")
ax.legend()
plt.tight_layout()
plt.show()Output:
The convergence plot shows the energy at each COBYLA function evaluation. Standard QAOA at is limited to ~49% of the optimal energy on this graph (the theoretical maximum for QAOA on graphs with triangles), settling around . WS-QAOA, initialized close to the optimal solution, converges quickly to near (the exact optimum) with far fewer iterations. This demonstrates the key advantage of the warm start: at the same circuit depth, it reaches a significantly better solution.
# Sample the optimized circuits to recover the most probable bitstring solutions
sampler = StatevectorSampler()
shots = 1024
def get_best_bitstring(circuit, param_order, optimal_params, sampler, shots):
bound = circuit.assign_parameters(dict(zip(param_order, optimal_params)))
bound.measure_all()
job = sampler.run([bound], shots=shots)
counts = job.result()[0].data.meas.get_counts()
return max(counts, key=counts.get), counts
def evaluate_cut(bitstring, G):
"""Compute the Max-Cut value for a bitstring node assignment."""
x = [int(b) for b in bitstring]
cut_val = sum(
w for u, v, w in G.edges.data("weight", default=1) if x[u] != x[v]
)
set0 = [i for i, b in enumerate(bitstring) if b == "0"]
set1 = [i for i, b in enumerate(bitstring) if b == "1"]
return cut_val, set0, set1
# Qiskit bitstring ordering: rightmost character = qubit 0
def decode_bitstring(bs):
return bs[::-1]
std_best, _ = get_best_bitstring(
std_qc, std_param_order, std_result.x, sampler, shots
)
ws_best, _ = get_best_bitstring(
ws_qc, ws_param_order, ws_result.x, sampler, shots
)
std_cut, std_s0, std_s1 = evaluate_cut(decode_bitstring(std_best), G)
ws_cut, ws_s0, ws_s1 = evaluate_cut(decode_bitstring(ws_best), G)
print(f"Standard QAOA most-probable bitstring : {std_best}")
print(f" Partition: S={std_s0}, S̄={std_s1} | cut value = {std_cut}")
print()
print(f"WS-QAOA most-probable bitstring : {ws_best}")
print(f" Partition: S={ws_s0}, S̄={ws_s1} | cut value = {ws_cut}")Output:
Standard QAOA most-probable bitstring : 0110
Partition: S=[0, 3], S̄=[1, 2] | cut value = 4.0
WS-QAOA most-probable bitstring : 0110
Partition: S=[0, 3], S̄=[1, 2] | cut value = 4.0
Bitstrings from Sampler are returned with qubit 0 at the rightmost position, so reversing the string maps index to variable . The cut value is the total weight of edges that cross the partition, which is what the max-cut problem aims to maximize. A cut value of 4 uses four of the five available edges, which is the theoretical maximum for this graph.
# Visualize the WS-QAOA solution on the graph
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
for ax, s0, s1, cut, title in [
(axes[0], std_s0, std_s1, std_cut, f"Standard QAOA (cut = {std_cut})"),
(axes[1], ws_s0, ws_s1, ws_cut, f"WS-QAOA (cut = {ws_cut})"),
]:
colors = ["skyblue" if i in s0 else "salmon" for i in G.nodes()]
nx.draw(G, pos, with_labels=True, node_color=colors, ax=ax)
nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels, ax=ax)
ax.set_title(title)
plt.tight_layout()
plt.show()
# Summary
# to_ising offset: QUBO value = Ising energy + offset, so Max-Cut value = -(Ising energy + offset)
optimal_cut = -(optimal_energy + offset)
print("=== Summary ===")
print(
f"{'Method':<20} {'Ising energy':>14} {'Cut value':>12} {'Approx. ratio':>15}"
)
print("-" * 65)
print(
f"{'Standard QAOA':<20} {std_result.fun:>14.4f} {std_cut:>12} {std_result.fun/optimal_energy:>15.4f}"
)
print(
f"{'WS-QAOA':<20} {ws_result.fun:>14.4f} {ws_cut:>12} {ws_result.fun/optimal_energy:>15.4f}"
)
print(
f"{'Exact optimal':<20} {optimal_energy:>14.4f} {optimal_cut:>12.0f} {'1.0000':>15}"
)Output:
=== Summary ===
Method Ising energy Cut value Approx. ratio
-----------------------------------------------------------------
Standard QAOA -0.5859 4.0 0.3906
WS-QAOA -1.5000 4.0 1.0000
Exact optimal -1.5000 4 1.0000
The graph visualization colors each node by its partition assignment (blue = , orange = ). Edges that cross the partition (connecting differently-colored nodes) are the ones counted in the cut.
Both methods find a bitstring with cut value 4, but for very different reasons. It is important to note that the convergence plot and the sampled bitstring measure two different things:
- The convergence plot tracks the average energy of the full quantum state, a weighted average over all bitstrings in the superposition. Standard QAOA converges to ~, well above the optimal , meaning its quantum state is spread across many suboptimal bitstrings and only occasionally includes the correct answer.
- The sampled bitstring is a single draw from that state. Standard QAOA got lucky here; the optimal partition happened to be the most frequently sampled outcome even from a diffuse state. On harder problems, noisier hardware, or with more candidate solutions competing, this luck runs out.
WS-QAOA, by contrast, converges its average energy all the way to , meaning its quantum state is concentrated on the optimal bitstrings. Nearly every shot returns the correct answer, so the solution is found reliably rather than by chance.
The practical consequence: on this small noiseless simulator the difference may seem minor, but at larger problem sizes or on real hardware, a state with average energy close to optimal is far more robust than one that only occasionally samples the right answer from a diffuse distribution.
Large Scale Hardware Example
Steps 1-4 compress into single code block
# Selecting a backend using real hardware
service = QiskitRuntimeService()
backend = service.least_busy(
operational=True, simulator=False, min_num_qubits=127
)
print(f"Using backend: {backend.name}")Output:
Using backend: ibm_boston
# ── Step 1a: Build the 40-node Max-Cut problem ─────────────────────────────
# A 3-regular graph (every node has exactly 3 neighbors) is a standard QAOA
N_LARGE = 40
G_large = nx.random_regular_graph(d=3, n=N_LARGE, seed=0)
edges_large = list(G_large.edges())
print(f"Graph: {N_LARGE} nodes, {len(edges_large)} edges (3-regular)")
# Visualize the graph so it is clear what problem we are solving before any
# quantum work. Nodes in a circular layout; each edge contributes +1 to the
# cut value when its endpoints land in different partitions.
pos_large = nx.circular_layout(G_large)
fig, ax = plt.subplots(figsize=(6, 6))
nx.draw(
G_large,
pos_large,
with_labels=True,
node_color="lightblue",
node_size=400,
font_size=7,
ax=ax,
)
ax.set_title(f"40-node 3-regular Max-Cut graph ({len(edges_large)} edges)")
plt.tight_layout()
plt.show()
prob_large = OptimizationProblem("max_cut_40")
x_large = [prob_large.binary_var(name=f"x{i}") for i in range(N_LARGE)]
lin_large, quad_large = {}, {}
for u, v in edges_large:
lin_large[f"x{u}"] = lin_large.get(f"x{u}", 0.0) - 1.0
lin_large[f"x{v}"] = lin_large.get(f"x{v}", 0.0) - 1.0
quad_large[(f"x{u}", f"x{v}")] = (
quad_large.get((f"x{u}", f"x{v}"), 0.0) + 2.0
)
prob_large.minimize(linear=lin_large, quadratic=quad_large)
converter_large = OptimizationProblemToQubo()
qubo_large = converter_large.convert(prob_large)
cost_op_large, offset_large = to_ising(qubo_large)
n_qubits_large = cost_op_large.num_qubits
print(
f"Cost operator: {n_qubits_large} qubits, {len(cost_op_large)} Pauli terms"
)
# ── Step 1b: QP relaxation (multi-start L-BFGS-B) ─────────────────────────
# Same multi-start approach as the small example. At 40 qubits the relaxed
# landscape has many more local minima, so 200 random starts are essential
# to find a low-energy warm-start point.
Q_large = qubo_large.objective.quadratic.to_array(symmetric=True)
mu_large = qubo_large.objective.linear.to_array()
def qp_obj_large(x):
return x @ Q_large @ x + mu_large @ x + qubo_large.objective.constant
bounds_large = [(0.0, 1.0)] * n_qubits_large
rng_qp = np.random.default_rng(42)
best_val_large, c_star_large = np.inf, None
for _ in range(200):
x0 = rng_qp.uniform(0.0, 1.0, n_qubits_large)
res = minimize(qp_obj_large, x0, method="L-BFGS-B", bounds=bounds_large)
if res.fun < best_val_large:
best_val_large, c_star_large = res.fun, res.x
# Regularize and convert to rotation angles (same formula as small example)
epsilon_large = 0.25
c_clipped_large = np.clip(c_star_large, epsilon_large, 1 - epsilon_large)
thetas_large = 2 * np.arcsin(np.sqrt(c_clipped_large))
print(
f"c* range: [{c_star_large.min():.3f}, {c_star_large.max():.3f}] "
f"theta range: [{thetas_large.min():.3f}, {thetas_large.max():.3f}] rad"
)
# Plot the distribution of c* values to see how much structure the relaxation
# extracted. Values near 0/1 mean confident assignments; values near 0.5 mean
# the classical solver was uncertain and quantum exploration is most needed there.
fig, ax = plt.subplots(figsize=(6, 3))
ax.hist(c_star_large, bins=20, color="steelblue", edgecolor="white")
ax.axvline(0.5, color="k", linestyle="--", label="Uniform prior (std QAOA)")
ax.set_xlabel(r"$c^*_i$")
ax.set_ylabel("Count")
ax.set_title(r"Distribution of warm-start values $c^*_i$ (40-node graph)")
ax.legend()
plt.tight_layout()
plt.show()
# ── Step 1c: Build WS-QAOA circuit ─────────────────────────────────────────
# Reuse build_ws_qaoa from the small-scale section unchanged; the helper
# scales automatically with n_qubits and the cost operator size.
p_large = 1
ws_qc_large, ws_gammas_large, ws_betas_large = build_ws_qaoa(
cost_op_large, p_large, n_qubits_large, thetas_large
)
ws_qc_large.measure_all()
# ── Step 2: Transpile to hardware-native gates ──────────────────────────
# generate_preset_pass_manager compiles the abstract circuit to th
# gate set of the backend and inserts SWAP gates wherever the cost Hamiltonian
# couples qubits that are not directly connected on the processor.
pm = generate_preset_pass_manager(optimization_level=3, backend=backend)
ws_isa_large = pm.run(ws_qc_large)
ecr_count = ws_isa_large.count_ops().get("ecr", 0)
print(
f"\nTranspiled circuit: 2Q depth={ws_isa_large.depth(lambda x: x.operation.num_qubits == 2)}"
)
ws_isa_large.draw("mpl", fold=-1)Output:
Graph: 40 nodes, 60 edges (3-regular)
Cost operator: 40 qubits, 60 Pauli terms
c* range: [0.000, 1.000] theta range: [1.047, 2.094] rad
Transpiled circuit: 2Q depth=94
# ── Classical baseline via simulated annealing ────────────────────
# Run SA before any hardware calls to get a strong classical reference cut
# value. SA is fast (seconds), needs no solver license, and reliably finds
# near-optimal solutions on 40-node graphs. We use sa_cut as the denominator
# for the approximation ratio instead of the looser QP upper bound.
#
# At each step we flip a random node and accept the move if it improves the
# cut, or with probability exp(delta/T) otherwise. Temperature T decays
# geometrically, allowing uphill moves early on to escape local minima.
def simulated_annealing_maxcut(
G, seed=0, T0=2.0, T_min=1e-4, alpha=0.995, n_steps=100_000
):
rng_sa = np.random.default_rng(seed)
n = G.number_of_nodes()
x = rng_sa.integers(0, 2, n)
best_x = x.copy()
best_cut = sum(1 for u, v in G.edges() if x[u] != x[v])
T = T0
for _ in range(n_steps):
i = rng_sa.integers(0, n)
delta = sum((-1 if x[i] != x[nb] else 1) for nb in G.neighbors(i))
if delta > 0 or rng_sa.random() < np.exp(delta / T):
x[i] ^= 1
cut = sum(1 for u, v in G.edges() if x[u] != x[v])
if cut > best_cut:
best_cut, best_x = cut, x.copy()
T = max(T * alpha, T_min)
return best_x, best_cut
sa_solution, sa_cut = simulated_annealing_maxcut(G_large)
print(f"Simulated annealing cut value: {sa_cut} (classical reference)")
# ── Step 3: Execution on hardware ───────────────────────────
# A Session reserves the backend so the COBYLA iterations and final sampling
# run back-to-back without re-queuing between jobs — important when the
# optimizer submits many short jobs sequentially. All jobs are tagged with
# "TUT_WSQAOA" for traceability in the IBM Quantum dashboard.
#
# EstimatorV2 with resilience_level=1 enables twirled readout error extinction
# (TREX), which corrects systematic measurement bit-flip errors without extra
# circuit overhead. 4096 shots per call balances estimation noise vs. job time.
estimator_options = EstimatorOptions()
estimator_options.resilience_level = 1
estimator_options.default_shots = 4096
estimator_options.environment.job_tags = ["TUT_WSQAOA"]
# Align the cost observable with the physical qubit layout chosen by the transpiler
cost_op_isa = cost_op_large.apply_layout(ws_isa_large.layout)
ws_param_order_isa = list(ws_isa_large.parameters)
ws_history_hw = []
with Session(backend=backend) as session:
estimator_hw = Estimator(mode=session, options=estimator_options)
def hw_cost_fn(params):
bound = ws_isa_large.assign_parameters(
dict(zip(ws_param_order_isa, params))
)
energy = (
estimator_hw.run([(bound, cost_op_isa)]).result()[0].data.evs.real
)
ws_history_hw.append(float(energy))
print(
f" iter {len(ws_history_hw):>3d} <H_C> = {energy:.4f}", end="\r"
)
return float(energy)
# Warm-start initialization: gamma=0 means the cost unitary is the identity on
# the first call, so COBYLA immediately evaluates the warm-start state itself —
# a much better starting signal than a random point.
ws_params0_hw = np.concatenate(
[np.zeros(p_large), np.full(p_large, np.pi / 4)]
)
ws_result_hw = minimize(
hw_cost_fn,
ws_params0_hw,
method="COBYLA",
options={"maxiter": 150, "rhobeg": 0.3},
)
print(
f"\nOptimization complete: energy={ws_result_hw.fun:.4f}, "
f"iterations={len(ws_history_hw)}"
)
# ── Step 3b: Sample the optimized circuit ──────────────────────────────────
# Use 8192 shots for the final sample to get a reliable mode estimate.
sampler_hw = Sampler(
mode=session,
options={"environment": {"job_tags": ["TUT_WSQAOA"]}},
)
ws_bound_hw = ws_isa_large.assign_parameters(
dict(zip(ws_param_order_isa, ws_result_hw.x))
)
counts_hw = (
sampler_hw.run([ws_bound_hw], shots=8192)
.result()[0]
.data.meas.get_counts()
)
best_bs_hw = max(counts_hw, key=counts_hw.get)
best_count = counts_hw[best_bs_hw]
total_shots = sum(counts_hw.values())
# Decode: Qiskit returns bitstrings with qubit 0 at the rightmost position,
# so reversing the string maps character index i to variable x_i.
cut_val_hw, s0_hw, s1_hw = evaluate_cut(best_bs_hw[::-1], G_large)
# Compare against simulated annealing.
# A ratio >= 1.0 means WS-QAOA matched or beat the classical SA solution.
# A ratio close to 1.0 (e.g. > 0.95) shows the quantum result is competitive.
approx_ratio_hw = cut_val_hw / sa_cut
print(
f"Most-probable bitstring frequency: {best_count}/{total_shots} "
f"({100*best_count/total_shots:.1f}%)"
)
print(
f"WS-QAOA cut: {cut_val_hw} | SA cut: {sa_cut} "
f"| Approximation ratio vs SA: {approx_ratio_hw:.4f}"
)
# Visualize both solutions side-by-side on the graph.
# Blue = partition S, orange = partition S-bar.
# Edges crossing between colors are the ones counted in the cut.
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
for ax, assignment, cut, title in [
(
axes[0],
list(sa_solution),
sa_cut,
f"Simulated Annealing (cut={sa_cut})",
),
(
axes[1],
[int(b) for b in best_bs_hw[::-1]],
cut_val_hw,
f"WS-QAOA hardware (cut={cut_val_hw})",
),
]:
colors = [
"skyblue" if assignment[i] == 0 else "salmon" for i in G_large.nodes()
]
nx.draw(
G_large,
pos_large,
with_labels=True,
node_color=colors,
node_size=400,
font_size=7,
ax=ax,
)
ax.set_title(title)
plt.suptitle("Max-Cut partitions: SA vs WS-QAOA", fontsize=13)
plt.tight_layout()
plt.show()
# ── Step 4: Convergence plot and summary ──────────────────────────────────
# On real hardware the trace will be noisy (shot noise + gate errors), but the
# overall downward trend confirms that COBYLA is making progress despite noise.
fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(ws_history_hw, color="tab:orange", label="WS-QAOA (hardware)")
ax.axhline(
ws_result_hw.fun,
color="tab:orange",
linestyle=":",
label=f"Final energy ({ws_result_hw.fun:.3f})",
)
ax.set_xlabel("Optimizer call")
ax.set_ylabel(r"$\langle H_C \rangle$")
ax.set_title(f"WS-QAOA convergence on {backend.name} (40 qubits, p=1)")
ax.legend()
plt.tight_layout()
plt.show()
print("\n=== Large Scale Summary ===")
print(f"{'Metric':<38} {'Value':>10}")
print("-" * 50)
print(f"{'Nodes / Edges':<38} {N_LARGE:>5} / {len(edges_large):<4}")
print(f"{'QAOA layers (p)':<38} {p_large:>10}")
print(f"{'Transpiled ECR gate count':<38} {ecr_count:>10}")
print(f"{'Transpiled circuit depth':<38} {ws_isa_large.depth():>10}")
print(f"{'Optimizer iterations':<38} {len(ws_history_hw):>10}")
print(f"{'WS-QAOA energy (hardware)':<38} {ws_result_hw.fun:>10.4f}")
print(f"{'Cut value':<38} {cut_val_hw:>10}")
print(f"{'Simulated annealing cut value':<38} {sa_cut:>10}")
print(f"{'Approximation ratio (vs SA)':<38} {approx_ratio_hw:>10.4f}")Output:
Simulated annealing cut value: 53 (classical reference)
iter 30 <H_C> = -12.8021
Optimisation complete: energy=-12.8904, iterations=30
Most-probable bitstring frequency: 6/8192 (0.1%)
WS-QAOA cut: 53 | SA cut: 53 | Approximation ratio vs SA: 1.0000
=== Large Scale Summary ===
Metric Value
--------------------------------------------------
Nodes / Edges 40 / 60
QAOA layers (p) 1
Transpiled ECR gate count 0
Transpiled circuit depth 283
Optimizer iterations 30
WS-QAOA energy (hardware) -12.8904
Cut value 53
Simulated annealing cut value 53
Approximation ratio (vs SA) 1.0000
What to look into next
If you found this work interesting you may be interested in the following material:
- Higher QAOA layers: Increase
pto see how both algorithms improve with more circuit layers, and whether the WS-QAOA advantage at low depth persists. - Qiskit addon optimization mapper: Explore the documentation and try modeling different combinatorial problems, or using different solvers for the continuous relaxation.
References
[1] D. J. Egger, J. Mareček, and S. Woerner, "Warm-starting quantum optimization," Quantum, vol. 5, p. 479, 2021. arXiv:2009.10095
[2] E. Farhi, J. Goldstone, and S. Gutmann, "A quantum approximate optimization algorithm," arXiv:1411.4028, 2014.
© IBM Corp. 2026