Courier routing, two ways — a greedy baseline vs a fleet of SA workers.
Same map, same depot, same euclidean costs. Pick a problem, then watch a no-optimization baseline lose to 8 parallel Simulated-Annealing "Mac minis," side by side.
Objective: Maximise net profit. Each stop pays a tip that decays with lateness; the courier picks WHICH stops to make and in what order.
Large graphs — coming soon
64–10,000 nodes with 256+ workers. In-browser Web Workers top out well before that on a laptop; this tier is reserved for a real compute backend (data-center fleet or a blockchain compute market).
Benchmark & Complexity
Wall-clock is recorded every run (the baseline is timed over repeated runs since one pass is sub-millisecond). With ≥2 graph sizes we fit ms ≈ a·nᵖ by log-log regression and name the growth regime. Run Small then Medium to populate both points. At these tiny sizes SA wall-clock is partly worker-startup overhead, so its fitted exponent reflects setup cost as much as algorithmic scaling.
Vanilla PCTSP (heuristic)
Need ≥2 timed runs.
| n | time |
|---|---|
| no runs yet | |
Parallel SA
Need ≥2 timed runs.
| n | time |
|---|---|
| no runs yet | |
What exact (exponential) search would have cost
| n | Held–Karp ops (2ⁿ·n²) | @1e9 ops/s |
|---|---|---|
| 16 | 1.7e+7 | 17 ms |
| 24 | 9.7e+9 | 9.7 s |
| 48 | 6.5e+17 | 20.6 years |
| 64 | 7.6e+22 | 2.4e+6 years |
| 100 | 1.3e+34 | 4.0e+17 years |
That column is why nobody solves Prize-Collecting TSP exactly past ~20 nodes. The parallel-SA wall-clock above lives in polynomial territory while landing near-optimal — and the ÷ cores term shrinks as you rent more compute (efficient data centers, or a public blockchain where contributors sell idle cycles).
How it works
Two objectives, one engine
Prize-Collecting: max net profit = Σ decaying tips − travel cost; the courier skips unprofitable stops. Shortest-Route: visit everyone, minimise the closed loop. Both feed the same fleet of SA workers — only the score function changes.
Why SA wins
The baseline commits to the locally-best next move and gets stuck. Simulated annealing accepts occasional worse moves (Metropolis), escaping local optima. Run many independent workers, keep the best — embarrassingly parallel.
Hardware reality (Apple M2)
Your 8-core M2 (4P+4E, 16 GB) runs 8 workers one-per-core. 32 workers oversubscribe 4:1 — the OS time-slices, so you get ~8× real throughput, not 32×, but each worker needs <1 MB so memory is a non-issue. Wall-clock stays sub-second.
The economic claim
SA quality scales with total iterations = workers × iters. If compute is cheap and parallel — efficient data centers or a public blockchain where contributors sell idle cycles — you push near-optimal at polynomial wall-clock, never paying the exponential price of exact search.