[2026] Asio Deadlock Debugging: Async Callbacks, Locks, and Strands [#49-3]

Key takeaways

Hidden deadlocks in Boost.Asio: mutex + condition_variable with async completion, lock ordering, and fixes with strands, std::lock, and thread dumps.

Introduction: “The Asio server sometimes hangs”

Asio servers can deadlock when a thread blocks waiting for an async operation's completion while holding something the completion handler needs: a mutex, or the io_context thread itself. With multi-threaded io_context::run() the bug is timing-dependent and "hidden": it only fires when every run() thread happens to block at once. Topics:

  • Pattern: lock → async op → cv.wait while the completion handler needs the lock (or the blocked run() thread)
  • Different lock orders across threads
  • Strands, minimal lock scope, uniform ordering
  • gdb thread apply all bt, logging, TSan. See also: Multithreaded Asio, Strand.

Scenarios (short)

  • Chat server: sync wait for async_write completion under a lock.
  • Session pool: lock A then B vs B then A.
  • HTTP proxy: blocking wait for upstream under lock.
  • Async logging flush under lock.
  • Timer vs I/O callbacks taking locks in opposite order.

Pattern 1: Lock held while waiting for completion

The C++ sketch below shows the dangerous pattern: an async operation is scheduled, then synchronously awaited from inside an io_context thread.

// Dangerous (deadlock): on_send() is called from a handler running
// on an io_context thread
void on_send() {
    std::unique_lock<std::mutex> lock(mtx);
    boost::asio::async_write(socket, ..., [&](...) {
        std::lock_guard<std::mutex> lk(mtx);
        done = true;
        cv.notify_one();
    });
    // cv.wait releases mtx, but it parks this run() thread. If every
    // run() thread blocks here, nothing is left to execute the
    // completion handler, so done never becomes true.
    cv.wait(lock, [&] { return done; });
}

The failure sequence as a mermaid diagram:

sequenceDiagram
    participant T1 as io thread 1 (on_send)
    participant T2 as io thread 2 (on_send)
    participant Q as io_context
    T1->>Q: async_write scheduled
    T2->>Q: async_write scheduled
    T1->>T1: cv.wait parks this run() thread
    T2->>T2: cv.wait parks this run() thread
    Note over T1,T2: No run() thread left to execute the handlers: deadlock

Fix: Continue the work in the completion handler, or post follow-up work to a strand, instead of blocking. Never block an io_context thread waiting for a completion that can only run on one of those threads, and never hold a mutex the handler needs while blocking.

Pattern 2: Lock order inversion

Thread A locks mtx1 then mtx2; thread B locks mtx2 then mtx1 → cycle. Fix: a single global acquisition order for all mutexes, or std::scoped_lock (which uses the std::lock deadlock-avoidance algorithm) to acquire both atomically.

Solutions: strand, small critical sections, ordering

A per-connection strand serializes all handlers for that connection, so session state often needs no mutex at all.

boost::asio::async_write(socket_, buffer_,
    boost::asio::bind_executor(strand_,   // handler is serialized on strand_
        [self = shared_from_this()](auto ec, auto n) {
            self->on_write_done();        // session state touched only on the strand
        }));

Rules:

  • Do not wait for async completion while holding locks the handler needs.
  • If multiple mutexes: fixed order or std::scoped_lock.

Debugging

  • Hang + ~0% CPU → suspect deadlock.
  • Attach with gdb -p <pid>, then thread apply all bt full to dump every stack.
  • Look for pthread_mutex_lock, pthread_cond_wait, and cross-thread lock cycles.

gdb -p <pid> -batch -ex "thread apply all bt"

TSan (-fsanitize=thread) may report lock-order inversion.

Production patterns

  • Document lock hierarchy (e.g. session → cache → log).
  • Strand-first design for connection state.
  • Optional watchdog timer if progress stalls.
  • SIGUSR1 handler for stack dump (limited; use gdb for all threads).

Checklist

  • No cv.wait under mutex also taken in async handlers for the same operation.
  • Consistent lock order or std::scoped_lock
  • Per-session strand where possible
  • Small lock scope; no unknown callbacks under lock


FAQ

Q. When to use this?
A. Any multi-threaded Asio app that mixes mutexes, condition variables, and async I/O.

Q. Next reads?
A. Series index, strand and executor docs.

Summary

