[2026] Asio Deadlock Debugging: Async Callbacks, Locks, and Strands [#49-3]

Key takeaways

Hidden deadlocks in Boost.Asio: mutex + condition_variable with async completion, lock ordering, and fixes with strands, std::lock, and thread dumps.

Introduction: “The Asio server sometimes hangs”

Asio servers can deadlock when a thread blocks waiting for an async operation's completion while holding something the completion handler needs: a mutex, or the io_context thread itself. With multi-threaded io_context::run() the bug is timing-dependent and "hidden": it only fires when every run() thread happens to block at once. Topics:

  • Pattern: lock → async op → cv.wait while the completion handler needs the lock (or the blocked run() thread)
  • Different lock orders across threads
  • Strands, minimal lock scope, uniform ordering
  • gdb thread apply all bt, logging, TSan. See also: Multithreaded Asio, Strand.

Scenarios (short)

  • Chat server: sync wait for async_write completion under a lock.
  • Session pool: lock A then B vs B then A.
  • HTTP proxy: blocking wait for upstream under lock.
  • Async logging flush under lock.
  • Timer vs I/O callbacks taking locks in opposite order.

Pattern 1: Lock held while waiting for completion

The C++ sketch below shows the dangerous pattern: an async operation is scheduled, then synchronously awaited from inside an io_context thread.

// Dangerous (deadlock): on_send() is called from a handler running
// on an io_context thread
void on_send() {
    std::unique_lock<std::mutex> lock(mtx);
    boost::asio::async_write(socket, ..., [&](...) {
        std::lock_guard<std::mutex> lk(mtx);
        done = true;
        cv.notify_one();
    });
    // cv.wait releases mtx, but it parks this run() thread. If every
    // run() thread blocks here, nothing is left to execute the
    // completion handler, so done never becomes true.
    cv.wait(lock, [&] { return done; });
}

The failure sequence as a mermaid diagram:

sequenceDiagram
    participant T1 as io thread 1 (on_send)
    participant T2 as io thread 2 (on_send)
    participant Q as io_context
    T1->>Q: async_write scheduled
    T2->>Q: async_write scheduled
    T1->>T1: cv.wait parks this run() thread
    T2->>T2: cv.wait parks this run() thread
    Note over T1,T2: No run() thread left to execute the handlers: deadlock

Fix: Continue the work in the completion handler, or post follow-up work to a strand, instead of blocking. Never block an io_context thread waiting for a completion that can only run on one of those threads, and never hold a mutex the handler needs while blocking.

Pattern 2: Lock order inversion

Thread A locks mtx1 then mtx2; thread B locks mtx2 then mtx1 → cycle. Fix: a single global acquisition order for all mutexes, or std::scoped_lock (which uses the std::lock deadlock-avoidance algorithm) to acquire both atomically.

Solutions: strand, small critical sections, ordering

A per-connection strand serializes all handlers for that connection, so session state often needs no mutex at all.

boost::asio::async_write(socket_, buffer_,
    boost::asio::bind_executor(strand_,   // handler is serialized on strand_
        [self = shared_from_this()](auto ec, auto n) {
            self->on_write_done();        // session state touched only on the strand
        }));

Rules:

  • Do not wait for async completion while holding locks the handler needs.
  • If multiple mutexes: fixed order or std::scoped_lock.

Debugging

  • Hang + ~0% CPU → suspect deadlock.
  • Attach with gdb -p <pid>, then thread apply all bt full to dump every stack.
  • Look for pthread_mutex_lock, pthread_cond_wait, and cross-thread lock cycles.

gdb -p <pid> -batch -ex "thread apply all bt"

TSan (-fsanitize=thread) may report lock-order inversion.

Production patterns

  • Document lock hierarchy (e.g. session → cache → log).
  • Strand-first design for connection state.
  • Optional watchdog timer if progress stalls.
  • SIGUSR1 handler for stack dump (limited; use gdb for all threads).

Checklist

  • No cv.wait under mutex also taken in async handlers for the same operation.
  • Consistent lock order or std::scoped_lock
  • Per-session strand where possible
  • Small lock scope; no unknown callbacks under lock


FAQ

Q. When to use this?
A. Any multi-threaded Asio app that mixes mutexes, condition variables, and async I/O.

Q. Next reads?
A. Series index, strand and executor docs.

Summary

