[2026] Cache-Friendly C++: Data-Oriented Design Complete Guide

March 12, 2026 · 18 min read · Updated March 12, 2026 · Advanced concept

Key points of this article

Data-oriented design for C++ performance: AoS vs SoA, cache lines, false sharing, alignment, benchmarks, and production patterns.

Introduction: cache decides throughput

Many workloads on modern CPUs are memory bound: the cores spend more time waiting on cache misses than computing. Data-oriented design (DoD) lays out data for sequential access and SIMD. Structure-of-arrays (SoA) often beats array-of-structures (AoS) when loops touch only a few fields of many objects. False sharing kills parallel scaling unless you pad or align per-thread counters to separate cache lines (~64 bytes). This article covers DoD, cache lines, `alignas`, false sharing, scenarios, AoS→SoA examples, pitfalls, benchmarks, and engine/simulation patterns.

Why cache optimization matters
Data-oriented design
Cache lines & alignment
False sharing & padding
Complete examples
Common mistakes
Benchmarks
Production patterns
Summary

1. Why cache optimization matters

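100k entities in AoS: a loop that updates only position still pulls velocity, color, and id into cache, which wastes memory bandwidth.
More threads run slower: adjacent per-thread counters share a cache line (false sharing).
The loop vectorizes poorly or not at all: in AoS, consecutive x values sit a full struct stride apart instead of being contiguous.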

2. Data-oriented design

The Mermaid diagram below contrasts the AoS and SoA layouts for a position-only loop. Note the role of each part as you read it.

flowchart TB
    subgraph AoS[AoS]
        E1["Entity0: pos, vel, id"]
        E2["Entity1: ..."]
    end
    subgraph SoA[SoA]
        X["x[]"]
        Y["y[]"]
        Z["z[]"]
    end
    AoS -->|position-only loop| Waste["Loads unused fields"]
    SoA -->|position-only loop| Hit["Sequential x,y,z"]

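Rule of thumb: with thousands of entities or more, hot loops that touch only a few fields, or a need for SIMD, prefer SoA. For small counts (roughly under 100 to 1,000 entities), the simpler AoS layout is often good enough.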
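
To make the layout difference concrete, here is a minimal sketch of the same position-only loop over both layouts. The type and function names (`EntityAoS`, `EntitiesSoA`, `translate_aos`, `translate_soa`) and the exact field set are illustrative assumptions, not code from this series.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// AoS: all fields of one entity are interleaved in memory.
// Hypothetical layout; 28 bytes per entity on typical platforms.
struct EntityAoS {
    float x, y, z;     // position
    float vx, vy, vz;  // velocity
    std::uint32_t id;
};

// SoA: one contiguous array per field.
struct EntitiesSoA {
    std::vector<float> x, y, z;
    std::vector<float> vx, vy, vz;
    std::vector<std::uint32_t> id;

    explicit EntitiesSoA(std::size_t n)
        : x(n), y(n), z(n), vx(n), vy(n), vz(n), id(n) {}

    std::size_t size() const { return x.size(); }
};

// AoS loop: only 12 of every 28 bytes brought into cache are used.
inline void translate_aos(std::vector<EntityAoS>& es, float dx, float dy, float dz) {
    for (auto& e : es) {
        e.x += dx;
        e.y += dy;
        e.z += dz;
    }
}

// SoA loop: touches only the three position arrays, each read sequentially.
inline void translate_soa(EntitiesSoA& es, float dx, float dy, float dz) {
    const std::size_t n = es.size();
    assert(es.y.size() == n && es.z.size() == n);
    float* x = es.x.data();
    float* y = es.y.data();
    float* z = es.z.data();
    for (std::size_t i = 0; i < n; ++i) {
        x[i] += dx;
        y[i] += dy;
        z[i] += dz;
    }
}
```

Both functions compute the same result. The SoA version gives the compiler contiguous `float` arrays, which is the access pattern auto-vectorizers handle best, but whether it is actually faster on your data is something to measure.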

3. Cache lines & alignment

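Cache lines are typically 64 bytes. Apply `alignas(64)` to hot atomics and counters so that each one sits on its own line. Where the standard library provides it, prefer `std::hardware_destructive_interference_size` (C++17, declared in `<new>`) over a hard-coded 64.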
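
A minimal sketch of a line-aligned counter follows. It uses the standard constant when the library's feature-test macro says it exists and otherwise falls back to 64, which is an assumption about the target CPU. The names `kCacheLine`, `PaddedCounter`, `on_separate_lines`, and `bump` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Use the C++17 constant if the standard library exposes it.
#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t kCacheLine = std::hardware_destructive_interference_size;
#else
inline constexpr std::size_t kCacheLine = 64;  // assumption: 64-byte lines
#endif

// alignas rounds sizeof up to the alignment, so each counter owns a full line.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

static_assert(alignof(PaddedCounter) == kCacheLine);
static_assert(sizeof(PaddedCounter) % kCacheLine == 0);

// True when the two addresses fall on different cache lines.
inline bool on_separate_lines(const void* a, const void* b) {
    const auto la = reinterpret_cast<std::uintptr_t>(a) / kCacheLine;
    const auto lb = reinterpret_cast<std::uintptr_t>(b) / kCacheLine;
    return la != lb;
}

// Increment with a debug-time check that the counter really is line-aligned.
inline void bump(PaddedCounter& c) {
    assert(reinterpret_cast<std::uintptr_t>(&c) % kCacheLine == 0);
    c.value.fetch_add(1, std::memory_order_relaxed);
}
```

The `static_assert`s turn the layout assumption into a compile-time check, so a later change to the struct cannot silently put two counters on the same line.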

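4. False sharing & padding

Independent variables that share a cache line invalidate each other across cores: every write by one core forces the other cores to reload the line, even though no data is logically shared. Fix this with line-sized padding or per-thread shards.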
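
The per-thread shard approach can be sketched as follows. Each thread increments only its own line-aligned slot, and the slots are summed after the threads join. `Shard`, `count_sharded`, and the hard-coded 64-byte line size are assumptions for illustration. On Linux this typically needs `-pthread` when linking.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

constexpr std::size_t kLine = 64;  // assumption: 64-byte cache lines

// One counter per thread, each on its own cache line.
struct alignas(kLine) Shard {
    std::atomic<std::uint64_t> n{0};
};

// Each of `threads` workers adds `per_thread` to its own shard;
// the total is computed once, after all workers have joined.
inline std::uint64_t count_sharded(unsigned threads, std::uint64_t per_thread) {
    assert(threads > 0);
    // Since C++17, std::allocator honors the over-alignment of Shard.
    std::vector<Shard> shards(threads);
    std::vector<std::thread> pool;
    pool.reserve(threads);
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&shards, t, per_thread] {
            for (std::uint64_t i = 0; i < per_thread; ++i) {
                shards[t].n.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();

    std::uint64_t total = 0;
    for (const auto& s : shards) total += s.n.load(std::memory_order_relaxed);
    return total;
}
```

Removing `alignas(kLine)` keeps the result identical but packs the counters eight to a line, which is the configuration where adding threads can make the loop slower.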

5. Complete examples

This series also walks through a full particle AoS vs SoA benchmark and padded atomic counters for parallel increments—adapt the code and comments to your codebase.

6. Common mistakes

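SoA index mismatch after partial deletes: removing an element from only some arrays shifts their indices out of sync. Use swap-with-last across all arrays.
Over-padding everything: only hot, frequently written fields need isolation on their own line.
SoA with random indices loses locality: sort or pack the active entities so the hot loop stays sequential.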
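
The swap-with-last removal mentioned above can be sketched like this. The `Particles` type and its fields are hypothetical; the point is that every array receives the same operation so index `i` keeps referring to one entity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical SoA container with three parallel arrays.
struct Particles {
    std::vector<float> x, y;
    std::vector<std::uint32_t> id;

    std::size_t size() const { return x.size(); }

    void add(float px, float py, std::uint32_t pid) {
        x.push_back(px);
        y.push_back(py);
        id.push_back(pid);
    }

    // O(1) removal: copy the last element into slot i in every array,
    // then shrink every array by one. Element order is not preserved.
    void remove(std::size_t i) {
        assert(i < size());
        const std::size_t last = size() - 1;
        x[i] = x[last];
        y[i] = y[last];
        id[i] = id[last];
        x.pop_back();
        y.pop_back();
        id.pop_back();
    }
};
```

Because removal reorders elements, any external handle that stores a raw index becomes stale. Code that needs stable references usually keeps a separate id-to-index map, which this sketch leaves out.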

7. Benchmarks

Use `perf stat -e cache-misses,cache-references` and Release (`-O3 -march=native`) builds.
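
For a quick in-process comparison before reaching for `perf`, a small timing harness is often enough. The sketch below is an assumption-laden example, not the benchmark from this series: `best_ms`, `Wide`, `sum_x_aos`, and `sum_x_soa` are hypothetical names, and `Wide` is deliberately padded to 64 bytes so that each `x` lives on its own cache line.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Run fn `reps` times and return the fastest wall-clock time in milliseconds.
// Taking the minimum reduces noise from scheduling and cold caches.
template <class F>
double best_ms(F&& fn, int reps = 5) {
    assert(reps > 0);
    double best = 1e300;
    for (int r = 0; r < reps; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        fn();
        const auto t1 = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(t1 - t0).count();
        if (ms < best) best = ms;
    }
    return best;
}

// AoS element padded to one full 64-byte line (worst case for a one-field loop).
struct Wide {
    float x;
    float pad[15];
};
static_assert(sizeof(Wide) == 64);

inline float sum_x_aos(const std::vector<Wide>& v) {
    float s = 0.0f;
    for (const auto& e : v) s += e.x;  // one useful float per cache line
    return s;
}

inline float sum_x_soa(const std::vector<float>& x) {
    float s = 0.0f;
    for (float f : x) s += f;  // sixteen useful floats per cache line
    return s;
}
```

Store each sum in a variable that is printed or otherwise used, so the optimizer cannot delete the loop. Then build with the flags above and run the resulting binary under `perf stat -e cache-misses,cache-references` to see whether a timing gap matches a gap in cache misses.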

8. Production patterns

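Patterns that recur in engines and simulations: ECS-style component arrays, batch processing, blocked (tiled) matrix multiply, and hybrid AoSoA blocks (small fixed-size SoA chunks stored in an array).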
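
As one example of these patterns, here is a minimal sketch of a blocked matrix multiply. It assumes square row-major matrices stored in `std::vector<float>`. The function name `matmul_blocked` and the default tile size of 64 are illustrative choices, and a good tile size depends on your cache sizes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Computes c += a * b for row-major n x n matrices, one tile at a time,
// so the tiles of a, b, and c being worked on stay resident in cache.
inline void matmul_blocked(const std::vector<float>& a,
                           const std::vector<float>& b,
                           std::vector<float>& c,
                           std::size_t n,
                           std::size_t block = 64) {
    assert(block > 0);
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);

    for (std::size_t ii = 0; ii < n; ii += block) {
        const std::size_t i_end = std::min(ii + block, n);
        for (std::size_t kk = 0; kk < n; kk += block) {
            const std::size_t k_end = std::min(kk + block, n);
            for (std::size_t jj = 0; jj < n; jj += block) {
                const std::size_t j_end = std::min(jj + block, n);
                // i-k-j order: the innermost loop walks rows of b and c
                // sequentially, which suits row-major storage.
                for (std::size_t i = ii; i < i_end; ++i) {
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const float aik = a[i * n + k];
                        for (std::size_t j = jj; j < j_end; ++j) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

The function accumulates into `c`, so zero it first when you want a plain product. The `std::min` bounds handle matrix sizes that are not a multiple of the tile size.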

9. Summary

Topic	Takeaway
DoD	Prefer SoA when loops are field-specific
Cache line	64B, `alignas` for sharing avoidance
False sharing	pad or shard counters
Production	measure with `perf`, profile hot loops

Keywords

data-oriented design, cache optimization, AoS SoA, false sharing, cache line, SIMD

Next: Custom allocators & pmr (#39-2)
Previous: PIMPL & ABI (#38-3)