[2026] Cache-Friendly C++: Data-Oriented Design Complete Guide
Key points
Data-oriented design for C++ performance: AoS vs SoA, cache lines, false sharing, alignment, benchmarks, and production patterns.
Introduction: cache decides throughput
On modern CPUs, hot loops are often memory bound: the core waits on cache misses rather than on arithmetic. Data-oriented design (DoD) lays out data so that hot loops access memory sequentially and can use SIMD. Structure-of-arrays (SoA) often beats array-of-structures (AoS) when a loop touches only a few fields of many objects. False sharing limits parallel scaling unless per-thread counters are padded or aligned onto separate cache lines (typically 64 bytes).
This article covers: DoD, cache lines, alignas, false sharing, scenarios, AoS→SoA examples, pitfalls, benchmarks, engine/simulation patterns.
Table of contents
- Why cache optimization matters
- Data-oriented design
- Cache lines & alignment
- False sharing & padding
- Complete examples
- Common mistakes
- Benchmarks
- Production patterns
- Summary
1. Why cache optimization matters
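- Wasted bandwidth: with 100k entities in AoS, a loop that updates only position still loads velocity, color, and id, because they share cache lines with the position fields.
- More threads run slower: adjacent per-thread counters share a cache line, which causes false sharing.
- SIMD won't vectorize: AoS scatters x across strides, so consecutive x values are not contiguous in memory.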
2. Data-oriented design
The Mermaid diagram below contrasts what a position-only loop touches in each layout.
flowchart TB
subgraph AoS[AoS]
E1["Entity0: pos, vel, id"]
E2["Entity1: ..."]
end
subgraph SoA[SoA]
X["x[]"]
Y["y[]"]
Z["z[]"]
end
AoS -->|position-only loop| Waste["Loads unused fields"]
SoA -->|position-only loop| Hit["Sequential x,y,z"]
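Rule of thumb: prefer SoA when you have thousands of entities or more, hot loops that each touch only a few fields, and code you want vectorized with SIMD. For small counts (below roughly 100–1000 entities), the simpler AoS layout may be the better choice.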
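The two layouts can be sketched as follows. This is a minimal illustration, not a benchmark: the type names `EntityAoS` and `EntitiesSoA`, the field set, and the helpers `to_soa` and `shift_x` are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// AoS: each entity's fields sit together, so a position-only loop
// still pulls velocity and id through the cache.
struct EntityAoS {
    float x, y, z;     // position
    float vx, vy, vz;  // velocity
    int id;
};

// SoA: one contiguous array per field; a loop over x touches only x[].
struct EntitiesSoA {
    std::vector<float> x, y, z;
    std::vector<float> vx, vy, vz;
    std::vector<int> id;

    explicit EntitiesSoA(std::size_t n)
        : x(n), y(n), z(n), vx(n), vy(n), vz(n), id(n) {}

    std::size_t size() const { return id.size(); }
};

// Copy an AoS collection into the SoA layout, field by field.
EntitiesSoA to_soa(const std::vector<EntityAoS>& in) {
    EntitiesSoA out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out.x[i] = in[i].x;   out.y[i] = in[i].y;   out.z[i] = in[i].z;
        out.vx[i] = in[i].vx; out.vy[i] = in[i].vy; out.vz[i] = in[i].vz;
        out.id[i] = in[i].id;
    }
    // Invariant of the SoA layout: every field array has the same length.
    assert(out.x.size() == out.id.size());
    return out;
}

// AoS version: with 4-byte float and int, consecutive x values are
// 28 bytes apart, so most of each loaded cache line is unused here.
void shift_x(std::vector<EntityAoS>& entities, float dx) {
    for (EntityAoS& e : entities) e.x += dx;
}

// SoA version: unit-stride floats, a pattern compilers can usually
// auto-vectorize.
void shift_x(EntitiesSoA& entities, float dx) {
    for (std::size_t i = 0; i < entities.size(); ++i) entities.x[i] += dx;
}
```

Both `shift_x` overloads compute the same result; only the memory traffic differs. Whether the SoA version is actually faster for your data set is something to measure rather than assume.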
3. Cache lines & alignment
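Cache lines are typically 64 bytes. Apply alignas(64) to hot atomics and counters so that each one starts on its own line. Where the standard library provides it, prefer std::hardware_destructive_interference_size (C++17, declared in <new>) over a hard-coded 64.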
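A minimal sketch of line-aligned counters, assuming C++17. The names `kCacheLine`, `HotCounter`, `on_separate_lines`, and `check_separate_lines` are made up for this example, and the 64-byte fallback is an assumption that may not match your target.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Use the standard constant when the library provides it; otherwise fall
// back to 64 bytes (an assumption, check your target CPU).
#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t kCacheLine = std::hardware_destructive_interference_size;
#else
inline constexpr std::size_t kCacheLine = 64;
#endif

// alignas on the type also rounds sizeof up to a multiple of the alignment,
// so consecutive array elements start on different cache lines.
struct alignas(kCacheLine) HotCounter {
    std::atomic<std::uint64_t> value{0};
};

static_assert(alignof(HotCounter) == kCacheLine, "counter must be line-aligned");
static_assert(sizeof(HotCounter) % kCacheLine == 0, "counter must fill whole lines");

// True when two addresses fall on different cache lines.
inline bool on_separate_lines(const void* a, const void* b) {
    const auto ia = reinterpret_cast<std::uintptr_t>(a);
    const auto ib = reinterpret_cast<std::uintptr_t>(b);
    return ia / kCacheLine != ib / kCacheLine;
}

// Debug-build check that neighbouring counters never share a line.
inline void check_separate_lines(const HotCounter* counters, std::size_t n) {
    for (std::size_t i = 0; i + 1 < n; ++i) {
        assert(on_separate_lines(&counters[i], &counters[i + 1]));
    }
}
```

The `static_assert` lines document the layout guarantee at compile time, so a later change to the struct cannot silently reintroduce sharing.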
4. False sharing
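When threads on different cores write to independent variables that share a cache line, each write invalidates that line in the other cores' caches. The line bounces between cores even though no data is logically shared. Fix this with line-sized padding or per-thread shards.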
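The difference between packed and padded per-thread slots can be sketched like this. The names `PackedCounter`, `PaddedCounter`, and `count_in_parallel` are hypothetical, and `kLine = 64` is an assumed line size. Both variants return the same total; the padded one is expected to scale better, but that depends on the hardware, so time it on your own machine.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

constexpr std::size_t kLine = 64;  // assumed cache-line size

// Packed: neighbouring counters share a line, so writers on different
// cores keep invalidating each other's cached copy.
struct PackedCounter {
    std::atomic<std::uint64_t> value{0};
};

// Padded: alignas gives each counter a full line to itself.
struct alignas(kLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

// Each thread increments only its own slot; returns the sum of all slots.
template <typename Counter>
std::uint64_t count_in_parallel(std::size_t num_threads, std::uint64_t iters) {
    assert(num_threads > 0);
    std::vector<Counter> slots(num_threads);  // one shard per thread
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([&slots, t, iters] {
            for (std::uint64_t i = 0; i < iters; ++i) {
                slots[t].value.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (std::thread& w : workers) w.join();

    std::uint64_t total = 0;
    for (const Counter& c : slots) total += c.value.load(std::memory_order_relaxed);
    return total;
}
```

Calling `count_in_parallel<PackedCounter>(8, n)` and `count_in_parallel<PaddedCounter>(8, n)` with a large `n` and timing both is a quick way to see whether false sharing matters on a given machine. On some toolchains the program must be linked with `-pthread`.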
5. Complete examples
This series also walks through a full particle AoS vs SoA benchmark and padded atomic counters for parallel increments—adapt the code and comments to your codebase.
6. Common mistakes
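- SoA index mismatch after partial deletes: erasing from one array but not the others leaves the same index pointing at different entities. Use swap-with-last across all arrays.
- Over-padding everything: only hot fields written by different threads need their own cache line; padding cold or read-only data just wastes cache.
- SoA with random indices loses locality: sort or pack active entities so the hot loop walks memory sequentially.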
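The swap-with-last fix from the first bullet can be sketched as below. `Particles` and its fields are a hypothetical container for illustration; the point is that every array receives the same move and the same shrink.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical SoA container; the field names are illustrative.
struct Particles {
    std::vector<float> x, y;
    std::vector<int> id;

    std::size_t size() const { return id.size(); }

    void add(float px, float py, int pid) {
        x.push_back(px);
        y.push_back(py);
        id.push_back(pid);
    }

    // O(1) removal: copy the last element into slot i in every array,
    // then shrink every array by one. Element order is not preserved.
    void remove(std::size_t i) {
        assert(i < size());
        const std::size_t last = size() - 1;
        x[i] = x[last];
        y[i] = y[last];
        id[i] = id[last];
        x.pop_back();
        y.pop_back();
        id.pop_back();
    }
};
```

Because removal reorders elements, anything that stores a raw index into the container needs a stable handle instead, such as the `id` field here.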
7. Benchmarks
Benchmark Release builds (-O3 -march=native), and count cache events with perf stat -e cache-misses,cache-references.
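Alongside perf counters, a small in-process timer is handy for comparing two layouts. The helper below is a sketch, not a full benchmarking framework; `best_of_ms` is a name invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Runs `work` `reps` times and returns the fastest run in milliseconds.
// Taking the minimum filters out noise from scheduling and cold caches.
template <typename Work>
double best_of_ms(int reps, Work&& work) {
    assert(reps > 0);
    using clock = std::chrono::steady_clock;  // monotonic, safe for intervals
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < reps; ++r) {
        const auto start = clock::now();
        work();
        const std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
        best = std::min(best, elapsed.count());
    }
    return best;
}
```

Make sure the timed work produces a result that is used afterwards (for example, sum into a variable and print it); otherwise the optimizer may remove the loop being measured.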
8. Production patterns
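Common patterns in engines and simulations: ECS-style component arrays (one array per component type), batch processing of entities, blocked (tiled) matrix multiply, and hybrid AoSoA blocks that keep SIMD-width groups of fields together.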
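As one concrete instance of these patterns, here is a minimal sketch of a blocked matrix multiply. The function name `matmul_blocked`, the row-major square layout, and the default tile size of 64 are assumptions for illustration; a good tile size depends on your cache sizes and should be tuned by measurement.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Computes C += A * B for row-major n x n matrices, one tile at a time,
// so the tiles being reused are more likely to stay resident in cache.
void matmul_blocked(const std::vector<float>& a, const std::vector<float>& b,
                    std::vector<float>& c, std::size_t n, std::size_t block = 64) {
    assert(block > 0);
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t ii = 0; ii < n; ii += block) {
        for (std::size_t kk = 0; kk < n; kk += block) {
            for (std::size_t jj = 0; jj < n; jj += block) {
                // Clamp tile bounds so n need not be a multiple of block.
                const std::size_t i_end = std::min(ii + block, n);
                const std::size_t k_end = std::min(kk + block, n);
                const std::size_t j_end = std::min(jj + block, n);
                for (std::size_t i = ii; i < i_end; ++i) {
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const float aik = a[i * n + k];
                        // Innermost loop walks b and c with unit stride.
                        for (std::size_t j = jj; j < j_end; ++j) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

The function accumulates into `c`, so zero it first when a plain product is wanted. The i-k-j loop order inside each tile keeps the innermost accesses contiguous, which is the same sequential-access idea that motivates SoA.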
9. Summary
| Topic | Takeaway |
|---|---|
| DoD | Prefer SoA when loops are field-specific |
| Cache line | Typically 64 B; use alignas to avoid false sharing |
| False sharing | pad or shard counters |
| Production | measure with perf, profile hot loops |
Keywords
data-oriented design, cache optimization, AoS SoA, false sharing, cache line, SIMD
Next: Custom allocators & pmr (#39-2)
Previous: PIMPL & ABI (#38-3)