[2026] MongoDB Schema Design: Embedded vs Referenced Documents | Modeling Complete Guide
Key takeaways
Choose embedded vs referenced collections using document size, read/write patterns, 1:N growth, and consistency—plus buckets, partial updates, and the 16MB limit.
Introduction
MongoDB is “schemaless,” but production services benefit from an explicit schema. Embedded vs referenced design is the core fork: store child data inside the parent document (embedded) or in another collection linked by id (referenced). Embedded documents can end reads in one round trip; references control growth and reuse of documents. This article replaces gut feel with measurable criteria: size, frequency, consistency.
After reading this post
- You can apply a checklist for embedded vs referenced designs
- You get hints for 1:N, N:M, i18n/versioning variants
- You understand the 16MB document limit and index interactions
Table of contents
- Concepts
- Hands-on implementation
- Advanced usage
- Performance comparison
- Real-world cases
- Troubleshooting
- Conclusion
Concepts
Embedded
Related data lives in one BSON document as nested arrays or sub-documents. The JavaScript example below shows the shape of such a document.
// User + addresses (embedded example)
{
_id: ObjectId("..."),
name: "Kim",
addresses: [
{ label: "home", city: "Seoul", zip: "03000" },
{ label: "work", city: "Seongnam", zip: "13000" }
]
}
Pros: If parent and children are almost always read together, one query finishes the job. A write to a single document is atomic, so parent and children can change together without a multi-document transaction.
Referenced
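Separate collections linked by ObjectId (or similar). The short JavaScript example below shows one document from each collection.
// users (ids shortened for readability; real ObjectIds are 24 hex characters)
{ _id: ObjectId("u1"), name: "Kim" }
// addresses: userId points back to the owning user
{ _id: ObjectId("a1"), userId: ObjectId("u1"), city: "Seoul" }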
Pros: Child collections can grow large without bloating the parent. Fits models where many parents share the same child.
BSON document size
A single document is capped at 16MB. Data that could grow without bound should start in a separate collection with supporting indexes.
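One way to turn the limit into a planning number is to estimate how many children fit before a document gets close to it. The sketch below is a rough approximation under stated assumptions: it uses JSON byte length as a stand-in for BSON size (the two encodings differ), and the helper names and the 50% safety factor are illustrative choices, not MongoDB APIs.

```javascript
const MAX_BSON_BYTES = 16 * 1024 * 1024; // MongoDB's maximum BSON document size

// Rough size estimate: JSON byte length is only a proxy for BSON size.
function approxBytes(value) {
  return Buffer.byteLength(JSON.stringify(value), "utf8");
}

// Estimates how many child elements fit inside a share of the size limit.
// safetyFactor is an assumed headroom setting (0.5 = use at most half the limit).
function maxEmbeddedChildren(parentWithoutChildren, sampleChild, safetyFactor = 0.5) {
  const budget = MAX_BSON_BYTES * safetyFactor - approxBytes(parentWithoutChildren);
  return Math.max(0, Math.floor(budget / approxBytes(sampleChild)));
}

const parent = { name: "Kim" };
const address = { label: "home", city: "Seoul", zip: "03000" };
const capacity = maxEmbeddedChildren(parent, address);
console.log(`roughly ${capacity} addresses fit in half the limit`);
```

For exact numbers on stored data, the `$bsonSize` aggregation operator reports the real BSON size of a document. If the expected child count is anywhere near the estimate, that is a signal to reference rather than embed.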
Hands-on implementation
Decision checklist
| Question | Favors embedded | Favors referenced |
|---|---|---|
| Read together? | Almost always | Rarely or partial reads |
| Updated together? | Often at once | Frequently independent |
| Bounded child count? | Small (e.g. a few addresses) | Thousands/unbounded |
| Shared child across parents? | Rarely | Often |
| Strong cross-doc integrity? | If it fits one document | Multiple collections + app logic/transactions |
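The checklist can be written down as a small function so that design reviews record the answers explicitly. This is a minimal sketch: `recommendModel`, the majority threshold of 3, and treating unbounded growth as a hard stop are all assumptions made for illustration, not rules from MongoDB.

```javascript
// Hypothetical helper: each answer mirrors one row of the checklist.
function recommendModel(answers) {
  const votes = [
    answers.readTogether,         // almost always read with the parent
    answers.updatedTogether,      // usually written in the same operation
    answers.boundedChildCount,    // small, known upper bound
    !answers.sharedAcrossParents, // children belong to a single parent
    answers.fitsOneDocument,      // integrity needs fit inside one document
  ];
  const embeddedVotes = votes.filter(Boolean).length;
  // Assumption: unbounded growth overrides everything else because of the 16MB limit.
  if (!answers.boundedChildCount) return { model: "referenced", embeddedVotes };
  return { model: embeddedVotes >= 3 ? "embedded" : "referenced", embeddedVotes };
}

const userAddresses = recommendModel({
  readTogether: true,
  updatedTogether: true,
  boundedChildCount: true,
  sharedAcrossParents: false,
  fitsOneDocument: true,
});

const postComments = recommendModel({
  readTogether: true,
  updatedTogether: false,
  boundedChildCount: false,
  sharedAcrossParents: false,
  fitsOneDocument: false,
});

console.log(userAddresses.model, postComments.model);
```

The score is only a conversation starter. Real decisions should still be checked against measured document sizes and access patterns.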
Pattern 1: 1:few — embedded
If comments are always shown with a post and counts are limited (or page-sized), consider embedding.
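One way to keep an embedded comments array page-sized is to cap it on every write with `$push`, `$each`, and `$slice`. The sketch below only builds the update document. The cap of 50, the `comments` field name, and the helper name are assumptions for illustration.

```javascript
// Hypothetical cap; pick a value that matches one page of comments.
const MAX_EMBEDDED_COMMENTS = 50;

// Builds the update document that appends a comment and keeps only the newest `cap`.
function buildCappedCommentPush(comment, cap = MAX_EMBEDDED_COMMENTS) {
  return {
    $push: {
      comments: {
        $each: [comment], // $slice is only valid together with $each
        $slice: -cap,     // a negative value keeps the last `cap` elements
      },
    },
  };
}

const update = buildCappedCommentPush({
  author: "Kim",
  text: "Nice post",
  createdAt: new Date(),
});

// With mongosh or a connected driver (postId is a placeholder):
// db.posts.updateOne({ _id: postId }, update)
console.log(JSON.stringify(update));
```

Note that `$slice` discards the oldest elements once the cap is reached. If older comments must remain available, they would also need to be written to a separate collection.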
Pattern 2: 1:many (large) — referenced + indexes
The JavaScript example below creates the supporting indexes and reads the 50 newest comments for a post.
// posts collection
db.posts.createIndex({ slug: 1 }, { unique: true });
// comments collection
db.comments.createIndex({ postId: 1, createdAt: -1 });
db.comments.find({ postId: postObjectId }).sort({ createdAt: -1 }).limit(50);
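Beyond the first page, the same compound index can serve range-based (keyset) pagination instead of `skip`. The sketch below builds the `find` arguments only. The helper name and the string `postId` placeholder are assumptions, and a real implementation would pass an ObjectId.

```javascript
const PAGE_SIZE = 50;

// Builds find() arguments for one page of comments, newest first.
// `before` is the createdAt of the last comment on the previous page
// (omit it for the first page).
function commentsPageQuery(postId, before) {
  const filter = { postId };
  if (before) filter.createdAt = { $lt: before };
  return { filter, sort: { createdAt: -1 }, limit: PAGE_SIZE };
}

const first = commentsPageQuery("post-1");
const next = commentsPageQuery("post-1", new Date("2026-03-30T00:00:00Z"));

// With mongosh or a connected driver:
// db.comments.find(next.filter).sort(next.sort).limit(next.limit)
console.log(JSON.stringify(first), JSON.stringify(next));
```

If several comments can share the same `createdAt`, a tie-breaker such as `_id` would be needed in both the index and the filter to avoid skipping rows.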
Pattern 3: Middle ground — buckets
For time-series logs, group readings per day to reduce document count. The JavaScript example below shows one such bucket document.
{
sensorId: "s1",
day: ISODate("2026-03-30T00:00:00Z"),
readings: [
{ t: ISODate("2026-03-30T00:05:00Z"), v: 23.1 },
// ....batched for the day (design an upper bound)
]
}
Embedded vs referenced often lands on embedded-but-bucketed compromises.
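Writing into buckets is commonly done with a single upsert that targets a bucket with free capacity. The sketch below assumes an extra `count` field that is not part of the document shown above, plus a hypothetical bound of 1000 readings per bucket. It builds the arguments only and does not talk to a server.

```javascript
const MAX_READINGS_PER_BUCKET = 1000; // hypothetical upper bound

// Truncates a timestamp to midnight UTC, matching the `day` field of the bucket.
function startOfUtcDay(t) {
  return new Date(Date.UTC(t.getUTCFullYear(), t.getUTCMonth(), t.getUTCDate()));
}

// Builds updateOne() arguments that append one reading to a bucket with room.
function bucketUpsert(sensorId, t, v) {
  return {
    // Matches only a bucket for this sensor and day that is not yet full.
    filter: {
      sensorId,
      day: startOfUtcDay(t),
      count: { $lt: MAX_READINGS_PER_BUCKET },
    },
    // `count` (assumed field) tracks the array length so the filter stays cheap.
    update: { $push: { readings: { t, v } }, $inc: { count: 1 } },
    // When no matching bucket exists (none yet, or all full), a new one is created.
    options: { upsert: true },
  };
}

const op = bucketUpsert("s1", new Date("2026-03-30T00:05:00Z"), 23.1);
// With mongosh or a connected driver:
// db.readings.updateOne(op.filter, op.update, op.options)
console.log(op.filter.day.toISOString());
```

Because a full bucket causes a second document for the same sensor and day, this design cannot use a unique index on `{ sensorId, day }`. A non-unique index on those fields still supports the lookup.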
$lookup (join)
For occasional joins on referenced models, aggregation $lookup works—but for hot paths, prefer two reads in application code or caching.
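For reference, an occasional join over the posts and comments collections might look like the pipeline below. This is a sketch that assumes the field names used earlier (`slug`, `postId`). The function name is illustrative.

```javascript
// Builds an aggregation pipeline: one post by slug plus all of its comments.
function postWithCommentsPipeline(slug) {
  return [
    { $match: { slug } },
    {
      $lookup: {
        from: "comments",       // collection to join
        localField: "_id",      // posts._id
        foreignField: "postId", // comments.postId
        as: "comments",         // output array field on each post
      },
    },
  ];
}

const pipeline = postWithCommentsPipeline("hello-world");
// With mongosh or a connected driver:
// db.posts.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

This form pulls every matching comment into the joined result document, so it suits posts with a modest number of comments. For busy posts, reading the post first and then a limited, sorted page of comments keeps the result size predictable.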
Advanced usage
- i18n fields: { title: { ko: "...", en: "..." } } vs locale-split documents; match your translation workflow.
- Partial updates: Use array filters and positional operators to reduce write conflicts on embedded arrays.
- Schema validation: Enforce required fields/types with validator and JSON Schema for operational safety.
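The partial-update and validation points can be sketched against the user and addresses document from the Concepts section. The helper name, the `addr` identifier, and the `maxItems` bound of 10 are assumptions chosen for illustration.

```javascript
// Builds updateOne() arguments that change the city of one labeled address
// and leave every other array element untouched.
function setAddressCity(userId, label, city) {
  return {
    filter: { _id: userId },
    update: { $set: { "addresses.$[addr].city": city } },
    // `addr` binds the $[addr] placeholder to elements whose label matches.
    options: { arrayFilters: [{ "addr.label": label }] },
  };
}

// $jsonSchema validator for a users collection; maxItems is an assumed bound
// that also keeps the embedded array from growing without limit.
const userValidator = {
  $jsonSchema: {
    bsonType: "object",
    required: ["name"],
    properties: {
      name: { bsonType: "string" },
      addresses: {
        bsonType: "array",
        maxItems: 10,
        items: {
          bsonType: "object",
          required: ["label", "city"],
          properties: {
            label: { bsonType: "string" },
            city: { bsonType: "string" },
            zip: { bsonType: "string" },
          },
        },
      },
    },
  },
};

const op = setAddressCity("u1", "work", "Suwon");
// With mongosh or a connected driver:
// db.users.updateOne(op.filter, op.update, op.options)
// db.createCollection("users", { validator: userValidator })
console.log(JSON.stringify(op));
```

Targeting one element with `$set` avoids rewriting the whole array from the application, which is where lost updates between concurrent writers tend to come from.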
Performance comparison
| Aspect | Embedded | Referenced |
|---|---|---|
| Read latency (common path) | Single find | find + find or $lookup |
| Write contention | Hotspot if many writers to one doc | Can spread writes |
| Indexes | Nested field indexes—more complex | Per-collection, often simpler |
| Consistency | Single-doc atomic updates | Application logic + transactions |
Real-world cases
- E-commerce orders: Header + dozens of line items often embedded; product master referenced.
- Social feeds: Post referenced; like counts denormalized with eventual consistency via events.
- B2B multi-tenant: Include tenantId in every query with a leading index.
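For the multi-tenant case, one common safeguard is a small wrapper that every query passes through, so that no filter can be built without a tenant. The sketch below is a hypothetical helper, and the `orders` collection and `createdAt` index key are assumptions.

```javascript
// Hypothetical guard: refuses to build a filter without a tenant.
function withTenant(tenantId, filter = {}) {
  if (!tenantId) throw new Error("tenantId is required");
  // tenantId is spread last so a caller-supplied filter cannot override it.
  return { ...filter, tenantId };
}

const scoped = withTenant("t1", { status: "paid", tenantId: "other" });

// Index sketch with tenantId as the leading key:
// db.orders.createIndex({ tenantId: 1, createdAt: -1 })
// db.orders.find(scoped).sort({ createdAt: -1 })
console.log(JSON.stringify(scoped));
```

Placing `tenantId` first in the compound index means each tenant's documents sit together in the index, so a tenant-scoped query scans only that tenant's range.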
Troubleshooting
| Symptom | Cause / fix |
|---|---|
| Document near 16MB | Split arrays, archive collections, buckets |
| Slow update on large embedded arrays | Split documents or move to references |
| Broken referential integrity | App validation + transactions if needed; explicit delete policies (soft delete) |
| Index used but still slow | Working set size, projection minimization, covering indexes |
Conclusion
MongoDB embedded vs referenced design is not “anything goes because NoSQL”; it means fixing read patterns and growth boundaries in the schema and the code that uses it. Compare with relational transaction/join trade-offs using the PostgreSQL vs MySQL guide, apply this checklist first, then validate with load tests.