PostgreSQL Complete Guide | MVCC, VACUUM, Planner, B-tree/GiST/GIN & Production Patterns

The gist of this post

Understand PostgreSQL MVCC, VACUUM, and the cost-based planner, choose B-tree vs GiST vs GIN with confidence, and pair practical SQL tuning with partitioning, replication, backups, and production-grade operations.

What this post covers

PostgreSQL combines MVCC for row versioning, VACUUM for reclaiming dead row space (and related maintenance), and a cost-based optimizer that uses statistics to pick plans. Understanding these three pillars makes index choices and EXPLAIN output far easier to reason about.

This guide explains those internals at a practical depth, then covers B-tree, GiST, and GIN indexes, followed by query tuning, partitioning, replication, backups, and production patterns you can apply immediately.

From experience: On high-churn event pipelines, partitioning plus index redesign often yields large latency wins; pairing that with healthy autovacuum cadence and accurate statistics prevents write/read regressions under load.


1. MVCC: what PostgreSQL guarantees

PostgreSQL does not typically overwrite a row in place on update. Instead, it leaves old row versions in the heap until they can be cleaned up. A transaction snapshot defines which committed changes are visible, and tuple headers store the transaction metadata needed for visibility checks.

1.1 Tuple headers and transaction IDs

Each heap row version carries xmin (inserting transaction) and xmax (invalidating delete/update transaction when applicable). Thinking of updates as delete + insert at the storage level is a productive mental model for troubleshooting bloat and index churn.
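These hidden columns can be inspected directly, which makes the model concrete (the users table here stands in for any table you have):

SELECT xmin, xmax, ctid, id
FROM users
LIMIT 5;

A nonzero xmax marks a row version that some transaction has deleted or updated; ctid shows the version's physical location and changes when an update writes a new version.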

1.2 Snapshots and visibility (conceptual)

A snapshot encodes which transactions are considered committed for the statement/transaction. Readers usually pick one visible version per row according to the snapshot rules—so reads tend not to block writers the way coarse table locks would, but writer–writer contention remains.

1.3 HOT (Heap-Only Tuple) updates

When indexed columns do not change, PostgreSQL can sometimes chain new row versions without creating new index entries (HOT). Updates that change indexed expressions/columns are more likely to create new index entries and increase write amplification.
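To gauge how often updates actually take the HOT path, compare the counters in pg_stat_user_tables:

SELECT relname, n_tup_upd, n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 1) AS hot_pct
FROM pg_stat_user_tables
ORDER BY n_tup_upd DESC
LIMIT 10;

A low HOT percentage on an update-heavy table suggests the updates touch indexed columns, or that pages lack free space for the new version (a lower fillfactor can help with the latter).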

1.4 Operational symptoms

  • Long transactions delay vacuum effectiveness by widening the range of row versions that must remain reachable.
  • High update rates can cause both heap and index bloat if cleanup cannot keep pace.
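A quick way to spot the long transactions behind these symptoms is the standard pg_stat_activity view:

SELECT pid, state, now() - xact_start AS xact_age, left(query, 60) AS query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 10;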

2. VACUUM and storage management

2.1 Ordinary (lazy) VACUUM

A normal VACUUM marks space from dead tuples as reusable for future inserts/updates. It often does not shrink the on-disk file size immediately; disk usage graphs may look “stair-stepped” even when space is reusable inside the file.
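Dead-tuple counts and vacuum timestamps are visible per table, which makes the "reusable inside the file" behavior easy to confirm:

SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;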

2.2 VACUUM FULL vs ordinary VACUUM

VACUUM FULL can shrink files by rewriting the table, but it takes strong locks and is disruptive. Most services rely on autovacuum + tuning and only use VACUUM FULL during planned maintenance.

2.3 Freezing and transaction ID wraparound

Transaction IDs are finite. Old rows must be frozen so the cluster can continue safely. Autovacuum participates in freezing; if autovacuum stalls, you risk operational incidents around wraparound pressure.

2.4 Visibility map and Index Only Scans

The visibility map records which heap pages contain only tuples visible to all transactions, letting PostgreSQL skip heap visits for those pages. That matters for Index Only Scans: VACUUM is not only "cleanup"—by updating the visibility map it also affects whether index-only plans remain attractive.
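One way to check whether an index-only plan is really avoiding heap visits is the Heap Fetches line in EXPLAIN ANALYZE (assuming an index whose leading column is user_id, such as the composite index from section 4.1):

EXPLAIN ANALYZE
SELECT user_id FROM orders WHERE user_id BETWEEN 100 AND 200;

If the plan shows an Index Only Scan but Heap Fetches is high, recently modified pages are not yet marked all-visible; a VACUUM on the table typically brings the number down.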

2.5 Autovacuum tuning hints

For large, high-churn tables, consider per-table autovacuum settings or partitioning so that each vacuum/analyze pass covers a smaller, more predictable slice of the data.

ALTER TABLE events SET (
  autovacuum_vacuum_scale_factor = 0.05,
  autovacuum_analyze_scale_factor = 0.02
);

3. Query planner and statistics

3.1 Parse → analyze → rewrite → plan

SQL becomes an internal query tree; the planner chooses join order and access paths using statistics. The same SQL text can switch plans when data distribution changes—this is expected.

3.2 Cost model knobs

seq_page_cost, random_page_cost, and CPU-related costs are relative weights, not seconds. On SSDs with high cache hit rates, lowering random_page_cost is a common adjustment. effective_cache_size is a hint about OS cache, not a memory allocation.
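These knobs can be tried per session before committing to a cluster-wide change; a cautious experiment might look like:

-- Session-level only; nothing is persisted
SET random_page_cost = 1.1;
EXPLAIN SELECT * FROM orders WHERE user_id = 42;  -- re-check the plan you care about
RESET random_page_cost;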

3.3 Statistics: histograms, MCVs, dependencies

ANALYZE builds summaries of column value distributions. For correlated predicates, extended statistics can reduce misestimates.

CREATE STATISTICS stats_orders (dependencies)
ON user_id, status FROM orders;
ANALYZE orders;

3.4 What to read in EXPLAIN

  • Estimated vs actual rows: large gaps often mean stale stats or tricky predicates.
  • Buffers: helps separate CPU vs I/O effects when combined with ANALYZE.
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT ...

3.5 Join search space

With many relations, PostgreSQL may rely on GEQO (genetic optimization). Good stats, constraints, and indexing matter even more in those cases.


4. Index access methods: B-tree, GiST, GIN

4.1 B-tree (default)

The default choice for equality, range, ordering, and many join keys. Composite indexes are most effective when queries filter on the leading column(s); a predicate that touches only a trailing column usually cannot use the index efficiently.

CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
CREATE INDEX idx_active_users ON users(email) WHERE is_active = true;
CREATE INDEX idx_users_lower_email ON users(LOWER(email));

4.2 GiST (Generalized Search Tree)

Useful when keys are not simple scalar orderings—cases involving overlap, geometry, ranges, and certain exclusion constraints. It is a different indexing philosophy than B-tree ordering.
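A minimal sketch of both the overlap and exclusion-constraint uses, assuming a hypothetical reservations table with a tstzrange column (btree_gist is needed so the scalar room_id can participate in the GiST exclusion):

CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE TABLE reservations (
  room_id INTEGER NOT NULL,
  during  TSTZRANGE NOT NULL,
  -- Reject overlapping bookings for the same room
  EXCLUDE USING GIST (room_id WITH =, during WITH &&)
);
-- Range-overlap queries can also use a plain GiST index
CREATE INDEX idx_reservations_during ON reservations USING GIST (during);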

4.3 GIN (Generalized Inverted Index)

Strong for inverted workloads: arrays, jsonb, full-text search. Writes can be heavier; tune autovacuum and GIN maintenance settings as needed.

CREATE INDEX idx_metadata ON events USING GIN (metadata jsonb_path_ops);
CREATE INDEX idx_tags ON posts USING GIN (tags);
CREATE INDEX idx_content_search ON articles USING GIN (to_tsvector('english', content));

4.4 Example: search workload

CREATE TABLE articles (
  id SERIAL PRIMARY KEY,
  title TEXT NOT NULL,
  content TEXT NOT NULL,
  tags TEXT[],
  metadata JSONB,
  created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_articles_tags ON articles USING GIN (tags);
CREATE INDEX idx_articles_metadata ON articles USING GIN (metadata);
CREATE INDEX idx_articles_search ON articles USING GIN(
  to_tsvector('english', title || ' ' || content)
);
SELECT * FROM articles
WHERE to_tsvector('english', title || ' ' || content) @@ to_tsquery('english', 'postgresql & performance')
ORDER BY created_at DESC
LIMIT 10;

5. Production PostgreSQL patterns

  • Pooling: use PgBouncer (or similar) to keep connection counts sane and favor short transactions.
  • Read scaling: route reads to replicas, but monitor replication lag and application consistency needs.
  • Observability: enable pg_stat_statements to find top queries by total time and I/O; pair with log thresholds.
  • Backup/restore: combine base backups with WAL archiving for PITR.
  • Migrations: avoid long DDL locks where possible; consider CONCURRENTLY index builds and batched data backfills.
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
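For the migrations point above, a non-blocking index build looks like this; note that CONCURRENTLY cannot run inside a transaction block, and a failed build leaves an INVALID index to drop and retry:

CREATE INDEX CONCURRENTLY idx_orders_status ON orders(status);
-- Dropping can likewise avoid blocking concurrent writes
DROP INDEX CONCURRENTLY idx_orders_status;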

6. Practical query optimization

6.1 EXPLAIN ANALYZE baseline

EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2026-01-01'
GROUP BY u.id, u.name
ORDER BY order_count DESC
LIMIT 10;
  • Seq Scan can be legitimate—or a bottleneck—depending on selectivity and size.
  • Index Scan vs Bitmap Index Scan reflects different access strategies; verify with buffers.

6.2 Sargable predicates beat functions on columns

-- Often poor for indexes: function wraps the column
SELECT * FROM orders WHERE EXTRACT(YEAR FROM created_at) = 2026;
-- Often better: range predicate
SELECT * FROM orders
WHERE created_at >= '2026-01-01' AND created_at < '2027-01-01';

6.3 CTE materialization (when it matters)

Since PostgreSQL 12, CTEs are inlined into the outer query by default. Adding MATERIALIZED forces the CTE to be computed once and acts as an optimization fence, which can help when the CTE is referenced multiple times or when you want to stop predicate pushdown.

WITH recent_orders AS MATERIALIZED (
  SELECT user_id, COUNT(*) AS order_count
  FROM orders
  WHERE created_at > NOW() - INTERVAL '30 days'
  GROUP BY user_id
)
SELECT u.name, ro.order_count
FROM users u
JOIN recent_orders ro ON u.id = ro.user_id;
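For contrast, NOT MATERIALIZED (PostgreSQL 12+) explicitly allows inlining, so the planner may fold the CTE into the outer query and optimize the whole thing together:

WITH recent_orders AS NOT MATERIALIZED (
  SELECT user_id, COUNT(*) AS order_count
  FROM orders
  WHERE created_at > NOW() - INTERVAL '30 days'
  GROUP BY user_id
)
SELECT u.name, ro.order_count
FROM users u
JOIN recent_orders ro ON u.id = ro.user_id;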

7. Partitioning

7.1 Range partitioning

CREATE TABLE events (
  id BIGSERIAL,
  user_id INTEGER NOT NULL,
  event_type TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL,
  data JSONB
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2026_01 PARTITION OF events
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE events_2026_02 PARTITION OF events
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
CREATE TABLE events_2026_03 PARTITION OF events
FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');

CREATE INDEX idx_events_user_id ON events(user_id);

7.2 pg_partman

The pg_partman extension automates partition creation and retention. The call below uses the 4.x signature; pg_partman 5.x drops the 'native' partition-type argument.

CREATE EXTENSION pg_partman;
SELECT create_parent(
  'public.events',
  'created_at',
  'native',
  'monthly',
  p_premake := 3,
  p_start_partition := '2026-01-01'
);

7.3 Partition pruning

SELECT * FROM events
WHERE created_at >= '2026-03-01' AND created_at < '2026-04-01';
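You can confirm pruning happened by checking that the plan lists only the matching partition rather than all of them:

EXPLAIN (COSTS OFF)
SELECT * FROM events
WHERE created_at >= '2026-03-01' AND created_at < '2026-04-01';
-- Expect a scan on events_2026_03 only, not events_2026_01/02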

8. Replication

8.1 Streaming replication

Primary (postgresql.conf):

wal_level = replica
max_wal_senders = 3
wal_keep_size = 1GB

Replica (postgresql.conf):

hot_standby = on

Bootstrap the replica from a base backup of the primary (-R writes the connection settings for you):

pg_basebackup -h primary-host -D /var/lib/postgresql/data -U replicator -P -v -R
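Replication lag can then be watched from the primary via the pg_stat_replication view (the lag columns exist in PostgreSQL 10+):

SELECT client_addr, state, sent_lsn, replay_lsn,
       write_lag, flush_lag, replay_lag
FROM pg_stat_replication;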

8.2 Logical replication

CREATE PUBLICATION my_pub FOR TABLE users, orders;
CREATE SUBSCRIPTION my_sub
CONNECTION 'host=primary-host dbname=mydb user=replicator'
PUBLICATION my_pub;

9. Backup strategy

9.1 pg_dump (logical)

pg_dump -U postgres -d mydb -F c -f mydb_backup.dump
pg_restore -U postgres -d mydb -v mydb_backup.dump

9.2 pg_basebackup + WAL archiving

pg_basebackup -h localhost -D /backup/pgdata -U postgres -P -v

Then enable WAL archiving in postgresql.conf:

archive_mode = on
archive_command = 'cp %p /backup/wal/%f'

9.3 Automation sketch

#!/bin/bash
set -euo pipefail
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/backup"
DB_NAME="mydb"
pg_dump -U postgres -d "$DB_NAME" -F c -f "${BACKUP_DIR}/${DB_NAME}_${DATE}.dump"
# Keep only the last 7 days of dumps
find "$BACKUP_DIR" -name "*.dump" -mtime +7 -delete
echo "Backup finished: ${DB_NAME}_${DATE}.dump"

10. Performance tuning settings

shared_buffers = 4GB
effective_cache_size = 12GB
work_mem = 64MB
maintenance_work_mem = 1GB
random_page_cost = 1.1
effective_io_concurrency = 200
wal_buffers = 16MB
checkpoint_completion_target = 0.9
max_wal_size = 4GB

Refresh planner statistics or reclaim space manually when needed:

ANALYZE users;
VACUUM (VERBOSE, ANALYZE) users;

11. Example: large-scale logging

CREATE TABLE logs (
  id BIGSERIAL,
  user_id INTEGER NOT NULL,
  action TEXT NOT NULL,
  ip_address INET,
  metadata JSONB,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (created_at);

DO $$
DECLARE
  start_date DATE := '2026-01-01';
  end_date DATE := '2027-01-01';
  partition_date DATE;
BEGIN
  partition_date := start_date;
  WHILE partition_date < end_date LOOP
    EXECUTE format(
      'CREATE TABLE IF NOT EXISTS logs_%s PARTITION OF logs
       FOR VALUES FROM (%L) TO (%L)',
      to_char(partition_date, 'YYYY_MM'),
      partition_date,
      partition_date + INTERVAL '1 month'
    );
    partition_date := partition_date + INTERVAL '1 month';
  END LOOP;
END $$;

CREATE INDEX idx_logs_user_id ON logs(user_id);
CREATE INDEX idx_logs_action ON logs(action);
CREATE INDEX idx_logs_metadata ON logs USING GIN(metadata);

SELECT action, COUNT(*) AS count
FROM logs
WHERE created_at >= '2026-03-01' AND created_at < '2026-04-01'
  AND user_id = 12345
GROUP BY action;

Summary and checklist

Key takeaways

  • MVCC: versioning + snapshots drive visibility; long transactions and hot update paths affect bloat and cleanup.
  • VACUUM: reclaims dead tuple space, supports freezing, and feeds visibility-map behavior; prefer autovacuum tuning over routine VACUUM FULL.
  • Planner: statistics and cost parameters steer plans; validate with EXPLAIN (ANALYZE, BUFFERS).
  • Indexes: B-tree for most scalar workloads; GiST for overlap/generalized search; GIN for inverted workloads.
  • Production: pooling, replication lag awareness, observability, and PITR-capable backups.

Production checklist

  • Right-size indexes; drop unused ones
  • Profile top queries with EXPLAIN (ANALYZE, BUFFERS)
  • Keep statistics fresh; consider extended stats where needed
  • Monitor autovacuum health and bloat
  • Partition/archival strategy for huge tables
  • Replication lag and failover drills
  • Backup/restore and WAL archiving tested regularly
  • pg_stat_statements + actionable alerts


Keywords covered

PostgreSQL, MVCC, VACUUM, query planner, B-tree, GiST, GIN, partitioning, replication, backup, production operations