Python Complete Guide | CPython Internals, GIL, GC, Imports & Production Patterns

Key points of this article

Beyond syntax: how CPython compiles to bytecode and runs the eval loop, what the GIL implies for CPU vs I/O workloads, refcount plus generational GC, how import paths and caches interact, and production-grade logging, config, and observability.

Core takeaways

This guide focuses on how CPython executes code and how to operate Python services reliably. Unless noted, “Python” means the CPython reference implementation; PyPy, GraalPython, and others differ in JIT, GC, and threading models.

What you will learn

  1. CPython architecture: source → AST → bytecode → the ceval eval loop, frames, and code objects
  2. The GIL: what it serializes, when it can be released, and practical concurrency choices
  3. Memory and GC: refcounting, cyclic garbage, generational collection, weakref, and allocator behavior
  4. Imports: sys.path, importlib, packages, namespace packages, circular imports, metadata
  5. Production patterns: structured logging, configuration, WSGI/ASGI deployment, profiling and observability, security basics

For syntax and library breadth, start with Python basics and the modules lesson. This article goes deeper into runtime internals and operations.


1. CPython interpreter architecture

CPython is an interpreter that compiles to bytecode before execution. That front-loads parsing work and leaves a compact instruction stream for a tight eval loop.

1.1 End-to-end pipeline

  1. Read source from .py or a string
  2. Tokenize the source and parse it into an AST
  3. Compile the AST to bytecode (a sequence of instructions, each an opcode plus an argument)
  4. Execute opcodes in the ceval loop on a stack of frames

Whether you type one line in the REPL or run a file, the mental model is the same: bytecode + frame stack.
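The steps above can be reproduced from Python itself with the standard ast module and the built-in compile and exec. This is a minimal sketch; the source string and the names tree, code, and namespace are illustrative only.

```python
import ast

source = "result = 6 * 7"

# Step 2: parse the source text into an AST.
tree = ast.parse(source, filename="<example>", mode="exec")

# Step 3: compile the AST into a code object that holds bytecode.
code = compile(tree, filename="<example>", mode="exec")

# Step 4: run the code object; names it binds land in the namespace dict.
namespace = {}
exec(code, namespace)

print(type(tree).__name__)   # Module
print(type(code).__name__)   # code
print(namespace["result"])   # 42
```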

1.2 Code objects and frames

Defining a function produces a types.CodeType object: names of locals, constants, bytecode, line-number tables, and more. At runtime, a frame references that code object and holds locals, stacks, and block state while executing.

def add(a, b):
    return a + b

print(add.__code__.co_name)       # 'add'
print(add.__code__.co_varnames)   # ('a', 'b')
print(add.__code__.co_code[:10])  # first bytes of bytecode

Why it matters: profilers, tracers, and debuggers walk these frames. Stack traces exist because exceptions unwind through this frame chain.
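To make the frame chain concrete, the sketch below walks it by hand, roughly the way a traceback or profiler would. It relies on sys._getframe, a CPython-specific accessor; the function names are made up for the example.

```python
import sys

def frame_chain():
    """Return function names from the current frame outward to module level."""
    names = []
    frame = sys._getframe()  # CPython-specific: the currently running frame
    while frame is not None:
        names.append(frame.f_code.co_name)  # each frame points at its code object
        frame = frame.f_back                # step to the caller's frame
    return names

def inner():
    return frame_chain()

def outer():
    return inner()

chain = outer()
print(chain[:3])  # ['frame_chain', 'inner', 'outer']
```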

1.3 Inspecting bytecode with dis

The dis module renders opcodes in a readable form.

import dis

def f(x):
    return x * 2

dis.dis(f)

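You should see opcodes such as LOAD_FAST, BINARY_OP, RETURN_VALUE. Exact names vary by CPython version (for example, BINARY_OP replaced BINARY_MULTIPLY in 3.11). CPython implements a stack machine: operands are pushed onto and popped from an operand stack.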

1.4 Object model and PyObject

“Everything is an object” is implemented in C: values flow through PyObject/PyTypeObject, with behavior dispatched via type slots and dunder methods. When Python executes a + b, the interpreter looks up __add__ on the type of a and, if that returns NotImplemented, falls back to __radd__ on the type of b.

Practical implication: tight loops of tiny Python operations pay opcode dispatch overhead. Mitigations include vectorization (NumPy), C extensions, Cython, or better data structures—validated with profiling, not guesswork.
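The __add__ / __radd__ lookup described in this section can be observed with a small class. Meters is a hypothetical type invented for this sketch.

```python
class Meters:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        if isinstance(other, (int, float)):
            return Meters(self.value + other)
        return NotImplemented  # lets Python try the other operand

    def __radd__(self, other):
        # Reached for `other + self` when type(other).__add__ gave NotImplemented.
        return self.__add__(other)

a = Meters(2) + Meters(3)   # handled by Meters.__add__
b = 1 + Meters(4)           # int.__add__ declines, so Meters.__radd__ runs
print(a.value, b.value)     # 5 5
```

If both sides return NotImplemented, Python raises TypeError, which is what happens for Meters(1) + "x".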

1.5 The eval loop (ceval) and dispatch

Execution centers on the eval loop in C: a dispatch loop branches per opcode; recent versions may layer adaptive optimizations (details vary by release). From an application perspective, one Python line becomes many opcodes, which is often what you see as cost in profiles.

PEP 523 allows installing a frame evaluation hook for debuggers and experimental JITs. You rarely need this in app code, but it explains how tools attach to the interpreter.


2. The Global Interpreter Lock (GIL)

2.1 What the GIL serializes

The GIL is a mutex that ensures only one thread runs Python bytecode at a time in a standard CPython process. Multiple threads may exist, but parallel bytecode execution is not the default model.

2.2 Why it exists (high level)

Historically, the GIL simplified C API usage and reference-count updates across threads. Making refcounting thread-safe everywhere without a global lock would have required pervasive atomic operations or different object layouts at the time.

2.3 When the GIL can be released

Blocking I/O and many C extensions drop the GIL around long-running native work. While it is released, other threads can make progress—useful for I/O concurrency.

Pure Python CPU-heavy regions may hold the GIL longer, limiting parallel bytecode throughput across threads.
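A rough way to see the GIL being released during blocking calls: several threads that each wait 0.2 s should finish in about 0.2 s total, not 0.8 s. In this sketch time.sleep stands in for blocking I/O, and the helper names are illustrative.

```python
import threading
import time

def blocking_call(seconds):
    # Stands in for blocking I/O; the GIL is released while the thread sleeps.
    time.sleep(seconds)

def run_threads(count, seconds):
    threads = [threading.Thread(target=blocking_call, args=(seconds,))
               for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

elapsed = run_threads(count=4, seconds=0.2)
print(f"4 threads x 0.2 s of waiting took {elapsed:.2f} s")
```

Replacing the sleep with a pure-Python loop would typically push the total toward the sum of the individual run times on a standard (GIL-enabled) build.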

2.4 threading vs multiprocessing

Goal                                    Typical choice
Concurrent I/O (many sockets/clients)   asyncio, threading, or both
CPU-bound parallelism                   multiprocessing, ProcessPoolExecutor, or task queues (Celery, RQ)
Native code releasing the GIL           Threads may scale for that section (library-dependent)
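For the CPU-bound row, a minimal sketch with ProcessPoolExecutor might look like this. The prime-counting function is a deliberately naive, hypothetical workload chosen only because it is pure Python and CPU-heavy.

```python
from concurrent.futures import ProcessPoolExecutor

def count_primes(limit):
    """CPU-bound pure-Python work: count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def count_primes_parallel(limits, workers=2):
    # Each worker is a separate process with its own interpreter and its own GIL.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))
```

Call count_primes_parallel([20_000] * 4) from a script under an if __name__ == "__main__": guard, since worker processes may re-import the main module. Process startup and argument pickling add overhead, so this pays off only when each task does substantial work.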

2.5 Free-threading (experimental)

CPython 3.13+ introduces optional GIL-disabled builds (PEP 703). Treat them as experimental for production until your extension compatibility, single-thread performance, and operational maturity are proven.

2.6 Thread switching and sys.setswitchinterval

The interpreter periodically asks the running thread to release the GIL so that waiting threads get a turn. sys.getswitchinterval() reports that interval (0.005 seconds by default) and sys.setswitchinterval() changes it. This is not parallelism; it is fairness to reduce starvation when one thread runs a long CPU-bound bytecode region. Do not confuse it with multi-core scaling.
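Reading and adjusting the interval takes two calls. The value 0.001 below is an arbitrary illustration, not a recommendation.

```python
import sys

original = sys.getswitchinterval()   # in seconds
print(f"current switch interval: {original} s")

# A shorter interval requests more frequent GIL handoffs (more switching overhead);
# a longer one favors the running thread's throughput over responsiveness.
sys.setswitchinterval(0.001)
changed = sys.getswitchinterval()

sys.setswitchinterval(original)      # restore the previous value
```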


3. Memory management and garbage collection

3.1 Reference counting

CPython primarily uses reference counting: when an object’s count hits zero, it is reclaimed immediately, which makes cleanup prompt and predictable for acyclic object graphs.
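The immediate reclamation can be observed with a finalizer. Resource is a hypothetical class for this sketch, and the timing shown is standard CPython behavior rather than a language guarantee.

```python
import sys

class Resource:
    released = False

    def __del__(self):
        # Runs when the last reference to the instance disappears.
        Resource.released = True

r = Resource()
alias = r
# The reported count is generally one higher than expected, because the
# argument passed to getrefcount is itself a temporary reference.
print(sys.getrefcount(r))

del alias
print(Resource.released)  # False: `r` still refers to the object
del r
print(Resource.released)  # True: the count reached zero, so __del__ ran
```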

3.2 Cycles and generational GC

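Objects in a reference cycle keep each other’s counts above zero, so refcounting alone never frees them. A generational collector finds such unreachable cycles. Young objects start in generation 0 and are promoted to older generations as they survive collections; older generations are scanned less often, which trades CPU spent on GC against how long garbage is retained.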

import gc

class Node:
    def __init__(self):
        self.ref = None

a, b = Node(), Node()
a.ref, b.ref = b, a  # cycle

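del a, b      # names are gone, but the two nodes still reference each other
gc.collect()  # force a full collection; returns the number of unreachable objects found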
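To confirm that the cycle really survives del and is only freed by the collector, a weak reference can act as a probe. A sketch, with automatic collection disabled temporarily so the result is deterministic:

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.ref = None

gc.disable()  # keep automatic collections from interfering with the demonstration
try:
    a, b = Node(), Node()
    a.ref, b.ref = b, a
    probe = weakref.ref(a)      # observes the object without keeping it alive

    del a, b
    alive_after_del = probe() is not None      # the cycle keeps both nodes alive

    found = gc.collect()                       # unreachable objects found
    alive_after_collect = probe() is not None
finally:
    gc.enable()

print(alive_after_del, alive_after_collect, found >= 2)  # True False True
```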

3.3 __slots__ and memory

High-cardinality instances can shrink memory by fixing attributes with __slots__ instead of per-instance __dict__. Trade-offs include less dynamism and inheritance constraints—a design decision, not a micro-optimization everywhere.
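The trade-off is visible in a few lines. PointDict and PointSlots are hypothetical classes for comparison.

```python
class PointDict:
    def __init__(self, x, y):
        self.x, self.y = x, y

class PointSlots:
    __slots__ = ("x", "y")   # fixed attribute set, no per-instance __dict__

    def __init__(self, x, y):
        self.x, self.y = x, y

p, q = PointDict(1, 2), PointSlots(1, 2)
print(hasattr(p, "__dict__"), hasattr(q, "__dict__"))  # True False

p.z = 3                      # fine: stored in the instance __dict__
try:
    q.z = 3                  # there is no slot named z
except AttributeError as exc:
    slot_error = exc
print(type(slot_error).__name__)  # AttributeError
```

Actual memory savings depend on the CPython version and attribute count, so measure with your own instance counts before committing to the constraint.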

3.4 pymalloc and small-object allocation

CPython uses pymalloc for small objects. You do not configure it directly, but millions of tiny, short-lived objects can still stress caches and page faults—something profilers and allocator-oriented tools can reveal.
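One allocator-oriented tool in the standard library is tracemalloc. The sketch below traces a burst of small tuple allocations; the workload is artificial and only meant to show the API shape.

```python
import tracemalloc

tracemalloc.start()

# Many small objects; kept in a list so they are still allocated at snapshot time.
points = [(i, i + 1) for i in range(100_000)]

current, peak = tracemalloc.get_traced_memory()   # bytes traced now, and the high-water mark
snapshot = tracemalloc.take_snapshot()
top = snapshot.statistics("lineno")[:3]           # largest allocation sites by source line
tracemalloc.stop()

print(f"current={current / 1e6:.1f} MB, peak={peak / 1e6:.1f} MB")
for stat in top:
    print(stat)
```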

3.5 weakref to avoid strong cycles

Caches, graphs, and callbacks often benefit from weak references to avoid strong reference cycles at the design stage, reducing pressure on the cyclic GC.

3.6 Thresholds and the gc module

Generational collection uses thresholds (gc.get_threshold / gc.set_threshold). Adjust only when you have evidence (memory pressure, GC pauses) and validate with benchmarks—defaults are right for most workloads.
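Inspecting and temporarily changing the thresholds looks like this. The factor of 10 is an arbitrary experiment, not a tuning recommendation.

```python
import gc

original = gc.get_threshold()      # a tuple of three thresholds
print("thresholds:", original)
print("counts:", gc.get_count())   # current per-generation counters

# Hypothetical experiment: raise the first threshold so young collections run less often.
gc.set_threshold(original[0] * 10, original[1], original[2])
raised = gc.get_threshold()

gc.set_threshold(*original)        # restore; keep a change only if measurements justify it
```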


4. Import system and modules

4.1 The sys.modules cache

After the first successful import, modules live in sys.modules. Subsequent import returns the cached object—cheap and deterministic within a process.
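The cache is an ordinary dictionary and can be inspected directly. A short check:

```python
import sys
import importlib

import json                                # first import executes the module and caches it
cached = sys.modules["json"]
print(cached is json)                      # True

again = importlib.import_module("json")    # a later import is served from the cache
print(again is json)                       # True
```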

4.2 sys.path resolution order

Imports search sys.path in order (directories, zips, editable installs, PYTHONPATH). Name clashes are resolved by first match wins. When the wrong package is imported, inspect and reorder paths before fighting symptoms.
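When diagnosing a wrong-package import, it helps to print the search order and ask where a name would resolve without importing it. A sketch using importlib.util.find_spec; the missing module name is deliberately made up.

```python
import sys
import importlib.util

for index, entry in enumerate(sys.path):
    print(index, entry or "<current directory>")

# find_spec reports where a module would be loaded from.
spec = importlib.util.find_spec("json")
print(spec.origin)   # file location of the json package in this interpreter

missing = importlib.util.find_spec("no_such_module_for_this_example")
print(missing)       # None
```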

4.3 importlib for explicit control

importlib exposes loaders and finders—handy for plugins and dynamic imports.

import importlib

m = importlib.import_module("json")
importlib.reload(m)  # dev-only hot reload; risky in production

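reload re-executes the module in place, but names already bound elsewhere (from m import x, existing instances) keep pointing at the old objects, and a failure midway can leave half-initialized state behind. Avoid it in services.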
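For plugin-style dynamic imports, one small pattern is resolving a "module:attribute" string. The load_object helper and the string format are assumptions of this sketch, not an importlib API.

```python
import importlib

def load_object(spec):
    """Resolve a 'package.module:attribute' string to the object it names."""
    module_name, _, attr = spec.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attr) if attr else module

dumps = load_object("json:dumps")
print(dumps({"ok": True}))   # {"ok": true}
```

Only resolve strings from trusted configuration, since importing a module runs its top-level code.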

4.4 Packages and namespace packages

__init__.py, namespace packages (PEP 420), and multiple path roots affect how subpackages merge—critical for monorepos and plugin layouts.

4.5 Circular imports

If module A imports B while B imports A, partial initialization can expose incomplete names. Fix with dependency direction, lazy imports, or shared submodules—not band-aids.
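The lazy-import option can be sketched end to end by writing two small modules to a temporary directory. The module names lazy_a and lazy_b are hypothetical; in a real project they would be ordinary files in your package.

```python
import sys
import tempfile
import textwrap
from pathlib import Path

# Two hypothetical modules that need each other.
MOD_A = """
    import lazy_b                      # a imports b at module level

    def a_value():
        return 1
"""
MOD_B = """
    def b_value():
        import lazy_a                  # deferred: runs at call time, not at import time
        return lazy_a.a_value() + 1
"""

tmp = tempfile.mkdtemp()
Path(tmp, "lazy_a.py").write_text(textwrap.dedent(MOD_A))
Path(tmp, "lazy_b.py").write_text(textwrap.dedent(MOD_B))
sys.path.insert(0, tmp)

import lazy_b             # succeeds: lazy_b has no module-level import of lazy_a

print(lazy_b.b_value())   # 2
```

Deferring the import works, but it hides the dependency; restructuring so both modules depend on a third is usually the more durable fix.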

4.6 importlib.metadata for distributions

At runtime, importlib.metadata reads installed distribution metadata (version, entry points). Common for CLIs and plugin discovery.
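A typical use is reporting the installed version of a distribution without importing it. The helper name installed_version is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(distribution):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None

print(installed_version("pip"))                       # a version string when pip is installed
print(installed_version("no-such-distribution-xyz"))  # None
```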


5. Production Python patterns

5.1 Logging, not print

Ship structured logs (often JSON) with levels and correlation IDs. print offers no levels, no handler routing or rotation, and no consistent formatting.

import logging
import json

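class JsonFormatter(logging.Formatter):
    def format(self, record):
        payload = {
            "level": record.levelname,
            "msg": record.getMessage(),
            "logger": record.name,
        }
        # Fields passed via `extra=` become attributes on the record.
        if hasattr(record, "user_id"):
            payload["user_id"] = record.user_id
        return json.dumps(payload, ensure_ascii=False)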

h = logging.StreamHandler()
h.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[h])

logging.getLogger("app").info("user_login", extra={"user_id": 123})
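Correlation IDs can be attached to every record with a logging.Filter that reads a context variable. In this sketch the variable name request_id, the logger name, and the ID value are all assumptions; a web framework would normally set the ID per incoming request.

```python
import contextvars
import logging

request_id = contextvars.ContextVar("request_id", default="-")

class RequestIdFilter(logging.Filter):
    def filter(self, record):
        record.request_id = request_id.get()   # attach the current ID to the record
        return True

logger = logging.getLogger("app.requests")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(levelname)s %(request_id)s %(message)s"))
handler.addFilter(RequestIdFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False   # avoid duplicate output through root handlers

request_id.set("req-42")   # hypothetical ID for the current request
logger.info("user_login")  # INFO req-42 user_login
```

Because contextvars are task-local under asyncio, concurrent requests keep separate IDs without passing them through every call.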

5.2 Configuration and secrets

Follow 12-Factor guidance: configuration via environment variables, secrets from a vault, validation through typed settings (for example Pydantic Settings).
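As a standard-library stand-in for a typed settings library, the sketch below validates environment variables into a frozen dataclass at startup. The variable names (APP_DATABASE_URL, APP_PORT, APP_DEBUG) and the example URL are hypothetical.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    database_url: str
    port: int
    debug: bool

def load_settings(env=os.environ):
    """Build Settings from environment variables, failing fast on bad values."""
    try:
        database_url = env["APP_DATABASE_URL"]          # required
    except KeyError:
        raise RuntimeError("APP_DATABASE_URL is required") from None
    port = int(env.get("APP_PORT", "8000"))             # ValueError if not an integer
    if not 0 < port < 65536:
        raise ValueError(f"APP_PORT out of range: {port}")
    debug = env.get("APP_DEBUG", "false").lower() in {"1", "true", "yes"}
    return Settings(database_url=database_url, port=port, debug=debug)

settings = load_settings({"APP_DATABASE_URL": "postgresql://db/app", "APP_PORT": "8080"})
print(settings)
```

Passing the mapping as a parameter keeps the loader testable without touching the real process environment.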

5.3 WSGI and ASGI deployment

Traditional sync stacks use Gunicorn/uWSGI (WSGI); async frameworks use Uvicorn/Hypercorn (ASGI). Tune workers/processes based on CPU cores, memory, concurrency targets, and whether native code releases the GIL during hot paths.
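The difference between the two interfaces is easiest to see in the callables themselves. This sketch defines one minimal app of each kind and drives both by hand, the way a server would; the response bodies and the fake request data are illustrative.

```python
import asyncio

def wsgi_app(environ, start_response):
    """WSGI: a synchronous callable, invoked once per request by the server."""
    body = b"hello from WSGI\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

async def asgi_app(scope, receive, send):
    """ASGI: an async callable that exchanges event dictionaries with the server."""
    if scope["type"] != "http":
        return  # ignore lifespan/websocket scopes in this sketch
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"text/plain")]})
    await send({"type": "http.response.body", "body": b"hello from ASGI\n"})

# Drive both callables directly, without a server.
statuses = []
wsgi_body = b"".join(
    wsgi_app({"REQUEST_METHOD": "GET"}, lambda status, headers: statuses.append(status))
)

events = []

async def _send(event):
    events.append(event)

async def _receive():
    return {"type": "http.request", "body": b"", "more_body": False}

asyncio.run(asgi_app({"type": "http"}, _receive, _send))
print(statuses, wsgi_body, [e["type"] for e in events])
```

If this were saved as a hypothetical app.py, a WSGI server would be pointed at app:wsgi_app and an ASGI server at app:asgi_app.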

5.4 Observability: metrics, traces, profiles

  • Metrics: RPS, latency histograms, error rates, queue depth
  • Traces: OpenTelemetry across services
  • Profiles: cProfile, py-spy for CPU; memory profilers when needed
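For the profiling bullet, a minimal cProfile run around a single function might look like this. slow_sum is a made-up workload.

```python
import cProfile
import io
import pstats

def slow_sum(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(200_000)
profiler.disable()

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)   # top 5 entries by cumulative time
report = buffer.getvalue()
print(report)
```

cProfile adds noticeable overhead to the profiled code, so sampling profilers such as py-spy are usually the better fit for a live service.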

5.5 Reproducible dependencies and containers

Lock dependencies (uv, Poetry, pip-tools) and run containers as a non-root user with read-only filesystems where possible.

5.6 Security basics

  • Terminate TLS at the edge or ingress consistently
  • Avoid shell injection: pass argument lists to subprocess
  • Never unpickle untrusted bytes
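The subprocess bullet can be illustrated with an input that would be dangerous under a shell. The untrusted string is a hypothetical attacker-controlled value, and the child process simply echoes its first argument.

```python
import subprocess
import sys

untrusted = "report.txt; echo injected"   # hypothetical attacker-controlled input

# Argument list, no shell: the whole string arrives as one argv entry and is
# never parsed as shell syntax.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", untrusted],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())   # report.txt; echo injected
```

Building the same command as a single string with shell=True would let the shell interpret the semicolon and run the second command.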

6. Summary

CPython executes source → AST → bytecode → eval loop; the GIL shapes threading vs multiprocessing choices; memory combines refcount + generational GC; imports are sys.path order + sys.modules caching + package rules; production needs structured logging, validated configuration, sensible ASGI/WSGI deployment, and observability.

The Korean edition of this guide is published as python-complete-guide.