When to use hash tables?

Use when fast search is needed. Can find values in O(1) average time complexity.

How to resolve hash collisions?

Use Chaining (linked lists) or Open Addressing (find another empty slot).

Is Python dict a hash table?

Yes, Python dict is implemented as a hash table.

[2026] Hash Table | O(1) Search Data Structure Complete Guide

2026년 3월 28일 · 20분 읽기 · 수정 2026년 3월 28일 Intermediate Tutorial

이 글의 핵심

Complete guide to hash tables for coding interviews. Master hash functions, collision resolution, and Python dict usage with principles and code examples.

Introduction

What is a Hash Table?

A hash table is a data structure that stores key-value pairs, enabling search/insertion/deletion in average O(1) time complexity. O(1) means nearly constant time regardless of input size. Other names:

Hash Map
Dictionary (Python)
Associative Array
Map (C++, Java) Why is it fast? Arrays allow O(1) access by index. Hash tables use a hash function (converts key to fixed-length number/slot) to transform keys into indices, enabling array-like fast access. 아래 코드는 python를 사용한 구현 예제입니다. 코드를 직접 실행해보면서 동작을 확인해보세요.

# Array: access by index
arr = [10, 20, 30, 40, 50]
print(arr[2])  # 30 (O(1))
# Hash table: access by key
hash_table = {"apple": 100, "banana": 200}
print(hash_table[apple])  # 100 (O(1))
# Internally: hash("apple") % size → index → value

Time Complexity Comparison:

Data Structure	Search	Insert	Delete
Array (unsorted)	O(n)	O(1)	O(n)
Array (sorted)	O(log n)	O(n)	O(n)
Linked List	O(n)	O(1)	O(n)
Hash Table	O(1)	O(1)	O(1)

1. Hash Function

What is a Hash Function?

A hash function converts arbitrary-size data into a fixed-size value (hash value). Good Hash Function Properties:

✅ Deterministic: Same input → same output
✅ Fast Computation: O(1)
✅ Uniform Distribution: Hash values evenly distributed
✅ Minimize Collisions: Different inputs → different outputs (ideally)

Simple Hash Function Example

다음은 python를 활용한 상세한 구현 코드입니다. 함수를 통해 로직을 구현합니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

# Method 1: Python built-in hash() + modulo
def simple_hash(key, size):
    return hash(key) % size
print(simple_hash("apple", 10))   # 8 (store at index 8)
print(simple_hash("banana", 10))  # 3 (store at index 3)
print(simple_hash("cherry", 10))  # 6 (store at index 6)
# Method 2: String hash (manual implementation)
def string_hash(s, size):
    hash_value = 0
    for char in s:
        hash_value = (hash_value * 31 + ord(char)) % size
    return hash_value
print(string_hash("apple", 10))
# Python built-in hash()
print(hash("apple"))   # -6076896708304529682 (example)
print(hash(42))        # 42 (integers hash to themselves)
print(hash((1, 2)))    # 3713081631934410656 (tuple hash)
# ❌ Mutable objects cannot be hashed
# print(hash([1, 2]))  # TypeError: unhashable type: 'list'

Hash Collision

Collision occurs when different keys produce the same hash value. 아래 코드는 python를 사용한 구현 예제입니다. 함수를 통해 로직을 구현합니다. 코드를 직접 실행해보면서 동작을 확인해보세요.

# Collision example
def bad_hash(key, size):
    return len(key) % size  # Bad hash function
print(bad_hash("apple", 10))   # 5
print(bad_hash("grape", 10))   # 5 (collision!)
print(bad_hash("peach", 10))   # 5 (collision!)

2. Python dict Usage

Basic Operations

다음은 python를 활용한 상세한 구현 코드입니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

# Creation
hash_table = {}
hash_table = dict()
# Insert O(1)
hash_table[apple] = 100
hash_table[banana] = 200
hash_table[cherry] = 300
# Search O(1)
print(hash_table[apple])  # 100
print(hash_table.get("grape", 0))  # 0 (default)
# Update O(1)
hash_table[apple] = 150
print(hash_table[apple])  # 150
# Delete O(1)
del hash_table[apple]
print("apple" in hash_table)  # False
# Existence check O(1)
if "banana" in hash_table:
    print("exists")
# Iteration O(n)
for key in hash_table:
    print(key, hash_table[key])
for key, value in hash_table.items():
    print(f"{key}: {value}")
# Keys/values lists
keys = list(hash_table.keys())
values = list(hash_table.values())

3. Collision Resolution

Method 1: Chaining

Store values with same index in a linked list. 다음은 python를 활용한 상세한 구현 코드입니다. 클래스를 정의하여 데이터와 기능을 캡슐화하며. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

class HashTableChaining:
    def __init__(self, size=10):
        self.size = size
        self.table = [[] for _ in range(size)]
    
    def _hash(self, key):
        return hash(key) % self.size
    
    def insert(self, key, value):
        idx = self._hash(key)
        
        # Update if key exists
        for i, (k, v) in enumerate(self.table[idx]):
            if k == key:
                self.table[idx][i] = (key, value)
                return
        
        # Add if not exists
        self.table[idx].append((key, value))
    
    def get(self, key):
        idx = self._hash(key)
        
        for k, v in self.table[idx]:
            if k == key:
                return v
        
        return None
# Test
ht = HashTableChaining(size=5)
ht.insert("apple", 100)
ht.insert("banana", 200)
print(ht.get("apple"))  # 100

Method 2: Open Addressing

Find another empty slot on collision. 다음은 python를 활용한 상세한 구현 코드입니다. 클래스를 정의하여 데이터와 기능을 캡슐화하며. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

class HashTableLinearProbing:
    def __init__(self, size=10):
        self.size = size
        self.keys = [None] * size
        self.values = [None] * size
    
    def _hash(self, key):
        return hash(key) % self.size
    
    def insert(self, key, value):
        idx = self._hash(key)
        
        while self.keys[idx] is not None:
            if self.keys[idx] == key:
                self.values[idx] = value
                return
            idx = (idx + 1) % self.size
        
        self.keys[idx] = key
        self.values[idx] = value
    
    def get(self, key):
        idx = self._hash(key)
        
        while self.keys[idx] is not None:
            if self.keys[idx] == key:
                return self.values[idx]
            idx = (idx + 1) % self.size
        
        return None

4. Problem Solving

Problem 1: Two Sum

다음은 python를 활용한 상세한 구현 코드입니다. 함수를 통해 로직을 구현합니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

def two_sum(arr, target):
    """
    Find indices of two numbers that sum to target
    [2, 7, 11, 15], target=9 → [0, 1]
    
    Time: O(n), Space: O(n)
    """
    seen = {}  # value: index
    
    for i, num in enumerate(arr):
        complement = target - num
        
        if complement in seen:
            return [seen[complement], i]
        
        seen[num] = i
    
    return []
# Test
arr = [2, 7, 11, 15]
print(two_sum(arr, 9))   # [0, 1]
print(two_sum(arr, 13))  # [0, 2]

Problem 2: Group Anagrams

다음은 python를 활용한 상세한 구현 코드입니다. 함수를 통해 로직을 구현합니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

def group_anagrams(strs):
    """
    Group anagrams together
    ["eat", "tea", "tan", "ate", "nat", "bat"]
    → [["eat", "tea", "ate"], ["tan", "nat"], [bat]]
    
    Time: O(n * k log k), Space: O(n * k)
    """
    anagrams = {}
    
    for word in strs:
        key = '.join(sorted(word))
        
        if key not in anagrams:
            anagrams[key] = []
        anagrams[key].append(word)
    
    return list(anagrams.values())
# Test
strs = ["eat", "tea", "tan", "ate", "nat", "bat"]
print(group_anagrams(strs))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]

Problem 3: Longest Substring Without Repeating Characters

다음은 python를 활용한 상세한 구현 코드입니다. 함수를 통해 로직을 구현합니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

def length_of_longest_substring(s):
    """
    "abcabcbb" → 3 ("abc")
    "bbbbb" → 1 ("b")
    
    Time: O(n), Space: O(min(n, m))
    """
    char_index = {}
    max_length = 0
    start = 0
    
    for i, char in enumerate(s):
        if char in char_index and char_index[char] >= start:
            start = char_index[char] + 1
        
        char_index[char] = i
        max_length = max(max_length, i - start + 1)
    
    return max_length
# Test
print(length_of_longest_substring("abcabcbb"))  # 3
print(length_of_longest_substring("bbbbb"))     # 1
print(length_of_longest_substring("pwwkew"))    # 3

5. Python Collections Module

Counter

다음은 python를 활용한 상세한 구현 코드입니다. 필요한 모듈을 import하고. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

from collections import Counter
# Create
counter = Counter([1, 2, 2, 3, 3, 3])
print(counter)  # Counter({3: 3, 2: 2, 1: 1})
# String count
text = "hello world"
counter = Counter(text)
print(counter)  # Counter({'l': 3, 'o': 2, ...})
# most_common(): most frequent elements
print(counter.most_common(3))  # [('l', 3), ('o', 2), ('h', 1)]
# Operations
c1 = Counter(['a', 'b', 'c', 'a'])
c2 = Counter(['a', 'b', 'b'])
print(c1 + c2)  # Counter({'a': 3, 'b': 3, 'c': 1})
print(c1 - c2)  # Counter({'a': 1, 'c': 1})
print(c1 & c2)  # Counter({'a': 1, 'b': 1}) (intersection)
print(c1 | c2)  # Counter({'a': 2, 'b': 2, 'c': 1}) (union)

defaultdict

아래 코드는 python를 사용한 구현 예제입니다. 필요한 모듈을 import하고. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

from collections import defaultdict
# int default (0)
word_count = defaultdict(int)
for word in ["apple", "banana", "apple"]:
    word_count[word] += 1
print(dict(word_count))  # {'apple': 2, 'banana': 1}
# list default ([])
groups = defaultdict(list)
for name, grade in [("Alice", "A"), ("Bob", "B"), ("Charlie", "A")]:
    groups[grade].append(name)
print(dict(groups))  # {'A': ['Alice', 'Charlie'], 'B': ['Bob']}

LRU Cache Implementation

다음은 python를 활용한 상세한 구현 코드입니다. 필요한 모듈을 import하고, 클래스를 정의하여 데이터와 기능을 캡슐화하며. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

from collections import OrderedDict
class LRUCache:
    def __init__(self, capacity):
        self.cache = OrderedDict()
        self.capacity = capacity
    
    def get(self, key):
        if key not in self.cache:
            return -1
        
        self.cache.move_to_end(key)
        return self.cache[key]
    
    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        
        self.cache[key] = value
        
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)
# Test
cache = LRUCache(3)
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)
cache.get("a")
cache.put("d", 4)  # Removes 'b'

6. Usage Patterns

Pattern 1: Frequency Counting

아래 코드는 python를 사용한 구현 예제입니다. 필요한 모듈을 import하고, 함수를 통해 로직을 구현합니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

def char_frequency(s):
    freq = {}
    for char in s:
        freq[char] = freq.get(char, 0) + 1
    return freq
print(char_frequency("hello"))  # {'h': 1, 'e': 1, 'l': 2, 'o': 1}
# Using Counter
from collections import Counter
print(dict(Counter("hello")))

Pattern 2: Grouping

아래 코드는 python를 사용한 구현 예제입니다. 필요한 모듈을 import하고. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

from collections import defaultdict
words = ["apple", "pie", "banana", "cat", "dog", "cherry"]
groups = defaultdict(list)
for word in words:
    groups[len(word)].append(word)
print(dict(groups))
# {5: ['apple'], 3: ['pie', 'cat', 'dog'], 6: ['banana', 'cherry']}

Pattern 3: Caching (Memoization)

다음은 python를 활용한 상세한 구현 코드입니다. 필요한 모듈을 import하고, 함수를 통해 로직을 구현합니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

# Fibonacci with caching
def fib_fast(n, cache={}):
    if n in cache:
        return cache[n]
    
    if n <= 1:
        return n
    
    cache[n] = fib_fast(n-1, cache) + fib_fast(n-2, cache)
    return cache[n]
print(fib_fast(30))  # 832040 (fast)
# Using functools.lru_cache (recommended)
from functools import lru_cache
@lru_cache(maxsize=None)
def fib(n):
    if n <= 1:
        return n
    return fib(n-1) + fib(n-2)
print(fib(30))  # 832040 (very fast)

Pattern 4: Index Mapping

아래 코드는 python를 사용한 구현 예제입니다. 함수를 통해 로직을 구현합니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

def find_indices(arr, target):
    """Find all indices of target value"""
    index_map = {}
    
    for i, num in enumerate(arr):
        if num not in index_map:
            index_map[num] = []
        index_map[num].append(i)
    
    return index_map.get(target, [])
arr = [1, 2, 3, 2, 4, 2, 5]
print(find_indices(arr, 2))  # [1, 3, 5]

7. Common Mistakes

Mistake 1: Using Mutable Objects as Keys

아래 코드는 python를 사용한 구현 예제입니다. 코드를 직접 실행해보면서 동작을 확인해보세요.

# ❌ Wrong
# hash_table = {[1, 2]: "value"}  # TypeError: unhashable type: 'list'
# ✅ Correct
hash_table = {(1, 2): "value"}  # Use tuple
print(hash_table[(1, 2)])  # value

Mistake 2: Missing KeyError Handling

아래 코드는 python를 사용한 구현 예제입니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

# ❌ Wrong
hash_table = {"a": 1}
# print(hash_table[b])  # KeyError: 'b'
# ✅ Correct 1: Use get()
value = hash_table.get("b", 0)
print(value)  # 0
# ✅ Correct 2: Check with in
if "b" in hash_table:
    print(hash_table[b])
else:
    print("Key not found")

Mistake 3: Modifying During Iteration

다음은 python를 활용한 상세한 구현 코드입니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

# ❌ Wrong
hash_table = {"a": 1, "b": 2, "c": 3}
# for key in hash_table:
#     if hash_table[key] == 2:
#         del hash_table[key]  # RuntimeError
# ✅ Correct 1: Convert to list
hash_table = {"a": 1, "b": 2, "c": 3}
for key in list(hash_table.keys()):
    if hash_table[key] == 2:
        del hash_table[key]
# ✅ Correct 2: Dictionary comprehension
hash_table = {"a": 1, "b": 2, "c": 3}
hash_table = {k: v for k, v in hash_table.items() if v != 2}

8. Performance Optimization

Load Factor Management

다음은 python를 활용한 상세한 구현 코드입니다. 클래스를 정의하여 데이터와 기능을 캡슐화하며. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

class DynamicHashTable:
    def __init__(self, initial_size=8):
        self.size = initial_size
        self.count = 0
        self.table = [[] for _ in range(self.size)]
    
    def _load_factor(self):
        return self.count / self.size
    
    def _resize(self):
        """Double size when load factor > 0.7"""
        old_table = self.table
        self.size *= 2
        self.table = [[] for _ in range(self.size)]
        self.count = 0
        
        for bucket in old_table:
            for key, value in bucket:
                self.insert(key, value)
    
    def insert(self, key, value):
        if self._load_factor() > 0.7:
            self._resize()
        
        idx = self._hash(key)
        
        for i, (k, v) in enumerate(self.table[idx]):
            if k == key:
                self.table[idx][i] = (key, value)
                return
        
        self.table[idx].append((key, value))
        self.count += 1

Summary

Key Points

Hash Table: Key-value storage, average O(1) search/insert/delete
Hash Function: Convert key to index, uniform distribution important
Collision Resolution:
- Chaining: Store in linked list
- Open Addressing: Find another empty slot
Python Tools:
- dict: Basic hash table
- Counter: Frequency counting
- defaultdict: Auto-create default values
- OrderedDict: Maintain order + extra features

When to Use

Situation	Data Structure	Time Complexity
Fast search	Hash Table	O(1)
Frequency count	Counter	O(n)
Duplicate check	set	O(n)
Grouping	defaultdict	O(n)
Caching	dict / LRU Cache	O(1)

Time Complexity Improvement Pattern

다음은 python를 활용한 상세한 구현 코드입니다. 함수를 통해 로직을 구현합니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

# Before: O(n²) - nested loops
def has_duplicate_slow(arr):
    for i in range(len(arr)):
        for j in range(i+1, len(arr)):
            if arr[i] == arr[j]:
                return True
    return False
# After: O(n) - hash table
def has_duplicate_fast(arr):
    seen = set()
    for num in arr:
        if num in seen:
            return True
        seen.add(num)
    return False
# Or simpler
def has_duplicate(arr):
    return len(arr) != len(set(arr))

Keywords

Hash Table, HashMap, Dictionary, Data Structure, O(1) Search, Hash Function, Collision Resolution, Chaining, Open Addressing, Counter, defaultdict, Coding Interview, Algorithm

이 글의 핵심

Introduction

What is a Hash Table?

1. Hash Function

What is a Hash Function?

Simple Hash Function Example

Hash Collision

2. Python dict Usage

Basic Operations

3. Collision Resolution

Method 1: Chaining

Method 2: Open Addressing

4. Problem Solving

Problem 1: Two Sum

Problem 2: Group Anagrams

Problem 3: Longest Substring Without Repeating Characters

5. Python Collections Module

Counter

defaultdict

LRU Cache Implementation

6. Usage Patterns

Pattern 1: Frequency Counting

Pattern 2: Grouping

Pattern 3: Caching (Memoization)

Pattern 4: Index Mapping

7. Common Mistakes

Mistake 1: Using Mutable Objects as Keys

Mistake 2: Missing KeyError Handling

Mistake 3: Modifying During Iteration

8. Performance Optimization

Load Factor Management

Summary

Key Points

When to Use

Time Complexity Improvement Pattern

Recommended Problems

Beginner

Intermediate

Advanced

Related Posts

Keywords