If sorting not needed, always use unordered_set?

Average lookup is close to O(1) but varies by hash/collision. Consider worst case, memory, and traversal needs together.

How to insert custom types?

unordered_set needs hash and equality comparison, set needs comparison operator (or custom comparator). Define consistently.

Can I delete during iteration?

Iterator invalidates if rules not followed. Use safe idioms like erase return iterator pattern or copy-then-delete.

How to do set operations (union/intersection)?

Apply STL algorithms like set_union to sorted sequences, or choose method of iterating one side and querying the other depending on size.

[2026] C++ set/unordered_set | Duplicate Removal Complete Guide

2026년 3월 12일 · 17분 읽기 · 수정 2026년 3월 30일 Beginner Tutorial

이 글의 핵심

set·unordered_set performance comparison, multiset, custom comparator·hash, practical set operations, iterator invalidation guide. Learn when to use set vs unordered_set with real-world examples.

set vs unordered_set

Feature	set	unordered_set
Sorting	O (auto-sorted)	X
Speed	O(log n)	O(1)
Duplicates	Not allowed	Not allowed
Traversal	Sorted order	Random

set vs unordered_set Performance Comparison

From time complexity (average/amortized) perspective, set is based on balanced binary search tree (typically red-black), so insert/delete/search is O(log n). unordered_set is a hash table, so average O(1), but with many hash collisions can approach worst case O(n). Practical differences are:

Small n·order needed: If element count is hundreds~thousands and need sorted traversal or range search like lower_bound, set is simpler.
Large n·key lookup only: For millions+ where you only need fast “exists/not exists” check and order doesn’t matter, unordered_set is often advantageous.
Memory: Hash table has additional overhead from buckets/load factor, tree has pointer overhead per node. Varies by n and key size, so measurement is safe.
Cache locality: Neither uses contiguous memory, both have frequent skip access causing cache misses. Hard to say “unordered is always faster” based on traversal alone. Selection Summary: If need order/range search use set, if keys are hash-friendly (integers, well-distributed strings) and mostly lookups, consider unordered_set first.

multiset and unordered_multiset

set / unordered_set store only one of each key. If need same value multiple times, use multiset, unordered_multiset.

Feature	multiset	unordered_multiset
Order	Maintains sort (same value insertion order is implementation-defined)	No order
Lookup	`count`, `equal_range` for same key range	`count`, `equal_range`
Delete	`erase(key)` removes all of that key, use iterator to remove one	Same
다음은 cpp를 활용한 상세한 구현 코드입니다. 필요한 모듈을 import하고, 반복문으로 데이터를 처리합니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

#include <set>
#include <unordered_set>
#include <iostream>
// 변수 선언 및 초기화
int main() {
    std::multiset<int> ms{1, 1, 2, 2, 2};
    std::cout << ms.count(2) << '\n';  // 3
    auto [first, last] = ms.equal_range(2);
    for (auto it = first; it != last; ++it) { /* ....*/ }
    std::unordered_multiset<std::string> ums;
    ums.insert("a"); ums.insert("a");
    std::cout << ums.count("a") << '\n';  // 2
}

When only top k needed, multiset maintains sorted state making it easy to handle max/min at ends. If only counting frequency and order not needed, unordered_multiset or unordered_map<T, int> is often better.

Custom Comparator and Hash Function

set: `Compare` and `operator<`

set sorting criterion must satisfy strict weak ordering. Can use operator< or std::less for custom types. 아래 코드는 cpp를 사용한 구현 예제입니다. 클래스를 정의하여 데이터와 기능을 캡슐화하며. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

struct Person {
    std::string name;
    int id;
};
struct ById {
    bool operator()(const Person& a, const Person& b) const {
        return a.id < b.id;
    }
};
std::set<Person, ById> by_id_set;

unordered_set: `Hash` and `KeyEqual`

In unordered_set<Key, Hash, KeyEqual>, same key must always produce same hash, and KeyEqual determines actual equality on hash collision. Custom types typically provide both. 아래 코드는 cpp를 사용한 구현 예제입니다. 클래스를 정의하여 데이터와 기능을 캡슐화하며, 에러 처리를 통해 안정성을 확보합니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

// 타입 정의
struct Point {
    int x, y;
    bool operator==(const Point& o) const { return x == o.x && y == o.y; }
};
struct PointHash {
    std::size_t operator()(const Point& p) const noexcept {
        // Simple combination example (consider boost::hash_combine for projects)
        return std::hash<int>{}(p.x) ^ (std::hash<int>{}(p.y) << 1);
    }
};
std::unordered_set<Point, PointHash> pts;

Caution: When putting strings in unordered_set, bad custom hash causes severe performance drop. If key is std::string, typically use default hash.

Practice: Duplicate Removal·Intersection·Union Summary

Duplicate removal only with original order preservation: Putting in set then moving to vector changes order. If order preservation needed, consider sort then unique or unordered_set + separate order array.
Intersection·union·difference: Above set_intersection / set_union / set_difference patterns require both ranges sorted. Using set automatically satisfies sorted order via iterators.
unordered_set operations: If library doesn’t provide one-shot operations, collect one side into vector and sort, or iterate smaller side and check inclusion in other with count/find (choose better algorithm by size).

Iterator Invalidation

set / multiset

insert / emplace: Existing element iterators and references not invalidated.
erase(it): Only that iterator invalid. Other iterators maintained.
erase(key) / clear: Iterators to deleted elements invalid. unordered_set / unordered_multiset
Rehash changes bucket structure, all iterators may be invalid. insert exceeding load factor can trigger rehash, so indiscriminate insertion during iteration is dangerous.
reserve(n) (or appropriate bucket count reservation) reduces rehash frequency, making iterator invalidation less frequent.
erase: Only iterator at deleted position invalid, other iterators usually valid after C++11, but implementation-dependent, so using erase’s return iterator pattern is safe. 다음은 간단한 cpp 코드 예제입니다. 반복문으로 데이터를 처리합니다, 조건문으로 분기 처리를 수행합니다. 코드를 직접 실행해보면서 동작을 확인해보세요.

for (auto it = s.begin(); it != s.end(); ) {
    if (condition) it = s.erase(it);  // erase returns next iterator
    else ++it;
}

set Basic Usage

다음은 cpp를 활용한 상세한 구현 코드입니다. 필요한 모듈을 import하고, 반복문으로 데이터를 처리합니다, 조건문으로 분기 처리를 수행합니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

#include <set>
#include <iostream>
using namespace std;
int main() {
    set<int> s;
    
    // Insert
    s.insert(3);
    s.insert(1);
    s.insert(2);
    s.insert(1);  // Duplicate ignored
    
    // Traverse (auto-sorted: 1, 2, 3)
    for (int x : s) {
        cout << x << " ";
    }
    
    // Search
    if (s.find(2) != s.end()) {
        cout << "\n2 exists" << endl;
    }
    
    // Delete
    s.erase(2);
    
    // Size
    cout << "Size: " << s.size() << endl;
    
    return 0;
}

unordered_set Basic Usage

#include <unordered_set>
#include <iostream>
using namespace std;
int main() {
    unordered_set<string> words;
    
    words.insert("apple");
    words.insert("banana");
    words.insert("apple");  // Duplicate ignored
    
    // Check existence (fast)
    if (words.count("apple") > 0) {
        cout << "apple exists" << endl;
    }
    
    // Traverse (order not guaranteed)
    for (const string& word : words) {
        cout << word << " ";
    }
    
    return 0;
}

Practical Examples

Example 1: Remove Duplicates from Array

다음은 cpp를 활용한 상세한 구현 코드입니다. 필요한 모듈을 import하고, 반복문으로 데이터를 처리합니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

#include <iostream>
#include <vector>
#include <set>
using namespace std;
vector<int> removeDuplicates(const vector<int>& arr) {
    set<int> s(arr.begin(), arr.end());
    return vector<int>(s.begin(), s.end());
}
int main() {
    vector<int> arr = {5, 2, 8, 2, 9, 5, 1, 8};
    
    cout << "Original: ";
    for (int x : arr) cout << x << " ";
    
    vector<int> unique = removeDuplicates(arr);
    
    cout << "\nDuplicate removed (sorted): ";
    for (int x : unique) cout << x << " ";
    
    return 0;
}

Explanation: Most basic pattern utilizing set’s auto-sorting and duplicate removal.

Example 2: Intersection/Union of Two Arrays

#include <iostream>
#include <set>
#include <algorithm>
using namespace std;
int main() {
    set<int> a = {1, 2, 3, 4, 5};
    set<int> b = {3, 4, 5, 6, 7};
    
    // Intersection
    set<int> intersection;
    set_intersection(a.begin(), a.end(),
                     b.begin(), b.end(),
                     inserter(intersection, intersection.begin()));
    
    cout << "Intersection: ";
    for (int x : intersection) cout << x << " ";  // 3 4 5
    
    // Union
    set<int> union_set;
    set_union(a.begin(), a.end(),
              b.begin(), b.end(),
              inserter(union_set, union_set.begin()));
    
    cout << "\nUnion: ";
    for (int x : union_set) cout << x << " ";  // 1 2 3 4 5 6 7
    
    // Difference
    set<int> difference;
    set_difference(a.begin(), a.end(),
                   b.begin(), b.end(),
                   inserter(difference, difference.begin()));
    
    cout << "\nDifference (A-B): ";
    for (int x : difference) cout << x << " ";  // 1 2
    
    return 0;
}

Explanation: Can implement mathematical set operations directly. Frequently used in algorithm problems.

Example 3: Visit Check (Graph Traversal)

#include <iostream>
#include <unordered_set>
#include <queue>
#include <vector>
using namespace std;
void bfs(int start, vector<vector<int>>& graph) {
    unordered_set<int> visited;
    queue<int> q;
    
    q.push(start);
    visited.insert(start);
    
    while (!q.empty()) {
        int node = q.front();
        q.pop();
        
        cout << node << " ";
        
        for (int neighbor : graph[node]) {
            if (visited.count(neighbor) == 0) {
                visited.insert(neighbor);
                q.push(neighbor);
            }
        }
    }
}
int main() {
    // Graph: 0-1, 0-2, 1-3, 2-3
    vector<vector<int>> graph = {
        {1, 2},    // Node 0's neighbors
        {0, 3},    // Node 1's neighbors
        {0, 3},    // Node 2's neighbors
        {1, 2}     // Node 3's neighbors
    };
    
    cout << "BFS traversal: ";
    bfs(0, graph);
    
    return 0;
}

Explanation: Use unordered_set to check visited status at O(1) speed. Essential pattern for graph algorithms.

Common Problems

Problem 1: Cannot Modify set Elements

Symptom: Compile error when trying to modify set element directly Cause: set elements are const to maintain sorting Solution: 다음은 cpp를 활용한 상세한 구현 코드입니다. 클래스를 정의하여 데이터와 기능을 캡슐화하며, 비동기 처리를 통해 효율적으로 작업을 수행합니다, 에러 처리를 통해 안정성을 확보합니다, 조건문으로 분기 처리를 수행합니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

// ❌ Wrong code
set<int> s = {1, 2, 3};
auto it = s.find(2);
*it = 5;  // Compile error! const int&
// ✅ Correct code (delete then reinsert)
set<int> s = {1, 2, 3};
auto it = s.find(2);
if (it != s.end()) {
    s.erase(it);
    s.insert(5);
}
// ✅ For structs (use mutable)
struct Item {
    int id;
    mutable int count;  // mutable can be modified
    
    bool operator<(const Item& other) const {
        return id < other.id;
    }
};
set<Item> items;
auto it = items.find(Item{1, 0});
if (it != items.end()) {
    it->count++;  // OK (mutable)
}

Problem 2: Custom Comparison Function

Symptom: Compile error when inserting custom type into set Cause: Needs operator< or comparison function Solution: 다음은 cpp를 활용한 상세한 구현 코드입니다. 클래스를 정의하여 데이터와 기능을 캡슐화하며, 에러 처리를 통해 안정성을 확보합니다, 조건문으로 분기 처리를 수행합니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

// ❌ Compile error
struct Point {
    int x, y;
};
set<Point> points;  // Error! No operator<
// ✅ Method 1: Define operator<
struct Point {
    int x, y;
    
    bool operator<(const Point& other) const {
        if (x != other.x) return x < other.x;
        return y < other.y;
    }
};
set<Point> points;  // OK
// ✅ Method 2: Comparison function object
struct PointCompare {
    bool operator()(const Point& a, const Point& b) const {
        if (a.x != b.x) return a.x < b.x;
        return a.y < b.y;
    }
};
set<Point, PointCompare> points;  // OK
// ✅ Method 3: Lambda (C++11)
auto cmp = [](const Point& a, const Point& b) {
    if (a.x != b.x) return a.x < b.x;
    return a.y < b.y;
};
set<Point, decltype(cmp)> points(cmp);  // OK

Problem 3: Confusion with multiset

Symptom: Want to allow duplicates but set doesn’t Cause: set doesn’t allow duplicates, multiset does Solution: 다음은 cpp를 활용한 상세한 구현 코드입니다. 필요한 모듈을 import하고, 조건문으로 분기 처리를 수행합니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

// ❌ set doesn't allow duplicates
set<int> s;
s.insert(1);
s.insert(1);
s.insert(1);
cout << s.size();  // 1 (duplicates ignored)
// ✅ Use multiset
#include <set>
multiset<int> ms;
ms.insert(1);
ms.insert(1);
ms.insert(1);
cout << ms.size();  // 3 (duplicates allowed)
// Count specific value
cout << ms.count(1);  // 3
// Delete all of specific value
ms.erase(1);  // All 1s deleted
// Delete only one
auto it = ms.find(1);
if (it != ms.end()) {
    ms.erase(it);  // Delete only one
}

Performance Optimization

Optimization Strategies

Choose efficient data structure
- How: Use appropriate STL container for situation
- Effect: Improve time complexity
Prevent unnecessary copying
- How: Use reference passing
- Effect: Reduce memory usage
Compiler optimization
- How: Use -O2, -O3 flags
- Effect: Improve execution speed

Benchmark Results

Method	Execution Time	Memory Usage	Notes
Basic implementation	100ms	10MB	-
Optimization 1	80ms	8MB	Reference passing
Optimization 2	50ms	5MB	STL algorithms
Conclusion: Appropriate optimization can improve performance 2x or more
Caution: Absolute time/memory numbers in above table are examples. In real services, remeasure before/after with same input and compile options.

Advanced: `reserve`·`rehash`·Load Distribution

unordered_set triggers rehash when exceeding load factor, at which point all iterators may be invalid. If approximate element count known before insertion:

std::unordered_set<int> s;
s.reserve(1'000'000); // Reduce rehash frequency, stabilize benchmark

`set` doesn’t have `reserve`, but reducing unnecessary copies/temporary objects helps perceived performance.

Advanced: Heterogeneous Lookup (C++20, `unordered_set`)

When key is std::string, to prevent find("literal") from creating temporary std::string, need custom policy with transparent hash/transparent equality (std::hash<std::string_view>, std::equal_to<>). Standard specialization varies by implementation/version, so check project’s standard library documentation.

// Concept example: lookup with string_view (must verify project policy/standard version)
// std::unordered_set<std::string, SVHash, SVEqual> etc.

Advanced: `node_type` / `extract` (C++17)

set/unordered_set can extract nodes and move to other containers, used for patterns that update keys without reallocation. 아래 코드는 cpp를 사용한 구현 예제입니다. 조건문으로 분기 처리를 수행합니다. 코드를 직접 실행해보면서 동작을 확인해보세요.

std::set<int> a{1, 2, 3};
auto nh = a.extract(2);
if (!nh.empty()) {
    nh.value() = 5;
    a.insert(std::move(nh));
}

`unordered_set` similar but watch rehash timing.

Advanced: Benchmark Method (Practically)

Fix: Compiler optimization (-O2/-O3), CPU fixed frequency (if possible), same seed.
Warmup: Start timer after cache/hash table warmup.
Metrics: Not just average but p95/p99, especially for unordered_* include worst-case hash input as separate case. 아래 코드는 cpp를 사용한 구현 예제입니다. 필요한 모듈을 import하고, 반복문으로 데이터를 처리합니다. 코드를 직접 실행해보면서 동작을 확인해보세요.

#include <chrono>
template<typename F>
auto bench(F&& f, int reps) {
    using clock = std::chrono::steady_clock;
    auto t0 = clock::now();
    for (int i = 0; i < reps; ++i) f();
    return clock::now() - t0;
}

Advanced: Debugging Guide

Symptom	Check
Slow in `unordered_set`	Hash quality, key collision, load factor, wrong `operator==`
Abnormal behavior during iteration	Rehash invalidates iterators — use `reserve` or copy-then-iterate
`set` sorting is strange	Check if comparison operation satisfies strict weak ordering

Advanced: Common Mistake Patterns (Additional)

Using vector as key in unordered_set with only default hash → Hash inefficient or definition missing. Use immutable key as identifier or define custom hash.
Wrong iterator increment during erase while iterating → Use it = container.erase(it) pattern.
Assuming const member read-only is safe in multithreading → unordered_set needs synchronization even for reads (read-only snapshot or shared_mutex design).

FAQ

Q1: Can beginners learn this?

A: Yes, this guide is written for beginners. Basic C++ syntax knowledge is sufficient.

Q2: Frequently used in practice?

A: Yes, very frequently. Essential concept in production projects.

Q3: Comparison with other languages?

A: C++ advantage is performance and control. Faster than Python, more flexible than Java.

Q4: How long to learn?

A: Basic concepts 1-2 hours, mastery about 1-2 weeks.

Q5: Recommended learning order?

Learn basic syntax
Follow simple examples
Apply to practical projects
Learn advanced techniques

Q6: Common mistakes?

Not initializing
Memory management mistakes
Not considering time complexity
Missing exception handling

Posts connected to this topic.

Practical Tips

Tips you can apply immediately in practice.

Debugging Tips

Check compiler warnings first when problems occur
Reproduce problem with simple test cases
Use print debugging to trace execution

Performance Tips

Choose correct data structure for use case
Measure before optimizing
Profile to find bottlenecks

Code Review Tips

Pre-check frequently pointed out parts
Follow team coding conventions
Write clear variable names

Practical Checklist

When Choosing Container

Need sorted order?
Need range queries (lower_bound)?
Only existence check?
What is expected element count?

When Using Custom Types

Defined operator< or comparator?
Defined hash and operator==?
Tested with sample data?

When Iterating

Modifying container during iteration?
Using safe erase pattern?
Considered iterator invalidation?

Keywords

C++, set, unordered_set, Set, STL, Duplicate Removal, Hash Table, Binary Search Tree, Iterator Invalidation, Custom Comparator, multiset