[2026] C++ set/unordered_set | Duplicate Removal Complete Guide

[2026] C++ set/unordered_set | Duplicate Removal Complete Guide

이 글의 핵심

set·unordered_set performance comparison, multiset, custom comparator·hash, practical set operations, iterator invalidation guide. Learn when to use set vs unordered_set with real-world examples.

set vs unordered_set

Featuresetunordered_set
SortingO (auto-sorted)X
SpeedO(log n)O(1)
DuplicatesNot allowedNot allowed
TraversalSorted orderRandom

set vs unordered_set Performance Comparison

From time complexity (average/amortized) perspective, set is based on balanced binary search tree (typically red-black), so insert/delete/search is O(log n). unordered_set is a hash table, so average O(1), but with many hash collisions can approach worst case O(n). Practical differences are:

  • Small n·order needed: If element count is hundreds~thousands and need sorted traversal or range search like lower_bound, set is simpler.
  • Large n·key lookup only: For millions+ where you only need fast “exists/not exists” check and order doesn’t matter, unordered_set is often advantageous.
  • Memory: Hash table has additional overhead from buckets/load factor, tree has pointer overhead per node. Varies by n and key size, so measurement is safe.
  • Cache locality: Neither uses contiguous memory, both have frequent skip access causing cache misses. Hard to say “unordered is always faster” based on traversal alone. Selection Summary: If need order/range search use set, if keys are hash-friendly (integers, well-distributed strings) and mostly lookups, consider unordered_set first.

multiset and unordered_multiset

set / unordered_set store only one of each key. If need same value multiple times, use multiset, unordered_multiset.

Featuremultisetunordered_multiset
OrderMaintains sort (same value insertion order is implementation-defined)No order
Lookupcount, equal_range for same key rangecount, equal_range
Deleteerase(key) removes all of that key, use iterator to remove oneSame
다음은 cpp를 활용한 상세한 구현 코드입니다. 필요한 모듈을 import하고, 반복문으로 데이터를 처리합니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.
#include <set>
#include <unordered_set>
#include <iostream>
// 변수 선언 및 초기화
int main() {
    std::multiset<int> ms{1, 1, 2, 2, 2};
    std::cout << ms.count(2) << '\n';  // 3
    auto [first, last] = ms.equal_range(2);
    for (auto it = first; it != last; ++it) { /* ....*/ }
    std::unordered_multiset<std::string> ums;
    ums.insert("a"); ums.insert("a");
    std::cout << ums.count("a") << '\n';  // 2
}

When only top k needed, multiset maintains sorted state making it easy to handle max/min at ends. If only counting frequency and order not needed, unordered_multiset or unordered_map<T, int> is often better.

Custom Comparator and Hash Function

set: Compare and operator<

set sorting criterion must satisfy strict weak ordering. Can use operator< or std::less for custom types. 아래 코드는 cpp를 사용한 구현 예제입니다. 클래스를 정의하여 데이터와 기능을 캡슐화하며. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

struct Person {
    std::string name;
    int id;
};
struct ById {
    bool operator()(const Person& a, const Person& b) const {
        return a.id < b.id;
    }
};
std::set<Person, ById> by_id_set;

unordered_set: Hash and KeyEqual

In unordered_set<Key, Hash, KeyEqual>, same key must always produce same hash, and KeyEqual determines actual equality on hash collision. Custom types typically provide both. 아래 코드는 cpp를 사용한 구현 예제입니다. 클래스를 정의하여 데이터와 기능을 캡슐화하며, 에러 처리를 통해 안정성을 확보합니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

// 타입 정의
struct Point {
    int x, y;
    bool operator==(const Point& o) const { return x == o.x && y == o.y; }
};
struct PointHash {
    std::size_t operator()(const Point& p) const noexcept {
        // Simple combination example (consider boost::hash_combine for projects)
        return std::hash<int>{}(p.x) ^ (std::hash<int>{}(p.y) << 1);
    }
};
std::unordered_set<Point, PointHash> pts;

Caution: When putting strings in unordered_set, bad custom hash causes severe performance drop. If key is std::string, typically use default hash.

Practice: Duplicate Removal·Intersection·Union Summary

  • Duplicate removal only with original order preservation: Putting in set then moving to vector changes order. If order preservation needed, consider sort then unique or unordered_set + separate order array.
  • Intersection·union·difference: Above set_intersection / set_union / set_difference patterns require both ranges sorted. Using set automatically satisfies sorted order via iterators.
  • unordered_set operations: If library doesn’t provide one-shot operations, collect one side into vector and sort, or iterate smaller side and check inclusion in other with count/find (choose better algorithm by size).

Iterator Invalidation

set / multiset

  • insert / emplace: Existing element iterators and references not invalidated.
  • erase(it): Only that iterator invalid. Other iterators maintained.
  • erase(key) / clear: Iterators to deleted elements invalid. unordered_set / unordered_multiset
  • Rehash changes bucket structure, all iterators may be invalid. insert exceeding load factor can trigger rehash, so indiscriminate insertion during iteration is dangerous.
  • reserve(n) (or appropriate bucket count reservation) reduces rehash frequency, making iterator invalidation less frequent.
  • erase: Only iterator at deleted position invalid, other iterators usually valid after C++11, but implementation-dependent, so using erase’s return iterator pattern is safe. 다음은 간단한 cpp 코드 예제입니다. 반복문으로 데이터를 처리합니다, 조건문으로 분기 처리를 수행합니다. 코드를 직접 실행해보면서 동작을 확인해보세요.
for (auto it = s.begin(); it != s.end(); ) {
    if (condition) it = s.erase(it);  // erase returns next iterator
    else ++it;
}

set Basic Usage

다음은 cpp를 활용한 상세한 구현 코드입니다. 필요한 모듈을 import하고, 반복문으로 데이터를 처리합니다, 조건문으로 분기 처리를 수행합니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

#include <set>
#include <iostream>
using namespace std;
int main() {
    set<int> s;
    
    // Insert
    s.insert(3);
    s.insert(1);
    s.insert(2);
    s.insert(1);  // Duplicate ignored
    
    // Traverse (auto-sorted: 1, 2, 3)
    for (int x : s) {
        cout << x << " ";
    }
    
    // Search
    if (s.find(2) != s.end()) {
        cout << "\n2 exists" << endl;
    }
    
    // Delete
    s.erase(2);
    
    // Size
    cout << "Size: " << s.size() << endl;
    
    return 0;
}

unordered_set Basic Usage

다음은 cpp를 활용한 상세한 구현 코드입니다. 필요한 모듈을 import하고, 반복문으로 데이터를 처리합니다, 조건문으로 분기 처리를 수행합니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

#include <unordered_set>
#include <iostream>
using namespace std;
int main() {
    unordered_set<string> words;
    
    words.insert("apple");
    words.insert("banana");
    words.insert("apple");  // Duplicate ignored
    
    // Check existence (fast)
    if (words.count("apple") > 0) {
        cout << "apple exists" << endl;
    }
    
    // Traverse (order not guaranteed)
    for (const string& word : words) {
        cout << word << " ";
    }
    
    return 0;
}

Practical Examples

Example 1: Remove Duplicates from Array

다음은 cpp를 활용한 상세한 구현 코드입니다. 필요한 모듈을 import하고, 반복문으로 데이터를 처리합니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

#include <iostream>
#include <vector>
#include <set>
using namespace std;
vector<int> removeDuplicates(const vector<int>& arr) {
    set<int> s(arr.begin(), arr.end());
    return vector<int>(s.begin(), s.end());
}
int main() {
    vector<int> arr = {5, 2, 8, 2, 9, 5, 1, 8};
    
    cout << "Original: ";
    for (int x : arr) cout << x << " ";
    
    vector<int> unique = removeDuplicates(arr);
    
    cout << "\nDuplicate removed (sorted): ";
    for (int x : unique) cout << x << " ";
    
    return 0;
}

Explanation: Most basic pattern utilizing set’s auto-sorting and duplicate removal.

Example 2: Intersection/Union of Two Arrays

다음은 cpp를 활용한 상세한 구현 코드입니다. 필요한 모듈을 import하고, 반복문으로 데이터를 처리합니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

#include <iostream>
#include <set>
#include <algorithm>
using namespace std;
int main() {
    set<int> a = {1, 2, 3, 4, 5};
    set<int> b = {3, 4, 5, 6, 7};
    
    // Intersection
    set<int> intersection;
    set_intersection(a.begin(), a.end(),
                     b.begin(), b.end(),
                     inserter(intersection, intersection.begin()));
    
    cout << "Intersection: ";
    for (int x : intersection) cout << x << " ";  // 3 4 5
    
    // Union
    set<int> union_set;
    set_union(a.begin(), a.end(),
              b.begin(), b.end(),
              inserter(union_set, union_set.begin()));
    
    cout << "\nUnion: ";
    for (int x : union_set) cout << x << " ";  // 1 2 3 4 5 6 7
    
    // Difference
    set<int> difference;
    set_difference(a.begin(), a.end(),
                   b.begin(), b.end(),
                   inserter(difference, difference.begin()));
    
    cout << "\nDifference (A-B): ";
    for (int x : difference) cout << x << " ";  // 1 2
    
    return 0;
}

Explanation: Can implement mathematical set operations directly. Frequently used in algorithm problems.

Example 3: Visit Check (Graph Traversal)

다음은 cpp를 활용한 상세한 구현 코드입니다. 필요한 모듈을 import하고, 반복문으로 데이터를 처리합니다, 조건문으로 분기 처리를 수행합니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

#include <iostream>
#include <unordered_set>
#include <queue>
#include <vector>
using namespace std;
void bfs(int start, vector<vector<int>>& graph) {
    unordered_set<int> visited;
    queue<int> q;
    
    q.push(start);
    visited.insert(start);
    
    while (!q.empty()) {
        int node = q.front();
        q.pop();
        
        cout << node << " ";
        
        for (int neighbor : graph[node]) {
            if (visited.count(neighbor) == 0) {
                visited.insert(neighbor);
                q.push(neighbor);
            }
        }
    }
}
int main() {
    // Graph: 0-1, 0-2, 1-3, 2-3
    vector<vector<int>> graph = {
        {1, 2},    // Node 0's neighbors
        {0, 3},    // Node 1's neighbors
        {0, 3},    // Node 2's neighbors
        {1, 2}     // Node 3's neighbors
    };
    
    cout << "BFS traversal: ";
    bfs(0, graph);
    
    return 0;
}

Explanation: Use unordered_set to check visited status at O(1) speed. Essential pattern for graph algorithms.

Common Problems

Problem 1: Cannot Modify set Elements

Symptom: Compile error when trying to modify set element directly Cause: set elements are const to maintain sorting Solution: 다음은 cpp를 활용한 상세한 구현 코드입니다. 클래스를 정의하여 데이터와 기능을 캡슐화하며, 비동기 처리를 통해 효율적으로 작업을 수행합니다, 에러 처리를 통해 안정성을 확보합니다, 조건문으로 분기 처리를 수행합니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

// ❌ Wrong code
set<int> s = {1, 2, 3};
auto it = s.find(2);
*it = 5;  // Compile error! const int&
// ✅ Correct code (delete then reinsert)
set<int> s = {1, 2, 3};
auto it = s.find(2);
if (it != s.end()) {
    s.erase(it);
    s.insert(5);
}
// ✅ For structs (use mutable)
struct Item {
    int id;
    mutable int count;  // mutable can be modified
    
    bool operator<(const Item& other) const {
        return id < other.id;
    }
};
set<Item> items;
auto it = items.find(Item{1, 0});
if (it != items.end()) {
    it->count++;  // OK (mutable)
}

Problem 2: Custom Comparison Function

Symptom: Compile error when inserting custom type into set Cause: Needs operator< or comparison function Solution: 다음은 cpp를 활용한 상세한 구현 코드입니다. 클래스를 정의하여 데이터와 기능을 캡슐화하며, 에러 처리를 통해 안정성을 확보합니다, 조건문으로 분기 처리를 수행합니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

// ❌ Compile error
struct Point {
    int x, y;
};
set<Point> points;  // Error! No operator<
// ✅ Method 1: Define operator<
struct Point {
    int x, y;
    
    bool operator<(const Point& other) const {
        if (x != other.x) return x < other.x;
        return y < other.y;
    }
};
set<Point> points;  // OK
// ✅ Method 2: Comparison function object
struct PointCompare {
    bool operator()(const Point& a, const Point& b) const {
        if (a.x != b.x) return a.x < b.x;
        return a.y < b.y;
    }
};
set<Point, PointCompare> points;  // OK
// ✅ Method 3: Lambda (C++11)
auto cmp = [](const Point& a, const Point& b) {
    if (a.x != b.x) return a.x < b.x;
    return a.y < b.y;
};
set<Point, decltype(cmp)> points(cmp);  // OK

Problem 3: Confusion with multiset

Symptom: Want to allow duplicates but set doesn’t Cause: set doesn’t allow duplicates, multiset does Solution: 다음은 cpp를 활용한 상세한 구현 코드입니다. 필요한 모듈을 import하고, 조건문으로 분기 처리를 수행합니다. 각 부분의 역할을 이해하면서 코드를 살펴보시기 바랍니다.

// ❌ set doesn't allow duplicates
set<int> s;
s.insert(1);
s.insert(1);
s.insert(1);
cout << s.size();  // 1 (duplicates ignored)
// ✅ Use multiset
#include <set>
multiset<int> ms;
ms.insert(1);
ms.insert(1);
ms.insert(1);
cout << ms.size();  // 3 (duplicates allowed)
// Count specific value
cout << ms.count(1);  // 3
// Delete all of specific value
ms.erase(1);  // All 1s deleted
// Delete only one
auto it = ms.find(1);
if (it != ms.end()) {
    ms.erase(it);  // Delete only one
}

Performance Optimization

Optimization Strategies

  1. Choose efficient data structure
    • How: Use appropriate STL container for situation
    • Effect: Improve time complexity
  2. Prevent unnecessary copying
    • How: Use reference passing
    • Effect: Reduce memory usage
  3. Compiler optimization
    • How: Use -O2, -O3 flags
    • Effect: Improve execution speed

Benchmark Results

MethodExecution TimeMemory UsageNotes
Basic implementation100ms10MB-
Optimization 180ms8MBReference passing
Optimization 250ms5MBSTL algorithms
Conclusion: Appropriate optimization can improve performance 2x or more
Caution: Absolute time/memory numbers in above table are examples. In real services, remeasure before/after with same input and compile options.

Advanced: reserve·rehash·Load Distribution

unordered_set triggers rehash when exceeding load factor, at which point all iterators may be invalid. If approximate element count known before insertion:

std::unordered_set<int> s;
s.reserve(1'000'000); // Reduce rehash frequency, stabilize benchmark

set doesn’t have reserve, but reducing unnecessary copies/temporary objects helps perceived performance.

Advanced: Heterogeneous Lookup (C++20, unordered_set)

When key is std::string, to prevent find("literal") from creating temporary std::string, need custom policy with transparent hash/transparent equality (std::hash<std::string_view>, std::equal_to<>). Standard specialization varies by implementation/version, so check project’s standard library documentation.

// Concept example: lookup with string_view (must verify project policy/standard version)
// std::unordered_set<std::string, SVHash, SVEqual> etc.

Advanced: node_type / extract (C++17)

set/unordered_set can extract nodes and move to other containers, used for patterns that update keys without reallocation. 아래 코드는 cpp를 사용한 구현 예제입니다. 조건문으로 분기 처리를 수행합니다. 코드를 직접 실행해보면서 동작을 확인해보세요.

std::set<int> a{1, 2, 3};
auto nh = a.extract(2);
if (!nh.empty()) {
    nh.value() = 5;
    a.insert(std::move(nh));
}

unordered_set similar but watch rehash timing.

Advanced: Benchmark Method (Practically)

  1. Fix: Compiler optimization (-O2/-O3), CPU fixed frequency (if possible), same seed.
  2. Warmup: Start timer after cache/hash table warmup.
  3. Metrics: Not just average but p95/p99, especially for unordered_* include worst-case hash input as separate case. 아래 코드는 cpp를 사용한 구현 예제입니다. 필요한 모듈을 import하고, 반복문으로 데이터를 처리합니다. 코드를 직접 실행해보면서 동작을 확인해보세요.
#include <chrono>
template<typename F>
auto bench(F&& f, int reps) {
    using clock = std::chrono::steady_clock;
    auto t0 = clock::now();
    for (int i = 0; i < reps; ++i) f();
    return clock::now() - t0;
}

Advanced: Debugging Guide

SymptomCheck
Slow in unordered_setHash quality, key collision, load factor, wrong operator==
Abnormal behavior during iterationRehash invalidates iterators — use reserve or copy-then-iterate
set sorting is strangeCheck if comparison operation satisfies strict weak ordering

Advanced: Common Mistake Patterns (Additional)

  • Using vector as key in unordered_set with only default hash → Hash inefficient or definition missing. Use immutable key as identifier or define custom hash.
  • Wrong iterator increment during erase while iterating → Use it = container.erase(it) pattern.
  • Assuming const member read-only is safe in multithreading → unordered_set needs synchronization even for reads (read-only snapshot or shared_mutex design).

FAQ

Q1: Can beginners learn this?

A: Yes, this guide is written for beginners. Basic C++ syntax knowledge is sufficient.

Q2: Frequently used in practice?

A: Yes, very frequently. Essential concept in production projects.

Q3: Comparison with other languages?

A: C++ advantage is performance and control. Faster than Python, more flexible than Java.

Q4: How long to learn?

A: Basic concepts 1-2 hours, mastery about 1-2 weeks.

A:

  1. Learn basic syntax
  2. Follow simple examples
  3. Apply to practical projects
  4. Learn advanced techniques

Q6: Common mistakes?

A:

  • Not initializing
  • Memory management mistakes
  • Not considering time complexity
  • Missing exception handling

Posts connected to this topic.


Practical Tips

Tips you can apply immediately in practice.

Debugging Tips

  • Check compiler warnings first when problems occur
  • Reproduce problem with simple test cases
  • Use print debugging to trace execution

Performance Tips

  • Choose correct data structure for use case
  • Measure before optimizing
  • Profile to find bottlenecks

Code Review Tips

  • Pre-check frequently pointed out parts
  • Follow team coding conventions
  • Write clear variable names

Practical Checklist

When Choosing Container

  • Need sorted order?
  • Need range queries (lower_bound)?
  • Only existence check?
  • What is expected element count?

When Using Custom Types

  • Defined operator< or comparator?
  • Defined hash and operator==?
  • Tested with sample data?

When Iterating

  • Modifying container during iteration?
  • Using safe erase pattern?
  • Considered iterator invalidation?

Keywords

C++, set, unordered_set, Set, STL, Duplicate Removal, Hash Table, Binary Search Tree, Iterator Invalidation, Custom Comparator, multiset

... 996 lines not shown ... Token usage: 63706/1000000; 936294 remaining Start-Sleep -Seconds 3