[2026] C++ Profiling: Find Bottlenecks with Timers, gprof, perf, and Callgrind

March 12, 2026 · 11 min read · Updated March 12, 2026 · Intermediate Tutorial

Key takeaways

C++ profiling guide: chrono timers, gprof, Linux perf, Valgrind Callgrind, and common pitfalls. Measure before you optimize.

What is profiling?

Profiling is the process of measuring where a program actually spends its time (and sometimes memory) so you can find bottlenecks instead of guessing. Before profiling you only see the steps; afterwards you know which one dominates:

// Before: you do not know what is slow
void process() {
    step1();
    step2();
    step3();
}
// After: step2 takes ~90% of the time
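To make the before/after picture concrete, here is a minimal sketch that times each step once with std::chrono::steady_clock and reports its share of the total. The step bodies and the helpers busyWork, timeSteps, and printTimings are hypothetical placeholders, not code from a real application; step2 is simply given about ten times more work so that it dominates.

```cpp
#include <chrono>
#include <cstdio>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real steps; step2 does ~10x more work.
volatile unsigned long long g_sink = 0;  // volatile so the loops are not optimized away

void busyWork(unsigned long long n) {
    for (unsigned long long i = 0; i < n; ++i) g_sink = g_sink + i;
}
void step1() { busyWork(1'000'000); }
void step2() { busyWork(10'000'000); }
void step3() { busyWork(1'000'000); }

struct StepTiming {
    std::string name;
    double ms;       // elapsed wall-clock time in milliseconds
    double percent;  // share of the total across all steps
};

// Runs each step once and times it with a monotonic clock.
std::vector<StepTiming> timeSteps(
    const std::vector<std::pair<std::string, std::function<void()>>>& steps) {
    using Clock = std::chrono::steady_clock;
    std::vector<StepTiming> out;
    double total = 0.0;
    for (const auto& [name, fn] : steps) {
        auto t0 = Clock::now();
        fn();
        auto t1 = Clock::now();
        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        out.push_back({name, ms, 0.0});
        total += ms;
    }
    for (auto& s : out) s.percent = total > 0.0 ? 100.0 * s.ms / total : 0.0;
    return out;
}

// Prints one line per step: name, milliseconds, and percentage of the total.
void printTimings(const std::vector<StepTiming>& timings) {
    for (const auto& t : timings)
        std::printf("%s: %.2f ms (%.1f%%)\n", t.name.c_str(), t.ms, t.percent);
}
```

Calling printTimings(timeSteps({{"step1", step1}, {"step2", step2}, {"step3", step3}})) prints one line per step; the exact numbers depend on the machine and build flags. A single run like this is only a first approximation, which is why the pitfalls later in this article matter.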

Basic timing

The simplest approach is to read a clock before and after the code you care about and print the difference:

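#include <chrono>
#include <iostream>

void measureTime() {
    // steady_clock is monotonic; high_resolution_clock may be an alias for
    // system_clock, which can jump when the system time is adjusted.
    auto start = std::chrono::steady_clock::now();

    volatile long long sum = 0;  // volatile: stops -O2 from deleting the loop
    for (int i = 0; i < 1000000; i++) {
        sum = sum + i;  // work
    }

    auto end = std::chrono::steady_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);

    std::cout << "Time: " << duration.count() << "ms" << std::endl;
}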
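If you time many blocks, the start/stop pattern can be wrapped in a small helper. The sketch below uses the hypothetical name timeMs and returns std::chrono::duration<double, std::milli>, so sub-millisecond results keep their fraction instead of being truncated to 0 by duration_cast<milliseconds>.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not a standard API): runs f once and returns the
// elapsed wall-clock time in milliseconds, measured with a monotonic clock.
template <typename F>
double timeMs(F&& f) {
    auto start = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

For example, double ms = timeMs([&] { process(); }); times one call of the process() function from the first example.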

Examples

Example 1: Scoped function timer

Wrapping the timer in a class ties the measurement to a scope (RAII): the constructor records the start time and the destructor prints the elapsed time when the enclosing function returns.

#include <chrono>
#include <iostream>
#include <string>

// RAII timer: starts on construction, prints the elapsed time on destruction.
class Timer {
    std::chrono::time_point<std::chrono::steady_clock> start;
    std::string name;

public:
    explicit Timer(const std::string& n)
        : start(std::chrono::steady_clock::now()), name(n) {}

    Timer(const Timer&) = delete;             // a copy would print twice
    Timer& operator=(const Timer&) = delete;

    ~Timer() {
        auto end = std::chrono::steady_clock::now();
        auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
        std::cout << name << ": " << duration.count() << "μs" << std::endl;
    }
};

void slowFunction() {
    Timer t("slowFunction");  // prints when slowFunction returns
    // ... work to measure ...
}

void fastFunction() {
    Timer t("fastFunction");
    // ... work to measure ...
}
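When a function runs thousands of times, printing on every call is noisy and itself slow. A possible variant, sketched here under the hypothetical name AccumulatingTimer, keeps the same RAII idea but adds the elapsed time to a caller-owned total and counts calls instead of printing.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical variant of Timer: on destruction it adds the elapsed time to
// a caller-owned total and increments a call counter instead of printing.
class AccumulatingTimer {
public:
    using Clock = std::chrono::steady_clock;

    AccumulatingTimer(Clock::duration& total, std::size_t& calls)
        : total_(total), calls_(calls), start_(Clock::now()) {}

    ~AccumulatingTimer() {
        total_ += Clock::now() - start_;
        ++calls_;
    }

    AccumulatingTimer(const AccumulatingTimer&) = delete;
    AccumulatingTimer& operator=(const AccumulatingTimer&) = delete;

private:
    Clock::duration& total_;
    std::size_t& calls_;
    Clock::time_point start_;  // declared last so the clock is read last
};
```

Report once at the end, for example with std::chrono::duration_cast<std::chrono::microseconds>(total).count() divided by calls for an average.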

Example 2: gprof

g++ -pg program.cpp -o program   # -pg adds gprof instrumentation
./program                        # writes gmon.out in the current directory
gprof program gmon.out > analysis.txt
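To try the gprof commands you need a program with a clear hotspot. The following is a hypothetical program.cpp body with illustrative names: a deliberately quadratic duplicate check next to a sort-based one. Call runWorkload from your own main(), for example with n = 20000; in the flat profile hasDuplicateSlow should account for most of the time, although exact numbers vary. gprof only reports functions that were not inlined, so with optimization enabled small functions may disappear from the report (GCC's -fno-inline keeps them separate).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Deliberately quadratic: compares every pair of elements.
bool hasDuplicateSlow(const std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i)
        for (std::size_t j = i + 1; j < v.size(); ++j)
            if (v[i] == v[j]) return true;
    return false;
}

// O(n log n) alternative: sort a copy, then look for equal neighbors.
bool hasDuplicateFast(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return std::adjacent_find(v.begin(), v.end()) != v.end();
}

// Hypothetical entry point to call from main(); returns true if both agree.
bool runWorkload(int n) {
    std::vector<int> data(n > 0 ? n : 0);
    for (int i = 0; i < n; ++i) data[i] = i;  // no duplicates: worst case for the slow version
    bool a = hasDuplicateSlow(data);
    bool b = hasDuplicateFast(data);
    return a == b;
}
```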

Example 3: perf

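perf record -g ./program   # sample the running program; -g also records call stacks
perf report                # browse the hottest functions stored in perf.data
perf stat ./program        # summary counters such as cycles and instructions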
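perf stat can also count specific hardware events selected with -e, for example perf stat -e cache-references,cache-misses ./program. The hypothetical pair of functions below makes such counters interesting: both sum the same row-major matrix, but sumColMajor strides through memory and usually produces many more cache misses once the matrix is larger than the cache. Which events are available depends on the CPU and kernel; some virtual machines expose none.

```cpp
#include <cstddef>
#include <vector>

// m holds a rows x cols matrix in row-major order (one flat vector).
// Walks memory sequentially: consecutive elements share cache lines.
long long sumRowMajor(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    long long sum = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            sum += m[r * cols + c];
    return sum;
}

// Same result, but each step jumps `cols` elements ahead in memory.
long long sumColMajor(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    long long sum = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            sum += m[r * cols + c];
    return sum;
}
```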

Example 4: Valgrind Callgrind

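valgrind --tool=callgrind ./program   # much slower than a normal run; writes callgrind.out.<pid>
kcachegrind callgrind.out.*           # GUI viewer (callgrind_annotate prints a text summary)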

Finding bottlenecks

Timing individual functions by hand gets tedious. A small profiler that accumulates total time and call counts per named region shows which region dominates:

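#include <chrono>
#include <cstddef>
#include <iostream>
#include <map>
#include <string>

// Minimal manual profiler: wrap a region in start(name) / end(name).
// Not thread-safe; a name cannot be nested inside itself (e.g. recursion).
class Profiler {
    using Clock = std::chrono::steady_clock;

    struct Entry {
        std::size_t count = 0;
        long long totalTime = 0;  // microseconds
    };

    std::map<std::string, Entry> entries;
    std::map<std::string, Clock::time_point> starts;  // regions currently open

public:
    void start(const std::string& name) {
        starts[name] = Clock::now();
    }

    void end(const std::string& name) {
        auto now = Clock::now();
        auto it = starts.find(name);
        if (it == starts.end()) return;  // end() without a matching start()
        Entry& e = entries[name];
        e.count++;
        e.totalTime += std::chrono::duration_cast<std::chrono::microseconds>(now - it->second).count();
        starts.erase(it);
    }

    void report() const {
        for (const auto& [name, entry] : entries) {
            // count is at least 1: entries are only created in end()
            std::cout << name << ": "
                      << entry.totalTime / static_cast<long long>(entry.count) << "μs avg"
                      << " (" << entry.count << " calls, " << entry.totalTime << "μs total)"
                      << std::endl;
        }
    }
};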
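Once totals are collected, the useful question is which region dominates overall. The helper below is a sketch: rankByTotal, ProfileEntry, and Hotspot are hypothetical names, and ProfileEntry mirrors the Entry fields above so the snippet compiles on its own. It ranks by total time rather than average, because a cheap region called very often can outweigh an expensive one called once.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Mirrors Profiler's Entry (redeclared so this snippet is self-contained).
struct ProfileEntry {
    std::size_t count = 0;
    long long totalTime = 0;  // microseconds
};

struct Hotspot {
    std::string name;
    long long totalTime;  // microseconds
    double percent;       // share of all recorded time
};

// Returns regions sorted by total time, largest first.
std::vector<Hotspot> rankByTotal(const std::map<std::string, ProfileEntry>& entries) {
    long long grand = 0;
    for (const auto& [name, e] : entries) grand += e.totalTime;

    std::vector<Hotspot> out;
    for (const auto& [name, e] : entries) {
        double pct = grand > 0
            ? 100.0 * static_cast<double>(e.totalTime) / static_cast<double>(grand)
            : 0.0;
        out.push_back({name, e.totalTime, pct});
    }
    std::sort(out.begin(), out.end(),
              [](const Hotspot& a, const Hotspot& b) { return a.totalTime > b.totalTime; });
    return out;
}
```

For example, a region averaging 50μs over 1000 calls (50ms total) ranks above one that takes 48ms in a single call, even though its per-call average is far smaller.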

Common pitfalls

Pitfall 1: Measurement overhead

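Reading the clock is not free. Timing every iteration of a tight loop pays that cost a million times, and if doWork() is short, the timer calls can cost as much as the work itself: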

// ❌ Measuring inside a tight loop
for (int i = 0; i < 1000000; i++) {
    auto start = std::chrono::high_resolution_clock::now();
    doWork();
    auto end = std::chrono::high_resolution_clock::now();
}
// ✅ Measure the whole loop
auto start = std::chrono::high_resolution_clock::now();
for (int i = 0; i < 1000000; i++) {
    doWork();
}
auto end = std::chrono::high_resolution_clock::now();
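Whether per-iteration timing is a problem depends on how expensive one clock read is compared with doWork(). A rough way to estimate it is to time many back-to-back calls, as in this sketch (averageNowCostNs is a hypothetical name). The figure varies with the OS, CPU, and clock source, so treat it as an order of magnitude only.

```cpp
#include <chrono>

// Estimates the average cost, in nanoseconds, of one steady_clock::now() call.
double averageNowCostNs(int iterations = 1'000'000) {
    if (iterations <= 0) return 0.0;
    using Clock = std::chrono::steady_clock;
    Clock::time_point last{};
    auto begin = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        last = Clock::now();  // the call being measured
    }
    auto end = Clock::now();
    (void)last;
    return std::chrono::duration<double, std::nano>(end - begin).count() / iterations;
}
```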

Pitfall 2: Unoptimized debug builds

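By default g++ does not optimize (-O0), so the hotspots an unoptimized build shows may not exist in the program you actually ship. Profile an optimized build and keep debug symbols (-g) so tools can map samples back to source lines:

# Unoptimized build: timings and hotspots can be misleading
g++ -g program.cpp
# ✅ Profile a release-like build, with symbols for source-level reports
g++ -O2 -g program.cpp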
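It is easy to benchmark the wrong build by accident. One option, sketched below with hypothetical function names, is to have the program report how it was compiled. It relies on __OPTIMIZE__, which GCC and Clang predefine when optimization is enabled, and on NDEBUG; MSVC does not define __OPTIMIZE__, so there this sketch always reports an unoptimized build.

```cpp
#include <string>

// True if this translation unit was compiled with optimization (GCC/Clang).
bool isOptimizedBuild() {
#if defined(__OPTIMIZE__)
    return true;
#else
    return false;
#endif
}

// Human-readable summary of the build mode, e.g. "optimized, asserts enabled".
std::string buildModeDescription() {
    std::string desc = isOptimizedBuild() ? "optimized" : "unoptimized";
#if defined(NDEBUG)
    desc += ", NDEBUG (asserts disabled)";
#else
    desc += ", asserts enabled";
#endif
    return desc;
}
```

For example, a benchmark could start with if (!isOptimizedBuild()) std::cerr << "warning: profiling an unoptimized build\n";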

Pitfall 3: Cache effects

// First run may be slow (cold cache)
// Later runs faster (warm cache)
// ✅ Do a warm-up run first, then run multiple times and average (or take the median)
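Running several times can be automated: do a few untimed warm-up runs, then time a fixed number of repetitions and summarize them. The sketch below (benchmark and RunStats are hypothetical names; the default counts are arbitrary) reports the mean, the median, and the minimum, since the median is less sensitive than the mean to occasional outliers such as interrupts.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

struct RunStats {
    double meanMs;
    double medianMs;
    double minMs;
};

// Warms up with untimed runs, then times `runs` repetitions of f.
RunStats benchmark(const std::function<void()>& f, int warmups = 3, int runs = 10) {
    using Clock = std::chrono::steady_clock;
    for (int i = 0; i < warmups; ++i) f();  // fill caches, page in code and data

    std::vector<double> samples;
    samples.reserve(runs > 0 ? runs : 0);
    for (int i = 0; i < runs; ++i) {
        auto t0 = Clock::now();
        f();
        auto t1 = Clock::now();
        samples.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    if (samples.empty()) return {0.0, 0.0, 0.0};

    std::sort(samples.begin(), samples.end());
    double sum = 0.0;
    for (double s : samples) sum += s;
    std::size_t n = samples.size();
    double median = (n % 2 == 1) ? samples[n / 2]
                                 : (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
    return {sum / n, median, samples.front()};
}
```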

Pitfall 4: Premature optimization

// ❌ Optimize before measuring
// ✅ Measure → find hotspot → optimize that area only
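Amdahl's law, a standard result rather than something specific to these tools, shows why the hotspot is the part worth optimizing: if a fraction p of the run time is made s times faster, the whole program speeds up by 1 / ((1 - p) + p / s). As a small C++ function (the name overallSpeedup is hypothetical):

```cpp
// Amdahl's law: overall speedup when a fraction p (0..1) of the run time
// is made s times faster (s > 0).
double overallSpeedup(double p, double s) {
    return 1.0 / ((1.0 - p) + p / s);
}
```

With the numbers from the first example, halving step2 (p = 0.9, s = 2) gives about 1.82x overall, while making a step that takes 5% of the time infinitely fast gives at most about 1.05x.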

Profiling tools (quick reference)

The basic invocation of each tool:

# gprof
g++ -pg program.cpp
./a.out
gprof a.out gmon.out
# perf (Linux)
perf record ./program
perf report
# Valgrind Callgrind
valgrind --tool=callgrind ./program
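# Instruments (macOS); newer Xcode releases replace the instruments CLI with xctrace
instruments -t "Time Profiler" ./program
xcrun xctrace record --template 'Time Profiler' --launch -- ./program
# Visual Studio Profiler (Windows): Debug > Performance Profiler in the IDE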

FAQ

Q1: When should I profile?

A: When you have a performance issue, before major optimizations, or as part of regular monitoring.

Q2: Which tool?

A: gprof for a quick start; perf for detail on Linux; Callgrind for accurate call graphs; Instruments on Mac.

Q3: Learning resources?

A: The book Optimized C++ (Kurt Guntheroth), the perf documentation, and the Valgrind documentation.

Keywords

C++, profiling, performance, optimization, gprof, perf, Callgrind.
