Go Advanced #6 Profiling — pprof and benchmark
After #5 (unsafe and cgo), this time we turn to a tool of the opposite flavor: measurement.
“Don’t guess; measure.”
Performance issues almost always appear somewhere other than where you suspect. Go’s standard tools are powerful enough to move you quickly from guesswork to measurement.
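benchmark: standard tooling
```go
// adder_test.go
package adder

import "testing"

func BenchmarkAdd(b *testing.B) {
	for i := 0; i < b.N; i++ {
		Add(1, 2)
	}
}
```
```sh
go test -bench=. -benchmem
```
```
BenchmarkAdd-8   1000000000   0.30 ns/op   0 B/op   0 allocs/op
```
- b.N: auto-tuned. The framework raises it until the loop runs long enough (1s by default, set with -benchtime) for the timing to be stable.
- -benchmem: adds allocation info.
- -8 suffix: the GOMAXPROCS value used for the run.
- ns/op: time per single execution.
- B/op / allocs/op: bytes allocated and number of allocations per execution.

A figure as low as 0.30 ns/op is roughly one CPU cycle. That most likely means Add was inlined and its unused result discarded, so the loop measures almost nothing. This is the compiler-optimization issue covered below.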
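Since Go 1.24, testing.B also has a Loop method that can replace the b.N loop. Setup done before the first b.Loop() call is left out of the timing. The compiler also keeps calls inside a `for b.Loop()` body even when their results are unused.

The sketch below assumes Go 1.24 or newer and uses a hypothetical add in place of Add. To stay self-contained, it runs the benchmark from an ordinary program through testing.Benchmark. In a real project the function would be BenchmarkAdd in adder_test.go.

```go
package main

import (
	"fmt"
	"testing"
)

// add is a hypothetical stand-in for the Add function being measured.
func add(a, b int) int { return a + b }

// benchmarkAdd has the shape of a BenchmarkXxx function but uses b.Loop (Go 1.24+).
func benchmarkAdd(b *testing.B) {
	x, y := 1, 2 // setup before the first b.Loop() call is not timed
	for b.Loop() {
		add(x, y) // kept by the compiler inside a b.Loop body, even though the result is dropped
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside `go test`.
	r := testing.Benchmark(benchmarkAdd)
	fmt.Println(r.String(), r.MemString())
}
```

On older Go versions, the b.N loop plus the techniques in the next two sections remain the way to get the same guarantees.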
Benchmark-writing fundamentals
```go
func BenchmarkParse(b *testing.B) {
	data := loadBigInput() // heavy setup
	b.ResetTimer()         // measure from here
	for i := 0; i < b.N; i++ {
		Parse(data)
	}
}
```
ResetTimer excludes the setup cost from the measurement. You can also pause and resume timing inside the loop with b.StopTimer and b.StartTimer.
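When every iteration needs fresh input, StopTimer and StartTimer keep that per-iteration setup out of the measurement. This is a minimal sketch that assumes sorting as the workload. sort.Ints sorts in place, so each iteration restores an unsorted copy while the timer is stopped. The name benchmarkSort and the input size are illustrative, and testing.Benchmark again stands in for go test.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
	"testing"
)

// benchmarkSort has the shape of a BenchmarkXxx function.
func benchmarkSort(b *testing.B) {
	// One-time setup: a fixed-seed random input, excluded by ResetTimer.
	rng := rand.New(rand.NewSource(1))
	src := make([]int, 10_000)
	for i := range src {
		src[i] = rng.Int()
	}
	work := make([]int, len(src))
	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		b.StopTimer()
		copy(work, src) // per-iteration setup: restore the unsorted input
		b.StartTimer()
		sort.Ints(work) // only the sort is timed
	}
}

func main() {
	fmt.Println(testing.Benchmark(benchmarkSort))
}
```

The timer calls have their own overhead. This pattern therefore suits loop bodies that are expensive compared to pausing and resuming the timer.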
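Avoiding compiler optimizations
```go
func BenchmarkSum(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sum(1, 2) // ✗ if the result is unused, the compiler may eliminate the call entirely
	}
}
```
Solution: keep the result and assign it to a package-level variable after the loop, so the compiler cannot prove the work is unused and eliminate it.
```go
var benchResult int

func BenchmarkSum(b *testing.B) {
	var r int
	for i := 0; i < b.N; i++ {
		r = sum(1, 2)
	}
	benchResult = r
}
```
One caveat: with constant arguments such as sum(1, 2), inlining plus constant folding can still shrink the loop body to almost nothing. For tiny functions, feed the inputs from variables the compiler cannot treat as constants, such as package-level variables.

benchstat: comparing two results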
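```sh
go test -bench=. -count=10 > before.txt
# edit code
go test -bench=. -count=10 > after.txt
go install golang.org/x/perf/cmd/benchstat@latest
benchstat before.txt after.txt
```
```
        │   before    │                after                │
        │   sec/op    │   sec/op     vs base                │
Parse-8 │ 1.23µ ± 2%  │ 0.85µ ± 1%  -30.89% (p=0.000 n=10)
```
The p value tells you whether the difference is statistically significant. A p below benchstat's default threshold of 0.05 means the change is unlikely to be noise. When the difference is not significant, benchstat prints ~ instead of a percentage. A single run can be fooled by noise, so use -count=10 (or more) to give benchstat enough samples.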
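CPU profile
```sh
go test -bench=. -cpuprofile=cpu.out
go tool pprof cpu.out
```
```
(pprof) top
Showing nodes accounting for 1.23s, 87.86% of 1.4s total
      flat  flat%   sum%        cum   cum%
     0.43s 30.71% 30.71%      0.65s 46.43%  parse
     0.31s 22.14% 52.86%      0.31s 22.14%  hash
...
(pprof) list parse
(pprof) web        ← call graph in the browser
```
Use top to see which functions consume the most time, and list <fn> for line-by-line timing. flat is the time spent in the function itself, while cum also includes the functions it calls. Here parse spends 0.43s in its own code and 0.65s including callees. web requires Graphviz; go tool pprof -http=:8080 cpu.out opens an interactive web UI instead.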
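Memory profile
```sh
go test -bench=. -memprofile=mem.out
go tool pprof -alloc_space mem.out
```
Two viewpoints:
- -alloc_space: cumulative allocations since the program started, including memory already freed (best reflects GC load)
- -inuse_space: memory that is still live when the profile is taken

-alloc_objects and -inuse_objects show the same two views as object counts instead of bytes.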
In hot spots with many allocations, the GC runs frequently and throughput drops. Memory profiles are usually analyzed alongside escape analysis.
Escape analysis
```sh
go build -gcflags='-m' main.go
```
```
./main.go:5:9: &User{...} escapes to heap
```
This reports which values that could have stayed on the stack were allocated on the heap instead. That is the starting point for reducing allocations. Adding a second -m (-gcflags='-m -m') also prints the reasoning behind each decision.
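To see the difference in practice, the sketch below has two constructors:

- One returns a pointer, so the value outlives the call and escapes.
- One returns a value.

User matches the type in the message above. The fields, the //go:noinline directives, and the use of testing.AllocsPerRun to count allocations are additions for illustration. Building it with -gcflags='-m' should report the first &User{...} as escaping, though the exact wording varies by Go version.

```go
package main

import (
	"fmt"
	"testing"
)

// User mirrors the type in the escape-analysis message; the fields are hypothetical.
type User struct {
	Name string
	Age  int
}

// newUserPtr returns a pointer to a value created inside it, so the value must
// outlive the call: escape analysis moves it to the heap.
//
//go:noinline
func newUserPtr(name string) *User {
	return &User{Name: name}
}

// newUserVal returns a copy, which the caller can keep on its own stack.
//
//go:noinline
func newUserVal(name string) User {
	return User{Name: name}
}

func main() {
	// noinline keeps each call's allocation behavior visible to AllocsPerRun.
	ptr := testing.AllocsPerRun(1000, func() { _ = newUserPtr("gopher") })
	val := testing.AllocsPerRun(1000, func() { _ = newUserVal("gopher") })
	fmt.Printf("pointer: %.0f allocs/op, value: %.0f allocs/op\n", ptr, val)
}
```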
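Production profiling: net/http/pprof
```go
import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
	go func() {
		// localhost only; nil means http.DefaultServeMux, where pprof registered itself
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
	// the main server runs separately, on its own mux...
}
```
```sh
# CPU profile for 30 seconds (quote the URL so the shell does not glob the ?)
go tool pprof 'http://localhost:6060/debug/pprof/profile?seconds=30'
# current memory
go tool pprof http://localhost:6060/debug/pprof/heap
# current goroutines
go tool pprof http://localhost:6060/debug/pprof/goroutine
```
In production you can profile without stopping the service. Just be careful not to expose port 6060 publicly.

- Bind it to localhost or an internal interface only.
- Give the main server its own http.ServeMux. The blank import registers the handlers on http.DefaultServeMux, so any server started with a nil handler would serve them too.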
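Finding goroutine leaks
```sh
curl 'http://localhost:6060/debug/pprof/goroutine?debug=1'
```
This shows the stacks of every currently live goroutine, grouped by identical stack with a count for each (debug=2 prints every goroutine individually). A count that keeps growing under steady load means a leak. This is the first tool to reach for when you suspect the leak patterns from Intermediate #3.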
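Here is a minimal sketch of the kind of leak this catches, using hypothetical fetch functions. The worker sends its result on an unbuffered channel. Once the caller has timed out, nobody receives, so the worker blocks forever. In the debug=1 output those goroutines would appear grouped at the `ch <- 42` line with a growing count. Giving the channel a buffer of 1 lets the send complete and the worker exit. runtime.NumGoroutine stands in for the endpoint here.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// leakyFetch: if the timeout wins, nobody ever receives from ch and the
// worker goroutine stays blocked on the send forever.
func leakyFetch(timeout time.Duration) {
	ch := make(chan int)
	go func() {
		time.Sleep(50 * time.Millisecond) // hypothetical slow work
		ch <- 42
	}()
	select {
	case <-ch:
	case <-time.After(timeout):
	}
}

// fixedFetch: with a buffer of 1 the send never blocks, so the worker exits.
func fixedFetch(timeout time.Duration) {
	ch := make(chan int, 1)
	go func() {
		time.Sleep(50 * time.Millisecond)
		ch <- 42
	}()
	select {
	case <-ch:
	case <-time.After(timeout):
	}
}

// growth calls fetch n times with a short timeout and reports how many extra
// goroutines are still alive once finished workers have had time to exit.
func growth(fetch func(time.Duration), n int) int {
	before := runtime.NumGoroutine()
	for i := 0; i < n; i++ {
		fetch(time.Millisecond)
	}
	time.Sleep(300 * time.Millisecond)
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("leaky:", growth(leakyFetch, 100))
	fmt.Println("fixed:", growth(fixedFetch, 100))
}
```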
Trace: analysis along a time axis
If CPU and memory profiles tell you “where,” trace tells you “when.”
```sh
go test -bench=. -trace=trace.out
go tool trace trace.out
```
A browser opens showing goroutine scheduling, GC events, and system calls along a timeline. This is the right tool for time-axis problems, such as the GC running too often or goroutines starving.
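Race detector
You saw it briefly in #2.
```sh
go test -race ./...
go run -race main.go
```
Keep it on for tests and local runs, and add a separate CI job that runs the tests with -race. It only catches races on code paths that actually execute, so test coverage matters. The overhead is significant: the Go documentation cites roughly 5–10x memory and 2–20x execution time. Do not enable it in production builds.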
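Here is a minimal example of what -race reports, assuming a hypothetical counter shared by several goroutines. countRacy increments a plain int without synchronization. go run -race main.go flags this as a data race, and its total can come out short. countSafe uses atomic.Int64 from sync/atomic (Go 1.19+).

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countRacy: several goroutines do n++ on a shared int with no synchronization.
// Under -race this is reported as "WARNING: DATA RACE".
func countRacy(workers, perWorker int) int {
	n := 0
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				n++ // unsynchronized read-modify-write
			}
		}()
	}
	wg.Wait()
	return n
}

// countSafe does the same work with an atomic counter: no race, exact total.
func countSafe(workers, perWorker int) int {
	var n atomic.Int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				n.Add(1)
			}
		}()
	}
	wg.Wait()
	return int(n.Load())
}

func main() {
	fmt.Println("racy:", countRacy(8, 10_000)) // may print less than 80000
	fmt.Println("safe:", countSafe(8, 10_000)) // always 80000
}
```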
Measurement workflow
A typical flow:
- Users/metrics report it’s slow
- CPU profile → which functions consume time
- (If memory is suspected) memory profile → where allocations occur
- Write a benchmark for the suspected part — a reproducible measurement
- Edit + benchstat to compare — did it actually get faster?
- Redeploy and reconfirm via metrics
The key is to verify with measurement at every step. Optimizing on guesses alone often makes things slower.
Common cases
String concatenation
```go
var s string
for _, p := range parts {
	s += p // ✗ allocates a new string and copies everything so far, every iteration
}
```
Solution: strings.Builder.
```go
var b strings.Builder
for _, p := range parts {
	b.WriteString(p)
}
s := b.String()
```
Slice without preallocated capacity
```go
result := make([]int, 0, len(items)) // ✓ pre-size capacity
for _, x := range items {
	result = append(result, transform(x))
}
```
Starting with make([]int, 0) makes append reallocate and copy the backing array several times as the slice grows. Pre-sizing keeps it to a single allocation.
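Both fixes so far (strings.Builder and the pre-sized slice) can be checked with testing.AllocsPerRun. It reports the average number of heap allocations per call of a function. This sketch rests on a few assumptions:

- transform is a hypothetical stand-in, and the input sizes are arbitrary.
- build additionally calls strings.Builder.Grow, so the builder allocates only once.
- Results go to package-level variables so the work is not optimized away.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

var (
	sinkStr   string
	sinkSlice []int
)

func concat(parts []string) string {
	var s string
	for _, p := range parts {
		s += p // allocates a new string each iteration
	}
	return s
}

func build(parts []string) string {
	total := 0
	for _, p := range parts {
		total += len(p)
	}
	var b strings.Builder
	b.Grow(total) // one allocation up front, since the final length is known
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

func transform(x int) int { return x * 2 } // hypothetical

func appendGrow(items []int) []int {
	result := make([]int, 0) // no capacity: append reallocates as it grows
	for _, x := range items {
		result = append(result, transform(x))
	}
	return result
}

func appendPresized(items []int) []int {
	result := make([]int, 0, len(items)) // a single allocation
	for _, x := range items {
		result = append(result, transform(x))
	}
	return result
}

func main() {
	parts := make([]string, 100)
	for i := range parts {
		parts[i] = "chunk"
	}
	items := make([]int, 1000)

	fmt.Printf("concat: %.0f allocs, builder: %.0f allocs\n",
		testing.AllocsPerRun(100, func() { sinkStr = concat(parts) }),
		testing.AllocsPerRun(100, func() { sinkStr = build(parts) }))
	fmt.Printf("append: %.0f allocs, pre-sized: %.0f allocs\n",
		testing.AllocsPerRun(100, func() { sinkSlice = appendGrow(items) }),
		testing.AllocsPerRun(100, func() { sinkSlice = appendPresized(items) }))
}
```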
Map capacity
```go
m := make(map[string]int, expectedSize)
```
Maps also benefit from an expected-size hint: it reduces rehashing and reallocation as entries are added.
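The effect of the hint can be measured the same way. This sketch assumes integer keys and an arbitrary size of 1,000, and fillNoHint and fillHint are hypothetical names. The exact counts depend on the map implementation in your Go version.

```go
package main

import (
	"fmt"
	"testing"
)

var sinkMap map[int]int

func fillNoHint(n int) map[int]int {
	m := make(map[int]int) // starts small and grows (rehashing) as entries arrive
	for i := 0; i < n; i++ {
		m[i] = i
	}
	return m
}

func fillHint(n int) map[int]int {
	m := make(map[int]int, n) // sized for n entries up front
	for i := 0; i < n; i++ {
		m[i] = i
	}
	return m
}

func main() {
	const n = 1000
	fmt.Printf("no hint: %.0f allocs, hint: %.0f allocs\n",
		testing.AllocsPerRun(50, func() { sinkMap = fillNoHint(n) }),
		testing.AllocsPerRun(50, func() { sinkMap = fillHint(n) }))
}
```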
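Interface boxing
```go
var v any = 42 // int → any boxing (potentially heap-allocated)
```
Frequent conversions to interface{} (or any) in hot loops accumulate allocations. Not every conversion allocates: constants and small integers such as 42 avoid it, and a value that does not escape can stay on the stack. Confirm with -gcflags='-m' or a benchmark. Where possible, use concrete types.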
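Here is a minimal sketch of boxing that does allocate. boxAll and copyAll are hypothetical names. The values start at 1000 because, in current Go implementations, integers from 0 to 255 convert to an interface without allocating. The slices are stored in package-level variables so they escape.

```go
package main

import (
	"fmt"
	"testing"
)

var (
	sinkAny []any
	sinkInt []int
)

// boxAll stores each int in an interface slot; because the slice escapes,
// each value outside the small-integer range needs its own heap allocation.
func boxAll(xs []int) []any {
	out := make([]any, len(xs))
	for i, x := range xs {
		out[i] = x // int → any conversion
	}
	return out
}

// copyAll keeps the concrete type: one allocation for the slice, none per element.
func copyAll(xs []int) []int {
	out := make([]int, len(xs))
	copy(out, xs)
	return out
}

func main() {
	xs := make([]int, 100)
	for i := range xs {
		xs[i] = 1000 + i // large enough to avoid the small-value shortcut
	}
	fmt.Printf("boxed: %.0f allocs, concrete: %.0f allocs\n",
		testing.AllocsPerRun(100, func() { sinkAny = boxAll(xs) }),
		testing.AllocsPerRun(100, func() { sinkInt = copyAll(xs) }))
}
```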
Wrap-up
What we covered:
- benchmark: b.N, -benchmem, ResetTimer, preserving results to avoid optimization
- benchstat: statistical comparison of two measurements
- CPU profile: top, list, web
- Memory profile: -alloc_space is usually more useful
- Escape analysis: -gcflags='-m' for heap-allocation reasons
- net/http/pprof: real-time profiling in production
- trace: time axis, GC, scheduling
- race detector: always on in tests
- Workflow: metrics → profile → bench → edit → benchstat → reconfirm
The next post (#7 Code Generation) covers another path Go often recommends. It shows how to automate repetitive code without paying reflect's runtime cost, using tools like go generate and stringer.