Introducing ebpf_exporter
https://blog.cloudflare.com/introducing-ebpf_exporter/
- Two main options to collect system metrics
- node_exporter: gauges and counters with lots of plugins
- cAdvisor: gauges and counters for container level metrics
- Counters are great, but they lack details about individual events
- rate() - is 1 sec change, so > 1sec is impossible to get
- Histograms - better, but node_exporter - only counters
- Datazaurus picture => any plot to preserve the same mean + standard deviation values
- Histogram - shows the truth
- io + linux kernel => ?
- should have low overhead
- universal
- should be supported out of the box
- safe
- => eBPF = sandboxed, low overhead, user-defined bytecode in kernel
- can never crash
- can’t interfere with kernel negatively
- c -> bcc (llvm based) -> kernel bytecode
- examples:
- directory cache lookups times
- slow ext4 operations
- L3 CPU cache hit
- etc
- map keys to labels (disk names, function names, cpu number)
- ebpf_exporter repo with examples
- complex example
- tracepoint fires -> increment counter for function address)
- function address -> hash ->
ksym
decoder (/proc/kallsyms)
- => graph of counters “what function the kernel is in”