Leaf-prometheus servers, one meta-Prometheus server
Problems:
Configuration
Add one more single point of failure
Complex rules to expose only certain data on the federated endpoint
Not all data is available from a single query API
Another solution: HA pairs of Prometheus servers
independently collect data -> problem with deduplication
Prometheus 2.0: total number of time series doesn’t impact server performance
Downsampling: reduce sampling rate (to see the big picture)
Thanos:
Prometheus Sidecar to store and query data
Querier: request data from all the sidecars, then run PromQL query agains the data (deduplication from HA pairs)
Object storage to store historic data in cloud storage. Align data (how?)
Immutable data: you can always write blocks to storage
Store component. Gossip -> are treated like Sidecar, used to cache and handle data in Storages
Files are large, slow to download => Store Gateway caches index parts of files, smart query planner => minimize requests to storage (get part of a file). 4-6 orders of magnitude faster than naive implementation.
=> hard to distinguish object storage requests from local ssd requests
Compactor: apply Prometheus local compaction (downsampling) to the Object Storage.
Ruler: evaluates rules and alerts against Thanos Queriers. Then they are backed up to the Object Stores.
How to migrate from Prometheus (sequence of steps)