1000 nodes and beyond

1.1. Watching from API server instead of etcd
1.2. Created a “read cache” at the API server level (in-memory cache, instead of etcd read). Cache is updated via watch in the background
pods density: 30 (1.1) -> 100 (1.2): Pod Lifecycle Event Generator (PLEG). Pull docker state per pod -> per host
scheduler throughput. 30k pod scheduled time was cut by ~1400% + link (coreos)
JSON parser
1.3 and beyond:
- API server is bottleneck (JSON). -> protobuff
- label selector performance improvement
- etcd 3.0 with Kubernetes usecase in mind

Gliush Notebook