Separation of compute and state in Google BigQuery and Cloud Dataflow (and why it matters)
https://cloud.google.com/blog/big-data/2017/10/separation-of-compute-and-state-in-google-bigquery-and-cloud-dataflow-and-why-it-matters
- separation = ability to maintain intermediate state between processing stages in a high performance component, separate from either the compute cluster or storage
- Benefits to separating compute from state:
- Less state in compute -> it is more ephemeral and scalable. Easier to parallelize and easier to recover from a lost node
- Processing stages don’t conflict within the same compute node
- Easier for the processing engine to re-partition the workload
- Pipeline execution: stage N+1 before stage N is finished.
- Resilience to node failures
- Service can utilitize available resources more effeciently