It’s Time to Move on from Two Phase Commit
http://dbmsmusings.blogspot.com/2019/01/its-time-to-move-on-from-two-phase.html
- 2PC is used in industry > 30yrs
- Briefly, what is 2PC
- 2PC problems:
- Blocking problem: coordinator fails before final decision. Wait until coordinator recovers. Work arounds reduce performance
- Cloggage problem: until the end of second phase workers have to stop other conflicting transactions, and their conflicting transactions, etc
- To sum up 2PC problems: latency, throughput, scalability, availability
- Conceptually, main 2PC problem: worker can abort after the work is done (1st phase).
- We need to change how distributed systems are architected
- A transaction may abort at any time for any reason
- System shouldn’t have freedom to abort transaction. Only logic within the transaction could abort
- Example:
- Shard1: X = 42 Shard2: Y -= 1; if (Y < 0) ABORT; Z = Y
- Simplified to:
- Shard1: if(remote_read(Y) > 0) X = 42; Shard2: if (Y > 0) {Y -= 1; Z = Y}
- All conditions are known before -> no need to abort -> no need for transaction protocol
- IMHO: Whaaat?? how to coordinate reading from remote hosts?? What if 2 concurrent transaction does Y-=1??
- 2 types of abort:
- Caused by the state of the data (solved above)
- Caused by the system itself (solved below)
- IMHO:
- I don’t understand, how concurrent transaction is handled ( Y-=1, the question above)
- We need some additional “solver” that’ll change all the workers logic on the fly?
- What if remote worker is not available or high latency? (We’ll solve it later? eventual consistency?)
- O(n^2), that author suggests solving by broadcasting all the values to all the workers -> incoming queue -> GC -> HELL