Streams: a new general purpose data structure in Redis
http://antirez.com/news/114
On Disk IO, Part 1: Flavours of IO
https://medium.com/@ifesdjeen/on-disk-io-part-1-flavours-of-io-8e1ace1de017
- Filesysem IO has a large influence (like Network IO)
- It has smaller set of tools.
- Flavours of IO:
- syscalls: open, write, read, fsync, sync, close
- Standard IO: fopen, fwrite, fread, fflush, fclose
- Vectored IO: writev, readv
- Memory mapped IO: open, mmap, msync, munmap
- Buffered IO. in stdlib: full and line buffering.
- Kernel buffering
- HDD + SDD: block device. The smallest addressable unit is sector. Not possible to read/write less then sector
- Smallest addressable unit of FS is block (generally > sector)
- Page size >= block size (Virtual Memory unit)
- Disk -> OS (+cache) -> RAM
- Page cache: linux delay writes and coalescing join adjecent reads. read-write-read can be done in memory. “Dirty” pages are flushed
- Delaying FS errors (during flush)
- DirectIO: O_DIRECT while opening file (slower). Postgresql, Mysql uses it. Should be block-aligned.
On Disk IO, Part 2: More Flavours of IO
https://medium.com/@ifesdjeen/on-disk-io-part-2-more-flavours-of-io-c945db3edb13
- mmap
- PageFault during first read of not-yet-loaded-memory
- mmap with protection flags (e.g. in read-only). If OS violates -> SegFault
- help Kernel about bufferingIO ->
fadvise
, mlock
(to lock pages to be held in the memory, all operations are via Page Cache)
- Asynchronous IO (AIO). Register operations + callbacks to be called. io_submit, io_getevents
- Problems: AIO is not in glibc, some syscalls (stat, fsync, open) aren’t fully async
- Linux AIO is not the same as Posix AIO. Posix AIO is in user space.
- VectorIO
On Disk IO, Part 3: LSM Trees
https://medium.com/@ifesdjeen/on-disk-io-part-3-lsm-trees-8b2da218496f
- Immutable vs Mutable data structures
- LSM-trees
IMHO: Must read