Why Batching is Inevitable

Syscall overhead + block I/O make batching a physics requirement, not a choice.

Writing 64 KiB with different batch sizes. Cost = (syscalls × syscall overhead) + (blocks written × I/O latency), where 64 KiB is 16 blocks of 4 KiB.

Batch size                       Syscalls   Total time   Throughput
Unbuffered (1 byte at a time)      65,536   329,280 μs     0.2 MB/s
Batched (64 bytes)                  1,024     6,720 μs     9.8 MB/s
Batched (512 bytes)                   128     2,240 μs    29.3 MB/s
Block-aligned (4 KiB batches)          16     1,680 μs    39.0 MB/s
Batched (16,384 bytes)                  4     1,620 μs    40.5 MB/s

The 1-byte-at-a-time nightmare

Writing 64 KiB one byte at a time = 65,536 syscalls. Each syscall crosses the user/kernel boundary. At 5 μs per crossing, that's about 328 ms of pure overhead before any disk I/O.

Cost model

Syscall overhead: 5 μs
SSD I/O latency: 100 μs / op
Block size (filesystem): 4096 bytes

Simplified model. Real syscall overhead varies from 1 to 20 μs, SSD latency is roughly 50–200 μs, and HDD latency is 5–15 ms.
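
The numbers in the table can be reproduced from this model. Below is a minimal sketch, assuming one syscall per batch and one device operation per 4 KiB block touched. The constant and function names are illustrative and do not come from any library.

```rust
// Assumed constants, copied from the simplified cost model above.
const SYSCALL_OVERHEAD_US: u64 = 5;
const IO_LATENCY_US: u64 = 100;
const BLOCK_SIZE: u64 = 4096;

/// Estimated cost in microseconds of writing `total_bytes` using
/// write calls of `batch_bytes` each (hypothetical helper).
fn write_cost_us(total_bytes: u64, batch_bytes: u64) -> u64 {
    // One syscall per batch, rounding up for a partial final batch.
    let syscalls = (total_bytes + batch_bytes - 1) / batch_bytes;
    // One device operation per filesystem block touched.
    let blocks = (total_bytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
    syscalls * SYSCALL_OVERHEAD_US + blocks * IO_LATENCY_US
}

/// Throughput in MB/s. Bytes per microsecond equals MB per second.
fn throughput_mb_s(total_bytes: u64, cost_us: u64) -> f64 {
    total_bytes as f64 / cost_us as f64
}
```

Under these assumptions, `write_cost_us(65_536, 1)` evaluates to 329,280 and `write_cost_us(65_536, 4096)` to 1,680. The I/O term is a constant 1,600 μs here, so every difference between batch sizes comes from the syscall count.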

Rust BufWriter

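use std::io::{BufWriter, Write};

// Assumes `file` is an open std::fs::File inside a function
// that returns io::Result.

// BAD: 1000 syscalls, one per byte
for i in 0..1000 {
    file.write_all(&[i as u8])?;
}

// GOOD: ~1 syscall. The 1000 bytes fit in the 4096-byte buffer,
// so nothing reaches the kernel until flush().
let mut w = BufWriter::with_capacity(4096, file);
for i in 0..1000 {
    w.write_all(&[i as u8])?;
}
w.flush()?;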
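
One way to check the syscall counts claimed in those comments without touching a real file is to wrap a writer that counts calls. This is a sketch only: `CountingWriter` is a hypothetical stand-in for a file, and each call to its `write` marks the point where a real file would issue a syscall.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical stand-in for a file. It counts `write` calls,
/// which is where a real file would enter the kernel.
struct CountingWriter {
    calls: usize,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        Ok(buf.len()) // pretend every byte was accepted
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes 1000 single bytes directly to the counting writer.
fn unbuffered_calls() -> io::Result<usize> {
    let mut sink = CountingWriter { calls: 0 };
    for i in 0..1000 {
        sink.write_all(&[i as u8])?;
    }
    Ok(sink.calls)
}

/// Writes the same 1000 bytes through a 4096-byte BufWriter.
fn buffered_calls() -> io::Result<usize> {
    let mut w = BufWriter::with_capacity(4096, CountingWriter { calls: 0 });
    for i in 0..1000 {
        w.write_all(&[i as u8])?;
    }
    w.flush()?;
    // get_ref borrows the wrapped writer so its counter can be read.
    Ok(w.get_ref().calls)
}
```

With this setup `unbuffered_calls()` reports 1000 underlying writes and `buffered_calls()` reports 1, because all 1000 bytes fit in the buffer and only `flush()` reaches the inner writer.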

Key Insight

Batching isn't an optimization — it's physics. Two facts make it inevitable:

  1. Every syscall has ~5μs overhead (kernel mode switch). 1000 × 1-byte writes = 5ms of overhead alone.
  2. Every disk operation transfers at least one 4 KiB block. Writing 100 bytes as 100 separate 1-byte writes, each forced to disk on its own (for example with an fsync after every write), still costs 100 disk ops.
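
The second fact can be made concrete by rounding each write up to whole blocks. This is a rough sketch, assuming a 4096-byte block, a write that starts on a block boundary, and a write that is forced to the device on its own. `bytes_transferred` and `write_amplification` are made-up helper names.

```rust
const BLOCK_SIZE: u64 = 4096; // assumed filesystem block size

/// Bytes the device moves for one write of `len` bytes, assuming it
/// only transfers whole blocks and the write is block-aligned.
fn bytes_transferred(len: u64) -> u64 {
    let blocks = (len + BLOCK_SIZE - 1) / BLOCK_SIZE;
    blocks * BLOCK_SIZE
}

/// Ratio of bytes moved to bytes of payload. `len` must be non-zero.
fn write_amplification(len: u64) -> f64 {
    bytes_transferred(len) as f64 / len as f64
}
```

Under these assumptions a 1-byte write moves 4096 bytes (amplification of 4096×), while a full 4096-byte write moves exactly what it carries (amplification of 1×).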

This is why SST files use 4 KiB blocks. It's not a design preference. You're paying for a 4 KiB block transfer whether your data is 1 byte or 4096. Fill the block. And it's why the MemTable exists — batch all writes in RAM until you have enough to justify flushing to disk.
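
To illustrate the MemTable idea, here is a toy sketch that buffers writes in RAM and signals when enough bytes have accumulated to justify a flush. It is not how any particular storage engine implements it. The `MemTable` type, its fields, and the `flush_threshold` parameter are assumptions made for this example.

```rust
use std::collections::BTreeMap;

/// Toy MemTable: holds key/value pairs in RAM, sorted by key.
struct MemTable {
    entries: BTreeMap<Vec<u8>, Vec<u8>>,
    bytes: usize,           // payload bytes currently buffered
    flush_threshold: usize, // assumed tuning knob, e.g. one 4 KiB block
}

impl MemTable {
    fn new(flush_threshold: usize) -> Self {
        MemTable { entries: BTreeMap::new(), bytes: 0, flush_threshold }
    }

    /// Buffers one pair. Returns true once a flush is due.
    fn put(&mut self, key: &[u8], value: &[u8]) -> bool {
        // On overwrite, stop counting the bytes of the replaced pair.
        if let Some(old) = self.entries.insert(key.to_vec(), value.to_vec()) {
            self.bytes -= key.len() + old.len();
        }
        self.bytes += key.len() + value.len();
        self.bytes >= self.flush_threshold
    }

    /// Drains everything in key order as one batch, ready to be
    /// written to disk with a single large write.
    fn flush(&mut self) -> Vec<(Vec<u8>, Vec<u8>)> {
        self.bytes = 0;
        std::mem::take(&mut self.entries).into_iter().collect()
    }
}
```

With a threshold of 4096 and 16-byte entries, `put` first returns true on the 256th insert, so each flush carries exactly one block's worth of payload instead of 256 tiny writes.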

Now you have all the primitives. Let's build something real with them: a write-ahead log from scratch.