Why Batching is Inevitable

Syscall overhead + block I/O make batching a physics requirement, not a choice.

Writing 64 KiB with different batch sizes. Cost = (number of syscalls × syscall overhead) + (number of blocks written × I/O latency).

Unbuffered (1 byte at a time): 65,536 syscalls, 329,280 μs, 0.2 MB/s

Batched (64 bytes): 1,024 syscalls, 6,720 μs, 9.8 MB/s

Batched (512 bytes): 128 syscalls, 2,240 μs, 29.3 MB/s

Block-aligned (4 KiB batches): 16 syscalls, 1,680 μs, 39 MB/s

Batched (16384 bytes): 4 syscalls, 1,620 μs, 40.5 MB/s

The 1-byte-at-a-time nightmare

Writing 64 KiB one byte at a time takes 65,536 syscalls. Each syscall crosses the user/kernel boundary. At 5 μs per syscall, that's about 328 ms of pure overhead even on a fast machine, before any disk I/O.

Total data to write: 64 KiB

Cost model

Syscall overhead: 5 μs

SSD I/O latency: 100 μs / op

Block size (filesystem): 4096 bytes

This is a simplified model. Real syscall overhead varies from roughly 1 to 20 μs, SSD latency from about 50 to 200 μs, and HDD latency from 5 to 15 ms.
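The figures above can be reproduced with a few lines of arithmetic. The sketch below assumes, as the listed numbers imply, that the disk always performs one I/O operation per 4 KiB block of data regardless of how many syscalls delivered it. The function and constant names are illustrative, not part of any library.

```rust
// Parameters of the simplified cost model (values taken from the text).
const SYSCALL_OVERHEAD_US: u64 = 5;
const IO_LATENCY_US: u64 = 100;
const BLOCK_SIZE: u64 = 4096;

/// Estimated cost in microseconds of writing `total_bytes` in batches of
/// `batch_bytes`. Assumption: one disk op per block, independent of batch size.
fn model_cost_us(total_bytes: u64, batch_bytes: u64) -> u64 {
    // Round up: a partial batch or partial block still costs a full one.
    let syscalls = (total_bytes + batch_bytes - 1) / batch_bytes;
    let blocks = (total_bytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
    syscalls * SYSCALL_OVERHEAD_US + blocks * IO_LATENCY_US
}

fn main() {
    let total: u64 = 64 * 1024;
    for batch in [1, 64, 512, 4096, 16384] {
        let cost = model_cost_us(total, batch);
        // Bytes per microsecond is numerically equal to MB/s (1 MB = 10^6 bytes).
        let mb_per_s = total as f64 / cost as f64;
        println!("{batch:>6}-byte batches: {cost:>7} us, {mb_per_s:.1} MB/s");
    }
}
```

Note how the I/O term is a constant 1,600 μs in every row. Only the syscall term shrinks as the batch grows, which is why throughput flattens out past 4 KiB.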

Rust BufWriter

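use std::fs::File;
use std::io::{BufWriter, Write};

fn main() -> std::io::Result<()> {
    // BAD: 1000 syscalls, one per byte ("unbuffered.bin" is a hypothetical path)
    let mut file = File::create("unbuffered.bin")?;
    for i in 0..1000 {
        file.write_all(&[i as u8])?;
    }

    // GOOD: ~1 syscall, issued when the 4096-byte buffer is flushed
    let file = File::create("buffered.bin")?; // hypothetical path
    let mut w = BufWriter::with_capacity(4096, file);
    for i in 0..1000 {
        w.write_all(&[i as u8])?;
    }
    w.flush()?; // push the buffered bytes to the file before `w` is dropped
    Ok(())
}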
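One way to check the "1000 versus ~1" claim without a tracing tool is to swap the file for a writer that counts calls. The sketch below uses a hypothetical `CountingWriter` (not a standard library type) as a stand-in for the file. Each call to its `write` method corresponds to what would be one `write` syscall on a real `File`.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical stand-in for a file: counts how often `write` is called.
struct CountingWriter {
    calls: usize,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        Ok(buf.len()) // pretend every byte was accepted
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes `n` single bytes directly to the sink and returns the call count.
fn unbuffered_calls(n: usize) -> io::Result<usize> {
    let mut sink = CountingWriter { calls: 0 };
    for i in 0..n {
        sink.write_all(&[i as u8])?;
    }
    Ok(sink.calls)
}

/// Writes `n` single bytes through a BufWriter and returns the call count
/// seen by the underlying sink.
fn buffered_calls(n: usize, capacity: usize) -> io::Result<usize> {
    let mut w = BufWriter::with_capacity(capacity, CountingWriter { calls: 0 });
    for i in 0..n {
        w.write_all(&[i as u8])?;
    }
    w.flush()?;
    Ok(w.get_ref().calls)
}

fn main() -> io::Result<()> {
    println!("unbuffered: {} calls", unbuffered_calls(1000)?);
    println!("buffered:   {} calls", buffered_calls(1000, 4096)?);
    Ok(())
}
```

Because 1000 bytes fit inside the 4096-byte buffer, the buffered path reaches the sink only once, at `flush`.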

Key Insight

Batching isn't an optimization — it's physics. Two facts make it inevitable:

Every syscall has ~5μs overhead (kernel mode switch). 1000 × 1-byte writes = 5ms of overhead alone.
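Every disk operation transfers at least one 4 KiB block. If each write is forced to disk on its own (for example, with an fsync after every write), writing 100 bytes as 100 separate 1-byte writes still costs 100 block-sized disk operations.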
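The block-granularity cost can be put in numbers. The sketch below assumes every write is synced to disk individually, so the page cache cannot merge neighbouring writes, and that each one is rounded up to whole 4 KiB blocks. The function name is illustrative.

```rust
/// Filesystem block size from the cost model in the text.
const BLOCK_SIZE: u64 = 4096;

/// Bytes the disk moves for `writes` individually synced writes of
/// `write_bytes` each, assuming each write is rounded up to whole blocks.
fn bytes_transferred(writes: u64, write_bytes: u64) -> u64 {
    let blocks_per_write = (write_bytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
    writes * blocks_per_write * BLOCK_SIZE
}

fn main() {
    let scattered = bytes_transferred(100, 1); // 100 separate 1-byte writes
    let batched = bytes_transferred(1, 100); // one 100-byte write
    println!("100 x 1-byte synced writes: {scattered} bytes transferred");
    println!("1 x 100-byte synced write:  {batched} bytes transferred");
    println!("batching moves {}x less data", scattered / batched);
}
```

Under these assumptions the same 100 bytes of payload cost 100 block transfers when scattered and a single block transfer when batched.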

This is why SST files use 4 KiB blocks. It's not a design preference. You're paying for a 4 KiB block transfer whether your data is 1 byte or 4096. Fill the block. And it's why the MemTable exists — batch all writes in RAM until you have enough to justify flushing to disk.
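As a rough illustration of the MemTable idea, the sketch below buffers key/value writes in a sorted in-memory map and hands back the whole batch once it has accumulated a threshold number of bytes. The type, its fields, and the threshold are hypothetical and chosen for illustration. Real storage engines track size and trigger flushes in more elaborate ways.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory write buffer; names and policy are illustrative.
struct MemTable {
    entries: BTreeMap<Vec<u8>, Vec<u8>>,
    approx_bytes: usize,
    flush_threshold: usize,
}

impl MemTable {
    fn new(flush_threshold: usize) -> Self {
        MemTable { entries: BTreeMap::new(), approx_bytes: 0, flush_threshold }
    }

    /// Buffers one write in RAM. Returns the sorted batch, ready to be written
    /// to disk in one go, once the buffered size reaches the threshold.
    fn put(&mut self, key: &[u8], value: &[u8]) -> Option<Vec<(Vec<u8>, Vec<u8>)>> {
        if let Some(old) = self.entries.insert(key.to_vec(), value.to_vec()) {
            // Overwrite: stop counting the entry that was replaced.
            self.approx_bytes -= key.len() + old.len();
        }
        self.approx_bytes += key.len() + value.len();

        if self.approx_bytes >= self.flush_threshold {
            self.approx_bytes = 0;
            // BTreeMap iterates in key order, so the batch comes out sorted.
            Some(std::mem::take(&mut self.entries).into_iter().collect())
        } else {
            None
        }
    }
}

fn main() {
    let mut memtable = MemTable::new(4096);
    for i in 0..100u32 {
        let key = format!("k{i:03}");
        if let Some(batch) = memtable.put(key.as_bytes(), &[0u8; 60]) {
            println!("flush after put #{}: {} entries", i + 1, batch.len());
        }
    }
}
```

With 64-byte entries and a 4096-byte threshold, the caller sees one flush per 64 writes instead of one disk operation per write.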

Now you have all the primitives. Let's build something real with them: a write-ahead log from scratch.