The Disk Doesn't Care

You ask for 1 byte. The disk reads 4 KiB. The hardware speaks in blocks, not bytes.

File — 32 KiB (8 blocks × 4 KiB)

reading byte at offset 5000

Block 0: bytes 0–4095
Block 1: bytes 4096–8191
Block 2: bytes 8192–12287
Block 3: bytes 12288–16383
Block 4: bytes 16384–20479
Block 5: bytes 20480–24575
Block 6: bytes 24576–28671
Block 7: bytes 28672–32767

Offset 5000 falls in Block 1 (bytes 4096–8191), 904 bytes past the start of that block.
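The mapping from a byte offset to its block is plain integer arithmetic. A minimal sketch, assuming the 4 KiB block size used on this page (the function name `locate` is made up for illustration):

```python
BLOCK_SIZE = 4096  # bytes; the 4 KiB block size assumed throughout this page


def locate(offset: int, block_size: int = BLOCK_SIZE) -> tuple[int, int, int, int]:
    """Return (block index, first byte of block, last byte of block, offset inside block)."""
    block = offset // block_size          # which block holds the byte
    start = block * block_size            # first byte covered by that block
    end = start + block_size - 1          # last byte covered by that block
    return block, start, end, offset - start


block, start, end, within = locate(5000)
print(f"offset 5000 -> block {block} (bytes {start}-{end}), byte {within} inside the block")
# prints: offset 5000 -> block 1 (bytes 4096-8191), byte 904 inside the block
```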

What actually happens

Your code
read(fd, buf, 1) at offset 5000
File offsets: 0 (start), 5000 (requested byte), 32767 (end)
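To make the one-byte read concrete, here is a small sketch using Python's `os.pread`, which wraps the POSIX `pread` call (Unix only). It builds a throwaway 32 KiB file, asks for one byte at offset 5000, then reads the aligned 4 KiB block around it to show what has to be resident for that byte to be served. The helper name `read_one_byte` and the file contents are made up for illustration. The second `pread` only imitates what the kernel does internally, it is not something your program needs to do.

```python
import os
import tempfile

BLOCK_SIZE = 4096
FILE_SIZE = 8 * BLOCK_SIZE  # 32 KiB, matching the diagram


def read_one_byte(path: str, offset: int) -> tuple[bytes, bytes]:
    """Return the requested byte and the whole aligned block that contains it."""
    fd = os.open(path, os.O_RDONLY)
    try:
        wanted = os.pread(fd, 1, offset)                    # what the program asks for
        block_start = (offset // BLOCK_SIZE) * BLOCK_SIZE   # round down to a block boundary
        block = os.pread(fd, BLOCK_SIZE, block_start)       # the unit that actually moves
    finally:
        os.close(fd)
    return wanted, block


# Build a throwaway 32 KiB file with a recognizable byte pattern.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(bytes(i % 251 for i in range(FILE_SIZE)))
    demo_path = f.name

wanted, block = read_one_byte(demo_path, 5000)
os.unlink(demo_path)

inside = 5000 % BLOCK_SIZE
print(len(wanted), len(block), wanted == block[inside:inside + 1])
# prints: 1 4096 True
```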

Why 4 KiB blocks?

HDD (spinning)

Sector = 512 B or 4 KiB. High seek latency (around 10 ms). Once the head is in position, reading one extra sector adds no seek time, only transfer time.

SSD / NVMe

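Page = 4–16 KB. Flash reads an entire page even for 1 byte. Erase block = 128–512 KB, the minimum erase unit: writes happen a page at a time, but a page must be erased (as part of its whole erase block) before it can be rewritten.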

Filesystem (ext4)

Default block size = 4 KiB. That is a whole multiple of HDD sector sizes and matches the smallest common SSD page size. The page cache tracks data at page granularity, which is 4 KiB on x86-64.

// kernel/Documentation/filesystems
block_size = 4096; // bytes
// "The smallest unit that the
//  filesystem can allocate"
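You can check these sizes on your own machine rather than taking 4096 on faith. A hedged sketch for Unix-like systems using only the standard library. The helper name `block_sizes` is made up, and the numbers depend on the filesystem and CPU architecture, so no expected output is shown.

```python
import os


def block_sizes(path: str = ".") -> dict[str, int]:
    """Collect the block-related sizes the OS reports for the filesystem holding `path`."""
    st = os.stat(path)
    vfs = os.statvfs(path)
    return {
        "statvfs f_bsize (filesystem block size)": vfs.f_bsize,
        "statvfs f_frsize (fundamental block size)": vfs.f_frsize,
        "stat st_blksize (preferred I/O size)": st.st_blksize,
        "memory page size": os.sysconf("SC_PAGE_SIZE"),
    }


for name, size in block_sizes().items():
    print(f"{name}: {size} bytes")
```

On a typical Linux ext4 setup these tend to report 4096, but treat that as an observation about common defaults, not a guarantee.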

Key Insight

Random byte access is a convenient lie. The OS presents you with a byte-addressable abstraction, but underneath, every read that misses the page cache transfers at least one 4 KiB block from disk into it. If you read 1 byte from a file that isn't cached, you've paid the cost of reading 4096 bytes.
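The cost can be put in numbers. The sketch below is a simplified model, not a measurement: it assumes a cold cache, no readahead, and that each distinct 4 KiB block is fetched exactly once. The function names are made up for illustration.

```python
BLOCK_SIZE = 4096


def bytes_transferred(reads, block_size=BLOCK_SIZE):
    """reads: list of (offset, length) with length >= 1. Returns bytes fetched in this model."""
    blocks = set()
    for offset, length in reads:
        first = offset // block_size
        last = (offset + length - 1) // block_size   # a read may straddle block boundaries
        blocks.update(range(first, last + 1))
    return len(blocks) * block_size


def amplification(reads):
    """Bytes moved from disk divided by bytes the program asked for."""
    requested = sum(length for _, length in reads)
    return bytes_transferred(reads) / requested


one_byte = [(5000, 1)]
scattered = [(i * BLOCK_SIZE + 10, 1) for i in range(8)]   # one byte from each of 8 blocks
whole_file = [(0, 8 * BLOCK_SIZE)]                         # the full 32 KiB in one read

print(amplification(one_byte))    # 4096.0
print(amplification(scattered))   # 4096.0
print(amplification(whole_file))  # 1.0
```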

This is why SST (sorted string table) files pack records into blocks: you're paying for 4 KiB whether you like it or not, so you might as well fill the block. It's also why sequential reads are so much faster than random ones: every block you fetch is full of data you'll actually use, instead of one useful byte and 4095 wasted ones.
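Block packing can be sketched in a few lines. The layout here is hypothetical and chosen for brevity (a 4-byte header of key length and value length, then the raw bytes, with zero padding at the end of each block). It is not the on-disk format of any real SST implementation, which would also store things like per-block indexes and checksums.

```python
import struct

BLOCK_SIZE = 4096
HEADER = struct.Struct("<HH")  # hypothetical layout: key length, value length (uint16 each)


def pack_blocks(records, block_size=BLOCK_SIZE):
    """Pack (key, value) byte pairs into fixed-size blocks; no record straddles a boundary."""
    blocks, current = [], bytearray()
    for key, value in records:
        entry = HEADER.pack(len(key), len(value)) + key + value
        if len(entry) > block_size:
            raise ValueError("record larger than one block")
        if len(current) + len(entry) > block_size:
            # Block is full: pad it to exactly block_size and start a new one.
            blocks.append(bytes(current.ljust(block_size, b"\x00")))
            current = bytearray()
        current += entry
    if current:
        blocks.append(bytes(current.ljust(block_size, b"\x00")))
    return blocks


# 100 records of 112 bytes each (4-byte header + 8-byte key + 100-byte value).
records = [(f"key{i:05d}".encode(), b"v" * 100) for i in range(100)]
blocks = pack_blocks(records)
print(len(blocks), {len(b) for b in blocks})
# prints: 3 {4096}
```

Thirty-six of these records fit in each block, so one 4 KiB read brings in 36 neighbouring keys rather than one.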

But there's another layer of indirection above the disk: the page cache. When you call write(), the data doesn't go to disk immediately. It goes to RAM. And if the power cuts out before the OS flushes it...
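As a preview of that problem, here is a hedged sketch of the difference between handing data to the OS and asking for it to be flushed. `os.write` returns once the bytes are in the page cache, and `os.fsync` asks the kernel to push them to the storage device. The helper name `durable_write` is made up, and a fully robust version would typically also sync the containing directory, which is left out here.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data, then ask the OS to flush it from the page cache to the device."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            written = os.write(fd, view)   # may write fewer bytes than requested
            view = view[written:]
        os.fsync(fd)                       # without this, the data may live only in RAM
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "log.bin")
    durable_write(demo_path, b"hello, disk")
    with open(demo_path, "rb") as f:
        round_trip = f.read()

print(round_trip)
# prints: b'hello, disk'
```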