File I/O

From bytes to data structures. Understand what happens between your code and the disk.

The Gap You Need to Bridge

You think in key-value pairs, but disks think in bytes at offsets. The OS gives you read(), write(), and seek(), and not much more. This chapter shows you how to bridge that gap: encoding structures into bytes, understanding why hardware forces you to think in blocks, and seeing why so many databases end up with the same patterns.

All examples use Rust with direct OS APIs. We'll reference Linux syscalls and libc where it matters.

The Journey

Each topic solves a specific problem. By the end, you'll understand why databases are designed the way they are.

The Raw Interface

What the OS actually gives you

File descriptors, read(), write(), and seek() (the syscall is lseek() on Linux): that's essentially all you get. There is no read_line and no read_struct, just bytes at offsets.

Key Insight: A file is just a Vec<u8> with a cursor. The OS doesn't know what those bytes mean.
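
A minimal sketch of that interface through Rust's standard library, which wraps the underlying syscalls. The helper name write_then_read_at and the temp-file name are made up for illustration.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Writes `data` to `path`, then reads `len` bytes starting at `offset`.
/// (Hypothetical helper; the name and parameters are illustrative.)
fn write_then_read_at(path: &Path, data: &[u8], offset: u64, len: usize) -> io::Result<Vec<u8>> {
    let mut file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .truncate(true)
        .open(path)?;

    // Write: the cursor advances past the bytes written.
    file.write_all(data)?;

    // Seek: move the cursor to an absolute offset from the start.
    file.seek(SeekFrom::Start(offset))?;

    // Read: raw bytes come back; the OS attaches no meaning to them.
    let mut buf = vec![0u8; len];
    file.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("raw_interface_demo.bin");
    let bytes = write_then_read_at(&path, b"helloworld", 5, 5)?;
    println!("{}", String::from_utf8_lossy(&bytes));
    fs::remove_file(&path)?;
    Ok(())
}
```

Nothing in the file records that "hello" and "world" were ever separate values; the caller has to know the offset.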

The Encoding Problem

Where does one value end and another begin?

If you write "hello" then "world", you read "helloworld". Learn framing: length-prefixes, delimiters, and fixed-width fields.

Key Insight: You must encode boundaries into the byte stream. This is called 'framing'.
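
One way to frame values is a length prefix. The sketch below assumes a 4-byte little-endian length in front of each payload; the functions write_frame and read_frame are illustrative names, not part of any library.

```rust
use std::io::{self, Cursor, Read, Write};

/// Writes one frame: a 4-byte little-endian length, then the payload.
fn write_frame<W: Write>(out: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "payload too large"));
    }
    out.write_all(&(payload.len() as u32).to_le_bytes())?;
    out.write_all(payload)
}

/// Reads one frame. Returns Ok(None) when the stream ends while reading the
/// length header; this sketch does not distinguish a clean end from a
/// truncated header.
fn read_frame<R: Read>(input: &mut R) -> io::Result<Option<Vec<u8>>> {
    let mut len_buf = [0u8; 4];
    match input.read_exact(&mut len_buf) {
        Ok(()) => {}
        Err(e) => {
            if e.kind() == io::ErrorKind::UnexpectedEof {
                return Ok(None);
            }
            return Err(e);
        }
    }
    let len = u32::from_le_bytes(len_buf) as usize;
    let mut payload = vec![0u8; len];
    input.read_exact(&mut payload)?;
    Ok(Some(payload))
}

fn main() -> io::Result<()> {
    let mut stream: Vec<u8> = Vec::new();
    write_frame(&mut stream, b"hello")?;
    write_frame(&mut stream, b"world")?;

    // The reader recovers two separate values instead of "helloworld".
    let mut reader = Cursor::new(stream);
    while let Some(frame) = read_frame(&mut reader)? {
        println!("{}", String::from_utf8_lossy(&frame));
    }
    Ok(())
}
```

Delimiters and fixed-width fields solve the same problem with different tradeoffs: delimiters need escaping, and fixed-width fields waste space on short values.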

Numbers Have Secrets

Endianness, alignment, and varints

The number 300 can be [0x2C, 0x01] (16-bit little-endian), [0x01, 0x2C] (16-bit big-endian), or [0xAC, 0x02] (a variable-length integer, or varint). Learn why many databases use variable-length integers.

Key Insight: Every encoding is a tradeoff between space, speed, and complexity.
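
The three byte sequences for 300 can be reproduced with a short sketch. The varint here is assumed to be unsigned LEB128 (7 payload bits per byte, high bit as a continuation flag), which is one common scheme; encode_varint and decode_varint are illustrative names.

```rust
/// Encodes `value` as an unsigned LEB128 varint: 7 payload bits per byte,
/// with the high bit set on every byte except the last.
fn encode_varint(mut value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    while value >= 0x80 {
        out.push(((value & 0x7F) as u8) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
    out
}

/// Decodes a varint from the front of `bytes`.
/// Returns the value and the number of bytes consumed, or None if the input
/// ends before a terminating byte or runs past 10 bytes.
/// This sketch does not reject overlong encodings.
fn decode_varint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in bytes.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7F) << (7 * i);
        if (byte & 0x80) == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

fn main() {
    let n: u16 = 300; // 0x012C
    println!("little-endian: {:02X?}", n.to_le_bytes());
    println!("big-endian:    {:02X?}", n.to_be_bytes());
    println!("varint:        {:02X?}", encode_varint(300));
    println!("decoded:       {:?}", decode_varint(&[0xAC, 0x02]));
}
```

Fixed-width integers decode with a single load but always cost their full width; varints save space on small values at the price of a loop on every decode.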

Building a Codec

Encode once, decode anywhere

A codec is the contract between your structs and bytes. Learn to serialize and deserialize data structures.

Key Insight: The codec bridges your mental model (structs) and reality (bytes).
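
As a rough illustration, here is a hand-written codec for a hypothetical Record struct. The field layout (u64 sequence number, then length-prefixed key and value, all little-endian) is an assumption made for this sketch, not a format the chapter prescribes.

```rust
/// Hypothetical record type used only for this example.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Record {
    seq: u64,
    key: Vec<u8>,
    value: Vec<u8>,
}

/// Returns the next `n` bytes and advances `pos`, or None if out of bounds.
fn take<'a>(bytes: &'a [u8], pos: &mut usize, n: usize) -> Option<&'a [u8]> {
    let end = pos.checked_add(n)?;
    let slice = bytes.get(*pos..end)?;
    *pos = end;
    Some(slice)
}

fn read_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(take(bytes, pos, 4)?);
    Some(u32::from_le_bytes(buf))
}

fn read_u64(bytes: &[u8], pos: &mut usize) -> Option<u64> {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(take(bytes, pos, 8)?);
    Some(u64::from_le_bytes(buf))
}

impl Record {
    /// Layout: seq (8 bytes) | key_len (4) | key | value_len (4) | value.
    /// Assumes key and value are each shorter than 4 GiB.
    fn encode(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(16 + self.key.len() + self.value.len());
        out.extend_from_slice(&self.seq.to_le_bytes());
        out.extend_from_slice(&(self.key.len() as u32).to_le_bytes());
        out.extend_from_slice(&self.key);
        out.extend_from_slice(&(self.value.len() as u32).to_le_bytes());
        out.extend_from_slice(&self.value);
        out
    }

    /// Returns None if `bytes` is too short; trailing bytes are ignored.
    fn decode(bytes: &[u8]) -> Option<Record> {
        let mut pos = 0;
        let seq = read_u64(bytes, &mut pos)?;
        let key_len = read_u32(bytes, &mut pos)? as usize;
        let key = take(bytes, &mut pos, key_len)?.to_vec();
        let value_len = read_u32(bytes, &mut pos)? as usize;
        let value = take(bytes, &mut pos, value_len)?.to_vec();
        Some(Record { seq: seq, key: key, value: value })
    }
}

fn main() {
    let rec = Record { seq: 7, key: b"k".to_vec(), value: b"value".to_vec() };
    let bytes = rec.encode();
    println!("{} bytes on the wire", bytes.len());
    println!("round trip ok: {}", Record::decode(&bytes) == Some(rec));
}
```

The decoder checks every length against the remaining input, because bytes read back from disk cannot be trusted to be complete.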

The Disk Doesn't Care

Sectors, blocks, and the lie of byte-addressable storage

You ask for 1 byte; the storage stack reads a whole block, commonly 4 KiB. The hardware speaks in blocks, not bytes.

Key Insight: Random byte access is a convenient lie. The truth is block I/O.
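
The arithmetic behind that claim can be sketched as follows, assuming a 4096-byte block size (real devices and filesystems vary). The function names are illustrative.

```rust
/// Assumed block size for this sketch; real devices and filesystems vary.
const BLOCK_SIZE: u64 = 4096;

/// Returns (first_block, block_count) for a read of `len` bytes at `offset`.
fn blocks_touched(offset: u64, len: u64) -> (u64, u64) {
    let first = offset / BLOCK_SIZE;
    if len == 0 {
        return (first, 0);
    }
    let last = (offset + len - 1) / BLOCK_SIZE;
    (first, last - first + 1)
}

/// Bytes transferred in whole blocks divided by bytes requested.
/// `len` must be nonzero.
fn read_amplification(offset: u64, len: u64) -> f64 {
    let (_, count) = blocks_touched(offset, len);
    (count * BLOCK_SIZE) as f64 / len as f64
}

fn main() {
    for &(offset, len) in &[(5000u64, 1u64), (4095, 2), (0, 4096)] {
        let (first, count) = blocks_touched(offset, len);
        println!(
            "offset={} len={} -> {} block(s) from block {}, amplification {:.1}x",
            offset,
            len,
            count,
            first,
            read_amplification(offset, len)
        );
    }
}
```

A 2-byte read that straddles a block boundary touches two blocks, which is why on-disk formats try to keep records from crossing block edges.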

The Page Cache

Why your writes don't go to disk (immediately)

The OS lies to you. write() returns, but the data may still be sitting in RAM, in the page cache. Learn about fsync() and durability.

Key Insight: Durability requires explicit action. Default writes are 'fire and forget'.
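
In Rust, the explicit action is File::sync_all (or File::sync_data, which may skip some metadata). A minimal sketch, with append_durably as a made-up helper name:

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Appends `data` to the file at `path` and waits for the OS to persist it.
/// (Hypothetical helper for illustration.)
fn append_durably(path: &Path, data: &[u8]) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;

    // write_all hands the bytes to the kernel; they may still be only in RAM.
    file.write_all(data)?;

    // sync_all asks the OS to flush file data and metadata to the device
    // and returns only after the OS reports completion.
    file.sync_all()
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("fsync_demo.log");
    append_durably(&path, b"committed\n")?;
    println!("{} bytes on disk", fs::metadata(&path)?.len());
    fs::remove_file(&path)?;
    Ok(())
}
```

This sketch syncs only the file itself. On Linux, making a newly created file's directory entry durable generally also requires syncing the parent directory, which is left out here.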

Why Batching is Inevitable

The economics of I/O

Given block-based hardware and syscall overhead, batching isn't optional; it's physics.

Key Insight: SST (sorted string table) files use blocks because they have to, not because they want to.
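
To see the effect of batching without touching a real disk, the sketch below counts calls to write() on a stand-in writer, with and without a BufWriter in front of it. CountingWriter, the 16-byte record size, and the 4096-byte buffer are all assumptions for the example.

```rust
use std::io::{self, BufWriter, Write};

/// A stand-in for a file that counts how many times write() is called.
/// Each call models one syscall.
struct CountingWriter {
    calls: usize,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes `records` 16-byte records directly; returns the write() call count.
fn write_unbatched(records: usize) -> io::Result<usize> {
    let mut sink = CountingWriter { calls: 0 };
    for _ in 0..records {
        sink.write_all(&[0u8; 16])?;
    }
    Ok(sink.calls)
}

/// Writes the same records through a 4096-byte buffer; returns the call count.
fn write_batched(records: usize) -> io::Result<usize> {
    let mut buffered = BufWriter::with_capacity(4096, CountingWriter { calls: 0 });
    for _ in 0..records {
        buffered.write_all(&[0u8; 16])?;
    }
    // Push out whatever is still sitting in the buffer.
    buffered.flush()?;
    Ok(buffered.get_ref().calls)
}

fn main() -> io::Result<()> {
    println!("unbatched: {} write calls", write_unbatched(1000)?);
    println!("batched:   {} write calls", write_batched(1000)?);
    Ok(())
}
```

The unbatched path makes one call per record, while the buffered path makes only a handful for the same bytes. The cost is that buffered records are lost if the process dies before a flush.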

Building a Log File

Putting it all together

Combine framing + codec + batching + fsync to build a durable write-ahead log from scratch.

Key Insight: Now you understand why LSM (log-structured merge-tree) databases buffer writes and flush to immutable files.
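
A compressed sketch of how those pieces might fit together. The Log type, its record format (4-byte little-endian length plus payload), and the replay function are illustrative assumptions; a production log would also add checksums and bound record sizes.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, BufReader, BufWriter, Read, Write};
use std::path::Path;

/// Hypothetical append-only log: framing + batching + fsync.
struct Log {
    writer: BufWriter<File>,
}

impl Log {
    fn open(path: &Path) -> io::Result<Log> {
        let file = OpenOptions::new().create(true).append(true).open(path)?;
        Ok(Log { writer: BufWriter::new(file) })
    }

    /// Buffers one length-prefixed record in memory (batching).
    fn append(&mut self, payload: &[u8]) -> io::Result<()> {
        if payload.len() > u32::MAX as usize {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "record too large"));
        }
        self.writer.write_all(&(payload.len() as u32).to_le_bytes())?;
        self.writer.write_all(payload)
    }

    /// Hands the batch to the OS, then waits for it to reach the device.
    fn commit(&mut self) -> io::Result<()> {
        self.writer.flush()?;
        self.writer.get_ref().sync_all()
    }
}

/// Fills `buf` completely, or returns Ok(false) if the input ends first.
fn read_or_eof<R: Read>(reader: &mut R, buf: &mut [u8]) -> io::Result<bool> {
    match reader.read_exact(buf) {
        Ok(()) => Ok(true),
        Err(e) => {
            if e.kind() == io::ErrorKind::UnexpectedEof {
                Ok(false)
            } else {
                Err(e)
            }
        }
    }
}

/// Reads back every complete record and stops at the first truncated one,
/// such as a record cut short by a crash mid-write.
fn replay(path: &Path) -> io::Result<Vec<Vec<u8>>> {
    let mut reader = BufReader::new(File::open(path)?);
    let mut records = Vec::new();
    loop {
        let mut len_buf = [0u8; 4];
        if !read_or_eof(&mut reader, &mut len_buf)? {
            break;
        }
        let mut payload = vec![0u8; u32::from_le_bytes(len_buf) as usize];
        if !read_or_eof(&mut reader, &mut payload)? {
            break;
        }
        records.push(payload);
    }
    Ok(records)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("wal_demo.log");
    let _ = fs::remove_file(&path);
    let mut log = Log::open(&path)?;
    log.append(b"put a=1")?;
    log.append(b"put b=2")?;
    log.commit()?;
    println!("replayed {} records", replay(&path)?.len());
    fs::remove_file(&path)?;
    Ok(())
}
```

Records are only promised to survive a crash after commit() returns, so a caller would acknowledge a write to its client after commit, not after append.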

How This Connects to LSM Databases

SST Files

Now you'll understand why they use blocks, why they're immutable, and why they need an index.

MemTable

This is the in-memory "buffer" that collects writes before they are flushed to disk, batching them for efficiency.

Compaction

This is batch processing of multiple SST files. Same principles, larger scale.