Building a Codec

Encode once, decode anywhere. A codec is the contract between your structs and bytes on disk.

In-memory struct (KeyValue[])

[0] KeyValue { key: "name", value: "Alice" }
[1] KeyValue { key: "age", value: "30" }
[2] KeyValue { key: "city", value: "SF" }

Byte stream on disk

44 bytes total

Records

"name"→"Alice"

"age"→"30"

"city"→"SF"

Format (per record)

key_len   4 bytes, u32 LE
key       key_len bytes, UTF-8
val_len   4 bytes, u32 LE
value     val_len bytes, UTF-8

// Rust encode
buf.extend(&(key.len() as u32).to_le_bytes());
buf.extend(key.as_bytes());
buf.extend(&(val.len() as u32).to_le_bytes());
buf.extend(val.as_bytes());
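The four lines above encode a single record. As a minimal sketch of how they might sit inside a complete encoder, the block below wraps them in a function over a slice of records. The `KeyValue` field names come from the diagram; the helper `put_field`, the `encode` signature, and the choice to return `None` for an oversized field are assumptions made for illustration. The length check is there because a plain `as u32` cast silently truncates any length above `u32::MAX`.

```rust
use std::convert::TryFrom;

/// One key/value record, mirroring the KeyValue struct in the diagram.
pub struct KeyValue {
    pub key: String,
    pub value: String,
}

/// Append one length-prefixed field: a 4-byte little-endian length,
/// then the field's UTF-8 bytes. Hypothetical helper, not from the source.
/// Returns None if the field is too long for a u32 length prefix.
fn put_field(buf: &mut Vec<u8>, field: &str) -> Option<()> {
    let len = u32::try_from(field.len()).ok()?;
    buf.extend_from_slice(&len.to_le_bytes());
    buf.extend_from_slice(field.as_bytes());
    Some(())
}

/// Encode records back to back: key field, then value field, per record.
/// The stream carries no record count and no trailer.
pub fn encode(records: &[KeyValue]) -> Option<Vec<u8>> {
    let mut buf = Vec::new();
    for record in records {
        put_field(&mut buf, &record.key)?;
        put_field(&mut buf, &record.value)?;
    }
    Some(buf)
}
```

Under these assumptions, encoding the three records from the diagram should produce the 44-byte stream, with the first record occupying bytes 0 through 16: `04 00 00 00`, then `name`, then `05 00 00 00`, then `Alice`.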

Total size

24B length headers + 11B keys + 9B values = 44B
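The same arithmetic can be checked without encoding anything. This is a small sketch; the names `LEN_PREFIX` and `encoded_len` are hypothetical, and pairs are passed as plain string slices to keep the block short.

```rust
/// Size in bytes of one u32 length prefix.
const LEN_PREFIX: usize = 4;

/// Encoded size of a set of (key, value) pairs under the per-record format:
/// two length prefixes plus the UTF-8 bytes of the key and the value.
pub fn encoded_len(pairs: &[(&str, &str)]) -> usize {
    pairs
        .iter()
        // str::len() counts bytes, not characters, which is what the format stores.
        .map(|(key, value)| 2 * LEN_PREFIX + key.len() + value.len())
        .sum()
}
```

For the three records here that is 3 × 8 = 24 bytes of prefixes, plus 4 + 3 + 4 = 11 bytes of keys, plus 5 + 2 + 2 = 9 bytes of values. Because `str::len()` counts bytes, a key with non-ASCII characters would contribute more than one byte per character.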

Key Insight

A codec is a symmetric contract: whatever the encoder writes, the decoder must be able to read back, byte for byte. The encoder calls encode(struct) → bytes. The decoder calls decode(bytes) → struct. They never communicate except through the byte stream — which might have been written days ago, on a different machine.
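The decoding half of the contract might look like the sketch below. It assumes the per-record format above and nothing else: no record count, no checksum, records laid end to end. The names `DecodeError`, `take_field`, and `decode` are hypothetical, and records come back as `(String, String)` pairs to keep the block self-contained.

```rust
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    /// The stream ended in the middle of a length prefix or a field.
    Truncated,
    /// A field's bytes were not valid UTF-8.
    InvalidUtf8,
}

/// Read one length-prefixed field starting at *pos and advance *pos past it.
fn take_field(bytes: &[u8], pos: &mut usize) -> Result<String, DecodeError> {
    // 4-byte little-endian length prefix.
    let len_end = pos.checked_add(4).ok_or(DecodeError::Truncated)?;
    let mut prefix = [0u8; 4];
    prefix.copy_from_slice(bytes.get(*pos..len_end).ok_or(DecodeError::Truncated)?);
    let len = u32::from_le_bytes(prefix) as usize;

    // Payload of exactly `len` bytes, bounds-checked against the stream.
    let end = len_end.checked_add(len).ok_or(DecodeError::Truncated)?;
    let payload = bytes.get(len_end..end).ok_or(DecodeError::Truncated)?;
    *pos = end;
    String::from_utf8(payload.to_vec()).map_err(|_| DecodeError::InvalidUtf8)
}

/// Decode records until the stream is exhausted.
pub fn decode(bytes: &[u8]) -> Result<Vec<(String, String)>, DecodeError> {
    let mut pos = 0;
    let mut records = Vec::new();
    while pos < bytes.len() {
        let key = take_field(bytes, &mut pos)?;
        let value = take_field(bytes, &mut pos)?;
        records.push((key, value));
    }
    Ok(records)
}
```

The decoder never trusts a length prefix: every read is bounds-checked, so a stream cut off mid-record yields `Truncated` instead of a panic. That matters because the bytes may come from an older writer or a damaged file.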

This is why the format must either describe itself (here, the length prefixes tell the decoder how many bytes to read next) or rely on a shared external schema. RocksDB, protobuf, and SQLite all make different tradeoffs here.

But there's still a gap: you think in bytes, but the disk doesn't. It speaks in 4 KB blocks. Ask for 1 byte, and it reads 4,096.
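To make the block arithmetic concrete, here is a rough sketch of how many bytes a block-oriented read would move for a given byte range. It assumes a fixed 4,096-byte block and ignores caching and read-ahead; `BLOCK_SIZE`, `blocks_touched`, and `bytes_read` are hypothetical names, and the functions assume `offset + len` does not overflow a `u64`.

```rust
/// Block size assumed by the text: 4 KB = 4,096 bytes.
const BLOCK_SIZE: u64 = 4096;

/// Number of whole blocks that cover `len` bytes starting at byte `offset`.
pub fn blocks_touched(offset: u64, len: u64) -> u64 {
    if len == 0 {
        return 0;
    }
    let first = offset / BLOCK_SIZE;
    let last = (offset + len - 1) / BLOCK_SIZE;
    last - first + 1
}

/// Bytes actually transferred when reads happen in whole blocks.
pub fn bytes_read(offset: u64, len: u64) -> u64 {
    blocks_touched(offset, len) * BLOCK_SIZE
}
```

Under this model, reading 1 byte at offset 0 transfers 4,096 bytes, and the whole 44-byte stream fits in one block. A 17-byte record that starts at offset 4,090 straddles a block boundary and costs two blocks, or 8,192 bytes.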