Building a Codec

Encode once, decode anywhere. A codec is the contract between your structs and bytes on disk.

In-memory struct (KeyValue[])

0  KeyValue { key: "name", value: "Alice" }
1  KeyValue { key: "age", value: "30" }
2  KeyValue { key: "city", value: "SF" }

Byte stream on disk

44 bytes total

04 00 00 00  6e 61 6d 65
05 00 00 00  41 6c 69 63 65
03 00 00 00  61 67 65
02 00 00 00  33 30
04 00 00 00  63 69 74 79
02 00 00 00  53 46

Records

"name" → "Alice"
"age" → "30"
"city" → "SF"

Format (per record)

key_len: 4 bytes, u32 LE
key: key_len bytes, UTF-8
val_len: 4 bytes, u32 LE
value: val_len bytes, UTF-8
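// Rust encode: append one record to buf (a Vec<u8>)
// Each length is written as a 4-byte little-endian u32.
buf.extend_from_slice(&(key.len() as u32).to_le_bytes());
buf.extend_from_slice(key.as_bytes());
buf.extend_from_slice(&(val.len() as u32).to_le_bytes());
buf.extend_from_slice(val.as_bytes());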
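
The four calls above handle a single record. A minimal sketch of an encoder for the whole record list might look like the following. The `KeyValue` struct layout and the `put_field` and `encode` names are assumptions made for illustration, not part of the original snippet. This version also rejects fields longer than `u32::MAX` bytes, because a bare `as u32` cast would silently wrap the length.

```rust
/// Hypothetical in-memory record; field names follow the example above.
#[derive(Debug, Clone, PartialEq)]
pub struct KeyValue {
    pub key: String,
    pub value: String,
}

/// Appends one length-prefixed field: a 4-byte u32 LE length, then the raw bytes.
fn put_field(buf: &mut Vec<u8>, bytes: &[u8]) -> Result<(), String> {
    // Refuse lengths that do not fit in the 4-byte header.
    if bytes.len() > u32::MAX as usize {
        return Err("field longer than u32::MAX bytes".to_string());
    }
    buf.extend_from_slice(&(bytes.len() as u32).to_le_bytes());
    buf.extend_from_slice(bytes);
    Ok(())
}

/// Encodes records back to back, with no record count and no trailer.
pub fn encode(records: &[KeyValue]) -> Result<Vec<u8>, String> {
    let mut buf = Vec::new();
    for record in records {
        put_field(&mut buf, record.key.as_bytes())?;
        put_field(&mut buf, record.value.as_bytes())?;
    }
    Ok(buf)
}
```

Calling `encode` on the three example records should reproduce the byte stream shown earlier, starting with `04 00 00 00 6e 61 6d 65`.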

Total size

24 B length headers (6 × 4 B) + 11 B keys (4 + 3 + 4) + 9 B values (5 + 2 + 2) = 44 B
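
The same total can be computed without encoding anything, which is handy for preallocating a buffer. This is a small sketch; the `record_size` and `total_size` names are hypothetical helpers, not functions from any library.

```rust
/// Size in bytes of one encoded record: two 4-byte length headers plus the payloads.
pub fn record_size(key: &str, value: &str) -> usize {
    const LEN_HEADER: usize = 4; // one u32 length prefix
    LEN_HEADER + key.len() + LEN_HEADER + value.len()
}

/// Sum of the encoded sizes of all (key, value) pairs.
pub fn total_size(records: &[(&str, &str)]) -> usize {
    records.iter().map(|&(k, v)| record_size(k, v)).sum()
}
```

Note that `str::len` returns the length in bytes, not characters, which is what the format needs for UTF-8 data.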

Key Insight

A codec is a symmetric contract: whatever the encoder writes, the decoder must be able to read back, byte for byte. The encoder calls encode(struct) → bytes. The decoder calls decode(bytes) → struct. They never communicate except through the byte stream — which might have been written days ago, on a different machine.
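
The decoder's half of the contract can be sketched as below, assuming the per-record format described earlier (u32 LE length, then UTF-8 bytes, for key and then value). The `KeyValue` struct and the `take_field` and `decode` names are illustrative assumptions. Since the bytes may come from another machine or an older run, the sketch checks every length against the remaining input instead of trusting it.

```rust
/// Hypothetical in-memory record; field names follow the example above.
#[derive(Debug, Clone, PartialEq)]
pub struct KeyValue {
    pub key: String,
    pub value: String,
}

/// Reads one length-prefixed UTF-8 field at `*pos` and advances `*pos` past it.
fn take_field(buf: &[u8], pos: &mut usize) -> Result<String, String> {
    // Length header: 4 bytes, u32 little-endian.
    let header_end = (*pos)
        .checked_add(4)
        .filter(|&end| end <= buf.len())
        .ok_or("truncated length header")?;
    let mut len_bytes = [0u8; 4];
    len_bytes.copy_from_slice(&buf[*pos..header_end]);
    let len = u32::from_le_bytes(len_bytes) as usize;

    // Field body: exactly `len` bytes, which must be valid UTF-8.
    let field_end = header_end
        .checked_add(len)
        .filter(|&end| end <= buf.len())
        .ok_or("truncated field body")?;
    let text = std::str::from_utf8(&buf[header_end..field_end])
        .map_err(|e| format!("invalid UTF-8: {}", e))?;
    *pos = field_end;
    Ok(text.to_string())
}

/// Decodes records until the input is exhausted; any partial record is an error.
pub fn decode(buf: &[u8]) -> Result<Vec<KeyValue>, String> {
    let mut pos = 0;
    let mut records = Vec::new();
    while pos < buf.len() {
        let key = take_field(buf, &mut pos)?;
        let value = take_field(buf, &mut pos)?;
        records.push(KeyValue { key, value });
    }
    Ok(records)
}
```

Feeding the 44-byte stream from the example into `decode` should yield the three original records, and dropping the final byte should produce an error instead of a silently shortened value.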

This is why the format must either be self-describing (length prefixes tell the decoder how many bytes to read) or rely on a shared external schema. RocksDB, protobuf, and SQLite each make a different tradeoff here.

But there's still a gap: you think in bytes, but the disk doesn't. It speaks in blocks, typically 4 KB each. Ask for 1 byte and it reads 4096.
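
To make the block arithmetic concrete, here is a rough sketch that maps a byte range to the blocks that cover it. The 4096-byte `BLOCK_SIZE` is an assumption taken from the figure in the text (real devices and filesystems vary), and `blocks_touched` and `bytes_transferred` are hypothetical helper names. Overflow of `offset + len` is ignored for brevity.

```rust
/// Assumed block size; 4 KB matches the figure used in the text.
pub const BLOCK_SIZE: u64 = 4096;

/// Returns (first_block_index, block_count) for `len` bytes starting at `offset`.
pub fn blocks_touched(offset: u64, len: u64) -> (u64, u64) {
    let first = offset / BLOCK_SIZE;
    if len == 0 {
        return (first, 0); // an empty read touches no blocks
    }
    // Index of the block holding the last requested byte.
    let last = (offset + len - 1) / BLOCK_SIZE;
    (first, last - first + 1)
}

/// Bytes moved when whole blocks are read to satisfy the request.
pub fn bytes_transferred(offset: u64, len: u64) -> u64 {
    blocks_touched(offset, len).1 * BLOCK_SIZE
}
```

Under these assumptions, reading the whole 44-byte example at offset 0 costs one block, but the same 44 bytes starting at offset 4090 straddle a block boundary and cost two.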