Building a Codec
Encode once, decode anywhere. A codec is the contract between your structs and bytes on disk.
[Figure: an in-memory KeyValue[] array of records, encoded as a byte stream on disk, 44 bytes total]
Format (per record)
key_len: 4 bytes, u32 LE
key: key_len bytes, UTF-8
val_len: 4 bytes, u32 LE
value: val_len bytes, UTF-8
Total size
24B length headers + 11B keys + 9B values = 44B
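The size arithmetic can be checked with a minimal encoder sketch for the per-record format above. The 24 B of length headers implies three records with two 4-byte prefixes each. The records themselves are not listed here, so the three pairs below are hypothetical, chosen only so that the keys total 11 B and the values total 9 B. The name encode_record is also an assumption.

```python
import struct

def encode_record(key: str, value: str) -> bytes:
    """Encode one record as key_len | key | val_len | value."""
    k = key.encode("utf-8")
    v = value.encode("utf-8")
    # "<I" is an unsigned 32-bit integer, little-endian (u32 LE).
    return struct.pack("<I", len(k)) + k + struct.pack("<I", len(v)) + v

# Hypothetical records: keys total 11 bytes, values total 9 bytes.
records = [("name", "Ada"), ("city", "Oslo"), ("age", "36")]

stream = b"".join(encode_record(k, v) for k, v in records)

header_bytes = 2 * 4 * len(records)  # two u32 prefixes per record
key_bytes = sum(len(k.encode("utf-8")) for k, _ in records)
value_bytes = sum(len(v.encode("utf-8")) for _, v in records)

print(header_bytes, key_bytes, value_bytes, len(stream))  # 24 11 9 44
```

Lengths are counted in UTF-8 bytes, not characters, so a key containing non-ASCII text would occupy more bytes than its character count suggests.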
Key Insight
A codec is a symmetric contract: whatever the encoder writes, the decoder must be able to read back, byte for byte. The encoder implements encode(struct) → bytes; the decoder implements decode(bytes) → struct. The two never communicate except through the byte stream, which might have been written days ago on a different machine.
This is why the format must either be self-describing (length prefixes tell the decoder how many bytes to read next) or rely on a schema shared outside the stream. RocksDB, protobuf, and SQLite all make different tradeoffs here.
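A round trip through the length-prefixed format might look like the sketch below. The decoder sees only bytes and relies on each u32 LE prefix to know how far to read. The helper _read_field, the KeyValue alias, and the sample records are assumptions for illustration, not a definitive implementation.

```python
import struct
from typing import List, Tuple

KeyValue = Tuple[str, str]  # assumed shape of one record

def encode(records: List[KeyValue]) -> bytes:
    out = bytearray()
    for key, value in records:
        for field in (key, value):
            raw = field.encode("utf-8")
            out += struct.pack("<I", len(raw))  # u32 LE length prefix
            out += raw
    return bytes(out)

def _read_field(buf: bytes, offset: int) -> Tuple[str, int]:
    """Read one length-prefixed UTF-8 field; return (text, next offset)."""
    if offset + 4 > len(buf):
        raise ValueError("truncated length prefix")
    (n,) = struct.unpack_from("<I", buf, offset)
    offset += 4
    if offset + n > len(buf):
        raise ValueError("truncated field body")
    return buf[offset:offset + n].decode("utf-8"), offset + n

def decode(buf: bytes) -> List[KeyValue]:
    records = []
    offset = 0
    while offset < len(buf):
        key, offset = _read_field(buf, offset)
        value, offset = _read_field(buf, offset)
        records.append((key, value))
    return records

# Hypothetical sample data.
original = [("name", "Ada"), ("city", "Oslo"), ("age", "36")]
blob = encode(original)
assert decode(blob) == original   # struct -> bytes -> struct
assert encode(decode(blob)) == blob  # bytes -> struct -> bytes
```

The bounds checks matter because the decoder cannot ask the encoder what it meant: a stream cut short by a crash or a partial write should fail loudly rather than yield a silently shortened value.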
But there's still a gap: you think in bytes, but the disk doesn't. It speaks in 4 KB blocks. Ask for 1 byte and it reads 4096.
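The block arithmetic can be sketched as follows, assuming a fixed 4096-byte block and that every read is rounded out to whole blocks. Real block sizes and caching behavior depend on the device and file system; bytes_transferred is a hypothetical helper.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes

def bytes_transferred(offset: int, length: int, block_size: int = BLOCK_SIZE) -> int:
    """Bytes actually read when every touched block is fetched whole."""
    if length <= 0:
        return 0
    first_block = offset // block_size
    last_block = (offset + length - 1) // block_size
    return (last_block - first_block + 1) * block_size

print(bytes_transferred(0, 1))      # 4096: one byte costs a whole block
print(bytes_transferred(0, 44))     # 4096: the 44-byte stream fits in one block
print(bytes_transferred(4090, 44))  # 8192: the same 44 bytes straddling a boundary
```

The last case shows why record placement matters: identical data costs twice as much to read when it crosses a block boundary.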