The Encoding Problem

If you write "hello" and then "world" to a file, you read back "helloworld". Where is the boundary? Framing strategies exist to answer that question.

Value A: "hello" + Value B: "world"

Byte stream on disk (10 bytes)

⚠ Boundaries lost!
68 65 6c 6c 6f 77 6f 72 6c 64
 h  e  l  l  o  w  o  r  l  d
"hello""world"

Pros

+ Zero overhead

+ Trivial to write

Cons

- Cannot tell where one value ends

- Requires external schema

- Impossible to decode independently

Framing strategy

Rust

// WRONG: boundaries are lost
file.write_all(a.as_bytes())?;
file.write_all(b.as_bytes())?;
// Reader sees: "helloworld" — which part is "hello"?

Key Insight

When you write "hello" then "world" with no framing, the file contains 68 65 6c 6c 6f 77 6f 72 6c 64 — 10 bytes with no boundary information. The reader sees "helloworld" and has no way to recover the original two values. Framing is not optional; it's the contract between writer and reader.
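The ambiguity can be checked directly. The sketch below is illustrative only: `concat_unframed` and `is_ambiguous` are hypothetical helper names, not part of any library. It shows that two different pairs of values serialize to exactly the same bytes when nothing marks the boundary.

```rust
/// Concatenates values with no framing, mirroring back-to-back `write_all` calls.
fn concat_unframed(values: &[&str]) -> Vec<u8> {
    let mut out = Vec::new();
    for v in values {
        out.extend_from_slice(v.as_bytes());
    }
    out
}

/// Returns true when two different lists of values produce identical bytes,
/// which means a reader cannot tell them apart.
fn is_ambiguous(a: &[&str], b: &[&str]) -> bool {
    a != b && concat_unframed(a) == concat_unframed(b)
}
```

Calling `is_ambiguous(&["hello", "world"], &["hell", "oworld"])` returns `true`: both inputs become the same 10 bytes, so the original split is unrecoverable.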

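Every real format picks a strategy: protobuf length-prefixes strings and nested messages, JSON uses delimiters (quotes, commas, braces, and brackets), and SQL databases use fixed-width or length-prefixed fields. The choice is a tradeoff, and it is effectively irrevocable once data is written to disk, because every existing file would have to be rewritten to change it.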
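As one possible illustration of the length-prefix strategy, here is a minimal sketch that stores each value as a 4-byte length followed by its bytes. The function names `encode_framed` and `decode_framed`, the 4-byte width, and the little-endian byte order are all assumptions made for this example, not the layout of any particular format.

```rust
/// Appends `value` to `out` as a 4-byte little-endian length followed by the bytes.
/// The 4-byte width and the byte order are assumptions made for this sketch.
fn encode_framed(out: &mut Vec<u8>, value: &[u8]) {
    assert!(
        value.len() <= u32::MAX as usize,
        "value too long for a u32 length prefix"
    );
    out.extend_from_slice(&(value.len() as u32).to_le_bytes());
    out.extend_from_slice(value);
}

/// Splits a buffer produced by `encode_framed` back into its values.
/// Returns `None` if the buffer ends in the middle of a header or a value.
fn decode_framed(mut buf: &[u8]) -> Option<Vec<Vec<u8>>> {
    let mut values = Vec::new();
    while !buf.is_empty() {
        if buf.len() < 4 {
            return None; // truncated length header
        }
        let (header, rest) = buf.split_at(4);
        let len = u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;
        if rest.len() < len {
            return None; // truncated value
        }
        let (value, tail) = rest.split_at(len);
        values.push(value.to_vec());
        buf = tail;
    }
    Some(values)
}
```

Encoding "hello" and then "world" this way takes 18 bytes instead of 10. The 8 extra bytes are the price of being able to recover both values without an external schema.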

Next problem: even with framing, numbers are ambiguous. Is 0x2C 0x01 the number 300 or 11265? That depends on endianness: read little-endian, the bytes form 0x012C = 300; read big-endian, they form 0x2C01 = 11265.
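The two readings of a byte pair can be checked with the standard library's `u16::from_le_bytes` and `u16::from_be_bytes`. The wrapper `read_both_orders` is a hypothetical name used only for this sketch.

```rust
/// Interprets the same two bytes as a u16 in both byte orders.
/// Returns (little-endian value, big-endian value).
fn read_both_orders(bytes: [u8; 2]) -> (u16, u16) {
    (u16::from_le_bytes(bytes), u16::from_be_bytes(bytes))
}
```

For the bytes `[0x2C, 0x01]` this returns `(300, 11265)`, so writer and reader must agree on byte order just as they must agree on framing.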