The Encoding Problem
If you write 'hello' then 'world', you read back 'helloworld'. Where's the boundary? Learn framing strategies.
Value A
"hello"
Value B
"world"
Byte stream on disk (10 bytes)
⚠ Boundaries lost!
Pros
+ Zero overhead
+ Trivial to write
Cons
- Cannot tell where one value ends
- Requires external schema
- Impossible to decode independently
Framing strategy
Rust
// WRONG: boundaries are lost
file.write_all(a.as_bytes())?;
file.write_all(b.as_bytes())?;
// Reader sees: "helloworld". Which part is "hello"?
Key Insight
When you write "hello" then "world" with no framing, the file contains 68 65 6c 6c 6f 77 6f 72 6c 64 — 10 bytes with no boundary information. The reader sees "helloworld" and has no way to recover the original two values. Framing is not optional; it's the contract between writer and reader.
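Every real format picks a strategy: protobuf uses length prefixes, JSON uses delimiters (quotes, commas, braces), and SQL databases typically use fixed-width or length-prefixed fields. The choice is a tradeoff, and it is hard to reverse once data is written to disk, because every existing file must either be migrated or remain readable.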
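To make the length-prefix strategy concrete, here is a minimal sketch in Rust. The function names `write_frame` and `read_frame`, the 4-byte prefix width, and the little-endian byte order are all assumptions chosen for illustration; they are not taken from protobuf or any other specific format.

```rust
use std::io::{self, Read, Write};

/// Write one value as a 4-byte little-endian length followed by its bytes.
/// (Hypothetical helper: the prefix width and byte order are illustrative choices.)
fn write_frame<W: Write>(out: &mut W, value: &[u8]) -> io::Result<()> {
    // Reject values whose length does not fit in the 4-byte prefix.
    if value.len() > u32::MAX as usize {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "value too large for a u32 length prefix",
        ));
    }
    let len = value.len() as u32;
    out.write_all(&len.to_le_bytes())?;
    out.write_all(value)
}

/// Read one frame back: first the 4-byte length, then exactly that many bytes.
fn read_frame<R: Read>(input: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    input.read_exact(&mut len_buf)?;
    let len = u32::from_le_bytes(len_buf) as usize;

    let mut value = vec![0u8; len];
    input.read_exact(&mut value)?;
    Ok(value)
}
```

Writing "hello" and then "world" through `write_frame` produces 18 bytes instead of 10 (two 4-byte prefixes plus the payloads), and two calls to `read_frame` recover the original values separately. The 8 extra bytes are the cost of the boundary information.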
Next problem: even with framing, numbers are ambiguous. Is 0x2C 0x01 the number 300 or 11265? That depends on endianness: read little-endian, the bytes form 0x012C = 300; read big-endian, they form 0x2C01 = 11265.
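The ambiguity can be checked directly with the standard library's `u16::from_le_bytes` and `u16::from_be_bytes`. The helper name `decode_both` below is hypothetical and exists only for this sketch.

```rust
/// Interpret the same two bytes as a u16 in both byte orders.
/// Returns (little-endian value, big-endian value).
fn decode_both(bytes: [u8; 2]) -> (u16, u16) {
    // Little-endian: the first byte is the least significant.
    let little = u16::from_le_bytes(bytes);
    // Big-endian: the first byte is the most significant.
    let big = u16::from_be_bytes(bytes);
    (little, big)
}
```

Calling `decode_both([0x2C, 0x01])` returns `(300, 11265)`. The bytes are identical in both cases, so a format has to fix the byte order as part of its contract, just as it fixes the framing.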