Binary Encoding Deep Dive
From key-value pairs to bytes on disk
When you write db.put("apple", "red fruit"), how does that become bytes on disk? The OS doesn't understand "keys" and "values" - it only knows read() and write() of raw bytes.
The Fundamental Problem
If you write two strings back-to-back, how do you know where one ends and the next begins?
We need a framing protocol - a way to encode boundaries into the byte stream.
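To make the ambiguity concrete, here is a minimal Python sketch. The pair names are illustrative only. Two different key-value pairs produce identical bytes when written back-to-back with no framing:

```python
# Two different (key, value) pairs, concatenated with no framing.
pair_a = (b"apple", b"red fruit")
pair_b = (b"appler", b"ed fruit")

stream_a = pair_a[0] + pair_a[1]
stream_b = pair_b[0] + pair_b[1]

# Both streams are b"applered fruit", so a reader cannot tell
# which pair was written.
print(stream_a == stream_b)  # True
```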
Common Solutions
1. Length Prefixing
Write the length before each piece of data. The reader knows exactly how many bytes to read.
2. Delimiters
Use a special byte (like \0 or \n) to mark boundaries.
3. Fixed Size
Every field is exactly N bytes. Pad shorter values, truncate longer ones.
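The three approaches above can be sketched side by side. This is an illustrative Python sketch, not any particular database's format: the function names, the 4-byte little-endian length prefix, the \0 delimiter, and the 16-byte field size are all assumptions chosen for the example.

```python
import struct

def frame_length_prefixed(fields):
    # Assumed format: a fixed 4-byte little-endian length before each field.
    return b"".join(struct.pack("<I", len(f)) + f for f in fields)

def read_length_prefixed(buf):
    fields, pos = [], 0
    while pos < len(buf):
        (n,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        fields.append(buf[pos:pos + n])
        pos += n
    return fields

def frame_delimited(fields, delim=b"\0"):
    # Only safe when no field contains the delimiter byte.
    for f in fields:
        if delim in f:
            raise ValueError("field contains the delimiter")
    return b"".join(f + delim for f in fields)

def read_delimited(buf, delim=b"\0"):
    # Every field ends with the delimiter, so drop the empty tail piece.
    return buf.split(delim)[:-1]

def frame_fixed(fields, size=16, pad=b"\0"):
    # Pad short values and truncate long ones (truncation loses data).
    return b"".join(f[:size].ljust(size, pad) for f in fields)

def read_fixed(buf, size=16, pad=b"\0"):
    # Stripping the padding also strips genuine trailing pad bytes.
    return [buf[i:i + size].rstrip(pad) for i in range(0, len(buf), size)]

fields = [b"apple", b"red fruit"]
print(len(frame_length_prefixed(fields)))  # 22
print(len(frame_delimited(fields)))        # 16
print(len(frame_fixed(fields)))            # 32
```

For this input, delimiters give the smallest output but cannot carry arbitrary binary data, fixed-size fields waste the most space and silently truncate, and length prefixing handles any byte content at the cost of the prefix itself.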
VarInt: Variable-Length Integers
How do you encode the length itself? If lengths can be 1 to 1,000,000+, using a fixed 4 bytes wastes space for small values.
The VarInt Trick
Use 7 bits of each byte for data, 1 bit to signal "more bytes follow".
A continuation bit of 0 marks the last byte; a continuation bit of 1 means more bytes follow.
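A minimal Python sketch of this scheme follows. It assumes the common convention (used by LEB128-style encodings) that the continuation flag is the high bit of each byte and that the least significant 7-bit group is written first. The names encode_varint and decode_varint are illustrative.

```python
def encode_varint(n):
    """Encode a non-negative integer, 7 data bits per byte, low group first."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # continuation bit 1: more bytes follow
        else:
            out.append(low7)         # continuation bit 0: last byte
            return bytes(out)

def decode_varint(buf, pos=0):
    """Return (value, position just past the varint)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]  # raises IndexError if the input is truncated
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7

print(encode_varint(5).hex())          # 05
print(encode_varint(300).hex())        # ac02
print(len(encode_varint(1_000_000)))   # 3
print(decode_varint(b"\xac\x02"))      # (300, 2)
```

Values below 128 fit in one byte, values below 16,384 fit in two, and a length of 1,000,000 takes three bytes instead of a fixed four.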
How RocksDB Does It
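RocksDB uses length-prefixed encoding with VarInts. In its block-based table format, each key-value entry in a data block starts with three VarInt lengths (the number of key bytes shared with the previous key, the number of unshared key bytes, and the value length), followed by the unshared key bytes and then the value bytes.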
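As a rough sketch of such an entry, the Python below assumes the block-based table layout of three VarInt lengths (shared key bytes, unshared key bytes, value length) followed by the unshared key bytes and the value. It is a simplification, not RocksDB's code: the function names are hypothetical, and it ignores restart points, the sequence-number and type trailer that RocksDB appends to keys, compression, and block trailers.

```python
def put_varint(out, n):
    # Append n to the bytearray as a varint (7 data bits per byte, low group first).
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)

def get_varint(buf, pos):
    # Return (value, position just past the varint).
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7

def encode_entry(prev_key, key, value):
    # Count the leading bytes this key shares with the previous key.
    shared = 0
    limit = min(len(prev_key), len(key))
    while shared < limit and prev_key[shared] == key[shared]:
        shared += 1
    unshared = key[shared:]
    out = bytearray()
    put_varint(out, shared)
    put_varint(out, len(unshared))
    put_varint(out, len(value))
    out += unshared
    out += value
    return bytes(out)

def decode_entry(prev_key, buf, pos=0):
    shared, pos = get_varint(buf, pos)
    unshared_len, pos = get_varint(buf, pos)
    value_len, pos = get_varint(buf, pos)
    key = prev_key[:shared] + buf[pos:pos + unshared_len]
    pos += unshared_len
    value = buf[pos:pos + value_len]
    return key, value, pos + value_len

first = encode_entry(b"", b"apple", b"red fruit")
second = encode_entry(b"apple", b"apricot", b"orange fruit")
print(first)   # b'\x00\x05\tapplered fruit'
print(second)  # b'\x02\x05\x0cricotorange fruit'
```

The second entry stores only "ricot" because the first two bytes of "apricot" are shared with "apple". This is why a reader has to decode entries in order, starting from a key that was stored in full.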
Key Takeaways
- Length-prefixing is the standard way to frame variable-length data
- VarInt saves space by using 1 byte for numbers below 128 and more bytes only for larger ones
- Binary format design affects both space efficiency and read/write speed
- Understanding binary encoding helps you debug corrupt data and optimize storage