Numbers Have Secrets
The number 300 can be [0x2C, 0x01] (little-endian u16) or [0x01, 0x2C] (big-endian u16) or [0xAC, 0x02] (varint). Endianness, alignment, and varints.
300 in binary, most significant byte first: 00000001 00101100 (0x01 0x2C)
All representations of 300
The ambiguity problem
The byte sequence 2c 01 has multiple valid interpretations:
- as u16 little-endian: 300
- as u16 big-endian: 11265
- as varint: 44 (0x2c has its high bit clear, so the value ends after one byte and 0x01 belongs to whatever comes next)
The schema must specify: what type? what endianness? Otherwise the bytes are meaningless.
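A minimal sketch of the three readings above, using Python's standard struct module. The function name interpretations and the dictionary keys are hypothetical, chosen only for this illustration.

```python
import struct

def interpretations(raw: bytes) -> dict:
    """Read the same two bytes under three different schemas."""
    as_le = struct.unpack("<H", raw)[0]  # "<H": little-endian unsigned 16-bit
    as_be = struct.unpack(">H", raw)[0]  # ">H": big-endian unsigned 16-bit
    # Varint view: the first byte is a complete value only if its high bit is clear.
    first = raw[0]
    as_varint = (first & 0x7F) if (first & 0x80) == 0 else None
    return {"u16_le": as_le, "u16_be": as_be, "varint_first_byte": as_varint}

print(interpretations(bytes([0x2C, 0x01])))
# {'u16_le': 300, 'u16_be': 11265, 'varint_first_byte': 44}
```

Same bytes, three answers. Nothing in the bytes themselves says which one is right.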
Varint (LEB128) algorithm
The high bit of each byte signals that more bytes follow. The low 7 bits carry the value, least-significant 7-bit group first.
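The rule above can be sketched as an encoder and decoder for unsigned values. This is an illustrative sketch, not the code of any particular library; encode_varint and decode_varint are hypothetical names, and the decoder assumes the caller wants the position of the next unread byte returned alongside the value.

```python
from typing import Tuple

def encode_varint(value: int) -> bytes:
    """Encode an unsigned integer as LEB128: 7 bits per byte, low group first."""
    if value < 0:
        raise ValueError("this sketch handles unsigned values only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)

def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # place this 7-bit group
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7

print(encode_varint(300).hex())           # ac02
print(decode_varint(bytes([0xAC, 0x02])))  # (300, 2)
```

For 300: the low 7 bits are 0101100 (0x2C), which becomes 0xAC once the continuation bit is set. The remaining bits are 10 (0x02), which is the final byte.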
Key Insight
Every encoding of a number is a tradeoff: space vs speed vs range vs complexity.
Fixed-width (u32, u64)
Constant decode time. Random-access. Wastes space for small numbers.
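A small sketch of the random-access property, assuming an array of little-endian u32 values packed back to back. The helper names pack_u32_array and read_u32_at are hypothetical.

```python
import struct

U32_SIZE = 4  # every element occupies exactly 4 bytes

def pack_u32_array(values):
    """Pack a list of integers as consecutive little-endian u32 values."""
    return struct.pack(f"<{len(values)}I", *values)

def read_u32_at(buf, index):
    """Jump straight to element index: its offset is index * 4."""
    return struct.unpack_from("<I", buf, index * U32_SIZE)[0]

buf = pack_u32_array([1, 300, 70000])
print(len(buf))             # 12
print(read_u32_at(buf, 2))  # 70000
```

The value 1 still costs 4 bytes here, which is the wasted space the tradeoff refers to.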
Varint (LEB128)
1–5 bytes for a u32 (up to 10 for a u64). Dense for small values. Sequential decode only: you cannot find the Nth value without decoding the ones before it. Used by RocksDB, protobuf.
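To make the size range concrete, here is a sketch that counts how many bytes an unsigned value needs as a varint. The name varint_len is hypothetical. Each byte carries 7 payload bits, so a 32-bit value needs at most 5 bytes and a 64-bit value at most 10.

```python
def varint_len(value: int) -> int:
    """Number of bytes needed to encode an unsigned value as LEB128."""
    if value < 0:
        raise ValueError("this sketch handles unsigned values only")
    n = 1
    while value >= 0x80:  # more than 7 bits left: another byte is needed
        value >>= 7
        n += 1
    return n

for v in (0, 127, 128, 300, 2**32 - 1, 2**64 - 1):
    print(v, varint_len(v))
```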
Little-endian
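LSB first. Native for x86 and for ARM in its usual configuration. Better for partial int reads: reading only the first bytes of a wider integer yields its low-order part, so a small value stored as u32 can be read as u16 from the same offset.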
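A sketch of the partial-read property: take only the first two bytes of a 4-byte integer and see what comes out under each byte order. The helper name read_prefix is hypothetical.

```python
def read_prefix(buf: bytes, width: int, byteorder: str) -> int:
    """Interpret only the first width bytes of buf as an unsigned integer."""
    return int.from_bytes(buf[:width], byteorder)

le = (300).to_bytes(4, "little")  # 2c 01 00 00
be = (300).to_bytes(4, "big")     # 00 00 01 2c

print(read_prefix(le, 2, "little"))  # 300: the low-order bytes come first
print(read_prefix(be, 2, "big"))     # 0: the prefix holds only the high-order bytes
```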
Big-endian
MSB first. Network order. Lexicographic sort works naturally: comparing fixed-width unsigned values byte by byte gives the same order as comparing the numbers.
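A sketch of the sorting claim, assuming fixed-width unsigned u16 keys. The helper name sort_by_encoded_bytes is hypothetical. Python compares bytes objects lexicographically, which stands in here for what a key-value store does with raw keys.

```python
def sort_by_encoded_bytes(values, byteorder):
    """Encode each value as a u16, sort the raw bytes, then decode again."""
    keys = sorted(v.to_bytes(2, byteorder) for v in values)
    return [int.from_bytes(k, byteorder) for k in keys]

values = [300, 1, 256, 2]
print(sort_by_encoded_bytes(values, "big"))     # [1, 2, 256, 300]
print(sort_by_encoded_bytes(values, "little"))  # [256, 1, 2, 300]
```

With big-endian keys the byte order matches numeric order. With little-endian keys 256 (00 01) sorts before 1 (01 00), because the comparison sees the low-order byte first.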
With framing (Chapter 2) and number encoding (this chapter), you have all the primitives. Next: how do you combine them into a codec — an encoder and decoder for a complete data structure?