How Bloom Filters Work

A probabilistic data structure for set membership

Side Quest~15 min

A Bloom filter answers one question: "Is this element in the set?"But it can only say "definitely no" or "probably yes".

The Problem

Without bloom filters, checking a missing key across many SST files causes many useless disk reads.

Without Bloom Filter

You may do 100 useless reads for one missing key.

With Bloom Filter

Most files are skipped in memory before touching disk.

Try It Yourself

Bloom Filter (32 bits, 5 set, 16% fill)Keys: apple, banana
0
1
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
1
1
0
0
0
0
0
0
0
1
0
0
0
0
0

Why False Positives Happen

Different keys can hash to overlapping bit positions. The tradeoff: tiny memory use for occasional extra checks.

The Math (Optional)

False positive probability:
p ≈ (1 - e-kn/m)k
Optimal hash function count:
k = (m/n) × ln(2) ≈ 0.693 × (m/n)

Key Takeaways

  • Bloom filters trade accuracy for space.
  • No false negatives - if it says no, it is no.
  • False positives are acceptable when they only trigger a cheap extra check.
  • In RocksDB, bloom filters help skip SST files that definitely do not contain a key.