I caught the Facebook talk, Needle in a Haystack: Efficient Storage of Billions of Photos, on Flowgram. First up, I'm not a big fan of Flowgrams. The format (slides plus voice) is excellent, but the delivery in a web browser isn't optimal… make downloadable videos!
The talk, however, was excellent. Do watch it and learn a bit more about Facebook's infrastructure. Anyway, here are some notes I took from the talk:
- “We’re one of the largest MySQL installations in the world”
- They use memcache: "We have memcache because databases aren't fast" (said later on, during the Q&A)
- A separate team focuses on APE (Apache, PHP, and the Extensions they work on)
- 6.5 billion total images, with 4-5 sizes stored for each, so roughly 30 billion files totalling about 540TB… At peak, 475,000 images are served per second, and the collection grows by 100 million uploads per week
- Images are usually pulled from a Content Delivery Network (CDN), which reduces the request rate on their own servers
- They use NetApp storage; their upload servers write to it over NFS
- Cachr (evhttp-based) and the File Handle Cache (FHC) use memcache as a backing store… FHC is based on lighttpd!
- They use a "haystack": a user-level abstraction that stores a separate index file with more efficient metadata, to reduce disk seeks (1 disk seek or less for any workload). The talk goes pretty deep into the haystack server architecture, which is also evhttp-based
- How is MySQL used? Very few transactions, very few joins
- Video is a very different beast, and the design is a little different
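To make the haystack idea a little more concrete, here is a minimal toy sketch of its core trick: append many photos to one large store file, and keep a compact index (key → offset, length) in a separate file that is loaded into memory, so a read needs no on-disk metadata lookups and costs at most one seek plus one read. This is my own illustration under stated assumptions, not Facebook's implementation: the class name `HaystackSketch`, the record and index layouts, and integer photo keys are all hypothetical.

```python
import os
import struct
import tempfile

# Hypothetical record header in the store file: 8-byte key, 4-byte data length.
# (A header like this would let the index be rebuilt by scanning the store.)
HEADER = struct.Struct("<QI")
# Hypothetical index-file entry: key, offset of the photo data, data length.
INDEX_ENTRY = struct.Struct("<QQI")


class HaystackSketch:
    """Toy append-only photo store: one big data file plus a small index file."""

    def __init__(self, store_path, index_path):
        self.index_path = index_path
        self.index = {}  # key -> (offset, length), held entirely in memory
        if os.path.exists(index_path):
            self._load_index()
        # "a+b": writes always append; reads can seek anywhere.
        self.store = open(store_path, "a+b")

    def _load_index(self):
        with open(self.index_path, "rb") as f:
            data = f.read()
        for pos in range(0, len(data), INDEX_ENTRY.size):
            key, offset, length = INDEX_ENTRY.unpack_from(data, pos)
            self.index[key] = (offset, length)

    def put(self, key, photo_bytes):
        # Append header + photo data to the end of the store file.
        self.store.seek(0, os.SEEK_END)
        header_offset = self.store.tell()
        self.store.write(HEADER.pack(key, len(photo_bytes)))
        self.store.write(photo_bytes)
        self.store.flush()
        data_offset = header_offset + HEADER.size
        self.index[key] = (data_offset, len(photo_bytes))
        # Record the same metadata in the separate, compact index file.
        with open(self.index_path, "ab") as f:
            f.write(INDEX_ENTRY.pack(key, data_offset, len(photo_bytes)))

    def get(self, key):
        # Metadata lookup is an in-memory dict hit: no disk access.
        entry = self.index.get(key)
        if entry is None:
            return None
        offset, length = entry
        # The only disk work: one seek and one contiguous read.
        self.store.seek(offset)
        return self.store.read(length)

    def close(self):
        self.store.close()


with tempfile.TemporaryDirectory() as d:
    hs = HaystackSketch(os.path.join(d, "store.dat"), os.path.join(d, "store.idx"))
    hs.put(1, b"jpeg-bytes-1")
    hs.put(2, b"jpeg-bytes-2")
    print(hs.get(2))  # b'jpeg-bytes-2'
    hs.close()
```

As I understand the design, the win comes from avoiding the per-file metadata reads (directory entries, inodes) that a one-file-per-photo layout over NFS would incur; the trade-off is that deletes, compaction and crash recovery need extra machinery that this sketch leaves out.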
If you’re into information about photo storage sites, don’t hesitate to also read my previous notes on Flickr.