Keeping social networks up and running is no easy task. Today's casualty: Twitter is down again. I suspect their engineers are in the "war room" right now, hunting for the root cause and trying to resolve it quickly. Perhaps it's another denial-of-service (DoS) attack.

Why is performance so hard to get right on social networking sites? A few reasons:

  • Content sites are easy to cache. Social networks are not: updates arrive far more frequently, from many users, in many different ways.
  • Managing relationships in databases is still a "hard" problem. An action by one person affects many users, each in a different way. When I tweet, my 500 followers get the update in real time.
  • Performance is hard to monitor. Usage patterns from a small subset of users can undermine performance for many users.
  • Security is a battle. The more successful networks attract more complex attacks, but even smaller networks face a never-ending battle with spammers.
  • Performance considerations differ by type of user. Scaling to millions of Twitter users who have few updates or followers is a very different problem from making the site perform well for the 1% of users who tweet heavily and have large followings.
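
To make the relationship and user-type points concrete, here is a minimal Python sketch of one common way to think about timeline delivery. It is not how Twitter actually works; the `Timelines` class, the `fanout_limit` threshold, and the in-memory dictionaries are all assumptions made up for illustration. Authors with few followers have their posts copied into each follower's inbox when they post, while posts from authors with many followers are left in place and merged in only when a follower reads their timeline.

```python
from collections import defaultdict


class Timelines:
    """Toy in-memory timeline store (hypothetical, for illustration only)."""

    def __init__(self, fanout_limit=1000):
        # Hypothetical cutoff: authors above it are not pushed to inboxes.
        self.fanout_limit = fanout_limit
        self.followers = defaultdict(set)  # author -> users following them
        self.following = defaultdict(set)  # user -> authors they follow
        self.inbox = defaultdict(list)     # user -> posts pushed to them
        self.outbox = defaultdict(list)    # author -> their own posts
        self.seq = 0                       # global ordering counter

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def post(self, author, text):
        """Store a post and return how many inbox writes it caused."""
        self.seq += 1
        item = (self.seq, author, text)
        self.outbox[author].append(item)
        fans = self.followers[author]
        if len(fans) > self.fanout_limit:
            return 0  # too many followers: readers pull this post later
        for fan in fans:
            self.inbox[fan].append(item)  # one write per follower
        return len(fans)

    def timeline(self, user, limit=20):
        """Merge pushed posts with posts pulled from high-follower authors."""
        merged = {item[0]: item for item in self.inbox[user]}
        for author in self.following[user]:
            if len(self.followers[author]) > self.fanout_limit:
                for item in self.outbox[author]:
                    merged[item[0]] = item  # keyed by seq, so no duplicates
        newest_first = sorted(merged.values(), reverse=True)
        return newest_first[:limit]
```

The trade-off is visible in `post`: a tweet to 500 followers costs 500 writes, which is fine for most users but expensive for the small fraction with huge followings, so the sketch shifts that cost to read time for them. A real system would also need persistence, caching, and handling for follows that happen after a post, none of which is attempted here.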
