learn2scale – what’s up with Malaysian news sites? Will the cloud work for them?
Seriously kids, what’s with the lack of scalability? I’ve never seen CNN or the NYTimes fall back to “trimmed” versions.
Is it a question of bandwidth? Is it lack of hardware?
Take, for example, Malaysiakini (the first alternative news source in Malaysia, with a subscription model built around it). It runs FreeBSD, uses PostgreSQL, and has a CMS on top of it (so almost a LAMP stack right there). There’s even use of Squid for caching. Yet load balancing is lacking? This is where the cloud can come into play when there’s high traffic.
Next up, The Malaysian Insider. They’re the new kid on the block. It’s probably Linux and Joomla, and MySQL is confirmed. No caching (hello, memcached at some stage?). Looks like a one-server operation. Again, if you want to start lean, scale to the cloud…
Of course, what takes the cake is one of the most famous dailies, The Star. The .asp tells me they’re on some kind of Microsoft platform, and I don’t know how scalable that is (maybe with their live.com/livemesh goo). But for a major newspaper (the NYTimes equivalent in Malaysia), I’m surprised they’re too busy to serve us content.
Is it the fault of the applications?
Is the next wave getting open source applications to act in a scalable fashion? How ready is a CMS like Drupal or Joomla for instant scaling? After all, EC2 has persistent storage (I don’t know whether Sun’s network.com offers this).
It seems like there are a lot of OpenSolaris images for EC2 and web stuff, at OpenSolaris on Amazon EC2. I see a Joomla AMI, for example. How easy is this to plug in for something like The Malaysian Insider? How easy will it be for them to scale up their services, i.e. start more instances? Will Joomla load balance? What considerations must they make if they go this route? Similar questions apply to the Drupal AMI.
I’m thinking I need to spend some time playing with “the cloud” soon… Any thoughts or pointers on this are greatly appreciated.
The problem is that the NYTimes etc. can all run distributed caching services with DNS round robin configs through companies like Akamai.
There’s no local equivalent… the nearest decent caching services to us are in Japan, and that just doesn’t cut the mustard.
I remember CNN really did apply the same system in the ’90s. They even had a name for it, like Defcon 1, 2, 3 or something similar; I don’t remember it anymore.
The idea was that if some big news tripled (or more) the traffic on the site, first they’d strip away some images (and videos when those existed), then some CSS and text, and finally even the advertising was dropped, just to keep the news site alive at all.
Who knows, maybe they still do it but it’s so classy you never notice it?
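A rough sketch of that tiered stripping idea, in Python. Everything here is an assumption made for illustration: the feature names, the load thresholds and the `enabled_features` helper are hypothetical, not anything CNN is known to have used.

```python
# Page features, from most to least expendable. "text" is the news itself
# and is never dropped. All names are hypothetical.
ALL_FEATURES = ("images", "video", "css", "extra_text", "ads", "text")

# (load ratio threshold, features dropped once load reaches it).
# Load ratio = current traffic / normal traffic. Thresholds are made up.
DEGRADATION_STEPS = [
    (1.5, {"images", "video"}),   # level 1: strip heavy media first
    (2.5, {"css", "extra_text"}), # level 2: then styling and secondary text
    (3.0, {"ads"}),               # level 3: finally let go of advertising
]


def enabled_features(load_ratio):
    """Return the set of page features still served at this load ratio."""
    enabled = set(ALL_FEATURES)
    for threshold, dropped in DEGRADATION_STEPS:
        if load_ratio >= threshold:
            enabled -= dropped
    return enabled
```

A page template could then check, say, `"images" in enabled_features(current_load)` before emitting image tags, so the site degrades step by step instead of falling over.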
Since you’re looking at .my news sites, what’s your 2 sen on nst, bharian and utusan?
it is perhaps down to 2 reasons: cost and expertise? especially for those new-kids-on-the-block sites offering free news, i doubt they have the moolah to do more.
but in the future things might change… let’s hope.
I second @wahlau. Sounds like cost and expertise to me. Pareto’s Law here: more than 80% of the time they don’t need it, so they can’t justify the ROI of setting up, testing and maintaining a *working* cluster or cloud for the occasional spike in traffic.
Personally, I’m okay with it because they still fulfill the most important criterion: the news (text) got through, never mind the fancy graphics and layout. Although they just need to slap something like AdSense on it to capitalize on the traffic.
Professionally, I see plenty of shortcuts they can take to handle the traffic. CPU/server would be dirt simple to scale up for a read-only site. I highly suspect bandwidth is the issue. Content type separation and a CDN, even for a local site, might be required, perhaps a manual CDN with servers in both TMNet and Jaring. The Jaring datacentre/misc TMNet connection is often flaky enough that you get really high latency. You’d often get dropped packets that way when you reach your bandwidth cap, and frustrated users reloading the half-loaded page will exacerbate the problem.
Re: Amazon EC2
You may deploy those (Drupal/Joomla et al.) on EC2, but from my limited knowledge of EC2, I doubt you’d get automagic scaling. EC2 makes it easier to deploy lots of instances, but unless you get those instances synchronised, updates would be an issue.
Now, it may be possible to quickly deploy EC2 (or other) images that are read-only mirrors (via a caching Squid?) that sync with a centralized read-write site (the original). Then use DNS round robin or funky BIND9 views/DNS zones to direct requests to the shortest-hop mirror, e.g. TMNet customers get directed to mirror(s) sitting in a TMNet datacentre, Jaring customers get directed to mirror(s) sitting in a Jaring datacentre…
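To make the routing idea concrete, here is a minimal Python sketch of the selection logic only. In practice this decision would live in the DNS layer (e.g. BIND9 views), not in application code. The address ranges are reserved documentation blocks standing in for ISP allocations, and the hostnames, `MIRRORS_BY_NETWORK` and `pick_mirror` are all hypothetical.

```python
import ipaddress
from itertools import cycle

# Placeholder networks (RFC 5737 documentation ranges), NOT real TMNet or
# Jaring allocations. Each maps to the mirrors hosted inside that ISP.
MIRRORS_BY_NETWORK = {
    ipaddress.ip_network("192.0.2.0/24"): [
        "mirror1.tmnet.example",
        "mirror2.tmnet.example",
    ],
    ipaddress.ip_network("198.51.100.0/24"): ["mirror1.jaring.example"],
}
# Clients from any other network go to the original site.
FALLBACK_MIRRORS = ["origin.example"]

# One round-robin rotation per network, plus one for the fallback pool.
_rotations = {net: cycle(hosts) for net, hosts in MIRRORS_BY_NETWORK.items()}
_fallback = cycle(FALLBACK_MIRRORS)


def pick_mirror(client_ip):
    """Return the next mirror for the client's network, round robin."""
    addr = ipaddress.ip_address(client_ip)
    for net, rotation in _rotations.items():
        if addr in net:
            return next(rotation)
    return next(_fallback)
```

Successive calls from the same network alternate between that network's mirrors, while unknown networks fall through to the origin. Keeping the mirrors in sync with the read-write site is a separate problem that this sketch does not touch.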
All speculation, maybe fun to implement… yclian?
Oops, supposed to link to http://blog.yclian.com/2008/07/second-day-of-barcamp-malaysia-awesome.html
Feel free to edit my previous comment, Colin. Need preview button for your comments!
Like some guys have pointed out up there, a CDN is exactly what these guys need. Most major websites use a CDN these days, and I bet the local sites couldn’t justify the ROI to their bosses.
In terms of price, there’re actually a number of affordable CDNs in the market, such as Cachefly. I don’t exactly know how S3 works (it’s not a CDN and it is not highly distributed), but it’s not a bad idea to use it to store your large content.
It also depends on what exactly is the problem that you are trying to solve. It could be a solution to global delivery, speed, reliability/availability, and/or flash crowd that you are looking for.
Me, being a Malaysiakini reader, I have been seriously bugged by the site’s availability. I would introduce them to Aflexi (an even cheaper solution that my company is building) when it is available. :)
– yc
Well, a cloud is not necessarily needed to handle traffic like that; the sites could deploy a huge cluster of servers to support it. However, a cloud or CDN would definitely be one of the ways for them to scale up easily.
EC2 and Google’s cloud are emerging, and they are meant to let enterprises scale infrastructure instantly, so you can sleep well at night and dream about traffic booming 100x without worrying about the infrastructure.
Needless to say, the cloud is definitely one of the ways for them. But again, I think Amazon does not have a cloud in Asia, so loading times might be slower.
Whei Meng Wong
CEO
Aflexi Sdn Bhd