Archive for the ‘Databases’ Category

DotOrg: WordPress, Eventum visits

I spoke to Matt and Barry today, and it was great to see them at the DotOrg Pavilion at the MySQL Expo, since the last time we caught up was at WordCamp 2006. Since WordCamp, WordPress.com has grown to something like 900,000+ registered users! Just a few months ago that number was just over 300,000, so they’re clearly really popular.

While Barry entertained a visitor, Matt and I got to talking about growing companies. He’s really happy with the size of Automattic, and is going to try for as long as possible to keep the company under fifty people. He also finds it interesting that some people are still running WordPress 1.2 (ick! security holes galore), and while I worried that the database itself might not be migratable, he said that going from 1.2 to 2.1 should be no problem at all. Speaking of releases, a new WordPress is just around the corner.

I just love updating WordPress. He mentioned that there’s a plugin that downloads the new package, unpacks it and installs it, all on the fly, but it scares him, due to the recent break-in. That seems fair, and I wouldn’t want such a plugin either, though a more automated way of doing things would rock. For what it’s worth, WordPress 2.2 will let you disable all plugins with one button click, so that should be useful (and yes, this means I can script it more easily). And now, they run an md5 check on the WordPress release roughly every minute to make sure it doesn’t change!
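
The every-minute md5 check boils down to comparing a file’s checksum with a known-good value. Here’s a minimal Python sketch of that idea, assuming you have a downloaded release archive and its published MD5 string; the function names are my own, and this is not how WordPress.org actually runs its check.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def release_unchanged(path, expected_md5):
    """True if the file still matches the published checksum (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    # Demo with a throwaway file standing in for a release archive.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"pretend this is wordpress.tar.gz")
    published = md5_of_file(tmp.name)  # pretend this value was published upstream
    print(release_unchanged(tmp.name, published))
    os.remove(tmp.name)
```

A mismatch only tells you the file changed, not why; hashlib.sha256 is a drop-in swap if you want a stronger hash.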

I’ve also been advised to use SVN for deploying WordPress. I’ll definitely look into it soon, as I want to get everything into revision control eventually.

I also got to speak with Bryan Alsdorf, an Eventum developer, and got a nice rundown of the software’s history. This tool deserves more love: it’s great, it’s basically the infrastructure that runs MySQL Support, and others in the support business will definitely find it useful too. I learnt some new things; the fact that it also has a command line interface makes me like this software even more. Again, part of the fun of working in a distributed environment: this is the first time I’ve spoken at length with Bryan.

I’ve got so many more people and DotOrg folk to visit tomorrow. Well, enough writing for now; it’s time to head to the DotOrg reception to grab some drinks and food.

“Open Source disrupts inefficient models and produces new wealth.”

Today’s keynote, by MySQL CEO Marten Mickos, was titled The Participatory & Disruptive Spirit of the Dolphin. Here are some random notes I took.

Tired                Wired
Packaged Apps        On-Demand
Closed Source        Open Source (jboss, php, mysql, apache, linux)
Complex Hardware     Commodity Hardware

The most innovative companies are clearly wired, and are enjoying the technology market shifts. Small players can now swim around the big players.

“Open Source disrupts inefficient models and produces new wealth.”

Star Wreck – Finnish dry humor, produced by amateurs, and extremely popular. Catch it on YouTube?

“We hope you’ll travel to MECA” – MySQL Enterprise Connection Alliance. Probably not the most sensible comment, but it does have humor in it ;-)

Serve the underserved – offer flying to those that would otherwise take a bus. Offer a database to people that would rather use a filesystem.

A smarter way to produce goods and distribute them: this is what causes disruption.

2007 MySQL Applications of the Year

  • YouTube: #1 in online video
  • amp’d mobile: #1 in 3G mobile entertainment
  • Adobe: #1 in creative software (embeds MySQL in Acrobat and CS)

2007 MySQL Community Awards

  • Martin Friebe: quality contributor of the year. Everyone’s talking about his contribution, where his bug was verified and fixed in under 3 hours… yet there are patches that have been waiting for years!
  • Paul McCullagh: code contributor of the year, thanks to the PBXT Storage Engine. Yes, PBXT is the hotness: it’s a one-man show, yet a true community storage engine that is feature-laden and performs really well!
  • Sheeri Kritzer: advocate, communicator and facilitator of the year. Beware the SHE-BA: she’s got great blog entries, a superb podcast, and does so many cool things for the Community.

I’d never heard of amp’d, but it’s great to know that Adobe uses a lot of MySQL – Creative Suite 2, 3, and even Acrobat. That just rocks.

YouTube is clearly cool. I got a t-shirt when I visited their Google booth, and I was lamenting to them that they’d become just like Google in terms of recruitment – they spam! Apparently they’re looking for MySQL people, so if you’re after a job and like online video and YouTube, they’ve got jobs awaiting. I, on the other hand, walked there with my Senior Director of Human Resources, so we had a good chuckle :-)

MyODBC not showing in drivers list on Mac OS X

Today I missed a bunch of good talks that I was hoping to attend, because I was figuring out a problem at the Guru Bar. Offending criminal: MySQL Connector/ODBC 3.51. Offending OS: Mac OS X/PowerPC.

OS X comes with an ODBC Administrator. Once you unpack the MySQL package and run the .pkg installer, you’ll find that the driver files are installed in /usr/lib. Fire up ODBC Administrator, click on Drivers, and Add the driver. Give it an appropriate Description (MySQL), provide the location of the driver file (/usr/lib/libmyodbc3.dylib), and define it as available within the System (this lets you use it for a System DSN as well as a User DSN in the next step).

Next, add either a user or a system DSN for MySQL. The keyword/value pairs should be: server = localhost, port = 3306, database = test.
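
For tools that take a DSN-less connection string instead of a configured DSN, the same keyword/value pairs are usually joined as KEY=value;KEY=value. A small Python sketch that builds one from the pairs above; the helper is hypothetical, and whether a given client accepts a driver file path in DRIVER= (rather than a registered driver name) is an assumption to check against your ODBC manager.

```python
def odbc_connection_string(pairs):
    """Join ODBC keyword/value pairs into a 'KEY=value;KEY=value' string.

    To keep the sketch simple, values containing ';' are rejected rather than
    quoted, since a bare ';' would be read as the start of the next pair.
    """
    parts = []
    for key, value in pairs.items():
        value = str(value)
        if ";" in value:
            raise ValueError(f"value for {key!r} contains ';'")
        parts.append(f"{key}={value}")
    return ";".join(parts)


# The driver path and DSN settings from the steps above.
conn_str = odbc_connection_string({
    "DRIVER": "/usr/lib/libmyodbc3.dylib",
    "SERVER": "localhost",
    "PORT": 3306,
    "DATABASE": "test",
})
print(conn_str)
```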

The caveat with all of this is that you actually need /usr/lib/libltdl.3.dylib to be present: libmyodbc3.dylib references it, and if you don’t have it, the driver fails rather beautifully. How do you get libltdl.3.dylib? Get XCode! If you don’t have your install discs around, get it from the Apple Developer Connection. Beware, it’s a 923MB download (now you see why I missed more than one talk: large downloads at the conference tend to break, duh!).
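
Since the failure only shows up when the driver is loaded, a quick preflight check can save some head-scratching. A minimal Python sketch that reports which of the two libraries named above are missing; the paths come from this post and may differ on other installs. On OS X, otool -L /usr/lib/libmyodbc3.dylib should also list the libraries the driver links against.

```python
import os

# Paths from the post; adjust them if your install puts files elsewhere.
REQUIRED_LIBS = [
    "/usr/lib/libmyodbc3.dylib",  # the Connector/ODBC 3.51 driver
    "/usr/lib/libltdl.3.dylib",   # libtool's dynamic-loading library, referenced by the driver
]


def missing_libraries(paths=REQUIRED_LIBS, exists=os.path.exists):
    """Return the paths that are not present; `exists` is injectable for testing."""
    return [p for p in paths if not exists(p)]


if __name__ == "__main__":
    missing = missing_libraries()
    if missing:
        print("Missing:", ", ".join(missing))
        print("libltdl.3.dylib comes with XCode, so install that first.")
    else:
        print("Both libraries found; try /usr/bin/iodbctest next.")
```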

Testing the ODBC connection? Make good use of /usr/bin/iodbctest. We were of course doing some odd things with Microsoft Excel (ick!). Once /usr/lib/libltdl.3.dylib is installed, the ODBC connection magically works. As for Excel, the external data source just shows up, so you can use it. If you didn’t define the keyword/value pairs in ODBC Administrator, you can always do it in Excel (however, running iodbctest will then fail).

Why was this not discovered earlier? Probably because developers almost always have the developer tools installed, whereas users tend not to have XCode installed by default. And OS X doesn’t have packaging guidelines, unlike the sensible RPM/DEB world. When I get back, I’ll see if it’s possible to ship this missing bit; otherwise I’ll get the Documentation team to update the documentation…

Bottom-line: Make sure XCode is installed if you’re going to use Connector/ODBC.

Scaling Twitter: “Is Twitter UDP or TCP? It’s definitely UDP.”

Presented by Blaine Cook, a developer from Odeo, now probably CTO of Twitter (spun out of Obvious Corp, I think). There’s a video and slides (yes, you need evil Flash, so I haven’t viewed them myself). Then there are my notes… possibly with some thoughts attached. No, they’re not organized; I’m too busy and tired…

Rails scales, but not out of the box; left as-is, it would have made Twitter stop working very quickly.

600 requests/second, 180 rails instances (mongrel), 1 DB server (MySQL) + 1 slave (read only slave, for statistics purposes), 30-odd processes for misc. jobs, 8 Sun X4100s.

Most uncached requests are served in under 200ms.
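
As a back-of-the-envelope check on those numbers (my own arithmetic, not from the talk): Little’s law says the average number of requests in flight equals the arrival rate times the time each request spends in the system. The sketch treats 200ms as the time for every request and assumes each mongrel serves one Rails request at a time.

```python
def requests_in_flight(rate_per_sec, seconds_per_request):
    """Little's law: L = lambda * W (average concurrency = arrival rate x latency)."""
    return rate_per_sec * seconds_per_request


RATE = 600        # requests/second, from the talk
LATENCY = 0.200   # seconds; the "under 200ms" figure used as an upper bound
INSTANCES = 180   # rails (mongrel) instances, from the talk

busy = requests_in_flight(RATE, LATENCY)
print(f"~{busy:.0f} requests in flight on average")
print(f"~{busy / INSTANCES:.0%} of mongrels busy")
print(f"~{RATE / INSTANCES:.1f} requests/second per mongrel")
```

That works out to about 120 of the 180 instances busy on average, roughly two-thirds, which leaves some headroom for spikes but not a lot.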

steps:
1. realize your site is slow
2. optimize the database
3. “Cache the hell out of everything”
4. scale messaging
5. deal with abuse
6. profit.

Have stats (something Twitter didn’t have before): munin, nagios, awstats/google analytics (the latter obviously doesn’t work if your site itself doesn’t load), exception notifier/logger (exception logger is what they use at Twitter, so you don’t get lots of email :P). You need reporting to track problems.

Benchmarks – they don’t do profiling, they just rely on their users! What torture for the poor users…

“The next application I build is going to be easily partitionable.” – Stewart Butterfield

Dealing with abusers…
Inverse spamming – the Italians – receiving SMS gives you free call credits!
9,000 friends in 24 hours doesn’t scale!
Just be ruthless and delete said users. This is where you thank your reporting tools for letting you detect abusers.
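
Spotting someone adding 9,000 friends in 24 hours is a sliding-window count. A minimal Python sketch of that kind of report; the class name and the limit are made up, and it assumes each user’s follow timestamps arrive in order.

```python
from collections import defaultdict, deque

DAY = 24 * 60 * 60  # seconds


class FollowRateMonitor:
    """Flag users who add friends faster than a (hypothetical) daily limit."""

    def __init__(self, limit_per_day=1000):
        self.limit = limit_per_day
        self._events = defaultdict(deque)  # user_id -> timestamps of recent follows

    def record_follow(self, user_id, timestamp):
        """Record one follow; return True if the user is now over the limit."""
        window = self._events[user_id]
        window.append(timestamp)
        # Drop follows that are 24 hours old or older from the front of the window.
        while window and window[0] <= timestamp - DAY:
            window.popleft()
        return len(window) > self.limit


if __name__ == "__main__":
    monitor = FollowRateMonitor(limit_per_day=1000)
    # Hypothetical abuser: 9,000 follows spread evenly over 24 hours.
    step = DAY / 9000
    flagged_at = next(i for i in range(9000) if monitor.record_follow(42, i * step))
    print(f"flagged after {flagged_at + 1} follows")
```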

They’ve looked at Magic Multi Connections, it looks great, but it wouldn’t work for Twitter.

The main bottleneck is really in DRb and template generation. The template optimizer that Stefan Kaes wrote doesn’t work for them.

Twitter was first built by 2 people. Even now, they’re just 3 developers.

When mongrels hit swap, they become useless. So turn swap off.

Twitter themselves don’t seem to want to give out details of how many users, etc. they have. Shifty: beyond claiming it’s “a lot of users”, they won’t say.

Twitter is not built for partitioning. Social applications should be designed to be easily partitionable. WordPress, and anything 37signals builds, tends to be partitionable. Things start getting hairy when you have 3,000+ friends!
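
To make “easily partitionable” concrete: the simplest scheme puts each user, and all of that user’s rows, on one shard chosen from the user id. The Python sketch below uses plain modulo hashing (a hypothetical scheme, not what Twitter or WordPress actually do) and shows why a big friends list gets hairy: a homepage query has to fan out to every shard holding one of those friends.

```python
from collections import defaultdict


def shard_for(user_id, num_shards):
    """Pick a shard by user id; every row owned by this user lives there."""
    return user_id % num_shards


def friends_by_shard(friend_ids, num_shards):
    """Group friend ids by the shard that must be queried for their statuses."""
    groups = defaultdict(list)
    for fid in friend_ids:
        groups[shard_for(fid, num_shards)].append(fid)
    return dict(groups)


# A user with 3,000 friends, spread over 8 shards.
groups = friends_by_shard(range(1, 3001), num_shards=8)
print(f"homepage query touches {len(groups)} of 8 shards")  # → homepage query touches 8 of 8 shards
```

Plain modulo remaps most users whenever the number of shards changes, so real deployments often use a lookup table or consistent hashing instead.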

Index everything – Rails won’t do it for you, but you need an index on any column that appears in a WHERE clause.
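
A runnable illustration of why WHERE columns need indexes, using Python’s built-in sqlite3 as a stand-in for MySQL (MySQL’s EXPLAIN output looks different, but the idea is the same). The statuses table is hypothetical; in a Rails migration the equivalent index would come from add_index :statuses, :user_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statuses (id INTEGER PRIMARY KEY, user_id INTEGER, text TEXT)")
conn.executemany(
    "INSERT INTO statuses (user_id, text) VALUES (?, ?)",
    [(i % 100, f"status {i}") for i in range(1000)],
)


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT text FROM statuses WHERE user_id = 42"
before = plan(query)  # no index on user_id yet: a full table scan
conn.execute("CREATE INDEX index_statuses_on_user_id ON statuses (user_id)")
after = plan(query)   # now a search using the new index
print(before)
print(after)
```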

Denormalize a lot – heresy in the Rails book? He hopes not. This single-handedly saved Twitter.

They use InnoDB. Don’t do status.count() when there are millions of rows… it’ll stop working. MyISAM will be faster (it keeps an exact row count, so an unfiltered COUNT(*) is cheap there), but still, don’t.
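
One common way to avoid COUNT(*) over millions of rows is to denormalize: keep a counter column on the parent row and bump it in the same transaction as the insert. A minimal sketch with sqlite3 standing in for MySQL/InnoDB; table and column names are hypothetical, though Rails’ counter_cache option on belongs_to follows the same pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT,
        statuses_count INTEGER NOT NULL DEFAULT 0  -- denormalized copy of COUNT(*)
    );
    CREATE TABLE statuses (id INTEGER PRIMARY KEY, user_id INTEGER, text TEXT);
""")


def post_status(user_id, text):
    """Insert a status and bump the counter atomically."""
    with conn:  # commits both statements together, or rolls both back on error
        conn.execute("INSERT INTO statuses (user_id, text) VALUES (?, ?)", (user_id, text))
        conn.execute(
            "UPDATE users SET statuses_count = statuses_count + 1 WHERE id = ?",
            (user_id,),
        )


def status_count(user_id):
    """One primary-key lookup instead of scanning the statuses table."""
    return conn.execute(
        "SELECT statuses_count FROM users WHERE id = ?", (user_id,)
    ).fetchone()[0]


with conn:
    conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
post_status(1, "hello")
post_status(1, "still here")
print(status_count(1))  # 2
```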

Queries like email LIKE “$#!$” (i.e. search) are expensive. Twitter has disabled search for now… which makes their database enjoy life.

Average DB time is 50ms (100ms at most).

They’re not hurting on the DB. The master DB machine is at about a quarter CPU usage, so they don’t see the need to partition at this point.

Twitter does a lot of caching; they use memcache. If you really need status.count(), cache it in memcache.
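
The “if you really need status.count(), use memcache” advice is the cache-aside pattern: check the cache, and only on a miss run the expensive query and store the result with an expiry. A self-contained Python sketch; the TTLCache class is a hypothetical in-process stand-in for memcached (a similar get/set/delete shape, none of the networking), and the key format is made up.

```python
import time


class TTLCache:
    """In-process stand-in for memcached: get/set/delete with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:  # expired: behave like a miss
            del self._data[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)

    def delete(self, key):
        self._data.pop(key, None)


def cached_status_count(cache, user_id, count_from_db, ttl=60):
    """Cache-aside: serve the cached count, running the expensive query only on a miss."""
    key = f"status_count:{user_id}"
    value = cache.get(key)
    if value is None:
        value = count_from_db(user_id)
        cache.set(key, value, ttl)
    return value
```

The count can be up to ttl seconds stale, which is usually fine for display counters; call delete() on writes if staleness matters.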

The query for your friends’ statuses on your Twitter homepage is a complicated one with a lot of JOINs. They use ActiveRecord, store the statuses in memory, and don’t touch the DB. They plan to use memcache for the statuses too in the future.

ActiveRecord objects are huge (which is why they’re not stuck in memcache yet). They’re looking at implementing ActiveRecord nano or something similar: smaller objects that store only the critical attributes in the cache, and use method_missing if you don’t find what you’re looking for.
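
A rough Python analogue of the “ActiveRecord nano” idea: keep only a few critical attributes in the cached object and fall back to a full load for anything else. Python’s __getattr__ is called only when normal attribute lookup fails, which makes it the closest counterpart to Ruby’s method_missing. The class, attribute list and loader are hypothetical; this is not ActiveRecord’s API.

```python
class SlimStatus:
    """Small cache-friendly record that loads the full row only on demand."""

    CRITICAL = ("id", "user_id", "text")  # what we'd keep in the cache

    def __init__(self, attrs, loader):
        self._attrs = {k: attrs[k] for k in self.CRITICAL}
        self._loader = loader  # callable(id) -> dict with every column
        self._full = None

    def __getattr__(self, name):
        # Only reached when normal lookup fails, like Ruby's method_missing.
        if name.startswith("_"):
            raise AttributeError(name)  # avoid recursion on our own private fields
        if name in self._attrs:
            return self._attrs[name]
        if self._full is None:
            self._full = self._loader(self._attrs["id"])  # one trip to the DB
        try:
            return self._full[name]
        except KeyError:
            raise AttributeError(name) from None


if __name__ == "__main__":
    def load_from_db(status_id):
        print(f"loading status {status_id} from the DB")
        return {"id": status_id, "user_id": 7, "text": "hi", "source": "web"}

    s = SlimStatus({"id": 1, "user_id": 7, "text": "hi"}, load_from_db)
    print(s.text)    # served from the slim copy, no DB hit
    print(s.source)  # not cached, so __getattr__ loads the full row
```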

90% of Twitter’s requests are API requests. So cache them. No fragment or page caching on the front-end, but for API requests, lots of caching.

Producer(s) -> Message Queue -> Consumer(s)

DRb: zero redundancy, tightly coupled.
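
The producer/queue/consumer shape is what loosens the coupling that DRb lacks: producers only hand messages to the queue and never wait on a particular consumer. A minimal in-process sketch with Python’s standard queue and threading modules; the run_pipeline helper is hypothetical, and a real deployment would put a networked, persistent queue in the middle instead.

```python
import queue
import threading


def run_pipeline(messages, handle, num_consumers=2):
    """Producer puts messages on a queue; consumer threads take and handle them."""
    q = queue.Queue()
    results = []
    lock = threading.Lock()
    STOP = object()  # sentinel telling a consumer to exit

    def consumer():
        while True:
            msg = q.get()
            if msg is STOP:  # no more work for this consumer
                return
            out = handle(msg)
            with lock:
                results.append(out)

    workers = [threading.Thread(target=consumer) for _ in range(num_consumers)]
    for w in workers:
        w.start()
    for msg in messages:  # the producer side never talks to a consumer directly
        q.put(msg)
    for _ in workers:
        q.put(STOP)
    for w in workers:
        w.join()
    return results
```

Results come back in whatever order the consumers finish, which is the price of not coupling producers to consumers.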

They use ejabberd for their Jabber server.

When the Jabber client went down, everything went down. So they moved to Rinda. It’s O(N) for take(), so if the queue has 70,000 messages, you just shut it down, restart it, and lose those 70,000 messages. Sigh.

“Someone asked if Twitter is UDP or TCP? It’s definitely UDP.” – Blaine Cook

LiveJournal has a horizontally scaled MySQL setup that is just MySQL + lightweight locking. RabbitMQ (Erlang) is something they’re clearly looking at, but it looks ugly, and they’d rather not implement it.

So Starling was written. It’s in Ruby and will be ported to something faster. It does 4,000 transactional messages/second, will have multiple queues (like a cache invalidation one), speaks memcache (set, get), and writes it all to disk. The first pass was written in 4 hours, and it’s been working fine for the last few days (i.e. since Wednesday). Twitter died on Tuesday at the Web 2.0 conference! Starling will probably be open source.
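
Starling’s trick is that a queue can look like a cache: set enqueues, get dequeues, and everything is persisted. A toy Python sketch of that shape, not Starling’s code or its wire protocol; here the memcache key doubles as the queue name (an assumption), and a JSON-lines journal stands in for Starling’s on-disk format. Using a deque also keeps each get at O(1), unlike the O(N) take() that bit them with Rinda.

```python
import json
import os
import tempfile
from collections import defaultdict, deque


class JournaledQueues:
    """Toy multi-queue store with memcache-style verbs: set() enqueues, get() dequeues.

    Each operation is appended to a journal file before it takes effect, so the
    queues can be rebuilt after a restart instead of being lost.
    """

    def __init__(self, journal_path):
        self._path = journal_path
        self._queues = defaultdict(deque)  # queue name -> FIFO of values
        if os.path.exists(journal_path):
            self._replay()

    def _replay(self):
        with open(self._path) as f:
            for line in f:
                entry = json.loads(line)
                q = self._queues[entry["queue"]]
                if entry["op"] == "set":
                    q.append(entry["value"])
                elif q:
                    q.popleft()

    def _log(self, entry):
        with open(self._path, "a") as f:
            f.write(json.dumps(entry) + "\n")
            f.flush()
            os.fsync(f.fileno())  # on disk before we report success

    def set(self, queue, value):
        self._log({"op": "set", "queue": queue, "value": value})
        self._queues[queue].append(value)

    def get(self, queue):
        q = self._queues[queue]
        if not q:
            return None  # an empty queue behaves like a cache miss
        self._log({"op": "get", "queue": queue})
        return q.popleft()  # O(1), no scan over the whole queue


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "queues.journal")
    qs = JournaledQueues(path)
    qs.set("invalidations", "user:1:timeline")
    qs.set("invalidations", "user:2:timeline")
    print(qs.get("invalidations"))
    restarted = JournaledQueues(path)  # simulate a restart
    print(restarted.get("invalidations"))
```

The journal here grows forever; anything beyond a toy would also need to compact it.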

Use messages to invalidate your cache.
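
A minimal sketch of message-driven invalidation: whoever writes the data publishes the stale cache key, and a consumer deletes it, so the next read repopulates the cache. A plain dict and queue.Queue stand in for memcache and the message queue; the function names and key format are hypothetical.

```python
import queue

# Stand-ins: a dict for memcache, a Queue for the message queue.
cache = {"timeline:1": ["old status"], "timeline:2": ["hello"]}
invalidations = queue.Queue()


def on_new_status(user_id):
    """Producer: after the DB write commits, announce which cache key is now stale."""
    invalidations.put(f"timeline:{user_id}")


def drain_invalidations():
    """Consumer: delete every key named in a pending message; return how many were handled."""
    handled = 0
    while True:
        try:
            key = invalidations.get_nowait()
        except queue.Empty:
            return handled
        cache.pop(key, None)  # missing keys are fine: nothing to invalidate
        handled += 1


on_new_status(1)
print(drain_invalidations(), sorted(cache))  # 1 ['timeline:2']
```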

MySQL Conference 2007 schedule is *packed*

Have you taken a look at the MySQL Conference 2007 schedule yet? With just one day to go, I’d advise you to take a gander. There are so many interesting things that my only complaint (well, a suggestion) is this: I hope these talks get recorded on video and made available to attendees via the web. Apple have done this for WWDC, and linux.conf.au did a great job in 2007 of recording every session.

Why video recording? Because each block of time has 8 sessions, of which about 3-5 on average are interesting. Last I checked, I couldn’t split myself.

Sure, the slides will make it online eventually, but I believe the talk itself is where most of the interest really is.

6-8% of folk don’t want MySQL as a default back-end

At the 2nd Annual Silicon Valley Ruby Conference, about a dozen folk said they don’t want MySQL to be their default database for Rails. I presume there might be 150-200 folk who have rocked up, so that’s a small portion of the market, I guess.
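
For the record, that’s where the 6-8% comes from: a dozen out of an estimated 150-200 attendees. A trivial Python check of the arithmetic (the attendee count is my own estimate, as above):

```python
def share(count, total):
    """Fraction of attendees, as a percentage rounded to a whole number."""
    return round(100 * count / total)


for attendees in (200, 150):
    print(f"12 of {attendees} = {share(12, attendees)}%")
```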

From what I gather, some people have to integrate with other applications, and having two database backends probably doesn’t make much sense. For example, if you use GRASS for GIS mapping stuff, you wouldn’t want your default web app database to be MySQL, right?

I do sincerely hope to meet them all in the next couple of days to see what their concerns are; failing that, there’s a who’s who through which we can get in touch with them later.

Then the platform distribution statistics: 40% on Mac OS X, 50% on Windows (ick), and a mere 10% on Linux (though lots of OS X users do use Linux).
