Archive for the ‘Databases’ Category

Sharding for the masses: Introducing the SPIDER storage engine (OpenSQLCamp @ FrOSCon)

These are notes from “Sharding for the masses: Introducing the SPIDER storage engine”, a talk by Giuseppe Maxia given at OpenSQLCamp at FrOSCon in August 2009. They are somewhat live notes, and the slides are available too.

Why sharding? Scaling, of course. The MySQL way to solve this is replication (even Yahoo! and Google use it).

When the master doesn’t have enough resources to cope with what you do (e.g. large data sets), replication chokes.

You can also use proxies for sharding. Options on the market these days include MySQL Proxy (programmable in a scripting language, Lua), HSCALE (built on top of MySQL Proxy), and SpockProxy (a fork of MySQL Proxy without Lua scripting, specialised for sharding). The proxy, however, becomes a single point of failure: everything has to pass through it.

Enter SPIDER, a MySQL storage engine built on top of table partitioning. It associates each partition with a remote server, and is transparent to the user. It’s developed by Kentoku Shiba.
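To make “a partition associated with a remote server” concrete, here is a minimal Python sketch of the routing idea. It assumes hash partitioning on a non-negative integer key, where MySQL’s PARTITION BY HASH picks the partition as the key modulo the number of partitions. The backend host names and the `partition_for`/`route` helpers are hypothetical; SPIDER performs this mapping inside MySQL, which is exactly why it is transparent to the user.

```python
from typing import Dict, Tuple

# Hypothetical mapping of partition number -> remote backend (host, port).
# These names are made up for illustration; they are not SPIDER configuration.
BACKENDS: Dict[int, Tuple[str, int]] = {
    0: ("backend1.example.com", 3306),
    1: ("backend2.example.com", 3306),
    2: ("backend3.example.com", 3306),
}


def partition_for(key: int, num_partitions: int) -> int:
    """Pick a partition the way PARTITION BY HASH does: key MOD N.

    Assumes a non-negative key (Python's % and MySQL's MOD differ for negatives).
    """
    return key % num_partitions


def route(key: int) -> Tuple[str, int]:
    """Return the remote server that owns the row with this key."""
    return BACKENDS[partition_for(key, len(BACKENDS))]


if __name__ == "__main__":
    for emp_no in (10001, 10002, 10003):
        print(emp_no, "->", route(emp_no))
```

A proxy has to make this same decision for every query, outside the server; with SPIDER the decision happens where the partitioning logic already lives.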

Installation: get the MySQL 5.1.37 sources, then the source code for Spider 1.0, and then the patch for condition pushdown.

Why the condition pushdown patch? Because the remote server receives the condition, it can filter rows itself, so it does less work and sends back fewer rows. SPIDER without the condition pushdown patch is still fast, but it can be more than 10x faster with condition pushdown.
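Here is a toy Python model of why pushdown helps. It is not SPIDER code: the “remote table” is an in-memory list that only counts how many rows it ships back, and all class and function names are hypothetical. Without pushdown, every row crosses the network and the local server filters; with pushdown, the predicate travels to the remote side and only matching rows come back. It models data transfer only and does not attempt to reproduce the 10x figure from the talk.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


class RemoteTable:
    """Toy stand-in for a table living on a remote backend server.

    There is no real network; it just counts the rows it sends back.
    """

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self.rows_sent = 0

    def scan(self, pushed_condition: Optional[Predicate] = None) -> List[Row]:
        # If a condition was pushed down, the remote side filters before sending.
        if pushed_condition is None:
            result = list(self.rows)
        else:
            result = [r for r in self.rows if pushed_condition(r)]
        self.rows_sent += len(result)
        return result


def query_without_pushdown(table: RemoteTable, cond: Predicate) -> List[Row]:
    # Every row crosses the "network"; the local server filters afterwards.
    return [r for r in table.scan() if cond(r)]


def query_with_pushdown(table: RemoteTable, cond: Predicate) -> List[Row]:
    # The condition travels to the remote side; only matching rows come back.
    return table.scan(pushed_condition=cond)


if __name__ == "__main__":
    rows = [{"emp_no": i, "salary": 40000 + i} for i in range(1000)]
    cond = lambda r: r["salary"] > 40989  # matches 10 of the 1000 rows

    plain, pushed = RemoteTable(rows), RemoteTable(rows)
    assert query_without_pushdown(plain, cond) == query_with_pushdown(pushed, cond)
    print("rows sent without pushdown:", plain.rows_sent)  # 1000
    print("rows sent with pushdown:   ", pushed.rows_sent)  # 10
```

Both strategies return the same answer; the difference is how much data the remote server has to produce and transmit.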

Condition pushdown is described in the MySQL manual at http://dev.mysql.com/doc/refman/5.1/en/condition-pushdown-optimization.html (works with NDBCLUSTER) and http://dev.mysql.com/doc/refman/5.4/en/condition-pushdown-optimization.html (works with MyISAM). Kentoku’s patch adds cond_push and cond_pop to ha_partition, so every storage engine that uses table partitioning can now get condition pushdown through ha_partition.

You need to set up the engine first: http://datacharmer.org/downloads/spider_setup.sql (the SQL is also available in the docs).

spider_remote_employees.sql, used in conjunction with the test database at http://launchpad.net/test-db/, is a good example of how to use the SPIDER storage engine.

MySQL Labs

Who remembers snaps? That’s the place to go when you want nightly source code snapshots of anything related to the MySQL product line that comes out of the MySQL AB/Sun Microsystems build systems. There you can get all the snapshots for GA releases, as well as archives (albeit not very up to date).

Anyway, it’s good to know that there is now a place focused just on server snapshots: MySQL Labs. These are testing builds that come directly out of pushbuild (the build system). They’re not for production use, but what’s really useful is that there’s also a recommendation to use MySQL Sandbox, written by Giuseppe Maxia. Today, I see builds for 5.1 and 5.1 with GIS extensions.

In due time, I expect 5.4 and 6.0 builds to make their way there. This should also help QA a lot, as people start playing with daily builds and finding bugs. In case you’re wondering why there have been no updates in a couple of weeks, just hang in there: builds are currently pushed manually, but this will soon be automated via cron(8).

RethinkDB all the rage today

RethinkDB is all the rage today: it’s a Y Combinator-funded startup that also launched a developer pre-alpha today. So what is RethinkDB, you ask? Yet another MySQL storage engine, that’s what. But this time, it’s tuned for solid-state drives (SSDs), which also happen to be all the rage these days.

Anyway, check them out. Their materials currently say they’re using append-only algorithms, which allow for live schema changes and hot backups, with instantaneous recovery from power failure. Those are just some of the exciting bits.
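RethinkDB ships without source, so the following is not their design. It is a generic, minimal Python sketch of why append-only storage makes crash recovery and hot backups easy: existing bytes are never overwritten, so a crash mid-write can at worst leave one torn record at the end of the file, and a copy of the file’s prefix is always consistent. The record format (4-byte length prefix plus JSON) and the `AppendOnlyKV` class are assumptions made up for illustration. Real engines keep checkpoints so they don’t have to replay the whole log, which is what makes recovery close to instantaneous.

```python
import json
import os
import struct
import tempfile
from typing import Dict

HEADER = struct.Struct(">I")  # assumed format: 4-byte big-endian length per record


class AppendOnlyKV:
    """Minimal append-only key-value log (illustrative, not RethinkDB's format)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.index: Dict[str, str] = {}
        self._recover()
        self.f = open(path, "ab")

    def put(self, key: str, value: str) -> None:
        payload = json.dumps({"k": key, "v": value}).encode()
        # Old data is never overwritten; every new version goes at the end.
        self.f.write(HEADER.pack(len(payload)) + payload)
        self.f.flush()
        os.fsync(self.f.fileno())
        self.index[key] = value

    def get(self, key: str) -> str:
        return self.index[key]

    def close(self) -> None:
        self.f.close()

    def _recover(self) -> None:
        # Replay the log from the start and stop at the first incomplete record,
        # which is all an interrupted append can leave behind.
        if not os.path.exists(self.path):
            return
        with open(self.path, "rb") as f:
            data = f.read()
        pos = 0
        while pos + HEADER.size <= len(data):
            (length,) = HEADER.unpack_from(data, pos)
            end = pos + HEADER.size + length
            if end > len(data):
                break  # torn tail from a write cut short by a crash
            record = json.loads(data[pos + HEADER.size:end])
            self.index[record["k"]] = record["v"]
            pos = end
        # Drop the torn tail so new appends start on a clean record boundary.
        if pos < len(data):
            with open(self.path, "r+b") as f:
                f.truncate(pos)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.log")
        db = AppendOnlyKV(path)
        db.put("a", "1")
        db.put("a", "2")  # the newer version is appended, not written in place
        db.close()
        # Simulate a power failure that left half a record at the end of the log.
        with open(path, "ab") as f:
            f.write(HEADER.pack(100) + b'{"k": "b"')
        db = AppendOnlyKV(path)
        print(db.get("a"))  # 2
        db.close()
```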

What didn’t excite me so much was that you only get 32-bit or 64-bit Linux binaries, built against MySQL 5.1.31, which you install via the INSTALL PLUGIN statement. But they are trying to grow some semblance of a community, with a “getting involved” page filled with some papers, as well as a support mailing list (I see Mark Callaghan is already busy asking them questions). And of course you can follow them on their blog or on Twitter. All this without source ;-)

One of the developers also confirmed that they’re adding “features required by WordPress so we could eat our own dogfood”. They haven’t started profiling (much, yet?), and they’ve probably got a way to go on performance. It seems like “getting it working with WordPress” is slowly becoming a good testing ground: Jeff Waugh did the same for WordPress on Drizzle, too.

Anyway, it seems like it’s time to get some SSDs, as we start seeing things like this pop up. RethinkDB will also face another hurdle to mass adoption: how many hosting providers are using SSDs? Probably not many, if any.

Have you tried RethinkDB? Your thoughts?

What Wikipedia looks like when their database goes away


Screenshot: “This wiki has a problem”

An unknown error connecting to MySQL on 10.0.6.28? Oh dear me… It came back up within 2 minutes of my taking the screenshot, though.

Some Planet MySQL changes over time

First up: I normally read Planet MySQL in an RSS reader, and I assume you do too. But given all the new features the website itself has, I thought I’d talk about them briefly (to respect your time). Needless to say, go forth and check out Planet MySQL if you haven’t been there in a while.

A change in URL

We used to go to http://planetmysql.org, but it now redirects to http://planet.mysql.com/.

It’s also worth noting that Planet MySQL has dropped from a Google PageRank of 8 to 6. One wonders why.

Voting

Sometimes, Planet MySQL has totally unrelated posts. They might be press releases that no one likes, or a post like this one, which talks about Planet MySQL itself. Planet MySQL is, after all, touted as “Your blogs, news and opinions”, so I guess this post is in line with that. So unless I’m blatantly selling sexual enhancement drugs, or talking about things unrelated to MySQL, I don’t think one should be voting down such a post. Anyway, I digress.

You get 5 votes per 24-hour period, probably to prevent gaming the system. Voting has some visual appeal, too.
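I don’t know how Planet MySQL actually enforces the limit. As an illustration only, here is a minimal Python sketch of a “5 votes per 24 hours” rule using a sliding window over each user’s vote timestamps, kept in memory. The `VoteLimiter` class and its parameters are hypothetical; a fixed daily window that resets at midnight would be an equally plausible design, and the sliding window was chosen here because it cannot be gamed by voting just before and just after a reset.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class VoteLimiter:
    """Sliding-window limit: at most `max_votes` per user in any `window` seconds."""

    def __init__(self, max_votes: int = 5, window: float = 24 * 60 * 60) -> None:
        self.max_votes = max_votes
        self.window = window
        self.history: Dict[str, Deque[float]] = defaultdict(deque)

    def try_vote(self, user: str, now: Optional[float] = None) -> bool:
        """Record a vote and return True, or return False if the user is over the limit."""
        now = time.time() if now is None else now
        votes = self.history[user]
        # Forget votes that have aged out of the 24-hour window.
        while votes and now - votes[0] >= self.window:
            votes.popleft()
        if len(votes) >= self.max_votes:
            return False
        votes.append(now)
        return True


if __name__ == "__main__":
    limiter = VoteLimiter()
    # Six votes within a few seconds: the sixth one is refused.
    print([limiter.try_vote("reader", now=float(t)) for t in range(6)])
    # A day after the first vote, one slot has freed up again.
    print(limiter.try_vote("reader", now=86400.0))
```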

You can now see the top-voted posts (last week) and top-voted posts (last month) in the left-hand column of Planet. So if you want something featured prominently, start getting your friends to “digg” it ;)

Searching

A feature that isn’t talked about often enough… Every time Planet crawls your site (at 20 minutes past the hour, if I remember correctly), it slurps the entire page. Sure, it only displays an excerpt (so we don’t take Google juice away from you), but we do keep a massive archive of all Planet postings. So even if your blog disappears from the Internet, it will live on in the archives.

It’s worth noting that the archive is searchable! So maybe you read something interesting two years ago, but can’t remember what it was about. You can search, even by tags. Wow! I think this could be a pretty useful feature when Google itself has failed you…

Tags

WordPress has both “tags” and “categories”. Tags arrived relatively recently; previously there were only categories. Now SimplePie reads those tags when it slurps the entire page, and they get displayed. When logged in, you can even edit the tags. I tried this, but it didn’t quite seem to work yet, even when logged in. Maybe I just found a bug; let me report it…

Buzz

Have you seen MySQL Buzz? It’s a dashboard showing everywhere people talk about MySQL: voted entries, forums, Google love, and those fabulous word frequency clouds!

Today, I noticed that the frequent words on Planet include data, database, time, server, innodb, chrome, table and performance. In the forums, it’s quite different: string, long, table, null, insert, code. Ha!

Anything else?

If you’ve read this far, go forth and vote for this post. Next time, I’ll talk about the Librarian, after I’ve tried it out. A post like this would never make it there (since it’s very community-focused), but think of the Librarian as a knowledge base and it can be rather powerful.

Google Summer of Code in the mid-term

We had 12 projects, and by the mid-term we had decided to cull 2 of them, leaving us with 10.

This year, the MySQL projects really divide into three groups: those hacking on MySQL, Drizzle, or phpMyAdmin. Will we see others next year? I certainly hope so…

Drizzle – Padraig O’Sullivan is doing an excellent job on a new implementation of INFORMATION_SCHEMA. Nathan Williams is doing great work on code cleanup for Drizzle, making it conform to C++ standards. Jiangfeng Peng is hacking on batch nested loop joins in Drizzle.

phpMyAdmin – Derek Schaefer is adding import improvements to phpMyAdmin, while Tomas Srnka is working on MySQL Replication support for phpMyAdmin (and impressing his mentor!). Zahra Naeem is working on change tracking for data and structures; expect more progress after the mid-term, once some problems are worked through.

MySQL – Joseph Lukas is working on new commands to change session variables temporarily, as needed within a query. Haihao Tang is working on WL#4034, which focuses on the I_S storage engine. Tulay Meuzzinoglu is working on an SQL optimiser for mod_ndb, and a lot of good stuff is already going into the codebase.

Common problems? Timezones and language barriers. How do other open source projects deal with this?

Many thanks to all the mentors, who are doing a great job! If you want to keep track, either watch the individual Launchpad accounts or check out the Summer of Code list for weekly progress reports.

