Archive for the ‘Databases’ Category

Sphinx user stories by Stéphane Varoqui

Stephane Varoqui, Field Services SkySQL, Vlad Fedorkov, Director of PS, Sphinx Inc, Christophe Gesche, LAMP Expert, Delcampe, Herve Seignole, Web Architect, Groupe Pierre & Vacances Center Parcs – this is a big talk!

Pros: Filtering takes place on attributes in separate tables. Rely on the optimizer choice. HASH JOIN can help (MariaDB 5.3). Table elimination can help (MariaDB 5.2). ICP Index Condition Pushdown can help (MariaDB 5.3/MySQL 5.6). Max 80M documents at Pixmania, all queries come in less than 1s using 128GB of RAM (MariaDB 5.2). At PAP.fr, there is 16GB RAM with MariaDB 5.2.

Cons: CPU intensive (replication with many slaves). Need covering indexes to cover various !filter !order. Join & sorting cost on lazy filtering.

The more indexes you have in the system, the more you need to increase the main memory of the server. Keep the Btree’s in memory.

What about denormalized schemas? Not really CPU intensive, just IO. Can go to disk, full partition scan with filtering taking place on record order using covering index. Can shard but not that easy. Use the spider storage engine or shard-query. Can use memory engine for isolation. There are cons like duplicate data, duplicate indexes, missing material views, merge index cost, impact on write performance, and can consume a lot of memory with many indexes.

MySQL can push hardware, so read less/do less/read serialized/map reduce to get better latency. Chose data type wisely, replace string with numeric, vertical & horizontal sharing, snowflake compression (combination of attributes, build a table of the combination and replace it with an ID). If you are lazy, just use Sphinx!

Sphinx is just another daemon that can serve queries. Its easy to setup, easy to scale, storage engine makes it accessible to current MySQL users, API in core MariaDB (SphinxSE), SphinxQL, SphinxSE is transparent to the application layer of MySQL protocol.

Demo done using the Employees DB.

Pierre & Vacances – Centerparcs. Free text search, they use MariaDB using Levenshtein UDF implementation. Went live 01/2011. First implementation of Sphinx (12 indexes). Its grown, they use PHP API. The new goal is to never send an empty result. 1 index per website/market, with a total of 15 million docs. Index built on standalone server. Using internal gearman job schedule to generate index before cache generated. Current monitoring is via Nagios & perl, but the next step is to use Monyog & MariaDB INFORMATION_SCHEMA plugin.

Delcampe is an auction website with 45M ‘active’ items. Its dedicated for collectors. 3 string fields, and 15 attributes. 40-120K new items daily. Started with mysql fulltext in 2007, moved to Sphinx in 2008. There was a need to have more filters. Now they have 5 sphinx servers + 1 MySQL server. HAproxy to load balance.

New MySQL 5.6 Features by Oli Sennhauser

First talk at FOSDEM MySQL Devroom by Oli Sennhauser, of FromDual. Quick notes/liveblog of the talk, plus links from a quick search.

New Release Model: starts with at least in beta quality, milestone releases are RC quality (every 3-6 months), between milestones new features are allowed, GA releases every 12-18 months, no more than 2 releases in active support. There is also MySQL Labs, in where features can make it into a release model, but not necessarily.

Oli’s guess: MySQL 5.6 GA in April 2012 (Collaborate) or June 2012. As a consequence, MySQL 5.0 and MySQL 5.1 will be EOL probably by April this year.

New improvements in partitioning: explicit partition selection, exchanging partitions (good for ETL jobs).

New improvements in InnoDB: InnoDB INFORMATION_SCHEMA has got some new entries. Buffer (INNODB_BUFFER), Fulltext (INNODB_FT), Metrics (INNODB_METRICS), Data Dictionary (INNODB_SYS). The InnoDB Monitor is now obsolete. innodb_purge_threads can be set > 1. Kernel mutex split which should lead to improved concurrency. Persistent optimizer statistics (since InnoDB used to do random dives, so when you do a mysqldump, things can be different) – you can do SET GLOBAL innodb_analyze_is_persistent = 1; (read more: InnoDB Persistent Statistics at last, InnoDB Persistent Statistics Save the Day).

New improvements in optimizer: ORDER by on non-indexed columns, Multi Range Read (MRR), Index Condition Pushdown (ICP), query execution plan for DML statements, Batched Key Access (BKA). There is also optimizer tracing.

There is now some new instrumentations in Performance Schema (introduced in MySQL 5.5). There is also improved replication features, but that will be covered in a later talk.

Fractional seconds (microseconds) is introduced in MySQL 5.6. GET DIAGNOSTICS for stored procedures. Pluggable authentication (socket). memcached/InnoDB still in labs, hasn’t made it to a milestone release. Some 400+ bugs have been fixed. Lots of cleanup’s (old unused variables, commands removed).

Pay attention to MySQL 5.6 incompatible changes when you plan to upgrade.

Probably some good resources: What’s New in MySQL 5.6, MySQL 5.6 (Early Access Features) – InnoDB & Replication.

MariaDB/MySQL users in Paris & Brussels

I’m about to head to Paris to present at the February meetup of the MySQL User Group in Paris, France. It happens 1st February from 6-8pm at the Patricks Irish Pub. Its free to attend, and I understand that SkySQL keeps this event afloat.

I’m also heading to my first FOSDEM right afterwards and will definitely hang out at the MySQL & Friends Devroom. There is an amazing lineup of speakers, with all talks being about 25-30 minutes, it looks like it is going to be a lot of fun. To boot, Michael “Monty” Widenius will also be there, so expect lots of Salmiakkikossu.

If you want to keep track of where Monty Program folk are going to be to talk about MariaDB, make sure you’re subscribed to our news page, which also includes important release information. Pretty much every conference that we plan to attend (and have attended) is at the conference page.

I am looking forward to meeting & learning from many MariaDB/MySQL users!

SCALE 10x – there’s lots of MySQL there!

I’m just about to get on a plane to head to my inaugural SCALE event. It’s their tenth year running!

In a world filled with NoSQL related media, its kind of nice to see that on Friday January 20 2012, we have a MySQL room right next to the PostgreSQL room (schedule). It is awesome to see that the track will have participation from Oracle, Monty Program Ab, and SkySQL Ab.

On Saturday for the main tracks, I’ve got a talk about the growing MySQL diaspora (just got larger this year in case you haven’t paid attention to the packaged up Galera product!). This one is a constant work in progress and I’m hoping to complete research closer towards March ’12.

Monty Program and SkySQL are also sharing a booth in the expo hall, so come by booth #65 for some interesting schwag (t-shirts, poppers, etc.). Looking at the schedule lineup, I’m surprised I’ve never ever been to a SCALE before – looks totally awesome. See you in LAX (well, we’re so close-by the Los Angeles Airport :P)

A Tale of Two Conferences

Big BenLast week was a bumper week in London for MySQL users, DBAs & developers. We had the Oracle MySQL Developer Day and Percona Live London 2011. Both events were sold out, bringing in a good 300+ people to each event. From what I could tell the crowds were quite unique, so thats a good 600+ people interested in MySQL in London. The death and unpopularity of MySQL is greatly exaggerated.

At Oracle’s event, we naturally only had Oracle presenters. There was Simon Deighton (Sales Manager), Tony Holmes (Sales Consultant), Luca Olivari (Sales Consulting EMEA from the MySQL days), Andrew Morgan & Mat Keep for MySQL Cluster & High Availability. The event was actually pretty good if you were a MySQL beginner to intermediate user (that seemed to be the target audience — about 1 person was playing with 5.6, and about 1% of the audience was already using 5.5). The Q&A sessions were of high calibre, and answers obviously only pointed towards Oracle products.

At Percona’s event, we had wide and varied speakers, but an absence of Oracle. The crowd were already users of MySQL who wanted to get a lot more out of the database servers. It also served five tracks, so attendees had a lot of choice and value to choose from. There was an absence of beginner-centric talks, so one could get lost quite easily if you were sent there just for training. I already said I had an awesome time there.

The way I see it is Percona Live was meant for practitioners, while the Oracle MySQL Developer Day was meant for beginners to intermediate users of MySQL (they were probably already experienced Oracle DBAs). These kind of events are both important as you get a spread spectrum of people attending conferences. You can never really please all attendees at a large event, and in many ways it is always a balance you strike at large events like the O’Reilly MySQL Conference & Expo.

All in, London was abuzz with MySQL. Both events were out in the Tower Hill area. It is clear that MySQL and its diaspora are alive and kicking, and its quite possible the community of users are also growing.

Percona Live London 2011

I was at Percona Live London 2011 these past two days. Very interesting conference. Good work Peter & team — you’ve managed to gather a good 300+ people at one venue in London. So full was the venue, that during today morning’s keynote I had to sit in the spillover room and miss out on Peter calling out my name :-) (no, Stewart and I were not drinking at 9am!)

Gave my session titled Why MariaDB? (slides). Pleasantly realized that there were many new faces. Better still, everyone has heard of MariaDB in the room. More interestingly is that a bunch of people are now also using MariaDB in production!

Had to rush through the last few slides (about how open we are, the worklog, knowledgebase, etc.), but you don’t have much time in 30 minutes so you have to be succinct! The slides are attached.

 


i