Keeping citizens data in local data centers

Moscow has just made waves by trying to tell web services at personal data of local Russians needs to be stored on Russian soil – see: Moscow seeks to tighten grip online (it’s an FT link which is behind a paywall, but surely a search will find similar articles).

Not long ago, Brazil tried to do this, but the idea was quickly scrapped. The FT take is that this is a means of trying to control citizenry. And of course it’s deadly costly to companies like Google or Facebook.

I don’t see it that way (though it may very well be the main motivator). Keeping local data in local data centers can mean a lot for a country.

One of the strong points of the open source economy is that there is sovereignty preserved. It’s a selling point. Keeping data local (I.e. Shards of data) means that he data doesn’t leave the borders. It keeps it out of prying eyes of international spy agencies. It also means faster access for local citizenry – you can’t beat faster than a local DC.

What else is good with such a move? Instead of just edge nodes being run by several engineers, you get the full benefit of more jobs being created and more data centers being operated.

Yes, the costs go up for international companies. Also there’s a barrier to entry or immediate expansion into said markets. Middlemen who partner with these foreign companies will definitely likely reap some rewards – think of the franchise model (recent examples have KakaoTalk in Korea, with local partnerships say in Malaysia).

But it may also prove to spur local entrepreneurship. You create local clones of services, and maybe one day there is an exit to the international company (think group buying clones and Groupon). Or like in China you create new classes of millionaires and billionaires (by censoring and blocking foreign websites).

Today with the cloud as a back end, people in Korea ask why their data is stored in Tokyo or Singapore points of presence (taking the Amazon example). Why do Malaysians have to only benefit from edge locations and a small employee base and data centre usage, with all major data stored in Singapore?

Friendly (efficient) nations thus lose monopolies on being regional hubs. Who suffers? The shareholders of these international firms (increased cost of doing business – policy wonks, more operations, etc.). But when you’re a large market like Brazil, Russia, or China, shareholders will demand that you not only enter, but conquer those markets – valuation is added for growth potential.

Overall I’m not entirely sure this proposal is a bad idea from a local economic standpoint.

PERFORMANCE_SCHEMA disabled in MariaDB 10.0.12

Astute readers of the release notes for MariaDB 10.0.12 will notice that there is a line that reads: performance_schema is now disabled by default.

We didn’t come to this decision by accident. Recently at the SkySQL company meeting in Budapest, we did have some time to break out into our usual working teams to talk about our daily operations. Team MariaDB had a debate about PERFORMANCE_SCHEMA and how it was left on by mistake in 10.0 GA as there was a decision to turn it off. Personally, I don’t like introducing such changes in a GA release, and there was no archive of such a discussion, so the next best thing to do was to ask the MariaDB developers and users via a post to both maria-developers and maria-discuss (3 June 2014) and to ensure that a Jira ticket existed (MDEV-6249).

But first, let’s delve into a little background information for context of this discussion. Elena started investigating a query performance issue reported on IRC, and she found that with default settings the query performance dropped tremendously with PERFORMANCE_SCHEMA enabled. We had seen that the WebScaleSQL folk had disabled PERFORMANCE_SCHEMA and there was discussion on the mailing list about a loss in performance (“Our perf testing agrees with your assessment (we see about a 5%-6% perf hit when it’s included and on, and a 2%-3% hit when it’s included but off)”).

On the maria-developers and maria-discuss lists, no one complained about having such a change. Hence the decision to disable it by default now. The alternative to this in MariaDB that comes without performance overhead is user statistics. I read a comment somewhere that there is constant evolution of PERFORMANCE_SCHEMA but the user statistics haven’t changed in a while (since MariaDB 5.2), though reliable sources tell me that more work is being done on this. Do MariaDB users want to see evolution of user statistics?

So from a user standpoint, the best way to find out if PERFORMANCE_SCHEMA is ON or OFF is to run SHOW VARIABLES LIKE 'perf%';. On MariaDB 10.0.11, you will see that it is ON but on 10.0.12 you will see that it is OFF. If you are planning to enable it in a development environment, all you have to do is edit your my.cnf to have performance_schema=1 and restart your server to get to using it again.

It looks like the decision might have been the right one for the time being, looking at the recently resurfaced mysql#68514.

On-disk/block-level encryption for MariaDB

I don’t normally quote The Register, but I was clearing tabs and found this article: 350 DBAs stare blankly when reminded super-users can pinch data. It is an interesting read, telling you that there are many Snowden’s in waiting, possibly even in your organisation. 

From a MariaDB standpoint, you probably already read that column level encryption as well as block level encryption for some storage engines are likely to come to MariaBD 10.1 via a solution by Eperi. However with some recent breaking news, Google is also likely to do this – see this thread about MariaDB encryption on maria-discuss. 

Google has already developed on-disk/block-level encryption for InnoDB, Aria (for temporary tables), binary logs and temporary files. The code isn’t published yet, but will likely happen soon, so clear benefits of open source development principles. 

Elsewhere, if you’re trying to ensure good policies for users, don’t forget to start with the audit plugin and roles.

RHEL7 now with MariaDB

Congratulations to the entire team at Red Hat, for the release of Red Hat Enterprise Linux 7 (RHEL7). The release notes have something important, under Web Servers & Services:

MariaDB 5.5

MariaDB is the default implementation of MySQL in Red Hat Enterprise Linux 7. MariaDB is a community-developed fork of the MySQL database project, and provides a replacement for MySQL. MariaDB preserves API and ABI compatibility with MySQL and adds several new features; for example, a non-blocking client API library, the Aria and XtraDB storage engines with enhanced performance, better server status variables, and enhanced replication.

Detailed information about MariaDB can be found at https://mariadb.com/kb/en/what-is-mariadb-55/.

This is a huge improvement over MySQL 5.1.73 currently shipping in RHEL6. I’m really looking forward to welcome more MariaDB users. Remember if you are looking for information, find it at the Knowledge Base. If you’ve found a bug, report it at Jira (upstream) or Bugzilla (Red Hat). If you want to chat with friendly developers and users, hop on over to #maria on irc.freenode.net. And don’t forget we have some populated mailing lists: maria-discuss and maria-developers.

Downloading older releases of MariaDB

MariaDB has plenty of mirrors to download the latest versions of MariaDB. Typically mirrors carry the last couple of releases and the current release, but what if you wanted to access something much older?

You have two resources for the complete archive:

  1. http://archive.mariadb.org/ – this is the official archive, but from what I gather, it can be quite slow
  2. http://downloads.skysql.com/files/MariaDB/ – SkySQL provides a complete mirror and it is very fast so I would use this instead. 

Current download archives stand at 337GB. But you can feel free to test older releases, see when things got introduced, etc. 

Per query variable settings in MySQL/Percona Server/WebScaleSQL

Recently there was a discussion on the webscalesql mailing list started by Chip Turner on a proposed change to the MAX_STATEMENT_TIME patch. This feature has been known as per query variable settings (WL#681) and even shipping in Percona Server 5.6 as per-query variable statement.

This feature has piqued my interest since 2009, when the MySQL project (then owned by Sun Microsystems) participated in Google Summer of Code 2009, and we got code from Joseph Lukas to do just that (see his tree on Launchpad – lp:~jlukas79/+junk/mysql-server).

So code has been floating around since 2009. It never made it into a shipping release of any MySQL-based distribution till 24 October 2013 when Percona Server 5.6.14-62.0 was released. Percona’s syntax implementation was as suggested in WL#681. This got me curious as to if a feature is already shipping in a distribution of MySQL, what is the WebScaleSQL answer to things – is there a look at other branches or is compatibility from a user/DBA perspective only with focus on upstream?

I got my answer from Steaphan Greene. Very sensible, and a great direction to see how the companies involved can influence upstream MySQL and quite obviously the downstream distributions. It is of course great to note that this syntax improvement will probably be in MySQL 5.7.5 DMR (it already is in 5.7.4 DMR).

For what it’s worth, this feature works well alongside server-side statement timeouts, which Percona Server 5.6 implements (as an alpha quality feature) via the Twitter patch of Davi Arnaut. The MySQL team at Oracle has of course been listening, and in MySQL 5.7.4 DMR (release notes) they too have implemented this feature (WL#6936). Kudos!

Update (6 May 2014): Morgan Tocker has opened up mysql#72540.


i