Upgrading from MySQL to MariaDB on Ubuntu
https://grey-panther.net/2012/11/upgrading-from-mysql-to-mariadb-on-ubuntu.html
Sun, 25 Nov 2012

So you decided that Oracle doesn’t know its left foot from the back of his neck when it comes to open source (how’s that for a mixed metaphor), but you are not ready just yet to migrate over to PostgreSQL? Consider MariaDB. Coming from Monty Widenius, the original author of MySQL, it aims to be 100% MySQL compatible while also being truly open-source.

Given that it’s 100% MySQL compatible, you can upgrade in place (nevertheless, it is recommended that you back up your data first). The steps are roughly adapted from here.

  1. Go to the MariaDB repository configuration tool and generate your .list file (wondering what’s up with the 5.5 vs 10.0 version? See this short explanation). Don’t know the exact Ubuntu version you’re running? Just use lsb_release -a.
  2. Save the generated file under /etc/apt/sources.list.d/MariaDB.list as recommended (a sample of what it contains is sketched after this list) and do a sudo aptitude update. You should see output complaining about missing public keys.
  3. Do sudo apt-key adv --recv-keys --keyserver keyserver.ubuntu.com 0xCBCB082A1BB943DB to add those keys (replace the last number with the one you saw in the previous output).
  4. Issue sudo apt-cache policy mysql-common and you should see mariadb as an upgrade option.
  5. Finally, do sudo aptitude upgrade mysql-common libmysqlclient18 and watch your MySQL database being transformed into a MariaDB one, with everything chugging along just as usual!
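To give an idea of what the generator produces, here is a minimal sketch of such a .list file. The mirror URL and the release name (precise is assumed here) are placeholders; use whatever the configuration tool generates for your setup.

# /etc/apt/sources.list.d/MariaDB.list (hypothetical example: MariaDB 5.5 on Ubuntu 12.04)
deb http://mirror.example.com/mariadb/repo/5.5/ubuntu precise main
deb-src http://mirror.example.com/mariadb/repo/5.5/ubuntu precise main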
Oracle buys Sun (and gets MySQL)
https://grey-panther.net/2009/04/oracle-buys-sun-and-gets-mysql.html
Tue, 28 Apr 2009

Here is Monty’s (co-founder of MySQL, who left Sun some time ago) opinion. On a more light-hearted note, here are some Slashdot comments 🙂

From rho – a good example of why case sensitivity is important:

> Their string comparisons are case sensitive.

8.4 has citext. Or you can make an index with lower() on the appropriate columns.

IMO it’s preferable for software to not assume that "Helped my uncle Jack off a horse." and "Helped my uncle jack off a horse." are the same thing.
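For illustration, here is a minimal PostgreSQL sketch of the lower()-based approach mentioned above (table and column names are made up): the expression index supports case-insensitive lookups while ordinary comparisons stay case sensitive.

CREATE INDEX users_email_lower_idx ON users (lower(email));

-- case-insensitive lookup that can use the index above
SELECT * FROM users WHERE lower(email) = lower('Someone@Example.com');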

And from Just Some Guy we get the security angle on it:

Imagine an OS where strcmp() was case insensitive, and where it was used to compare hashed passwords when authenticating users. Realize that base64 is now really base36, and that you’ve been throwing away approximately half the bits per character in the encoded password, and that your passwords are now about .5^$LENGTH as secure.

Have fun auditing your MySQL-based webapps to make sure that none of them use base64 password encoding coupled with case-insensitive searches!

The new Solaris licensing terms:

1s – free
0s – $10 per 0, minimum 100,000 0s

per processor core, multiplied by the number of megabytes of RAM installed in your system.

Oh, pardon me, this isn’t a production system, but is a development workstation? Allow me to refer you to the above licensing fee schedule. Thank you for choosing Oracle!

The new stock ticker:

Oracle (ORCL) announces that, in order to emphasize the importance of this operation and better reflect its activities, it will switch its stock ticker to JAVA.

🙂

My personal opinion is that this will accelerate people migrating to MySQL forks, like Drizzle, which is good, because it removes much of the old cruft; but migration is painful if you happen to rely (knowingly or unknowingly) on one of those “features”. Still, it has to be done (like migrating to Apache 2, PHP 5, etc.).

Random Database Blogging
https://grey-panther.net/2009/02/random-database-blogging.html
Tue, 03 Feb 2009

From the Database and Performance blog: Queuing Theory & Resource Utilization – a lightweight introduction to the field which explains why you don’t get linear growth all the way – at some point you hit a magic ceiling and things get much worse.

PostgreSQL Replicator is another way to replicate your PostgreSQL database. It now has RPMs built automatically for it by the pgsqlrpms project, which certainly makes it easier to try out. Now if we could have some .debs please 🙂

A brief comparison between londiste and slony.

The perils of InnoDB with Debian and startup scripts – something useful to read about if you are using MySQL + Debian.

Initial ext3 vs ext4 Results – this looks awesome. Together with PostgreSQL 8.4 this should rock!

Another benchmark comparing different scheduling algorithms under Linux for use with a DB – this is why I like Linux! You can tune it to your heart’s content. With closed source you have to trust that the vendor made the correct choice.

A (not so new) technique for breaking databases
https://grey-panther.net/2008/05/a-not-so-new-technique-for-breaking-databases.html
Sat, 17 May 2008

There is a joke which goes something like: those who know how to do it, do it. Those who don’t, teach it. Those who don’t even know how to teach it, supervise it. Sadly this is true of many tech journalists who make up sensationalized titles, both because of a lack of comprehension and because they have to sell their writing. Of course, people pitching topics to the journalists aren’t all that innocent themselves.

One such example would be the New attack technique threatens databases piece from The Register. What this boils down to is a plain SQL injection attack, at a different level.

The summary of the paper (warning, pdf!) is: suppose someone, who should know better, writes the following stored procedure (because I don’t know Oracle, it will be written in pseudo SQL, but you will get the point):

CREATE PROCEDURE test1(stuff DATE) RETURNS varchar AS
BEGIN
 query = "SELECT * FROM products WHERE arrival > '" || stuff || "'";
 EXECUTE query;
END;

The thought process (if there was any) behind it probably went along these lines: I know that constructing dynamic SQL queries is bad (both because I expose myself to SQL injection attacks and because syntax errors aren’t verified during the creation of the procedure – given that query is just a string from the point of view of the parser), but I’ve put the value between quotes and I know that Oracle will validate the parameter before passing it to the procedure. As dates can’t have quotes in them, I’m OK.

The problem is (as the paper describes) that by altering a session variable, you can define the format of a date for Oracle, making these types of procedures exploitable. Solution: don’t create SQL queries using string concatenation, because it will bite you in the rear sooner or later.

As I mentioned earlier, I’m no Oracle guru (in fact I haven’t used Oracle in my life), but being curious and all, I looked at how Postgres and MySQL would behave in a similar situation. Postgres behaves flawlessly: you can write queries which reference the input variables directly, without constructing them as strings, which has the dual benefit of proper quoting and syntactical verification at procedure creation time. With MySQL you have to use at least 5.0.13 (not a big deal at all, given that you need at least version 5 if you want stored procedures anyway), from which version onwards you can take advantage of prepared statements inside stored procedures.
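To make the contrast concrete, here is a minimal sketch of the safe pattern in both databases (table, column and routine names are made up for illustration): the parameter is referenced directly, so there is no string left to inject into.

-- PostgreSQL: the parameter is used directly in the query, no string concatenation
CREATE FUNCTION recent_products(since date) RETURNS SETOF products AS $$
  SELECT * FROM products WHERE arrival > $1;
$$ LANGUAGE sql;

-- MySQL 5.0.13+: a prepared statement inside a stored procedure
DELIMITER //
CREATE PROCEDURE recent_products (IN since DATE)
BEGIN
  SET @since = since;
  PREPARE stmt FROM 'SELECT * FROM products WHERE arrival > ?';
  EXECUTE stmt USING @since;
  DEALLOCATE PREPARE stmt;
END;
//
DELIMITER ;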

Avoiding the dogpile effect
https://grey-panther.net/2008/05/avoiding-the-dogpile-effect.html
Sun, 11 May 2008

When using caching to speed up webpages (or other request-response oriented protocols), it is very common to tie the update of the cache to a new request, meaning that every request first checks whether the cached value is too old. If it isn’t, the value is returned from the cache. If it is, the value is recomputed, stored in the cache and returned to the client.

This approach has two advantages: first, it doesn’t require other, special scripts to run on the server; and second, if the value is not needed, it’s not recalculated.

However, one problem which the naive implementation misses is that if multiple requests come in at roughly the same time and the cache has expired, all of them will try to recalculate the value, which can very easily bring down the service if this operation is costly. Apparently this effect is known as the dogpile effect, clobbering updates or stampeding requests.

To prevent this, you have basically two (and a half) choices:

  1. Move the update process outside of the request-response cycle (into a cronjob, for example). Advantages: it is simple (if you can arrange for scripts to be run periodically) and it gives a predictable response time (because data is always served from the cache). Disadvantages: the recalculation is done even when there are no clients requesting the data (this can be avoided by creating a signaling mechanism between the requests and the update script – so that the recalculation is only done if there were recent requests for the data – however this may then return outdated data). It requires you to be able to run scripts periodically. Also, the dogpile effect can still manifest itself if, for some reason, the recalculation step takes longer than the update interval (so you should use lockfiles or similar synchronization mechanisms to ensure that only one instance of the update script runs at a given moment).
  2. Keep the update code inside the request-response cycle. When you observe that the cache has expired, take a lock on the element before you try to recalculate it, and only do the recalculation if the lock was successfully acquired. Advantages: only one request will recalculate the value, thus avoiding the mini-DDoS, and this technique can be used even when there is no possibility to run scripts on a schedule. Also, unnecessary updates are avoided (if there are no requests, no recalculation occurs). Disadvantage: requests will have variable latency (shorter for cache hits, longer for cache misses).
  3. This latter technique has two variations (depending on the system’s tolerance for delay and for stale data): when a request observes that the data has expired but fails to acquire the update lock, it can do one of two things: return the cached data immediately (this means it can return outdated data, but only one request – the one doing the actual recalculation – is delayed) or wait for the update to finish (this ensures up-to-date data, however all requests arriving during the update are delayed). The first method can be implemented with a try-lock (the script tries to acquire the lock with a timeout of 0) and the second with a blocking lock (the script tries to acquire the lock with an infinite timeout). When using the second method, make sure that you check the freshness of the data a second time after you acquired the update lock, because otherwise you might recalculate the data multiple times in the following scenario:
    • Request A comes in and acquires the lock to recalculate the data
    • Request B comes in, tries to acquire the lock and enters a waiting state
    • Request A finishes
    • Request B acquires the lock, but by now the cache contains fresh data (written by request A), so there is no reason to recalculate it.

For locking you can use the filesystem or a database. For example, with MySQL you can use the GET_LOCK function, under PostgreSQL you can use the advisory lock functions, and with memcache you can emulate locks. One important thing though: in PostgreSQL you can trick the locking system (or rather, trick yourself) if you do the following:

  1. Request A acquires the lock in exclusive mode
  2. Request B acquires the lock in shared mode
  3. Request B re-acquires the lock in exclusive mode

At first glance you would think that step 3 fails, since A already holds an exclusive lock. However, the logic “if you already have the lock, all locking will succeed” overrides this, and request B succeeds. My advice would be: choose one locking strategy (exclusive or shared) and stick with it; don’t try to intermix the two (or at least not for the same lock).
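As a concrete illustration of the try-lock variant from point 3, here is a minimal sketch using MySQL’s GET_LOCK (the lock name is made up, and the surrounding application logic is only hinted at in the comments):

SELECT GET_LOCK('cache:expensive_report', 0);
-- returns 1 if we acquired the lock immediately, 0 if somebody else already holds it
-- got it:    re-check the cache freshness, recalculate if still stale, write the cache,
--            then SELECT RELEASE_LOCK('cache:expensive_report');
-- didn't:    serve the (possibly stale) cached value and let the other request do the work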

Circumventing the need for transactions in MySQL
https://grey-panther.net/2008/04/circumventing-the-need-for-transactions-in-mysql.html
Tue, 08 Apr 2008

While reading the excellent series on "Web 2.0" and databases on the O’Reilly Radar blog, it occurred to me that there is a nice trick for making MySQL semi-transactional (as a side note: these days I work with MySQL less and less and am fully enjoying the goodness that is PostgreSQL and pgAdmin).

Let’s say that you have the following situation:

  • MySQL with MyISAM tables
  • A process which does a SELECT and, depending on the result (for example, if a given field has a certain value), issues an UPDATE

It is quite obvious that this method is not "thread safe", meaning that if you have multiple clients operating on the same records, you can very easily get into the following situation:

  • Client A does the SELECT and decides that it needs to update
  • Client B does the SELECT and it too decides to update
  • Client A does the update
  • Client B does the update

As the number of clients grows, the probability of this situation occurring very quickly approaches 100%. Your options for eliminating it are the following (again, assuming MyISAM tables with no transaction support):

Method 1

Simulate transactions by locking the table – this can reduce the system to a crawl, since it effectively serializes all updates. The problem gets worse and worse as the delay between the SELECT and the UPDATE increases (if you have to perform complex calculations to decide whether an update is needed, for example).

Method 2

After doing the update, do an additional select to make sure that we were the last to update the field.

A slightly more elegant solution would be the following: when doing the UPDATE, put the expected values of the fields which may change into the WHERE clause. A little example is probably in order to make this clear. Let’s suppose that we have the following table:

column_a | column_b
---------+---------
       1 | aaa

And we want to do something like this:

SELECT * FROM table WHERE column_a = 1
…check if column_b equals "aaa"…
UPDATE table SET column_b = "bbb" WHERE column_a = 1

To avoid the race condition, modify the last update as follows:

UPDATE table SET column_b = "bbb" WHERE column_a = 1 AND column_b = "aaa"

Now we don’t need to perform an additional select: we can directly check the "number of affected rows" (which is returned by most – if not all – client libraries) and if it’s one, we succeeded; otherwise someone else "stole our thunder".
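If you want to do the check on the server side rather than through the client library, MySQL’s ROW_COUNT() reports how many rows the immediately preceding statement changed – a minimal sketch reusing the example above (the backticks are only there because table is a reserved word):

UPDATE `table` SET column_b = "bbb" WHERE column_a = 1 AND column_b = "aaa";
SELECT ROW_COUNT();
-- 1: we won the race and performed the update; 0: another client changed the row first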

Method 3

The one I actually wanted to talk about: express the whole procedure as a single SQL statement. Following the previous example, we could again write:

UPDATE table SET column_b = "bbb" WHERE column_a = 1 AND column_b = "aaa"

The difference compared to the previous method is that we don’t need the SELECT, because we included the verification step in the WHERE clause. Again, we check the number of affected rows to find out whether we succeeded. The basis of this method is that although MySQL doesn’t guarantee the serializability of multiple queries (on MyISAM tables, unless you lock the table), it does guarantee the serializability of individual queries. In fact it has to, because otherwise simple queries like UPDATE table SET a = a + 1 could not be guaranteed to produce correct results in all circumstances. So, as long as you can express the operations which are prone to produce incorrect results when multiple copies are executed as a single statement, you are fine. There is almost no limit to the conditions you can express in SQL. If your expression becomes too complicated or requires complicated control structures (branches, loops, etc.), you can hide it away in a stored routine since version 5.0 (see the sketch below). However, such a routine should not access the database, because that would break the "query serialization" (queries issued from a stored routine, unless explicitly part of a transaction, are subject to the same synchronization problems!). It should operate strictly on its input parameters.
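A hedged sketch of that idea (all names are illustrative): the decision logic lives in a stored function that looks only at its arguments, and the single UPDATE remains the only statement touching the table.

DELIMITER //
CREATE FUNCTION should_update(current_value VARCHAR(255)) RETURNS TINYINT DETERMINISTIC
BEGIN
  -- arbitrarily complex branching over the input can go here, but no table access
  RETURN current_value = 'aaa';
END;
//
DELIMITER ;

UPDATE `table` SET column_b = 'bbb' WHERE column_a = 1 AND should_update(column_b) = 1;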

Creating optimal queries for databases
https://grey-panther.net/2007/08/creating-optimal-queries-for-databases.html
Sun, 19 Aug 2007

Although I’m a big PostgreSQL supporter, I started out as a MySQL user and still use MySQL daily, so I listen to the OurSQL podcast. In the latest episode (number 22) the topic was Things To Avoid With MySQL Queries. While I picked up a few tips from it (and most of the things mentioned are applicable across the board, not just to MySQL), I realized that pgAdmin, the GUI administration tool for PostgreSQL, has a great feature (among many) that isn’t talked about a lot: the visual representation of EXPLAIN output. Because which is easier to interpret: the raw textual query plan, or a graphical tree of it?

Of course everything has two sides, so here is a small gotcha with pgAdmin: every time you access a database which doesn’t have the default encoding set to UTF-8, it will pop up a warning saying that for maximum flexibility you should use the UTF-8 encoding. However, what it fails to mention is that if you don’t use the standard C or SQL_ASCII setting, you will have to define your indexes with special operator classes if you wish for them to be usable for query execution (see the sketch below).
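For instance, a hypothetical PostgreSQL sketch of that caveat (table and column names are made up): in a non-C locale, a left-anchored LIKE only uses a b-tree index if it was declared with the text_pattern_ops operator class.

CREATE INDEX products_name_pattern_idx ON products (name text_pattern_ops);
EXPLAIN SELECT * FROM products WHERE name LIKE 'abc%';
-- with the operator class the plan shows an index scan; with a plain index it falls back to a sequential scan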

MySQL triggers and stored procedures
https://grey-panther.net/2007/07/mysql-triggers-and-stored-procedures.html
Tue, 31 Jul 2007

So MySQL is trying to be a big boy and have advanced features like triggers and stored procedures (not just UDFs). However, their syntax seems a little complicated compared to the PostgreSQL one. So here it goes:

DROP TRIGGER IF EXISTS mytrigger;
DELIMITER |

CREATE TRIGGER mytrigger BEFORE INSERT ON test1
FOR EACH ROW BEGIN
  INSERT INTO test2 SET a2 = NEW.a1;
  DELETE FROM test3 WHERE a3 = NEW.a1;  
  UPDATE test4 SET b4 = b4 + 1 WHERE a4 = NEW.a1;
END;
|

DELIMITER ;

The play with the delimiter is necessary to be able to put multiple statements (separated by ;) inside the trigger. The DROP TRIGGER IF EXISTS construct is the equivalent of the CREATE OR REPLACE construct from PostgreSQL.

The syntax for procedures / functions is similar:

DROP PROCEDURE IF EXISTS simpleproc;
DELIMITER //
CREATE PROCEDURE simpleproc (OUT param1 INT)
BEGIN
  SELECT COUNT(*) INTO param1 FROM t;
END;
//

DELIMITER ;
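A quick usage note (assuming the table t from the example exists): the OUT parameter is returned through a user variable.

CALL simpleproc(@row_count);
SELECT @row_count;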
