postgresql – Grey Panthers Savannah https://grey-panther.net NoCOUG SQL challenge https://grey-panther.net/2009/04/nocoug-sql-challenge.html https://grey-panther.net/2009/04/nocoug-sql-challenge.html#comments Mon, 06 Apr 2009 09:17:00 +0000 https://grey-panther.net/?p=326 NoCOUG (which stands for Northern California Oracle Users Group) published an SQL challenge [PDF]: using SQL, determine the probability of obtaining a given sum when throwing a non-balanced die N times.

Being the PostgreSQL fanboy that I am, I’ve given it a try with PG. Here are the results:

To create the table and populate it (note the 1.0 notation – otherwise the division is done using integer arithmetic):


CREATE TABLE die
(
  face_id integer NOT NULL,
  face_value integer NOT NULL,
  probability double precision NOT NULL,
  CONSTRAINT die_pkey PRIMARY KEY (face_id)
);

INSERT INTO die(face_id, face_value, probability) VALUES
	(1, 1, 1.0/6 + 1.0/12),
	(2, 3, 1.0/6 + 1.0/12),
	(3, 4, 1.0/6 + 1.0/12),
	(4, 5, 1.0/6 - 1.0/12),
	(5, 6, 1.0/6 - 1.0/12),
	(6, 8, 1.0/6 - 1.0/12);
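
As a quick sanity check (not part of the challenge), the weights of the six faces should add up to 1:

SELECT SUM(probability) FROM die; -- expected result: 1 (up to floating point rounding)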

And here is a set-returning function written in plpgsql to calculate the probabilities:


CREATE OR REPLACE FUNCTION calc_probs(depth integer) RETURNS SETOF die AS $$
BEGIN
  IF (depth <= 1) THEN
    RETURN QUERY SELECT * FROM die;
  ELSE
    RETURN QUERY
      SELECT 
          MIN(A.face_id) AS face_id,
          A.face_value + B.face_value AS face_value, 
          SUM(A.probability * B.probability) AS probability 
        FROM die A, (SELECT * FROM calc_probs(depth - 1)) B
        GROUP BY 2;
  END IF;
END;
$$ LANGUAGE plpgsql;
SELECT face_value, probability FROM calc_probs(100) ORDER BY face_value;
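
The same kind of sanity check applies to the function’s output – for any number of throws the probabilities should sum to 1:

SELECT SUM(probability) FROM calc_probs(2); -- expected result: 1 (up to floating point rounding)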

I didn’t do any benchmarks, but it should be quite fast. One optimization you could make for such a function in production is to declare it STABLE (or, as an unsafe optimization, declare it IMMUTABLE if the underlying table changes very infrequently). From the documentation:

STABLE indicates that the function cannot modify the database, and that within a single table scan it will consistently return the same result for the same argument values, but that its result could change across SQL statements. This is the appropriate selection for functions whose results depend on database lookups, parameter variables (such as the current time zone), etc.
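
Assuming the function above has already been created, marking it STABLE is a one-liner (alternatively, the STABLE keyword can simply be appended after LANGUAGE plpgsql in the CREATE FUNCTION statement):

ALTER FUNCTION calc_probs(integer) STABLE; -- it only reads the die table, never modifies it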

Finally, here is a solution using the CTE (Common Table Expression) feature from the upcoming 8.4 release (you can think of CTEs as dynamically defined VIEWs – for more details about them you can start at the following links: CTEReadme, Common Table Expressions (WITH and WITH RECURSIVE) and Waiting for 8.4 – Common Table Expressions (WITH queries)):


WITH RECURSIVE p AS (
    SELECT 1 AS throws, face_value, probability FROM die
UNION ALL
    SELECT B.throws + 1 AS throws,
        A.face_value + B.face_value AS face_value,
        A.probability * B.probability AS probability
        FROM die A, p B
        WHERE B.throws < 2
) SELECT face_value, SUM(probability) FROM p
    WHERE throws = 2 GROUP BY face_value ORDER BY face_value;

This solution is a little less optimal because it does the GROUPing at the end, but I wasn’t able to include it in the inner select (it kept giving me an error saying that grouping is not supported in the recursive part of the query). It also makes you repeat the number of dice throws twice. It is possible that it can be written better (this is the first time I’ve experimented with this particular feature) or that it just isn’t suitable for these types of queries.

Toying around with this challenge was fun and it certainly shows that PostgreSQL is on par with most of Oracle’s features.

Picture taken from TooFarNorth’s photostream with permission.

Small programming tips https://grey-panther.net/2009/04/small-programming-tips.html https://grey-panther.net/2009/04/small-programming-tips.html#comments Thu, 02 Apr 2009 08:30:00 +0000 https://grey-panther.net/?p=329 A quick post inspired by issues I encountered recently.

  1. How to concatenate (aggregate) strings with PostgreSQL? In MySQL you can write:

    SELECT GROUP_CONCAT(name) FROM test_table

    because GROUP_CONCAT is an aggregate function (like MIN, MAX, SUM, COUNT, etc.). To get the equivalent result in PostgreSQL, you can use the following query (based on ideas from a previous post):

    SELECT array_to_string(ARRAY(SELECT name FROM test_table), 'separator')

    This is a hack in some ways, because you should write a custom aggregate function, but it works.
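
    For reference, a minimal sketch of such a custom aggregate (the aggregate name textcat_all is made up; textcat is the built-in function behind the || operator for text):

    CREATE AGGREGATE textcat_all (
        basetype = text,
        sfunc    = textcat,
        stype    = text,
        initcond = ''
    );

    SELECT textcat_all(name || ', ') FROM test_table; -- note: this leaves a trailing separator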

  2. How to convert date/time to a Unix timestamp with PostgreSQL?

    SELECT date_part('epoch', NOW())

    This returns a floating point number (so it also represents the fractional seconds of the timestamp). If you want just second-level precision, you need to convert it to an integer (with the remark that you need to consider the issue of rounding up, rounding down, etc.):

    SELECT date_part('epoch', NOW())::INTEGER
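
    The ::INTEGER cast rounds to the nearest second; if you want to force the direction explicitly (an addition to the original tip), floor() and ceil() can be used:

    SELECT floor(date_part('epoch', NOW()))::INTEGER AS rounded_down,
           ceil(date_part('epoch', NOW()))::INTEGER  AS rounded_up;
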
  3. How to find the point of execution (from the point of view of the call stack) in a Java program and possibly store it for later display? This is useful in asynchronous situations, where objects are created in one place, but their methods are called in other places. The solution: create a Throwable object, because it keeps track of its creation point (and no, creating a Throwable object doesn’t cause exceptions, throw-ing them does):

    // where you are interested in the stack
    Throwable callingPoint = new Throwable();
    
    // later on (in a different method / thread / etc)
    callingPoint.printStackTrace(); // this dumps the stacktrace to the stderr
    
    // ...
    
    // or, if you want to make something more sophisticated with the stacktrace
    StringBuilder sb = new StringBuilder();
    sb.append(callingPoint.getMessage()); sb.append('\n');
    for (StackTraceElement ste : callingPoint.getStackTrace()) {
    	sb.append('\t'); sb.append(ste); sb.append('\n');
    }

Picture taken from selena marie’s photostream with permission.

PostgreSQL data corruption issues https://grey-panther.net/2009/02/postgresql-data-corruption-issues.html https://grey-panther.net/2009/02/postgresql-data-corruption-issues.html#respond Fri, 27 Feb 2009 08:28:00 +0000 https://grey-panther.net/?p=380 Lately I’ve been helping out a friend with PG data corruption issues. Usually PG is pretty good about data consistency, but it too can fail under extreme conditions (multiple power failures, fsync=off in the name of speed, no battery-backed RAID controller). The interesting thing I didn’t realize is that your transaction log can get corrupted!

Some errors I’ve seen include:

Exception [OperationalError] – could not access status of transaction 1277830 DETAIL: Could not open file "pg_clog/0001": No such file or directory

PANIC: corrupted item pointer [...]

Some ideas I’ve found / had:

  • Recreate the missing files with dd (dd if=/dev/zero of=0001 bs=1024 count=256) – from here
  • Use pg_resetxlog (located in /usr/lib/postgresql/8.3/bin/pg_resetxlog under Debian/Ubuntu)
  • Dump and reload the data on another machine. A problem which can appear is that of data which violates constraints (like NOT NULL). One should remove all the constraints, add them back one by one, and clean out the data which violates them (a small sketch of this step is shown below).
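
A minimal sketch of that cleanup step, with made-up names (table accounts, NOT NULL column email):

ALTER TABLE accounts ALTER COLUMN email DROP NOT NULL;  -- drop the constraint before reloading
-- ... reload the dump, then inspect the offending rows ...
SELECT * FROM accounts WHERE email IS NULL;
DELETE FROM accounts WHERE email IS NULL;                -- or fix them instead of deleting
ALTER TABLE accounts ALTER COLUMN email SET NOT NULL;    -- re-add the constraint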

This is very much a “work in progress” situation, since I didn’t manage to solve it to my satisfaction, but maybe these pointers will be useful for somebody.

Image taken from nvshn’s photostream with permission. Created with corrupt – the data corruption software 🙂

PostgreSQL musings https://grey-panther.net/2009/02/postgresql-musings.html https://grey-panther.net/2009/02/postgresql-musings.html#respond Sun, 15 Feb 2009 07:10:00 +0000 https://grey-panther.net/?p=402 First, a very good article about creating (and maintaining!) data clustering with PostgreSQL. This made me think: wouldn’t it be nice if the automated tuning wizards gave you a short article to read which discusses the proposed solution instead of just the “turn knob X” type of suggestions?

Also, Percona is hiring performance experts, including PostgreSQL ones. This is good news, since Percona is known both for conveying useful information and for the valuable source contributions they make to the product. Will we see a “high performance” PostgreSQL soon? (Yes, PG is high performance, I’m referring to their blog name here).

Random Database Blogging https://grey-panther.net/2009/02/random-database-blogging.html https://grey-panther.net/2009/02/random-database-blogging.html#respond Tue, 03 Feb 2009 11:27:00 +0000 https://grey-panther.net/?p=428 From the Database and Performance blog: Queuing Theory & Resource Utilization – a lightweight introduction to the field which explains why you don’t get linear growth all the way – at some point you hit a magic ceiling and things get much worse.

PostgreSQL Replicator is another way to replicate your PostgreSQL database. It now has RPMs built automatically for it by the pgsqlrpms project, which certainly makes it easier to try out. Now if we could have some .debs please 🙂

A brief comparison between londiste and slony.

The perils of InnoDB with Debian and startup scripts – something useful to read about if you are using MySQL + Debian.

Initial ext3 vs ext4 Results – this looks awesome. Together with PostgreSQL 8.4 this should rock!

Another benchmark comparing different scheduling algorithms under Linux for use with a DB – this is why I like Linux! You can tune it to your heart’s content. With closed source you have to trust that the vendor made the correct choice.

pl/lolcode https://grey-panther.net/2009/01/pl-lolcode.html https://grey-panther.net/2009/01/pl-lolcode.html#respond Wed, 07 Jan 2009 14:59:00 +0000 https://grey-panther.net/?p=475 The news (via Joshua Drake’s blog): video / audio / slides available for two more talks on the postgresql conference site. Now for the funny part (this is from the slides of the “Babel of Procedural Languages” by David Fetter):

HAI
    CAN HAS DATABUKKIT?
    I HAS A RESULT
    I HAS A RECORD
    GIMMEH RESULT OUTTA DATABUKKIT "SELECT field FROM mytable"
    IZ RESULT NOOB?
        YARLY
            BYES "SUMWUNZ IN YR PGSQL STEELIN YR DATA"
    KTHX
    IM IN YR LOOP
        GIMMEH RECORD OUTTA RESULT
        VISIBLE RECORD!!FIELD
        IZ RESULT NOOB? KTHXBYE
    IM OUTTA YR LOOP
KTHXBYE

This… is… incredibly… funny!!!

How to be the coolest DBA out there… https://grey-panther.net/2008/12/how-to-be-the-coolest-dba-out-there.html https://grey-panther.net/2008/12/how-to-be-the-coolest-dba-out-there.html#respond Wed, 17 Dec 2008 18:12:00 +0000 https://grey-panther.net/?p=523 By managing your PostgreSQL install with your iPhone! 😀

Poor man’s traffic logger https://grey-panther.net/2008/11/poor-mans-traffic-logger.html https://grey-panther.net/2008/11/poor-mans-traffic-logger.html#comments Mon, 10 Nov 2008 15:25:00 +0000 https://grey-panther.net/?p=602 I was reading the following blog post about filtering out MySQL queries and was reminded of a situation I faced once. The situation was as follows: I needed to find out where certain PostgreSQL queries were coming from, however the server was behind a pgpool instance, so all the queries were seen as coming from the same IP.

The solution was to run tcpdump on the interface/port where pgpool was listening and search the traffic for the specific queries. The solution described in that blog post is much more elegant, of course :-). Also, somebody in the comments mentioned a nifty little tool called MySQL query sniffer, which looks very nice and could probably be adapted for PG (using something like PgPP as the basis).

A small Slony-I tutorial https://grey-panther.net/2008/10/a-small-slony-i-tutorial.html https://grey-panther.net/2008/10/a-small-slony-i-tutorial.html#comments Tue, 28 Oct 2008 14:42:00 +0000 https://grey-panther.net/?p=617 Update: the altperl scripts seem to take away the need for most of the steps here. Take a look at the post Slony1-2.0.0 + PostgreSQL 8.4devel for an example of how to use them.

When I first installed Slony-I to replicate between two PostgreSQL servers, it was very confusing. Now I am somewhat less confused (proven by the fact that the replication is actually working :-)) and would like to share some things I’ve learned along the way.

The first thing to understand is how Slony-I works at a high level:

  • When you install Slony in a database, it creates some tables in a separate schema.
  • It also adds some triggers to your tables in the following way: (a) on the master it adds triggers which log the changes after modifying the table and (b) on the slave it adds triggers which disallow changing the data (this is logical, but can be surprising)
  • When data gets modified on the master, these triggers log the changes to the slony schema (you can peek at them with the query shown after this list).
  • Each slave needs to run a daemon (slon) to fetch these changes from the slony schema on the master, write them locally and delete them from the master (to keep the slony schema from growing indefinitely)
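
A minimal way to peek at these logged changes, assuming the cluster name used later in this post (slon_cluster_a, which puts the Slony schema in _slon_cluster_a); sl_log_1 is one of the tables the log triggers write to:

SELECT * FROM _slon_cluster_a.sl_log_1 LIMIT 10; -- changes waiting to be replicated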

A limitation of the Slony system is that you can’t change the list of tables or their structure dynamically. To do this you must break the synchronization, perform the changes and let Slony copy all the data over again. This sounds really bad, but Slony is actually quite fast, copying over a ~30 GB dataset in ~4 hours (on a local 100 Mbit network).

Now with these out of the way, how do you actually start a replication?

The first thing you need to make sure of is that both servers can connect to each other. You must be able to go to both servers and successfully execute the following command:

psql -h [the other server] -U [user] -d [DB name] 

Now that you made sure that you have connectivity, you must generate an initialization file and feed it to slonik (the slony management tool). You can do this fairly easily if you have a regular structure for your tables; the query for it is below:

SELECT '#!/bin/bash
slonik <<_EOF_
cluster name = slon_cluster_a;

node 1 admin conninfo = ''dbname=dbname host=first_host user=u password=p'';
node 2 admin conninfo = ''dbname=dbname host=second_host user=u password=p'';

init cluster (id = 1, comment = ''master a'');

create set (id = 1, origin = 1, comment = ''fdb'');
' || array_to_string(ARRAY(SELECT 
 CASE 
  WHEN c.relkind = 'S' THEN
   'set add sequence (set id = 1, origin = 1, id = ' || (SELECT SUM(1) FROM pg_class WHERE pg_class.relname <= c.relname) ||
    ', fully qualified name = ''' || n.nspname || '.' || c.relname || ''', comment = ''Sequence ' || c.relname || ''');' 
  WHEN c.relkind = 'r' THEN
   'set add table (set id = 1, origin = 1, id = ' || (SELECT SUM(1) FROM pg_class WHERE pg_class.relname <= c.relname) ||
    ', fully qualified name = ''' || n.nspname || '.' || c.relname || ''', comment = ''Table ' || c.relname || ''');' 
 END
 FROM pg_class c
 LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
 LEFT JOIN pg_tablespace t ON t.oid = c.reltablespace 
 WHERE 
  (c.relkind = 'r' OR c.relkind = 'S')
  AND n.nspname = 'public'
  AND NOT(c.relname LIKE '%_old')
  AND NOT(c.relname = 'foo')
 ORDER BY c.relname ASC), E'\n') ||
'store node (id = 2, comment = ''slave a'');
store path (server = 1, client = 2, conninfo = ''dbname=dbname host=first_host user=u password=p'');
store path (server = 2, client = 1, conninfo = ''dbname=dbname host=second_host user=u password=p'');

store listen (origin = 1, provider = 1, receiver = 2);
store listen (origin = 2, provider = 2, receiver = 1);

subscribe set ( id = 1, provider = 1, receiver = 2, forward = no);
_EOF_';

Now I know that this looks like a monster query, but it is actually quite simple. It creates a bash script (I'm assuming that the DB servers are running Linux) composed of three parts:

  • The initial part (make sure that you set up the paths correctly here)
  • The middle part which differs depending on the type of object (table or sequence - sequences need to be synchronized too because they (1) are outside of transactions and (2) can be changed explicitly)
  • The final part. Again, make sure that the connection strings are correct.

Regarding the middle part: most probably you will want to fiddle with the filtering criteria (a quick way to preview which objects will be picked up is shown right after this list). For example the above query includes tables / sequences which:

  • Are in the public schema (this is important because we have the schema created by slony)
  • Don't end in _old
  • Are not named foo
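
A quick way to preview which objects the filter will pick up, before generating the whole script:

SELECT n.nspname, c.relname, c.relkind
  FROM pg_class c
  LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE (c.relkind = 'r' OR c.relkind = 'S')
   AND n.nspname = 'public'
   AND NOT (c.relname LIKE '%_old')
   AND NOT (c.relname = 'foo')
 ORDER BY c.relname;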

Save the output of the query in a file on one of the servers (either the master or the slave), give it executable permissions (chmod +x slony_setup.sh) and run it (./slony_setup.sh). You only have to run it once (ie. either on the master or on the slave) because it will set up the triggers / tables / etc on both machines (if the connection strings are correct - if they are not you will have other troubles as well, so go and correct them :-)).

A remark about choosing the (PostgreSQL) user under which to run Slony: for simplicity it is recommended to use postgres or another superuser, because Slony has to make modifications to the DB (add schemas, create triggers, etc). From what I understand it is possible to use a locked-down user (ie one which has minimal privileges), but it is a painful procedure.

Now go to the slave, log in as user "u" (the one defined in the slony setup script) – you can do this indirectly for example by doing sudo su u if you have sudo rights – and start the slon daemon: slon slon_cluster_a "dbname=dbname user=u". You should see a lot of messages go by. If there aren’t any obvious errors, the initial data copying has begun. You can see the status of Slony by executing the following query on the slave (of course you should replace _slon_cluster_a with the correct schema name):

SELECT * FROM _slon_cluster_a.sl_status

What you are interested in is the st_lag_num_events column. Initially this will increase, until the copying of the data is done and the slave has had a chance to catch up with the master. After that it should hover around 0. Sometimes you will see spikes (I assume that these appear during autovacuum), but it should always tend towards 0.
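
A slightly more focused version of the status query (st_lag_num_events and st_lag_time are both columns of the sl_status view):

SELECT st_lag_num_events, st_lag_time FROM _slon_cluster_a.sl_status;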

Now that you've set up the replication, how do you tear it down? (because for example you want to change the tables). The easiest way I found was using pgAdmin.

First stop the slon process on the slave.

Now in pgAdmin go to your database, and you should see a "Replication" entry for the slony cluster.

You should delete the entry on both the master and the slave (this is one of the rare cases in working with slony where you need to operate independently on the two machines – of course there is probably some command-line script which does this for you).

Now do the changes on the master and export the exact database structure to the slave. Again, for me the easiest method was to drop and recreate the database on the slave and after that export the schema from the master to the slave like this:

pg_dump -h [master] -s dbname | psql dbname

The -s tells pg_dump to only dump the schema, not the data.

Now regenerate and re-run the slonik script and start the slon daemon.

That's all folks. Hope it helps somebody who is struggling to understand slony, like I was.

PS. An interesting side-effect of installing slony on the slave is that other triggers get dropped (and need to be recreated). This can be important in some scenarios, for example: let’s say that you have a table A and a table A_log which contains the operations done on table A. A_log is updated by a trigger on A. Now if you use Slony, you have two options:

Let Slony synchronize both tables. This is the easiest, however it can happen that for short periods of time you will have inconsistencies between A and A_log on the slave if you are using the "READ COMMITTED" isolation level (which is recommended in most cases) because Slony might have synchronized one of the tables, but not the other. If this is acceptable for you, great.

If this is not acceptable, you have to do two things: first, exclude the table A_log from the synchronization (by adding it to the exclusion criteria in that huge select at the beginning of the article). Second, after you have executed the slonik script, you need to add the trigger back on the slave for the "A" table.
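
A minimal sketch of that second step, with made-up names (the table a, the trigger a_log_trigger and the trigger function log_a_changes() are purely illustrative – pg_dump -s should already have copied the function definition to the slave, only the trigger itself needs to be recreated):

CREATE TRIGGER a_log_trigger
    AFTER INSERT OR UPDATE OR DELETE ON a
    FOR EACH ROW EXECUTE PROCEDURE log_a_changes();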

PPS. What I presented here is a very rudimentary configuration and can be substantially improved. For example Slony has support for the following scenario: the master goes away and now the slave should become the master, and when later the master comes back (is repaired, for example), it can become the new slave ("switch"). From what I understand some changes need to be made to the above configuration to support this (fairly important) scenario.

Update: Thanks to one of the commenters I took a closer look at EnterpriseDB and on their site found a link to the following documentation which describes the structure and intention of Slony: Slony-I - A replication system for PostgreSQL - Concept. It is a short read (~17 pages) and very useful for understanding Slony.

Update: Here is another tutorial for Slony. It is pretty much the same as the samples on the Slony website, however I found it easier to understand (this is possibly also because in the meantime I’ve learned more about Slony, but give it a try). It also gives examples of how to use some helper scripts included with Slony to make your life easier.


Update: it seems that slony requires the existence of the public schema, even if it isn't installed in it.

Update: Setting up Slony cluster with PerlTools - a much simpler method than the one described by me 🙂

Using Perl to access PostgreSQL under Windows https://grey-panther.net/2008/09/using-perl-to-access-postgresql-under-windows.html https://grey-panther.net/2008/09/using-perl-to-access-postgresql-under-windows.html#comments Tue, 30 Sep 2008 12:47:00 +0000 https://grey-panther.net/?p=677 This appears to be a non-intuitive problem for people. Below I assume that you are using some version of ActivePerl for Windows (5.8 or 5.10). First of all:

Under no circumstances (ok, I rephrase: only under extreme circumstances) should you use DBD::PgPP. It is old, not very performant (given that it’s implemented in pure Perl) and has some nontrivial bugs. An example of one such bug is the following:

my $sth = $dbh->prepare('SELECT foo FROM bar WHERE baz = ? AND fuzz = ?');
$sth->execute('a?b', 'c');

With PgPP the actual query it will try to execute will be something like: SELECT foo FROM bar WHERE baz = a'c'b AND fuzz = ?. I assume that what it does is take the parameters provided to execute one by one and look for the first question mark in the query. But given that the first parameter itself contains a question mark, the method fails…

So how do you install DBD::Pg if it isn’t in the ActiveState PPM repositories?

  1. Go to the dbdpgppm project on pgfoundry
  2. Download the correct PPD file for your needs (depending on the Perl version – 5.6, 5.8 or 5.10 – and if you need SSL support or not)
  3. Issue the following command:

    ppm install the_downloaded.ppd
