NoCOUG (which stands for Northen California Oracle Users Group) published an SQL challenge [PDF]: using SQL determine the probability of achieving a given number by throwing a non-balanced dice N times.
Being a PostgreSQL fanboy that I am, I’ve given a try with PG. Here are the results:
To create the table and populate it (note the 1.0 notation, otherwise it does integer arithmetic):
CREATE TABLE die ( face_id integer NOT NULL, face_value integer NOT NULL, probability double precision NOT NULL, CONSTRAINT die_pkey PRIMARY KEY (face_id) ); INSERT INTO die(face_id, face_value, probability) VALUES (1, 1, 1.0/6 + 1.0/12), (2, 3, 1.0/6 + 1.0/12), (3, 4, 1.0/6 + 1.0/12), (4, 5, 1.0/6 - 1.0/12), (5, 6, 1.0/6 - 1.0/12), (6, 8, 1.0/6 - 1.0/12);
And here is a stored procedure written in plpgsql to calculate the probabilities:
CREATE OR REPLACE FUNCTION calc_probs(depth integer) RETURNS SETOF die AS $$ BEGIN IF (depth <= 1) THEN RETURN QUERY SELECT * FROM die; ELSE RETURN QUERY SELECT MIN(A.face_id) AS face_id, A.face_value + B.face_value AS face_value, SUM(A.probability * B.probability) AS probability FROM die A, (SELECT * FROM calc_probs(depth - 1)) B GROUP BY 2; END IF; END; $$ LANGUAGE plpgsql;
SELECT face_value, probability FROM calc_probs(100) ORDER BY face_value;
I didn’t do any benchmarks, but it should be quite fast. One optimization you could do for such functions in production is to declare it STABLE (or an unsafe optimization: declare it IMMUTABLE if the underlying table changes very infrequently). From the documentation:
STABLE indicates that the function cannot modify the database, and that within a single table scan it will consistently return the same result for the same argument values, but that its result could change across SQL statements. This is the appropriate selection for functions whose results depend on database lookups, parameter variables (such as the current time zone), etc.
Finally, here is a solution using the CTE (Common Table Expression) feature from the upcoming 8.4 release (you can think of the CTE’s like dynamically defined VIEWS – for more details about them you can start at the following links: CTEReadme, Common Table Expressions (WITH and WITH RECURSIVE) and Waiting for 8.4 – Common Table Expressions (WITH queries)):
WITH RECURSIVE p AS ( SELECT 1 AS throws, face_value, probability FROM die UNION ALL SELECT B.throws + 1 AS throws, A.face_value + B.face_value AS face_value, A.probability * B.probability AS probability FROM die A, p B WHERE B.throws < 3 ) SELECT face_value, SUM(probability) FROM p WHERE throws = 2 GROUP BY face_value ORDER BY face_value;
This solution is a little less optimal because it does the GROUPing at the end, but I wasn’t able to include it in the inner select (it kept giving me an error saying that grouping is not supported on the recursive part of the queries). It also makes you repeat the number of dice throws twice. It is possible that it can be written better (this is the first time I’ve experimented with the particular feature) or that it just isn’t suitable for these types of queries.
Toying around with this challenge was fun and it certainly shows that PostgreSQL is on par with most of Oracle’s features.
Picture taken from TooFarNorth’s photostream with permission.