Tuesday, May 1, 2012

Nonviable NoSQL

I've done a ton of reading lately on NoSQL including articles dating back to c. 2000 up to some just recently published and I have to tell you I was impressed.  Most claimed at least 1 or 2 magnitudes better performance, all scaled linearly or near linearly and all have some wonderful features.

Then I got to thinking, if these NoSQL solutions are so great then why do we still use Oracle, Microsoft SQL, DB2, and the like?  Must be something else missing.  Lets think about it...

Going with a NoSQL solution you're trading everything that ACID provides for eventually consistent but darn if it ain't quick!  You get your results faster than ever and who cares if it's not accurate all the time.  It's accurate enough.  Sorry, I know people working in finance and logistics and what they want is the answer.

NoSQL means having a solution that doesn't require a fork-lift to upgrade.  That's the argument anyways.  Scale by adding commodity hardware anyone can pick up.  Reality, you're spinning up many servers as virtual machines that reside on big hardware, as physical servers inside some blade unit or you're using that commodity hardware but need so much of it a fork-lift would come in handy.  Go ahead and argue but the scaling without requiring the fork-lift isn't much of an argument.

What else is sacrificed in NoSQL?  Table joins, sorting, oh yeah, and a standardized language.  All that for the sake of getting possibly inconsistent query results really, really fast from a scalable platform.  Since when did the platform dictate whether it was okay or not to have inconsistent results?  It may be okay for Google, Facebook or Twitter but I doubt application owners in the non-search and non-social space would care for it.

I think what the NoSQL crowd is really griping about is an affordable, scalable database.

Monday, April 30, 2012

Advantages of PostgreSQL Asynchronous Events

Communications is key among processes and threads.  We have IPC's and signals for local system communications but what to do for processes running on different hosts?  There's a host of answers for this ranging from JMS to Jabberd but I was looking for something with the following parameters

  • uses a service already in my environment,
  • message is temporary, and
  • is automatically sent to all listening.
We have a LAPP stack (Linux, Apache, ActiveMQ, PostgreSQL & PHP).  Might argue ActiveMQ would work and yes it would however it's abused already and the messages aren't always as timely as they ought to be.  Also, the purpose of this task was to find a fallback means to communication should ActiveMQ die (which has been known to happen).

Decided on the PostgreSQL NOITFY - LISTEN pair to send simple asynchronous event notifications.  All that's required is a database connection (already there) to a shared database (already done).  The PHP code is pretty straightforward.  Here's a snippit...


#!/usr/bin/php
<?php

// Open some connections to our database
$dbh1 = pg_connect("dbname=postgres");
$dbh2 = pg_connect("dbname=postgres");
$dbh3 = pg_connect("dbname=postgres");

// Prime the connection
Database::asyncListen($dbh2, 'listen_test');
Database::asyncListen($dbh3, 'listen_test');

// Send notification
Database::asyncSend($dbh1, 'listen_test');

// Listen for notification on current connections
Database::asyncListen($dbh2, 'listen_test');
Database::asyncListen($dbh3, 'listen_test');

// Test connection signing on after the event
$dbh4 = pg_connect("dbname=postgres");
Database::asyncListen($dbh4, 'listen_test');

class Database {
  public function asyncSend($conn, $event) {
     pg_query($conn, "NOTIFY {$event};");
     echo "event created\n";
  }

  public function asyncListen($conn, $event) {
     pg_query($conn, "LISTEN {$event};");
     if ( pg_get_notify($conn) !== false ) {
       echo "event '{$event}' registered\n";
     } else {
       echo "nothing\n";
     }
  }
}
?>

Here's the output:
nothing
nothing
event created
event 'listen_test' registered
event 'listen_test' registered
nothing

What's this good for? As we race towards distributed computing there has to be a means to communicate effectively. Such communications within PostgreSQL is handy in that LISTEN is incredibly light thus a body to the message / event is not necessary. Multiple LISTEN's can be executed looking for many events for many things. The name is specific to the message. Rather than multiple constant SELECT's on a table looking for a record, using NOTIFY can tell the code when to check the table.

Just one of those little discoveries I felt the need to scream from the rooftops.