Tuesday, April 23, 2019

PostgreSQL Progress Bar


We've all been there, right?  Large statement, insert / update / delete / select, taking for... ev... er...  We understand it is hard to predict when these things will complete but wouldn't it be neat if there was a way to provide a progress bar?

Obviously, there is a solution.  :)

Let's not get ahead of ourselves.  The solution I am presenting will not include parallelism.  Could it be done?  Certainly with additional scripting external to the database however the problem I am wanting to address is simply providing feedback for a single query.  Keep that objective in mind.


Enter PostgreSQL Version 11 and Transaction Management

Available in PostgreSQL version 11 is the ability to COMMIT and ROLLBACK within a procedure or anonymous code block (DO).  There are limitations to this control including:

  1. CALL or DO only (sorry FUNCTION)
  2. Loops on CURSOR
  3. UPDATE RETURNING

If your statement can be batched, we can break things up in such a way as to provide the ever elusive SQL progress bar.


What is postgres_fdw doing here?

I'm glad this question occurred to you to ask.  PostgreSQL is an ACID compliant MVCC RDBMS.  A single query executes in one transaction and no process outside of that transaction can see what it is doing or, more to the point, what has been done.

What I thought postgres_fdw brought to the table is the ability for a single controlling function to execute DML on a foreign table and thus data made available to other sessions.  That was my thought when I started down this post but it is simply not the case.  When DML on a foreign table is being performed, the remote session is also in a transaction and not until the local session is COMMIT'd is the remote session also COMMIT'd.

Well, buggers.  At least I tried and learned a little.


The Basics

We need something to operate on:

CREATE TABLE output(
  id        serial primary key,
  created   timestamp default now(),
  uu_id     uuid not null
);

Demo

About time I got to it.  Now, let's say we have that huge statement that takes forever.

INSERT INTO batch_output (id, created, uu_id)
SELECT nextval('output_id_seq'), now(), uuid_generate_v1()
  FROM generate_series(1, 1000, 1);

Okay, it's not that big but what if it were and took hours to complete and the boss wants feedback as it progresses?  Could use an anonymous code block?

DO $$
DECLARE
  i  int;
BEGIN
  FOR i IN 1..10
  LOOP
    INSERT INTO output (id, created, uu_id)
    SELECT nextval('output_id_seq'), now(), uuid_generate_v1()
      FROM generate_series(1, 100, 1);

    RAISE NOTICE '% pct done, xact id %', i*10, txid_current();
  END LOOP;
END $$;
NOTICE:  10 pct done, xact id 1551874
NOTICE:  20 pct done, xact id 1551874
NOTICE:  30 pct done, xact id 1551874
NOTICE:  40 pct done, xact id 1551884
NOTICE:  50 pct done, xact id 1551884
NOTICE:  60 pct done, xact id 1551884
NOTICE:  70 pct done, xact id 1551884
NOTICE:  80 pct done, xact id 1551884
NOTICE:  90 pct done, xact id 1551884
NOTICE:  100 pct done, xact id 1551884
DO

No.  Doesn't meet the objective where other processes want access to the data as everything is in a single transaction.  Note xact id 1551884 at every notice.  We need to COMMIT in the loop.

DO $$
DECLARE
  i  int;
BEGIN
  FOR i IN 1..10
  LOOP
    INSERT INTO batch_output (id, created, uu_id)
    SELECT nextval('output_id_seq'), now(), uuid_generate_v1()
      FROM generate_series(1, 100, 1);

    RAISE NOTICE '% pct done, xact id %', i*10, txid_current();

    COMMIT;
  END LOOP;
END $$;
NOTICE:  10 pct done, xact id 1551876
NOTICE:  20 pct done, xact id 1551877
NOTICE:  30 pct done, xact id 1551878
NOTICE:  40 pct done, xact id 1551880
NOTICE:  50 pct done, xact id 1551884
NOTICE:  60 pct done, xact id 1551885
NOTICE:  70 pct done, xact id 1551886
NOTICE:  80 pct done, xact id 1551887
NOTICE:  90 pct done, xact id 1551888
NOTICE:  100 pct done, xact id 1551890
DO

What's going on?

  1. Obviously we're looping
  2. Our batch sizes are 100
  3. Output is provided showing progress and transaction id
  4. Transaction is COMMIT'd
To wrap up the illustration, here is a list of timestamps for records created:

SELECT created,count(*) FROM output GROUP BY 1;
          created           | count
----------------------------+-------
 2019-04-23 13:07:41.476109 |   100
 2019-04-23 13:07:41.25541  |   100
 2019-04-23 13:07:41.30827  |   100
 2019-04-23 13:07:41.49809  |   100
 2019-04-23 13:07:41.392188 |   100
 2019-04-23 13:07:41.521092 |   100
 2019-04-23 13:07:41.447075 |   100
 2019-04-23 13:07:41.567168 |   100
 2019-04-23 13:07:41.542118 |   100
 2019-04-23 13:07:41.277067 |   100
(10 rows)

It's quite that simple.  


Caveats

In addition to those mentioned above, two things are worth mentioning.

First, a child stored procedures called by a parent stored procedure or anonymous code block may COMMIT or ROLLBACK as long as CALL is used.  Okay, that's not too hard to remember.

Second, a child stored procedure called by a FUNCTION may not perform transaction controls.


I hope you learned something here.  I certainly did.

No comments: