We've all been there, right? Large statement, insert / update / delete / select, taking for... ev... er... We understand it is hard to predict when these things will complete but wouldn't it be neat if there was a way to provide a progress bar?
Obviously, there is a solution. :)
Let's not get ahead of ourselves. The solution I am presenting will not include parallelism. Could it be done? Certainly with additional scripting external to the database however the problem I am wanting to address is simply providing feedback for a single query. Keep that objective in mind.
Enter PostgreSQL Version 11 and Transaction Management
Available in PostgreSQL version 11 is the ability to COMMIT and ROLLBACK within a procedure or anonymous code block (DO). There are limitations to this control including:- CALL or DO only (sorry FUNCTION)
- Loops on CURSOR
- UPDATE RETURNING
If your statement can be batched, we can break things up in such a way as to provide the ever elusive SQL progress bar.
What is postgres_fdw doing here?
I'm glad this question occurred to you to ask. PostgreSQL is an ACID compliant MVCC RDBMS. A single query executes in one transaction and no process outside of that transaction can see what it is doing or, more to the point, what has been done.What I thought postgres_fdw brought to the table is the ability for a single controlling function to execute DML on a foreign table and thus data made available to other sessions. That was my thought when I started down this post but it is simply not the case. When DML on a foreign table is being performed, the remote session is also in a transaction and not until the local session is COMMIT'd is the remote session also COMMIT'd.
Well, buggers. At least I tried and learned a little.
The Basics
We need something to operate on:CREATE TABLE output(
id serial primary key,
created timestamp default now(),
uu_id uuid not null
);
Demo
About time I got to it. Now, let's say we have that huge statement that takes forever.INSERT INTO batch_output (id, created, uu_id)
SELECT nextval('output_id_seq'), now(), uuid_generate_v1()
FROM generate_series(1, 1000, 1);
DO $$
DECLARE
i int;
BEGIN
FOR i IN 1..10
LOOP
INSERT INTO output (id, created, uu_id)
SELECT nextval('output_id_seq'), now(), uuid_generate_v1()
FROM generate_series(1, 100, 1);
RAISE NOTICE '% pct done, xact id %', i*10, txid_current();
END LOOP;
END $$;
NOTICE: 10 pct done, xact id 1551874
NOTICE: 20 pct done, xact id 1551874
NOTICE: 30 pct done, xact id 1551874
NOTICE: 40 pct done, xact id 1551884
NOTICE: 50 pct done, xact id 1551884
NOTICE: 60 pct done, xact id 1551884
NOTICE: 70 pct done, xact id 1551884
NOTICE: 80 pct done, xact id 1551884
NOTICE: 90 pct done, xact id 1551884
NOTICE: 100 pct done, xact id 1551884
DO
No. Doesn't meet the objective where other processes want access to the data as everything is in a single transaction. Note xact id 1551884 at every notice. We need to COMMIT in the loop.
DO $$
DECLARE
i int;
BEGIN
FOR i IN 1..10
LOOP
INSERT INTO batch_output (id, created, uu_id)
SELECT nextval('output_id_seq'), now(), uuid_generate_v1()
FROM generate_series(1, 100, 1);
RAISE NOTICE '% pct done, xact id %', i*10, txid_current();
COMMIT;
END LOOP;
END $$;
NOTICE: 10 pct done, xact id 1551876
NOTICE: 20 pct done, xact id 1551877
NOTICE: 30 pct done, xact id 1551878
NOTICE: 40 pct done, xact id 1551880
NOTICE: 50 pct done, xact id 1551884
NOTICE: 60 pct done, xact id 1551885
NOTICE: 70 pct done, xact id 1551886
NOTICE: 80 pct done, xact id 1551887
NOTICE: 90 pct done, xact id 1551888
NOTICE: 100 pct done, xact id 1551890
DO
What's going on?
- Obviously we're looping
- Our batch sizes are 100
- Output is provided showing progress and transaction id
- Transaction is COMMIT'd
To wrap up the illustration, here is a list of timestamps for records created:
SELECT created,count(*) FROM output GROUP BY 1;
created | count
----------------------------+-------
2019-04-23 13:07:41.476109 | 100
2019-04-23 13:07:41.25541 | 100
2019-04-23 13:07:41.30827 | 100
2019-04-23 13:07:41.49809 | 100
2019-04-23 13:07:41.392188 | 100
2019-04-23 13:07:41.521092 | 100
2019-04-23 13:07:41.447075 | 100
2019-04-23 13:07:41.567168 | 100
2019-04-23 13:07:41.542118 | 100
2019-04-23 13:07:41.277067 | 100
(10 rows)
It's quite that simple.
Caveats
In addition to those mentioned above, two things are worth mentioning.First, a child stored procedures called by a parent stored procedure or anonymous code block may COMMIT or ROLLBACK as long as CALL is used. Okay, that's not too hard to remember.
Second, a child stored procedure called by a FUNCTION may not perform transaction controls.
I hope you learned something here. I certainly did.

No comments:
Post a Comment