For specific inquiries please contact SATS RDBA Services or call 650 889-5533.
Sometimes we would like to allow frequent access to a big and active tables. However we quite quickly
realize those big joins along with many active connections do not perform as we would expect.
For the live sample - lets take two large tables - orders and customers.
Table "public.customers"
Column | Type | Modifiers
----------------------+-----------------------+---------------------------------------------------------------
customerid | integer | not null default nextval('customers_customerid_seq'::regclass)
firstname | character varying(50) | not null
lastname | character varying(50) | not null
address1 | character varying(50) | not null
address2 | character varying(50) |
city | character varying(50) | not null
state | character varying(50) |
zip | integer |
country | character varying(50) | not null
region | smallint | not null
email | character varying(50) |
phone | character varying(50) |
creditcardtype | integer | not null
creditcard | character varying(50) | not null
creditcardexpiration | character varying(50) | not null
username | character varying(50) | not null
password | character varying(50) | not null
age | smallint |
income | integer |
gender | character varying(1) |
Indexes: "customers_pkey" PRIMARY KEY, btree (customerid)
Referenced by: TABLE "cust_hist" CONSTRAINT "fk_cust_hist_customerid" FOREIGN KEY (customerid) REFERENCES customers(customerid) ON DELETE CASCADE
TABLE "orders" CONSTRAINT "fk_customerid" FOREIGN KEY (customerid) REFERENCES customers(customerid) ON DELETE SET NULL
Table "public.orders"
Column | Type | Modifiers
-------------+---------------+----------------------------------------------------------
orderid | integer | not null default nextval('orders_orderid_seq'::regclass)
orderdate | date | not null
customerid | integer |
netamount | numeric(12,2) | not null
tax | numeric(12,2) | not null
totalamount | numeric(12,2) | not null
Indexes:
"orders_pkey" PRIMARY KEY, btree (orderid)
"ix_order" btree (orderdate)
"ix_order_custid" btree (customerid)
Foreign-key constraints:
"fk_customerid" FOREIGN KEY (customerid) REFERENCES customers(customerid) ON DELETE SET NULL
Referenced by: TABLE "orderlines" CONSTRAINT "fk_orderid" FOREIGN KEY (orderid) REFERENCES orders(orderid) ON DELETE CASCADE
Lets there be lots of rows in both tables: select count(*) from customers; -> 5120000
and select count(*) from orders; 12288000 rows.
Management needs to see a list of "heavy shoppers" for last 7 days displayed quite often. But select below
which provides data wanted, runs long time:
SELECT o.customerid, c.firstname, c.lastname, sum(o.totalamount) SUM
FROM orders o, customers c
WHERE c.customerid=o.customerid and o.orderdate > now() - interval '7 day'
GROUP BY o.customerid, c.customerid ORDER BY SUM DESC LIMIT 100;
Problem with this select that it runs 6 seconds. Total runtime: 5602.532 ms (18 rows)
What can we do? indexing and partitioning will only get you so far, what we can do is to "cache" a view to fit the select
statement (frequently used and natively available in ORACLE) - called Materialized Views (or matviews)
While no native matview support exists in PostgreSQL as of version 9.4, it is relatively easy to create a pglsql
logic to control the situation. The logic below uses matviews and refreshes them into tables as it goes.
Create the mat view table (data dictionary) this will include the original
view for each mat view and the last refresh date and duration as well.
CREATE TABLE matviews (
mv NAME NOT NULL PRIMARY KEY
, view NAME NOT NULL
, last_refresh TIMESTAMP WITH TIME ZONE
, refresh_time INTEGER
);
CREATE OR REPLACE FUNCTION create_matview(NAME, NAME)
RETURNS VOID
SECURITY DEFINER
LANGUAGE plpgsql AS '
DECLARE
matview ALIAS FOR $1;
view_name ALIAS FOR $2;
entry matviews%ROWTYPE;
BEGIN
SELECT * INTO entry FROM matviews WHERE mv = matview;
IF FOUND THEN
RAISE EXCEPTION ''Materialized view ''''%'''' already exists.'',
matview;
END IF;
EXECUTE ''REVOKE ALL ON '' || view_name || '' FROM PUBLIC'';
EXECUTE ''GRANT SELECT ON '' || view_name || '' TO PUBLIC'';
EXECUTE ''CREATE TABLE '' || matview || '' AS SELECT * FROM '' ||view_name;
EXECUTE ''REVOKE ALL ON '' || matview || '' FROM PUBLIC'';
EXECUTE ''GRANT SELECT ON '' || matview || '' TO PUBLIC'';
INSERT INTO matviews (mv, view, last_refresh,refresh_time)
VALUES (matview, view_name, CURRENT_TIMESTAMP,0);
RETURN;
END
';
CREATE OR REPLACE FUNCTION drop_matview(NAME) RETURNS VOID
SECURITY DEFINER
LANGUAGE plpgsql AS '
DECLARE
matview ALIAS FOR $1;
entry matviews%ROWTYPE;
BEGIN
SELECT * INTO entry FROM matviews WHERE mv = matview;
IF NOT FOUND THEN
RAISE EXCEPTION ''Materialized view % does not exist.'', matview;
END IF;
EXECUTE ''DROP TABLE '' || matview;
DELETE FROM matviews WHERE mv=matview;
RETURN;
END
';
CREATE OR REPLACE FUNCTION refresh_matview(name) RETURNS VOID
SECURITY DEFINER
LANGUAGE plpgsql AS '
DECLARE
matview ALIAS FOR $1;
entry matviews%ROWTYPE;
start_time TIMESTAMP WITH TIME ZONE;
BEGIN
start_time:= timeofday()::timestamp;
RAISE NOTICE ''Matview Refresh Start Time is %s'',start_time;
SELECT * INTO entry FROM matviews WHERE mv = matview;
IF NOT FOUND THEN
RAISE EXCEPTION ''Materialized view % does not exist.'', matview;
END IF;
EXECUTE ''DELETE FROM '' || matview;
EXECUTE ''INSERT INTO '' || matview
|| '' SELECT * FROM '' || entry.view;
RAISE NOTICE ''Matview Refresh End Time is %s (total %seconds)'',
timeofday()::timestamp,round(extract(epoch from(timeofday()::timestamp-start_time)));
UPDATE matviews
SET last_refresh= timeofday()::timestamp,
refresh_time= round(extract(epoch from ( timeofday()::timestamp-start_time)))
WHERE mv=matview;
RETURN;
END
';
To create the matview we will use the create_matview function.
bar=# SELECT create_matview('mv_top_customer', 'v_top_customer');
create_matview
----------------
(1 row)
Time: 5559.486 ms
After that the mv_top_customer matview is created and could be used. You can even create indexes
on the matview table as needed (though it is suggested to drop them before refresh in some cases).
Now select from the created matview will run in a fraction of the time: (vs original select
from source tables)
bar=# explain analyze select * from mv_top_customer;
Seq Scan on mv_top_customer (cost=0.00..12.70 rows=270 width=272) (actual time=0.008..0.020 rows=100 loops=1)
Total runtime: 0.044 ms
(2 rows)
Time: 0.397 ms
bar=# select refresh_matview('mv_top_customer');
NOTICE: Matview Refresh Start Time is 2013-10-21 23:05:03.297062+03s
NOTICE: Matview Refresh End Time is 2013-10-21 23:05:08.675053s (total 5 seconds)
refresh_matview
-----------------
(1 row)
Time: 5398.158 ms
Note that refresh still runs the original 6 seconds but we don't care as it is "behind the scenes"
Last step is to create a simple crontab
script to refresh the matview as needed, something along the lines of following will work:
postgres@test-01:~$ crontab -l
0 * * * * echo "select refresh_matview('mv_top_customer')" | psql -d bar > /tmp/refresh.log 2>&1
The matviews table will let you know once how long the last refresh took and when the rest refresh occurred.
bar=# select * from matviews;
mv | view | last_refresh | refresh_time
-----------------+----------------+-------------------------------+--------------
mv_top_customer | v_top_customer | 2013-10-21 23:05:08.675362+03 | 5
If needed, matview could be dropped using drop_matview('mv_top_customer')
You can change the origin view definition (v_top_customer in this case) and this will be updated in the next matview.
Just note that every change in the view columns or table defenitons into the view will break the
matview and it will have to be dropped and recreated
Questions/Comments? - feel free to contact Cerberus (which is me) on
RDBA.net forums under Oracle DBA corner.