I have a table in a PostgreSQL database called feeds_up. It looks like this:
| feed_url | isup | hasproblems | observed (timestamp with time zone) | id (pk) |
|----------|------|-------------|-------------------------------|--------|
| http://b.| t | f | 2013-02-27 16:34:46.327401+11 | 15235 |
| http://f.| f | t | 2013-02-27 16:31:25.415126+11 | 15236 |
It has roughly 300k rows and grows by about 20 rows every five minutes. I have a query that runs very often (on every page load):
select distinct on (feed_url) feed_url, isUp, hasProblems
from feeds_up
where observed <= '2013-02-27T05:38:00.000Z'
order by feed_url, observed desc;
The timestamp here is just an example; in practice it is a parameter. The EXPLAIN ANALYZE output is at explain.depesz.com. The query takes about 8 seconds. Crazy!
There are only about 20 distinct values of feed_url, so this seems really inefficient. I thought I'd be silly and try a FOR loop in a function:
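For anyone who wants to reproduce the plan, something like the following should print it together with actual timings. This is a minimal sketch: the timestamp literal is the same example value used above, and the BUFFERS option is an addition that is not part of the original query.

```sql
-- Show the actual execution plan, timings and buffer usage for the slow query.
-- The timestamp is only the example value; the real query is parameterized.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ON (feed_url) feed_url, isup, hasproblems
FROM feeds_up
WHERE observed <= '2013-02-27T05:38:00.000Z'
ORDER BY feed_url, observed DESC;
```

Note that ANALYZE actually executes the statement, so this takes as long as the query itself.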
CREATE OR REPLACE FUNCTION feedStatusAtDate(theTime timestamp with time zone) RETURNS SETOF feeds_up AS
$BODY$
DECLARE
url feeds_list%rowtype;
BEGIN
FOR url IN SELECT * FROM feeds_list
LOOP
RETURN QUERY SELECT * FROM feeds_up
WHERE observed <= theTime
AND feed_url = url.feed_url
ORDER BY observed DESC LIMIT 1;
END LOOP;
END;
$BODY$ language plpgsql;
select * from feedStatusAtDate('2013-02-27T05:38:00.000Z');
It only takes 307 ms!
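For comparison, the same per-feed lookup can probably be expressed in plain SQL as a correlated subquery. The following is an unbenchmarked sketch, not a confirmed fix: it assumes feeds_list holds one row per feed_url (as the function above implies) and that feeds_up.id is the primary key. The aliases l, u and u2 are made up for illustration.

```sql
-- Sketch only: for each feed in feeds_list, pick the id of the newest
-- feeds_up row at or before the given time, then join back to that row.
-- Assumes one row per feed_url in feeds_list and feeds_up.id as primary key.
SELECT u.feed_url, u.isup, u.hasproblems
FROM feeds_list AS l
JOIN feeds_up AS u
  ON u.id = (
       SELECT u2.id
       FROM feeds_up AS u2
       WHERE u2.feed_url = l.feed_url
         AND u2.observed <= '2013-02-27T05:38:00.000Z'
       ORDER BY u2.observed DESC
       LIMIT 1
     );
```

The idea is the same as the loop: about 20 small lookups, each of which may be able to use an index on (feed_url, observed DESC), instead of one pass over the whole table. Feeds with no row at or before the given time simply drop out of the result, as they do in the function.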
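Why is the FOR loop so much faster than the plain SQL query? Is there a way to write the SQL so that it is just as fast, or should I stick with the FOR loop?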
ETA
Postgres: PostgreSQL 9.1.5 i686-pc-linux-gnu, gcc (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973], 32-
Indexes on feeds_up:
CREATE INDEX feeds_up_url
ON feeds_up
USING btree
(feed_url COLLATE pg_catalog."default");
CREATE INDEX feeds_up_url_observed
ON feeds_up
USING btree
(feed_url COLLATE pg_catalog."default", observed DESC);
CREATE INDEX feeds_up_observed
ON public.feeds_up
USING btree
(observed DESC);