Gj’s SQL blog

December 2, 2010

Are window functions always better?

Filed under: Uncategorized — Grzegorz Jaskiewicz @ 9:42 pm

A friend of mine asked me a very simple question: he needs to get the top row for every group of rows in a table.

Here’s the table:

\d+ "table"
                           Table "public.table"
 Column |            Type             | Modifiers | Storage | Description 
--------+-----------------------------+-----------+---------+-------------
 id     | integer                     |           | plain   | 
 value  | integer                     |           | plain   | 
 ts     | timestamp without time zone |           | plain   | 
Indexes:
    "foobar_xz" btree (id)
    "foobar_yz_ts" btree (ts)
Has OIDs: no

Let's stuff it with loads of data:

insert into "table"(id, value, ts) select a/10, random()*666, now() + (((random()*1345324532)-5234523)::int::text ||' seconds ')::interval from generate_series(1,10000000) a;
(vacuumed, etc.)
postgres=# \dt+ "table"
                    List of relations
 Schema | Name  | Type  |  Owner   |  Size  | Description 
--------+-------+-------+----------+--------+-------------
 public | table | table | postgres | 422 MB | 
(1 row)


Now let's run two queries: the first using a window function, the second using a simple sub-select.


postgres=# explain analyze SELECT id, value, ts FROM (select id, value, ts, rank() over (partition by id order by ts desc) from "table") a WHERE a.rank=1;
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on a  (cost=1829434.33..2154434.33 rows=50000 width=16) (actual time=26780.491..59825.700 rows=1000001 loops=1)
   Filter: (a.rank = 1)
   ->  WindowAgg  (cost=1829434.33..2029434.33 rows=10000000 width=16) (actual time=26780.467..56652.891 rows=10000000 loops=1)
         ->  Sort  (cost=1829434.33..1854434.33 rows=10000000 width=16) (actual time=26780.449..37195.121 rows=10000000 loops=1)
               Sort Key: "table".id, "table".ts
               Sort Method:  external merge  Disk: 254032kB
               ->  Seq Scan on "table"  (cost=0.00..154055.00 rows=10000000 width=16) (actual time=0.005..3312.756 rows=10000000 loops=1)
 Total runtime: 59988.587 ms
(8 rows)

Time: 60067.760 ms
postgres=# explain analyze select id,value,ts from "table" where (id,ts) in (select id, max(ts) from "table" group by id);
                                                                          QUERY PLAN                                                                           
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=365895.10..680131.10 rows=1 width=16) (actual time=17783.725..45548.519 rows=1000001 loops=1)
   Hash Cond: ((public."table".id = public."table".id) AND (public."table".ts = (max(public."table".ts))))
   ->  Seq Scan on "table"  (cost=0.00..154055.00 rows=10000000 width=16) (actual time=0.020..6671.658 rows=10000000 loops=1)
   ->  Hash  (cost=354760.51..354760.51 rows=574106 width=12) (actual time=17780.585..17780.585 rows=1000001 loops=1)
         Buckets: 4096  Batches: 32  Memory Usage: 988kB
         ->  GroupAggregate  (cost=0.00..349019.45 rows=574106 width=12) (actual time=81.539..15958.085 rows=1000001 loops=1)
               ->  Index Scan using foobar_xz on "table"  (cost=0.00..291843.13 rows=10000000 width=12) (actual time=81.493..8288.009 rows=10000000 loops=1)
 Total runtime: 45706.028 ms
(8 rows)

Time: 45749.765 ms


Even I was surprised to see that the simple query was much quicker than the window function.
Please leave comments on what I could have done better on the window function front.
Of course, the question would be quite different if he wanted, say, the top 3 results (a quick sketch of that variant follows below).
This isn't supposed to be a window function advocacy post, but rather one that advocates a healthy choice: please make sure you test at least two approaches to your problem before committing to anything.
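
For what it's worth, a rough, untested sketch of the top-3 variant is just a different filter on the same window query (note that rank() can return more than 3 rows per id when ts ties; row_number() would cap it at exactly 3):

SELECT id, value, ts FROM (select id, value, ts, rank() over (partition by id order by ts desc) from "table") a WHERE a.rank <= 3;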

And fly safe! 🙂 (oops, wrong blog 😉)

 


14 Comments »

  1. Notice that the windowing function version doesn’t use indexes but rather performs a full table scan and then an on-disk sort, which the other one avoids. From what I’ve seen, I’ve gotten the impression that windowing functions can’t utilize indexes in as many cases as they should, though I can’t back that up, as I’ve never investigated it when I’ve come across it. In either case, I’m careful with their use when I’m processing large amounts of data.

    Comment by Thor Michael Støre — December 3, 2010 @ 12:13 am

    • Makes me regret that I can’t do the window function bit in the WHERE clause. That would probably speed things up too.

      Comment by Grzegorz Jaskiewicz — December 3, 2010 @ 8:27 am

  2. GJ,
    Which OS are you running on, and which version of PostgreSQL?

    On my Windows 7 box running PostgreSQL 9.0 32-bit the Window function example is much faster — 21,356 ms

    The other is: 132,878 ms

    I suppose we aren’t really comparing exactly the same query, since the data is random, so it could be sensitive to data distribution:

    Here is my explain output:
    window version:
    Subquery Scan on a  (cost=2146685.83..2471685.83 rows=50000 width=20) (actual time=9829.461..20614.753 rows=1000001 loops=1)
      Filter: (a.rank = 1)
      ->  WindowAgg  (cost=2146685.83..2346685.83 rows=10000000 width=20) (actual time=9829.455..19368.341 rows=10000000 loops=1)
            ->  Sort  (cost=2146685.83..2171685.83 rows=10000000 width=20) (actual time=9829.444..11579.759 rows=10000000 loops=1)
                  Sort Key: "table".id, "table".ts
                  Sort Method:  external sort  Disk: 332200kB
                  ->  Seq Scan on "table"  (cost=0.00..163695.00 rows=10000000 width=20) (actual time=0.075..1099.927 rows=10000000 loops=1)
    Total runtime: 20673.263 ms

    -- the simple version:
    Hash Semi Join  (cost=393865.36..737108.36 rows=1 width=20) (actual time=10686.814..132847.442 rows=1000001 loops=1)
      Hash Cond: ((public."table".id = public."table".id) AND (public."table".ts = (max(public."table".ts))))
      ->  Seq Scan on "table"  (cost=0.00..163695.00 rows=10000000 width=20) (actual time=0.025..1571.613 rows=10000000 loops=1)
      ->  Hash  (cost=384258.15..384258.15 rows=483147 width=12) (actual time=10568.631..10568.631 rows=1000001 loops=1)
            Buckets: 4096  Batches: 64 (originally 32)  Memory Usage: 1025kB
            ->  GroupAggregate  (cost=0.00..379426.68 rows=483147 width=12) (actual time=0.783..4810.574 rows=1000001 loops=1)
                  ->  Index Scan using idx_id on "table"  (cost=0.00..323387.35 rows=10000000 width=12) (actual time=0.318..2262.283 rows=10000000 loops=1)
    Total runtime: 132878.601 ms

    Comment by Regina — December 3, 2010 @ 12:54 am

    • >> Memory Usage: 1025kB
      Your work_mem has to be larger; try 40MB.
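
      For instance, as a per-session setting (the exact value is just this comment's suggestion):

      SET work_mem = '40MB';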

      Comment by Frank Heikens — December 3, 2010 @ 1:50 pm

      • 1025kB is 1MB + 1kB 🙂 I’ve set it to 650MB, and only then does the window function version start to sort in memory rather than on disk, but it uses a lot more of it.

        Comment by Grzegorz Jaskiewicz — December 3, 2010 @ 3:15 pm

  3. The difference seems to be the seq scan for the window query versus the index scan for the group-by at the bottom. You may want to create an index on (id, ts).

    Comment by Hitoshi Harada — December 3, 2010 @ 12:58 am

    • For the record, this is git trunk, and it’s been tested on my laptop (Mac OS X).
      Dropping the existing index and creating a new one on (id, ts) doesn’t help at all. In fact, it makes the ‘simple’ way slower.

      I’ll update PostgreSQL and try it again, this time on better hardware (and on Linux) with the PostgreSQL trunk.

      Comment by Grzegorz Jaskiewicz — December 3, 2010 @ 8:26 am

      • Try this index:
        create index id_ts_desc on "table" (id, ts desc);

        This is an index that can be used by the windowing function, see (partition by id order by ts desc).

        Comment by Frank Heikens — December 3, 2010 @ 10:56 am

    • I reproduced Grzegorz’s findings and then tried that before making my previous reply, but it didn’t use that index. This was on 8.4, though.

      Comment by Thor Michael Støre — December 3, 2010 @ 9:35 am

  4. With the desc index, much better.
    I still wish I could use a window function in WHERE; that would probably make it a bit faster too, rather than having to use a subselect.

    explain analyze SELECT id, value, ts FROM (select id, value, ts, rank() over (partition by id order by ts desc) from "table") a WHERE a.rank=1;
    QUERY PLAN
    ---------------------------------------------------------------------------------------------------------------------------------------------
    Subquery Scan on a  (cost=0.00..706717.44 rows=50000 width=16) (actual time=21.832..47278.751 rows=1000001 loops=1)
      Filter: (a.rank = 1)
      ->  WindowAgg  (cost=0.00..581717.44 rows=10000000 width=16) (actual time=21.825..38571.649 rows=10000000 loops=1)
            ->  Index Scan using foobar on "table"  (cost=0.00..406717.44 rows=10000000 width=16) (actual time=21.809..13177.839 rows=10000000 loops=1)
    Total runtime: 48002.062 ms
    (5 rows)

    Time: 48086.769 ms
    # explain analyze select id,value,ts from "table" where (id,ts) in (select id, max(ts) from "table" group by id);
    QUERY PLAN
    ---------------------------------------------------------------------------------------------------------------------------------------------
    Hash Semi Join  (cost=224764.04..448819.04 rows=1 width=16) (actual time=22265.164..56116.035 rows=1000001 loops=1)
      Hash Cond: ((public."table".id = public."table".id) AND (public."table".ts = (max(public."table".ts))))
      ->  Seq Scan on "table"  (cost=0.00..154055.00 rows=10000000 width=16) (actual time=0.034..8500.918 rows=10000000 loops=1)
      ->  Hash  (cost=216480.42..216480.42 rows=552241 width=12) (actual time=22264.094..22264.094 rows=1000001 loops=1)
            Buckets: 65536  Batches: 1  Memory Usage: 31251kB
            ->  HashAggregate  (cost=204055.00..210958.01 rows=552241 width=12) (actual time=20016.248..21228.785 rows=1000001 loops=1)
                  ->  Seq Scan on "table"  (cost=0.00..154055.00 rows=10000000 width=12) (actual time=0.012..8320.798 rows=10000000 loops=1)
    Total runtime: 56850.245 ms
    (8 rows)

    Time: 57098.760 ms

    Comment by Gregg Jaskiewicz — December 3, 2010 @ 11:16 am

  5. On my Ubuntu Linux 9.10 box,
    running PostgreSQL 8.4.5 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu 4.4.1-4ubuntu9) 4.4.1, 64-bit,
    Linux kernel 2.6.31-22-server on x86_64, Intel(R) Core(TM)2 Duo CPU E6550 @ 2.33GHz, 2 cores:
    The window function:
    "Total runtime: 37762.920 ms" (first run)
    "Total runtime: 35551.299 ms" (second run)
    The simple sub-select:
    "Total runtime: 32353.508 ms" (first run)
    "Total runtime: 30902.788 ms" (second run)

    Comment by Hugo Rafael Lesme Marquez — December 3, 2010 @ 5:29 pm

  6. Actually, the fastest way to do this query, if you have indexes, is a loose index scan. But those are not very pretty in PostgreSQL, since you have to use a recursive CTE to simulate one.

    http://wiki.postgresql.org/wiki/Loose_indexscan
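
    A rough, untested sketch of what that could look like for this table (the recursive CTE walks the distinct ids, and the outer query then picks each id's max(ts) row; the table and column names are just the ones from the post):

    WITH RECURSIVE ids AS (
        SELECT min(id) AS id FROM "table"
        UNION ALL
        SELECT (SELECT min(t.id) FROM "table" t WHERE t.id > ids.id)
        FROM ids
        WHERE ids.id IS NOT NULL
    )
    SELECT t.id, t.value, t.ts
    FROM "table" t
    JOIN (SELECT id, (SELECT max(ts) FROM "table" x WHERE x.id = ids.id) AS max_ts
          FROM ids
          WHERE id IS NOT NULL) m
      ON t.id = m.id AND t.ts = m.max_ts;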

    Comment by Andreas — December 7, 2010 @ 7:26 pm

  7. The second query gives different results if (id, ts) = (id, max(ts)) is not unique, so it needs additional constraints.

    And yet another approach, which may be faster:


    -- Create an aggregate that returns the first item, non-strict (NULL ok)
    CREATE OR REPLACE FUNCTION first ( anyelement, anyelement )
    RETURNS anyelement AS $$
    SELECT $1;
    $$ LANGUAGE SQL IMMUTABLE;

    CREATE AGGREGATE first (
    sfunc = first,
    basetype = anyelement,
    stype = anyelement
    );

    create index table_id_ts on "table"(id, ts);
    explain analyze select id, first(value) as value, first(ts) as ts from (SELECT * from "table" order by id, ts) as sub group by id;
    explain analyze select (sub.r :: "table").* from (select first("table" ORDER BY ts) as r from "table" group by id) as sub;
    explain analyze select id, first(value ORDER BY ts) as val, first(ts ORDER BY ts) as ts from "table" group by id;

    The first select can use the combined index; the 2nd and 3rd currently cannot, it seems.
    The 2nd and 3rd queries use PostgreSQL 9.0 syntax.

    Comment by Konstantin — December 10, 2010 @ 11:40 am

    • See my previous post Konstantin 😉 I pretty much showed that there.

      Comment by Grzegorz Jaskiewicz — December 10, 2010 @ 11:41 am

