Gj’s SQL blog

December 2, 2010

are window functions always better ?

Filed under: Uncategorized — Grzegorz Jaskiewicz @ 9:42 pm

Friend of mine asked me a very simple question. He needs to get a top row for every group of rows from a table.

Here’s the table:

\d+ "table"
                           Table "public.table"
 Column |            Type             | Modifiers | Storage | Description 
--------+-----------------------------+-----------+---------+-------------
 id     | integer                     |           | plain   | 
 value  | integer                     |           | plain   | 
 ts     | timestamp without time zone |           | plain   | 
Indexes:
    "foobar_xz" btree (id)
    "foobar_yz_ts" btree (ts)
Has OIDs: no

Lets stuff it with loads of data:

insert into "table"(id, value, ts) select a/10, random()*666, now() + (((random()*1345324532)-5234523)::int::text ||' seconds ')::interval from generate_series(1,10000000) a;
(vacuumed etc)
postgres=# \dt+ "table"
                    List of relations
 Schema | Name  | Type  |  Owner   |  Size  | Description 
--------+-------+-------+----------+--------+-------------
 public | table | table | postgres | 422 MB | 
(1 row)


Now lets run two queries, first one using window functions, second one using a simple sub-select.


postgres=# explain analyze SELECT id, value, ts FROM (select id, value, ts, rank() over (partition by id order by ts desc) from "table") a WHERE a.rank=1;
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on a  (cost=1829434.33..2154434.33 rows=50000 width=16) (actual time=26780.491..59825.700 rows=1000001 loops=1)
   Filter: (a.rank = 1)
   ->  WindowAgg  (cost=1829434.33..2029434.33 rows=10000000 width=16) (actual time=26780.467..56652.891 rows=10000000 loops=1)
         ->  Sort  (cost=1829434.33..1854434.33 rows=10000000 width=16) (actual time=26780.449..37195.121 rows=10000000 loops=1)
               Sort Key: "table".id, "table".ts
               Sort Method:  external merge  Disk: 254032kB
               ->  Seq Scan on "table"  (cost=0.00..154055.00 rows=10000000 width=16) (actual time=0.005..3312.756 rows=10000000 loops=1)
 Total runtime: 59988.587 ms
(8 rows)

Time: 60067.760 ms
postgres=# explain analyze select id,value,ts from "table" where (id,ts) in (select id, max(ts) from "table" group by id);
                                                                          QUERY PLAN                                                                           
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=365895.10..680131.10 rows=1 width=16) (actual time=17783.725..45548.519 rows=1000001 loops=1)
   Hash Cond: ((public."table".id = public."table".id) AND (public."table".ts = (max(public."table".ts))))
   ->  Seq Scan on "table"  (cost=0.00..154055.00 rows=10000000 width=16) (actual time=0.020..6671.658 rows=10000000 loops=1)
   ->  Hash  (cost=354760.51..354760.51 rows=574106 width=12) (actual time=17780.585..17780.585 rows=1000001 loops=1)
         Buckets: 4096  Batches: 32  Memory Usage: 988kB
         ->  GroupAggregate  (cost=0.00..349019.45 rows=574106 width=12) (actual time=81.539..15958.085 rows=1000001 loops=1)
               ->  Index Scan using foobar_xz on "table"  (cost=0.00..291843.13 rows=10000000 width=12) (actual time=81.493..8288.009 rows=10000000 loops=1)
 Total runtime: 45706.028 ms
(8 rows)

Time: 45749.765 ms


Now even I was surprised to see, that the simple query was much quicker than the window function.
Please, leave comments as to what could have I done better on the window function front.
Of course, the question would be much different if he wanted - say 3 top results. 
This of course isn't suppose to be window function advocacy post, but rather post that advocates healthy choice and please do make sure that you
at least test 2 types of approach to solve your problem before committing to anything. 

And fly safe ! :)   (ooops, wrong blog ;) )

 

November 27, 2010

limited array_agg output on 8.3

Filed under: Uncategorized — Grzegorz Jaskiewicz @ 4:30 pm

Someone asked about it on postgresql-general mailing list. Here’s my way of doing it.

 

We need the limit_array_append() function. So here it is:

CREATE FUNCTION limit_array_append(anyarray, anyelement, integer) RETURNS anyarray AS
$_$
   SELECT (array_append($1, $2))[1:$3];
$_$ LANGUAGE sql IMMUTABLE;

The array_agg() function is obviously predefined on 8.3.

Than we need to define the aggregate:

CREATE AGGREGATE limit_array_agg(anyelement, integer)
(
   SFUNC = limit_array_append,
   STYPE = anyarray,
   INITCOND = '{}'
);

 

The result is beautiful limited array_agg. Same thing can be obviously done much better, both code wise, and performance wise on 8.4+ with window functions.

To test just do:

select limit_array_agg((random()*626)::integer, 3) FROM generate_series(1, 20) i GROUP BY (i/7)::integer; 
limit_array_agg 
----------------- 
{546,360,139} 
{411,287,339} 
{359,288,95}

(3 rows)

Easy, ain’t ?


April 19, 2009

how to speed up index on bytea, text, etc.

Filed under: index, postgresql — Grzegorz Jaskiewicz @ 5:43 pm

Well,

I recently had to create db to store some binary information. Now, these entries were to be unique in their category.

Average size of bytea field was to be 2400 bytes (about 2kb). Everything was fine, until table filled with 1.3M unique entries, and I was really astound how slow insertion become.

(more…)

on using different types in joins, and performance of it

Filed under: index, joins, postgresql — Grzegorz Jaskiewicz @ 4:44 pm

I get this question quite often from folks I work with, or people I join in their projects, usually with database designed by someone who is not very experienced in database world.

Most of them, don’t understand why the hell I am pressing for using bigint/int as a key to a table.

Here’s simple example, and my own theory behind it.

(more…)

The power of Views

Filed under: postgresql, views — Grzegorz Jaskiewicz @ 11:52 am

So, views.

What are they useful for?  how can we utilize such a power of views ?

(more…)

Unions and order

Filed under: postgresql, unions — Grzegorz Jaskiewicz @ 2:45 am

Quick note, on something that might save you few minutes.

It’s about UNION [ALL]. And why you should never assume order of output to be predetermined by order of tables in query.

(more…)

Welcome!

Filed under: postgresql — Grzegorz Jaskiewicz @ 2:20 am

Welcome!

I decided to document at least some attempts with SQL here.

I will mainly talk about PostgreSQL, because that’s what I use.

(more…)

The Silver is the New Black Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.