PostgreSQL的window函数整理

php中文网
发布: 2016-06-07 17:59:15
原创
1139人浏览过

简言之,聚合函数返回的各个分组的结果,窗口函数则为每一行返回结果,示例如下: 1.建示例表,初始化数据 DROP TABLE IF EXISTS empsalary; CREATE TABLE empsalary( depname varchar, empno bigint, salary int, enroll_date date ); INSERT INTO empsalary V

简言之,聚合函数返回的各个分组的结果,窗口函数则为每一行返回结果,示例如下:
1.建示例表,初始化数据

DROP TABLE IF EXISTS empsalary;
CREATE TABLE empsalary(
  depname varchar,
  empno bigint,
  salary int,
  enroll_date date
);
INSERT INTO empsalary VALUES('develop',10, 5200, '2007/08/01');
INSERT INTO empsalary VALUES('sales', 1, 5000, '2006/10/01');
INSERT INTO empsalary VALUES('personnel', 5, 3500, '2007/12/10');
INSERT INTO empsalary VALUES('sales', 4, 4800, '2007/08/08');
INSERT INTO empsalary VALUES('sales', 6, 5500, '2007/01/02');
INSERT INTO empsalary VALUES('personnel', 2, 3900, '2006/12/23');
INSERT INTO empsalary VALUES('develop', 7, 4200, '2008/01/01');
INSERT INTO empsalary VALUES('develop', 9, 4500, '2008/01/01');
INSERT INTO empsalary VALUES('sales', 3, 4800, '2007/08/01');
INSERT INTO empsalary VALUES('develop', 8, 6000, '2006/10/01');
INSERT INTO empsalary VALUES('develop', 11, 5200, '2007/08/15');

postgres=# select * from empsalary ;
  depname  | empno | salary | enroll_date
-----------+-------+--------+-------------
develop   |    10 |   5200 | 2007-08-01
sales     |     1 |   5000 | 2006-10-01
personnel |     5 |   3500 | 2007-12-10
sales     |     4 |   4800 | 2007-08-08
sales     |     6 |   5500 | 2007-01-02
personnel |     2 |   3900 | 2006-12-23
develop   |     7 |   4200 | 2008-01-01
develop   |     9 |   4500 | 2008-01-01
sales     |     3 |   4800 | 2007-08-01
develop   |     8 |   6000 | 2006-10-01
develop   |    11 |   5200 | 2007-08-15
(11 rows)2.统计示例
a.统计各部门的总薪水,平均薪水和部门的详细情况
postgres=# select sum(salary) OVER (PARTITION BY depname),avg(salary) OVER (PARTITION BY depname),* from empsalary;
  sum  |          avg          |  depname  | empno | salary | enroll_date
-------+-----------------------+-----------+-------+--------+-------------
25100 | 5020.0000000000000000 | develop   |    10 |   5200 | 2007-08-01
25100 | 5020.0000000000000000 | develop   |     7 |   4200 | 2008-01-01
25100 | 5020.0000000000000000 | develop   |     9 |   4500 | 2008-01-01
25100 | 5020.0000000000000000 | develop   |     8 |   6000 | 2006-10-01
25100 | 5020.0000000000000000 | develop   |    11 |   5200 | 2007-08-15
  7400 | 3700.0000000000000000 | personnel |     2 |   3900 | 2006-12-23
  7400 | 3700.0000000000000000 | personnel |     5 |   3500 | 2007-12-10
20100 | 5025.0000000000000000 | sales     |     3 |   4800 | 2007-08-01
20100 | 5025.0000000000000000 | sales     |     1 |   5000 | 2006-10-01
20100 | 5025.0000000000000000 | sales     |     4 |   4800 | 2007-08-08
20100 | 5025.0000000000000000 | sales     |     6 |   5500 | 2007-01-02
(11 rows)b.统计人员在所在部门的薪水排名情况
postgres=# select rank() OVER (PARTITION BY depname ORDER BY salary),* from empsalary;
rank |  depname  | empno | salary | enroll_date
------+-----------+-------+--------+-------------
    1 | develop   |     7 |   4200 | 2008-01-01
    2 | develop   |     9 |   4500 | 2008-01-01
    3 | develop   |    10 |   5200 | 2007-08-01
    3 | develop   |    11 |   5200 | 2007-08-15
    5 | develop   |     8 |   6000 | 2006-10-01
    1 | personnel |     5 |   3500 | 2007-12-10
    2 | personnel |     2 |   3900 | 2006-12-23
    1 | sales     |     4 |   4800 | 2007-08-08
    1 | sales     |     3 |   4800 | 2007-08-01
    3 | sales     |     1 |   5000 | 2006-10-01
    4 | sales     |     6 |   5500 | 2007-01-02
(11 rows)3.一个有趣的例子 注意使用order by,结果会两样
create table foo(a int,b int) ;
insert into foo values (1,1);
insert into foo values (1,1);
insert into foo values (2,1);
insert into foo values (4,1);
insert into foo values (2,1);
insert into foo values (4,1);
insert into foo values (5,1);
insert into foo values (11,3);
insert into foo values (12,3);
insert into foo values (22,3);
insert into foo values (16,3);
insert into foo values (16,3);
insert into foo values (16,3);

postgres=# select sum(a) over (partition by b), a, b from foo;
sum | a  | b
-----+----+---
  19 |  1 | 1
  19 |  1 | 1
  19 |  2 | 1
  19 |  4 | 1
  19 |  2 | 1
  19 |  4 | 1
  19 |  5 | 1
  93 | 11 | 3
  93 | 12 | 3
  93 | 22 | 3
  93 | 16 | 3
  93 | 16 | 3
  93 | 16 | 3
(13 rows)

postgres=# select sum(a) over (partition by b order by a), a, b from foo;
sum | a  | b
-----+----+---
   2 |  1 | 1
   2 |  1 | 1
   6 |  2 | 1
   6 |  2 | 1
  14 |  4 | 1
  14 |  4 | 1
  19 |  5 | 1
  11 | 11 | 3
  23 | 12 | 3
  71 | 16 | 3
  71 | 16 | 3
  71 | 16 | 3
  93 | 22 | 3
(13 rows)

postgres=# select a, b, sum(a) over (partition by b order by a ROWS
postgres(# BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) from foo;
a  | b | sum
----+---+-----
  1 | 1 |  19
  1 | 1 |  19
  2 | 1 |  19
  2 | 1 |  19
  4 | 1 |  19
  4 | 1 |  19
  5 | 1 |  19
11 | 3 |  93
12 | 3 |  93
16 | 3 |  93
16 | 3 |  93
16 | 3 |  93
22 | 3 |  93
(13 rows)官网中的解释是: By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause. When ORDER BY is omitted the default frame consists of all rows in the partition.
默认情况下,带了order by 参数会从分组的起始值开始一直叠加,直到当前值为止,当忽略order by 参数则会计算分组中所有值的和。

4.其他的窗口函数
row_number(): 从当前开始,不间断,如1,2,3,4,5,6
rank() :从当前开始,会间断,如1,2,2,4,5,6
dense_rank():从当前开始不间断,但会重复,如1,2,2,3,4,5
percent_rank():从当前开始,计算在分组中的比例,如0,0.25,0.25,0.75,1,0,1 从0-1不断地循环
cume_dist():当前行的排序除以分组的数量,如分组有4行,则值为0.25,0.5,0.75,1
ntile(num_buckets integer):从1到当前值,除以分组的的数量,尽可能使分布平均
lag(value any [, offset integer [, default any ]]):偏移量函数,取滞后值,如lag(column_name,2,0)表示字段偏移量为2,没有则用default值代替,这里是0,不写默认是null
lead(value any [, offset integer [, default any ]]):偏移量函数,取提前值,类上 first_value(value any):返回窗口框架中的第一个值
last_value(value any):返回窗口框架中的最后一个值
nth_value(value any, nth integer):返回窗口框架中的指定值,如nth_value(salary,2),则表示返回字段salary的第二个窗口函数值

5.其他窗口函数示例
postgres=# select row_number() over (partition by depname order by salary desc),* from empsalary;
row_number |  depname  | empno | salary | enroll_date
------------+-----------+-------+--------+-------------
          1 | develop   |     8 |   6000 | 2006-10-01
          2 | develop   |    10 |   5200 | 2007-08-01
          3 | develop   |    11 |   5200 | 2007-08-15
          4 | develop   |     9 |   4500 | 2008-01-01
          5 | develop   |     7 |   4200 | 2008-01-01
          1 | personnel |     2 |   3900 | 2006-12-23
          2 | personnel |     5 |   3500 | 2007-12-10
          1 | sales     |     6 |   5500 | 2007-01-02
          2 | sales     |     1 |   5000 | 2006-10-01
          3 | sales     |     3 |   4800 | 2007-08-01
          4 | sales     |     4 |   4800 | 2007-08-08
(11 rows)

postgres=# select rank() over(partition by depname order by salary desc),* from empsalary;
rank |  depname  | empno | salary | enroll_date
------+-----------+-------+--------+-------------
    1 | develop   |     8 |   6000 | 2006-10-01
    2 | develop   |    10 |   5200 | 2007-08-01
    2 | develop   |    11 |   5200 | 2007-08-15
    4 | develop   |     9 |   4500 | 2008-01-01
    5 | develop   |     7 |   4200 | 2008-01-01
    1 | personnel |     2 |   3900 | 2006-12-23
    2 | personnel |     5 |   3500 | 2007-12-10
    1 | sales     |     6 |   5500 | 2007-01-02
    2 | sales     |     1 |   5000 | 2006-10-01
    3 | sales     |     3 |   4800 | 2007-08-01
    3 | sales     |     4 |   4800 | 2007-08-08
(11 rows)


postgres=# select dense_rank() over(partition by depname order by salary desc),* from empsalary;
dense_rank |  depname  | empno | salary | enroll_date
------------+-----------+-------+--------+-------------
          1 | develop   |     8 |   6000 | 2006-10-01
          2 | develop   |    10 |   5200 | 2007-08-01
          2 | develop   |    11 |   5200 | 2007-08-15
          3 | develop   |     9 |   4500 | 2008-01-01
          4 | develop   |     7 |   4200 | 2008-01-01
          1 | personnel |     2 |   3900 | 2006-12-23
          2 | personnel |     5 |   3500 | 2007-12-10
          1 | sales     |     6 |   5500 | 2007-01-02
          2 | sales     |     1 |   5000 | 2006-10-01
          3 | sales     |     3 |   4800 | 2007-08-01
          3 | sales     |     4 |   4800 | 2007-08-08
(11 rows)

postgres=# select percent_rank() over(partition by depname order by salary desc),* from empsalary;
   percent_rank    |  depname  | empno | salary | enroll_date
-------------------+-----------+-------+--------+-------------
                 0 | develop   |     8 |   6000 | 2006-10-01
              0.25 | develop   |    10 |   5200 | 2007-08-01
              0.25 | develop   |    11 |   5200 | 2007-08-15
              0.75 | develop   |     9 |   4500 | 2008-01-01
                 1 | develop   |     7 |   4200 | 2008-01-01
                 0 | personnel |     2 |   3900 | 2006-12-23
                 1 | personnel |     5 |   3500 | 2007-12-10
                 0 | sales     |     6 |   5500 | 2007-01-02
0.333333333333333 | sales     |     1 |   5000 | 2006-10-01
0.666666666666667 | sales     |     3 |   4800 | 2007-08-01
0.666666666666667 | sales     |     4 |   4800 | 2007-08-08
(11 rows)

postgres=# select cume_dist()over(partition by depname order by salary desc),* from empsalary;
cume_dist |  depname  | empno | salary | enroll_date
-----------+-----------+-------+--------+-------------
       0.2 | develop   |     8 |   6000 | 2006-10-01
       0.6 | develop   |    10 |   5200 | 2007-08-01
       0.6 | develop   |    11 |   5200 | 2007-08-15
       0.8 | develop   |     9 |   4500 | 2008-01-01
         1 | develop   |     7 |   4200 | 2008-01-01
       0.5 | personnel |     2 |   3900 | 2006-12-23
         1 | personnel |     5 |   3500 | 2007-12-10
      0.25 | sales     |     6 |   5500 | 2007-01-02
       0.5 | sales     |     1 |   5000 | 2006-10-01
         1 | sales     |     3 |   4800 | 2007-08-01
         1 | sales     |     4 |   4800 | 2007-08-08
(11 rows)

postgres=# select ntile(3)over(partition by depname order by salary desc),* from empsalary;
ntile |  depname  | empno | salary | enroll_date
-------+-----------+-------+--------+-------------
     1 | develop   |     8 |   6000 | 2006-10-01
     1 | develop   |    10 |   5200 | 2007-08-01
     2 | develop   |    11 |   5200 | 2007-08-15
     2 | develop   |     9 |   4500 | 2008-01-01
     3 | develop   |     7 |   4200 | 2008-01-01
     1 | personnel |     2 |   3900 | 2006-12-23
     2 | personnel |     5 |   3500 | 2007-12-10
     1 | sales     |     6 |   5500 | 2007-01-02
     1 | sales     |     1 |   5000 | 2006-10-01
     2 | sales     |     3 |   4800 | 2007-08-01
     3 | sales     |     4 |   4800 | 2007-08-08
(11 rows)

postgres=# select lag(salary,2,null)over(partition by depname order by salary desc),* from empsalary;
lag  |  depname  | empno | salary | enroll_date
------+-----------+-------+--------+-------------
      | develop   |     8 |   6000 | 2006-10-01
      | develop   |    10 |   5200 | 2007-08-01
6000 | develop   |    11 |   5200 | 2007-08-15
5200 | develop   |     9 |   4500 | 2008-01-01
5200 | develop   |     7 |   4200 | 2008-01-01
      | personnel |     2 |   3900 | 2006-12-23
      | personnel |     5 |   3500 | 2007-12-10
      | sales     |     6 |   5500 | 2007-01-02
      | sales     |     1 |   5000 | 2006-10-01
5500 | sales     |     3 |   4800 | 2007-08-01
5000 | sales     |     4 |   4800 | 2007-08-08
(11 rows)

postgres=# select first_value(salary)over(partition by depname order by salary desc),* from empsalary;
first_value |  depname  | empno | salary | enroll_date
-------------+-----------+-------+--------+-------------
        6000 | develop   |     8 |   6000 | 2006-10-01
        6000 | develop   |    10 |   5200 | 2007-08-01
        6000 | develop   |    11 |   5200 | 2007-08-15
        6000 | develop   |     9 |   4500 | 2008-01-01
        6000 | develop   |     7 |   4200 | 2008-01-01
        3900 | personnel |     2 |   3900 | 2006-12-23
        3900 | personnel |     5 |   3500 | 2007-12-10
        5500 | sales     |     6 |   5500 | 2007-01-02
        5500 | sales     |     1 |   5000 | 2006-10-01
        5500 | sales     |     3 |   4800 | 2007-08-01
        5500 | sales     |     4 |   4800 | 2007-08-08
(11 rows)

postgres=# select last_value(salary)over(partition by depname order by salary desc),* from empsalary;
last_value |  depname  | empno | salary | enroll_date
------------+-----------+-------+--------+-------------
       6000 | develop   |     8 |   6000 | 2006-10-01
       5200 | develop   |    10 |   5200 | 2007-08-01
       5200 | develop   |    11 |   5200 | 2007-08-15
       4500 | develop   |     9 |   4500 | 2008-01-01
       4200 | develop   |     7 |   4200 | 2008-01-01
       3900 | personnel |     2 |   3900 | 2006-12-23
       3500 | personnel |     5 |   3500 | 2007-12-10
       5500 | sales     |     6 |   5500 | 2007-01-02
       5000 | sales     |     1 |   5000 | 2006-10-01
       4800 | sales     |     3 |   4800 | 2007-08-01
       4800 | sales     |     4 |   4800 | 2007-08-08
(11 rows)

postgres=# select nth_value(salary,2)over(partition by depname order by salary desc),* from empsalary;
nth_value |  depname  | empno | salary | enroll_date
-----------+-----------+-------+--------+-------------
           | develop   |     8 |   6000 | 2006-10-01
      5200 | develop   |    10 |   5200 | 2007-08-01
      5200 | develop   |    11 |   5200 | 2007-08-15
      5200 | develop   |     9 |   4500 | 2008-01-01
      5200 | develop   |     7 |   4200 | 2008-01-01
           | personnel |     2 |   3900 | 2006-12-23
      3500 | personnel |     5 |   3500 | 2007-12-10
           | sales     |     6 |   5500 | 2007-01-02
      5000 | sales     |     1 |   5000 | 2006-10-01
      5000 | sales     |     3 |   4800 | 2007-08-01
      5000 | sales     |     4 |   4800 | 2007-08-08
(11 rows)当一个查询涉及多个窗口函数的时候,可以用别名的办法来使用,更简单:
postgres=# select sum(salary)over w,avg(salary) over w,* from empsalary window w as (partition by depname order by salary desc);
  sum  |          avg          |  depname  | empno | salary | enroll_date
-------+-----------------------+-----------+-------+--------+-------------
  6000 | 6000.0000000000000000 | develop   |     8 |   6000 | 2006-10-01
16400 | 5466.6666666666666667 | develop   |    10 |   5200 | 2007-08-01
16400 | 5466.6666666666666667 | develop   |    11 |   5200 | 2007-08-15
20900 | 5225.0000000000000000 | develop   |     9 |   4500 | 2008-01-01
25100 | 5020.0000000000000000 | develop   |     7 |   4200 | 2008-01-01
  3900 | 3900.0000000000000000 | personnel |     2 |   3900 | 2006-12-23
  7400 | 3700.0000000000000000 | personnel |     5 |   3500 | 2007-12-10
  5500 | 5500.0000000000000000 | sales     |     6 |   5500 | 2007-01-02
10500 | 5250.0000000000000000 | sales     |     1 |   5000 | 2006-10-01
20100 | 5025.0000000000000000 | sales     |     3 |   4800 | 2007-08-01
20100 | 5025.0000000000000000 | sales     |     4 |   4800 | 2007-08-08
(11 rows)这个写法和下面的是一样的,不过更简单
SELECT sum(salary) OVER (PARTITION BY depname ORDER BY salary DESC), avg(salary) OVER (PARTITION BY depname ORDER BY salary DESC),* FROM empsalary; 
最佳 Windows 性能的顶级免费优化软件
最佳 Windows 性能的顶级免费优化软件

每个人都需要一台速度更快、更稳定的 PC。随着时间的推移,垃圾文件、旧注册表数据和不必要的后台进程会占用资源并降低性能。幸运的是,许多工具可以让 Windows 保持平稳运行。

下载
来源:php中文网
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系admin@php.cn
最新问题
开源免费商场系统广告
热门教程
更多>
最新下载
更多>
网站特效
网站源码
网站素材
前端模板
关于我们 免责申明 意见反馈 讲师合作 广告合作 最新更新
php中文网:公益在线php培训,帮助PHP学习者快速成长!
关注服务号 技术交流群
PHP中文网订阅号
每天精选资源文章推送
PHP中文网APP
随时随地碎片化学习
PHP中文网抖音号
发现有趣的

Copyright 2014-2025 https://www.php.cn/ All Rights Reserved | php.cn | 湘ICP备2023035733号