The PostgreSQL relational database management system (RDBMS) extends the standard SQL3 it implements with several conveniences for application development. Among the most useful extensions are window functions, which perform a calculation across a set of table rows that are somehow related to the current row.
Suppose you want to know the total media market by state from a reference tabulated in terms of city (or metropolitan) listings. You can write a query with a window function like this:
SELECT state_name, city_name, market_size, SUM(market_size) OVER (PARTITION BY state_name) FROM markets;
and see something like:
| state_name | city_name | market_size | sum(market_size) |
|---|---|---|---|
| California | Bay Area | 4.3 | 20.3 |
| California | Los Angeles | 12.8 | 20.3 |
| California | San Diego | 3.1 | 20.3 |
| Illinois | Chicago | 9.5 | 9.5 |
| New York | NYC | 18.9 | 20.0 |
| New York | Buffalo | 1.1 | 20.0 |
| … |
A slightly more involved expression might report, for instance:
- quantitative comparison of sales to a particular customer with the average for all customers in the same sector;
- the ranks of competitive swimmers within their own heats;
- statistics on student performance only for classes that were among the top four in attendance within their departments;
- normalized reactivity for certain biological reagants that lie within a specified range of molecular weights.
The examples above are typical of questions that come up in real-world reporting requirements. Window functions make their expression concise and maintainable.
Syntax of Window Functions
In PostgreSQL, you can create a window function by replacing any use of a function in a SELECT list, or its associated ORDER BY clause, with an OVER clause immediately after the function. In an ordinary select statement such as:
SELECT emp_id, salary, AVG(salary) FROM salaries;
the average is over all reported employees. Contrast this with a statement that uses a windows function:
SELECT emp_id, salary, AVG(salary) OVER(PARTITION by site) FROM salaries;
The first two columns of these two reports will be identical. The third column in the latter case, though, will show averages only for people at the same site. All employees working in Miami share one average, all those in Toronto have a different average, and so on.
Some window functions depend on the order of their inputs. You can control this order with an optional ORDER BY clause. ORDER BY can be particularly useful when considering budget problems, as this example shows:
SELECT request, sum(request) FROM projects ORDER BY priority;
| request | sum(request) |
|---|---|
| 35 | 250 |
| 110 | 250 |
| 25 | 250 |
| 80 | 250 |
Now change the query to:
SELECT request, sum(request) OVER (ORDER BY priority) FROM projects ORDER BY priority;
and notice the subtly different result:
| request | sum(request) |
|---|---|
| 35 | 35 |
| 110 | 145 |
| 25 | 170 |
| 80 | 250 |
One way to read this table in English is: “If we fund only the top-priority project, our total expense will be 35; if we fund the top two in priority, we’ll spend 145; …” This is a view of data that decision-makers frequently prize – and a window function makes it easy to see!
Alternatives
It’s certainly possible to compute the same results without window functions; plenty of applications do. Many retrieve items from a table into a host language such as Java or Ruby, then use procedural logic in the host language to calculate the same “windows.” Among other disadvantages, this approach has the potential to transmit large volumes of undisplayed intermediate data over the network, and thus slow overall performance.
Another alternative is to use stored procedures or user-defined functions on the RDBMS side. While this minimizes runtime network traffic, it generally still falls short of window functions in clarity and query performance. When written from scratch, code to compute a window function often takes four to 10 times as many lines as the simpler window function-based query.
Scope and Status of Window Functions in PostgreSQL
The SQL/OLAP amendment to SQL:1999 (SQL:1999 is a synonym for SQL3) introduced a <window> clause to act on collections or partitions of data. Not long after this, the SQL:2003 standard formalized window functions as described here. The subsequent SQL:2008 standard (labeled “SQL-2008″ in some Postgres documents) slightly refined the definition of window functions.
PostgreSQL gained its window functions with the release of 8.4 in mid-2009. Postgres targets the SQL:2008 definition with its implementation. The principal difference between the Postgres implementation and the SQL:2008 definition has to do with specialized treatment of NULLS; see the Note at the bottom of the Window Functions documentation page for more details. Also, in Postgres, any built-in or user-defined aggregate function, such as count, max, bit_or, or xmlagg, is available for computation as a window function.
Many of the queries or reports that arise in business intelligence (BI) and analytical applications can be expressed naturally in terms of window functions. When you prepare a table where a common value ties together several distinct rows, it’s worth your time to investigate whether your calculation can best be done as a window function. You will save both your own development time, and runtime resource usage, when you write window functions. They’re among the best and most useful standardized extensions to SQL3.
Related posts:
- Tips for Writing Safe, Secure PHP Code
- Pros Tips for Extending LibreOffice
- Internet Law Part II: Patent, Trademark and Trade Secret Law
- How to Access a PostgreSQL Database from Any Language
- More Fun with Vimscript
Related Open-Source Packages
| PostgreSQL: | See all PostgreSQL Articles » | Get PostgreSQL Support at OLEX » |
|---|---|---|
| Ruby: | See all Ruby Articles » | Get Ruby Support at OLEX » |













