Window Functions: PostgreSQL’s Best-Kept Secret

By on Friday, November 18th, 2011 in Technical | Related Software Packages: , | Keywords: ,

The PostgreSQL relational database management system (RDBMS) extends the standard SQL3 it implements with several conveniences for application development. Among the most useful extensions are window functions, which perform a calculation across a set of table rows that are somehow related to the current row.

Suppose you want to know the total media market by state from a reference tabulated in terms of city (or metropolitan) listings. You can write a query with a window function like this:

SELECT state_name, city_name, market_size, SUM(market_size)
OVER (PARTITION BY state_name) FROM markets;

and see something like:

Example population data
state_name city_name market_size sum(market_size)
California Bay Area 4.3 20.3
California Los Angeles 12.8 20.3
California San Diego 3.1 20.3
Illinois Chicago 9.5 9.5
New York NYC 18.9 20.0
New York Buffalo 1.1 20.0

A slightly more involved expression might report, for instance:

  • quantitative comparison of sales to a particular customer with the average for all customers in the same sector;
  • the ranks of competitive swimmers within their own heats;
  • statistics on student performance only for classes that were among the top four in attendance within their departments;
  • normalized reactivity for certain biological reagants that lie within a specified range of molecular weights.

The examples above are typical of questions that come up in real-world reporting requirements. Window functions make their expression concise and maintainable.

Syntax of Window Functions

In PostgreSQL, you can create a window function by replacing any use of a function in a SELECT list, or its associated ORDER BY clause, with an OVER clause immediately after the function. In an ordinary select statement such as:

SELECT emp_id, salary, AVG(salary) FROM salaries;

the average is over all reported employees. Contrast this with a statement that uses a windows function:

SELECT emp_id, salary, AVG(salary) OVER(PARTITION by site) FROM salaries;

The first two columns of these two reports will be identical. The third column in the latter case, though, will show averages only for people at the same site. All employees working in Miami share one average, all those in Toronto have a different average, and so on.

Some window functions depend on the order of their inputs. You can control this order with an optional ORDER BY clause. ORDER BY can be particularly useful when considering budget problems, as this example shows:

 SELECT request, sum(request)
 FROM projects ORDER BY priority;
Example proposal data
request sum(request)
35 250
110 250
25 250
80 250

Now change the query to:

SELECT request, sum(request) OVER (ORDER BY priority)
FROM projects ORDER BY priority;

and notice the subtly different result:

Example cumulative proposal data
request sum(request)
35 35
110 145
25 170
80 250

One way to read this table in English is: “If we fund only the top-priority project, our total expense will be 35; if we fund the top two in priority, we’ll spend 145; …” This is a view of data that decision-makers frequently prize – and a window function makes it easy to see!

Alternatives

It’s certainly possible to compute the same results without window functions; plenty of applications do. Many retrieve items from a table into a host language such as Java or Ruby, then use procedural logic in the host language to calculate the same “windows.” Among other disadvantages, this approach has the potential to transmit large volumes of undisplayed intermediate data over the network, and thus slow overall performance.

Another alternative is to use stored procedures or user-defined functions on the RDBMS side. While this minimizes runtime network traffic, it generally still falls short of window functions in clarity and query performance. When written from scratch, code to compute a window function often takes four to 10 times as many lines as the simpler window function-based query.

Scope and Status of Window Functions in PostgreSQL

The SQL/OLAP amendment to SQL:1999 (SQL:1999 is a synonym for SQL3) introduced a <window> clause to act on collections or partitions of data. Not long after this, the SQL:2003 standard formalized window functions as described here. The subsequent SQL:2008 standard (labeled “SQL-2008″ in some Postgres documents) slightly refined the definition of window functions.

PostgreSQL gained its window functions with the release of 8.4 in mid-2009. Postgres targets the SQL:2008 definition with its implementation. The principal difference between the Postgres implementation and the SQL:2008 definition has to do with specialized treatment of NULLS; see the Note at the bottom of the Window Functions documentation page for more details. Also, in Postgres, any built-in or user-defined aggregate function, such as count, max, bit_or, or xmlagg, is available for computation as a window function.

Many of the queries or reports that arise in business intelligence (BI) and analytical applications can be expressed naturally in terms of window functions. When you prepare a table where a common value ties together several distinct rows, it’s worth your time to investigate whether your calculation can best be done as a window function. You will save both your own development time, and runtime resource usage, when you write window functions. They’re among the best and most useful standardized extensions to SQL3.

Download the Free OSS Discovery Scanning Tool

Related posts:

  1. Tips for Writing Safe, Secure PHP Code
  2. Pros Tips for Extending LibreOffice
  3. Internet Law Part II: Patent, Trademark and Trade Secret Law
  4. How to Access a PostgreSQL Database from Any Language
  5. More Fun with Vimscript

Related Open-Source Packages

PostgreSQL: See all PostgreSQL Articles » Get PostgreSQL Support at OLEX »
Ruby: See all Ruby Articles » Get Ruby Support at OLEX »

Cameron Laird

Cameron Laird has been developing software, especially in the area of data management, and writing about computing subjects, since before SQL existed as a standard. He keeps an eye on data integrity as vice president for boutique development consultancy Phaseit, Inc.

Leave a Reply

© 2012 OpenLogic, Inc. | Licensing | Privacy Policy | Terms of Use

Bad Behavior has blocked 2289 access attempts in the last 7 days.