
Window Functions

From the official documentation of PostgreSQL:

Window functions provide the ability to perform calculations across sets of rows that are related to the current query row.

Window functions are part of the standard function library, but unlike regular aggregate functions they generate a value for every input row, computed over the group of rows that are associated with the current row by some relation.

Window functions require a window specification (WindowSpec) that defines which rows are included in the window (frame), i.e. the set of rows that are associated with the current row by some relation.

The Window utility object is used to create a WindowSpec that can be refined further using WindowSpec operators.

import org.apache.spark.sql.expressions.{Window, WindowSpec}
import spark.implicits._  // enables the 'token symbol-to-Column conversion (imported by default in spark-shell)

// Partition rows by whether the value of the token column starts with "h"
val byHTokens = Window.partitionBy('token startsWith "h")
assert(byHTokens.isInstanceOf[WindowSpec])
import org.apache.spark.sql.expressions.Window
// Rows with the same orderId, ordered by time ($"..." also comes from spark.implicits._)
val windowSpec = Window
  .partitionBy($"orderId")
  .orderBy($"time")

With a WindowSpec defined, the Column.over operator is used to associate it with an aggregate or window function.

import org.apache.spark.sql.functions.{first, rank}

rank().over(byHTokens)
// first takes the column to return the first value of within each frame
first($"time").over(windowSpec)
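
The resulting Column can then be used in a regular projection, e.g. withColumn or select. A minimal sketch, assuming a hypothetical orders dataset with orderId, time and amount columns (not part of the examples above):

val orders = Seq(
  (1, 1L, 10.0), (1, 2L, 20.0),
  (2, 1L, 5.0),  (2, 3L, 15.0)
).toDF("orderId", "time", "amount")  // toDF also comes from spark.implicits._

// Every input row gets its own value computed over the frame of its window
orders
  .withColumn("rank", rank().over(windowSpec))
  .withColumn("first_time", first($"time").over(windowSpec))
  .show()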

Execution

Window functions are executed by the WindowExec unary physical operator.
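
You can see the operator in the physical plan of a query that uses window functions. A sketch reusing the hypothetical orders dataset above:

// The printed physical plan contains a Window node, i.e. the WindowExec operator,
// typically preceded by a Sort and an Exchange that satisfy the window's ordering and partitioning
orders
  .withColumn("rank", rank().over(windowSpec))
  .explain()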

Limitations

The WindowSpecDefinition expression enforces the following requirements on window frames (WindowFrames):

  1. No UnspecifiedFrames are allowed (they should have been resolved during analysis).
  2. A range window frame cannot be used in an unordered window specification (see the sketch after this list).
  3. A range window frame with value boundaries cannot be used in a window specification with multiple order by expressions.
  4. The data type used in the order specification must match the data type used in the range frame.
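
For example, requirement 2 can be observed by defining a range frame on an unordered window specification, which fails at analysis time with an AnalysisException. A sketch reusing the hypothetical orders dataset (the exact error message depends on the Spark version):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum

// A value-bound range frame without an ordering violates requirement 2
val unordered = Window
  .partitionBy($"orderId")
  .rangeBetween(-10, Window.currentRow)

// Throws org.apache.spark.sql.AnalysisException during analysis
orders.select(sum($"amount").over(unordered)).show()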