From the official documentation of PostgreSQL:
Window functions provide the ability to perform calculations across sets of rows that are related to the current query row.
Unlike standard aggregate functions, which collapse a group of rows into a single value, window functions generate a value for every input row, computed over the group of rows that are associated with the current row by some relation.
Window functions require a window specification (WindowSpec) that defines which rows are included in a window (frame, i.e. the set of rows that are associated with the current row by some relation).
The Window utility object is used to create a WindowSpec that can be refined further using Window operators (e.g. partitionBy, orderBy).
import org.apache.spark.sql.expressions.Window
val byHTokens = Window.partitionBy('token startsWith "h")
import org.apache.spark.sql.expressions.WindowSpec
assert(byHTokens.isInstanceOf[WindowSpec])
import org.apache.spark.sql.expressions.Window
val windowSpec = Window
  .partitionBy($"orderId")
  .orderBy($"time")
With a WindowSpec defined, the Column.over operator is used to associate the WindowSpec with aggregate or window functions.
import org.apache.spark.sql.functions.rank
rank.over(byHTokens)
import org.apache.spark.sql.functions.first
first($"time").over(windowSpec)
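To see the pieces work together, here is a minimal end-to-end sketch (the orders dataset and its column values are invented for illustration) assuming a local SparkSession:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.first

// Local session for illustration only
val spark = SparkSession.builder().master("local[*]").appName("window-demo").getOrCreate()
import spark.implicits._

// Hypothetical orders data: (orderId, time, amount)
val orders = Seq(
  (1, "10:00", 100.0),
  (1, "11:00", 150.0),
  (2, "09:30", 200.0)
).toDF("orderId", "time", "amount")

val windowSpec = Window
  .partitionBy($"orderId")
  .orderBy($"time")

// Every row of an order is paired with the earliest amount of that order
orders
  .withColumn("firstAmount", first($"amount").over(windowSpec))
  .show()
```

Note the contrast with orders.groupBy("orderId").agg(first("amount")), which would return one row per order: the window version keeps all three input rows and adds firstAmount as an extra column.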
WindowSpecDefinition expression enforces the following requirements on window frames:
- An UnspecifiedFrame is allowed (and is resolved during analysis).
- A range window frame cannot be used in an unordered window specification.
- A range window frame with value boundaries cannot be used in a window specification with multiple order-by expressions.
- The data type of the order-by expression must match the data type of the range frame boundaries.
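The ordered-range requirements can be illustrated with rangeBetween. This is a sketch (the dataset and column names are invented) assuming a local SparkSession:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.count

val spark = SparkSession.builder().master("local[*]").appName("range-frame").getOrCreate()
import spark.implicits._

val df = Seq((1, 10L), (1, 30L), (1, 200L)).toDF("id", "amount")

// A range frame requires a single order-by expression whose data type
// matches the frame boundaries: here a numeric column and numeric offsets.
val rangeSpec = Window
  .partitionBy($"id")
  .orderBy($"amount")
  .rangeBetween(Window.currentRow, 100L)

// peers counts the rows whose amount lies in [current amount, current amount + 100]
df.withColumn("peers", count($"amount").over(rangeSpec)).show()

// Dropping the orderBy above would fail analysis with
// "A range window frame cannot be used in an unordered window specification"
```

Adding a second orderBy expression would likewise be rejected, since a range frame with value boundaries is only meaningful relative to a single ordering expression.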