CumeDist¶
CumeDist
is a SizeBasedWindowFunction
and a RowNumberLike
expression that is used for the following:
Demo¶
import org.apache.spark.sql.catalyst.expressions.CumeDist
val cume_dist = CumeDist()
import org.apache.spark.sql.catalyst.expressions.CumeDist
val cume_dist = CumeDist()
scala> println(cume_dist)
cume_dist()
scala> println(cume_dist.evaluateExpression.numberedTreeString)
00 (cast(rowNumber#0 as double) / cast(window__partition__size#1 as double))
01 :- cast(rowNumber#0 as double)
02 : +- rowNumber#0: int
03 +- cast(window__partition__size#1 as double)
04 +- window__partition__size#1: int
prettyName¶
prettyName: String
prettyName
is cume_dist as the user-facing name.
prettyName
is part of the Expression abstraction.
frame¶
frame: WindowFrame
frame
is a SpecifiedWindowFrame
with the following:
RangeFrame
frame typeUnboundedPreceding
lower frame boundaryCurrentRow
upper frame boundary
Note
The frame of a CumeDist
expression is range-based instead of row-based, because it has to return the same value for tie values in a window (equal values per ORDER BY
specification).
frame
is part of the WindowFunction abstraction.
evaluateExpression¶
evaluateExpression: Expression
evaluateExpression
is part of the DeclarativeAggregate abstraction.
evaluateExpression
uses the formula rowNumber / n
where rowNumber
is the row number in a window frame (the number of values before and including the current row) divided by the number of rows in the window frame.