Skip to content

CumeDist

CumeDist is a SizeBasedWindowFunction and a RowNumberLike expression that is used for the following:

Demo

import org.apache.spark.sql.catalyst.expressions.CumeDist
val cume_dist = CumeDist()
import org.apache.spark.sql.catalyst.expressions.CumeDist
val cume_dist = CumeDist()
scala> println(cume_dist)
cume_dist()
scala> println(cume_dist.evaluateExpression.numberedTreeString)
00 (cast(rowNumber#0 as double) / cast(window__partition__size#1 as double))
01 :- cast(rowNumber#0 as double)
02 :  +- rowNumber#0: int
03 +- cast(window__partition__size#1 as double)
04    +- window__partition__size#1: int

prettyName

prettyName: String

prettyName is cume_dist as the user-facing name.

prettyName is part of the Expression abstraction.

frame

frame: WindowFrame

frame is a SpecifiedWindowFrame with the following:

  • RangeFrame frame type
  • UnboundedPreceding lower frame boundary
  • CurrentRow upper frame boundary

Note

The frame of a CumeDist expression is range-based instead of row-based, because it has to return the same value for tie values in a window (equal values per ORDER BY specification).

frame is part of the WindowFunction abstraction.

evaluateExpression

evaluateExpression: Expression

evaluateExpression uses the formula rowNumber / n where rowNumber is the row number in a window frame (the number of values before and including the current row) divided by the number of rows in the window frame.

evaluateExpression is part of the DeclarativeAggregate abstraction.