Skip to content


CumeDist is a SizeBasedWindowFunction and a RowNumberLike expression that is used for the following:


import org.apache.spark.sql.catalyst.expressions.CumeDist
val cume_dist = CumeDist()
import org.apache.spark.sql.catalyst.expressions.CumeDist
val cume_dist = CumeDist()
scala> println(cume_dist)
scala> println(cume_dist.evaluateExpression.numberedTreeString)
00 (cast(rowNumber#0 as double) / cast(window__partition__size#1 as double))
01 :- cast(rowNumber#0 as double)
02 :  +- rowNumber#0: int
03 +- cast(window__partition__size#1 as double)
04    +- window__partition__size#1: int


prettyName: String

prettyName is cume_dist as the user-facing name.

prettyName is part of the Expression abstraction.


frame: WindowFrame

frame is a SpecifiedWindowFrame with the following:

  • RangeFrame frame type
  • UnboundedPreceding lower frame boundary
  • CurrentRow upper frame boundary


The frame of a CumeDist expression is range-based instead of row-based, because it has to return the same value for tie values in a window (equal values per ORDER BY specification).

frame is part of the WindowFunction abstraction.


evaluateExpression: Expression

evaluateExpression uses the formula rowNumber / n where rowNumber is the row number in a window frame (the number of values before and including the current row) divided by the number of rows in the window frame.

evaluateExpression is part of the DeclarativeAggregate abstraction.

Back to top