RepartitionByExpression Logical Operator¶
RepartitionByExpression
is a concrete RepartitionOperation.
RepartitionByExpression
is also called distribute operator.
Creating Instance¶
RepartitionByExpression
takes the following to be created:
- Partition Expressions
- Child LogicalPlan
- Number of partitions
RepartitionByExpression
is created when:
- Dataset.repartition and Dataset.repartitionByRange operators
COALESCE
,REPARTITION
andREPARTITION_BY_RANGE
hints (via ResolveCoalesceHints logical analysis rule)DISTRIBUTE BY
andCLUSTER BY
SQL clauses (via SparkSqlAstBuilder)
Query Planning¶
RepartitionByExpression
is planned to ShuffleExchangeExec physical operator.
Catalyst DSL¶
Catalyst DSL defines distribute operator to create RepartitionByExpression
logical operators.
Partitioning¶
RepartitionByExpression
determines a Partitioning when created.
Maximum Number of Rows¶
maxRows: Option[Long]
maxRows
simply requests the child logical operator for the maximum number of rows.
maxRows
is part of the LogicalPlan abstraction.
shuffle¶
shuffle: Boolean
shuffle
is always true
.
shuffle
is part of the RepartitionOperation abstraction.