RepartitionByExpression Logical Operator¶
RepartitionByExpression is a concrete RepartitionOperation.
RepartitionByExpression is also called distribute operator.
Creating Instance¶
RepartitionByExpression takes the following to be created:
- Partition Expressions
- Child LogicalPlan
- Number of partitions
RepartitionByExpression is created when:
- Dataset.repartition and Dataset.repartitionByRange operators
COALESCE,REPARTITIONandREPARTITION_BY_RANGEhints (via ResolveCoalesceHints logical analysis rule)DISTRIBUTE BYandCLUSTER BYSQL clauses (via SparkSqlAstBuilder)
Query Planning¶
RepartitionByExpression is planned to ShuffleExchangeExec physical operator.
Catalyst DSL¶
Catalyst DSL defines distribute operator to create RepartitionByExpression logical operators.
Partitioning¶
RepartitionByExpression determines a Partitioning when created.
Maximum Number of Rows¶
maxRows: Option[Long]
maxRows simply requests the child logical operator for the maximum number of rows.
maxRows is part of the LogicalPlan abstraction.
shuffle¶
shuffle: Boolean
shuffle is always true.
shuffle is part of the RepartitionOperation abstraction.