Skip to content

RangePartitionIdRewrite Optimization Rule

RangePartitionIdRewrite is an optimization rule (Rule[LogicalPlan]) to rewrite RangePartitionIds to PartitionerExprs.

Creating Instance

RangePartitionIdRewrite takes the following to be created:

RangePartitionIdRewrite is created when:

sampleSizePerPartition Hint

RangePartitionIdRewrite uses spark.sql.execution.rangeExchange.sampleSizePerPartition (Spark SQL) configuration property for samplePointsPerPartitionHint for a RangePartitioner (Spark Core) when executed.

Executing Rule

  plan: LogicalPlan): LogicalPlan

apply is part of the Rule (Spark SQL) abstraction.

apply transforms UnaryNodes with RangePartitionId unary expressions in the given LogicalPlan (from the children first and up).

For every RangePartitionId, apply creates a new logical query plan for sampling (based on the child expression of the RangePartitionId).

import org.apache.spark.sql.functions.lit
val expr = lit(5).expr

import org.apache.spark.sql.catalyst.expressions.Alias
val aliasedExpr = Alias(expr, "__RPI_child_col__")()

apply changes the current call site to the following:

RangePartitionId([childExpr], [numPartitions]) sampling

Call Site

A call site is used in web UI and SparkListeners for all jobs submitted from the current and child threads.

apply creates a RangePartitioner (Spark Core) (with the number of partition of the RangePartitionId, the RDD of the query plan for sampling, ascending flag enabled, and the sampleSizePerPartition hint).

In the end, apply creates a PartitionerExpr with the child expression (of the RangePartitionId) and the RangePartitioner.