Partitioning (Catalyst)¶
Partitioning
is an abstraction of partitioning specifications that hint the Spark Physical Optimizer about the number of partitions and data distribution of the output of a physical operator.
Contract¶
Number of Partitions¶
numPartitions: Int
Used when:
PartitioningPreservingUnaryExecNode
unary physical operators are requested for the outputPartitioning- EnsureRequirements physical optimization is executed
ShuffleExchangeExec
utility is used to prepareShuffleDependencyValidateRequirements
utility is used tovalidateInternal
ShuffledJoin
physical operators are requested for the outputPartitioning (forFullOuter
join type)Partitioning
is requested to satisfies, satisfies0
Implementations¶
BroadcastPartitioning¶
BroadcastPartitioning(
mode: BroadcastMode)
- UnspecifiedDistribution
- BroadcastDistribution with the same BroadcastMode
Created when:
BroadcastDistribution
is requested to create a PartitioningBroadcastExchangeExec
physical operator is requested for the outputPartitioning
HashPartitioning¶
numPartitions: the given numPartitions
- UnspecifiedDistribution
- ClusteredDistribution with all the hashing expressions included in
clustering
expressions
createShuffleSpec: HashShuffleSpec
KeyGroupedPartitioning¶
KeyGroupedPartitioning(
expressions: Seq[Expression],
numPartitions: Int,
partitionValues: Seq[InternalRow] = Seq.empty)
satisfies0: ClusteredDistribution
createShuffleSpec: KeyGroupedShuffleSpec
PartitioningCollection¶
PartitioningCollection(
partitionings: Seq[Partitioning])
compatibleWith: Any Partitioning
that is compatible with one of the input partitionings
guarantees: Any Partitioning
that is guaranteed by any of the input partitionings
numPartitions: Number of partitions of the first Partitioning
in the input partitionings
satisfies: Any Distribution
that is satisfied by any of the input partitionings
createShuffleSpec: ShuffleSpecCollection
RangePartitioning¶
compatibleWith: RangePartitioning
when semantically equal (i.e. underlying expressions are deterministic and canonically equal)
guarantees: RangePartitioning
when semantically equal (i.e. underlying expressions are deterministic and canonically equal)
numPartitions: the given numPartitions
- UnspecifiedDistribution
- OrderedDistribution with
requiredOrdering
that matches the inputordering
- ClusteredDistribution with all the children of the input
ordering
semantically equal to one of theclustering
expressions
createShuffleSpec: RangeShuffleSpec
RoundRobinPartitioning¶
RoundRobinPartitioning(
numPartitions: Int)
compatibleWith: Always false
guarantees: Always false
numPartitions: the given numPartitions
satisfies: UnspecifiedDistribution
SinglePartition¶
compatibleWith: Any Partitioning
with one partition
guarantees: Any Partitioning
with one partition
satisfies: Any Distribution
except BroadcastDistribution
createShuffleSpec: SinglePartitionShuffleSpec
UnknownPartitioning¶
UnknownPartitioning(
numPartitions: Int)
compatibleWith: Always false
guarantees: Always false
numPartitions: the given numPartitions
satisfies: UnspecifiedDistribution
Satisfying Distribution¶
satisfies(
required: Distribution): Boolean
satisfies
is true
when all the following hold:
- The optional required number of partitions of the given Distribution is the number of partitions of this
Partitioning
- satisfies0 holds
Final Method
satisfies
is a Scala final method and may not be overridden in subclasses.
Learn more in the Scala Language Specification.
satisfies
is used when:
- RemoveRedundantSorts physical optimization is executed
- EnsureRequirements physical optimization is executed
- AdaptiveSparkPlanExec leaf physical operator is executed
satisfies0¶
satisfies0(
required: Distribution): Boolean
satisfies0
is true
when either holds:
- The given Distribution is a UnspecifiedDistribution
- The given Distribution is a AllTuples and the number of partitions is
1
.
Note
satisfies0
can be overriden by subclasses if needed (to influence the final satisfies).
createShuffleSpec¶
createShuffleSpec(
distribution: ClusteredDistribution): ShuffleSpec
createShuffleSpec
gives a ShuffleSpec.
createShuffleSpec
throws an IllegalStateException
by default (and is supposed to be overriden by subclasses if needed):
Unexpected partitioning: [className]
createShuffleSpec
is used when:
- EnsureRequirements physical optimization is executed
ValidateRequirements
is requested tovalidate
(for OptimizeSkewedJoin physical optimization and AdaptiveSparkPlanExec physical operator)