Partitioning (Catalyst)¶
Partitioning is an abstraction of partitioning specifications that hint the Spark Physical Optimizer about the number of partitions and data distribution of the output of a physical operator.
Contract¶
Number of Partitions¶
numPartitions: Int
Used when:
PartitioningPreservingUnaryExecNodeunary physical operators are requested for the outputPartitioning- EnsureRequirements physical optimization is executed
ShuffleExchangeExecutility is used to prepareShuffleDependencyValidateRequirementsutility is used tovalidateInternalShuffledJoinphysical operators are requested for the outputPartitioning (forFullOuterjoin type)Partitioningis requested to satisfies, satisfies0
Implementations¶
BroadcastPartitioning¶
BroadcastPartitioning(
mode: BroadcastMode)
- UnspecifiedDistribution
- BroadcastDistribution with the same BroadcastMode
Created when:
BroadcastDistributionis requested to create a PartitioningBroadcastExchangeExecphysical operator is requested for the outputPartitioning
HashPartitioning¶
numPartitions: the given numPartitions
- UnspecifiedDistribution
- ClusteredDistribution with all the hashing expressions included in
clusteringexpressions
createShuffleSpec: HashShuffleSpec
KeyGroupedPartitioning¶
KeyGroupedPartitioning(
expressions: Seq[Expression],
numPartitions: Int,
partitionValues: Seq[InternalRow] = Seq.empty)
satisfies0: ClusteredDistribution
createShuffleSpec: KeyGroupedShuffleSpec
PartitioningCollection¶
PartitioningCollection(
partitionings: Seq[Partitioning])
compatibleWith: Any Partitioning that is compatible with one of the input partitionings
guarantees: Any Partitioning that is guaranteed by any of the input partitionings
numPartitions: Number of partitions of the first Partitioning in the input partitionings
satisfies: Any Distribution that is satisfied by any of the input partitionings
createShuffleSpec: ShuffleSpecCollection
RangePartitioning¶
compatibleWith: RangePartitioning when semantically equal (i.e. underlying expressions are deterministic and canonically equal)
guarantees: RangePartitioning when semantically equal (i.e. underlying expressions are deterministic and canonically equal)
numPartitions: the given numPartitions
- UnspecifiedDistribution
- OrderedDistribution with
requiredOrderingthat matches the inputordering - ClusteredDistribution with all the children of the input
orderingsemantically equal to one of theclusteringexpressions
createShuffleSpec: RangeShuffleSpec
RoundRobinPartitioning¶
RoundRobinPartitioning(
numPartitions: Int)
compatibleWith: Always false
guarantees: Always false
numPartitions: the given numPartitions
satisfies: UnspecifiedDistribution
SinglePartition¶
compatibleWith: Any Partitioning with one partition
guarantees: Any Partitioning with one partition
satisfies: Any Distribution except BroadcastDistribution
createShuffleSpec: SinglePartitionShuffleSpec
UnknownPartitioning¶
UnknownPartitioning(
numPartitions: Int)
compatibleWith: Always false
guarantees: Always false
numPartitions: the given numPartitions
satisfies: UnspecifiedDistribution
Satisfying Distribution¶
satisfies(
required: Distribution): Boolean
satisfies is true when all the following hold:
- The optional required number of partitions of the given Distribution is the number of partitions of this
Partitioning - satisfies0 holds
Final Method
satisfies is a Scala final method and may not be overridden in subclasses.
Learn more in the Scala Language Specification.
satisfies is used when:
- RemoveRedundantSorts physical optimization is executed
- EnsureRequirements physical optimization is executed
- AdaptiveSparkPlanExec leaf physical operator is executed
satisfies0¶
satisfies0(
required: Distribution): Boolean
satisfies0 is true when either holds:
- The given Distribution is a UnspecifiedDistribution
- The given Distribution is a AllTuples and the number of partitions is
1.
Note
satisfies0 can be overriden by subclasses if needed (to influence the final satisfies).
createShuffleSpec¶
createShuffleSpec(
distribution: ClusteredDistribution): ShuffleSpec
createShuffleSpec gives a ShuffleSpec.
createShuffleSpec throws an IllegalStateException by default (and is supposed to be overriden by subclasses if needed):
Unexpected partitioning: [className]
createShuffleSpec is used when:
- EnsureRequirements physical optimization is executed
ValidateRequirementsis requested tovalidate(for OptimizeSkewedJoin physical optimization and AdaptiveSparkPlanExec physical operator)