Skip to content

Partitioning (Catalyst)

Partitioning is an abstraction of partitioning specifications that hint the Spark Physical Optimizer about the number of partitions and data distribution of the output of a physical operator.

Contract

Number of Partitions

numPartitions: Int

Used when:

Implementations

BroadcastPartitioning

BroadcastPartitioning(
  mode: BroadcastMode)

numPartitions: 1

satisfies:

Created when:

HashPartitioning

HashPartitioning

numPartitions: the given numPartitions

satisfies:

createShuffleSpec: HashShuffleSpec

KeyGroupedPartitioning

KeyGroupedPartitioning(
  expressions: Seq[Expression],
  numPartitions: Int,
  partitionValues: Seq[InternalRow] = Seq.empty)

satisfies0: ClusteredDistribution

createShuffleSpec: KeyGroupedShuffleSpec

PartitioningCollection

PartitioningCollection(
  partitionings: Seq[Partitioning])

compatibleWith: Any Partitioning that is compatible with one of the input partitionings

guarantees: Any Partitioning that is guaranteed by any of the input partitionings

numPartitions: Number of partitions of the first Partitioning in the input partitionings

satisfies: Any Distribution that is satisfied by any of the input partitionings

createShuffleSpec: ShuffleSpecCollection

RangePartitioning

compatibleWith: RangePartitioning when semantically equal (i.e. underlying expressions are deterministic and canonically equal)

guarantees: RangePartitioning when semantically equal (i.e. underlying expressions are deterministic and canonically equal)

numPartitions: the given numPartitions

satisfies:

createShuffleSpec: RangeShuffleSpec

RoundRobinPartitioning

RoundRobinPartitioning(
  numPartitions: Int)

compatibleWith: Always false

guarantees: Always false

numPartitions: the given numPartitions

satisfies: UnspecifiedDistribution

SinglePartition

compatibleWith: Any Partitioning with one partition

guarantees: Any Partitioning with one partition

numPartitions: 1

satisfies: Any Distribution except BroadcastDistribution

createShuffleSpec: SinglePartitionShuffleSpec

UnknownPartitioning

UnknownPartitioning(
  numPartitions: Int)

compatibleWith: Always false

guarantees: Always false

numPartitions: the given numPartitions

satisfies: UnspecifiedDistribution

Satisfying Distribution

satisfies(
  required: Distribution): Boolean

satisfies is true when all the following hold:

  1. The optional required number of partitions of the given Distribution is the number of partitions of this Partitioning
  2. satisfies0 holds
Final Method

satisfies is a Scala final method and may not be overridden in subclasses.

Learn more in the Scala Language Specification.

satisfies is used when:

satisfies0

satisfies0(
  required: Distribution): Boolean

satisfies0 is true when either holds:

Note

satisfies0 can be overriden by subclasses if needed (to influence the final satisfies).

createShuffleSpec

createShuffleSpec(
  distribution: ClusteredDistribution): ShuffleSpec

createShuffleSpec gives a ShuffleSpec.

createShuffleSpec throws an IllegalStateException by default (and is supposed to be overriden by subclasses if needed):

Unexpected partitioning: [className]

createShuffleSpec is used when: