Partitioning
Partitioning is an abstraction of the output data partitioning requirements (data distribution) of a Spark SQL connector.
Note
This Partitioning interface for Spark SQL developers mimics the internal Catalyst Partitioning, into which it is converted with the help of DataSourcePartitioning.
Contract
Number of Partitions
int numPartitions()
Used when:
- DataSourcePartitioning is requested for the number of partitions
Satisfying Distribution
boolean satisfy(Distribution distribution)
Used when:
- DataSourcePartitioning is asked whether it satisfies a given data distribution
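The two contract methods can be illustrated with a minimal Java sketch. It assumes simplified stand-ins for the connector's Partitioning, Distribution and ClusteredDistribution types (they are not the real Spark API) and a hypothetical SinglePartitioning connector class that puts all of its output into one partition. DataSourcePartitioningSketch is likewise hypothetical: it only mimics how DataSourcePartitioning delegates both questions to the connector and is not the real class.

```java
// Hypothetical, simplified stand-ins for the connector types
// (not the real org.apache.spark.sql.connector API).
interface Distribution {}

// A required distribution that clusters rows by the named columns.
final class ClusteredDistribution implements Distribution {
  final String[] clusteredColumns;

  ClusteredDistribution(String... clusteredColumns) {
    this.clusteredColumns = clusteredColumns;
  }
}

interface Partitioning {
  int numPartitions();

  boolean satisfy(Distribution distribution);
}

// Hypothetical connector that writes all of its output into one partition.
final class SinglePartitioning implements Partitioning {
  @Override
  public int numPartitions() {
    return 1;
  }

  // With a single partition, rows that share clustering values are
  // trivially co-located, so any clustered distribution is satisfied.
  @Override
  public boolean satisfy(Distribution distribution) {
    return distribution instanceof ClusteredDistribution;
  }
}

// Hypothetical wrapper standing in for DataSourcePartitioning:
// it answers both questions by delegating to the connector's Partitioning.
final class DataSourcePartitioningSketch {
  private final Partitioning partitioning;

  DataSourcePartitioningSketch(Partitioning partitioning) {
    this.partitioning = partitioning;
  }

  // "requested for the number of partitions"
  int numPartitions() {
    return partitioning.numPartitions();
  }

  // "asked whether it satisfies a given data distribution"
  boolean satisfies(String... requiredColumns) {
    return partitioning.satisfy(new ClusteredDistribution(requiredColumns));
  }
}
```

The wrapper holds no partitioning logic of its own, which reflects the split described above: the connector decides, and the Catalyst-facing side only asks.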
Implementations
- KeyGroupedPartitioning
- UnknownPartitioning
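The difference between the two implementations can be sketched in Java as follows. This is a simplified illustration, not the Spark classes: Partitioning, Distribution and ClusteredDistribution are local stand-ins, keys are plain column names rather than connector expressions, and the class names KeyGroupedPartitioningSketch and UnknownPartitioningSketch are hypothetical. The key-grouped variant assumes that all rows with equal key values land in the same partition, so it satisfies any clustering whose columns include all of its keys; the unknown variant reports only a partition count and satisfies nothing.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-ins (not the real Spark connector API).
interface Distribution {}

final class ClusteredDistribution implements Distribution {
  final String[] clusteredColumns;

  ClusteredDistribution(String... clusteredColumns) {
    this.clusteredColumns = clusteredColumns;
  }
}

interface Partitioning {
  int numPartitions();

  boolean satisfy(Distribution distribution);
}

// Rows are grouped by the given keys: equal key values share a partition.
final class KeyGroupedPartitioningSketch implements Partitioning {
  private final String[] keys;
  private final int numPartitions;

  KeyGroupedPartitioningSketch(String[] keys, int numPartitions) {
    this.keys = keys.clone();
    this.numPartitions = numPartitions;
  }

  String[] keys() {
    return keys.clone();
  }

  @Override
  public int numPartitions() {
    return numPartitions;
  }

  // Rows that agree on every required column also agree on the keys
  // (a subset of those columns), so they are already co-located.
  // A layout with no keys says nothing, hence the length check.
  @Override
  public boolean satisfy(Distribution distribution) {
    if (!(distribution instanceof ClusteredDistribution) || keys.length == 0) {
      return false;
    }
    Set<String> required =
        new HashSet<>(Arrays.asList(((ClusteredDistribution) distribution).clusteredColumns));
    return required.containsAll(Arrays.asList(keys));
  }
}

// Nothing is known about how rows are spread, so no distribution is satisfied.
final class UnknownPartitioningSketch implements Partitioning {
  private final int numPartitions;

  UnknownPartitioningSketch(int numPartitions) {
    this.numPartitions = numPartitions;
  }

  @Override
  public int numPartitions() {
    return numPartitions;
  }

  @Override
  public boolean satisfy(Distribution distribution) {
    return false;
  }
}
```

For example, a source grouped by region with 8 partitions satisfies a clustering by region and day, but not a clustering by day alone, because rows with the same day may still sit in different region groups.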