Dynamic Partition Pruning
Dynamic Partition Pruning (DPP) is an optimization of batch JOIN queries over partitioned tables that use partition columns in a join condition. The idea is to push the filter conditions from the other side of the join down to the large fact table, so that only the matching partitions are read and fewer rows have to be scanned.
The best results are expected in JOIN queries between a large fact table and a much smaller dimension table (star-schema queries).
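The idea can be illustrated with a toy model in plain Python. This is only a sketch of the concept, not Spark's implementation: the `fact_partitions` and `dim_dates` data and the `join_with_pruning` function are hypothetical names made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical fact table, stored as partitions keyed by the partition column (date_id).
fact_partitions: Dict[int, List[dict]] = {
    1: [{"date_id": 1, "amount": 10}, {"date_id": 1, "amount": 20}],
    2: [{"date_id": 2, "amount": 30}],
    3: [{"date_id": 3, "amount": 40}],
}

# Hypothetical (much smaller) dimension table.
dim_dates: List[dict] = [
    {"date_id": 1, "year": 2023},
    {"date_id": 2, "year": 2024},
    {"date_id": 3, "year": 2024},
]


def join_with_pruning(year: int) -> Tuple[List[int], List[dict]]:
    """Join fact and dimension rows for `year`, scanning only the needed partitions."""
    # Step 1: evaluate the selective filter on the small dimension table first.
    keys = {d["date_id"] for d in dim_dates if d["year"] == year}
    # Step 2: use the surviving join keys to skip fact partitions entirely.
    scanned = [k for k in fact_partitions if k in keys]
    # Step 3: read rows only from the partitions that were not pruned.
    rows = [row for k in scanned for row in fact_partitions[k]]
    return scanned, rows


scanned, rows = join_with_pruning(2024)
```

In this sketch, the filter on `year` is known only when the query runs, yet it still decides which fact partitions are touched. That is the "dynamic" part of the optimization.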
Dynamic Partition Pruning is applied to a query at the logical optimization phase using the PartitionPruning and CleanupDynamicPruningFilters optimization rules.
Dynamic Partition Pruning optimization is controlled by the spark.sql.optimizer.dynamicPartitionPruning.enabled configuration property.
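As a minimal sketch, the property could be toggled at runtime from PySpark as shown below. The helper name `set_dynamic_partition_pruning` is hypothetical, and `spark` is assumed to be an existing `SparkSession`; only the standard `spark.conf.set` and `spark.conf.get` calls are used.

```python
DPP_ENABLED_KEY = "spark.sql.optimizer.dynamicPartitionPruning.enabled"


def set_dynamic_partition_pruning(spark, enabled: bool) -> str:
    """Toggle DPP on the given session and return the value now in effect."""
    # The flag is passed as the string "true" or "false".
    spark.conf.set(DPP_ENABLED_KEY, str(enabled).lower())
    return spark.conf.get(DPP_ENABLED_KEY)


# Usage with a real session (assumes pyspark is installed):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   set_dynamic_partition_pruning(spark, False)
```

Turning the property off and comparing the output of `explain()` for the same JOIN query is one way to check whether the optimization was applied.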
Streaming Queries
Dynamic Partition Pruning is not applied to streaming queries.
Demo
Demo: Dynamic Partition Pruning