Dynamic Partition Pruning

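Dynamic Partition Pruning (DPP) is an optimization of batch JOIN queries over partitioned tables whose partition columns appear in the join condition. The idea is to push filter conditions down to the large fact table, so that the filter on the other side of the join is used at runtime to skip fact table partitions that cannot match, which reduces the number of rows to scan.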
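The following Python sketch is a conceptual model only, not Spark code. It assumes a hypothetical fact table stored as partitions keyed by date_id and a small hypothetical dimension table with an is_holiday flag; partition_by and join_with_dpp are illustrative names. It shows the core idea: evaluate the dimension-side filter first, collect the join keys that survive it, and scan only the fact partitions whose partition value is among those keys.

```python
from collections import defaultdict

# Hypothetical fact table rows: (date_id, amount); partitioned by date_id.
fact_rows = [(d, amt) for d in range(1, 11) for amt in (10, 20, 30)]

# Hypothetical dimension table rows: (date_id, is_holiday).
dim_rows = [(d, d in (3, 7)) for d in range(1, 11)]


def partition_by(rows, key_index):
    """Group rows into partitions keyed by the partition column."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key_index]].append(row)
    return dict(parts)


def join_with_dpp(fact_parts, dim_rows, dim_filter):
    """Join fact partitions with a filtered dimension, reading only matching partitions."""
    # 1. Evaluate the selective filter on the small dimension table.
    dim_matches = [r for r in dim_rows if dim_filter(r)]
    # 2. Collect the join-key values that survive the filter (the pruning set).
    pruning_keys = {r[0] for r in dim_matches}
    # 3. Scan only the fact partitions whose partition value is in the pruning set.
    scanned = [row for key, rows in fact_parts.items()
               if key in pruning_keys for row in rows]
    # 4. Join the (much smaller) scanned fact rows with the filtered dimension rows.
    dim_by_key = {r[0]: r for r in dim_matches}
    joined = [f + dim_by_key[f[0]][1:] for f in scanned]
    return joined, len(scanned)


fact_parts = partition_by(fact_rows, 0)
joined, scanned = join_with_dpp(fact_parts, dim_rows, lambda r: r[1])
print(scanned, "of", len(fact_rows), "fact rows scanned")  # 6 of 30 fact rows scanned
```

Without pruning, all 30 fact rows would be read before the join; in this model only the 6 rows in partitions 3 and 7 are scanned.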

The best results are expected in JOIN queries between a large fact table and a much smaller dimension table (star-schema queries).

Dynamic Partition Pruning is applied to a query during the logical optimization phase by the PartitionPruning and CleanupDynamicPruningFilters optimization rules.

Dynamic Partition Pruning is controlled by the spark.sql.optimizer.dynamicPartitionPruning.enabled configuration property (enabled by default).
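As a rough illustration, the Python sketch below models in simplified form when a dynamic pruning filter may be inserted: the configuration property is enabled, the query is not streaming, the join key of the pruned side is one of its partition columns, and the other side of the join has a filter. JoinSide and may_apply_dpp are hypothetical names, not Spark's API; the actual PartitionPruning rule also takes into account the estimated pruning benefit and broadcast exchange reuse.

```python
from dataclasses import dataclass

DPP_FLAG = "spark.sql.optimizer.dynamicPartitionPruning.enabled"


@dataclass
class JoinSide:
    """Hypothetical, simplified description of one side of an equi-join."""
    join_key: str
    partition_columns: frozenset = frozenset()
    has_filter: bool = False


def may_apply_dpp(conf, is_streaming, pruned_side, other_side):
    """Toy applicability check; a simplification of the real optimizer rule."""
    # Switched off by the configuration property (modelled default: "true").
    if conf.get(DPP_FLAG, "true").lower() != "true":
        return False
    # Dynamic Partition Pruning is not applied to streaming queries.
    if is_streaming:
        return False
    # The join key on the pruned side must be one of its partition columns...
    if pruned_side.join_key not in pruned_side.partition_columns:
        return False
    # ...and the other side needs a filter that keeps the pruning set small.
    return other_side.has_filter


fact = JoinSide("date_id", frozenset({"date_id"}))
dim = JoinSide("date_id", has_filter=True)
print(may_apply_dpp({}, False, fact, dim))                  # True
print(may_apply_dpp({DPP_FLAG: "false"}, False, fact, dim))  # False
print(may_apply_dpp({}, True, fact, dim))                    # False
```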

Streaming Queries

Dynamic Partition Pruning is not applied to streaming queries.

Demo

Demo: Dynamic Partition Pruning
