spark.sql.adaptive Configuration Properties¶
spark.sql.adaptive is a family of configuration properties for Adaptive Query Execution.
advisoryPartitionSizeInBytes¶
spark.sql.adaptive.advisoryPartitionSizeInBytes
The advisory size in bytes of shuffle partitions during adaptive optimization (when spark.sql.adaptive.enabled is enabled). It takes effect when Spark coalesces small shuffle partitions or splits skewed shuffle partitions.
Default: 64MB
Fallback Property: spark.sql.adaptive.shuffle.targetPostShuffleInputSize
Use SQLConf.ADVISORY_PARTITION_SIZE_IN_BYTES to reference the name.
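For example, a minimal sketch of tuning the advisory size at runtime (the 128m value is illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("aqe-advisory-size-demo")
  .master("local[*]")
  .getOrCreate()

// AQE must be enabled for the advisory size to take effect.
spark.conf.set("spark.sql.adaptive.enabled", true)
// Ask AQE to aim for ~128MB shuffle partitions when coalescing or splitting.
spark.conf.set("spark.sql.adaptive.advisoryPartitionSizeInBytes", "128m")
```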
autoBroadcastJoinThreshold¶
spark.sql.adaptive.autoBroadcastJoinThreshold
The maximum size (in bytes) of a table to be broadcast when performing a join. -1 turns broadcasting off. The default value is the same as spark.sql.autoBroadcastJoinThreshold.
Used only in Adaptive Query Execution
Default: (undefined)
Available as SQLConf.ADAPTIVE_AUTO_BROADCASTJOIN_THRESHOLD.
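A minimal sketch (assuming the spark session from above; the 10m value is illustrative):

```scala
// Let AQE broadcast tables up to ~10MB based on runtime statistics,
// independently of the static spark.sql.autoBroadcastJoinThreshold.
spark.conf.set("spark.sql.adaptive.autoBroadcastJoinThreshold", "10m")
// Alternatively, -1 turns AQE-time broadcast conversion off.
spark.conf.set("spark.sql.adaptive.autoBroadcastJoinThreshold", -1)
```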
coalescePartitions.enabled¶
spark.sql.adaptive.coalescePartitions.enabled
Controls coalescing shuffle partitions
When true and spark.sql.adaptive.enabled is enabled, Spark will coalesce contiguous shuffle partitions according to the target size (specified by spark.sql.adaptive.advisoryPartitionSizeInBytes), to avoid too many small tasks.
Default: true
Use SQLConf.coalesceShufflePartitionsEnabled method to access the current value.
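A sketch of coalescing in action (assuming the spark session from above; with such tiny data, the default 200 post-shuffle partitions typically collapse to a handful):

```scala
import org.apache.spark.sql.functions.col

spark.conf.set("spark.sql.adaptive.enabled", true)
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", true)

// A tiny aggregation would normally produce spark.sql.shuffle.partitions
// (200 by default) post-shuffle partitions; coalescing merges most of them.
val counts = spark.range(0, 1000).groupBy(col("id") % 10).count()
counts.collect()
// Inspect the final adaptive plan (or the SQL tab of the web UI)
// to see the coalesced partition count.
counts.explain()
```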
coalescePartitions.minPartitionSize¶
spark.sql.adaptive.coalescePartitions.minPartitionSize
The minimum size (in bytes) of shuffle partitions after coalescing. Useful when the adaptively calculated target size is too small during partition coalescing.
Default: 1MB
Must be positive
Use SQLConf.COALESCE_PARTITIONS_MIN_PARTITION_SIZE to reference the property.
Used when:
- CoalesceShufflePartitions adaptive physical optimization is executed
coalescePartitions.initialPartitionNum¶
spark.sql.adaptive.coalescePartitions.initialPartitionNum
The initial number of shuffle partitions before coalescing.
By default it equals spark.sql.shuffle.partitions. This configuration only has an effect when spark.sql.adaptive.enabled and spark.sql.adaptive.coalescePartitions.enabled are both enabled.
Default: (undefined)
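A sketch of starting with a deliberately high partition count and letting coalescing shrink it (1000 is illustrative):

```scala
// Begin the shuffle with many partitions; AQE then coalesces them down
// toward the spark.sql.adaptive.advisoryPartitionSizeInBytes target.
spark.conf.set("spark.sql.adaptive.coalescePartitions.initialPartitionNum", 1000)
```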
coalescePartitions.parallelismFirst¶
spark.sql.adaptive.coalescePartitions.parallelismFirst
When true, Spark does not respect the target size specified by spark.sql.adaptive.advisoryPartitionSizeInBytes when coalescing contiguous shuffle partitions, but adaptively calculates the target size according to the default parallelism of the Spark cluster. The calculated size is usually smaller than the configured target size. This is to maximize parallelism and avoid performance regression when enabling adaptive query execution. It is recommended to set this config to false so that the configured target size is respected.
Default: true
Use SQLConf.COALESCE_PARTITIONS_PARALLELISM_FIRST to reference the property.
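Following the recommendation above, a sketch that makes coalescing honor the configured target size (the 64m value is illustrative):

```scala
// Respect the configured advisory size instead of maximizing parallelism
// from the cluster's default parallelism.
spark.conf.set("spark.sql.adaptive.coalescePartitions.parallelismFirst", false)
spark.conf.set("spark.sql.adaptive.advisoryPartitionSizeInBytes", "64m")
```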
customCostEvaluatorClass¶
spark.sql.adaptive.customCostEvaluatorClass
The fully-qualified class name of the CostEvaluator in Adaptive Query Execution
Default: SimpleCostEvaluator
Use SQLConf.ADAPTIVE_CUSTOM_COST_EVALUATOR_CLASS to access the property (in a type-safe way).
Used when:
AdaptiveSparkPlanExec physical operator is requested for the AQE cost evaluator
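A minimal sketch of a custom evaluator, assuming the CostEvaluator and Cost contract of the org.apache.spark.sql.execution.adaptive package; the shuffle-counting heuristic is illustrative, not how SimpleCostEvaluator scores plans:

```scala
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.adaptive.{Cost, CostEvaluator, SimpleCost}
import org.apache.spark.sql.execution.exchange.ShuffleExchangeLike

// Scores a physical plan by its number of shuffle exchanges
// (fewer shuffles = cheaper plan).
class ShuffleCountCostEvaluator extends CostEvaluator {
  override def evaluateCost(plan: SparkPlan): Cost =
    SimpleCost(plan.collect { case s: ShuffleExchangeLike => s }.size)
}

// Registered by its fully-qualified class name:
// spark.conf.set("spark.sql.adaptive.customCostEvaluatorClass",
//   classOf[ShuffleCountCostEvaluator].getName)
```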
enabled¶
spark.sql.adaptive.enabled
Enables Adaptive Query Execution
Default: true
Use SQLConf.adaptiveExecutionEnabled method to access the current value.
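A quick way to see AQE in action (a sketch; the exact plan output varies by Spark version):

```scala
spark.conf.set("spark.sql.adaptive.enabled", true)
val df = spark.range(0, 1000).repartition(10).groupBy("id").count()
// With AQE on, the physical plan is wrapped in AdaptiveSparkPlanExec and
// re-optimized as stages finish (isFinalPlan=true once execution completes).
df.explain()
```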
fetchShuffleBlocksInBatch¶
spark.sql.adaptive.fetchShuffleBlocksInBatch
(internal) Whether to fetch contiguous shuffle blocks in batch. Instead of fetching blocks one by one, fetching contiguous shuffle blocks of the same map task in batch can reduce IO and improve performance. Note that multiple contiguous blocks in a single fetch request only happen when spark.sql.adaptive.enabled and spark.sql.adaptive.coalescePartitions.enabled are both enabled. This feature also depends on a relocatable serializer, a compression codec that supports concatenation, and the new-version shuffle fetch protocol.
Default: true
Use SQLConf.fetchShuffleBlocksInBatch method to access the current value.
forceApply¶
spark.sql.adaptive.forceApply
(internal) When true (together with spark.sql.adaptive.enabled enabled), Spark will force apply adaptive query execution for all supported queries.
Default: false
Use SQLConf.ADAPTIVE_EXECUTION_FORCE_APPLY method to access the property (in a type-safe way).
forceOptimizeSkewedJoin¶
spark.sql.adaptive.forceOptimizeSkewedJoin
Enables OptimizeSkewedJoin physical optimization to be executed even if it introduces extra shuffle
Default: false
Requires spark.sql.adaptive.skewJoin.enabled to be enabled
Use SQLConf.ADAPTIVE_FORCE_OPTIMIZE_SKEWED_JOIN to access the property (in a type-safe way).
Used when:
- AdaptiveSparkPlanExec physical operator is requested for the AQE cost evaluator (and creates a SimpleCostEvaluator)
- OptimizeSkewedJoin physical optimization is executed
localShuffleReader.enabled¶
spark.sql.adaptive.localShuffleReader.enabled
When true (and spark.sql.adaptive.enabled is true), Spark SQL tries to use a local shuffle reader to read shuffle data when shuffle partitioning is not needed, for example after converting sort-merge join to broadcast-hash join.
Default: true
Use SQLConf.LOCAL_SHUFFLE_READER_ENABLED to access the property (in a type-safe way)
logLevel¶
spark.sql.adaptive.logLevel
(internal) Log level for adaptive execution logging of plan changes. The value can be TRACE, DEBUG, INFO, WARN or ERROR.
Default: DEBUG
Use SQLConf.adaptiveExecutionLogLevel for the current value
maxShuffledHashJoinLocalMapThreshold¶
spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold
The maximum size (in bytes) per partition that can be allowed to build a local hash map. If this value is not smaller than spark.sql.adaptive.advisoryPartitionSizeInBytes and all partition sizes are not larger than this config, join selection prefers shuffled hash join over sort-merge join regardless of the value of spark.sql.join.preferSortMergeJoin.
Default: 0
Available as SQLConf.ADAPTIVE_MAX_SHUFFLE_HASH_JOIN_LOCAL_MAP_THRESHOLD
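A sketch honoring the constraint above (values illustrative): keep the threshold at least as large as the advisory size so shuffled hash join can be preferred:

```scala
spark.conf.set("spark.sql.adaptive.advisoryPartitionSizeInBytes", "64m")
// Not smaller than the advisory size, so join selection can prefer
// shuffled hash join when every partition fits under this limit.
spark.conf.set("spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold", "64m")
```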
nonEmptyPartitionRatioForBroadcastJoin¶
spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin
(internal) A relation with a non-empty partition ratio (the number of non-empty partitions to all partitions) lower than this config will not be considered as the build side of a broadcast-hash join in Adaptive Query Execution regardless of the size.
Effective with spark.sql.adaptive.enabled true
Default: 0.2
Use SQLConf.nonEmptyPartitionRatioForBroadcastJoin method to access the current value.
optimizer.excludedRules¶
spark.sql.adaptive.optimizer.excludedRules
A comma-separated list of rules (names) to be disabled (excluded) in the AQE Logical Optimizer
Default: (undefined)
Use SQLConf.ADAPTIVE_OPTIMIZER_EXCLUDED_RULES to reference the property.
Used when:
AQEOptimizer is requested for the batches
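A sketch of the comma-separated format (the rule name below is a hypothetical placeholder; use the fully-qualified names of the AQE optimizer rules to disable):

```scala
// Hypothetical rule name, for illustration only.
spark.conf.set(
  "spark.sql.adaptive.optimizer.excludedRules",
  "org.apache.spark.sql.example.SomeAqeRule")
```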
optimizeSkewsInRebalancePartitions.enabled¶
spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled
When true and spark.sql.adaptive.enabled is true, Spark SQL optimizes skewed shuffle partitions in RebalancePartitions and splits them into smaller ones according to the target size (specified by spark.sql.adaptive.advisoryPartitionSizeInBytes) to avoid data skew.
Default: true
Use SQLConf.ADAPTIVE_OPTIMIZE_SKEWS_IN_REBALANCE_PARTITIONS_ENABLED method to access the property (in a type-safe way)
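A sketch that triggers RebalancePartitions via the REBALANCE hint (the events table is a placeholder):

```scala
spark.conf.set("spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled", true)
// REBALANCE inserts a RebalancePartitions node whose skewed output
// partitions AQE may split to the advisory target size.
val rebalanced = spark.sql("SELECT /*+ REBALANCE */ * FROM events")
```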
skewJoin.enabled¶
spark.sql.adaptive.skewJoin.enabled
When true and spark.sql.adaptive.enabled is enabled, Spark dynamically handles skew in sort-merge join by splitting (and replicating if needed) skewed partitions.
Default: true
Use SQLConf.SKEW_JOIN_ENABLED to reference the property.
skewJoin.skewedPartitionFactor¶
spark.sql.adaptive.skewJoin.skewedPartitionFactor
A partition is considered skewed if its size is larger than this factor multiplied by the median partition size and also larger than spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes.
Default: 5
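A worked example of the skew test, combining this factor with the byte threshold below (sizes are illustrative; not Spark source code):

```scala
// A partition is skewed only when BOTH conditions hold:
//   size > skewedPartitionFactor * medianSize
//   size > skewedPartitionThresholdInBytes
val medianSize = 100L << 20  // 100MB median partition size
val factor     = 5           // skewedPartitionFactor
val threshold  = 256L << 20  // skewedPartitionThresholdInBytes
def isSkewed(size: Long): Boolean =
  size > factor * medianSize && size > threshold
assert(isSkewed(600L << 20))   // 600MB > 500MB and > 256MB: skewed
assert(!isSkewed(300L << 20))  // 300MB < 5 * 100MB: not skewed
```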
skewJoin.skewedPartitionThresholdInBytes¶
spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes
A partition is considered skewed if its size in bytes is larger than this threshold and also larger than spark.sql.adaptive.skewJoin.skewedPartitionFactor multiplied by the median partition size. Ideally this config should be set larger than spark.sql.adaptive.advisoryPartitionSizeInBytes.
Default: 256MB
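Following the guidance above, a sketch that keeps the skew threshold above the advisory size (values illustrative):

```scala
spark.conf.set("spark.sql.adaptive.advisoryPartitionSizeInBytes", "64m")
// Keep the skew threshold well above the advisory size so only
// genuinely oversized partitions are split.
spark.conf.set("spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes", "256m")
```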