SQLConf

SQLConf is an internal configuration store for the configuration properties and hints used in Spark SQL.

Important

SQLConf is an internal part of Spark SQL and is not supposed to be used directly. Spark SQL configuration is available through the developer-facing RuntimeConfig.
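
For example, user code can read and write the same properties through SparkSession.conf (a minimal sketch; the property and value are only illustrative):

scala> :type spark.conf
org.apache.spark.sql.RuntimeConfig

// The developer-facing way to read and tweak Spark SQL configuration
spark.conf.set("spark.sql.shuffle.partitions", 8)
assert(spark.conf.get("spark.sql.shuffle.partitions") == "8")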

SQLConf offers methods to get, set, unset and clear the values of configuration properties and hints.

Accessing SQLConf

You can access a SQLConf using:

  • SQLConf.get (preferred) - the SQLConf of the current active SparkSession

  • SessionState - direct access through the SessionState of a SparkSession of your choice (which gives more flexibility, since that SparkSession can be different from the current active SparkSession)

import org.apache.spark.sql.internal.SQLConf

// Use type-safe access to configuration properties
// using SQLConf.get.getConf
val parallelFileListingInStatsComputation = SQLConf.get.getConf(SQLConf.PARALLEL_FILE_LISTING_IN_STATS_COMPUTATION)

// or even simpler
SQLConf.get.parallelFileListingInStatsComputation
scala> :type spark
org.apache.spark.sql.SparkSession

// Direct access to the session SQLConf
val sqlConf = spark.sessionState.conf
scala> :type sqlConf
org.apache.spark.sql.internal.SQLConf

scala> println(sqlConf.offHeapColumnVectorEnabled)
false

// Or simply import the conf value
import spark.sessionState.conf

// accessing properties through accessor methods
scala> conf.numShufflePartitions
res1: Int = 200

// Prefer SQLConf.get (over direct access)
import org.apache.spark.sql.internal.SQLConf
val cc = SQLConf.get
scala> cc == conf
res4: Boolean = true

// setting properties using aliases
import org.apache.spark.sql.internal.SQLConf.SHUFFLE_PARTITIONS
conf.setConf(SHUFFLE_PARTITIONS, 2)
scala> conf.numShufflePartitions
res2: Int = 2

// unset aka reset properties to the default value
conf.unsetConf(SHUFFLE_PARTITIONS)
scala> conf.numShufflePartitions
res3: Int = 200
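
Configuration properties can also be read and written by their plain string keys (a sketch; it assumes SQLConf's setConfString, getConfString and getAllConfs methods, which complement the ConfigEntry-based setConf / getConf shown above):

// String-key access to the same properties
conf.setConfString("spark.sql.shuffle.partitions", "8")
scala> conf.getConfString("spark.sql.shuffle.partitions")
res5: String = 8

// Only explicitly-set properties are listed here
scala> conf.getAllConfs.get("spark.sql.shuffle.partitions")
res6: Option[String] = Some(8)

conf.unsetConf("spark.sql.shuffle.partitions")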

ADAPTIVE_AUTO_BROADCASTJOIN_THRESHOLD

spark.sql.adaptive.autoBroadcastJoinThreshold

Used when:

ADAPTIVE_EXECUTION_FORCE_APPLY

spark.sql.adaptive.forceApply configuration property

Used when:

adaptiveExecutionEnabled

The value of spark.sql.adaptive.enabled configuration property

Used when:

adaptiveExecutionLogLevel

The value of spark.sql.adaptive.logLevel configuration property

Used when AdaptiveSparkPlanExec physical operator is executed

ADAPTIVE_MAX_SHUFFLE_HASH_JOIN_LOCAL_MAP_THRESHOLD

spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold configuration property

Used when:

ADAPTIVE_OPTIMIZER_EXCLUDED_RULES

spark.sql.adaptive.optimizer.excludedRules

ADVISORY_PARTITION_SIZE_IN_BYTES

spark.sql.adaptive.advisoryPartitionSizeInBytes configuration property

Used when:

autoBroadcastJoinThreshold

The value of spark.sql.autoBroadcastJoinThreshold configuration property

Used when:
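
For example, automatic broadcast joins can be turned off by setting the threshold to -1 (a sketch using the public configuration API):

// -1 disables automatic broadcast joins (the default threshold is 10 MB)
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)

import org.apache.spark.sql.internal.SQLConf
assert(SQLConf.get.autoBroadcastJoinThreshold == -1)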

autoBucketedScanEnabled

The value of spark.sql.sources.bucketing.autoBucketedScan.enabled configuration property

Used when:

allowStarWithSingleTableIdentifierInCount

spark.sql.legacy.allowStarWithSingleTableIdentifierInCount

Used when:

  • ResolveReferences logical resolution rule is executed

arrowPySparkSelfDestructEnabled

spark.sql.execution.arrow.pyspark.selfDestruct.enabled

Used when:

  • PandasConversionMixin is requested to toPandas

allowAutoGeneratedAliasForView

spark.sql.legacy.allowAutoGeneratedAliasForView

Used when:

  • ViewHelper utility is used to verifyAutoGeneratedAliasesNotExists

allowNonEmptyLocationInCTAS

spark.sql.legacy.allowNonEmptyLocationInCTAS

Used when:

ADAPTIVE_OPTIMIZE_SKEWS_IN_REBALANCE_PARTITIONS_ENABLED

spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled

Used when:

  • OptimizeSkewInRebalancePartitions physical optimization is executed

ADAPTIVE_CUSTOM_COST_EVALUATOR_CLASS

spark.sql.adaptive.customCostEvaluatorClass

autoSizeUpdateEnabled

The value of spark.sql.statistics.size.autoUpdate.enabled configuration property

Used when:

avroCompressionCodec

The value of spark.sql.avro.compression.codec configuration property

Used when AvroOptions is requested for the compression configuration property (and it was not set explicitly)

broadcastTimeout

The value of spark.sql.broadcastTimeout configuration property

Used in BroadcastExchangeExec (for broadcasting a table to executors)

bucketingEnabled

The value of spark.sql.sources.bucketing.enabled configuration property

Used when FileSourceScanExec physical operator is requested for the input RDD and to determine output partitioning and ordering

cacheVectorizedReaderEnabled

The value of spark.sql.inMemoryColumnarStorage.enableVectorizedReader configuration property

Used when InMemoryTableScanExec physical operator is requested for the supportsBatch flag.

CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING

spark.sql.optimizer.canChangeCachedPlanOutputPartitioning

Used when:

caseSensitiveAnalysis

The value of spark.sql.caseSensitive configuration property
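
A minimal sketch of the effect on column resolution (column names are illustrative):

// Case-insensitive (default): `id` resolves against a column named `ID`
spark.conf.set("spark.sql.caseSensitive", false)
spark.range(1).toDF("ID").select("id").show()

// Case-sensitive: the same select fails with an AnalysisException
spark.conf.set("spark.sql.caseSensitive", true)
// spark.range(1).toDF("ID").select("id")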

cboEnabled

The value of spark.sql.cbo.enabled configuration property

Used in:

cliPrintHeader

spark.sql.cli.print.header

Used when:

  • SparkSQLCLIDriver is requested to processCmd

coalesceBucketsInJoinEnabled

The value of spark.sql.bucketing.coalesceBucketsInJoin.enabled configuration property

Used when:

COALESCE_PARTITIONS_MIN_PARTITION_SIZE

spark.sql.adaptive.coalescePartitions.minPartitionSize configuration property

Used when:

COALESCE_PARTITIONS_PARALLELISM_FIRST

spark.sql.adaptive.coalescePartitions.parallelismFirst configuration property

Used when:

coalesceShufflePartitionsEnabled

The value of spark.sql.adaptive.coalescePartitions.enabled configuration property

Used when:

codegenCacheMaxEntries

spark.sql.codegen.cache.maxEntries

columnBatchSize

The value of spark.sql.inMemoryColumnarStorage.batchSize configuration property

Used when:

constraintPropagationEnabled

The value of spark.sql.constraintPropagation.enabled configuration property

Used when:

CONVERT_METASTORE_ORC

The value of spark.sql.hive.convertMetastoreOrc configuration property

Used when RelationConversions logical post-hoc evaluation rule is executed (and requested to isConvertible)

CONVERT_METASTORE_PARQUET

The value of spark.sql.hive.convertMetastoreParquet configuration property

Used when RelationConversions logical post-hoc evaluation rule is executed (and requested to isConvertible)

csvExpressionOptimization

spark.sql.optimizer.enableCsvExpressionOptimization

Used when:

  • OptimizeCsvJsonExprs logical optimization is executed

dataFramePivotMaxValues

The value of spark.sql.pivotMaxValues configuration property

Used in pivot operator.
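
When pivot is used without an explicit list of values, Spark first collects the distinct values of the pivot column and errors out if there are more than this threshold (10000 by default); a minimal sketch with illustrative data:

import spark.implicits._

val sales = Seq(("2023", "Q1", 10), ("2023", "Q2", 20)).toDF("year", "quarter", "amount")

// The distinct values of `quarter` are computed up front
// and checked against spark.sql.pivotMaxValues
sales.groupBy("year").pivot("quarter").sum("amount").show()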

dataFrameRetainGroupColumns

spark.sql.retainGroupColumns

decorrelateInnerQueryEnabled

spark.sql.optimizer.decorrelateInnerQuery.enabled

Used when:

DEFAULT_CATALOG

The value of spark.sql.defaultCatalog configuration property

Used when CatalogManager is requested for the current CatalogPlugin

defaultDataSourceName

spark.sql.sources.default

defaultSizeInBytes

spark.sql.defaultSizeInBytes

Used when:

dynamicPartitionPruningEnabled

spark.sql.optimizer.dynamicPartitionPruning.enabled

dynamicPartitionPruningFallbackFilterRatio

The value of spark.sql.optimizer.dynamicPartitionPruning.fallbackFilterRatio configuration property

Used when:

dynamicPartitionPruningPruningSideExtraFilterRatio

The value of spark.sql.optimizer.dynamicPartitionPruning.pruningSideExtraFilterRatio configuration property

Used when:

dynamicPartitionPruningReuseBroadcastOnly

spark.sql.optimizer.dynamicPartitionPruning.reuseBroadcastOnly

dynamicPartitionPruningUseStats

spark.sql.optimizer.dynamicPartitionPruning.useStats

ENABLE_FULL_OUTER_SHUFFLED_HASH_JOIN_CODEGEN

spark.sql.codegen.join.fullOuterShuffledHashJoin.enabled

enableDefaultColumns

spark.sql.defaultColumn.enabled

enableRadixSort

spark.sql.sort.enableRadixSort

Used when:

enableTwoLevelAggMap

spark.sql.codegen.aggregate.map.twolevel.enabled

enableVectorizedHashMap

spark.sql.codegen.aggregate.map.vectorized.enable

exchangeReuseEnabled

spark.sql.exchange.reuse

Used when:

fallBackToHdfsForStatsEnabled

spark.sql.statistics.fallBackToHdfs

Used when DetermineTableStats logical resolution rule is executed.

fastHashAggregateRowMaxCapacityBit

spark.sql.codegen.aggregate.fastHashMap.capacityBit

fetchShuffleBlocksInBatch

The value of spark.sql.adaptive.fetchShuffleBlocksInBatch configuration property

Used when ShuffledRowRDD is created

fileCommitProtocolClass

spark.sql.sources.commitProtocolClass

fileCompressionFactor

The value of spark.sql.sources.fileCompressionFactor configuration property

Used when:

filesMaxPartitionBytes

spark.sql.files.maxPartitionBytes

filesMinPartitionNum

spark.sql.files.minPartitionNum

filesOpenCostInBytes

spark.sql.files.openCostInBytes

filesourcePartitionFileCacheSize

spark.sql.hive.filesourcePartitionFileCacheSize

histogramEnabled

The value of spark.sql.statistics.histogram.enabled configuration property

Used when AnalyzeColumnCommand logical command is executed.

histogramNumBins

spark.sql.statistics.histogram.numBins

Used when AnalyzeColumnCommand is executed with spark.sql.statistics.histogram.enabled turned on (and calculates percentiles).

HIVE_TABLE_PROPERTY_LENGTH_THRESHOLD

spark.sql.hive.tablePropertyLengthThreshold

Used when:

hugeMethodLimit

spark.sql.codegen.hugeMethodLimit

ignoreCorruptFiles

The value of spark.sql.files.ignoreCorruptFiles configuration property

Used when:

  • AvroUtils utility is requested to inferSchema
  • OrcFileFormat is requested to inferSchema and buildReader
  • FileScanRDD is created (and then to compute a partition)
  • SchemaMergeUtils utility is requested to mergeSchemasInParallel
  • OrcUtils utility is requested to readSchema
  • FilePartitionReader is requested to ignoreCorruptFiles

ignoreMissingFiles

The value of spark.sql.files.ignoreMissingFiles configuration property

Used when:

inMemoryPartitionPruning

spark.sql.inMemoryColumnarStorage.partitionPruning

isParquetBinaryAsString

spark.sql.parquet.binaryAsString

isParquetINT96AsTimestamp

spark.sql.parquet.int96AsTimestamp

isParquetINT96TimestampConversion

spark.sql.parquet.int96TimestampConversion

Used when ParquetFileFormat is requested to build a data reader with partition column values appended.

isParquetSchemaMergingEnabled

spark.sql.parquet.mergeSchema

isParquetSchemaRespectSummaries

spark.sql.parquet.respectSummaryFiles

Used when:

joinReorderEnabled

spark.sql.cbo.joinReorder.enabled

Used in CostBasedJoinReorder logical plan optimization

legacyIntervalEnabled

spark.sql.legacy.interval.enabled

Used when:

limitScaleUpFactor

spark.sql.limit.scaleUpFactor

Used when a physical operator is requested for the first n rows as an array.

LOCAL_SHUFFLE_READER_ENABLED

spark.sql.adaptive.localShuffleReader.enabled

Used when:

manageFilesourcePartitions

spark.sql.hive.manageFilesourcePartitions

maxConcurrentOutputFileWriters

The value of spark.sql.maxConcurrentOutputFileWriters configuration property

Used when:

maxMetadataStringLength

spark.sql.maxMetadataStringLength

Used when:

maxRecordsPerFile

spark.sql.files.maxRecordsPerFile

Used when:

maxToStringFields

The value of spark.sql.debug.maxToStringFields configuration property

metastorePartitionPruning

spark.sql.hive.metastorePartitionPruning

Used when HiveTableScanExec physical operator is executed with a partitioned table (and requested for rawPartitions)

methodSplitThreshold

spark.sql.codegen.methodSplitThreshold

Used when:

minNumPostShufflePartitions

spark.sql.adaptive.minNumPostShufflePartitions

Used when EnsureRequirements physical optimization is executed (for Adaptive Query Execution).

nestedSchemaPruningEnabled

The value of spark.sql.optimizer.nestedSchemaPruning.enabled configuration property

Used when SchemaPruning, ColumnPruning and V2ScanRelationPushDown logical optimizations are executed

nonEmptyPartitionRatioForBroadcastJoin

The value of spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin configuration property

Used when:

numShufflePartitions

spark.sql.shuffle.partitions
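
The property gives the default number of partitions for shuffles in joins and aggregations; a minimal sketch (with adaptive coalescing disabled so the number is exact):

import org.apache.spark.sql.functions.col

// With AQE disabled, a shuffle produces exactly spark.sql.shuffle.partitions partitions
spark.conf.set("spark.sql.adaptive.enabled", false)
spark.conf.set("spark.sql.shuffle.partitions", 4)

val counts = spark.range(100).groupBy(col("id") % 10).count()
assert(counts.rdd.getNumPartitions == 4)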

rangeExchangeSampleSizePerPartition

The value of spark.sql.execution.rangeExchange.sampleSizePerPartition configuration property

Used when:

REMOVE_REDUNDANT_SORTS_ENABLED

The value of spark.sql.execution.removeRedundantSorts configuration property

Used when:

REPLACE_HASH_WITH_SORT_AGG_ENABLED

spark.sql.execution.replaceHashWithSortAgg

runtimeFilterBloomFilterEnabled

spark.sql.optimizer.runtime.bloomFilter.enabled

RUNTIME_BLOOM_FILTER_MAX_NUM_BITS

spark.sql.optimizer.runtime.bloomFilter.maxNumBits

RUNTIME_FILTER_NUMBER_THRESHOLD

spark.sql.optimizer.runtimeFilter.number.threshold

runtimeFilterSemiJoinReductionEnabled

spark.sql.optimizer.runtimeFilter.semiJoinReduction.enabled

SKEW_JOIN_SKEWED_PARTITION_FACTOR

spark.sql.adaptive.skewJoin.skewedPartitionFactor configuration property

Used when:

SKEW_JOIN_SKEWED_PARTITION_THRESHOLD

spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes configuration property

Used when:

SKEW_JOIN_ENABLED

spark.sql.adaptive.skewJoin.enabled configuration property

Used when:

objectAggSortBasedFallbackThreshold

spark.sql.objectHashAggregate.sortBased.fallbackThreshold

offHeapColumnVectorEnabled

spark.sql.columnVector.offheap.enabled

Used when:

OPTIMIZE_ONE_ROW_RELATION_SUBQUERY

spark.sql.optimizer.optimizeOneRowRelationSubquery

Used when:

  • OptimizeOneRowRelationSubquery logical optimization is executed

optimizeNullAwareAntiJoin

spark.sql.optimizeNullAwareAntiJoin configuration property

Used when:

optimizerExcludedRules

The value of spark.sql.optimizer.excludedRules configuration property

Used when Optimizer is requested for the batches

optimizerInSetConversionThreshold

spark.sql.optimizer.inSetConversionThreshold

Used when OptimizeIn logical query optimization is executed

orcVectorizedReaderNestedColumnEnabled

spark.sql.orc.enableNestedColumnVectorizedReader

Used when:

  • OrcFileFormat is requested to supportBatchForNestedColumn

OUTPUT_COMMITTER_CLASS

spark.sql.sources.outputCommitterClass

Used when:

parallelFileListingInStatsComputation

spark.sql.statistics.parallelFileListingInStatsComputation.enabled

Used when CommandUtils helper object is requested to calculate the total size of a table (with partitions) (for AnalyzeColumnCommand and AnalyzeTableCommand commands)

parquetAggregatePushDown

spark.sql.parquet.aggregatePushdown

parquetCompressionCodec

spark.sql.parquet.compression.codec

Used when:

parquetFilterPushDown

spark.sql.parquet.filterPushdown

parquetFilterPushDownDate

spark.sql.parquet.filterPushdown.date

Used when:

parquetFilterPushDownDecimal

spark.sql.parquet.filterPushdown.decimal

Used when:

parquetFilterPushDownInFilterThreshold

spark.sql.parquet.pushdown.inFilterThreshold

Used when:

parquetFilterPushDownStringPredicate

spark.sql.parquet.filterPushdown.stringPredicate

parquetFilterPushDownStringStartWith

spark.sql.parquet.filterPushdown.string.startsWith

parquetFilterPushDownTimestamp

spark.sql.parquet.filterPushdown.timestamp

Used when:

parquetOutputCommitterClass

spark.sql.parquet.output.committer.class

Used when:

parquetOutputTimestampType

spark.sql.parquet.outputTimestampType

Used when:

parquetRecordFilterEnabled

spark.sql.parquet.recordLevelFilter.enabled

Used when ParquetFileFormat is requested to build a data reader (with partition column values appended).

parquetVectorizedReaderBatchSize

spark.sql.parquet.columnarReaderBatchSize

parquetVectorizedReaderEnabled

spark.sql.parquet.enableVectorizedReader

Used when:

parquetVectorizedReaderNestedColumnEnabled

spark.sql.parquet.enableNestedColumnVectorizedReader

partitionOverwriteMode

The value of spark.sql.sources.partitionOverwriteMode configuration property

Used when InsertIntoHadoopFsRelationCommand logical command is executed

planChangeLogLevel

The value of spark.sql.planChangeLog.level configuration property

Used when:

planChangeBatches

The value of spark.sql.planChangeLog.batches configuration property

Used when:

  • PlanChangeLogger is requested to logBatch

planChangeRules

The value of spark.sql.planChangeLog.rules configuration property

Used when:

  • PlanChangeLogger is requested to logRule

preferSortMergeJoin

spark.sql.join.preferSortMergeJoin

Used in JoinSelection execution planning strategy to prefer sort merge join over shuffle hash join.

LEAF_NODE_DEFAULT_PARALLELISM

spark.sql.leafNodeDefaultParallelism

Used when:

LEGACY_CTE_PRECEDENCE_POLICY

spark.sql.legacy.ctePrecedencePolicy

PROPAGATE_DISTINCT_KEYS_ENABLED

spark.sql.optimizer.propagateDistinctKeys.enabled

replaceDatabricksSparkAvroEnabled

spark.sql.legacy.replaceDatabricksSparkAvro.enabled

replaceExceptWithFilter

spark.sql.optimizer.replaceExceptWithFilter

Used when ReplaceExceptWithFilter logical optimization is executed

runSQLonFile

spark.sql.runSQLOnFiles

Used when:

RUNTIME_BLOOM_FILTER_EXPECTED_NUM_ITEMS

spark.sql.optimizer.runtime.bloomFilter.expectedNumItems

runtimeRowLevelOperationGroupFilterEnabled

spark.sql.optimizer.runtime.rowLevelOperationGroupFilter.enabled

sessionLocalTimeZone

spark.sql.session.timeZone
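
The session time zone controls how timestamps are rendered (and how timestamp strings without a zone are interpreted); a minimal sketch:

// The same instant is displayed differently depending on the session time zone
spark.conf.set("spark.sql.session.timeZone", "UTC")
spark.sql("SELECT current_timestamp() AS now").show(truncate = false)

spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
spark.sql("SELECT current_timestamp() AS now").show(truncate = false)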

sessionWindowBufferInMemoryThreshold

spark.sql.sessionWindow.buffer.in.memory.threshold

Used when:

  • UpdatingSessionsExec unary physical operator is executed

sessionWindowBufferSpillThreshold

spark.sql.sessionWindow.buffer.spill.threshold

Used when:

  • UpdatingSessionsExec unary physical operator is executed

sortBeforeRepartition

The value of spark.sql.execution.sortBeforeRepartition configuration property

Used when ShuffleExchangeExec physical operator is executed

starSchemaDetection

spark.sql.cbo.starSchemaDetection

Used in ReorderJoin logical optimization (and indirectly in StarSchemaDetection)

stringRedactionPattern

spark.sql.redaction.string.regex

Used when:

subexpressionEliminationEnabled

spark.sql.subexpressionElimination.enabled

Used when SparkPlan is requested for the subexpressionEliminationEnabled flag.

subqueryReuseEnabled

spark.sql.execution.reuseSubquery

Used when:

supportQuotedRegexColumnName

spark.sql.parser.quotedRegexColumnNames

Used when:

targetPostShuffleInputSize

spark.sql.adaptive.shuffle.targetPostShuffleInputSize

Used when EnsureRequirements physical optimization is executed (for Adaptive Query Execution)

THRIFTSERVER_FORCE_CANCEL

spark.sql.thriftServer.interruptOnCancel

Used when:

  • SparkExecuteStatementOperation is created (forceCancel)

truncateTableIgnorePermissionAcl

spark.sql.truncateTable.ignorePermissionAcl.enabled

Used when TruncateTableCommand logical command is executed

useCompression

The value of spark.sql.inMemoryColumnarStorage.compressed configuration property

Used when CacheManager is requested to cache a structured query

useObjectHashAggregation

spark.sql.execution.useObjectHashAggregateExec

Used when Aggregation execution planning strategy is executed (and uses AggUtils to create an aggregation physical operator).

v2BucketingPartiallyClusteredDistributionEnabled

spark.sql.sources.v2.bucketing.partiallyClusteredDistribution.enabled

v2BucketingPushPartValuesEnabled

spark.sql.sources.v2.bucketing.pushPartValues.enabled

variableSubstituteEnabled

spark.sql.variable.substitute

Used when:

wholeStageEnabled

spark.sql.codegen.wholeStage

Used in:
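
A quick way to see the flag in action is to compare physical plans with whole-stage code generation on and off (a sketch; WholeStageCodegen stages show up as *(n)-prefixed operators):

spark.conf.set("spark.sql.codegen.wholeStage", true)
spark.range(10).selectExpr("id + 1").explain()   // operators prefixed with *(1)

spark.conf.set("spark.sql.codegen.wholeStage", false)
spark.range(10).selectExpr("id + 1").explain()   // plain Project over Range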

wholeStageFallback

spark.sql.codegen.fallback

wholeStageMaxNumFields

spark.sql.codegen.maxFields

Used in:

wholeStageSplitConsumeFuncByOperator

spark.sql.codegen.splitConsumeFuncByOperator

Used when CodegenSupport is requested to consume

wholeStageUseIdInClassName

spark.sql.codegen.useIdInClassName

Used when WholeStageCodegenExec is requested to generate the Java source code for the child physical plan subtree (when created)

windowExecBufferInMemoryThreshold

spark.sql.windowExec.buffer.in.memory.threshold

Used when:

windowExecBufferSpillThreshold

spark.sql.windowExec.buffer.spill.threshold

Used when: