Configuration Properties

Most of the configuration properties described below share the spark.databricks.delta prefix.
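
Unless noted otherwise, these are Spark SQL configuration properties and can be set per session. A minimal sketch (the property shown is just one example from this page):

    // In spark-shell with Delta Lake on the classpath, `spark` is the active SparkSession.
    // Programmatically, for the current session:
    spark.conf.set("spark.databricks.delta.optimize.maxThreads", "20")

    // The SQL equivalent:
    spark.sql("SET spark.databricks.delta.optimize.maxThreads = 20")

    // Read the current value back:
    println(spark.conf.get("spark.databricks.delta.optimize.maxThreads"))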

alterLocation.bypassSchemaCheck

spark.databricks.delta.alterLocation.bypassSchemaCheck

Enables ALTER TABLE SET LOCATION on a Delta table to go through even if the Delta table in the new location has a schema different from that of the original Delta table.

Default: false
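
For illustration, a hedged sketch of the command this flag affects; the table name and location are hypothetical:

    // Hypothetical table and location. With the flag left at its default (false),
    // the statement fails if the Delta table at the new location has a different schema.
    spark.conf.set("spark.databricks.delta.alterLocation.bypassSchemaCheck", "true")
    spark.sql("ALTER TABLE events SET LOCATION '/delta/events_v2'")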

alterTable.changeColumn.checkExpressions

spark.databricks.delta.alterTable.changeColumn.checkExpressions (internal)

Given an ALTER TABLE CHANGE COLUMN command, checks whether constraints or generated columns use expressions that reference this column (and so would be affected by the change and should be changed along with it).

Turn this off when an issue in the expression-checking logic prevents a valid column change from going through.

Default: true

checkLatestSchemaOnRead

spark.databricks.delta.checkLatestSchemaOnRead enables a check that ensures that users won't read corrupt data if the source schema changes in an incompatible way.

Default: true

Delta always tries to give users the latest version of the table data without requiring a call to REFRESH TABLE or a redefinition of their DataFrames when used in the context of streaming. There is a possibility that the schema of the latest version of the table is incompatible with the schema at the time of DataFrame creation.

checkpoint.partSize

spark.databricks.delta.checkpoint.partSize

(internal) The limit at which checkpoint parallelization starts. Delta attempts to write a maximum of this many actions per checkpoint.

Default: (undefined) (and assumed 1)

Must be a positive long number

commitInfo.enabled

spark.databricks.delta.commitInfo.enabled controls whether to log commit information into a Delta log.

Default: true

commitInfo.userMetadata

spark.databricks.delta.commitInfo.userMetadata is arbitrary user-defined metadata to include in CommitInfo (requires spark.databricks.delta.commitInfo.enabled).

Default: (empty)
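
A minimal sketch of attaching user metadata to commits and reading it back with DESCRIBE HISTORY; the table path and metadata string are hypothetical:

    // Every commit made in this session records the string in its CommitInfo
    // (requires spark.databricks.delta.commitInfo.enabled, true by default).
    spark.conf.set("spark.databricks.delta.commitInfo.userMetadata", "nightly-backfill run=1234")

    spark.range(10).write.format("delta").mode("append").save("/tmp/delta/events")

    // The userMetadata column of DESCRIBE HISTORY shows the value.
    spark.sql("DESCRIBE HISTORY delta.`/tmp/delta/events`")
      .select("version", "operation", "userMetadata")
      .show(truncate = false)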

commitLock.enabled

spark.databricks.delta.commitLock.enabled (internal) controls whether to use a lock on a Delta table at transaction commit.

Default: (undefined)

commitValidation.enabled

spark.databricks.delta.commitValidation.enabled (internal) controls whether to perform validation checks before a commit.

Default: true

convert.metadataCheck.enabled

spark.databricks.delta.convert.metadataCheck.enabled enables validation during CONVERT TO DELTA: if there is a difference between the catalog table's properties and the Delta table's configuration, the conversion fails with an error.

If disabled, the two configurations are merged with the same semantics as update and merge.

Default: true

dummyFileManager.numOfFiles

spark.databricks.delta.dummyFileManager.numOfFiles (internal) controls how many dummy files to write in DummyFileManager

Default: 3

dummyFileManager.prefix

spark.databricks.delta.dummyFileManager.prefix (internal) is the file prefix to use in DummyFileManager

Default: .s3-optimization-

history.maxKeysPerList

spark.databricks.delta.history.maxKeysPerList (internal) controls how many commits to list when performing a parallel search.

The default is the maximum number of keys returned by S3 per list call. Azure can return up to 5000, so 1000 is used as the common limit.

Default: 1000

history.metricsEnabled

spark.databricks.delta.history.metricsEnabled

Enables metrics reporting in DESCRIBE HISTORY (CommitInfo records the operation metrics when an OptimisticTransactionImpl is committed).

Requires the spark.databricks.delta.commitInfo.enabled configuration property to be enabled.

Default: true

GitHub commit: the feature was added as part of the [SC-24567][DELTA] Add additional metrics to Describe Delta History commit.

import.batchSize.schemaInference

spark.databricks.delta.import.batchSize.schemaInference (internal) is the number of files per batch for schema inference during import.

Default: 1000000

import.batchSize.statsCollection

spark.databricks.delta.import.batchSize.statsCollection (internal) is the number of files per batch for stats collection during import.

Default: 50000

io.skipping.mdc.addNoise

spark.databricks.io.skipping.mdc.addNoise (internal) controls whether or not to add a random byte as a suffix to the interleaved bits when computing the Z-order values for MDC. This can help deal with skew, but may have a negative impact on overall min/max skipping effectiveness.

Default: true

Used when:

  • SpaceFillingCurveClustering is requested to cluster

io.skipping.mdc.rangeId.max

spark.databricks.io.skipping.mdc.rangeId.max (internal) controls the domain of rangeId values to be interleaved. The bigger the value, the better the granularity, but at the expense of performance (more data gets sampled).

Default: 1000

Must be greater than 1

Used when:

  • SpaceFillingCurveClustering is requested to cluster
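
These multi-dimensional clustering properties take effect when Z-ordering a table with OPTIMIZE. A hedged sketch (the table path and column names are hypothetical):

    // OPTIMIZE ... ZORDER BY triggers SpaceFillingCurveClustering, which reads
    // the io.skipping.mdc.* properties above.
    spark.sql("OPTIMIZE delta.`/tmp/delta/events` ZORDER BY (eventType, eventDate)")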

io.skipping.stringPrefixLength

spark.databricks.io.skipping.stringPrefixLength (internal) is the length of the prefix of string columns to store in the data skipping index.

Default: 32

lastCommitVersionInSession

spark.databricks.delta.lastCommitVersionInSession is the version of the last commit made in the SparkSession for any Delta table (after OptimisticTransactionImpl is done with doCommit or DeltaCommand with commitLarge)

Default: (undefined)
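
For illustration, the property can be read back after a write; the table path is hypothetical:

    spark.range(5).write.format("delta").mode("append").save("/tmp/delta/events")

    // Holds the version of the last commit made in this SparkSession, if any.
    println(spark.conf.getOption("spark.databricks.delta.lastCommitVersionInSession"))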

loadFileSystemConfigsFromDataFrameOptions

spark.databricks.delta.loadFileSystemConfigsFromDataFrameOptions (internal) controls whether to load file system configs provided in DataFrameReader or DataFrameWriter options when calling DataFrameReader.load/DataFrameWriter.save using a Delta table path.

Not supported for DataFrameReader.table and DataFrameWriter.saveAsTable

Default: true

maxCommitAttempts

spark.databricks.delta.maxCommitAttempts (internal) is the maximum number of commit attempts to try for a single commit before failing

Default: 10000000

maxSnapshotLineageLength

spark.databricks.delta.maxSnapshotLineageLength (internal) is the maximum lineage length of a Snapshot before Delta forces a Snapshot to be built from scratch

Default: 50

merge.materializeSource

spark.databricks.delta.merge.materializeSource

(internal) When to materialize the source plan during MERGE execution

  • all: the source is always materialized
  • auto: the source is not materialized unless non-deterministic
  • none: the source is never materialized

Default: auto
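
For example, forcing source materialization before a MERGE; the table names are hypothetical:

    // Always materialize the MERGE source, regardless of determinism.
    spark.conf.set("spark.databricks.delta.merge.materializeSource", "all")

    spark.sql("""
      MERGE INTO events t
      USING updates s
      ON t.id = s.id
      WHEN MATCHED THEN UPDATE SET *
      WHEN NOT MATCHED THEN INSERT *
    """)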

merge.materializeSource.maxAttempts

spark.databricks.delta.merge.materializeSource.maxAttempts

How many times to retry execution of the MERGE command when the data (an RDD block) of the materialized source RDD is lost

Default: 4

merge.materializeSource.rddStorageLevel

spark.databricks.delta.merge.materializeSource.rddStorageLevel

(internal) What StorageLevel to use to persist the source RDD

Default: DISK_ONLY

merge.materializeSource.rddStorageLevelRetry

spark.databricks.delta.merge.materializeSource.rddStorageLevelRetry

(internal) What StorageLevel to use to persist the source RDD when MERGE is retried

Default: DISK_ONLY_2

merge.maxInsertCount

spark.databricks.delta.merge.maxInsertCount (internal) is the maximum row count of inserts in each MERGE execution

Default: 10000L

merge.optimizeInsertOnlyMerge.enabled

spark.databricks.delta.merge.optimizeInsertOnlyMerge.enabled

(internal) Enables extra optimization for insert-only merges by avoiding rewriting old files and just inserting new files

Default: true

merge.optimizeMatchedOnlyMerge.enabled

spark.databricks.delta.merge.optimizeMatchedOnlyMerge.enabled

(internal) Enables optimization of matched-only merges to use a RIGHT OUTER join (instead of a FULL OUTER join) while writing out all merge changes

Default: true

merge.repartitionBeforeWrite.enabled

spark.databricks.delta.merge.repartitionBeforeWrite.enabled

(internal) Enables repartitioning of the merge output (by the partition columns of the target table, if partitioned) before writing the data (DataFrame) out

Default: true

optimize.maxFileSize

spark.databricks.delta.optimize.maxFileSize (internal) Target file size produced by the OPTIMIZE command.

Default: 1024 * 1024 * 1024 (1 GB)

Used when:

  • OptimizeExecutor is requested to optimize

optimize.maxThreads

spark.databricks.delta.optimize.maxThreads (internal) Maximum number of parallel jobs allowed in the OPTIMIZE command. Increasing the maximum number of parallel jobs allows the OPTIMIZE command to run faster, but increases the job-management overhead on the Spark driver.

Default: 15

Used when:

  • OptimizeExecutor is requested to optimize

optimize.minFileSize

spark.databricks.delta.optimize.minFileSize (internal) Files which are smaller than this threshold (in bytes) will be grouped together and rewritten as larger files by the OPTIMIZE command.

Default: 1024 * 1024 * 1024 (1GB)

Used when:

  • OptimizeExecutor is requested to optimize
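
A sketch of tuning compaction thresholds before OPTIMIZE; the sizes and table path are hypothetical:

    // Group files smaller than 256 MB and rewrite them as files of up to roughly 512 MB.
    spark.conf.set("spark.databricks.delta.optimize.minFileSize", 256L * 1024 * 1024)
    spark.conf.set("spark.databricks.delta.optimize.maxFileSize", 512L * 1024 * 1024)

    spark.sql("OPTIMIZE delta.`/tmp/delta/events`")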

optimize.zorder.checkStatsCollection.enabled

spark.databricks.delta.optimize.zorder.checkStatsCollection.enabled (internal) Controls whether to check that column statistics are available for the zOrderBy columns of the OPTIMIZE command.

Default: true

partitionColumnValidity.enabled

spark.databricks.delta.partitionColumnValidity.enabled (internal) enables validation of the partition column names (just like the data columns)

Default: true

properties.defaults.minReaderVersion

spark.databricks.delta.properties.defaults.minReaderVersion is the default reader protocol version to create new tables with, unless a feature that requires a higher version for correctness is enabled.

Default: 1

Available values: 1

properties.defaults.minWriterVersion

spark.databricks.delta.properties.defaults.minWriterVersion is the default writer protocol version to create new tables with, unless a feature that requires a higher version for correctness is enabled.

Default: 2

Available values: 1, 2, 3

replaceWhere.constraintCheck.enabled

spark.databricks.delta.replaceWhere.constraintCheck.enabled controls whether replaceWhere on an arbitrary expression and arbitrary columns enforces the constraint, i.e. the target table is replaced only when all the rows in the source DataFrame match that constraint.

If disabled, the constraint check is skipped and the target is replaced with all the rows from the new DataFrame.

Default: true
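
For illustration, a replaceWhere overwrite whose source rows are validated against the predicate while this property is enabled; the path, column, and predicate are hypothetical:

    import org.apache.spark.sql.functions.col

    val updates = spark.read.format("delta").load("/tmp/delta/events")
      .filter(col("eventDate") >= "2024-01-01")

    // With the constraint check enabled (default), every row in `updates` must satisfy
    // the replaceWhere predicate or the write fails.
    updates.write
      .format("delta")
      .mode("overwrite")
      .option("replaceWhere", "eventDate >= '2024-01-01'")
      .save("/tmp/delta/events")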

retentionDurationCheck.enabled

spark.databricks.delta.retentionDurationCheck.enabled adds a check preventing users from running VACUUM with a very short retention period, which may end up corrupting the Delta log.

Default: true

sampling.enabled

spark.databricks.delta.sampling.enabled (internal) enables sample-based estimation

Default: false

schema.autoMerge.enabled

spark.databricks.delta.schema.autoMerge.enabled

Enables schema merging on appends (UPDATEs) and overwrites (INSERTs).

Default: false

Equivalent DataFrame option: mergeSchema
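
A minimal sketch of the two equivalent ways to allow schema evolution on write; the table path and new column are hypothetical:

    import org.apache.spark.sql.functions.lit

    // A DataFrame with a column that does not exist in the target table yet.
    val df = spark.range(3).withColumn("note", lit("new column"))

    // Per write, via the equivalent DataFrame option:
    df.write.format("delta").mode("append").option("mergeSchema", "true").save("/tmp/delta/events")

    // Or for the whole session:
    spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")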

schema.typeCheck.enabled

spark.databricks.delta.schema.typeCheck.enabled (internal) controls whether to check unsupported data types while updating a table schema

Disabling this flag may allow users to create unsupported Delta tables and should only be used when trying to read/write legacy tables.

Default: true

snapshotIsolation.enabled

spark.databricks.delta.snapshotIsolation.enabled (internal) controls whether queries on Delta tables are guaranteed to have snapshot isolation

Default: true

snapshotPartitions

spark.databricks.delta.snapshotPartitions (internal) is the number of partitions to use for state reconstruction (when building a snapshot of a Delta table).

Default: 50

stalenessLimit

spark.databricks.delta.stalenessLimit (in millis) allows you to query the last loaded state of the Delta table without blocking on a table update. You can use this configuration to reduce the latency on queries when up-to-date results are not a requirement. Table updates will be scheduled on a separate scheduler pool in a FIFO queue, and will share cluster resources fairly with your query. If a table hasn't updated past this time limit, we will block on a synchronous state update before running the query.

Default: 0 (no tables can be stale)

state.corruptionIsFatal

spark.databricks.delta.state.corruptionIsFatal (internal) throws a fatal error when the recreated Delta state doesn't match the committed checksum file

Default: true

stateReconstructionValidation.enabled

spark.databricks.delta.stateReconstructionValidation.enabled (internal) controls whether to perform validation checks on the reconstructed state

Default: true

stats.collect

spark.databricks.delta.stats.collect

(internal) Enables statistics to be collected while writing files into a Delta table

Default: true

stats.collect.using.tableSchema

spark.databricks.delta.stats.collect.using.tableSchema

(internal) When collecting stats while writing files into Delta table (with spark.databricks.delta.stats.collect enabled), whether to use the table schema (true) or the DataFrame schema (false) as the stats collection schema.

Default: true

stats.limitPushdown.enabled

spark.databricks.delta.stats.limitPushdown.enabled (internal) enables using the limit clause and file statistics to prune files before they are collected to the driver

Default: true

stats.localCache.maxNumFiles

spark.databricks.delta.stats.localCache.maxNumFiles (internal) is the maximum number of files for a table to be considered a delta small table. Some metadata operations (such as using data skipping) are optimized for small tables using driver local caching and local execution.

Default: 2000

stats.skipping

spark.databricks.delta.stats.skipping (internal) enables Data Skipping

Default: true

Used when:

  • DataSkippingReaderBase is requested for the files to scan
  • PrepareDeltaScanBase logical optimization is executed

timeTravel.resolveOnIdentifier.enabled

spark.databricks.delta.timeTravel.resolveOnIdentifier.enabled (internal) controls whether to resolve patterns such as @v123 and @yyyyMMddHHmmssSSS in path identifiers as time travel nodes.

Default: true
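
When enabled, a version suffix in a path identifier is resolved as time travel. A hedged sketch (the table path is hypothetical and the supported identifier forms may vary by Delta version):

    // Version 123 of the table, resolved from the @v123 suffix of the path identifier.
    spark.sql("SELECT * FROM delta.`/tmp/delta/events@v123`").show()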

vacuum.parallelDelete.enabled

spark.databricks.delta.vacuum.parallelDelete.enabled enables parallelizing the deletion of files during the VACUUM command.

Default: false

Enabling this may result in hitting rate limits on some storage backends. When enabled, parallelization is controlled by the default number of shuffle partitions.
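
For illustration, enabling parallel deletes before VACUUM; the table path is hypothetical and the retention shown is the default 7 days:

    // Delete eligible files in parallel; parallelism follows the default number
    // of shuffle partitions (spark.sql.shuffle.partitions).
    spark.conf.set("spark.databricks.delta.vacuum.parallelDelete.enabled", "true")

    spark.sql("VACUUM delta.`/tmp/delta/events` RETAIN 168 HOURS")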

spark.delta.logStore.class

The fully-qualified class name of a LogStore

Default: HDFSLogStore

Used when:

  • LogStoreProvider is requested for a LogStore
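
A sketch of configuring a non-default LogStore at session creation; the S3SingleDriverLogStore class name is the one bundled with Delta's storage layer in some releases and may differ in yours:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("delta-logstore-demo")
      // Use an S3-oriented LogStore instead of the default HDFSLogStore.
      .config("spark.delta.logStore.class",
        "org.apache.spark.sql.delta.storage.S3SingleDriverLogStore")
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .getOrCreate()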