
DeltaSQLConf — Configuration Properties

DeltaSQLConf contains configuration properties to configure behaviour of Delta Lake.
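These properties are exposed on the Spark session under the `spark.databricks.delta.` key prefix. A minimal sketch of setting one (assumes `pyspark` and `delta-spark` are installed; the property chosen is illustrative):

```python
# Sketch: setting a DeltaSQLConf property on the active Spark session.
# The spark.databricks.delta. prefix is how DeltaSQLConf keys are exposed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Enable automatic schema merging on appends and overwrites
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")
```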

alterLocation.bypassSchemaCheck enables ALTER TABLE SET LOCATION on a Delta table to go through even if the Delta table in the new location has a different schema from the original Delta table.

Default: false

checkLatestSchemaOnRead (internal) enables a check that ensures that users won't read corrupt data if the source schema changes in an incompatible way.

Default: true

In Delta, we always try to give users the latest version of their data without having to call REFRESH TABLE or redefine their DataFrames when used in the context of streaming. There is a possibility that the schema of the latest version of the table may be incompatible with the schema at the time of DataFrame creation.

checkpoint.partSize (internal) is the limit at which checkpoint writing starts to be parallelized. Delta attempts to write a maximum of this many actions per checkpoint part.

Default: 5000000
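The part-splitting arithmetic can be sketched as follows (an illustration of the limit's effect, not Delta's actual code path):

```python
import math

def checkpoint_parts(num_actions: int, part_size: int = 5_000_000) -> int:
    """Illustrative: at most `part_size` actions go into each checkpoint part."""
    return max(1, math.ceil(num_actions / part_size))

print(checkpoint_parts(3_000_000))   # 1 part: below the limit
print(checkpoint_parts(12_000_000))  # 3 parts
```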

commitInfo.enabled controls whether to log commit information into a Delta log.

Default: true

commitInfo.userMetadata is arbitrary user-defined metadata to include in CommitInfo (requires commitInfo.enabled)

Default: (empty)

commitValidation.enabled (internal) controls whether to perform validation checks before a commit.

Default: true

convert.metadataCheck.enabled enables validation during CONVERT TO DELTA: if the catalog table's properties differ from the Delta table's configuration, the conversion errors out.

If disabled, the two configurations are merged, with the same semantics as update and merge.

Default: true

dummyFileManager.numOfFiles (internal) controls how many dummy files to write in DummyFileManager

Default: 3

dummyFileManager.prefix (internal) is the file prefix to use in DummyFileManager

Default: .s3-optimization-

history.maxKeysPerList (internal) controls how many commits to list when performing a parallel search.

The default matches the maximum number of keys S3 returns per list call. Azure can return up to 5000, so 1000 works for both backends.

Default: 1000
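The batching this limit implies can be sketched as (illustrative arithmetic, not Delta's listing code):

```python
def list_batches(num_commits: int, max_keys: int = 1000):
    """Split a commit range into list-call batches of at most max_keys each."""
    return [(start, min(start + max_keys, num_commits))
            for start in range(0, num_commits, max_keys)]

# 2500 commits -> 3 list calls: [0,1000), [1000,2000), [2000,2500)
print(list_batches(2500))
```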

history.metricsEnabled enables metrics reporting in DESCRIBE HISTORY (CommitInfo will record the operation metrics when an OptimisticTransactionImpl is committed and this configuration property is enabled).

Requires the commitInfo.enabled configuration property to be enabled.

Default: true

The feature was added as part of [SC-24567][DELTA] Add additional metrics to Describe Delta History commit.

import.batchSize.schemaInference (internal) is the number of files per batch for schema inference during import.

Default: 1000000

import.batchSize.statsCollection (internal) is the number of files per batch for stats collection during import.

Default: 50000

maxSnapshotLineageLength (internal) is the maximum lineage length of a Snapshot before Delta forces building a Snapshot from scratch.

Default: 50
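The intent can be sketched as: replay deltas on top of the previous snapshot until the lineage grows past the limit, then rebuild from scratch (illustrative pseudologic, not Delta's implementation):

```python
MAX_LINEAGE = 50  # maxSnapshotLineageLength

def next_lineage_length(current: int) -> int:
    """Lineage length after applying one more delta on top of a snapshot:
    once the limit would be exceeded, rebuild from scratch (length 1)."""
    return 1 if current + 1 > MAX_LINEAGE else current + 1

length = 0
for _ in range(120):  # apply 120 commits
    length = next_lineage_length(length)
print(length)  # lineage was reset twice along the way
```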

merge.maxInsertCount (internal) is the maximum row count of inserts in each MERGE execution

Default: 10000

merge.optimizeInsertOnlyMerge.enabled (internal) controls whether a merge without any matched clauses (i.e., an insert-only merge) is optimized by avoiding rewriting old files and simply inserting new files.

Default: true

merge.optimizeMatchedOnlyMerge.enabled (internal) controls whether a merge without a 'when not matched' clause is optimized to use a right outer join instead of a full outer join.

Default: true

merge.repartitionBeforeWrite.enabled (internal) controls whether merge repartitions the output by the table's partition columns before writing the files.

Default: false

partitionColumnValidity.enabled (internal) enables validation of the partition column names (just like the data columns)

Default: true

properties.defaults.minReaderVersion is the default reader protocol version to create new tables with, unless a feature that requires a higher version for correctness is enabled.

Default: 1

Available values: 1

properties.defaults.minWriterVersion is the default writer protocol version to create new tables with, unless a feature that requires a higher version for correctness is enabled.

Default: 2

Available values: 1, 2, 3

retentionDurationCheck.enabled adds a check preventing users from running vacuum with a very short retention period, which may end up corrupting a Delta log.

Default: true

sampling.enabled (internal) enables sample-based estimation

Default: false

schema.autoMerge.enabled enables schema merging on appends and overwrites.

Default: false

Equivalent DataFrame option: mergeSchema
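For a single write, the same behaviour can be requested with the mergeSchema DataFrame option (a fragment; assumes pyspark with delta-spark, and `df` and the table path are illustrative):

```python
# Per-write equivalent of schema.autoMerge.enabled:
(df.write
   .format("delta")
   .mode("append")
   .option("mergeSchema", "true")  # merge new columns into the table schema
   .save("/tmp/delta/events"))
```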

snapshotIsolation.enabled (internal) controls whether queries on Delta tables are guaranteed to have snapshot isolation

Default: true

snapshotPartitions (internal) is the number of partitions to use for state reconstruction (when building a snapshot of a Delta table).

Default: 50

stalenessLimit (in millis) allows you to query the last loaded state of the Delta table without blocking on a table update. You can use this configuration to reduce the latency on queries when up-to-date results are not a requirement. Table updates will be scheduled on a separate scheduler pool in a FIFO queue, and will share cluster resources fairly with your query. If a table hasn't updated past this time limit, we will block on a synchronous state update before running the query.

Default: 0 (no tables can be stale)
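The decision this property controls can be sketched as (illustrative logic in millis, not the actual implementation):

```python
def can_serve_stale(now_ms: int, last_update_ms: int, staleness_limit_ms: int) -> bool:
    """Serve the last loaded table state without blocking if it is fresh enough.
    A limit of 0 (the default) tolerates no staleness: always block on update."""
    return staleness_limit_ms > 0 and (now_ms - last_update_ms) <= staleness_limit_ms

print(can_serve_stale(10_000, 9_500, 1_000))  # True: 500 ms stale, 1 s limit
print(can_serve_stale(10_000, 8_000, 1_000))  # False: too stale, update first
print(can_serve_stale(10_000, 10_000, 0))     # False: default always blocks
```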

state.corruptionIsFatal (internal) throws a fatal error when the recreated Delta state doesn't match the committed checksum file.

Default: true

stateReconstructionValidation.enabled (internal) controls whether to perform validation checks on the reconstructed state

Default: true

stats.collect (internal) enables statistics to be collected while writing files into a Delta table

Default: true


The property seems unused.

stats.limitPushdown.enabled (internal) enables using the limit clause and file statistics to prune files before they are collected to the driver

Default: true

stats.localCache.maxNumFiles (internal) is the maximum number of files for a table to be considered a delta small table. Some metadata operations (such as using data skipping) are optimized for small tables using driver local caching and local execution.

Default: 2000

stats.skipping (internal) enables statistics used for skipping

Default: true

timeTravel.resolveOnIdentifier.enabled (internal) controls whether to resolve patterns such as @v123 and @yyyyMMddHHmmssSSS in path identifiers as time travel nodes.

Default: true
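The two path-suffix shapes can be sketched with regexes (illustrative approximations of the patterns described, not Delta's own parser):

```python
import re

# @v<version>  or  @yyyyMMddHHmmssSSS (a 17-digit timestamp)
VERSION_AT = re.compile(r"@v(\d+)$")
TIMESTAMP_AT = re.compile(r"@(\d{17})$")

def classify(path: str) -> str:
    """Illustrative: classify a path identifier's time-travel suffix."""
    if VERSION_AT.search(path):
        return "version"
    if TIMESTAMP_AT.search(path):
        return "timestamp"
    return "plain"

print(classify("/data/events@v123"))              # version
print(classify("/data/events@20210710123000123")) # timestamp
print(classify("/data/events"))                   # plain
```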

vacuum.parallelDelete.enabled enables parallelizing the deletion of files during the vacuum command.

Default: false

Enabling this may result in hitting rate limits on some storage backends. When enabled, parallelization is controlled by the default number of shuffle partitions.
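A sketch of enabling it for a session before vacuuming (assumes pyspark with delta-spark; the table path and retention period are illustrative):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Delete files in parallel; may hit rate limits on some object stores
spark.conf.set("spark.databricks.delta.vacuum.parallelDelete.enabled", "true")

spark.sql("VACUUM delta.`/tmp/delta/events` RETAIN 168 HOURS")
```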

Last update: 2021-07-10