DeltaConfigs (DeltaConfigsBase)

DeltaConfigs holds the table properties that can be set on a delta table.

Accessing DeltaConfigs

import org.apache.spark.sql.delta.DeltaConfigs
import org.apache.spark.sql.delta.OptimisticTransaction
import org.apache.spark.sql.delta.actions.Metadata

// An active transaction gives access to the table metadata
val txn: OptimisticTransaction = ???
val metadata: Metadata = txn.metadata

// Look up the value of the delta.enableChangeDataFeed table property
DeltaConfigs.CHANGE_DATA_FEED.fromMetaData(metadata)

Table Properties

All table properties start with the delta. prefix.
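
A table property is set with ALTER TABLE ... SET TBLPROPERTIES. A minimal sketch, assuming an existing delta table (demo is a made-up name):

// demo is a hypothetical delta table
spark.sql("ALTER TABLE demo SET TBLPROPERTIES ('delta.appendOnly' = 'true')")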

appendOnly

Whether a delta table is append-only (true) or not (false). When enabled, a table allows appends only and no updates or deletes.

Default: false

autoOptimize

Whether this delta table will automagically optimize the layout of files during writes.

Default: false

checkpointInterval

How often to checkpoint the state of a delta table (at the end of transaction commit)

Default: 10

checkpointRetentionDuration

How long to keep checkpoint files around before deleting them

Default: interval 2 days

The most recent checkpoint is never deleted. It is acceptable to keep checkpoint files beyond this duration until the next calendar day.

checkpoint.writeStatsAsJson

Controls whether to write file statistics in the checkpoint in JSON format as the stats column.

Default: true

checkpoint.writeStatsAsStruct

Controls whether to write file statistics in the checkpoint in struct format (in the stats_parsed column) and partition values as a struct (in the partitionValues_parsed column)

Default: undefined (Option[Boolean])

columnMapping.maxColumnId

Maximum columnId used in the schema so far for column mapping

Cannot be set

Default: 0

columnMapping.mode

The DeltaColumnMappingMode used to read and write parquet data files

| Name | Description |
|------|-------------|
| none | (default) A display name is the only valid identifier of a column |
| id | A column ID is the identifier of a column. This mode is used for tables converted from Iceberg. Parquet files in this mode will also have corresponding field IDs for each column in their file schema. |
| name | The physical column name is the identifier of a column. Stored as part of StructField metadata in the schema. Used for reading statistics and partition values in the DeltaLog. |
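
The current mode can be looked up from the table metadata. A sketch, assuming a Metadata instance (e.g. from an OptimisticTransaction as in Accessing DeltaConfigs above):

import org.apache.spark.sql.delta.DeltaConfigs

// Resolves the delta.columnMapping.mode property to a DeltaColumnMappingMode
val mode = DeltaConfigs.COLUMN_MAPPING_MODE.fromMetaData(metadata)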

compatibility.symlinkFormatManifest.enabled

Whether to register the GenerateSymlinkManifest post-commit hook while committing a transaction

Default: false

dataSkippingNumIndexedCols

The number of columns to collect stats on for data skipping. -1 means collecting stats for all columns.

Default: 32

Must be greater than or equal to -1.

deletedFileRetentionDuration

How long to keep logically deleted data files around before deleting them physically (to prevent failures in stale readers after compactions or partition overwrites)

Default: interval 1 week

enableChangeDataFeed

Enables Change Data Feed

Default: false

Legacy configuration: enableChangeDataCapture
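
For illustration, once the property is enabled, changes can be read back with the readChangeFeed read option. A sketch, assuming an existing delta table (demo is a made-up name):

// Enable the Change Data Feed on the (hypothetical) demo table
spark.sql("ALTER TABLE demo SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true')")

// Read the recorded changes
// (startingVersion must not precede the version the feed was enabled at)
val changes = spark.read
  .format("delta")
  .option("readChangeFeed", "true")
  .option("startingVersion", 0)
  .table("demo")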

enableExpiredLogCleanup

Whether to clean up expired log files and checkpoints

Default: true

enableFullRetentionRollback

Controls whether a delta table can be rolled back to any point within logRetentionDuration. When disabled, the table can only be rolled back within checkpointRetentionDuration.

Default: true

logRetentionDuration

How long to keep obsolete logs around before deleting them. Delta can keep logs beyond the duration until the next calendar day to avoid constantly creating checkpoints.

Default: interval 30 days (CalendarInterval)

Examples: 2 weeks, 365 days (months and years are not accepted)
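
For illustration, the retention can be changed with the interval syntax above (demo is a made-up table name):

spark.sql("ALTER TABLE demo SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 60 days')")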

minReaderVersion

The protocol reader version

Default: 1

This property is not stored as a table property in the Metadata action. It is stored as its own action. Having it modelled as a table property makes it easier to upgrade and view the version.

minWriterVersion

The protocol writer version

Default: 3

This property is not stored as a table property in the Metadata action. It is stored as its own action. Having it modelled as a table property makes it easier to upgrade and view the version.
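
The effective protocol versions can be inspected on a snapshot of the DeltaLog. A sketch, assuming a delta table at a made-up path:

import org.apache.spark.sql.delta.DeltaLog

val deltaLog = DeltaLog.forTable(spark, "/tmp/delta/demo") // hypothetical path
val protocol = deltaLog.snapshot.protocol
protocol.minReaderVersion
protocol.minWriterVersion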

randomizeFilePrefixes

Whether to use a random prefix in a file path instead of partition information (may be required for very high-volume S3 workloads, so that requests are better partitioned across S3 servers)

Default: false

randomPrefixLength

The length of the random prefix in a file path for randomizeFilePrefixes

Default: 2

sampleRetentionDuration

How long to keep delta sample files around before deleting them

Default: interval 7 days

Building Configuration

buildConfig[T](
  key: String,
  defaultValue: String,
  fromString: String => T,
  validationFunction: T => Boolean,
  helpMessage: String,
  minimumProtocolVersion: Option[Protocol] = None): DeltaConfig[T]

buildConfig creates a DeltaConfig for the given key (with the delta. prefix added) and adds it to the entries internal registry.

buildConfig is used to define all of the configuration properties in a type-safe way and (as a side effect) register them with the system-wide entries internal registry.
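
For illustration, a definition in the style of checkpointInterval could look as follows (a sketch based on the signature above; the exact help message is made up):

val CHECKPOINT_INTERVAL: DeltaConfig[Int] = buildConfig[Int](
  "checkpointInterval",  // key (registered as delta.checkpointInterval)
  "10",                  // defaultValue
  _.toInt,               // fromString
  _ > 0,                 // validationFunction
  "needs to be a positive integer.") // helpMessage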

System-Wide Configuration Entries Registry

entries: HashMap[String, DeltaConfig[_]]

The DeltaConfigs utility (a Scala object) uses the entries internal registry to track DeltaConfigs by their key.

New entries are added in buildConfig.

mergeGlobalConfigs Utility

mergeGlobalConfigs(
  sqlConfs: SQLConf,
  tableConf: Map[String, String],
  protocol: Protocol): Map[String, String]

mergeGlobalConfigs finds all configuration properties with the spark.databricks.delta.properties.defaults prefix among the entries (in the given SQLConf) and merges them with the given table configuration.

validateConfigurations Utility

validateConfigurations(
  configurations: Map[String, String]): Map[String, String]

validateConfigurations...FIXME

normalizeConfigKeys Utility

normalizeConfigKeys(
  propKeys: Seq[String]): Seq[String]

normalizeConfigKeys...FIXME

spark.databricks.delta.properties.defaults Prefix

DeltaConfigs uses the spark.databricks.delta.properties.defaults prefix for global configuration properties.
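
These session-scoped defaults are applied to new delta tables (via mergeGlobalConfigs). A sketch (the property and value are just an example):

// Every delta table created in this session defaults to delta.appendOnly = true
spark.conf.set("spark.databricks.delta.properties.defaults.appendOnly", "true")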