
Options

Delta Lake comes with options to fine-tune its use. They can be defined using the option method of DataFrameReader, DataFrameWriter, DataStreamReader, and DataStreamWriter.
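
For example, an option can be set directly on a DataFrameWriter (a minimal sketch; the table path is illustrative):

// Pass a Delta Lake option to a batch write
spark.range(5)
  .write
  .format("delta")
  .option("mergeSchema", "true")
  .mode("append")
  .save("/tmp/delta/demo")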

Accessing Options

The options are available at runtime as DeltaOptions.

import org.apache.spark.sql.delta.DeltaOptions

assert(DeltaOptions.OVERWRITE_SCHEMA_OPTION == "overwriteSchema")

// No options given, so the defaults apply
val defaultOptions = new DeltaOptions(Map.empty[String, String], spark.sessionState.conf)
assert(defaultOptions.failOnDataLoss, "failOnDataLoss should be enabled by default")

// overwriteSchema explicitly enabled
val overwriteOptions = new DeltaOptions(
  Map(DeltaOptions.OVERWRITE_SCHEMA_OPTION -> true.toString),
  spark.sessionState.conf)
assert(
  overwriteOptions.canOverwriteSchema,
  s"${DeltaOptions.OVERWRITE_SCHEMA_OPTION} should be enabled")

checkpointLocation

Checkpoint directory for storing checkpoint data of streaming queries (Spark Structured Streaming).
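
A minimal streaming-write sketch, using a rate test source; the paths are illustrative:

// Write a streaming query to a delta table with an explicit checkpoint directory
spark.readStream
  .format("rate")
  .load()
  .writeStream
  .format("delta")
  .option("checkpointLocation", "/tmp/delta/events/_checkpoints")
  .start("/tmp/delta/events")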

dataChange

Whether to write new data to the table or only rearrange data that is already part of the table. Disabling this option declares that the write does not change any data in the table and merely rearranges existing data (e.g. file compaction), so streaming queries reading from the table will not see any new changes.
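
A minimal compaction-style sketch with dataChange disabled; the table path and the number of partitions are illustrative:

// Rewrite (compact) the existing files without signalling new data to streaming readers
spark.read
  .format("delta")
  .load("/tmp/delta/events")
  .repartition(4)
  .write
  .format("delta")
  .option("dataChange", "false")
  .mode("overwrite")
  .save("/tmp/delta/events")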


Demo

Learn more in Demo: dataChange.

excludeRegex

scala.util.matching.Regex to filter out the paths of FileActions

Default: (undefined)

Use DeltaOptions.excludeRegex to access the value
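
A minimal sketch of a streaming read that filters out matching file paths; the path and the pattern are illustrative:

// Skip files whose paths match the regular expression
spark.readStream
  .format("delta")
  .option("excludeRegex", ".*-copy.*")
  .load("/tmp/delta/events")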


failOnDataLoss

Controls whether or not to fail loading a delta table when the earliest available version (in the _delta_log directory) is after the version requested

Default: true

Use DeltaOptions.failOnDataLoss to access the value
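
A minimal sketch of a streaming read with the check disabled; the path is illustrative:

// Do not fail the stream when the versions it asks for are no longer available
spark.readStream
  .format("delta")
  .option("failOnDataLoss", "false")
  .load("/tmp/delta/events")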

ignoreChanges

ignoreDeletes

ignoreFileDeletion

maxBytesPerTrigger

maxFilesPerTrigger

Maximum number of files (AddFiles) that DeltaSource is supposed to scan (read) in a streaming micro-batch (trigger)

Default: 1000

Must be at least 1
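
A minimal streaming-read sketch; the path and the limit are illustrative:

// Scan at most 100 files per micro-batch
spark.readStream
  .format("delta")
  .option("maxFilesPerTrigger", 100)
  .load("/tmp/delta/events")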

maxRecordsPerFile

Maximum number of records per data file

Spark SQL

maxRecordsPerFile is one of the FileFormatWriter (Spark SQL) options, so all Delta Lake does is hand it over to the underlying writing infrastructure.
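
A minimal batch-write sketch; the path and the limit are illustrative:

// Cap every written data file at one million records
spark.range(10000000)
  .write
  .format("delta")
  .option("maxRecordsPerFile", 1000000)
  .mode("overwrite")
  .save("/tmp/delta/numbers")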


mergeSchema

Enables schema migration (and allows automatic schema merging during a write operation for WriteIntoDelta and DeltaSink)

Equivalent SQL Session configuration: spark.databricks.delta.schema.autoMerge.enabled
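
A minimal sketch of appending data whose schema adds a new column; the path and the column are illustrative:

import org.apache.spark.sql.functions.current_timestamp

// mergeSchema lets the extra ingested_at column evolve the table schema
spark.range(5)
  .withColumn("ingested_at", current_timestamp())
  .write
  .format("delta")
  .option("mergeSchema", "true")
  .mode("append")
  .save("/tmp/delta/events")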

optimizeWrite

Enables...FIXME

overwriteSchema

Enables overwriting the schema or changing the partitioning of a delta table during an overwrite write operation

Use DeltaOptions.canOverwriteSchema to access the value

Note

The schema cannot be overwritten when using replaceWhere option.
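
A minimal sketch of replacing the table schema during an overwrite; the path and the renamed column are illustrative:

// overwriteSchema lets the incompatible (renamed) schema replace the existing one
spark.range(5)
  .withColumnRenamed("id", "event_id")
  .write
  .format("delta")
  .option("overwriteSchema", "true")
  .mode("overwrite")
  .save("/tmp/delta/events")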

path

(required) Directory on a Hadoop DFS-compliant file system with an optional time travel identifier

Default: (undefined)

Note

Can also be specified using the load method of DataFrameReader and DataStreamReader.
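
Two equivalent ways to give the path (a minimal sketch; the path is illustrative):

// path as an option
spark.read.format("delta").option("path", "/tmp/delta/events").load()

// path as the argument of load
spark.read.format("delta").load("/tmp/delta/events")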

queryName

readChangeFeed

Enables Change Data Feed while reading a delta table (CDC read)

Use DeltaOptions.readChangeFeed to access the value

Requires either startingVersion or startingTimestamp option
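
A minimal CDC-read sketch, assuming the table was written with Change Data Feed enabled; the path and the starting version are illustrative:

// Read the changes starting at table version 1
spark.read
  .format("delta")
  .option("readChangeFeed", "true")
  .option("startingVersion", 1)
  .load("/tmp/delta/events")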

replaceWhere

Partition predicates that limit an overwrite to the matching data (arbitrary non-partition data predicates are allowed when replaceWhere.dataColumns.enabled is enabled)

Available as DeltaWriteOptions.replaceWhere
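
A minimal sketch, assuming a table partitioned by a date column; the path, column, and value are illustrative:

import org.apache.spark.sql.functions.lit

// Overwrite only the rows of the 2021-01-01 partition
spark.range(10)
  .withColumn("date", lit("2021-01-01"))
  .write
  .format("delta")
  .option("replaceWhere", "date = '2021-01-01'")
  .mode("overwrite")
  .save("/tmp/delta/events")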

Demo

Learn more in Demo: replaceWhere.

timestampAsOf

Timestamp of the version of a delta table for Time Travel

Mutually exclusive with versionAsOf option and the time travel identifier of the path option.
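
A minimal time-travel read by timestamp; the path and the timestamp are illustrative:

// Load the table as of the given timestamp
spark.read
  .format("delta")
  .option("timestampAsOf", "2021-01-01 12:00:00")
  .load("/tmp/delta/events")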


userMetadata

Defines user-defined commit metadata

Takes precedence over spark.databricks.delta.commitInfo.userMetadata

Available by inspecting CommitInfos using DESCRIBE HISTORY or DeltaTable.history.
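
A minimal sketch that attaches a custom commit message and reads it back from the history; the path and the message are illustrative:

// Commit with user-defined metadata
spark.range(5)
  .write
  .format("delta")
  .option("userMetadata", "backfill of January data")
  .mode("append")
  .save("/tmp/delta/events")

// The message shows up in the userMetadata column of the table history
import io.delta.tables.DeltaTable
DeltaTable.forPath(spark, "/tmp/delta/events")
  .history(1)
  .select("userMetadata")
  .show(false)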

versionAsOf

Version of a delta table for Time Travel

Must be castable to a long number

Mutually exclusive with timestampAsOf option and the time travel identifier of the path option.
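
A minimal time-travel read by version; the path and the version are illustrative:

// Load version 0 of the table
spark.read
  .format("delta")
  .option("versionAsOf", 0)
  .load("/tmp/delta/events")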
