Options

checkpointLocation

Checkpoint directory for streaming queries (Spark Structured Streaming).
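A minimal sketch of setting checkpointLocation on a streaming query over a delta table (the paths are hypothetical):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Read a delta table as a stream and write it out to another delta table,
// tracking progress in a checkpoint directory (all paths are hypothetical).
spark.readStream
  .format("delta")
  .load("/tmp/delta/events")
  .writeStream
  .format("delta")
  .option("checkpointLocation", "/tmp/delta/events_checkpoint")
  .start("/tmp/delta/events_copy")
```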

dataChange

Whether to write new data to the table or merely rearrange data that is already part of it. Setting this option to false declares that the data written by this job does not change any data in the table and only rearranges existing data. This makes sure that streaming queries reading from the table will not see any new changes.


Demo

Learn more in Demo: dataChange.
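The typical use case is file compaction: rewriting existing data into fewer files without signalling a data change. A sketch, assuming an existing delta table at a hypothetical path (the repartition factor is also illustrative):

```scala
// Rearrange existing data (e.g. compact small files) without marking it
// as a data change, so downstream streaming readers see nothing new.
spark.read
  .format("delta")
  .load("/tmp/delta/events")
  .repartition(4)
  .write
  .format("delta")
  .mode("overwrite")
  .option("dataChange", "false")
  .save("/tmp/delta/events")
```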

excludeRegex

scala.util.matching.Regex to filter out the paths of FileActions

Default: (undefined)

Use DeltaOptions.excludeRegex to access the value


failOnDataLoss

Controls whether or not to fail loading a delta table when the earliest available version (in the _delta_log directory) is after the version requested

Default: true

Use DeltaOptions.failOnDataLoss to access the value

ignoreChanges

Controls whether a streaming query should tolerate files rewritten in the source table (e.g. due to UPDATE, MERGE INTO, DELETE or OVERWRITE). Rewritten files may be re-processed, so downstream consumers can observe duplicate records.

ignoreDeletes

Controls whether a streaming query should ignore transactions that delete data (e.g. at partition boundaries), so the query does not fail when data is deleted from the source table

ignoreFileDeletion

A legacy (deprecated) option superseded by ignoreDeletes

maxBytesPerTrigger

Soft maximum of bytes to be processed in a streaming micro-batch (trigger). At least one file is always processed, so a single large file can push a micro-batch over the limit.

maxFilesPerTrigger

Maximum number of files (AddFiles) that DeltaSource is supposed to scan (read) in a streaming micro-batch (trigger)

Default: 1000

Must be at least 1
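A sketch of capping the number of files per micro-batch on a streaming read (the path and the limit are hypothetical):

```scala
// Scan at most 100 AddFiles per micro-batch (trigger).
spark.readStream
  .format("delta")
  .option("maxFilesPerTrigger", 100)
  .load("/tmp/delta/events")
```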

maxRecordsPerFile

Maximum number of records per data file

Spark SQL

maxRecordsPerFile is one of the FileFormatWriter (Spark SQL) options, so all Delta Lake does is hand it over to the underlying writing infrastructure.


mergeSchema

Enables schema migration (and allows automatic schema merging during a write operation for WriteIntoDelta and DeltaSink)

Equivalent SQL Session configuration: spark.databricks.delta.schema.autoMerge.enabled
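A sketch of an append that merges new columns into the table schema, assuming df is a DataFrame whose schema is a compatible superset of the table's (path hypothetical):

```scala
// Append data whose schema adds new columns; mergeSchema evolves the
// table schema instead of failing the write.
df.write
  .format("delta")
  .mode("append")
  .option("mergeSchema", "true")
  .save("/tmp/delta/events")
```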

optimizeWrite

optimizeWrite is a writer option. It is currently not used.

overwriteSchema

Enables overwriting the schema or changing the partitioning of a delta table during an overwrite write operation

Use DeltaOptions.canOverwriteSchema to access the value

Note

The schema cannot be overwritten when using the replaceWhere option.
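A sketch of replacing a table's contents and schema in one overwrite, assuming df is a DataFrame with the new schema (path hypothetical):

```scala
// Replace both the data and the schema of the target table.
df.write
  .format("delta")
  .mode("overwrite")
  .option("overwriteSchema", "true")
  .save("/tmp/delta/events")
```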

partitionOverwriteMode

Controls whether an overwrite of a partitioned delta table replaces all partitions (static) or only the partitions with data being written (dynamic)

Mutually exclusive with replaceWhere


path

(required) Directory on a Hadoop DFS-compliant file system with an optional time travel identifier

Default: (undefined)

Note

Can also be specified using the load method of DataFrameReader and DataStreamReader.
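A sketch of specifying the path via load, with and without a time travel identifier appended after @ (the path and version are hypothetical):

```scala
// Load the table at its current state...
spark.read.format("delta").load("/tmp/delta/events")

// ...or at a given version using the time travel identifier in the path.
spark.read.format("delta").load("/tmp/delta/events@v2")
```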

queryName

readChangeFeed

Enables Change Data Feed while reading delta tables (CDC-aware table scans)

Use DeltaOptions.readChangeFeed for the value

Note

Use the startingVersion, startingTimestamp, endingVersion and endingTimestamp options to fine-tune Change Data Feed-aware queries.
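A sketch of a CDC-aware batch read, assuming Change Data Feed is enabled on the table (the path and starting version are hypothetical):

```scala
// Read row-level changes (inserts, updates, deletes) since version 1.
spark.read
  .format("delta")
  .option("readChangeFeed", "true")
  .option("startingVersion", 1)
  .load("/tmp/delta/events")
```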

replaceWhere

Partition predicates to overwrite only the data that matches predicates over partition columns (unless replaceWhere.dataColumns.enabled is enabled)

Available as DeltaWriteOptions.replaceWhere

Mutually exclusive with partitionOverwriteMode

Demo

Learn more in Demo: replaceWhere.
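A sketch of overwriting only one partition's worth of data, assuming the table is partitioned by a date column (the path, column and predicate are hypothetical):

```scala
// Overwrite only the rows matching the partition predicate;
// data outside the predicate is left untouched.
df.write
  .format("delta")
  .mode("overwrite")
  .option("replaceWhere", "date >= '2024-01-01' AND date < '2024-02-01'")
  .save("/tmp/delta/events")
```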

streamingSourceTrackingId

The directory for a schema log of DeltaSourceMetadataTrackingLog

Available as DeltaOptions.sourceTrackingId


timestampAsOf

Timestamp of the version of a delta table for Time Travel

Mutually exclusive with versionAsOf option and the time travel identifier of the path option.


userMetadata

Defines user-defined commit metadata

Takes precedence over spark.databricks.delta.commitInfo.userMetadata

Available by inspecting CommitInfos using DESCRIBE HISTORY or DeltaTable.history.
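A sketch of attaching custom metadata to a commit (the path and metadata string are hypothetical):

```scala
// Record who/what produced this commit; visible later via DESCRIBE HISTORY.
df.write
  .format("delta")
  .mode("append")
  .option("userMetadata", "nightly-backfill-run-42")
  .save("/tmp/delta/events")
```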

versionAsOf

Version of a delta table for Time Travel

Must be castable to a long

Mutually exclusive with timestampAsOf option and the time travel identifier of the path option.
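A sketch of both time travel options; only one may be set per read (the path, version and timestamp are hypothetical):

```scala
// Read the table as of version 2...
spark.read
  .format("delta")
  .option("versionAsOf", 2)
  .load("/tmp/delta/events")

// ...or as of a point in time (timestampAsOf instead of versionAsOf).
spark.read
  .format("delta")
  .option("timestampAsOf", "2024-01-01 00:00:00")
  .load("/tmp/delta/events")
```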
