Options
Delta Lake comes with options to fine-tune its behavior. They can be defined using the option method of the following:
- DataFrameReader (Spark SQL) and DataFrameWriter (Spark SQL) for batch queries
- DataStreamReader (Spark Structured Streaming) and DataStreamWriter (Spark Structured Streaming) for streaming queries
- SQL queries
checkpointLocation
Checkpoint directory for storing checkpoint data of streaming queries (Spark Structured Streaming).
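A minimal PySpark sketch of a streaming write to a Delta table with checkpointLocation set. It assumes a SparkSession with Delta Lake configured and a streaming DataFrame (for example, from spark.readStream); the function name and paths are hypothetical.

```python
def start_delta_stream(stream_df, table_path, checkpoint_dir):
    """Start a streaming query that appends stream_df to a Delta table.

    stream_df is assumed to be a streaming DataFrame. checkpoint_dir is the
    directory where Spark Structured Streaming keeps the query's checkpoint
    data (offsets and commit logs) so the query can resume after a restart.
    """
    return (
        stream_df.writeStream
        .format("delta")
        .outputMode("append")
        .option("checkpointLocation", checkpoint_dir)
        .start(table_path)
    )
```

For example, start_delta_stream(events, "/tmp/delta/events", "/tmp/checkpoints/events") returns the running StreamingQuery. Each streaming query should use its own checkpoint directory.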
dataChange
Whether to write new data to the table (true) or just rearrange data that is already part of the table (false). Setting dataChange to false declares that the data being written by this job does not change any data in the table and merely rearranges existing data (e.g., when compacting files). This makes sure that streaming queries reading from this table do not see the rearranged data as new changes.
Used when:
- DeltaWriteOptionsImpl is requested for rearrangeOnly
Demo
Learn more in Demo: dataChange.
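A minimal PySpark sketch of file compaction with dataChange set to false, assuming a SparkSession with Delta Lake configured. The function name is hypothetical; the read, repartition and overwrite sequence follows the common Delta Lake compaction pattern.

```python
def compact_delta_table(spark, table_path, num_files):
    """Rewrite a Delta table into num_files files without changing its data.

    dataChange=false marks the commit as a pure rearrangement, so streaming
    queries reading this table do not treat the rewritten files as new data.
    """
    (
        spark.read.format("delta").load(table_path)
        .repartition(num_files)
        .write
        .option("dataChange", "false")
        .format("delta")
        .mode("overwrite")
        .save(table_path)
    )
```

The rewritten rows must be exactly the rows already in the table; using dataChange=false with a write that adds, drops or modifies rows would mislead downstream streaming readers.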
excludeRegex
ignoreChanges
ignoreDeletes
ignoreFileDeletion
maxBytesPerTrigger
maxFilesPerTrigger
Maximum number of files (AddFiles) that DeltaSource is supposed to scan (read) in a streaming micro-batch (trigger)
Default: 1000
Must be at least 1
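A minimal PySpark sketch of a streaming read that caps the number of files per micro-batch, assuming a SparkSession with Delta Lake configured. The function name is hypothetical, and the local "at least 1" check is an assumption added only so that a bad value fails before Spark is involved.

```python
def read_delta_stream(spark, table_path, max_files_per_trigger=1000):
    """Open a streaming read of a Delta table, capping files per micro-batch.

    The default of 1000 matches the documented default of maxFilesPerTrigger.
    """
    # Mirror the documented constraint (must be at least 1) to fail fast.
    if max_files_per_trigger < 1:
        raise ValueError("maxFilesPerTrigger must be at least 1")
    return (
        spark.readStream
        .format("delta")
        .option("maxFilesPerTrigger", str(max_files_per_trigger))
        .load(table_path)
    )
```

A smaller cap produces more, smaller micro-batches, which can help a query catch up on a large backlog without one oversized batch.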
mergeSchema
Enables schema migration (and allows automatic schema merging during a write operation for WriteIntoDelta and DeltaSink)
Equivalent SQL Session configuration: spark.databricks.delta.schema.autoMerge.enabled
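A minimal PySpark sketch of an append that enables mergeSchema for a single write, assuming a SparkSession with Delta Lake configured; the function name is hypothetical. Setting spark.databricks.delta.schema.autoMerge.enabled to true in the session would enable the same behavior for all writes instead.

```python
def append_with_schema_merge(df, table_path):
    """Append df to a Delta table, letting new columns in df extend the table schema.

    mergeSchema applies to this write only; other writes keep the table's
    schema enforcement unless the session configuration is enabled.
    """
    (
        df.write
        .format("delta")
        .mode("append")
        .option("mergeSchema", "true")
        .save(table_path)
    )
```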
optimizeWrite
Enables...FIXME
overwriteSchema
path
(required) Directory on a Hadoop DFS-compliant file system with an optional time travel identifier
Default: (undefined)
Note
Can also be specified using the load method of DataFrameReader and DataStreamReader.
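A minimal PySpark sketch of the two ways to pass the table directory in a batch read, assuming a SparkSession with Delta Lake configured; the function name and the via_option flag are hypothetical.

```python
def read_delta(spark, table_path, via_option=False):
    """Load a Delta table in batch mode.

    Both branches load the same table: the directory is given either as
    the "path" option or as the argument of load.
    """
    reader = spark.read.format("delta")
    if via_option:
        return reader.option("path", table_path).load()
    return reader.load(table_path)
```

Use one form or the other, not both at once, since recent Spark versions reject a path given both as an option and as a load argument.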
queryName
replaceWhere
Partition predicates that select the data to overwrite (unless replaceWhere.dataColumns.enabled is enabled to allow arbitrary predicates on non-partition data columns)
Available as DeltaWriteOptions.replaceWhere
Demo
Learn more in Demo: replaceWhere.
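A minimal PySpark sketch of a selective overwrite with replaceWhere, assuming a SparkSession with Delta Lake configured. The function name and the example predicate (which assumes a hypothetical date partition column) are for illustration only.

```python
def overwrite_matching(df, table_path, predicate):
    """Overwrite only the rows of a Delta table that match predicate.

    predicate is a SQL expression such as "date >= '2024-01-01'". Unless
    replaceWhere.dataColumns.enabled is enabled, it must reference partition
    columns only. The rows of df are expected to match the predicate too.
    """
    (
        df.write
        .format("delta")
        .mode("overwrite")
        .option("replaceWhere", predicate)
        .save(table_path)
    )
```

Rows outside the predicate are left untouched, which makes this suitable for re-running a job for a single day or month without rewriting the whole table.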
timestampAsOf
Timestamp of the version of a Delta table for Time Travel
Mutually exclusive with versionAsOf option and the time travel identifier of the path option.
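A minimal PySpark sketch of a time-travel read by timestamp, assuming a SparkSession with Delta Lake configured; the function name and the sample timestamp are hypothetical. The table path is passed without a time travel identifier because the two are mutually exclusive.

```python
def read_as_of_timestamp(spark, table_path, timestamp):
    """Read the version of a Delta table that was current at timestamp.

    timestamp is a string Spark can interpret as a timestamp, for example
    "2024-01-01 00:00:00". Do not also set versionAsOf.
    """
    return (
        spark.read
        .format("delta")
        .option("timestampAsOf", timestamp)
        .load(table_path)
    )
```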
userMetadata
Defines user-defined commit metadata
Takes precedence over spark.databricks.delta.commitInfo.userMetadata
Available by inspecting CommitInfos using DESCRIBE HISTORY or DeltaTable.history.
Demo
Learn more in Demo: User Metadata for Labelling Commits.
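A minimal PySpark sketch of labelling a commit with userMetadata, assuming a SparkSession with Delta Lake configured; the function name and the label value are hypothetical.

```python
def append_with_user_metadata(df, table_path, label):
    """Append df to a Delta table and tag the commit with label.

    The label is stored in the commit's CommitInfo and shows up in the
    userMetadata column of DESCRIBE HISTORY (or DeltaTable.history()).
    It overrides spark.databricks.delta.commitInfo.userMetadata, if set.
    """
    (
        df.write
        .format("delta")
        .mode("append")
        .option("userMetadata", label)
        .save(table_path)
    )
```

For example, append_with_user_metadata(df, "/tmp/delta/events", "backfill-2024-01") makes that commit easy to find in the table history.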
versionAsOf
Version of a Delta table for Time Travel
Mutually exclusive with timestampAsOf option and the time travel identifier of the path option.
Used when:
- DeltaDataSource is requested for a relation
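A minimal PySpark sketch of a time-travel read by version, with a local check for the mutual exclusivity of versionAsOf and timestampAsOf. It assumes a SparkSession with Delta Lake configured; time_travel_options and read_delta_as_of are hypothetical helpers, and the time travel identifier of the path option is not handled.

```python
def time_travel_options(version=None, timestamp=None):
    """Build the time-travel options for a Delta batch read.

    versionAsOf and timestampAsOf are mutually exclusive, so at most one of
    version and timestamp may be given. With neither, no option is set and
    the latest version is read.
    """
    if version is not None and timestamp is not None:
        raise ValueError("versionAsOf and timestampAsOf are mutually exclusive")
    if version is not None:
        return {"versionAsOf": str(version)}
    if timestamp is not None:
        return {"timestampAsOf": timestamp}
    return {}


def read_delta_as_of(spark, table_path, version=None, timestamp=None):
    """Read a Delta table at the requested version or timestamp."""
    opts = time_travel_options(version=version, timestamp=timestamp)
    return spark.read.format("delta").options(**opts).load(table_path)
```

Checking the combination up front surfaces the conflict before Spark plans the query; Delta Lake still rejects conflicting options on its own.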