Options¶
checkpointLocation¶
Checkpoint directory for streaming queries (Spark Structured Streaming).
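A minimal PySpark sketch of passing checkpointLocation when starting a streaming write to a delta table. The function name, `stream_df`, and both paths are hypothetical; the sketch assumes a Spark session with Delta Lake available.

```python
def start_delta_stream(stream_df, table_path, checkpoint_dir):
    """Start a streaming write to a delta table with an explicit checkpoint directory.

    stream_df is assumed to be a streaming DataFrame; both paths are placeholders.
    """
    return (
        stream_df.writeStream
        .format("delta")
        .outputMode("append")
        .option("checkpointLocation", checkpoint_dir)
        .start(table_path)
    )
```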
dataChange¶
Whether to write new data to the table or just rearrange data that is already part of the table. Setting this option to false declares that the data being written by this job does not change any data in the table and merely rearranges existing data. This makes sure streaming queries reading from this table will not see any new changes.
Used when:
DeltaWriteOptionsImpl is requested for rearrangeOnly
Demo
Learn more in Demo: dataChange.
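As an illustration, a compaction-style rewrite could pass dataChange as false so that the commit is treated as a rearrangement only. This is a sketch under assumptions: `compact_table`, `table_path`, and `num_files` are hypothetical names, and `spark` is an existing session with Delta Lake available.

```python
def compact_table(spark, table_path, num_files):
    """Rewrite a delta table into num_files files without changing its data."""
    (
        spark.read.format("delta").load(table_path)
        .repartition(num_files)
        .write
        .format("delta")
        .mode("overwrite")
        .option("dataChange", "false")  # declare the write as rearrange-only
        .save(table_path)
    )
```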
excludeRegex¶
scala.util.matching.Regex to filter out the paths of FileActions
Default: (undefined)
Use DeltaOptions.excludeRegex to access the value
Used when:
- DeltaSourceBase is requested for the data (for a given DeltaSourceOffset)
- DeltaSourceCDCSupport is requested for the data
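A possible way to set excludeRegex on a streaming read, sketched in PySpark. The function name and arguments are hypothetical, and the pattern is interpreted by Delta Lake as a Scala regular expression, so its exact matching behaviour is not verified here.

```python
def read_stream_excluding(spark, table_path, exclude_pattern):
    """Open a streaming read of a delta table, skipping file paths that match exclude_pattern."""
    return (
        spark.readStream
        .format("delta")
        .option("excludeRegex", exclude_pattern)
        .load(table_path)
    )
```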
failOnDataLoss¶
Controls whether or not to fail loading a delta table when the earliest available version (in the _delta_log directory) is after the version requested
Default: true
Use DeltaOptions.failOnDataLoss to access the value
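A short PySpark sketch of turning the default off for a streaming read. `read_stream_tolerating_loss` and `table_path` are hypothetical names; whether skipping missing versions is acceptable depends on the pipeline.

```python
def read_stream_tolerating_loss(spark, table_path):
    """Open a streaming read that does not fail when earlier versions are no longer available."""
    return (
        spark.readStream
        .format("delta")
        .option("failOnDataLoss", "false")  # the default is true
        .load(table_path)
    )
```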
ignoreChanges¶
ignoreDeletes¶
ignoreFileDeletion¶
maxBytesPerTrigger¶
maxFilesPerTrigger¶
Maximum number of files (AddFiles) that DeltaSource is supposed to scan (read) in a streaming micro-batch (trigger)
Default: 1000
Must be at least 1
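The limit and its lower bound can be sketched in PySpark as follows. The local check only mirrors the "at least 1" rule stated above so that a bad value fails early; the function name and arguments are hypothetical.

```python
def read_stream_rate_limited(spark, table_path, max_files=1000):
    """Open a streaming read that scans at most max_files files per micro-batch."""
    if max_files < 1:
        # Local guard mirroring the documented constraint
        raise ValueError("maxFilesPerTrigger must be at least 1")
    return (
        spark.readStream
        .format("delta")
        .option("maxFilesPerTrigger", str(max_files))
        .load(table_path)
    )
```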
maxRecordsPerFile¶
Maximum number of records per data file
Spark SQL
maxRecordsPerFile is among the FileFormatWriter (Spark SQL) options, so all Delta Lake does is make it available (hand it over) to the underlying "writing infrastructure".
Used when:
TransactionalWrite is requested to write data out (for write options of DelayedCommitProtocol)
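A minimal PySpark sketch of capping the number of records per data file on a write. `write_capped_files`, `df`, and `table_path` are hypothetical names.

```python
def write_capped_files(df, table_path, max_records):
    """Append df to a delta table, limiting each data file to max_records records."""
    (
        df.write
        .format("delta")
        .mode("append")
        .option("maxRecordsPerFile", str(max_records))
        .save(table_path)
    )
```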
mergeSchema¶
Enables schema migration (and allows automatic schema merging during a write operation for WriteIntoDelta and DeltaSink)
Equivalent SQL Session configuration: spark.databricks.delta.schema.autoMerge.enabled
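The two ways of enabling schema merging could look like this in PySpark. Both function names are hypothetical; the first applies to a single write, the second to the whole session.

```python
def append_with_schema_merge(df, table_path):
    """Append df and let new columns be merged into the table schema for this write only."""
    (
        df.write
        .format("delta")
        .mode("append")
        .option("mergeSchema", "true")
        .save(table_path)
    )


def enable_auto_merge(spark):
    """Enable schema merging for every write in the session via the configuration property."""
    spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")
```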
optimizeWrite¶
optimizeWrite is a writer option.
Not used
overwriteSchema¶
Enables overwriting the schema or changing the partitioning of a delta table during an overwrite write operation
Use DeltaOptions.canOverwriteSchema to access the value
Note
The schema cannot be overwritten when using the replaceWhere option.
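A sketch of an overwrite that replaces the schema and, optionally, the partitioning. `overwrite_with_new_schema` and `partition_cols` are hypothetical names; replaceWhere is deliberately not set, in line with the note above.

```python
def overwrite_with_new_schema(df, table_path, partition_cols=()):
    """Overwrite a delta table with df, replacing its schema and optionally its partitioning."""
    writer = (
        df.write
        .format("delta")
        .mode("overwrite")
        .option("overwriteSchema", "true")
    )
    if partition_cols:
        # Repartition the table by the given columns as part of the overwrite
        writer = writer.partitionBy(*partition_cols)
    writer.save(table_path)
```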
partitionOverwriteMode¶
Mutually exclusive with replaceWhere
Used when:
- DeltaDynamicPartitionOverwriteCommand is executed (and sets partitionOverwriteMode to DYNAMIC)
- DeltaWriteOptionsImpl is requested to isDynamicPartitionOverwriteMode and for partitionOverwriteModeInOptions
- WriteIntoDeltaBuilder is requested to overwriteDynamicPartitions
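One way to keep partitionOverwriteMode and replaceWhere apart is to build the writer options through a small helper that rejects the combination up front. This is a sketch; `overwrite_options` and `overwrite` are hypothetical helpers, not part of Delta Lake.

```python
def overwrite_options(dynamic=False, replace_where=None):
    """Build overwrite options, refusing the two mutually exclusive settings together."""
    if dynamic and replace_where is not None:
        raise ValueError("partitionOverwriteMode and replaceWhere are mutually exclusive")
    options = {}
    if dynamic:
        options["partitionOverwriteMode"] = "dynamic"
    if replace_where is not None:
        options["replaceWhere"] = replace_where
    return options


def overwrite(df, table_path, **kwargs):
    """Overwrite a delta table using the options built by overwrite_options."""
    (
        df.write
        .format("delta")
        .mode("overwrite")
        .options(**overwrite_options(**kwargs))
        .save(table_path)
    )
```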
path¶
(required) Directory on a Hadoop DFS-compliant file system with an optional time travel identifier
Default: (undefined)
Note
Can also be specified using the load method of DataFrameReader and DataStreamReader.
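The two ways of supplying the path can be sketched in PySpark as follows. Function names and `table_path` are hypothetical.

```python
def load_by_option(spark, table_path):
    """Pass the table directory through the path option."""
    return spark.read.format("delta").option("path", table_path).load()


def load_by_argument(spark, table_path):
    """Pass the table directory as the argument of load."""
    return spark.read.format("delta").load(table_path)
```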
queryName¶
readChangeFeed¶
Enables Change Data Feed while reading delta tables (CDC-aware table scans)
Use DeltaOptions.readChangeFeed for the value
Note
Use the following options to fine-tune Change Data Feed-aware queries:
readChangeFeed is used when:
- CDCStatementBase is requested to getOptions
- CDCReaderImpl is requested to isCDCRead
- DeltaDataSource is requested to create a BaseRelation
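A batch read of the change feed might look like the sketch below. The startingVersion option is taken from Delta Lake's public Change Data Feed documentation rather than from this page, and the function name and arguments are hypothetical. The table is assumed to have Change Data Feed enabled.

```python
def read_changes(spark, table_path, starting_version=0):
    """Read row-level changes of a delta table from starting_version onwards."""
    return (
        spark.read
        .format("delta")
        .option("readChangeFeed", "true")
        # Assumed companion option that bounds the change feed read
        .option("startingVersion", str(starting_version))
        .load(table_path)
    )
```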
replaceWhere¶
Partition predicates to overwrite only the data that matches predicates over partition columns (unless replaceWhere.dataColumns.enabled is enabled)
Available as DeltaWriteOptions.replaceWhere
Mutually exclusive with partitionOverwriteMode
Demo
Learn more in Demo: replaceWhere.
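A minimal PySpark sketch of a selective overwrite. `overwrite_matching` is a hypothetical helper, and the predicate in the usage line refers to a hypothetical `date` partition column.

```python
def overwrite_matching(df, table_path, predicate):
    """Overwrite only the data of a delta table that matches predicate."""
    (
        df.write
        .format("delta")
        .mode("overwrite")
        .option("replaceWhere", predicate)
        .save(table_path)
    )
```

For example, `overwrite_matching(df, "/tmp/events", "date >= '2024-01-01' AND date < '2024-02-01'")` would replace a single month, provided `df` contains only rows for that month.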
streamingSourceTrackingId¶
The directory for a schema log of DeltaSourceMetadataTrackingLog
Available as DeltaOptions.sourceTrackingId
Used when:
DeltaAnalysis is requested to verifyDeltaSourceSchemaLocation
timestampAsOf¶
Timestamp of the version of a delta table for Time Travel
Mutually exclusive with versionAsOf option and the time travel identifier of the path option.
Used when:
DeltaDataSource utility is used to get a DeltaTimeTravelSpec
userMetadata¶
Defines user-defined commit metadata
Takes precedence over spark.databricks.delta.commitInfo.userMetadata
Available by inspecting CommitInfos using DESCRIBE HISTORY or DeltaTable.history.
Demo
Learn more in Demo: User Metadata for Labelling Commits.
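Labelling a commit and reading the label back could be sketched as follows. Both function names are hypothetical, and the history query assumes a path-based table addressed with the delta.`path` SQL syntax.

```python
def append_with_label(df, table_path, label):
    """Append df to a delta table and attach label as the commit's user metadata."""
    (
        df.write
        .format("delta")
        .mode("append")
        .option("userMetadata", label)
        .save(table_path)
    )


def commit_labels(spark, table_path):
    """Return the version and userMetadata columns of the table history."""
    history = spark.sql(f"DESCRIBE HISTORY delta.`{table_path}`")
    return history.select("version", "userMetadata")
```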
versionAsOf¶
Version of a delta table for Time Travel
Must be castable to a long number
Mutually exclusive with timestampAsOf option and the time travel identifier of the path option.
Used when:
DeltaDataSource utility is used to get a DeltaTimeTravelSpec
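The mutual exclusivity of versionAsOf and timestampAsOf can be enforced before the read is issued. The sketch below assumes hypothetical helpers `time_travel_options` and `read_as_of`; the integer conversion is a local stand-in for the "castable to a long number" rule.

```python
def time_travel_options(version=None, timestamp=None):
    """Build time travel options, requiring exactly one of version or timestamp."""
    if (version is None) == (timestamp is None):
        raise ValueError("specify exactly one of versionAsOf or timestampAsOf")
    if version is not None:
        # int() raises ValueError for values that are not whole numbers
        return {"versionAsOf": str(int(version))}
    return {"timestampAsOf": timestamp}


def read_as_of(spark, table_path, **kwargs):
    """Load a delta table as of a given version or timestamp."""
    return (
        spark.read
        .format("delta")
        .options(**time_travel_options(**kwargs))
        .load(table_path)
    )
```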