Options¶
checkpointLocation¶
Checkpoint directory for streaming queries (Spark Structured Streaming).
dataChange¶
Whether to write new data to the table or just rearrange data that is already part of the table. When set to false, this option declares that the data being written by this job does not change any data in the table and merely rearranges existing data. This ensures that streaming queries reading from this table do not see any new changes.
Used when:
DeltaWriteOptionsImpl is requested for rearrangeOnly
Demo
Learn more in Demo: dataChange.
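As a rough illustration, a compaction job that rewrites the files of a table without changing its data could pass dataChange as false. This is a minimal PySpark sketch, not code from Delta Lake itself; the function names, the `path` argument and the `num_files` parameter are hypothetical.

```python
def compaction_write_options() -> dict:
    """Writer options for a job that only rearranges existing data."""
    # "false" declares that this write does not change the table's data
    return {"dataChange": "false"}


def compact(spark, path: str, num_files: int) -> None:
    """Rewrite the delta table at `path` into `num_files` files.

    `spark` is assumed to be an active SparkSession with Delta Lake configured.
    """
    (
        spark.read.format("delta").load(path)
        .repartition(num_files)
        .write.format("delta")
        .mode("overwrite")
        .options(**compaction_write_options())
        .save(path)
    )
```

Keeping the options in a separate function makes it easy to reuse them for other rearrange-only jobs.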
excludeRegex¶
scala.util.matching.Regex to filter out the paths of FileActions
Default: (undefined)
Use DeltaOptions.excludeRegex to access the value
Used when:
- DeltaSourceBase is requested for the data (for a given DeltaSourceOffset)
- DeltaSourceCDCSupport is requested for the data
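To illustrate the idea of filtering file paths with a regular expression, here is a small pure-Python sketch. It is not Delta Lake's code: the helper name `keep_paths`, the sample paths and the assumption that a path is dropped when the pattern matches anywhere in it are all illustrative, and Python's `re` syntax is not identical to that of scala.util.matching.Regex.

```python
import re
from typing import Iterable, List, Optional


def keep_paths(paths: Iterable[str], exclude_regex: Optional[str] = None) -> List[str]:
    """Return the paths that do NOT match `exclude_regex`.

    With no regex (the "undefined" default), every path is kept.
    """
    if exclude_regex is None:
        return list(paths)
    pattern = re.compile(exclude_regex)
    # Assumption: a match anywhere in the path excludes it
    return [p for p in paths if pattern.search(p) is None]


sample = ["date=2024-01-01/part-0.parquet", "_tmp/part-1.parquet"]
kept = keep_paths(sample, r"^_tmp/")
```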
failOnDataLoss¶
Controls whether or not to fail loading a delta table when the earliest available version (in the _delta_log directory) is after the version requested
Default: true
Use DeltaOptions.failOnDataLoss to access the value
ignoreChanges¶
ignoreDeletes¶
ignoreFileDeletion¶
maxBytesPerTrigger¶
maxFilesPerTrigger¶
Maximum number of files (AddFiles) that DeltaSource is supposed to scan (read) in a streaming micro-batch (trigger)
Default: 1000
Must be at least 1
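A streaming read that caps the number of files per micro-batch might look like the following PySpark sketch. The helper names and the `path` argument are hypothetical, and the up-front check simply mirrors the "at least 1" rule above rather than reproducing Delta Lake's own validation.

```python
def rate_limit_options(max_files_per_trigger: int = 1000) -> dict:
    """Reader options limiting the files scanned per micro-batch."""
    if max_files_per_trigger < 1:
        raise ValueError("maxFilesPerTrigger must be at least 1")
    return {"maxFilesPerTrigger": str(max_files_per_trigger)}


def read_stream(spark, path: str, max_files_per_trigger: int = 1000):
    """Open a streaming DataFrame over the delta table at `path`.

    `spark` is assumed to be an active SparkSession with Delta Lake configured.
    """
    return (
        spark.readStream.format("delta")
        .options(**rate_limit_options(max_files_per_trigger))
        .load(path)
    )
```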
maxRecordsPerFile¶
Maximum number of records per data file
Spark SQL
maxRecordsPerFile is one of the FileFormatWriter (Spark SQL) options, so all Delta Lake does is hand it over to the underlying "writing infrastructure".
Used when:
TransactionalWrite is requested to write data out (for write options of DelayedCommitProtocol)
mergeSchema¶
Enables schema migration (and allows automatic schema merging during a write operation for WriteIntoDelta and DeltaSink)
Equivalent SQL Session configuration: spark.databricks.delta.schema.autoMerge.enabled
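For a single write, the option can be set on the writer instead of the session. A minimal PySpark sketch, assuming `df` is a DataFrame whose schema adds columns to the table at the hypothetical `path`:

```python
def append_with_merge_schema(df, path: str) -> None:
    """Append `df` to the delta table at `path`, merging new columns into its schema."""
    (
        df.write.format("delta")
        .mode("append")
        # Per-write alternative to the session-wide autoMerge configuration
        .option("mergeSchema", "true")
        .save(path)
    )
```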
optimizeWrite¶
optimizeWrite is a writer option.
Not used
overwriteSchema¶
Enables overwriting the schema or changing the partitioning of a delta table during an overwrite write operation
Use DeltaOptions.canOverwriteSchema to access the value
Note
The schema cannot be overwritten when using replaceWhere option.
partitionOverwriteMode¶
Mutually exclusive with replaceWhere
Used when:
- DeltaDynamicPartitionOverwriteCommand is executed (and sets partitionOverwriteMode to DYNAMIC)
- DeltaWriteOptionsImpl is requested to isDynamicPartitionOverwriteMode and for partitionOverwriteModeInOptions
- WriteIntoDeltaBuilder is requested to overwriteDynamicPartitions
path¶
(required) Directory on a Hadoop DFS-compliant file system with an optional time travel identifier
Default: (undefined)
Note
Can also be specified using load method of DataFrameReader and DataStreamReader.
queryName¶
readChangeFeed¶
Enables Change Data Feed while reading delta tables (CDC-aware table scans)
Use DeltaOptions.readChangeFeed for the value
Note
Use the following options to fine-tune Change Data Feed-aware queries:
readChangeFeed is used when:
- CDCStatementBase is requested to getOptions
- CDCReaderImpl is requested to isCDCRead
- DeltaDataSource is requested to create a BaseRelation
replaceWhere¶
Partition predicates to overwrite only the data that matches predicates over partition columns (unless replaceWhere.dataColumns.enabled is enabled)
Available as DeltaWriteOptions.replaceWhere
Mutually exclusive with partitionOverwriteMode
Demo
Learn more in Demo: replaceWhere.
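A selective overwrite could be wired up as in the PySpark sketch below. The helper names, the `path` argument and the predicate are hypothetical, and the check simply rejects any combination of the two options, following the mutual-exclusivity rule stated above; Delta Lake's own validation may differ in detail.

```python
from typing import Optional


def overwrite_options(
    replace_where: Optional[str] = None,
    partition_overwrite_mode: Optional[str] = None,
) -> dict:
    """Build writer options for an overwrite, refusing conflicting settings."""
    if replace_where is not None and partition_overwrite_mode is not None:
        raise ValueError(
            "replaceWhere and partitionOverwriteMode are mutually exclusive"
        )
    options = {}
    if replace_where is not None:
        options["replaceWhere"] = replace_where
    if partition_overwrite_mode is not None:
        options["partitionOverwriteMode"] = partition_overwrite_mode
    return options


def overwrite(df, path: str, **kwargs) -> None:
    """Overwrite (part of) the delta table at `path` with `df`."""
    (
        df.write.format("delta")
        .mode("overwrite")
        .options(**overwrite_options(**kwargs))
        .save(path)
    )
```

For example, `overwrite(df, path, replace_where="date = '2024-01-01'")` would replace only the rows matching that predicate.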
streamingSourceTrackingId¶
The directory for a schema log of DeltaSourceMetadataTrackingLog
Available as DeltaOptions.sourceTrackingId
Used when:
DeltaAnalysis is requested to verifyDeltaSourceSchemaLocation
timestampAsOf¶
Timestamp of the version of a delta table for Time Travel
Mutually exclusive with versionAsOf option and the time travel identifier of the path option.
Used when:
DeltaDataSource utility is used to get a DeltaTimeTravelSpec
userMetadata¶
Defines user-defined commit metadata
Takes precedence over spark.databricks.delta.commitInfo.userMetadata
Available by inspecting CommitInfos using DESCRIBE HISTORY or DeltaTable.history.
Demo
Learn more in Demo: User Metadata for Labelling Commits.
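Labelling a commit and reading the label back might look like this PySpark sketch. The function names, the `path` argument and the label text are hypothetical; the read side relies on DESCRIBE HISTORY exposing a userMetadata column and listing the newest commit first.

```python
def labelled_write_options(label: str) -> dict:
    """Writer options that attach `label` to the commit."""
    return {"userMetadata": label}


def append_with_label(df, path: str, label: str) -> None:
    """Append `df` to the delta table at `path` with a commit label."""
    (
        df.write.format("delta")
        .mode("append")
        .options(**labelled_write_options(label))
        .save(path)
    )


def latest_label(spark, path: str):
    """Return the userMetadata of the most recent commit, or None."""
    row = (
        spark.sql(f"DESCRIBE HISTORY delta.`{path}` LIMIT 1")
        .select("userMetadata")
        .first()
    )
    return row["userMetadata"] if row is not None else None
```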
versionAsOf¶
Version of a delta table for Time Travel
Must be castable to a long number
Mutually exclusive with timestampAsOf option and the time travel identifier of the path option.
Used when:
DeltaDataSource utility is used to get a DeltaTimeTravelSpec
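The two time travel options can be sketched together in PySpark as follows. The helper names and the `path` argument are hypothetical, and the checks only mirror the rules stated above (the options are mutually exclusive, and the version must be castable to a long); they do not reproduce Delta Lake's own validation.

```python
from typing import Optional


def time_travel_options(
    version: Optional[int] = None, timestamp: Optional[str] = None
) -> dict:
    """Build reader options for time travel by version or by timestamp."""
    if version is not None and timestamp is not None:
        raise ValueError("versionAsOf and timestampAsOf are mutually exclusive")
    if version is not None:
        # int() fails early if the value is not a whole number
        return {"versionAsOf": str(int(version))}
    if timestamp is not None:
        return {"timestampAsOf": timestamp}
    return {}


def read_as_of(spark, path: str, version=None, timestamp=None):
    """Load the delta table at `path` as of a version or a timestamp.

    `spark` is assumed to be an active SparkSession with Delta Lake configured.
    """
    return (
        spark.read.format("delta")
        .options(**time_travel_options(version, timestamp))
        .load(path)
    )
```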