DeltaOptions¶
DeltaOptions is an extension of DeltaWriteOptions and DeltaReadOptions for all supported options of DeltaDataSource.
DeltaOptions is used to create WriteIntoDelta command, DeltaSink, and DeltaSource.
DeltaOptions can be verified.
Options¶
checkpointLocation¶
dataChange¶
excludeRegex¶
ignoreChanges¶
ignoreDeletes¶
ignoreFileDeletion¶
maxBytesPerTrigger¶
maxFilesPerTrigger¶
Maximum number of files (AddFiles) that DeltaSource is supposed to scan (read) in a streaming micro-batch (trigger)
Default: 1000
Must be at least 1
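A minimal sketch of how this option could be set on a streaming query; the SparkSession `spark` and the table path are assumptions for illustration, and Delta Lake must be on the classpath.

```scala
// Limit the Delta streaming source to at most 100 AddFiles per micro-batch.
// Assumes a running SparkSession `spark` and a hypothetical Delta table path.
val events = spark.readStream
  .format("delta")
  .option("maxFilesPerTrigger", 100)
  .load("/tmp/delta/events")
```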
mergeSchema¶
Enables schema migration (and allows automatic schema merging during a write operation for WriteIntoDelta and DeltaSink)
Equivalent SQL Session configuration: spark.databricks.delta.schema.autoMerge.enabled
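A sketch of schema migration on write, assuming a DataFrame `df` (whose schema adds new columns) and a hypothetical Delta table path.

```scala
// Append rows whose schema has extra columns; mergeSchema lets the
// table schema evolve instead of failing the write.
// `df` and the path are assumptions for illustration.
df.write
  .format("delta")
  .mode("append")
  .option("mergeSchema", "true")
  .save("/tmp/delta/events")
```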
optimizeWrite¶
overwriteSchema¶
path¶
(required) Directory on a Hadoop DFS-compliant file system with an optional time travel identifier
Default: (undefined)
Note
Can also be specified using the load method of DataFrameReader and DataStreamReader.
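A sketch of the two equivalent ways to specify the path, assuming a SparkSession `spark` and a hypothetical table directory.

```scala
// 1. As the path option explicitly
val viaOption = spark.read
  .format("delta")
  .option("path", "/tmp/delta/events")
  .load()

// 2. As the argument of DataFrameReader.load
val viaLoad = spark.read
  .format("delta")
  .load("/tmp/delta/events")
```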
queryName¶
replaceWhere¶
timestampAsOf¶
Timestamp of the version of a Delta table for Time Travel
Mutually exclusive with versionAsOf option and the time travel identifier of the path option.
userMetadata¶
Defines user-defined commit metadata
Takes precedence over spark.databricks.delta.commitInfo.userMetadata
Available by inspecting CommitInfos using DESCRIBE HISTORY or DeltaTable.history.
Demo
Learn more in Demo: User Metadata for Labelling Commits.
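A sketch of labelling a commit and reading the label back from the history; the DataFrame `df`, the label, and the path are assumptions for illustration.

```scala
// Attach a user-defined label to the commit produced by this write.
df.write
  .format("delta")
  .mode("append")
  .option("userMetadata", "nightly-backfill") // hypothetical label
  .save("/tmp/delta/events")

// The label appears in the userMetadata column of the commit history.
io.delta.tables.DeltaTable
  .forPath(spark, "/tmp/delta/events")
  .history(1)
  .select("userMetadata")
  .show(truncate = false)
```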
versionAsOf¶
Version of a Delta table for Time Travel
Mutually exclusive with timestampAsOf option and the time travel identifier of the path option.
Used when:
- DeltaDataSource is requested for a relation
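A sketch of time travel with versionAsOf, assuming a SparkSession `spark` and a Delta table (at a hypothetical path) with at least one committed version.

```scala
// Load the table as it was at version 0 (the very first commit).
// versionAsOf and timestampAsOf are mutually exclusive.
val snapshot = spark.read
  .format("delta")
  .option("versionAsOf", 0)
  .load("/tmp/delta/events")
```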
Creating Instance¶
DeltaOptions
takes the following to be created:
- Case-Insensitive Options
- SQLConf (Spark SQL)
When created, DeltaOptions verifies the input options.
DeltaOptions is created when:
- DeltaLog is requested for a relation (for DeltaDataSource as a CreatableRelationProvider and a RelationProvider)
- DeltaDataSource is requested for a streaming source (to create a DeltaSource for Structured Streaming), a streaming sink (to create a DeltaSink for Structured Streaming), and for an insertable HadoopFsRelation
- WriteIntoDeltaBuilder is requested to buildForV1Write
- CreateDeltaTableCommand is requested to run
How to Define Options¶
The options can be defined using the option method of the following:
- DataFrameReader and DataFrameWriter for batch queries (Spark SQL)
- DataStreamReader and DataStreamWriter for streaming queries (Spark Structured Streaming)
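A sketch of the option method on all four entry points; the SparkSession `spark`, the DataFrame `df`, and the paths are assumptions for illustration.

```scala
// Batch read and write (Spark SQL)
spark.read.format("delta")
  .option("versionAsOf", 1)
  .load("/tmp/delta/t")
df.write.format("delta")
  .option("replaceWhere", "date = '2021-06-01'")
  .save("/tmp/delta/t")

// Streaming read and write (Spark Structured Streaming)
spark.readStream.format("delta")
  .option("ignoreDeletes", "true")
  .load("/tmp/delta/t")
df.writeStream.format("delta")
  .option("checkpointLocation", "/tmp/delta/t/_checkpoints")
  .start("/tmp/delta/t")
```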
Verifying Options¶
verifyOptions(
options: CaseInsensitiveMap[String]): Unit
verifyOptions finds invalid options among the input options.
Note
In the open-source version, verifyOptions does nothing; the underlying objects (recordDeltaEvent and the others) are no-ops.
verifyOptions is used when:
- DeltaOptions is created
- DeltaDataSource is requested for a relation (for loading data in batch queries)