Skip to content


DeltaSink is the Sink (Spark Structured Streaming) of the delta data source for streaming queries.

Creating Instance

DeltaSink takes the following to be created:

DeltaSink is created when:


deltaLog: DeltaLog

deltaLog is a DeltaLog that is created for the delta table when DeltaSink is created.

deltaLog is used when:

Adding Streaming Micro-Batch

  batchId: Long,
  data: DataFrame): Unit

addBatch is part of the Sink (Spark Structured Streaming) abstraction.

addBatch requests the DeltaLog to start a new transaction.

addBatch registers the following performance metrics.

Name web UI
numAddedFiles number of files added.
numRemovedFiles number of files removed.

addBatch makes sure that sql.streaming.queryId local property is defined (attached to the query's current thread).

If the batch reads the same delta table as this sink is going to write to, addBatch requests the OptimisticTransaction to readWholeTable.

addBatch updates the metadata.

addBatch determines the deleted files based on the OutputMode. For Complete output mode, addBatch...FIXME

addBatch requests the OptimisticTransaction to write data out.

addBatch updates the numRemovedFiles and numAddedFiles performance metrics, and requests the OptimisticTransaction to register the SQLMetrics.

In the end, addBatch requests the OptimisticTransaction to commit (with a new SetTransaction, AddFiles and RemoveFiles, and StreamingUpdate operation).

Text Representation

toString(): String

DeltaSink uses the following text representation (with the path):



DeltaSink is an ImplicitMetadataOperation.


canMergeSchema: Boolean

canMergeSchema is part of the ImplicitMetadataOperation abstraction.

canMergeSchema is the value of canMergeSchema option (in the DeltaOptions).


canOverwriteSchema: Boolean

canOverwriteSchema is part of the ImplicitMetadataOperation abstraction.

canOverwriteSchema is true when all the following hold:

  1. OutputMode is OutputMode.Complete
  2. canOverwriteSchema is enabled (true) (in the DeltaOptions)