Skip to content

DeltaSink

DeltaSink is the Sink (Spark Structured Streaming) of the delta data source for streaming queries.

Creating Instance

DeltaSink takes the following to be created:

DeltaSink is created when:

DeltaLog

deltaLog: DeltaLog

deltaLog is a DeltaLog that is created for the delta table when DeltaSink is created.

deltaLog is used when:

Adding Streaming Micro-Batch

addBatch(
  batchId: Long,
  data: DataFrame): Unit

addBatch requests the DeltaLog to start a new transaction.

addBatch registers the following performance metrics.

Name web UI
numAddedFiles number of files added.
numRemovedFiles number of files removed.

addBatch makes sure that sql.streaming.queryId local property is defined (attached to the query's current thread).

If the batch reads the same delta table as this sink is going to write to, addBatch requests the OptimisticTransaction to readWholeTable.

addBatch updates the metadata.

addBatch determines the deleted files based on the OutputMode. For Complete output mode, addBatch...FIXME

addBatch requests the OptimisticTransaction to write data out.

addBatch updates the numRemovedFiles and numAddedFiles performance metrics, and requests the OptimisticTransaction to register the SQLMetrics.

In the end, addBatch requests the OptimisticTransaction to commit (with a new SetTransaction, AddFiles and RemoveFiles, and StreamingUpdate operation).


addBatch is part of the Sink (Spark Structured Streaming) abstraction.

Text Representation

toString(): String

DeltaSink uses the following text representation (with the path):

DeltaSink[path]

ImplicitMetadataOperation

DeltaSink is an ImplicitMetadataOperation.

Back to top