DeltaSink¶
DeltaSink is the Sink (Spark Structured Streaming) of the delta data source for streaming queries.
Creating Instance¶
DeltaSink takes the following to be created:
-  
SQLContext - Hadoop Path of the delta table (to write data to as configured by the path option)
 - Partition columns
 -  
OutputMode(Spark Structured Streaming) - DeltaOptions
 
DeltaSink is created when:
DeltaDataSourceis requested for a streaming sink
DeltaLog¶
deltaLog: DeltaLog
deltaLog is a DeltaLog that is created for the delta table when DeltaSink is created.
deltaLog is used when:
- DeltaSink is requested to add a streaming micro-batch
 
Adding Streaming Micro-Batch¶
addBatch(
  batchId: Long,
  data: DataFrame): Unit
addBatch is part of the Sink (Spark Structured Streaming) abstraction.
addBatch requests the DeltaLog to start a new transaction.
addBatch registers the following performance metrics.
| Name | web UI | 
|---|---|
numAddedFiles |  number of files added. | 
numRemovedFiles |  number of files removed. | 
addBatch makes sure that sql.streaming.queryId local property is defined (attached to the query's current thread).
If the batch reads the same delta table as this sink is going to write to, addBatch requests the OptimisticTransaction to readWholeTable.
addBatch updates the metadata.
addBatch determines the deleted files based on the OutputMode. For Complete output mode, addBatch...FIXME
addBatch requests the OptimisticTransaction to write data out.
addBatch updates the numRemovedFiles and numAddedFiles performance metrics, and requests the OptimisticTransaction to register the SQLMetrics.
In the end, addBatch requests the OptimisticTransaction to commit (with a new SetTransaction, AddFiles and RemoveFiles, and StreamingUpdate operation).
Text Representation¶
toString(): String
DeltaSink uses the following text representation (with the path):
DeltaSink[path]
ImplicitMetadataOperation¶
DeltaSink is an ImplicitMetadataOperation.
canMergeSchema¶
canMergeSchema: Boolean
canMergeSchema is part of the ImplicitMetadataOperation abstraction.
canMergeSchema is the value of canMergeSchema option (in the DeltaOptions).
canOverwriteSchema¶
canOverwriteSchema: Boolean
canOverwriteSchema is part of the ImplicitMetadataOperation abstraction.
canOverwriteSchema is true when all the following hold:
- OutputMode is 
OutputMode.Complete - canOverwriteSchema is enabled (
true) (in the DeltaOptions)