
= DeltaSink

DeltaSink is the streaming sink of the delta data source (DeltaDataSource) for streaming queries in Spark Structured Streaming.

TIP: Read up on https://jaceklaskowski.gitbooks.io/spark-structured-streaming/spark-sql-streaming-Sink.html[Streaming Sink] in https://bit.ly/spark-structured-streaming[The Internals of Spark Structured Streaming] online book.

DeltaSink is <<creating-instance, created>> exclusively when DeltaDataSource is requested for a streaming sink (Structured Streaming).
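For illustration only (the paths are made up and spark is a SparkSession as in spark-shell), the following delta-format streaming query makes DeltaDataSource create a DeltaSink under the covers:

[source, scala]
----
import org.apache.spark.sql.streaming.Trigger

val sq = spark.readStream
  .format("rate")              // any streaming source will do
  .load
  .writeStream
  .format("delta")             // DeltaDataSource creates a DeltaSink for the query
  .option("checkpointLocation", "/tmp/delta/sink-demo/checkpoint")
  .trigger(Trigger.ProcessingTime("10 seconds"))
  .start("/tmp/delta/sink-demo/events") // the path of the delta table
----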

[[toString]] DeltaSink uses the following text representation (with the <<path, path>>):

----
DeltaSink[path]
----
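That is simply a toString override along these lines (a sketch):

[source, scala]
----
override def toString(): String = s"DeltaSink[$path]"
----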

[[ImplicitMetadataOperation]] DeltaSink is an ImplicitMetadataOperation (an operation that can update the metadata of a delta table).

== [[creating-instance]] Creating Instance

DeltaSink takes the following to be created:

* [[sqlContext]] SQLContext
* [[path]] Hadoop Path of the delta table (to write data to, as configured by the path option)
* [[partitionColumns]] Names of the partition columns (Seq[String])
* [[outputMode]] OutputMode
* [[options]] DeltaOptions
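Taken together, the properties above correspond to a class declaration along the following lines (a sketch only; the exact modifiers and mixins can differ across Delta Lake versions):

[source, scala]
----
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.delta.DeltaOptions
import org.apache.spark.sql.delta.schema.ImplicitMetadataOperation
import org.apache.spark.sql.execution.streaming.Sink
import org.apache.spark.sql.streaming.OutputMode

class DeltaSink(
    sqlContext: SQLContext,
    path: Path,
    partitionColumns: Seq[String],
    outputMode: OutputMode,
    options: DeltaOptions)
  extends Sink
  with ImplicitMetadataOperation
----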

== [[deltaLog]] deltaLog Internal Property

[source, scala]
----
deltaLog: DeltaLog
----

deltaLog is a DeltaLog that is created for the <<path, path>> when DeltaSink is <<creating-instance, created>> (when DeltaDataSource is requested for a streaming sink).

deltaLog is used exclusively when DeltaSink is requested to <<addBatch, add a streaming micro-batch>>.
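A sketch of the initialization, assuming the DeltaLog.forTable factory that Delta Lake uses throughout:

[source, scala]
----
import org.apache.spark.sql.delta.DeltaLog

private val deltaLog = DeltaLog.forTable(sqlContext.sparkSession, path)
----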

== [[addBatch]] Adding Streaming Micro-Batch

[source, scala]
----
addBatch(
  batchId: Long,
  data: DataFrame): Unit
----


NOTE: addBatch is part of the Sink contract (in Spark Structured Streaming) to add a batch of data to the sink.

addBatch requests the <<deltaLog, deltaLog>> to start a new transaction (OptimisticTransaction).

addBatch...FIXME

In the end, addBatch requests the OptimisticTransaction to commit.
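FIXMEs aside, the overall flow condenses to something like the following sketch (modelled after the Delta Lake sources of that era; helpers like txnVersion, filterFiles and writeFiles belong to OptimisticTransaction, and signatures may vary between versions):

[source, scala]
----
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.delta.DeltaOperations
import org.apache.spark.sql.delta.actions.SetTransaction
import org.apache.spark.sql.execution.streaming.StreamExecution
import org.apache.spark.sql.streaming.OutputMode

// Inside DeltaSink
override def addBatch(batchId: Long, data: DataFrame): Unit =
  deltaLog.withNewTransaction { txn =>
    // Update the table metadata (schema, partitioning), overwriting it in Complete mode
    updateMetadata(txn, data, partitionColumns, configuration = Map.empty,
      isOverwriteMode = outputMode == OutputMode.Complete())

    // Skip a micro-batch that has already been committed (idempotent re-execution)
    val queryId = sqlContext.sparkContext.getLocalProperty(StreamExecution.QUERY_ID_KEY)
    if (txn.txnVersion(queryId) >= batchId) return

    // In Complete output mode, all existing files are logically removed first
    val deletedFiles =
      if (outputMode == OutputMode.Complete()) txn.filterFiles().map(_.remove) else Nil

    // Write the micro-batch out as new data files
    val newFiles = txn.writeFiles(data, Some(options))

    // Record the (queryId, batchId) pair and commit all actions atomically
    val setTxn = SetTransaction(queryId, batchId, Some(deltaLog.clock.getTimeMillis()))
    txn.commit(setTxn +: (newFiles ++ deletedFiles),
      DeltaOperations.StreamingUpdate(outputMode, queryId, batchId))
  }
----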

