TransactionalWrite¶
TransactionalWrite is an <
== [[contract]] Contract
=== [[deltaLog]] deltaLog
[source,scala]¶
deltaLog: DeltaLog¶
DeltaLog.md[] (of a delta table) that this transaction is changing
Used when:
-
OptimisticTransactionImpl
is requested to <>, < > (after < >), < >, and < > (and execute < >) -
<
> and < > are executed -
DeltaCommand
is requested to <> -
DeltaLog
is requested to <> -
TransactionalWrite is requested to <
>
=== [[metadata]] metadata
[source, scala]¶
metadata: Metadata¶
Metadata.md[] (of the <
=== [[protocol]] protocol
[source, scala]¶
protocol: Protocol¶
Protocol.md[] (of the <
Used when AlterTableSetPropertiesDeltaCommand.md[] is executed (to DeltaConfigs.md#verifyProtocolVersionRequirements[verifyProtocolVersionRequirements])
=== [[snapshot]] snapshot
[source, scala]¶
snapshot: Snapshot¶
Snapshot.md[] (of the <
== [[implementations]][[self]] Implementations
OptimisticTransaction.md[] is the default and only known TransactionalWrite in Delta Lake (indirectly as a OptimisticTransactionImpl.md[]).
Writing Data Out (Result Of Structured Query)¶
writeFiles(
data: Dataset[_]): Seq[AddFile]
writeFiles(
data: Dataset[_],
writeOptions: Option[DeltaOptions]): Seq[AddFile]
writeFiles(
data: Dataset[_],
isOptimize: Boolean): Seq[AddFile]
writeFiles(
data: Dataset[_],
writeOptions: Option[DeltaOptions],
isOptimize: Boolean): Seq[AddFile]
writeFiles
creates a DeltaInvariantCheckerExec and a DelayedCommitProtocol to write out files to the data path (of the DeltaLog).
Note
writeFiles
uses Spark SQL's FileFormatWriter
utility to write out a result of a streaming query.
Read up on FileFormatWriter in The Internals of Spark SQL online book.
writeFiles
is executed within SQLExecution.withNewExecutionId
.
Note
writeFiles
can be tracked using web UI or SQLAppStatusListener
(using SparkListenerSQLExecutionStart
and SparkListenerSQLExecutionEnd
events).
In the end, writeFiles
returns the addedStatuses of the DelayedCommitProtocol
committer.
Internally, writeFiles
turns the hasWritten flag on (true
).
NOTE: After writeFiles
, no metadata updates in the transaction are permitted.
writeFiles
normalize the given data
dataset (based on the partitionColumns of the Metadata).
writeFiles
getPartitioningColumns based on the partitionSchema of the Metadata.
writeFiles creates a DelayedCommitProtocol committer for the data path of the DeltaLog.
writeFiles
gets the invariants from the schema of the Metadata.
writeFiles
requests a new Execution ID (that is used to track all Spark jobs of FileFormatWriter.write
in Spark SQL) with a physical query plan of a new DeltaInvariantCheckerExec unary physical operator (with the executed plan of the normalized query execution as the child operator).
writeFiles
is used when:
- DeleteCommand, MergeIntoCommand, UpdateCommand, and WriteIntoDelta commands are executed
DeltaSink
is requested to add a streaming micro-batch
== [[getCommitter]] Creating Committer
[source, scala]¶
getCommitter( outputPath: Path): DelayedCommitProtocol
getCommitter creates a new <outputPath
(and no random prefix).
getCommitter is used when TransactionalWrite is requested to <
== [[makeOutputNullable]] makeOutputNullable Method
[source, scala]¶
makeOutputNullable( output: Seq[Attribute]): Seq[Attribute]
makeOutputNullable...FIXME
makeOutputNullable is used when...FIXME
== [[normalizeData]] normalizeData Method
[source, scala]¶
normalizeData( data: Dataset[_], partitionCols: Seq[String]): (QueryExecution, Seq[Attribute])
normalizeData...FIXME
normalizeData is used when...FIXME
== [[getPartitioningColumns]] getPartitioningColumns Method
[source, scala]¶
getPartitioningColumns( partitionSchema: StructType, output: Seq[Attribute], colsDropped: Boolean): Seq[Attribute]
getPartitioningColumns...FIXME
getPartitioningColumns is used when...FIXME
== [[hasWritten]] hasWritten Flag
[source, scala]¶
hasWritten: Boolean = false¶
TransactionalWrite uses the hasWritten internal registry to prevent OptimisticTransactionImpl
from <
hasWritten is initially turned off (false
). It can be turned on (true
) when TransactionalWrite is requested to <