Skip to content

TransactionalWrite

TransactionalWrite is an <> of <> that can <> to a <>.

== [[contract]] Contract

=== [[deltaLog]] deltaLog

[source,scala]

deltaLog: DeltaLog

DeltaLog.md[] (of a delta table) that this transaction is changing

Used when:

  • OptimisticTransactionImpl is requested to <>, <> (after <>), <>, and <> (and execute <>)

  • <> and <> are executed

  • DeltaCommand is requested to <>

  • DeltaLog is requested to <>

  • TransactionalWrite is requested to <>

=== [[metadata]] metadata

[source, scala]

metadata: Metadata

Metadata.md[] (of the <>) that this transaction is changing

=== [[protocol]] protocol

[source, scala]

protocol: Protocol

Protocol.md[] (of the <>) that this transaction is changing

Used when AlterTableSetPropertiesDeltaCommand.md[] is executed (to DeltaConfigs.md#verifyProtocolVersionRequirements[verifyProtocolVersionRequirements])

=== [[snapshot]] snapshot

[source, scala]

snapshot: Snapshot

Snapshot.md[] (of the <>) that this transaction is <>

== [[implementations]][[self]] Implementations

OptimisticTransaction.md[] is the default and only known TransactionalWrite in Delta Lake (indirectly as a OptimisticTransactionImpl.md[]).

Writing Data Out (Result Of Structured Query)

writeFiles(
  data: Dataset[_]): Seq[AddFile]
writeFiles(
  data: Dataset[_],
  writeOptions: Option[DeltaOptions]): Seq[AddFile]
writeFiles(
  data: Dataset[_],
  isOptimize: Boolean): Seq[AddFile]
writeFiles(
  data: Dataset[_],
  writeOptions: Option[DeltaOptions],
  isOptimize: Boolean): Seq[AddFile]

writeFiles creates a DeltaInvariantCheckerExec and a DelayedCommitProtocol to write out files to the data path (of the DeltaLog).

Note

writeFiles uses Spark SQL's FileFormatWriter utility to write out a result of a streaming query.

Read up on FileFormatWriter in The Internals of Spark SQL online book.

writeFiles is executed within SQLExecution.withNewExecutionId.

Note

writeFiles can be tracked using web UI or SQLAppStatusListener (using SparkListenerSQLExecutionStart and SparkListenerSQLExecutionEnd events).

In the end, writeFiles returns the addedStatuses of the DelayedCommitProtocol committer.

Internally, writeFiles turns the hasWritten flag on (true).

NOTE: After writeFiles, no metadata updates in the transaction are permitted.

writeFiles normalize the given data dataset (based on the partitionColumns of the Metadata).

writeFiles getPartitioningColumns based on the partitionSchema of the Metadata.

writeFiles creates a DelayedCommitProtocol committer for the data path of the DeltaLog.

writeFiles gets the invariants from the schema of the Metadata.

writeFiles requests a new Execution ID (that is used to track all Spark jobs of FileFormatWriter.write in Spark SQL) with a physical query plan of a new DeltaInvariantCheckerExec unary physical operator (with the executed plan of the normalized query execution as the child operator).

writeFiles is used when:

== [[getCommitter]] Creating Committer

[source, scala]

getCommitter( outputPath: Path): DelayedCommitProtocol


getCommitter creates a new <> with the delta job ID and the given outputPath (and no random prefix).

getCommitter is used when TransactionalWrite is requested to <>.

== [[makeOutputNullable]] makeOutputNullable Method

[source, scala]

makeOutputNullable( output: Seq[Attribute]): Seq[Attribute]


makeOutputNullable...FIXME

makeOutputNullable is used when...FIXME

== [[normalizeData]] normalizeData Method

[source, scala]

normalizeData( data: Dataset[_], partitionCols: Seq[String]): (QueryExecution, Seq[Attribute])


normalizeData...FIXME

normalizeData is used when...FIXME

== [[getPartitioningColumns]] getPartitioningColumns Method

[source, scala]

getPartitioningColumns( partitionSchema: StructType, output: Seq[Attribute], colsDropped: Boolean): Seq[Attribute]


getPartitioningColumns...FIXME

getPartitioningColumns is used when...FIXME

== [[hasWritten]] hasWritten Flag

[source, scala]

hasWritten: Boolean = false

TransactionalWrite uses the hasWritten internal registry to prevent OptimisticTransactionImpl from <> after <>.

hasWritten is initially turned off (false). It can be turned on (true) when TransactionalWrite is requested to <>.


Last update: 2020-09-24