DeltaCommand — Delta Commands¶
DeltaCommand is a marker interface for delta commands.
Implementations¶
- AlterDeltaTableCommand
- ConvertToDeltaCommandBase
- CreateDeltaTableCommand
- DeleteCommand
- DescribeDeltaDetailCommand
- DescribeDeltaHistory
- DMLWithDeletionVectorsHelper
- MergeIntoCommandBase
- OptimizeExecutor
- OptimizeTableCommandBase
- RestoreTableCommand
- ShowDeltaTableColumnsCommand
- StatisticsCollection
- UpdateCommand
- VacuumCommandImpl
- WriteIntoDelta
Converting Predicate Text to Catalyst Expression¶
```scala
parsePredicates(
  spark: SparkSession,
  predicate: String): Seq[Expression]
```
parsePredicates converts the given predicate text to an Expression (Spark SQL).
parsePredicates requests the given SparkSession (Spark SQL) for the session ParserInterface (Spark SQL) to parseExpression the given predicate text.
parsePredicates is used when:

- OptimizeTableCommand is executed (to convert the partitionPredicate)
- WriteIntoDelta command is requested to write (to convert the replaceWhere option with predicates)
Verifying Partition Predicates¶
```scala
verifyPartitionPredicates(
  spark: SparkSession,
  partitionColumns: Seq[String],
  predicates: Seq[Expression]): Unit
```
verifyPartitionPredicates asserts that the given predicates expressions:

- Contain no subqueries
- Reference partition columns only (partitionColumns)
verifyPartitionPredicates is used when:

- OptimizeTableCommand is executed (to verify the partitionPredicate if defined)
- WriteIntoDelta command is requested to write (to verify the replaceWhere option for SaveMode.Overwrite mode)
- StatisticsCollection utility is used to recompute statistics of a delta table
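The two assertions of verifyPartitionPredicates can be sketched in plain Scala over a toy expression type. Everything below (Expr, Column, Subquery, the helper names) is a simplified stand-in for the Catalyst classes, not the actual Delta implementation:

```scala
// Toy stand-ins for Catalyst expressions (hypothetical, for illustration only).
sealed trait Expr
case class Column(name: String) extends Expr
case class Subquery(inner: Expr) extends Expr
case class And(left: Expr, right: Expr) extends Expr
case class EqualTo(left: Expr, right: Expr) extends Expr
case class Literal(value: Any) extends Expr

object PartitionPredicateCheck {
  // Enumerate all sub-expressions (a much simplified Expression traversal).
  private def children(e: Expr): Seq[Expr] = e match {
    case Subquery(i)   => Seq(i)
    case And(l, r)     => Seq(l, r)
    case EqualTo(l, r) => Seq(l, r)
    case _             => Seq.empty
  }
  private def walk(e: Expr): Seq[Expr] = e +: children(e).flatMap(walk)

  // Mirrors the two assertions: no subqueries, and only partition columns referenced.
  def verify(partitionColumns: Seq[String], predicates: Seq[Expr]): Unit =
    predicates.foreach { p =>
      val all = walk(p)
      require(!all.exists(_.isInstanceOf[Subquery]),
        s"Subqueries are not supported in partition predicates: $p")
      val referenced = all.collect { case Column(n) => n }
      val bad = referenced.filterNot(partitionColumns.contains)
      require(bad.isEmpty,
        s"Predicate references non-partition columns: ${bad.mkString(", ")}")
    }
}
```

A predicate over a partition column passes; a subquery or a reference to a data column fails the corresponding assertion.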
generateCandidateFileMap¶
```scala
generateCandidateFileMap(
  basePath: Path,
  candidateFiles: Seq[AddFile]): Map[String, AddFile]
```
generateCandidateFileMap ...FIXME

generateCandidateFileMap is used when...FIXME
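The signature suggests building a lookup table from each candidate file's absolute path to its AddFile action. A minimal sketch of that idea, assuming a stub AddFile with a possibly relative path and plain string path resolution (not the Delta implementation):

```scala
// Stub for the AddFile action (only the field needed here).
case class AddFile(path: String)

// Hypothetical sketch: key each candidate file by its absolute path
// under basePath, so later phases can resolve path strings back to actions.
def generateCandidateFileMap(
    basePath: String,
    candidateFiles: Seq[AddFile]): Map[String, AddFile] =
  candidateFiles.map { file =>
    val absolute =
      if (file.path.startsWith("/")) file.path
      else s"${basePath.stripSuffix("/")}/${file.path}"
    absolute -> file
  }.toMap
```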
removeFilesFromPaths¶
```scala
removeFilesFromPaths(
  deltaLog: DeltaLog,
  nameToAddFileMap: Map[String, AddFile],
  filesToRewrite: Seq[String],
  operationTimestamp: Long): Seq[RemoveFile]
```
removeFilesFromPaths ...FIXME
removeFilesFromPaths
is used when:
- DeleteCommand and UpdateCommand commands are executed
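Given the signature, the helper plausibly resolves every path in filesToRewrite through nameToAddFileMap and turns the matching AddFile into a RemoveFile stamped with operationTimestamp. A sketch with stub action types (the deltaLog parameter is omitted and the types are stand-ins, not the Delta implementation):

```scala
// Stub actions (simplified stand-ins for Delta's AddFile and RemoveFile).
case class AddFile(path: String)
case class RemoveFile(path: String, deletionTimestamp: Long)

// Hypothetical sketch: look every file to rewrite up in the
// path-to-AddFile map and emit a RemoveFile for it.
def removeFilesFromPaths(
    nameToAddFileMap: Map[String, AddFile],
    filesToRewrite: Seq[String],
    operationTimestamp: Long): Seq[RemoveFile] =
  filesToRewrite.map { path =>
    val addFile = nameToAddFileMap.getOrElse(path,
      throw new IllegalArgumentException(s"File not found: $path"))
    RemoveFile(addFile.path, operationTimestamp)
  }
```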
Creating HadoopFsRelation (with TahoeBatchFileIndex)¶
```scala
buildBaseRelation(
  spark: SparkSession,
  txn: OptimisticTransaction,
  actionType: String,
  rootPath: Path,
  inputLeafFiles: Seq[String],
  nameToAddFileMap: Map[String, AddFile]): HadoopFsRelation
```
buildBaseRelation converts the given inputLeafFiles to AddFiles (with the given rootPath and nameToAddFileMap).
buildBaseRelation creates a TahoeBatchFileIndex for the AddFiles (with the input actionType and rootPath).
In the end, buildBaseRelation creates a HadoopFsRelation (Spark SQL) with the TahoeBatchFileIndex (and the other properties based on the metadata of the given OptimisticTransaction).
buildBaseRelation is used when:

- DeleteCommand and UpdateCommand commands are executed (with delete and update action types, respectively)
getTouchedFile¶
```scala
getTouchedFile(
  basePath: Path,
  filePath: String,
  nameToAddFileMap: Map[String, AddFile]): AddFile
```
getTouchedFile ...FIXME

getTouchedFile is used when:

- DeltaCommand is requested to removeFilesFromPaths and create a HadoopFsRelation (for DeleteCommand and UpdateCommand commands)
- MergeIntoCommand is executed
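Based on the signature, getTouchedFile plausibly resolves the given file path against the table's base path and looks it up in nameToAddFileMap, failing fast when the file is not among the candidate files. A sketch under those assumptions (stub AddFile, string paths instead of Hadoop Path):

```scala
// Stub for the AddFile action.
case class AddFile(path: String)

// Hypothetical sketch: resolve a (possibly relative) file path against
// basePath and fail if it is not among the candidate AddFiles.
def getTouchedFile(
    basePath: String,
    filePath: String,
    nameToAddFileMap: Map[String, AddFile]): AddFile = {
  val absolute =
    if (filePath.startsWith("/")) filePath
    else s"${basePath.stripSuffix("/")}/$filePath"
  nameToAddFileMap.getOrElse(absolute,
    throw new IllegalStateException(
      s"File ($absolute) to be rewritten not found among candidate files"))
}
```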
isCatalogTable¶
```scala
isCatalogTable(
  analyzer: Analyzer,
  tableIdent: TableIdentifier): Boolean
```
isCatalogTable ...FIXME

isCatalogTable is used when:

- ConvertToDeltaCommandBase is requested to isCatalogTable
isPathIdentifier¶
```scala
isPathIdentifier(
  tableIdent: TableIdentifier): Boolean
```
isPathIdentifier ...FIXME

isPathIdentifier is used when...FIXME
commitLarge¶
```scala
commitLarge(
  spark: SparkSession,
  txn: OptimisticTransaction,
  actions: Iterator[Action],
  op: DeltaOperations.Operation,
  context: Map[String, String],
  metrics: Map[String, String]): Long
```
commitLarge ...FIXME

commitLarge is used when:
- ConvertToDeltaCommand command is executed (and requested to performConvert)
- RestoreTableCommand command is executed
updateAndCheckpoint¶
```scala
updateAndCheckpoint(
  spark: SparkSession,
  deltaLog: DeltaLog,
  commitSize: Int,
  attemptVersion: Long): Unit
```
updateAndCheckpoint requests the given DeltaLog to update.
updateAndCheckpoint prints out the following INFO message to the logs:

```text
Committed delta #[attemptVersion] to [logPath]. Wrote [commitSize] actions.
```
In the end, updateAndCheckpoint requests the given DeltaLog to checkpoint the current snapshot.
IllegalStateException¶
updateAndCheckpoint throws an IllegalStateException if the version after update does not match the assumed attemptVersion:

```text
The committed version is [attemptVersion] but the current version is [currentSnapshot].
```
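The update-then-verify-then-checkpoint sequence can be sketched with a stub DeltaLog. The trait and Snapshot below are simplified stand-ins for the Delta classes, and the log line drops the logPath part for brevity:

```scala
// Simplified stand-ins for DeltaLog and Snapshot.
case class Snapshot(version: Long)
trait DeltaLog {
  def update(): Snapshot
  def checkpoint(snapshot: Snapshot): Unit
}

// Hypothetical sketch of updateAndCheckpoint: refresh the log,
// assert the expected version was committed, then checkpoint.
def updateAndCheckpoint(
    deltaLog: DeltaLog,
    commitSize: Int,
    attemptVersion: Long): Unit = {
  val current = deltaLog.update()
  if (current.version != attemptVersion)
    throw new IllegalStateException(
      s"The committed version is $attemptVersion " +
        s"but the current version is ${current.version}.")
  println(s"Committed delta #$attemptVersion. Wrote $commitSize actions.")
  deltaLog.checkpoint(current)
}
```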
Posting Metric Updates¶
```scala
sendDriverMetrics(
  spark: SparkSession,
  metrics: Map[String, SQLMetric]): Unit
```
Procedure

sendDriverMetrics is a procedure (returns Unit) so what happens inside stays inside (paraphrasing the former advertising slogan of Las Vegas, Nevada).
sendDriverMetrics announces the updates to the given SQLMetrics (Spark SQL).
sendDriverMetrics is used when:

- DeleteCommand is requested to run and performDelete
- MergeIntoCommand is requested to run merge
- UpdateCommand is requested to run (and performUpdate)
Creating SetTransaction Action¶
```scala
createSetTransaction(
  sparkSession: SparkSession,
  deltaLog: DeltaLog,
  options: Option[DeltaOptions] = None): Option[SetTransaction]
```
options Argument

options is undefined (None) by default and is only defined when WriteIntoDelta is requested to write data out.
createSetTransaction creates (and returns) a new SetTransaction when the transaction version and application ID are both available (in either the given SparkSession or DeltaOptions).
spark.databricks.delta.write.txnVersion.autoReset.enabled
createSetTransaction does something extra when spark.databricks.delta.write.txnVersion.autoReset.enabled is enabled.
createSetTransaction is used when:

- DeleteCommand is requested to performDelete
- MergeIntoCommand is requested to commitAndRecordStats
- Update command is executed
- WriteIntoDelta is requested to write data out
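The "both available, options preferred" behaviour can be sketched as an Option chain. All types and the txnAppId/txnVersion keys below are stand-ins for illustration; the real helper reads DeltaOptions and the Spark session configuration:

```scala
// Stub for the SetTransaction action.
case class SetTransaction(appId: String, version: Long, lastUpdated: Option[Long])

// Hypothetical sketch: a SetTransaction is created only when both the
// application ID and the transaction version are defined, preferring
// the write options over the session configuration.
def createSetTransaction(
    sessionConf: Map[String, String],
    options: Option[Map[String, String]] = None): Option[SetTransaction] = {
  def pick(key: String): Option[String] =
    options.flatMap(_.get(key)).orElse(sessionConf.get(key))
  for {
    appId   <- pick("txnAppId")
    version <- pick("txnVersion").map(_.toLong)
  } yield SetTransaction(appId, version, Some(System.currentTimeMillis()))
}
```

The for-comprehension yields None as soon as either value is missing, which matches the "both available" condition above.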
Logging¶
DeltaCommand is an abstract class and logging is configured using the logger of the implementations.