DeltaCommand — Delta Commands¶
DeltaCommand is a marker interface for delta commands.
Implementations¶
- AlterDeltaTableCommand
- ConvertToDeltaCommandBase
- CreateDeltaTableCommand
- DeleteCommand
- DescribeDeltaDetailCommand
- DescribeDeltaHistory
- DMLWithDeletionVectorsHelper
- MergeIntoCommandBase
- OptimizeExecutor
- OptimizeTableCommandBase
- RestoreTableCommand
- ShowDeltaTableColumnsCommand
- StatisticsCollection
- UpdateCommand
- VacuumCommandImpl
- WriteIntoDelta
Converting Predicate Text to Catalyst Expression¶
parsePredicates(
spark: SparkSession,
predicate: String): Seq[Expression]
parsePredicates converts the given predicate text to an Expression (Spark SQL).
parsePredicates requests the given SparkSession (Spark SQL) for the session ParserInterface (Spark SQL) to parseExpression the given predicate text.
parsePredicates is used when:
- OptimizeTableCommand is executed (to convert the partitionPredicate)
- WriteIntoDelta command is requested to write (to convert the replaceWhere option to predicates)
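As a rough, self-contained illustration of the idea (not Spark's actual parser, which parsePredicates obtains from the session ParserInterface), the toy stand-in below turns `col = value` terms joined with AND into simple expression objects. ToyPredicateParser and its EqualTo node are made up for this sketch:

```scala
// Made-up toy parser: supports only `col = value` terms joined with AND,
// nothing like the full SQL grammar ParserInterface.parseExpression handles.
sealed trait Expression
case class EqualTo(column: String, value: String) extends Expression

object ToyPredicateParser {
  def parsePredicates(predicate: String): Seq[Expression] =
    predicate.split("(?i) AND ").toSeq.map { term =>
      val parts = term.split("=").map(_.trim)
      EqualTo(parts(0), parts(1))
    }
}
```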
Verifying Partition Predicates¶
verifyPartitionPredicates(
spark: SparkSession,
partitionColumns: Seq[String],
predicates: Seq[Expression]): Unit
verifyPartitionPredicates asserts that the given predicates expressions are as follows:
- Contain no subqueries
- Reference partition columns only (partitionColumns)
verifyPartitionPredicates is used when:
- OptimizeTableCommand is executed (to verify the partitionPredicate if defined)
- WriteIntoDelta command is requested to write (to verify the replaceWhere option for SaveMode.Overwrite mode)
- StatisticsCollection utility is used to recompute statistics of a delta table
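The two checks can be sketched over a toy expression tree. AttributeRef, Subquery, and the other node types below are illustrative stand-ins for Spark's Catalyst classes, not the real API:

```scala
// Toy expression tree; these are stand-ins, not Spark's Catalyst classes.
sealed trait Expr { def children: Seq[Expr] = Nil }
case class AttributeRef(name: String) extends Expr
case class Literal(value: Any) extends Expr
case class Subquery() extends Expr
case class Equals(left: Expr, right: Expr) extends Expr {
  override def children: Seq[Expr] = Seq(left, right)
}

object PartitionPredicateChecks {
  private def collect(e: Expr): Seq[Expr] = e +: e.children.flatMap(collect)

  // The two assertions verifyPartitionPredicates performs:
  // 1) no subqueries, 2) only partition columns are referenced.
  def verifyPartitionPredicates(partitionColumns: Seq[String], predicates: Seq[Expr]): Unit =
    predicates.foreach { p =>
      val nodes = collect(p)
      require(!nodes.exists(_.isInstanceOf[Subquery]),
        "Subqueries are not supported in partition predicates.")
      nodes.collect { case AttributeRef(n) => n }.foreach { name =>
        require(partitionColumns.exists(_.equalsIgnoreCase(name)),
          s"Predicate references non-partition column '$name'.")
      }
    }
}
```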
generateCandidateFileMap¶
generateCandidateFileMap(
basePath: Path,
candidateFiles: Seq[AddFile]): Map[String, AddFile]
generateCandidateFileMap...FIXME
generateCandidateFileMap is used when...FIXME
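Pending the FIXME above, the signature suggests a lookup map keyed by each candidate file's path resolved against basePath. A hedged, cut-down sketch (the AddFile stand-in and the string-based path resolution are simplifications of Delta's actual classes):

```scala
// Cut-down stand-in for Delta's AddFile action.
case class AddFile(path: String)

object CandidateFileMap {
  // AddFile paths may be relative to the table root; resolve them first.
  private def absolutePath(basePath: String, child: String): String =
    if (child.startsWith("/")) child else s"$basePath/$child"

  def generateCandidateFileMap(
      basePath: String,
      candidateFiles: Seq[AddFile]): Map[String, AddFile] = {
    val map = candidateFiles.map(f => absolutePath(basePath, f.path) -> f).toMap
    // Resolved paths are assumed unique; a collision would silently drop files.
    assert(map.size == candidateFiles.length, "File paths are assumed to be unique")
    map
  }
}
```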
removeFilesFromPaths¶
removeFilesFromPaths(
deltaLog: DeltaLog,
nameToAddFileMap: Map[String, AddFile],
filesToRewrite: Seq[String],
operationTimestamp: Long): Seq[RemoveFile]
removeFilesFromPaths...FIXME
removeFilesFromPaths is used when:
- DeleteCommand and UpdateCommand commands are executed
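Judging by the signature and these callers, removeFilesFromPaths plausibly resolves each path in filesToRewrite through nameToAddFileMap and emits a RemoveFile stamped with operationTimestamp. A simplified sketch with stand-in action classes (not Delta's actual AddFile/RemoveFile):

```scala
// Stand-ins for Delta's file actions.
case class AddFile(path: String) {
  def removeWithTimestamp(timestamp: Long): RemoveFile = RemoveFile(path, Some(timestamp))
}
case class RemoveFile(path: String, deletionTimestamp: Option[Long])

object RemoveFilesSketch {
  def removeFilesFromPaths(
      nameToAddFileMap: Map[String, AddFile],
      filesToRewrite: Seq[String],
      operationTimestamp: Long): Seq[RemoveFile] =
    filesToRewrite.map { path =>
      // Every rewritten file is expected to be among the candidate files.
      val addFile = nameToAddFileMap.getOrElse(path, sys.error(s"File not found: $path"))
      addFile.removeWithTimestamp(operationTimestamp)
    }
}
```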
Creating HadoopFsRelation (with TahoeBatchFileIndex)¶
buildBaseRelation(
spark: SparkSession,
txn: OptimisticTransaction,
actionType: String,
rootPath: Path,
inputLeafFiles: Seq[String],
nameToAddFileMap: Map[String, AddFile]): HadoopFsRelation
buildBaseRelation converts the given inputLeafFiles to AddFiles (with the given rootPath and nameToAddFileMap).
buildBaseRelation creates a TahoeBatchFileIndex for the AddFiles (with the input actionType and rootPath).
In the end, buildBaseRelation creates a HadoopFsRelation (Spark SQL) with the TahoeBatchFileIndex (and the other properties based on the metadata of the given OptimisticTransaction).
buildBaseRelation is used when:
- DeleteCommand and UpdateCommand commands are executed (with delete and update action types, respectively)
getTouchedFile¶
getTouchedFile(
basePath: Path,
filePath: String,
nameToAddFileMap: Map[String, AddFile]): AddFile
getTouchedFile...FIXME
getTouchedFile is used when:
- DeltaCommand is requested to removeFilesFromPaths and create a HadoopFsRelation (for DeleteCommand and UpdateCommand commands)
- MergeIntoCommand is executed
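A minimal sketch of the lookup getTouchedFile likely performs, with a stand-in AddFile and simplified string-based path resolution:

```scala
// Cut-down stand-in for Delta's AddFile action.
case class AddFile(path: String)

object TouchedFileSketch {
  def getTouchedFile(
      basePath: String,
      filePath: String,
      nameToAddFileMap: Map[String, AddFile]): AddFile = {
    // Resolve the (possibly relative) file path against the table root,
    // then require it to be a known candidate file.
    val absolute = if (filePath.startsWith("/")) filePath else s"$basePath/$filePath"
    nameToAddFileMap.getOrElse(absolute,
      sys.error(s"File ($absolute) to be rewritten not found among candidate files"))
  }
}
```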
isCatalogTable¶
isCatalogTable(
analyzer: Analyzer,
tableIdent: TableIdentifier): Boolean
isCatalogTable...FIXME
isCatalogTable is used when:
- ConvertToDeltaCommandBase is requested to isCatalogTable
isPathIdentifier¶
isPathIdentifier(
tableIdent: TableIdentifier): Boolean
isPathIdentifier...FIXME
isPathIdentifier is used when...FIXME
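The name hints that isPathIdentifier recognizes identifiers that name a table by path rather than through the catalog, as in delta.`/tmp/delta/t1`. A speculative, self-contained sketch (the exact rules in Delta Lake may differ, and TableIdentifier is simplified here):

```scala
// Simplified TableIdentifier stand-in (Spark's has more fields).
case class TableIdentifier(table: String, database: Option[String])

object PathIdentifierSketch {
  // Heuristic: the "database" part is the format (delta) and the "table" part
  // is an absolute file-system path, as in delta.`/tmp/delta/t1`.
  def isPathIdentifier(tableIdent: TableIdentifier): Boolean =
    tableIdent.database.exists(_.equalsIgnoreCase("delta")) &&
      new java.io.File(tableIdent.table).isAbsolute
}
```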
commitLarge¶
commitLarge(
spark: SparkSession,
txn: OptimisticTransaction,
actions: Iterator[Action],
op: DeltaOperations.Operation,
context: Map[String, String],
metrics: Map[String, String]): Long
commitLarge...FIXME
commitLarge is used when:
- ConvertToDeltaCommand command is executed (and requested to performConvert)
- RestoreTableCommand command is executed
updateAndCheckpoint¶
updateAndCheckpoint(
spark: SparkSession,
deltaLog: DeltaLog,
commitSize: Int,
attemptVersion: Long): Unit
updateAndCheckpoint requests the given DeltaLog to update.
updateAndCheckpoint prints out the following INFO message to the logs:
Committed delta #[attemptVersion] to [logPath]. Wrote [commitSize] actions.
In the end, updateAndCheckpoint requests the given DeltaLog to checkpoint the current snapshot.
IllegalStateException¶
updateAndCheckpoint throws an IllegalStateException if the version after update does not match the assumed attemptVersion:
The committed version is [attemptVersion] but the current version is [currentSnapshot].
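The control flow described above (update, verify the version, log, checkpoint) can be sketched with toy DeltaLog and Snapshot stand-ins; the real classes do far more:

```scala
case class Snapshot(version: Long)

// Toy DeltaLog: update() returns the current snapshot; checkpoint() is a no-op here.
class DeltaLog(val current: Snapshot) {
  def update(): Snapshot = current
  def checkpoint(snapshot: Snapshot): Unit = ()
}

object UpdateAndCheckpointSketch {
  def updateAndCheckpoint(deltaLog: DeltaLog, commitSize: Int, attemptVersion: Long): Unit = {
    val snapshot = deltaLog.update()
    if (snapshot.version != attemptVersion) {
      throw new IllegalStateException(
        s"The committed version is $attemptVersion but the current version is ${snapshot.version}.")
    }
    println(s"Committed delta #$attemptVersion. Wrote $commitSize actions.")
    deltaLog.checkpoint(snapshot)
  }
}
```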
Posting Metric Updates¶
sendDriverMetrics(
spark: SparkSession,
metrics: Map[String, SQLMetric]): Unit
Procedure
sendDriverMetrics is a procedure (returns Unit) so what happens inside stays inside (paraphrasing the former advertising slogan of Las Vegas, Nevada).
sendDriverMetrics announces the updates to the given SQLMetrics (Spark SQL).
sendDriverMetrics is used when:
- DeleteCommand is requested to run and performDelete
- MergeIntoCommand is requested to run merge
- UpdateCommand is requested to run (and performUpdate)
Creating SetTransaction Action¶
createSetTransaction(
sparkSession: SparkSession,
deltaLog: DeltaLog,
options: Option[DeltaOptions] = None): Option[SetTransaction]
options Argument
options is undefined (None) by default and is only defined when WriteIntoDelta is requested to write data out.
createSetTransaction creates (and returns) a new SetTransaction when the transaction version and application ID are both available (in either the given SparkSession or DeltaOptions).
spark.databricks.delta.write.txnVersion.autoReset.enabled
With spark.databricks.delta.write.txnVersion.autoReset.enabled enabled, createSetTransaction additionally resets the transaction version after reading it (so it is not accidentally reused by subsequent writes).
createSetTransaction is used when:
- DeleteCommand is requested to performDelete
- MergeIntoCommand is requested to commitAndRecordStats
- Update command is executed
- WriteIntoDelta is requested to write data out
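The both-or-nothing rule above can be sketched as a for-comprehension over the two optional inputs. SetTransactionSketch and its cut-down SetTransaction are stand-ins, not Delta's classes:

```scala
// Cut-down SetTransaction stand-in (appId, version, lastUpdated).
case class SetTransaction(appId: String, version: Long, lastUpdated: Option[Long])

object SetTransactionSketch {
  // Only when both the transaction application ID and version are known
  // (from DeltaOptions or the session configuration) is an action produced.
  def createSetTransaction(
      txnAppId: Option[String],
      txnVersion: Option[Long]): Option[SetTransaction] =
    for {
      appId <- txnAppId
      version <- txnVersion
    } yield SetTransaction(appId, version, Some(System.currentTimeMillis()))
}
```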
Logging¶
DeltaCommand is an abstract class and logging is configured using the logger of the implementations.