Skip to content

DeltaCommand — Delta Commands

DeltaCommand is a marker interface for delta commands.

Implementations

Converting Predicate Text to Catalyst Expression

parsePredicates(
  spark: SparkSession,
  predicate: String): Seq[Expression]

parsePredicates converts the given predicate text to an Expression (Spark SQL).


parsePredicates requests the given SparkSession (Spark SQL) for the session ParserInterface (Spark SQL) to parseExpression the given predicate text.


parsePredicates is used when:

Verifying Partition Predicates

verifyPartitionPredicates(
  spark: SparkSession,
  partitionColumns: Seq[String],
  predicates: Seq[Expression]): Unit

verifyPartitionPredicates asserts that the given predicates expressions are as follows:

  • Contain no subqueries
  • Reference partition columns only (partitionColumns)

verifyPartitionPredicates is used when:

generateCandidateFileMap

generateCandidateFileMap(
  basePath: Path,
  candidateFiles: Seq[AddFile]): Map[String, AddFile]

generateCandidateFileMap...FIXME

generateCandidateFileMap is used when...FIXME

removeFilesFromPaths

removeFilesFromPaths(
  deltaLog: DeltaLog,
  nameToAddFileMap: Map[String, AddFile],
  filesToRewrite: Seq[String],
  operationTimestamp: Long): Seq[RemoveFile]

removeFilesFromPaths...FIXME

removeFilesFromPaths is used when:

Creating HadoopFsRelation (with TahoeBatchFileIndex)

buildBaseRelation(
  spark: SparkSession,
  txn: OptimisticTransaction,
  actionType: String,
  rootPath: Path,
  inputLeafFiles: Seq[String],
  nameToAddFileMap: Map[String, AddFile]): HadoopFsRelation

buildBaseRelation converts the given inputLeafFiles to AddFiles (with the given rootPath and nameToAddFileMap).

buildBaseRelation creates a TahoeBatchFileIndex for the AddFiles (with the input actionType and rootPath).

In the end, buildBaseRelation creates a HadoopFsRelation (Spark SQL) with the TahoeBatchFileIndex (and the other properties based on the metadata of the given OptimisticTransaction).


buildBaseRelation is used when:

getTouchedFile

getTouchedFile(
  basePath: Path,
  filePath: String,
  nameToAddFileMap: Map[String, AddFile]): AddFile

getTouchedFile...FIXME

getTouchedFile is used when:

isCatalogTable

isCatalogTable(
  analyzer: Analyzer,
  tableIdent: TableIdentifier): Boolean

isCatalogTable...FIXME

isCatalogTable is used when:

isPathIdentifier

isPathIdentifier(
  tableIdent: TableIdentifier): Boolean

isPathIdentifier...FIXME

isPathIdentifier is used when...FIXME

commitLarge

commitLarge(
  spark: SparkSession,
  txn: OptimisticTransaction,
  actions: Iterator[Action],
  op: DeltaOperations.Operation,
  context: Map[String, String],
  metrics: Map[String, String]): Long

commitLarge...FIXME

commitLarge is used when:

updateAndCheckpoint

updateAndCheckpoint(
  spark: SparkSession,
  deltaLog: DeltaLog,
  commitSize: Int,
  attemptVersion: Long): Unit

updateAndCheckpoint requests the given DeltaLog to update.

updateAndCheckpoint prints out the following INFO message to the logs:

Committed delta #[attemptVersion] to [logPath]. Wrote [commitSize] actions.

In the end, updateAndCheckpoint requests the given DeltaLog to checkpoint the current snapshot.

IllegalStateException

updateAndCheckpoint throws an IllegalStateException if the version after update does not match the assumed attemptVersion:

The committed version is [attemptVersion] but the current version is [currentSnapshot].

Posting Metric Updates

sendDriverMetrics(
  spark: SparkSession,
  metrics: Map[String, SQLMetric]): Unit
Procedure

sendDriverMetrics is a procedure (returns Unit) so what happens inside stays inside (paraphrasing the former advertising slogan of Las Vegas, Nevada).

sendDriverMetrics announces the updates to the given SQLMetrics (Spark SQL).


sendDriverMetrics is used when:

Creating SetTransaction Action

createSetTransaction(
  sparkSession: SparkSession,
  deltaLog: DeltaLog,
  options: Option[DeltaOptions] = None): Option[SetTransaction]
options Argument

options is undefined (None) by default and is only defined when WriteIntoDelta is requested to write data out.

createSetTransaction creates (and returns) a new SetTransaction when the transaction version and application ID are both available (in either the given SparkSession or DeltaOptions).

spark.databricks.delta.write.txnVersion.autoReset.enabled

createSetTransaction does something extra with spark.databricks.delta.write.txnVersion.autoReset.enabled enabled.


createSetTransaction is used when:

Logging

DeltaCommand is an abstract class and logging is configured using the logger of the implementations.