DeltaTableUtils

extractIfPathContainsTimeTravel

extractIfPathContainsTimeTravel(
  session: SparkSession,
  path: String): (String, Option[DeltaTimeTravelSpec])

extractIfPathContainsTimeTravel uses the internal spark.databricks.delta.timeTravel.resolveOnIdentifier.enabled configuration property to decide whether to look for time travel patterns (e.g., @v123 for a table version) in the given path.

extractIfPathContainsTimeTravel...FIXME
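
The path-splitting step can be sketched in plain Scala as follows. This is a simplified illustration under stated assumptions, not Delta's code: TimeTravelPathSketch, AtVersion, AtTimestamp and the resolveOnIdentifier flag (standing in for the configuration property above) are hypothetical names, and the two suffix patterns assume Delta's documented path syntax of @v<version> and @yyyyMMddHHmmssSSS.

```scala
import scala.util.matching.Regex

object TimeTravelPathSketch {
  // Hypothetical, simplified stand-ins for DeltaTimeTravelSpec.
  sealed trait TimeTravelSpec
  final case class AtVersion(version: Long) extends TimeTravelSpec
  final case class AtTimestamp(raw: String) extends TimeTravelSpec

  // Assumed path suffixes: ...@v<version> and ...@yyyyMMddHHmmssSSS (17 digits).
  private val VersionSuffix: Regex = """(.*)@[vV](\d+)""".r
  private val TimestampSuffix: Regex = """(.*)@(\d{17})""".r

  /** Splits a path into the table path and an optional time travel spec.
    * resolveOnIdentifier plays the role of
    * spark.databricks.delta.timeTravel.resolveOnIdentifier.enabled (assumption).
    */
  def extract(path: String, resolveOnIdentifier: Boolean = true): (String, Option[TimeTravelSpec]) =
    if (!resolveOnIdentifier) (path, None)
    else
      path match {
        // Regex extractors must match the whole path, so the suffix has to end it.
        case VersionSuffix(table, version) => (table, Some(AtVersion(version.toLong)))
        case TimestampSuffix(table, ts)    => (table, Some(AtTimestamp(ts)))
        case _                             => (path, None)
      }
}
```

For example, TimeTravelPathSketch.extract("/tmp/events@v3") returns ("/tmp/events", Some(AtVersion(3))), whereas a path without such a suffix comes back unchanged with None.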

extractIfPathContainsTimeTravel is used when:

DeltaDataSource is requested to sourceSchema and parsePathIdentifier

findDeltaTableRoot

findDeltaTableRoot(
  spark: SparkSession,
  path: Path,
  options: Map[String, String] = Map.empty): Option[Path]

findDeltaTableRoot traverses the given Hadoop file system path upwards (towards the root directory of the file system) until it finds a _delta_log (or _samples) directory or reaches the root directory.

For a _delta_log (or _samples) directory found, findDeltaTableRoot returns its parent directory, i.e. the root of the Delta table. Otherwise, findDeltaTableRoot returns None.
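
The upward walk can be sketched in plain Scala with java.nio.file.Path in place of a Hadoop Path. This is a simplified sketch, not the actual implementation: FindTableRootSketch and findTableRoot are hypothetical names, the exists function stands in for a file system existence check, and any special handling of _samples directories is left out.

```scala
import java.nio.file.Path
import scala.annotation.tailrec

object FindTableRootSketch {
  /** Walks from start towards the file system root and returns the first
    * directory that contains a _delta_log subdirectory, or None past the root.
    * exists is a hypothetical stand-in for a Hadoop FileSystem.exists call.
    */
  @tailrec
  def findTableRoot(start: Path, exists: Path => Boolean): Option[Path] =
    if (start == null) None                                   // walked past the root
    else if (exists(start.resolve("_delta_log"))) Some(start) // parent of _delta_log
    else findTableRoot(start.getParent, exists)               // try one level up
}
```

With an existence check that only knows /data/events/_delta_log, starting from /data/events/date=2024-01-01/part-0.parquet returns Some(/data/events), while starting from /data/other walks up to the root and returns None.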

findDeltaTableRoot is used when:

DeltaTable.isDeltaTable utility is used
VacuumTableCommand is executed
DeltaTableUtils utility is used to isDeltaTable
DeltaDataSource utility is used to parsePathIdentifier

isPredicatePartitionColumnsOnly

isPredicatePartitionColumnsOnly(
  condition: Expression,
  partitionColumns: Seq[String],
  spark: SparkSession): Boolean

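isPredicatePartitionColumnsOnly holds true when every column referenced by the condition expression is one of the partitionColumns (column names are compared using the session's resolver, so case sensitivity follows spark.sql.caseSensitive).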
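
A minimal sketch of the check, assuming a tiny hypothetical expression tree in place of Catalyst expressions (PartitionColumnsOnlySketch, Expr, Col, Lit, EqualTo and And are illustrative names). The caseSensitive flag is an assumption standing in for the session's name resolver.

```scala
import java.util.Locale

object PartitionColumnsOnlySketch {
  // Hypothetical mini expression tree (not Catalyst) that knows its column references.
  sealed trait Expr { def references: Set[String] }
  final case class Col(name: String) extends Expr { def references: Set[String] = Set(name) }
  final case class Lit(value: Any) extends Expr { def references: Set[String] = Set.empty }
  final case class EqualTo(left: Expr, right: Expr) extends Expr {
    def references: Set[String] = left.references ++ right.references
  }
  final case class And(left: Expr, right: Expr) extends Expr {
    def references: Set[String] = left.references ++ right.references
  }

  /** True when every referenced column is a partition column.
    * caseSensitive stands in for the session's resolver (assumption).
    */
  def isPartitionColumnsOnly(
      condition: Expr,
      partitionColumns: Seq[String],
      caseSensitive: Boolean = false): Boolean = {
    def norm(s: String): String = if (caseSensitive) s else s.toLowerCase(Locale.ROOT)
    condition.references.forall(ref => partitionColumns.exists(p => norm(p) == norm(ref)))
  }
}
```

Under case-insensitive matching, DATE = '2024-01-01' is partition-only for a table partitioned by date, whereas date = 1 AND id = 7 is not, because id is not a partition column. A bare literal has no references, so it passes trivially (forall over an empty set is true).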

isPredicatePartitionColumnsOnly is used when:

DeltaTableUtils is used to isPredicateMetadataOnly
OptimisticTransactionImpl is requested for the filterFiles
DeltaSourceSnapshot is requested for the partition and data filters

isDeltaTable

isDeltaTable(
  table: CatalogTable): Boolean
isDeltaTable(
  spark: SparkSession,
  path: Path): Boolean
isDeltaTable(
  spark: SparkSession,
  tableName: TableIdentifier): Boolean

isDeltaTable...FIXME

isDeltaTable is used when:

DeltaCatalog is requested to loadTable
DeltaTable.forName, DeltaTable.forPath and DeltaTable.isDeltaTable utilities are used
DeltaTableIdentifier utility is used to create a DeltaTableIdentifier from a TableIdentifier
DeltaUnsupportedOperationsCheck is requested to fail

resolveTimeTravelVersion

resolveTimeTravelVersion(
  conf: SQLConf,
  deltaLog: DeltaLog,
  tt: DeltaTimeTravelSpec): (Long, String)

resolveTimeTravelVersion...FIXME

resolveTimeTravelVersion is used when:

DeltaLog is requested to create a relation (per partition filters and time travel)
DeltaTableV2 is requested for a Snapshot

splitMetadataAndDataPredicates

splitMetadataAndDataPredicates(
  condition: Expression,
  partitionColumns: Seq[String],
  spark: SparkSession): (Seq[Expression], Seq[Expression])

splitMetadataAndDataPredicates splits the given condition expression into its conjunctive predicates (the operands of AND) and partitions them into two collections using isPredicateMetadataOnly (with the given partitionColumns): the metadata predicates (first) and the data predicates (second).
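
The split-then-partition step can be sketched as below. This is an illustration under assumptions: SplitPredicatesSketch and its expression classes are hypothetical, and "metadata-only" is simplified here to "references partition columns only" (the subquery check of isPredicateMetadataOnly is omitted).

```scala
object SplitPredicatesSketch {
  // Hypothetical mini expression tree (not Catalyst).
  sealed trait Expr
  final case class Col(name: String) extends Expr
  final case class Lit(value: Any) extends Expr
  final case class EqualTo(left: Expr, right: Expr) extends Expr
  final case class GreaterThan(left: Expr, right: Expr) extends Expr
  final case class And(left: Expr, right: Expr) extends Expr

  /** Flattens nested ANDs into their conjuncts, left to right. */
  def splitConjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
    case other     => Seq(other)
  }

  /** Column names referenced anywhere in the expression. */
  def references(e: Expr): Set[String] = e match {
    case Col(name)         => Set(name)
    case Lit(_)            => Set.empty
    case EqualTo(l, r)     => references(l) ++ references(r)
    case GreaterThan(l, r) => references(l) ++ references(r)
    case And(l, r)         => references(l) ++ references(r)
  }

  /** Returns (metadata predicates, data predicates); partition keeps the original order. */
  def splitMetadataAndData(condition: Expr, partitionColumns: Set[String]): (Seq[Expr], Seq[Expr]) =
    splitConjuncts(condition).partition(p => references(p).subsetOf(partitionColumns))
}
```

For date = '2024-01-01' AND amount > 100 AND region = 'eu' with date and region as partition columns, the first collection holds the date and region predicates and the second holds amount > 100.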

splitMetadataAndDataPredicates is used when:

PartitionFiltering is requested for filesForScan
DeleteCommand is executed (with a delete condition)
UpdateCommand is executed

isPredicateMetadataOnly

isPredicateMetadataOnly(
  condition: Expression,
  partitionColumns: Seq[String],
  spark: SparkSession): Boolean

isPredicateMetadataOnly holds true when the given condition:

References partition columns only (per isPredicatePartitionColumnsOnly with the given partitionColumns)
Does not contain a subquery
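
Both conditions can be sketched with a hypothetical mini expression tree, where ScalarSubquery is an illustrative stand-in for Catalyst subquery expressions (MetadataOnlySketch and all class names are assumptions, not Delta or Spark APIs). The subquery case shows why the second condition matters: date = (SELECT max(date) FROM t) references only the partition column date, yet it is not treated as metadata-only because it contains a subquery.

```scala
object MetadataOnlySketch {
  // Hypothetical mini expression tree (not Catalyst).
  sealed trait Expr { def children: Seq[Expr] }
  final case class Col(name: String) extends Expr { def children: Seq[Expr] = Nil }
  final case class Lit(value: Any) extends Expr { def children: Seq[Expr] = Nil }
  final case class EqualTo(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
  }
  // Illustrative stand-in for a subquery expression; its query text is opaque here.
  final case class ScalarSubquery(sql: String) extends Expr { def children: Seq[Expr] = Nil }

  def references(e: Expr): Set[String] = e match {
    case Col(name) => Set(name)
    case other     => other.children.flatMap(references).toSet
  }

  def containsSubquery(e: Expr): Boolean = e match {
    case _: ScalarSubquery => true
    case other             => other.children.exists(containsSubquery)
  }

  def isPartitionColumnsOnly(e: Expr, partitionColumns: Seq[String]): Boolean =
    references(e).forall(c => partitionColumns.contains(c))

  // Metadata-only: partition columns only AND no subquery anywhere in the tree.
  def isMetadataOnly(e: Expr, partitionColumns: Seq[String]): Boolean =
    isPartitionColumnsOnly(e, partitionColumns) && !containsSubquery(e)
}
```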

Removing Internal Table Metadata

removeInternalMetadata(
  spark: SparkSession,
  persistedSchema: StructType): StructType

removeInternalMetadata removes any default expressions from the given persistedSchema.

With spark.databricks.delta.schema.removeSparkInternalMetadata enabled, removeInternalMetadata removes the following Spark internal metadata keys from the schema fields:

__autoGeneratedAlias
__metadata_col
__supports_qualified_star
__qualified_access_only
__file_source_metadata_col
__file_source_constant_metadata_col
__file_source_generated_metadata_col
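
The key-removal step can be sketched over a simplified schema model. This is a sketch under assumptions: Field is a hypothetical stand-in for StructField with a plain Map for its metadata, removeSparkInternalMetadata stands in for the configuration property above, only top-level fields are handled, and the removal of default expressions is not modeled.

```scala
object RemoveInternalMetadataSketch {
  // Hypothetical stand-in for a StructField: a name plus string-keyed metadata.
  final case class Field(name: String, metadata: Map[String, Any])

  // The Spark internal metadata keys as listed in this section.
  val SparkInternalMetadataKeys: Seq[String] = Seq(
    "__autoGeneratedAlias",
    "__metadata_col",
    "__supports_qualified_star",
    "__qualified_access_only",
    "__file_source_metadata_col",
    "__file_source_constant_metadata_col",
    "__file_source_generated_metadata_col")

  /** removeSparkInternalMetadata plays the role of
    * spark.databricks.delta.schema.removeSparkInternalMetadata (assumption).
    * Only top-level fields are cleaned in this sketch.
    */
  def removeInternalKeys(schema: Seq[Field], removeSparkInternalMetadata: Boolean): Seq[Field] =
    if (!removeSparkInternalMetadata) schema
    else schema.map(f => f.copy(metadata = f.metadata -- SparkInternalMetadataKeys))
}
```

Other metadata (such as a column comment) is left untouched, and with the flag disabled the schema is returned as is.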

removeInternalMetadata is used when:

DeltaLog is requested to buildHadoopFsRelationWithFileIndex
DeltaTableV2 is requested to tableSchema
DeltaDataSource is requested to sourceSchema
DeltaSourceBase is requested for the schema

getFileMetadataColumn

getFileMetadataColumn(
  df: DataFrame): Column

getFileMetadataColumn requests the given DataFrame for the metadata column with the _metadata logical name (using the Dataset.metadataColumn operator).

getFileMetadataColumn is used when:

RowCommitVersion is requested to preserveRowCommitVersions
RowId is requested to preserveRowIds