DeltaTableUtils

extractIfPathContainsTimeTravel

extractIfPathContainsTimeTravel(
  session: SparkSession,
  path: String): (String, Option[DeltaTimeTravelSpec])

extractIfPathContainsTimeTravel uses the internal spark.databricks.delta.timeTravel.resolveOnIdentifier.enabled configuration property to find time travel patterns in the given path.

extractIfPathContainsTimeTravel...FIXME
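The path-based time travel idea can be sketched as follows. This is a minimal model under stated assumptions: paths may carry an `@v<version>` suffix (e.g. `/tmp/delta/t@v1`), while the real method returns an `Option[DeltaTimeTravelSpec]` and also handles timestamp-style suffixes.

```scala
// Minimal sketch of extracting a version-based time travel suffix from a path.
// Assumption: paths may end with @v<version>; the real implementation returns
// (String, Option[DeltaTimeTravelSpec]) and also parses timestamp suffixes.
object TimeTravelPath {
  private val VersionSuffix = "(.*)@[vV](\\d+)$".r

  def extract(path: String): (String, Option[Long]) = path match {
    case VersionSuffix(base, version) => (base, Some(version.toLong))
    case _                            => (path, None)
  }
}
```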

extractIfPathContainsTimeTravel is used when:

findDeltaTableRoot

findDeltaTableRoot(
  spark: SparkSession,
  path: Path,
  options: Map[String, String] = Map.empty): Option[Path]

findDeltaTableRoot traverses the Hadoop DFS-compliant path upwards (to the root directory of the file system) until _delta_log or _samples directories are found, or the root directory is reached.

When a _delta_log or _samples directory is found, findDeltaTableRoot returns its parent directory (the root directory of the Delta table).
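The upward traversal can be sketched as follows. This is a simplified model: plain string paths and an `exists` callback stand in for Hadoop's `Path` and `FileSystem` (both assumptions for illustration).

```scala
// Simplified model of findDeltaTableRoot: walk the path upwards until a
// directory containing _delta_log (or _samples) is found, or the root
// is reached. `exists` stands in for a Hadoop FileSystem check.
def findDeltaTableRoot(path: String, exists: String => Boolean): Option[String] = {
  var current: Option[String] = Some(path)
  while (current.isDefined) {
    val dir = current.get
    // A directory containing _delta_log (or _samples) is the table root
    if (exists(s"$dir/_delta_log") || exists(s"$dir/_samples")) return Some(dir)
    val slash = dir.lastIndexOf('/')
    current = if (slash > 0) Some(dir.substring(0, slash)) else None
  }
  None
}
```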

findDeltaTableRoot is used when:

isPredicatePartitionColumnsOnly

isPredicatePartitionColumnsOnly(
  condition: Expression,
  partitionColumns: Seq[String],
  spark: SparkSession): Boolean

isPredicatePartitionColumnsOnly holds true when all column references of the given condition expression are among the given partitionColumns.
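The check can be sketched on a toy expression model. Only the `references` part matters here; the `Expr` ADT below is an assumption standing in for Catalyst's `Expression`, and case-insensitive matching approximates the session's resolver.

```scala
// Toy expression model standing in for Catalyst's Expression (assumption);
// only `references` is needed for this check.
sealed trait Expr { def references: Set[String] }
case class Col(name: String) extends Expr { val references = Set(name) }
case class Lit(value: Any) extends Expr { val references = Set.empty[String] }
case class EqualTo(left: Expr, right: Expr) extends Expr {
  val references = left.references ++ right.references
}

// True when every column the condition references is a partition column
def isPredicatePartitionColumnsOnly(condition: Expr, partitionColumns: Seq[String]): Boolean =
  condition.references.forall(ref => partitionColumns.exists(_.equalsIgnoreCase(ref)))
```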

isPredicatePartitionColumnsOnly is used when:

isDeltaTable

isDeltaTable(
  table: CatalogTable): Boolean
isDeltaTable(
  spark: SparkSession,
  path: Path): Boolean
isDeltaTable(
  spark: SparkSession,
  tableName: TableIdentifier): Boolean

isDeltaTable...FIXME

isDeltaTable is used when:

resolveTimeTravelVersion

resolveTimeTravelVersion(
  conf: SQLConf,
  deltaLog: DeltaLog,
  tt: DeltaTimeTravelSpec): (Long, String)

resolveTimeTravelVersion...FIXME

resolveTimeTravelVersion is used when:

splitMetadataAndDataPredicates

splitMetadataAndDataPredicates(
  condition: Expression,
  partitionColumns: Seq[String],
  spark: SparkSession): (Seq[Expression], Seq[Expression])

splitMetadataAndDataPredicates splits conjunctive (and) predicates in the given condition expression and partitions them into two collections based on the isPredicateMetadataOnly predicate (with the given partitionColumns).
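The split-then-partition step can be sketched as follows. Assumptions: a minimal `Expr` ADT stands in for Catalyst's `Expression`, and "metadata-only" is approximated here by "references only partition columns".

```scala
// Toy model of splitting a condition into metadata-only and data predicates.
sealed trait Expr { def references: Set[String] }
case class Col(name: String) extends Expr { val references = Set(name) }
case class Lit(value: Any) extends Expr { val references = Set.empty[String] }
case class EqualTo(l: Expr, r: Expr) extends Expr { val references = l.references ++ r.references }
case class And(l: Expr, r: Expr) extends Expr { val references = l.references ++ r.references }

// Flatten a tree of And nodes into its conjuncts
def splitConjuncts(e: Expr): Seq[Expr] = e match {
  case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
  case other     => Seq(other)
}

// Partition conjuncts into (metadata predicates, data predicates)
def splitMetadataAndDataPredicates(
    condition: Expr,
    partitionColumns: Seq[String]): (Seq[Expr], Seq[Expr]) =
  splitConjuncts(condition).partition(_.references.forall(partitionColumns.contains))
```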

splitMetadataAndDataPredicates is used when:

isPredicateMetadataOnly

isPredicateMetadataOnly(
  condition: Expression,
  partitionColumns: Seq[String],
  spark: SparkSession): Boolean

isPredicateMetadataOnly holds true when the following hold about the given condition:

  1. Is partition column only (given the partitionColumns)
  2. Does not contain a subquery
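The two checks above can be sketched on a toy expression model. Assumption: a `Subquery` node marks subquery expressions here, whereas Catalyst uses `SubqueryExpression` subclasses.

```scala
// Sketch of isPredicateMetadataOnly: partition-column-only references
// and no subquery anywhere in the expression tree.
sealed trait Expr { def references: Set[String]; def children: Seq[Expr] = Nil }
case class Col(name: String) extends Expr { val references = Set(name) }
case class Lit(v: Any) extends Expr { val references = Set.empty[String] }
case class EqualTo(l: Expr, r: Expr) extends Expr {
  val references = l.references ++ r.references
  override val children = Seq(l, r)
}
case class Subquery(inner: Expr) extends Expr {
  val references = inner.references
  override val children = Seq(inner)
}

def containsSubquery(e: Expr): Boolean =
  e.isInstanceOf[Subquery] || e.children.exists(containsSubquery)

def isPredicateMetadataOnly(condition: Expr, partitionColumns: Seq[String]): Boolean =
  condition.references.forall(partitionColumns.contains) && !containsSubquery(condition)
```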

Removing Internal Table Metadata

removeInternalMetadata(
  spark: SparkSession,
  persistedSchema: StructType): StructType

removeInternalMetadata removes any default expressions that could be in the given persistedSchema.

With spark.databricks.delta.schema.removeSparkInternalMetadata enabled, removeInternalMetadata removes the following Spark internal metadata keys from the schema fields:

  • __autoGeneratedAlias
  • __metadata_col
  • __supports_qualified_star
  • __qualified_access_only
  • __file_source_metadata_col
  • __file_source_constant_metadata_col
  • __file_source_generated_metadata_col

removeInternalMetadata is used when:

getFileMetadataColumn

getFileMetadataColumn(
  df: DataFrame): Column

getFileMetadataColumn requests the given DataFrame for the metadata column of the _metadata logical column name (using the Dataset.metadataColumn operator).

getFileMetadataColumn is used when: