DeltaTableUtils¶
extractIfPathContainsTimeTravel¶
extractIfPathContainsTimeTravel(
session: SparkSession,
path: String): (String, Option[DeltaTimeTravelSpec])
extractIfPathContainsTimeTravel uses the internal spark.databricks.delta.timeTravel.resolveOnIdentifier.enabled configuration property to find time travel patterns in the given path.
extractIfPathContainsTimeTravel...FIXME
extractIfPathContainsTimeTravel is used when:
- DeltaDataSource is requested to sourceSchema and parsePathIdentifier
findDeltaTableRoot¶
findDeltaTableRoot(
spark: SparkSession,
path: Path,
options: Map[String, String] = Map.empty): Option[Path]
findDeltaTableRoot traverses the Hadoop DFS-compliant path upwards (to the root directory of the file system) until _delta_log or _samples directories are found, or the root directory is reached.
For _delta_log or _samples directories, findDeltaTableRoot returns the parent directory (of _delta_log directory).
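The upward traversal can be sketched on a local file system as follows. This is a minimal illustration, not the Delta Lake implementation: it uses `java.nio.file` instead of the Hadoop `FileSystem` API, and `FindDeltaTableRootSketch` and `findRoot` are hypothetical names. Treating a directory that contains a `_delta_log` child as the table root is an assumption about how "found" is meant above.

```scala
import java.nio.file.{Files, Path}
import scala.annotation.tailrec

object FindDeltaTableRootSketch {
  // Directory names that mark the inside of a Delta table (taken from the text above).
  private val markers = Set("_delta_log", "_samples")

  /** Walks `path` upwards and returns the table root, if any. */
  def findRoot(path: Path): Option[Path] = {
    @tailrec
    def loop(current: Path): Option[Path] =
      if (current == null) {
        None // went past the root of the file system
      } else if (current.getFileName != null &&
          markers.contains(current.getFileName.toString)) {
        Option(current.getParent) // the parent of the marker directory
      } else if (Files.isDirectory(current.resolve("_delta_log"))) {
        Some(current) // assumption: a directory holding _delta_log is the root
      } else {
        loop(current.getParent)
      }

    loop(path.toAbsolutePath.normalize)
  }
}
```

Given a partition directory such as `<table>/year=2024/month=01`, `findRoot` would return `<table>` as long as `<table>/_delta_log` exists.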
findDeltaTableRoot is used when:
- DeltaTable.isDeltaTable utility is used
- VacuumTableCommand is executed
- DeltaTableUtils utility is used to isDeltaTable
- DeltaDataSource utility is used to parsePathIdentifier
isPredicatePartitionColumnsOnly¶
isPredicatePartitionColumnsOnly(
condition: Expression,
partitionColumns: Seq[String],
spark: SparkSession): Boolean
isPredicatePartitionColumnsOnly holds true when all of the references of the condition expression are among the partitionColumns.
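The check boils down to a "for all references" test. The sketch below uses a hypothetical `Condition` case class in place of a Catalyst `Expression` and keeps only the referenced column names. The `caseSensitive` flag is an assumption that stands in for whatever name-resolution rule the `SparkSession` is configured with.

```scala
object PartitionOnlySketch {
  // Hypothetical stand-in for a Catalyst Expression:
  // only the names of the referenced columns matter for this check.
  final case class Condition(references: Set[String])

  def isPartitionColumnsOnly(
      condition: Condition,
      partitionColumns: Seq[String],
      caseSensitive: Boolean = false): Boolean = {
    // Assumption: column names are compared case-insensitively by default.
    val sameName: (String, String) => Boolean =
      if (caseSensitive) (a, b) => a == b
      else (a, b) => a.equalsIgnoreCase(b)

    // True only if every referenced column is one of the partition columns.
    condition.references.forall(ref => partitionColumns.exists(sameName(ref, _)))
  }
}
```

A condition with no column references at all (e.g. a literal `true`) passes the check vacuously in this sketch.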
isPredicatePartitionColumnsOnly is used when:
- DeltaTableUtils is used to isPredicateMetadataOnly
- OptimisticTransactionImpl is requested for the filterFiles
- DeltaSourceSnapshot is requested for the partition and data filters
isDeltaTable¶
isDeltaTable(
table: CatalogTable): Boolean
isDeltaTable(
spark: SparkSession,
path: Path): Boolean
isDeltaTable(
spark: SparkSession,
tableName: TableIdentifier): Boolean
isDeltaTable...FIXME
isDeltaTable is used when:
- DeltaCatalog is requested to loadTable
- DeltaTable.forName, DeltaTable.forPath and DeltaTable.isDeltaTable utilities are used
- DeltaTableIdentifier utility is used to create a DeltaTableIdentifier from a TableIdentifier
- DeltaUnsupportedOperationsCheck is requested to fail
resolveTimeTravelVersion¶
resolveTimeTravelVersion(
conf: SQLConf,
deltaLog: DeltaLog,
tt: DeltaTimeTravelSpec): (Long, String)
resolveTimeTravelVersion...FIXME
resolveTimeTravelVersion is used when:
- DeltaLog is requested to create a relation (per partition filters and time travel)
- DeltaTableV2 is requested for a Snapshot
splitMetadataAndDataPredicates¶
splitMetadataAndDataPredicates(
condition: Expression,
partitionColumns: Seq[String],
spark: SparkSession): (Seq[Expression], Seq[Expression])
splitMetadataAndDataPredicates splits conjunctive (and) predicates in the given condition expression and partitions them into two collections based on the isPredicateMetadataOnly predicate (with the given partitionColumns).
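The split-then-partition idea can be illustrated without Spark. The sketch below is an assumption-laden model, not the actual code: `Expr`, `And` and `Pred` are a hypothetical miniature of Catalyst expressions, and `isMetadataOnly` mirrors the metadata-only rule (partition columns only, no subquery) using exact name matching.

```scala
object SplitPredicatesSketch {
  // Hypothetical, minimal expression tree standing in for Catalyst's Expression.
  sealed trait Expr
  final case class And(left: Expr, right: Expr) extends Expr
  final case class Pred(
      sql: String,
      references: Set[String],
      hasSubquery: Boolean = false) extends Expr

  // Flattens nested ANDs into the sequence of their conjuncts (left to right).
  def splitConjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
    case other     => Seq(other)
  }

  // Metadata-only: references partition columns only and has no subquery.
  def isMetadataOnly(e: Expr, partitionColumns: Seq[String]): Boolean = e match {
    case Pred(_, refs, hasSubquery) =>
      !hasSubquery && refs.forall(partitionColumns.contains)
    case And(l, r) =>
      isMetadataOnly(l, partitionColumns) && isMetadataOnly(r, partitionColumns)
  }

  // Returns (metadata predicates, data predicates).
  def split(condition: Expr, partitionColumns: Seq[String]): (Seq[Expr], Seq[Expr]) =
    splitConjuncts(condition).partition(isMetadataOnly(_, partitionColumns))
}
```

For a condition like `year = 2024 AND id > 10 AND month = 1` over a table partitioned by `year` and `month`, the first collection would hold the `year` and `month` predicates and the second the `id` predicate.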
splitMetadataAndDataPredicates is used when:
- PartitionFiltering is requested for filesForScan
- DeleteCommand is executed (with a delete condition)
- UpdateCommand is executed
isPredicateMetadataOnly¶
isPredicateMetadataOnly(
condition: Expression,
partitionColumns: Seq[String],
spark: SparkSession): Boolean
isPredicateMetadataOnly holds true when the following hold about the given condition:
- Is partition column only (given the partitionColumns)
- Does not contain a subquery
Removing Internal Table Metadata¶
removeInternalMetadata(
spark: SparkSession,
persistedSchema: StructType): StructType
removeInternalMetadata removes any default expressions that could be in the given persistedSchema.
With spark.databricks.delta.schema.removeSparkInternalMetadata enabled, removeInternalMetadata removes the following Spark internal metadata keys from the schema fields:
- __autoGeneratedAlias
- __metadata_col
- __supports_qualified_star
- __qualified_access_only
- __file_source_metadata_col
- __file_source_constant_metadata_col
- __file_source_generated_metadata_col
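As a rough illustration of the key removal only (not of the default-expression handling), the sketch below models a schema as a sequence of hypothetical `Field`s with plain `Map` metadata instead of Spark's `StructType` and `Metadata`. The `enabled` parameter stands in for the configuration property above.

```scala
object RemoveInternalMetadataSketch {
  // Hypothetical stand-in for a StructField with its metadata.
  final case class Field(name: String, metadata: Map[String, String])

  // The Spark internal metadata keys listed above.
  val internalKeys: Seq[String] = Seq(
    "__autoGeneratedAlias",
    "__metadata_col",
    "__supports_qualified_star",
    "__qualified_access_only",
    "__file_source_metadata_col",
    "__file_source_constant_metadata_col",
    "__file_source_generated_metadata_col")

  // Drops the internal keys from every field when enabled; otherwise a no-op.
  def strip(schema: Seq[Field], enabled: Boolean): Seq[Field] =
    if (!enabled) schema
    else schema.map(f => f.copy(metadata = f.metadata -- internalKeys))
}
```

User-defined metadata (e.g. a `comment` key) is left untouched in this sketch.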
removeInternalMetadata is used when:
- DeltaLog is requested to buildHadoopFsRelationWithFileIndex
- DeltaTableV2 is requested to tableSchema
- DeltaDataSource is requested to sourceSchema
- DeltaSourceBase is requested for the schema
getFileMetadataColumn¶
getFileMetadataColumn(
df: DataFrame): Column
getFileMetadataColumn requests the given DataFrame for the metadata column for the _metadata logical column name (using Dataset.metadataColumn operator).
getFileMetadataColumn is used when:
- RowCommitVersion is requested to preserveRowCommitVersions
- RowId is requested to preserveRowIds