DeltaTableUtils¶
extractIfPathContainsTimeTravel¶
extractIfPathContainsTimeTravel(
session: SparkSession,
path: String): (String, Option[DeltaTimeTravelSpec])
extractIfPathContainsTimeTravel uses the internal spark.databricks.delta.timeTravel.resolveOnIdentifier.enabled configuration property to find time travel patterns in the given path.
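As a rough illustration of what a "time travel pattern in a path" could look like, the sketch below strips a trailing `@v<version>` or `@<17-digit timestamp>` suffix from a path. The suffix forms are an assumption taken from Delta Lake's documented path-based time travel syntax, not from this page, and `TimeTravel`, `ByVersion`, `ByTimestamp` and `splitTimeTravel` are hypothetical names. It is not the actual implementation and ignores any additional checks the real method may perform.

```scala
// Hypothetical stand-ins for DeltaTimeTravelSpec (illustration only).
sealed trait TimeTravel
final case class ByVersion(version: Long) extends TimeTravel
final case class ByTimestamp(raw: String) extends TimeTravel

// Assumed suffix forms: "@v123" (version) and "@yyyyMMddHHmmssSSS" (17 digits).
val VersionSuffix = """(.*)@[vV](\d+)$""".r
val TimestampSuffix = """(.*)@(\d{17})$""".r

// `enabled` models the resolveOnIdentifier.enabled configuration property.
def splitTimeTravel(path: String, enabled: Boolean): (String, Option[TimeTravel]) =
  if (!enabled) (path, None)
  else path match {
    case VersionSuffix(base, v)    => (base, Some(ByVersion(v.toLong)))
    case TimestampSuffix(base, ts) => (base, Some(ByTimestamp(ts)))
    case _                         => (path, None)
  }
```

With the flag off, the path is returned unchanged and no time travel specification is produced.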
extractIfPathContainsTimeTravel
...FIXME
extractIfPathContainsTimeTravel is used when:
- DeltaDataSource is requested to sourceSchema and parsePathIdentifier
findDeltaTableRoot¶
findDeltaTableRoot(
spark: SparkSession,
path: Path,
options: Map[String, String] = Map.empty): Option[Path]
findDeltaTableRoot traverses the Hadoop DFS-compliant path upwards (towards the root directory of the file system) until a _delta_log or _samples directory is found, or the root directory is reached.
For _delta_log or _samples directories, findDeltaTableRoot returns the parent directory (of the _delta_log directory).
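The upward traversal can be sketched in plain Scala as follows. This is a simplified model under stated assumptions: it uses java.nio.file.Path instead of a Hadoop Path, it only models the _delta_log lookup (the _samples case is left out), and `findTableRoot` and `dirExists` are hypothetical names, with `dirExists` standing in for a file system lookup.

```scala
import java.nio.file.{Path, Paths}
import scala.annotation.tailrec

// Walk from `path` towards the file system root and return the first
// directory that has a `_delta_log` child, if any.
// `dirExists` is a hypothetical stand-in for a file system existence check.
@tailrec
def findTableRoot(path: Path, dirExists: Path => Boolean): Option[Path] =
  if (path == null) None                                   // walked past the root
  else if (dirExists(path.resolve("_delta_log"))) Some(path)
  else findTableRoot(path.getParent, dirExists)

// Usage: pretend that only /data/events/_delta_log exists.
val existingDirs: Set[Path] = Set(Paths.get("/data/events/_delta_log"))
val root = findTableRoot(
  Paths.get("/data/events/date=2024-01-01/part-0.parquet"),
  existingDirs.contains)
```

Injecting the existence check as a function keeps the sketch testable without touching a real file system.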
findDeltaTableRoot is used when:
- DeltaTable.isDeltaTable utility is used
- VacuumTableCommand is executed
- DeltaTableUtils utility is used to isDeltaTable
- DeltaDataSource utility is used to parsePathIdentifier
isPredicatePartitionColumnsOnly¶
isPredicatePartitionColumnsOnly(
condition: Expression,
partitionColumns: Seq[String],
spark: SparkSession): Boolean
isPredicatePartitionColumnsOnly holds true when all of the references of the condition expression are among the partitionColumns.
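The check amounts to a subset test. In the minimal sketch below, a condition is reduced to the set of column names it references (a simplification of a Catalyst Expression), and names are compared case-insensitively, which is an assumption rather than something this page states. `referencesOnlyPartitionColumns` is a hypothetical name.

```scala
// Simplified model: `references` holds the column names a condition refers to.
// Case-insensitive comparison is an assumption made for this sketch.
def referencesOnlyPartitionColumns(
    references: Set[String],
    partitionColumns: Seq[String]): Boolean =
  references.forall(ref => partitionColumns.exists(_.equalsIgnoreCase(ref)))

// Usage with a table assumed to be partitioned by `date` and `country`.
val partitionCols = Seq("date", "country")
val onlyPartitions = referencesOnlyPartitionColumns(Set("date"), partitionCols)
val touchesData = referencesOnlyPartitionColumns(Set("date", "id"), partitionCols)
```

Note that a condition with no column references at all satisfies the check trivially.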
isPredicatePartitionColumnsOnly is used when:
- DeltaTableUtils is used to isPredicateMetadataOnly
- OptimisticTransactionImpl is requested for the filterFiles
- DeltaSourceSnapshot is requested for the partition and data filters
isDeltaTable¶
isDeltaTable(
table: CatalogTable): Boolean
isDeltaTable(
spark: SparkSession,
path: Path): Boolean
isDeltaTable(
spark: SparkSession,
tableName: TableIdentifier): Boolean
isDeltaTable
...FIXME
isDeltaTable is used when:
- DeltaCatalog is requested to loadTable
- DeltaTable.forName, DeltaTable.forPath and DeltaTable.isDeltaTable utilities are used
- DeltaTableIdentifier utility is used to create a DeltaTableIdentifier from a TableIdentifier
- DeltaUnsupportedOperationsCheck is requested to fail
resolveTimeTravelVersion¶
resolveTimeTravelVersion(
conf: SQLConf,
deltaLog: DeltaLog,
tt: DeltaTimeTravelSpec): (Long, String)
resolveTimeTravelVersion
...FIXME
resolveTimeTravelVersion is used when:
- DeltaLog is requested to create a relation (per partition filters and time travel)
- DeltaTableV2 is requested for a Snapshot
splitMetadataAndDataPredicates¶
splitMetadataAndDataPredicates(
condition: Expression,
partitionColumns: Seq[String],
spark: SparkSession): (Seq[Expression], Seq[Expression])
splitMetadataAndDataPredicates splits conjunctive (and) predicates in the given condition expression and partitions them into two collections based on the isPredicateMetadataOnly predicate (with the given partitionColumns).
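The split-then-partition idea can be sketched with a tiny expression model. Everything here is hypothetical and much simpler than Catalyst: `Expr`, `And` and `Pred` stand in for Expression trees, and `isMetadataOnly` stands in for isPredicateMetadataOnly, assuming it means "references partition columns only and contains no subquery".

```scala
// Hypothetical, minimal expression model (not Catalyst).
sealed trait Expr
final case class And(left: Expr, right: Expr) extends Expr
final case class Pred(
    sql: String,
    references: Set[String],
    containsSubquery: Boolean = false) extends Expr

// Flatten nested ANDs into a list of conjuncts.
def splitConjuncts(e: Expr): Seq[Expr] = e match {
  case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
  case other     => Seq(other)
}

// Stand-in for isPredicateMetadataOnly.
def isMetadataOnly(e: Expr, partitionColumns: Seq[String]): Boolean = e match {
  case Pred(_, refs, subquery) => !subquery && refs.forall(partitionColumns.contains)
  case And(l, r) => isMetadataOnly(l, partitionColumns) && isMetadataOnly(r, partitionColumns)
}

// First collection: metadata-only predicates; second: everything else.
def splitMetadataAndData(
    condition: Expr,
    partitionColumns: Seq[String]): (Seq[Expr], Seq[Expr]) =
  splitConjuncts(condition).partition(isMetadataOnly(_, partitionColumns))

// Usage with a table assumed to be partitioned by `date`.
val byDate = Pred("date = '2024-01-01'", Set("date"))
val byId = Pred("id > 5", Set("id"))
val bySubquery = Pred("date IN (SELECT ...)", Set("date"), containsSubquery = true)
val (metadataPreds, dataPreds) =
  splitMetadataAndData(And(byDate, And(byId, bySubquery)), Seq("date"))
```

In this example only `byDate` ends up in the metadata collection; the predicate with a subquery is treated as a data predicate even though it references a partition column.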
splitMetadataAndDataPredicates is used when:
- PartitionFiltering is requested for filesForScan
- DeleteCommand is executed (with a delete condition)
- UpdateCommand is executed
isPredicateMetadataOnly¶
isPredicateMetadataOnly(
condition: Expression,
partitionColumns: Seq[String],
spark: SparkSession): Boolean
isPredicateMetadataOnly holds true when the following hold about the given condition:
- Is partition column only (given the partitionColumns)
- Does not contain a subquery
Removing Internal Table Metadata¶
removeInternalMetadata(
spark: SparkSession,
persistedSchema: StructType): StructType
removeInternalMetadata removes any default expressions that could be in the given persistedSchema.
With spark.databricks.delta.schema.removeSparkInternalMetadata enabled, removeInternalMetadata removes the following Spark internal metadata keys from the schema fields:
- __autoGeneratedAlias
- __metadata_col
- __supports_qualified_star
- __qualified_access_only
- __file_source_metadata_col
- __file_source_constant_metadata_col
- __file_source_generated_metadata_col
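The key removal can be pictured as filtering each field's metadata map. The sketch below is a simplified model, not the actual implementation: `Field` is a hypothetical stand-in for a StructField with its metadata flattened to a Map, `enabled` models the configuration property, and the removal of default expressions is not modelled.

```scala
// Hypothetical stand-in for a schema field and its metadata.
final case class Field(name: String, metadata: Map[String, String])

// The Spark internal metadata keys listed above.
val SparkInternalKeys: Set[String] = Set(
  "__autoGeneratedAlias",
  "__metadata_col",
  "__supports_qualified_star",
  "__qualified_access_only",
  "__file_source_metadata_col",
  "__file_source_constant_metadata_col",
  "__file_source_generated_metadata_col")

// Drop the internal keys from every field when the feature is enabled.
def stripInternalKeys(schema: Seq[Field], enabled: Boolean): Seq[Field] =
  if (!enabled) schema
  else schema.map { f =>
    f.copy(metadata = f.metadata.filterNot { case (key, _) => SparkInternalKeys(key) })
  }

// Usage: one internal key and one user-defined key ("comment").
val persisted = Seq(Field("id", Map("__metadata_col" -> "true", "comment" -> "primary key")))
val cleaned = stripInternalKeys(persisted, enabled = true)
```

User-defined metadata such as `comment` is left untouched; only the listed keys are dropped.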
removeInternalMetadata is used when:
- DeltaLog is requested to buildHadoopFsRelationWithFileIndex
- DeltaTableV2 is requested to tableSchema
- DeltaDataSource is requested to sourceSchema
- DeltaSourceBase is requested for the schema
getFileMetadataColumn¶
getFileMetadataColumn(
df: DataFrame): Column
getFileMetadataColumn requests the given DataFrame for the metadata column for the _metadata logical column name (using Dataset.metadataColumn operator).
getFileMetadataColumn is used when:
- RowCommitVersion is requested to preserveRowCommitVersions
- RowId is requested to preserveRowIds