DeltaCatalog

DeltaCatalog is a DelegatingCatalogExtension (Spark SQL) and a StagingTableCatalog.

DeltaCatalog is registered using the spark.sql.catalog.spark_catalog (Spark SQL) configuration property.
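For example, a Spark session with Delta Lake support is typically created with the two properties below (DeltaSparkSessionExtension is the usual companion of DeltaCatalog; the app name is illustrative):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("DeltaCatalog demo")
  // Register DeltaCatalog as the spark_catalog session catalog
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  // Delta's SQL extensions usually go hand in hand with the catalog registration
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .getOrCreate()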

StagingTableCatalog

DeltaCatalog is a StagingTableCatalog (Spark SQL) that stages tables as either a StagedDeltaTableV2 (for the delta data source) or a BestEffortStagedTable (for any other table provider).

stageCreate

stageCreate(
  ident: Identifier,
  schema: StructType,
  partitions: Array[Transform],
  properties: util.Map[String, String]): StagedTable

stageCreate is part of the StagingTableCatalog (Spark SQL) abstraction.

stageCreate creates a StagedDeltaTableV2 (with the TableCreationModes.Create operation), but only for the delta data source (with the table provider resolved from the given properties or the spark.sql.sources.default configuration property).

For any other provider, stageCreate creates a BestEffortStagedTable (requesting the parent TableCatalog to create the table).
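All three stage* methods share this provider-based dispatch. A minimal sketch of the idea for stageCreate inside DeltaCatalog (simplified; the delta-name check and the constructor arguments are illustrative, not the exact Delta Lake code):

override def stageCreate(
    ident: Identifier,
    schema: StructType,
    partitions: Array[Transform],
    properties: java.util.Map[String, String]): StagedTable = {
  // Resolve the provider from the properties or spark.sql.sources.default
  if (getProvider(properties).equalsIgnoreCase("delta")) {
    // Stage a delta table; nothing is committed until commitStagedChanges
    new StagedDeltaTableV2(ident, schema, partitions, properties, TableCreationModes.Create)
  } else {
    // Best effort: create the table eagerly via the parent TableCatalog
    new BestEffortStagedTable(ident, super.createTable(ident, schema, partitions, properties), this)
  }
}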

stageCreateOrReplace

stageCreateOrReplace(
  ident: Identifier,
  schema: StructType,
  partitions: Array[Transform],
  properties: util.Map[String, String]): StagedTable

stageCreateOrReplace is part of the StagingTableCatalog (Spark SQL) abstraction.

stageCreateOrReplace creates a StagedDeltaTableV2 (with the TableCreationModes.CreateOrReplace operation), but only for the delta data source (with the table provider resolved from the given properties or the spark.sql.sources.default configuration property).

For any other provider, stageCreateOrReplace requests the parent TableCatalog to drop the table first and then creates a BestEffortStagedTable (requesting the parent TableCatalog to create the table).

stageReplace

stageReplace(
  ident: Identifier,
  schema: StructType,
  partitions: Array[Transform],
  properties: util.Map[String, String]): StagedTable

stageReplace is part of the StagingTableCatalog (Spark SQL) abstraction.

stageReplace creates a StagedDeltaTableV2 (with the TableCreationModes.Replace operation), but only for the delta data source (with the table provider resolved from the given properties or the spark.sql.sources.default configuration property).

For any other provider, stageReplace requests the parent TableCatalog to drop the table first and then creates a BestEffortStagedTable (requesting the parent TableCatalog to create the table).
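As a usage example, atomic CREATE/REPLACE TABLE ... AS SELECT statements are what typically route through this staging API (the table name and query are illustrative):

spark.sql("CREATE TABLE demo USING delta AS SELECT * FROM src")            // stageCreate
spark.sql("CREATE OR REPLACE TABLE demo USING delta AS SELECT * FROM src") // stageCreateOrReplace
spark.sql("REPLACE TABLE demo USING delta AS SELECT * FROM src")           // stageReplace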

Altering Table

alterTable(
  ident: Identifier,
  changes: TableChange*): Table

alterTable is part of the TableCatalog (Spark SQL) abstraction.

alterTable loads the table and continues only when it is a DeltaTableV2. Otherwise, alterTable delegates to the parent TableCatalog.

alterTable groups the given TableChanges by their (class) type.

In addition, alterTable collects the following ColumnChanges together (they are then executed as column updates with a single AlterTableChangeColumnDeltaCommand):

  • RenameColumn
  • UpdateColumnComment
  • UpdateColumnNullability
  • UpdateColumnPosition
  • UpdateColumnType

alterTable executes the table changes as one of the AlterDeltaTableCommands:

| TableChange | AlterDeltaTableCommand |
|-------------|------------------------|
| AddColumn | AlterTableAddColumnsDeltaCommand |
| AddConstraint | AlterTableAddConstraintDeltaCommand |
| ColumnChange | AlterTableChangeColumnDeltaCommand |
| DropConstraint | AlterTableDropConstraintDeltaCommand |
| RemoveProperty | AlterTableUnsetPropertiesDeltaCommand |
| SetLocation (a SetProperty with the location property; catalog delta tables only) | AlterTableSetLocationDeltaCommand |
| SetProperty | AlterTableSetPropertiesDeltaCommand |
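For example, the following ALTER TABLE statements end up as the Delta commands above (the table, column and constraint names are illustrative):

spark.sql("ALTER TABLE demo ADD COLUMNS (city STRING)")                       // AlterTableAddColumnsDeltaCommand
spark.sql("ALTER TABLE demo ADD CONSTRAINT idPositive CHECK (id > 0)")        // AlterTableAddConstraintDeltaCommand
spark.sql("ALTER TABLE demo ALTER COLUMN city COMMENT 'a city'")              // AlterTableChangeColumnDeltaCommand
spark.sql("ALTER TABLE demo DROP CONSTRAINT idPositive")                      // AlterTableDropConstraintDeltaCommand
spark.sql("ALTER TABLE demo SET TBLPROPERTIES ('delta.appendOnly' = 'true')") // AlterTableSetPropertiesDeltaCommand
spark.sql("ALTER TABLE demo UNSET TBLPROPERTIES ('delta.appendOnly')")        // AlterTableUnsetPropertiesDeltaCommand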

alterTable...FIXME

Creating Table

createTable(
  ident: Identifier,
  schema: StructType,
  partitions: Array[Transform],
  properties: util.Map[String, String]): Table

createTable is part of the TableCatalog (Spark SQL) abstraction.

createTable...FIXME

Loading Table

loadTable(
  ident: Identifier): Table

loadTable is part of the TableCatalog (Spark SQL) abstraction.

loadTable loads the table by the given identifier from the parent catalog.

If found and the table is a delta table (Spark SQL's V1Table with the delta provider), loadTable creates a DeltaTableV2 for it. Otherwise, loadTable returns the table as loaded by the parent catalog.
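A minimal sketch of the pattern (simplified; it assumes a SparkSession in scope and Hadoop's Path, and the DeltaTableV2 constructor arguments are illustrative):

override def loadTable(ident: Identifier): Table =
  super.loadTable(ident) match {
    case v1: V1Table if v1.catalogTable.provider.exists(_.equalsIgnoreCase("delta")) =>
      // Wrap the catalog entry in a Delta-aware Table implementation
      DeltaTableV2(spark, new Path(v1.catalogTable.location), catalogTable = Some(v1.catalogTable))
    case other =>
      other // non-delta tables are returned as loaded by the parent catalog
  }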

Creating Delta Table

createDeltaTable(
  ident: Identifier,
  schema: StructType,
  partitions: Array[Transform],
  allTableProperties: Map[String, String],
  writeOptions: Map[String, String],
  sourceQuery: Option[DataFrame],
  operation: TableCreationModes.CreationMode): Table

createDeltaTable...FIXME


createDeltaTable is used when DeltaCatalog is requested to create a table (createTable) and when StagedDeltaTableV2 is requested to commit staged changes.

Operation

createDeltaTable is given an argument of type TableCreationModes.CreationMode, one of the following:

  • Create (stageCreate)
  • CreateOrReplace (stageCreateOrReplace)
  • Replace (stageReplace)
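A hedged sketch of the shape of that type (the exact definition lives in Delta Lake's sources; this only mirrors the three modes listed above):

object TableCreationModes {
  sealed trait CreationMode
  case object Create extends CreationMode          // CREATE TABLE
  case object CreateOrReplace extends CreationMode // CREATE OR REPLACE TABLE
  case object Replace extends CreationMode         // REPLACE TABLE
}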

validateClusterBySpec

validateClusterBySpec(
  maybeClusterBySpec: Option[ClusterBySpec],
  schema: StructType): Unit

validateClusterBySpec...FIXME

Looking Up Table Provider

getProvider(
  properties: util.Map[String, String]): String

getProvider takes the value of the provider property from the given properties (if defined) or defaults to the value of the spark.sql.sources.default (Spark SQL) configuration property.
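A minimal sketch of that lookup (assuming Spark SQL's SQLConf accessor below; the real code may differ):

import org.apache.spark.sql.internal.SQLConf

def getProvider(properties: java.util.Map[String, String]): String =
  // Prefer the explicit "provider" table property; otherwise spark.sql.sources.default
  Option(properties.get("provider"))
    .getOrElse(SQLConf.get.defaultDataSourceName)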


getProvider is used when DeltaCatalog is requested to create or stage a table (createTable, stageCreate, stageCreateOrReplace, stageReplace).