# DeltaCatalog

`DeltaCatalog` is a `DelegatingCatalogExtension` (Spark SQL) and a StagingTableCatalog.

`DeltaCatalog` is registered using the `spark.sql.catalog.spark_catalog` (Spark SQL) configuration property.
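As a quick sketch, a Spark session can register `DeltaCatalog` at startup (the app name and master are arbitrary choices for a local demo; `DeltaSparkSessionExtension` is the companion SQL extension that Delta Lake ships with):

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: register DeltaCatalog as the session catalog.
val spark = SparkSession.builder()
  .appName("delta-catalog-demo")
  .master("local[*]") // assumption: a local demo session
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog",
    "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .getOrCreate()
```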
## StagingTableCatalog

`DeltaCatalog` is a `StagingTableCatalog` (Spark SQL) that creates a StagedDeltaTableV2 (for the delta data source) or a BestEffortStagedTable.
## stageCreate

```scala
stageCreate(
  ident: Identifier,
  schema: StructType,
  partitions: Array[Transform],
  properties: util.Map[String, String]): StagedTable
```

`stageCreate` is part of the `StagingTableCatalog` (Spark SQL) abstraction.

`stageCreate` creates a StagedDeltaTableV2 (with `TableCreationModes.Create` operation) for the delta data source only (based on the given `properties` or the `spark.sql.sources.default` configuration property). Otherwise, `stageCreate` creates a BestEffortStagedTable (requesting the parent `TableCatalog` to create the table).
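The dispatch on the table provider can be pictured with a short sketch (a simplification of the logic described above, not the actual source; the `StagedDeltaTableV2` and `BestEffortStagedTable` constructor shapes are assumptions):

```scala
// Sketch: stage a delta table for the delta provider, otherwise
// eagerly create the table in the parent (session) catalog.
override def stageCreate(
    ident: Identifier,
    schema: StructType,
    partitions: Array[Transform],
    properties: util.Map[String, String]): StagedTable = {
  if (DeltaSourceUtils.isDeltaDataSourceName(getProvider(properties))) {
    // delta provider: defer the catalog change until commitStagedChanges
    new StagedDeltaTableV2(
      ident, schema, partitions, properties, TableCreationModes.Create)
  } else {
    // non-delta provider: ask the parent TableCatalog to create the table
    BestEffortStagedTable(
      ident, super.createTable(ident, schema, partitions, properties), this)
  }
}
```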
## stageCreateOrReplace

```scala
stageCreateOrReplace(
  ident: Identifier,
  schema: StructType,
  partitions: Array[Transform],
  properties: util.Map[String, String]): StagedTable
```

`stageCreateOrReplace` is part of the `StagingTableCatalog` (Spark SQL) abstraction.

`stageCreateOrReplace` creates a StagedDeltaTableV2 (with `TableCreationModes.CreateOrReplace` operation) for the delta data source only (based on the given `properties` or the `spark.sql.sources.default` configuration property). Otherwise, `stageCreateOrReplace` requests the parent `TableCatalog` to drop the table first and then creates a BestEffortStagedTable (requesting the parent `TableCatalog` to create the table).
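For illustration, a `CREATE OR REPLACE TABLE ... AS SELECT` over the delta provider is one statement that is planned through this staging path (the table and column names below are made up):

```scala
// Hypothetical demo: CREATE OR REPLACE TABLE AS SELECT (RTAS)
// is staged atomically through stageCreateOrReplace.
spark.sql("""
  CREATE OR REPLACE TABLE demo_events
  USING delta
  AS SELECT id, current_timestamp() AS ingest_time
     FROM range(5)
""")
```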
## stageReplace

```scala
stageReplace(
  ident: Identifier,
  schema: StructType,
  partitions: Array[Transform],
  properties: util.Map[String, String]): StagedTable
```

`stageReplace` is part of the `StagingTableCatalog` (Spark SQL) abstraction.

`stageReplace` creates a StagedDeltaTableV2 (with `TableCreationModes.Replace` operation) for the delta data source only (based on the given `properties` or the `spark.sql.sources.default` configuration property). Otherwise, `stageReplace` requests the parent `TableCatalog` to drop the table first and then creates a BestEffortStagedTable (requesting the parent `TableCatalog` to create the table).
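Unlike `CreateOrReplace`, the `Replace` mode corresponds to a plain `REPLACE TABLE` statement, which in Spark SQL fails analysis when the table does not already exist (continuing the hypothetical `demo_events` example above):

```scala
// Hypothetical demo: REPLACE TABLE AS SELECT (RTAS) is staged via
// stageReplace; it requires demo_events to already exist.
spark.sql("""
  REPLACE TABLE demo_events
  USING delta
  AS SELECT id FROM range(10)
""")
```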
## Altering Table

```scala
alterTable(
  ident: Identifier,
  changes: TableChange*): Table
```

`alterTable` is part of the `TableCatalog` (Spark SQL) abstraction.

`alterTable` loads the table and continues only when it is a DeltaTableV2. Otherwise, `alterTable` delegates to the parent `TableCatalog`.

`alterTable` groups the given `TableChange`s by their (class) type. In addition, `alterTable` collects the following `ColumnChange`s together (they are then executed as column updates by `AlterTableChangeColumnDeltaCommand`):

* `RenameColumn`
* `UpdateColumnComment`
* `UpdateColumnNullability`
* `UpdateColumnPosition`
* `UpdateColumnType`

`alterTable` executes the table changes as one of AlterDeltaTableCommands.
| TableChange | AlterDeltaTableCommand |
|-------------|------------------------|
| `AddColumn` | `AlterTableAddColumnsDeltaCommand` |
| `AddConstraint` | `AlterTableAddConstraintDeltaCommand` |
| `ColumnChange` | `AlterTableChangeColumnDeltaCommand` |
| `DropConstraint` | `AlterTableDropConstraintDeltaCommand` |
| `RemoveProperty` | `AlterTableUnsetPropertiesDeltaCommand` |
| `SetLocation` (a `SetProperty` with the `location` property; catalog delta tables only) | `AlterTableSetLocationDeltaCommand` |
| `SetProperty` | `AlterTableSetPropertiesDeltaCommand` |
`alterTable`...FIXME
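To make the mapping concrete, here are a few hypothetical `ALTER TABLE` statements and the Delta command each one resolves to per the table above (table and column names are made up; the `CHECK` constraint syntax needs the Delta SQL extension registered earlier):

```scala
// Hypothetical examples; the comment on each line names the command
// from the mapping table above.
spark.sql("ALTER TABLE demo_events ADD COLUMNS (source STRING)")          // AlterTableAddColumnsDeltaCommand
spark.sql("ALTER TABLE demo_events SET TBLPROPERTIES ('k' = 'v')")        // AlterTableSetPropertiesDeltaCommand
spark.sql("ALTER TABLE demo_events UNSET TBLPROPERTIES ('k')")            // AlterTableUnsetPropertiesDeltaCommand
spark.sql("ALTER TABLE demo_events ADD CONSTRAINT id_ok CHECK (id >= 0)") // AlterTableAddConstraintDeltaCommand
spark.sql("ALTER TABLE demo_events ALTER COLUMN id COMMENT 'event id'")   // AlterTableChangeColumnDeltaCommand
```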
## Creating Table

```scala
createTable(
  ident: Identifier,
  schema: StructType,
  partitions: Array[Transform],
  properties: util.Map[String, String]): Table
```

`createTable` is part of the `TableCatalog` (Spark SQL) abstraction.

`createTable`...FIXME
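Until the description is filled in, note that a plain `CREATE TABLE` (without `AS SELECT`) is a statement that reaches `createTable` directly rather than the staging path; a hypothetical example:

```scala
// Hypothetical demo: a non-CTAS CREATE TABLE goes through
// DeltaCatalog.createTable (no StagedTable involved).
spark.sql("CREATE TABLE demo_plain (id LONG, name STRING) USING delta")
```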
## Loading Table

```scala
loadTable(
  ident: Identifier): Table
```

`loadTable` is part of the `TableCatalog` (Spark SQL) abstraction.

`loadTable` loads a table by the given identifier from the catalog. If found and the table is a delta table (Spark SQL's `V1Table` with the `delta` provider), `loadTable` creates a DeltaTableV2.
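A hypothetical spark-shell probe (assuming the `demo_plain` table from the earlier example exists) shows the `DeltaTableV2` coming back from the catalog:

```scala
import org.apache.spark.sql.connector.catalog.{Identifier, TableCatalog}

// Look up the session catalog (registered as spark_catalog above)
// and load the table through the TableCatalog contract.
val catalog = spark.sessionState.catalogManager
  .catalog("spark_catalog").asInstanceOf[TableCatalog]
val table = catalog.loadTable(Identifier.of(Array("default"), "demo_plain"))
println(table.getClass.getName)
// org.apache.spark.sql.delta.catalog.DeltaTableV2
```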
## Creating Delta Table

```scala
createDeltaTable(
  ident: Identifier,
  schema: StructType,
  partitions: Array[Transform],
  allTableProperties: Map[String, String],
  writeOptions: Map[String, String],
  sourceQuery: Option[DataFrame],
  operation: TableCreationModes.CreationMode): Table
```

`createDeltaTable`...FIXME

`createDeltaTable` is used when:

* `DeltaCatalog` is requested to create a table
* StagedDeltaTableV2 is requested to commitStagedChanges
### Operation

`createDeltaTable` is given an argument of type `TableCreationModes.CreationMode`:

* `Create`, when `DeltaCatalog` creates a table
* The `CreationMode` that a StagedDeltaTableV2 is given when created
## validateClusterBySpec

```scala
validateClusterBySpec(
  maybeClusterBySpec: Option[ClusterBySpec],
  schema: StructType): Unit
```

`validateClusterBySpec`...FIXME
## Looking Up Table Provider

```scala
getProvider(
  properties: util.Map[String, String]): String
```

`getProvider` takes the value of the `provider` key from the given `properties` (if available) or defaults to the value of the `spark.sql.sources.default` (Spark SQL) configuration property.
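A minimal sketch of that lookup, assuming only the behavior described above (`SQLConf.get.defaultDataSourceName` reads `spark.sql.sources.default`):

```scala
import java.util.{Map => JMap}
import org.apache.spark.sql.internal.SQLConf

// Sketch only: prefer the explicit provider table property,
// fall back to the default data source.
def getProvider(properties: JMap[String, String]): String =
  Option(properties.get("provider"))
    .getOrElse(SQLConf.get.defaultDataSourceName)
```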
`getProvider` is used when:

* `DeltaCatalog` is requested to createTable, stageReplace, stageCreateOrReplace and stageCreate