CatalogTable

CatalogTable is the specification (metadata) of a table managed by SessionCatalog.

Creating Instance

CatalogTable takes the following to be created:

  • TableIdentifier
  • Table type
  • CatalogStorageFormat
  • Schema (StructType)
  • Name of the table provider
  • Partition Columns
  • Bucketing specification
  • Owner
  • Created Time
  • Last access time
  • Created By version
  • Table Properties
  • Table statistics
  • View Text
  • Comment
  • Unsupported Features (Seq[String])
  • tracksPartitionsInCatalog flag (default: false)
  • schemaPreservesCase flag (default: true)
  • Ignored properties
  • View Original Text

CatalogTable is created when:

Bucketing Specification

bucketSpec: Option[BucketSpec] = None

CatalogTable can be given a BucketSpec when created. It is undefined (None) by default.

BucketSpec is given (using getBucketSpecFromTableProperties from a Hive metastore) when:

BucketSpec is given when:

BucketSpec is used when:

Note

  1. Use DescribeTableCommand to review BucketSpec
  2. Use ShowCreateTableCommand to review the Spark DDL syntax
  3. Use Catalog.listColumns to list all columns (incl. bucketing columns)
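As a sketch of where the bucketing specification comes from, the following (assuming a local `SparkSession` and a throwaway table name `bucketed_demo` made up for this demo) writes a bucketed table and reads the `BucketSpec` back from the table metadata:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical local session, purely for illustration
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("bucketSpec-demo")
  .getOrCreate()
import spark.implicits._

// Save a table bucketed by id into 4 buckets (the table name is made up)
Seq((0, "a"), (1, "b")).toDF("id", "name")
  .write
  .bucketBy(4, "id")
  .sortBy("id")
  .saveAsTable("bucketed_demo")

// Read the BucketSpec back from the CatalogTable
val catalog = spark.sessionState.catalog
val tid = spark.sessionState.sqlParser.parseTableIdentifier("bucketed_demo")
val metadata = catalog.getTableMetadata(tid)
println(metadata.bucketSpec)
```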

Table Type

CatalogTable is given a CatalogTableType when created:

CatalogTableType is included when a TreeNode is requested for a JSON representation for...FIXME

Table Statistics

stats: Option[CatalogStatistics] = None

CatalogTable can be given a CatalogStatistics when created. It is undefined (None) by default.
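CatalogStatistics is a small holder of the table size in bytes, an optional row count, and per-column statistics. A minimal sketch of constructing one directly (the numbers are fabricated, matching the demo output further down):

```scala
import org.apache.spark.sql.catalyst.catalog.CatalogStatistics

// Fabricated numbers purely for illustration
val stats = CatalogStatistics(
  sizeInBytes = BigInt(714),
  rowCount = Some(BigInt(2)))

println(stats.simpleString)
// 714 bytes, 2 rows
```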

Review Me

You manage table metadata using the Catalog interface. Among the management tasks is to get the table statistics of a table (that are used for cost-based query optimization).

scala> t1Metadata.stats.foreach(println)
CatalogStatistics(714,Some(2),Map(p1 -> ColumnStat(2,Some(0),Some(1),0,4,4,None), id -> ColumnStat(2,Some(0),Some(1),0,4,4,None)))

scala> t1Metadata.stats.map(_.simpleString).foreach(println)
714 bytes, 2 rows

CAUTION: FIXME When are stats specified? What if they are not?

Unless table statistics are available in the table metadata (in a catalog) for a non-streaming file data source table, DataSource creates a HadoopFsRelation with the table size specified by the spark.sql.defaultSizeInBytes internal property (default: Long.MaxValue) for query planning of joins (and possibly to auto-broadcast the table).

Internally, Spark alters table statistics using ExternalCatalog.doAlterTableStats.

Unless table statistics are available in the table metadata (in a catalog) for a HiveTableRelation (with the hive provider), the DetermineTableStats logical resolution rule can compute the table size using HDFS (if the spark.sql.statistics.fallBackToHdfs property is turned on) or assume spark.sql.defaultSizeInBytes (which effectively disables table broadcasting).

When requested to [look up a table in a metastore](hive/HiveClientImpl.md#getTableOption), HiveClientImpl [reads table or partition statistics directly from a Hive metastore](hive/HiveClientImpl.md#readHiveStats).

You can use the [AnalyzeColumnCommand](AnalyzeColumnCommand.md), [AnalyzePartitionCommand](AnalyzePartitionCommand.md) and [AnalyzeTableCommand](AnalyzeTableCommand.md) commands to record statistics in a catalog.
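These commands are usually issued through SQL. A self-contained sketch (assuming a local `SparkSession`; the `t1` table with `id` and `p1` columns is created here just for the demo) that records table-level and column-level statistics in the catalog:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical local session and a throwaway t1 table, purely for illustration
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("analyze-demo")
  .getOrCreate()
import spark.implicits._
Seq((0, 0), (1, 1)).toDF("id", "p1").write.saveAsTable("t1")

// Record table-level and column-level statistics in the catalog
spark.sql("ANALYZE TABLE t1 COMPUTE STATISTICS")
spark.sql("ANALYZE TABLE t1 COMPUTE STATISTICS FOR COLUMNS id, p1")

// Statistics are now available in the CatalogTable
val tid = spark.sessionState.sqlParser.parseTableIdentifier("t1")
println(spark.sessionState.catalog.getTableMetadata(tid).stats)
```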

The table statistics can be automatically updated (after executing commands like AlterTableAddPartitionCommand) when spark.sql.statistics.size.autoUpdate.enabled property is turned on.

You can use the DESCRIBE SQL command to show the histogram of a column, if stored in a catalog.
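For example, assuming the `t1` table from the demo below already exists and has had its column statistics analyzed, the column-level variant of DESCRIBE shows per-column statistics (including the histogram when one was recorded):

```scala
// Show column-level statistics for column id of table t1
// (assumes an active SparkSession named spark and an analyzed t1 table)
spark.sql("DESCRIBE EXTENDED t1 id").show(truncate = false)
```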

Demo: Accessing Table Metadata

Catalog

val q = spark.catalog.listTables.filter($"name" === "t1")
scala> q.show
+----+--------+-----------+---------+-----------+
|name|database|description|tableType|isTemporary|
+----+--------+-----------+---------+-----------+
|  t1| default|       null|  MANAGED|      false|
+----+--------+-----------+---------+-----------+

SessionCatalog

import org.apache.spark.sql.catalyst.catalog.SessionCatalog
val sessionCatalog = spark.sessionState.catalog
assert(sessionCatalog.isInstanceOf[SessionCatalog])
val t1Tid = spark.sessionState.sqlParser.parseTableIdentifier("t1")
val t1Metadata = sessionCatalog.getTempViewOrPermanentTableMetadata(t1Tid)
import org.apache.spark.sql.catalyst.catalog.CatalogTable
assert(t1Metadata.isInstanceOf[CatalogTable])