CatalogTable¶
CatalogTable is the specification (metadata) of a table in a SessionCatalog.
Creating Instance¶
CatalogTable takes the following to be created:
- TableIdentifier
- Table type
- CatalogStorageFormat
- Schema (StructType)
- Name of the table provider
- Partition Columns
- Bucketing specification
- Owner
- Created Time
- Last access time
- Created By version
- Table Properties
- Statistics
- View Text
- Comment
- Unsupported Features (Seq[String])
- tracksPartitionsInCatalog flag (default: false)
- schemaPreservesCase flag (default: true)
- Ignored properties
- View Original Text
CatalogTable is created when:
- HiveClientImpl is requested to convertHiveTableToCatalogTable
- InsertIntoHiveDirCommand is executed
- DataFrameWriter is requested to create a table
- ResolveSessionCatalog logical resolution rule is requested to buildCatalogTable
- CreateTableLikeCommand is executed
- CreateViewCommand is requested to prepareTable
- ViewHelper utility is used to prepareTemporaryView and prepareTemporaryViewStoringAnalyzedPlan
- V2SessionCatalog is requested to createTable
- CatalogImpl is requested to createTable
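As a rough illustration of the parameters listed above, a CatalogTable can be built by hand in spark-shell. This is a minimal sketch, not the way Spark itself creates table metadata: the table name demo_table, the parquet provider and the two-column schema are made up for the example. Only the identifier, table type, storage format and schema are passed explicitly along with the provider; everything else falls back to its default.

```scala
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable, CatalogTableType}
import org.apache.spark.sql.types.{LongType, StringType, StructType}

// Hypothetical table definition used only for illustration
val demo = CatalogTable(
  identifier = TableIdentifier("demo_table", Some("default")),
  tableType = CatalogTableType.MANAGED,
  storage = CatalogStorageFormat.empty,
  schema = new StructType().add("id", LongType).add("name", StringType),
  provider = Some("parquet"))

// Optional parts keep their defaults
println(demo.bucketSpec)                // bucketing specification
println(demo.stats)                     // statistics
println(demo.tracksPartitionsInCatalog) // flag
println(demo.schemaPreservesCase)       // flag
```

Building a CatalogTable this way only creates the metadata object; nothing is registered in the session catalog.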
Bucketing Specification¶
bucketSpec: Option[BucketSpec] = None
CatalogTable can be given a BucketSpec when created. It is undefined (None) by default.
BucketSpec is given (using getBucketSpecFromTableProperties from a Hive metastore) when:
- HiveExternalCatalog is requested to restoreHiveSerdeTable and restoreDataSourceTable
- HiveClientImpl is requested to convertHiveTableToCatalogTable
BucketSpec is given when:
- DataFrameWriter is requested to create a table (with getBucketSpec)
- ResolveSessionCatalog logical resolution rule is requested to buildCatalogTable
- CreateTableLikeCommand is executed (with a bucketed table)
- PreprocessTableCreation logical resolution rule is requested to normalizeCatalogTable
- V2SessionCatalog is requested to create a table
BucketSpec is used when:
- CatalogTable is requested to toLinkedHashMap
- V1Table is requested for the partitioning
- CreateDataSourceTableCommand is executed
- CreateDataSourceTableAsSelectCommand is requested to saveDataIntoTable
- others
Note
- Use DescribeTableCommand to review BucketSpec
- Use ShowCreateTableCommand to review the Spark DDL syntax
- Use Catalog.listColumns to list all columns (incl. bucketing columns)
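The DataFrameWriter path can be tried out in spark-shell. The sketch below assumes a writable warehouse directory and uses a made-up table name (bucketed_demo); it saves a bucketed table and then reads the BucketSpec back from the table metadata.

```scala
import org.apache.spark.sql.catalyst.TableIdentifier

// bucketed_demo is a hypothetical table name used only for this demo
spark.range(10)
  .write
  .mode("overwrite")
  .bucketBy(4, "id")
  .sortBy("id")
  .saveAsTable("bucketed_demo")

// Look up the CatalogTable and inspect its bucketing specification
val bucketedMetadata =
  spark.sessionState.catalog.getTableMetadata(TableIdentifier("bucketed_demo"))
bucketedMetadata.bucketSpec.foreach { spec =>
  println(s"numBuckets=${spec.numBuckets}")
  println(s"bucketColumnNames=${spec.bucketColumnNames.mkString(",")}")
  println(s"sortColumnNames=${spec.sortColumnNames.mkString(",")}")
}

// Bucketing columns are also flagged in Catalog.listColumns
spark.catalog.listColumns("bucketed_demo").filter("isBucket").show()
```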
Table Type¶
CatalogTable is given a CatalogTableType when created:
- EXTERNAL for external tables (EXTERNAL_TABLE in Hive)
- MANAGED for managed tables (MANAGED_TABLE in Hive)
- VIEW for views (VIRTUAL_VIEW in Hive)
CatalogTableType is included when a TreeNode is requested for a JSON representation for...FIXME
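The three table types are available as values of the CatalogTableType object. A short sketch for spark-shell that prints their names:

```scala
import org.apache.spark.sql.catalyst.catalog.CatalogTableType

// The table types described in this section
val tableTypes = Seq(
  CatalogTableType.EXTERNAL,
  CatalogTableType.MANAGED,
  CatalogTableType.VIEW)

// name is the string form that shows up in the table metadata
tableTypes.foreach(t => println(t.name))
```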
Statistics¶
stats: Option[CatalogStatistics] = None
CatalogTable can be given a CatalogStatistics when created. It is undefined (None) by default.
CatalogTable can be displayed using the following commands (when executed with EXTENDED or FORMATTED clause):
- DescribeTableCommand (DESCRIBE TABLE SQL statement)
- DescribeColumnCommand (DESCRIBE TABLE with a column specified)
The CatalogStatistics can be defined when:
- InMemoryCatalog is requested to alterTableStats
- HiveExternalCatalog is requested to restore a table metadata
- HiveClientImpl is requested to convertHiveTableToCatalogTable
- PruneHiveTablePartitions logical optimization is executed (and requested to update a table metadata)
- PruneFileSourcePartitions logical optimization is executed
The CatalogStatistics is used when:
- DataSource is requested to resolve a Relation (of type FileFormat that uses a CatalogFileIndex)
- HiveTableRelation is requested to computeStats (with spark.sql.cbo.enabled or spark.sql.cbo.planStats.enabled enabled)
- LogicalRelation is requested to computeStats (with spark.sql.cbo.enabled or spark.sql.cbo.planStats.enabled enabled)
The CatalogStatistics is updated (altered) when:
- AnalyzeColumnCommand is requested to analyzeColumnInCatalog
- CommandUtils is requested to updateTableStats, analyzeTable
- AlterTableAddPartitionCommand is executed
CatalogStatistics is displayed as the Statistics entry in toLinkedHashMap.
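One way to see the statistics being updated is to run ANALYZE TABLE and read them back from the table metadata. The following is a sketch for spark-shell; stats_demo is a made-up table name, and a writable warehouse directory is assumed.

```scala
import org.apache.spark.sql.catalyst.TableIdentifier

// stats_demo is a hypothetical table name used only for this demo
spark.range(5).write.mode("overwrite").saveAsTable("stats_demo")

// Compute table-level statistics (size in bytes and row count)
spark.sql("ANALYZE TABLE stats_demo COMPUTE STATISTICS")

val statsMetadata =
  spark.sessionState.catalog.getTableMetadata(TableIdentifier("stats_demo"))
statsMetadata.stats.foreach(s => println(s.simpleString))

// The same statistics are part of the DESCRIBE TABLE EXTENDED output
spark.sql("DESCRIBE TABLE EXTENDED stats_demo").show(numRows = 100, truncate = false)
```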
toLinkedHashMap¶
toLinkedHashMap: LinkedHashMap[String, String]
toLinkedHashMap ...FIXME
toLinkedHashMap is used when:
- CatalogTable is requested to toString and simpleString
- DescribeTableCommand is executed (and describeFormattedTableInfo)
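To get a feel for what toLinkedHashMap returns, the sketch below builds a throwaway CatalogTable in spark-shell and prints the entries. The table definition (demo_table, parquet, a single id column) is invented for the example, and the exact set of keys may differ between Spark versions.

```scala
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable, CatalogTableType}
import org.apache.spark.sql.types.{LongType, StructType}

// Hypothetical table definition used only for illustration
val table = CatalogTable(
  identifier = TableIdentifier("demo_table", Some("default")),
  tableType = CatalogTableType.MANAGED,
  storage = CatalogStorageFormat.empty,
  schema = new StructType().add("id", LongType),
  provider = Some("parquet"))

// Entries come back in insertion order
val tableInfo = table.toLinkedHashMap
tableInfo.foreach { case (key, value) => println(s"$key: $value") }
```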
Demo: Accessing Table Metadata¶
Catalog¶
val q = spark.catalog.listTables.filter($"name" === "t1")
scala> q.show
+----+--------+-----------+---------+-----------+
|name|database|description|tableType|isTemporary|
+----+--------+-----------+---------+-----------+
|  t1| default|       null|  MANAGED|      false|
+----+--------+-----------+---------+-----------+
SessionCatalog¶
import org.apache.spark.sql.catalyst.catalog.SessionCatalog
val sessionCatalog = spark.sessionState.catalog
assert(sessionCatalog.isInstanceOf[SessionCatalog])
val t1Tid = spark.sessionState.sqlParser.parseTableIdentifier("t1")
val t1Metadata = sessionCatalog.getTempViewOrPermanentTableMetadata(t1Tid)
import org.apache.spark.sql.catalyst.catalog.CatalogTable
assert(t1Metadata.isInstanceOf[CatalogTable])