CatalogTable¶
CatalogTable is the specification (metadata) of a table in a SessionCatalog.
Creating Instance¶
CatalogTable takes the following to be created:
- TableIdentifier
- Table type
- CatalogStorageFormat
- Schema (StructType)
- Name of the table provider
- Partition Columns
- Bucketing specification
- Owner
- Created Time
- Last access time
- Created By version
- Table Properties
- Statistics
- View Text
- Comment
- Unsupported Features (Seq[String])
- tracksPartitionsInCatalog flag (default: false)
- schemaPreservesCase flag (default: true)
- Ignored properties
- View Original Text
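The list above can be illustrated with a minimal sketch that builds a CatalogTable by hand. It assumes spark-catalyst is on the classpath (e.g. in spark-shell); the table name demo_table, the two-column schema and the parquet provider are made up for illustration, and only the first few arguments are given so the remaining ones fall back to their defaults.

```scala
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable, CatalogTableType}
import org.apache.spark.sql.types.{LongType, StringType, StructType}

// Hypothetical table: the name, schema and provider are illustrative only
val demoTable = CatalogTable(
  identifier = TableIdentifier("demo_table", Some("default")),
  tableType = CatalogTableType.MANAGED,
  storage = CatalogStorageFormat.empty,
  schema = new StructType().add("id", LongType).add("name", StringType),
  provider = Some("parquet"))

// Arguments left out take the defaults described in the list above
assert(demoTable.bucketSpec.isEmpty)
assert(demoTable.stats.isEmpty)
assert(!demoTable.tracksPartitionsInCatalog)
assert(demoTable.schemaPreservesCase)
```

Creating the object does not register anything in a catalog; it is only the metadata that a SessionCatalog would be given.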
CatalogTable is created when:
- HiveClientImpl is requested to convertHiveTableToCatalogTable
- InsertIntoHiveDirCommand is executed
- DataFrameWriter is requested to create a table
- ResolveSessionCatalog logical resolution rule is requested to buildCatalogTable
- CreateTableLikeCommand is executed
- CreateViewCommand is requested to prepareTable
- ViewHelper utility is used to prepareTemporaryView and prepareTemporaryViewStoringAnalyzedPlan
- V2SessionCatalog is requested to createTable
- CatalogImpl is requested to createTable
Bucketing Specification¶
bucketSpec: Option[BucketSpec] = None
CatalogTable can be given a BucketSpec when created. It is undefined (None) by default.
BucketSpec is given (using getBucketSpecFromTableProperties from a Hive metastore) when:
- HiveExternalCatalog is requested to restoreHiveSerdeTable and restoreDataSourceTable
- HiveClientImpl is requested to convertHiveTableToCatalogTable
BucketSpec is given when:
- DataFrameWriter is requested to create a table (with getBucketSpec)
- ResolveSessionCatalog logical resolution rule is requested to buildCatalogTable
- CreateTableLikeCommand is executed (with a bucketed table)
- PreprocessTableCreation logical resolution rule is requested to normalizeCatalogTable
- V2SessionCatalog is requested to create a table
BucketSpec is used when:
- CatalogTable is requested to toLinkedHashMap
- V1Table is requested for the partitioning
- CreateDataSourceTableCommand is executed
- CreateDataSourceTableAsSelectCommand is requested to saveDataIntoTable
- others
Note
- Use DescribeTableCommand to review BucketSpec
- Use ShowCreateTableCommand to review the Spark DDL syntax
- Use Catalog.listColumns to list all columns (incl. bucketing columns)
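One way to see a BucketSpec end up in table metadata is to create a bucketed table with DataFrameWriter and read it back from the session catalog. The following is a sketch only: it assumes a local Spark session (in spark-shell the existing session is reused), the default file-based data source, and a made-up table name bucketed_demo.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell getOrCreate returns the session that is already running
val spark = SparkSession.builder().master("local[*]").appName("bucket-spec-sketch").getOrCreate()

// Hypothetical table: 4 buckets on the id column, sorted by id within each bucket
spark.range(10).write
  .mode("overwrite")
  .bucketBy(4, "id")
  .sortBy("id")
  .saveAsTable("bucketed_demo")

val bucketedTid = spark.sessionState.sqlParser.parseTableIdentifier("bucketed_demo")
val bucketedMetadata = spark.sessionState.catalog.getTableMetadata(bucketedTid)

// Option[BucketSpec]; expected to be defined for a table written with bucketBy
val bucketSpec = bucketedMetadata.bucketSpec
bucketSpec.foreach(println)
```

Running DESCRIBE EXTENDED bucketed_demo afterwards should show the same bucketing information in the table details.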
Table Type¶
CatalogTable is given a CatalogTableType when created:
- EXTERNAL for external tables (EXTERNAL_TABLE in Hive)
- MANAGED for managed tables (MANAGED_TABLE in Hive)
- VIEW for views (VIRTUAL_VIEW in Hive)
CatalogTableType is included when a TreeNode is requested for a JSON representation for...FIXME
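As a small sketch of the three table types, the values can be referenced from the CatalogTableType companion object. The Hive names in the map below are copied from the list above; the describe helper is hypothetical and is not part of Spark.

```scala
import org.apache.spark.sql.catalyst.catalog.CatalogTableType

// Hive-side names as given in this section (illustrative lookup, not a Spark API)
val hiveNames = Map(
  CatalogTableType.EXTERNAL -> "EXTERNAL_TABLE",
  CatalogTableType.MANAGED -> "MANAGED_TABLE",
  CatalogTableType.VIEW -> "VIRTUAL_VIEW")

// Hypothetical helper that renders a table type next to its Hive name
def describe(tableType: CatalogTableType): String =
  s"${tableType.name} (${hiveNames(tableType)} in Hive)"

hiveNames.keys.foreach(t => println(describe(t)))
```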
Statistics¶
stats: Option[CatalogStatistics] = None
CatalogTable can be given a CatalogStatistics when created. It is undefined (None) by default.
CatalogTable can be displayed using the following commands (when executed with EXTENDED or FORMATTED clause):
- DescribeTableCommand (DESCRIBE TABLE SQL statement)
- DescribeColumnCommand (DESCRIBE TABLE with a column specified)
The CatalogStatistics can be defined when:
- InMemoryCatalog is requested to alterTableStats
- HiveExternalCatalog is requested to restore a table metadata
- HiveClientImpl is requested to convertHiveTableToCatalogTable
- PruneHiveTablePartitions logical optimization is executed (and requested to update a table metadata)
- PruneFileSourcePartitions logical optimization is executed
The CatalogStatistics is used when:
- DataSource is requested to resolve a Relation (of type FileFormat that uses a CatalogFileIndex)
- HiveTableRelation is requested to computeStats (with spark.sql.cbo.enabled or spark.sql.cbo.planStats.enabled enabled)
- LogicalRelation is requested to computeStats (with spark.sql.cbo.enabled or spark.sql.cbo.planStats.enabled enabled)
The CatalogStatistics is updated (altered) when:
- AnalyzeColumnCommand is requested to analyzeColumnInCatalog
- CommandUtils is requested to updateTableStats, analyzeTable
- AlterTableAddPartitionCommand is executed
CatalogStatistics is listed under the Statistics entry in toLinkedHashMap.
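The following sketch shows one way to populate and read the statistics of a table: it runs ANALYZE TABLE and then looks up the stats in the table metadata. It assumes a local Spark session (in spark-shell the existing session is reused) and a made-up table name stats_demo.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell getOrCreate returns the session that is already running
val spark = SparkSession.builder().master("local[*]").appName("catalog-stats-sketch").getOrCreate()

// Hypothetical table with 100 rows
spark.range(100).write.mode("overwrite").saveAsTable("stats_demo")

// Computes table-level statistics and stores them in the catalog
spark.sql("ANALYZE TABLE stats_demo COMPUTE STATISTICS")

val statsTid = spark.sessionState.sqlParser.parseTableIdentifier("stats_demo")
val tableStats = spark.sessionState.catalog.getTableMetadata(statsTid).stats

// Option[CatalogStatistics] with the size in bytes and, once analyzed, the row count
tableStats.foreach { s =>
  println(s"sizeInBytes=${s.sizeInBytes}, rowCount=${s.rowCount}")
}
```

DESCRIBE EXTENDED stats_demo should then include a Statistics row with the same figures.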
toLinkedHashMap¶
toLinkedHashMap: LinkedHashMap[String, String]
toLinkedHashMap...FIXME
toLinkedHashMap is used when:
- CatalogTable is requested to toString and simpleString
- DescribeTableCommand is executed (and describeFormattedTableInfo)
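To get a feel for what toLinkedHashMap returns, it can be called on a hand-built CatalogTable and the entries printed. This is a sketch under assumptions: the table sketch_table, its schema and provider are made up, and the exact set of keys may differ between Spark versions.

```scala
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable, CatalogTableType}
import org.apache.spark.sql.types.{LongType, StructType}

// Hypothetical table metadata used only to produce some output
val sketchTable = CatalogTable(
  identifier = TableIdentifier("sketch_table", Some("default")),
  tableType = CatalogTableType.MANAGED,
  storage = CatalogStorageFormat.empty,
  schema = new StructType().add("id", LongType),
  provider = Some("parquet"))

// Insertion-ordered map of property names to their text representation
val info = sketchTable.toLinkedHashMap
info.foreach { case (key, value) => println(s"$key: $value") }
```

The same entries are what toString and simpleString build on, as listed above.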
Demo: Accessing Table Metadata¶
Catalog¶
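// Assumes a table t1 already exists in the default database
// ($"name" needs spark.implicits._, which spark-shell imports automatically)
val q = spark.catalog.listTables.filter($"name" === "t1")
scala> q.show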
+----+--------+-----------+---------+-----------+
|name|database|description|tableType|isTemporary|
+----+--------+-----------+---------+-----------+
| t1| default| null| MANAGED| false|
+----+--------+-----------+---------+-----------+
SessionCatalog¶
import org.apache.spark.sql.catalyst.catalog.SessionCatalog
val sessionCatalog = spark.sessionState.catalog
assert(sessionCatalog.isInstanceOf[SessionCatalog])
val t1Tid = spark.sessionState.sqlParser.parseTableIdentifier("t1")
val t1Metadata = sessionCatalog.getTempViewOrPermanentTableMetadata(t1Tid)
import org.apache.spark.sql.catalyst.catalog.CatalogTable
assert(t1Metadata.isInstanceOf[CatalogTable])