CatalogTable is the specification (metadata) of a table.

CatalogTable is stored in a SessionCatalog (session-scoped catalog of relational entities).

[source, scala]
----
scala> :type spark.sessionState.catalog
org.apache.spark.sql.catalyst.catalog.SessionCatalog

// Using the high-level user-friendly Catalog interface
scala> spark.catalog.listTables.filter($"name" === "t1").show
+----+--------+-----------+---------+-----------+
|name|database|description|tableType|isTemporary|
+----+--------+-----------+---------+-----------+
|  t1| default|       null|  MANAGED|      false|
+----+--------+-----------+---------+-----------+

// Using the low-level internal SessionCatalog interface to access CatalogTables
val t1Tid = spark.sessionState.sqlParser.parseTableIdentifier("t1")
val t1Metadata = spark.sessionState.catalog.getTempViewOrPermanentTableMetadata(t1Tid)

scala> :type t1Metadata
org.apache.spark.sql.catalyst.catalog.CatalogTable
----

CatalogTable is created when:

  • SessionCatalog is requested for a table metadata

  • HiveClientImpl is requested to look up a table in a metastore

  • DataFrameWriter is requested to create a table

  • InsertIntoHiveDirCommand logical command is executed

  • SparkSqlAstBuilder is requested to visitCreateTable and visitCreateHiveTable

  • CreateTableLikeCommand logical command is executed

  • CreateViewCommand logical command is executed

  • CatalogImpl is requested to createTable

[[simpleString]] The readable text representation of a CatalogTable (aka simpleString) is...FIXME

NOTE: simpleString is used exclusively when ShowTablesCommand logical command is executed (with a partition specification).

[[toString]] CatalogTable uses the following text representation (i.e. toString)...FIXME

CatalogTable is created with optional metadata that is used for the purposes described in the sections below.

=== [[creating-instance]] Creating Instance

CatalogTable takes the following to be created:

  • [[identifier]] TableIdentifier
  • [[tableType]] Table type
  • [[storage]] CatalogStorageFormat
  • [[schema]] Schema
  • [[provider]] Name of the table provider (optional)
  • [[partitionColumnNames]] Partition column names
  • [[bucketSpec]] Optional Bucketing specification (default: None)
  • [[owner]] Owner
  • [[createTime]] Create time
  • [[lastAccessTime]] Last access time
  • [[createVersion]] Create version
  • [[properties]] Properties
  • [[stats]] Optional table statistics
  • [[viewText]] Optional view text
  • [[comment]] Optional comment
  • [[unsupportedFeatures]] Unsupported features
  • [[tracksPartitionsInCatalog]] tracksPartitionsInCatalog flag
  • [[schemaPreservesCase]] schemaPreservesCase flag
  • [[ignoredProperties]] Ignored properties

=== [[CatalogTableType]] Table Type

The type of a table (CatalogTableType) can be one of the following:

  • EXTERNAL for external tables (EXTERNAL_TABLE in Hive)
  • MANAGED for managed tables (MANAGED_TABLE in Hive)
  • VIEW for views (VIRTUAL_VIEW in Hive)

CatalogTableType is included when a TreeNode is requested for a JSON representation for...FIXME
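The three table types above can be illustrated with a small sealed hierarchy in plain Scala. This is a sketch only: Spark's actual CatalogTableType is a case class with a validated name, not a sealed trait, and the type and object names below are made up for the example.

```scala
// Illustrative sketch only: models the three CatalogTableType values
// and their Hive counterparts. Not Spark's actual implementation.
sealed trait TableType { def name: String; def hiveType: String }
case object External extends TableType { val name = "EXTERNAL"; val hiveType = "EXTERNAL_TABLE" }
case object Managed  extends TableType { val name = "MANAGED";  val hiveType = "MANAGED_TABLE" }
case object View     extends TableType { val name = "VIEW";     val hiveType = "VIRTUAL_VIEW" }
```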

=== [[stats-metadata]] Table Statistics for Query Planning (Auto Broadcast Joins and Cost-Based Optimization)

You manage table metadata using the Catalog interface. One of the management tasks is to get the statistics of a table (which are used for cost-based query optimization).

[source, scala]
----
scala> t1Metadata.stats.foreach(println)
CatalogStatistics(714,Some(2),Map(p1 -> ColumnStat(2,Some(0),Some(1),0,4,4,None), id -> ColumnStat(2,Some(0),Some(1),0,4,4,None)))

scala> t1Metadata.stats.map(_.simpleString).foreach(println)
714 bytes, 2 rows
----

NOTE: The table statistics are optional when CatalogTable is created.

CAUTION: FIXME When are stats specified? What if they are not?

Unless table statistics are available in a table metadata (in a catalog) for a non-streaming file data source table, DataSource creates a HadoopFsRelation with the table size specified by the spark.sql.defaultSizeInBytes internal property (default: Long.MaxValue) for query planning of joins (and possibly to auto broadcast the table).
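The size-estimation fallback can be sketched in plain Scala. The function name and parameter below are illustrative, not Spark's API; the real logic lives in DataSource and the statistics-aware logical plans. A table whose estimated size falls back to Long.MaxValue is effectively never auto-broadcast.

```scala
// Sketch: pick the table size used for join planning.
// spark.sql.defaultSizeInBytes defaults to Long.MaxValue.
val defaultSizeInBytes: Long = Long.MaxValue

// Use the catalog statistics when available; otherwise fall back
// to the (deliberately huge) default size.
def tableSizeForPlanning(statsSizeInBytes: Option[BigInt]): BigInt =
  statsSizeInBytes.getOrElse(BigInt(defaultSizeInBytes))
```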

Internally, Spark alters table statistics using ExternalCatalog.doAlterTableStats.

Unless table statistics are available in a table metadata (in a catalog) for a HiveTableRelation (with the hive provider), the DetermineTableStats logical resolution rule can compute the table size using HDFS (if the spark.sql.statistics.fallBackToHdfs property is turned on) or assume spark.sql.defaultSizeInBytes (which effectively disables table broadcasting).

When requested to look up a table in a metastore, HiveClientImpl reads the table or partition statistics directly from a Hive metastore.

You can use the AnalyzeColumnCommand, AnalyzePartitionCommand and AnalyzeTableCommand commands to record statistics in a catalog.

The table statistics can be automatically updated (after executing commands like AlterTableAddPartitionCommand) when spark.sql.statistics.size.autoUpdate.enabled property is turned on.

You can use the DESCRIBE SQL command to show the histogram of a column (if stored in a catalog).

=== [[dataSchema]] dataSchema Method

[source, scala]
----
dataSchema: StructType
----


NOTE: dataSchema is used when...FIXME

=== [[partitionSchema]] partitionSchema Method

[source, scala]
----
partitionSchema: StructType
----


NOTE: partitionSchema is used when...FIXME
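Conceptually, both methods split the full schema on the partition column names: partitionSchema keeps the partition columns and dataSchema keeps the rest. A minimal sketch, with plain (name, dataType) tuples standing in for StructType and StructField:

```scala
// Sketch: (name, dataType) pairs stand in for StructField.
val schema = List(("id", "int"), ("name", "string"), ("p1", "int"))
val partitionColumnNames = List("p1")

// partitionSchema: only the partition columns.
val partitionSchema = schema.filter { case (name, _) => partitionColumnNames.contains(name) }

// dataSchema: everything that is not a partition column.
val dataSchema = schema.filterNot { case (name, _) => partitionColumnNames.contains(name) }
```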

=== [[toLinkedHashMap]] Converting Table Specification to LinkedHashMap -- toLinkedHashMap Method

[source, scala]
----
toLinkedHashMap: mutable.LinkedHashMap[String, String]
----

toLinkedHashMap converts the table specification to a collection of pairs (LinkedHashMap[String, String]) with the following fields and their values:

  • Database with the database of the table identifier

  • Table with the table name of the table identifier

  • Owner with the owner (if defined)

  • Created Time with the create time

  • Created By with Spark and the create version

  • Type with the name of the table type

  • Provider with the table provider (if defined)

  • Bucket specification (of the BucketSpec if defined)

  • Comment with the comment (if defined)

  • View Text, View Default Database and View Query Output Columns for VIEW table type

  • Table Properties with the properties (if not empty)

  • Statistics with the table statistics (if defined)

  • Storage specification (of the CatalogStorageFormat if defined)

  • Partition Provider with Catalog if the tracksPartitionsInCatalog flag is on

  • Partition Columns with the partition column names (if not empty)

  • Schema with the schema (if not empty)
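The insertion-ordered, presence-aware construction can be sketched in plain Scala. The field values below are made up for the example; the real method reads them off the CatalogTable instance:

```scala
import scala.collection.mutable

// Sketch: a LinkedHashMap preserves insertion order, and optional
// fields are only added when defined, so the resulting map reflects
// both the order and the presence of the table's metadata.
val map = mutable.LinkedHashMap[String, String]()
map += "Database" -> "default"
map += "Table" -> "t1"

val comment: Option[String] = None
comment.foreach(c => map += "Comment" -> c)  // skipped when undefined
```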


toLinkedHashMap is used when:

  • DescribeTableCommand is requested to describeFormattedTableInfo (when DescribeTableCommand is requested to run for a non-temporary table with the isExtended flag on)

  • CatalogTable is requested for either a simpleString or a toString text representation

=== [[database]] database Method

[source, scala]
----
database: String
----

database simply returns the database (of the table identifier) or throws an AnalysisException:

table [identifier] did not specify database

NOTE: database is used when...FIXME
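The behavior can be sketched in plain Scala. The case class below is a stand-in for Spark's TableIdentifier (which keeps the database as an Option), and a plain exception replaces AnalysisException to keep the example self-contained:

```scala
// Sketch: the table identifier keeps the database as an Option.
case class TableIdentifier(table: String, database: Option[String])

// Return the database when specified; otherwise fail with the
// "did not specify database" error (AnalysisException in Spark).
def databaseOf(identifier: TableIdentifier): String =
  identifier.database.getOrElse {
    throw new IllegalStateException(
      s"table ${identifier.table} did not specify database")
  }
```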