HiveClientImpl

HiveClientImpl is a HiveClient that uses a Hive metastore client to communicate with a Hive metastore.

Creating Instance

HiveClientImpl takes the following to be created:

When created, HiveClientImpl prints out the following INFO message to the logs:

Warehouse location for Hive client (version [fullVersion]) is [the value of hive.metastore.warehouse.dir]

HiveClientImpl is created when:

Metastore Warehouse Directory

HiveClientImpl is given the directory of the default database of a Hive warehouse.

The directory is the value of hive.metastore.warehouse.dir configuration property (default: /user/hive/warehouse).
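The next example is a minimal sketch, not Spark's code. It shows how the warehouse directory falls back to its default and how the INFO message above is put together. It assumes the Hive configuration is modeled as a plain Map[String, String]. WarehouseDirSketch, warehouseDir and logMessage are hypothetical names, and the version string 2.3.9 is only an example value.

```scala
// Hypothetical helper: models how the metastore warehouse directory is resolved
// from a Hive configuration represented as a plain Map (not Spark's or Hive's API)
object WarehouseDirSketch {
  val WarehouseDirKey = "hive.metastore.warehouse.dir"
  val DefaultWarehouseDir = "/user/hive/warehouse"

  // Falls back to the documented default when the property is not set
  def warehouseDir(hiveConf: Map[String, String]): String =
    hiveConf.getOrElse(WarehouseDirKey, DefaultWarehouseDir)

  // Builds the INFO message HiveClientImpl prints out when created
  def logMessage(fullVersion: String, hiveConf: Map[String, String]): String =
    s"Warehouse location for Hive client (version $fullVersion) is ${warehouseDir(hiveConf)}"

  def main(args: Array[String]): Unit = {
    // Property not set: the default directory is used
    println(logMessage("2.3.9", Map.empty))
    // Property set explicitly: its value is used as-is
    println(logMessage("2.3.9", Map(WarehouseDirKey -> "/tmp/spark-warehouse")))
  }
}
```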

Hive Metastore Client

client: Hive

client is a Hive metastore client (for metadata and DDL operations, performed as calls to the metastore).

Creating CatalogStatistics

readHiveStats(
  properties: Map[String, String]): Option[CatalogStatistics]

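readHiveStats creates a CatalogStatistics from the given Hive properties (the parameters of a Hive table or partition). totalSize takes precedence over rawDataSize, and when neither is greater than 0, readHiveStats returns None. readHiveStats uses the following Hive properties, but only if they are available and greater than 0: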

Hive Property            | Table Statistic
totalSize or rawDataSize | sizeInBytes
numRows                  | rowCount
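The next example sketches the selection rules in plain Scala. It is a minimal sketch, not Spark's implementation. Stats is a hypothetical stand-in for CatalogStatistics that keeps only sizeInBytes and rowCount. The sketch also assumes that totalSize is tried first, with rawDataSize as the fallback, and that a missing or empty property counts as unavailable.

```scala
object ReadHiveStatsSketch {
  // Hypothetical stand-in for CatalogStatistics (only the fields filled in here)
  final case class Stats(sizeInBytes: BigInt, rowCount: Option[BigInt])

  // Reads a numeric property, treating a missing or empty value as unavailable
  private def numeric(props: Map[String, String], key: String): Option[BigInt] =
    props.get(key).filter(_.nonEmpty).map(BigInt(_))

  def readHiveStats(props: Map[String, String]): Option[Stats] = {
    // Only values greater than 0 are used
    val totalSize = numeric(props, "totalSize").filter(_ > 0)
    val rawDataSize = numeric(props, "rawDataSize").filter(_ > 0)
    val rowCount = numeric(props, "numRows").filter(_ > 0)
    // totalSize first, rawDataSize as the fallback; no usable size means no statistics
    totalSize.orElse(rawDataSize).map(size => Stats(size, rowCount))
  }

  def main(args: Array[String]): Unit = {
    println(readHiveStats(Map("totalSize" -> "1024", "numRows" -> "10")))
    println(readHiveStats(Map("totalSize" -> "0", "rawDataSize" -> "512")))
    println(readHiveStats(Map("numRows" -> "10")))
  }
}
```

With this design, a row count alone is not enough. Without a positive size, no statistics are returned at all.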

readHiveStats is used when:

  • HiveClientImpl is requested for the metadata of a table or partition

convertHiveTableToCatalogTable

convertHiveTableToCatalogTable(
  h: Table): CatalogTable

convertHiveTableToCatalogTable creates a CatalogTable based on the given Hive Table as follows:

CatalogTable     | Hive Table
Table Statistics | readHiveStats
...

convertHiveTableToCatalogTable is used when:

fromHivePartition

fromHivePartition(
  hp: HivePartition): CatalogTablePartition

fromHivePartition...FIXME


fromHivePartition is used when:

Logging

Enable ALL logging level for org.apache.spark.sql.hive.client.HiveClientImpl logger to see what happens inside.

Add the following lines to conf/log4j2.properties:

logger.HiveClientImpl.name = org.apache.spark.sql.hive.client.HiveClientImpl
logger.HiveClientImpl.level = all

Refer to Logging.