HiveClientImpl¶
HiveClientImpl is a HiveClient that uses a Hive metastore client to communicate with a Hive metastore.
Creating Instance¶
HiveClientImpl takes the following to be created:
- HiveVersion
- Metastore Warehouse Directory
- SparkConf (Spark Core)
- Hadoop Configuration (Iterable[Map.Entry[String, String]])
- Extra Configuration (Map[String, String])
- Init ClassLoader
- IsolatedClientLoader
When created, HiveClientImpl prints out the following INFO message to the logs:
Warehouse location for Hive client (version [fullVersion]) is [the value of hive.metastore.warehouse.dir]
HiveClientImpl is created when:
- IsolatedClientLoader is requested to create a HiveClient
Metastore Warehouse Directory¶
HiveClientImpl is given the directory of the default database of a Hive warehouse.
The directory is the value of the hive.metastore.warehouse.dir configuration property (default: /user/hive/warehouse).
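The lookup-with-default described above can be sketched as follows. This is a minimal illustration only, not how HiveClientImpl obtains the directory (it is given the value when created). The object name WarehouseDirSketch, the helper warehouseDir, and the use of a Scala Iterable of Map.Entry pairs as the configuration source are all assumptions made for the example.

```scala
import java.util.Map.Entry

// Hypothetical helper, not part of Spark.
object WarehouseDirSketch {
  val WarehouseKey = "hive.metastore.warehouse.dir"
  // Default taken from the text above.
  val DefaultWarehouseDir = "/user/hive/warehouse"

  // Returns the configured warehouse directory, or the default
  // when the property is not present in the given entries.
  def warehouseDir(conf: Iterable[Entry[String, String]]): String =
    conf
      .collectFirst { case e if e.getKey == WarehouseKey => e.getValue }
      .getOrElse(DefaultWarehouseDir)
}
```

With no entries at all, warehouseDir falls back to /user/hive/warehouse.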
Hive Metastore Client¶
client: Hive
client is a Hive metastore client, used for metadata and DDL operations that call the metastore.
Creating CatalogStatistics¶
readHiveStats(
properties: Map[String, String]): Option[CatalogStatistics]
readHiveStats creates a CatalogStatistics from the input Hive properties (with table and possibly partition parameters). readHiveStats uses the following Hive properties, provided they are available and greater than 0:
| Hive Property | Table Statistic |
|---|---|
| totalSize or rawDataSize | sizeInBytes |
| numRows | rowCount |
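The mapping in the table can be sketched in plain Scala as below. This is a simplified illustration under stated assumptions, not Spark's implementation: Stats is a hypothetical stand-in for CatalogStatistics with only the two fields from the table, readStats and positive are made-up names, and preferring totalSize over rawDataSize (and returning no statistics when neither is usable) is an assumption about how "totalSize or rawDataSize" is resolved.

```scala
import scala.util.Try

// Hypothetical stand-in for CatalogStatistics (only the fields in the table).
final case class Stats(sizeInBytes: BigInt, rowCount: Option[BigInt])

object ReadHiveStatsSketch {
  // Reads a property as a BigInt; missing, empty, malformed,
  // or non-positive values all yield None.
  private def positive(properties: Map[String, String], key: String): Option[BigInt] =
    properties.get(key)
      .map(_.trim)
      .filter(_.nonEmpty)
      .flatMap(v => Try(BigInt(v)).toOption)
      .filter(_ > 0)

  def readStats(properties: Map[String, String]): Option[Stats] = {
    // Assumption: totalSize is preferred, rawDataSize is the fallback.
    val size = positive(properties, "totalSize")
      .orElse(positive(properties, "rawDataSize"))
    // Assumption: without a usable size there are no statistics at all.
    size.map(s => Stats(s, positive(properties, "numRows")))
  }
}
```

In this sketch, a property map with totalSize set to 0 and rawDataSize set to 42 yields a size of 42 and no row count, while a map with only numRows yields no statistics.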
readHiveStats is used when:
convertHiveTableToCatalogTable¶
convertHiveTableToCatalogTable(
h: Table): CatalogTable
convertHiveTableToCatalogTable creates a CatalogTable based on the given Hive Table as follows:
| CatalogTable | Hive Table |
|---|---|
| Table Statistics | readHiveStats |
| ... |
convertHiveTableToCatalogTable is used when:
- HiveClientImpl is requested to getRawHiveTableOption (and requests RawHiveTableImpl to getRawHiveTableOption), getTablesByName, getTableOption
fromHivePartition¶
fromHivePartition(
hp: HivePartition): CatalogTablePartition
fromHivePartition...FIXME
fromHivePartition is used when:
- HiveClientImpl is requested to getPartitionOption, getPartitions, getPartitionsByFilter
Logging¶
Enable ALL logging level for org.apache.spark.sql.hive.client.HiveClientImpl logger to see what happens inside.
Add the following line to conf/log4j2.properties:
logger.HiveClientImpl.name = org.apache.spark.sql.hive.client.HiveClientImpl
logger.HiveClientImpl.level = all
Refer to Logging.