HiveClientImpl
HiveClientImpl is a HiveClient that uses a Hive metastore client to communicate with a Hive metastore.
Creating Instance
HiveClientImpl takes the following to be created:
- HiveVersion
- Metastore Warehouse Directory
- SparkConf (Spark Core)
- Hadoop Configuration (Iterable[Map.Entry[String, String]])
- Extra Configuration (Map[String, String])
- Init ClassLoader
- IsolatedClientLoader
When created, HiveClientImpl prints out the following INFO message to the logs:
Warehouse location for Hive client (version [fullVersion]) is [the value of hive.metastore.warehouse.dir]
HiveClientImpl is created when:
- IsolatedClientLoader is requested to create a HiveClient
Metastore Warehouse Directory
HiveClientImpl is given the directory of the default database of a Hive warehouse.
The directory is the value of the hive.metastore.warehouse.dir configuration property (default: /user/hive/warehouse).
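The lookup above and the INFO message printed at creation can be sketched in plain Scala. This is a hypothetical illustration, not Spark's code. WarehouseLocationSketch, warehouseDir and infoMessage are made-up names, and the configuration is modelled as a simple Map[String, String] instead of a Hive or Hadoop configuration object.

```scala
// Hypothetical helper, NOT Spark's API: it mimics resolving the warehouse
// directory from a key/value configuration and rendering the INFO message.
object WarehouseLocationSketch {
  val WarehouseDirKey = "hive.metastore.warehouse.dir"
  val DefaultWarehouseDir = "/user/hive/warehouse"

  // Falls back to the default when the property is not set.
  def warehouseDir(conf: Map[String, String]): String =
    conf.getOrElse(WarehouseDirKey, DefaultWarehouseDir)

  // Same wording as the INFO message HiveClientImpl prints when created.
  def infoMessage(fullVersion: String, conf: Map[String, String]): String =
    s"Warehouse location for Hive client (version $fullVersion) is ${warehouseDir(conf)}"
}
```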
Hive Metastore Client
client: Hive
client is a Hive metastore client (for metadata and DDL operations using calls to the metastore).
Creating CatalogStatistics
readHiveStats(properties: Map[String, String]): Option[CatalogStatistics]
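readHiveStats creates a CatalogStatistics from the given Hive properties (the parameters of a table or a partition).
readHiveStats uses the following Hive properties, but only if they are available and greater than 0. When neither totalSize nor rawDataSize meets this condition, readHiveStats returns None.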
Hive Property | Table Statistic |
---|---|
totalSize (or rawDataSize if totalSize is not greater than 0) | sizeInBytes |
numRows | rowCount |
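The property-to-statistic mapping in the table can be illustrated with a small self-contained sketch. It is not Spark's implementation. TableStats is a hypothetical stand-in for CatalogStatistics (which has more fields), and treating non-numeric property values as absent is an assumption of this sketch.

```scala
import scala.util.Try

// Hypothetical stand-in for Spark's CatalogStatistics (NOT the real class).
final case class TableStats(sizeInBytes: BigInt, rowCount: Option[BigInt])

object HiveStatsSketch {
  // Reads a property as a BigInt and keeps it only if it is greater than 0.
  // Assumption: unparsable values are treated as absent.
  private def positive(props: Map[String, String], key: String): Option[BigInt] =
    props.get(key).flatMap(v => Try(BigInt(v.trim)).toOption).filter(_ > BigInt(0))

  def readHiveStats(props: Map[String, String]): Option[TableStats] = {
    val rowCount = positive(props, "numRows")
    // totalSize is preferred; rawDataSize is the fallback.
    // No positive size at all means no statistics (None).
    positive(props, "totalSize")
      .orElse(positive(props, "rawDataSize"))
      .map(size => TableStats(sizeInBytes = size, rowCount = rowCount))
  }
}
```

For example, `Map("totalSize" -> "0", "rawDataSize" -> "512")` yields a size of 512 with no row count, because a zero totalSize is ignored.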
readHiveStats is used when:
- HiveClientImpl is requested to convertHiveTableToCatalogTable and fromHivePartition
convertHiveTableToCatalogTable
convertHiveTableToCatalogTable(h: Table): CatalogTable
convertHiveTableToCatalogTable creates a CatalogTable based on the given Hive Table as follows:
CatalogTable | Hive Table |
---|---|
Table Statistics | readHiveStats |
... |
convertHiveTableToCatalogTable is used when:
- HiveClientImpl is requested to getRawHiveTableOption (when the RawHiveTableImpl it returns is converted to a CatalogTable), getTablesByName, getTableOption
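A rough sketch of where the table statistics come from in this conversion. It assumes hypothetical stub types (HiveTableStub and CatalogTableStub) in place of Hive's Table and Spark's CatalogTable. The real conversion populates many more fields. Here the statistics reader is passed in as a function, so a readHiveStats-like function can be plugged in.

```scala
import scala.jdk.CollectionConverters._

// Hypothetical, simplified stand-ins (NOT Hive's Table or Spark's CatalogTable).
final case class HiveTableStub(db: String, name: String, parameters: java.util.Map[String, String])
final case class CatalogTableStub(
    database: String,
    table: String,
    properties: Map[String, String],
    stats: Option[BigInt])

object ConvertSketch {
  // Copies a few fields and delegates statistics to the given reader.
  // Simplification: the real conversion handles many more fields.
  def convert(h: HiveTableStub, readStats: Map[String, String] => Option[BigInt]): CatalogTableStub = {
    // Table parameters arrive as a (possibly null) java.util.Map.
    // Convert them to an immutable Scala Map.
    val props = Option(h.parameters).map(_.asScala.toMap).getOrElse(Map.empty[String, String])
    CatalogTableStub(h.db, h.name, props, readStats(props))
  }
}
```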
fromHivePartition
fromHivePartition(hp: HivePartition): CatalogTablePartition
fromHivePartition ...FIXME
fromHivePartition is used when:
- HiveClientImpl is requested to getPartitionOption, getPartitions, getPartitionsByFilter
Logging
Enable ALL logging level for the org.apache.spark.sql.hive.client.HiveClientImpl logger to see what happens inside.
Add the following lines to conf/log4j2.properties:
logger.HiveClientImpl.name = org.apache.spark.sql.hive.client.HiveClientImpl
logger.HiveClientImpl.level = all
Refer to Logging.