# HiveTableRelation Leaf Logical Operator
HiveTableRelation is a ../LeafNode.md[leaf logical operator] that represents a Hive table in a ../spark-sql-LogicalPlan.md[logical query plan].
HiveTableRelation is created when the FindDataSourceTable logical evaluation rule is requested to resolve UnresolvedCatalogRelations in a logical plan (for Hive tables).
NOTE: HiveTableRelation can be RelationConversions.md#convert[converted to a HadoopFsRelation] based on spark.sql.hive.convertMetastoreParquet and spark.sql.hive.convertMetastoreOrc properties (and "disappears" from a logical plan when enabled).
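As a quick, hedged illustration, the following spark-shell sketch creates a Parquet-backed Hive table (the p1 name is made up for this example) and inspects its plan; with the conversion properties left at their defaults, the plan is expected to show a LogicalRelation over a HadoopFsRelation rather than a HiveTableRelation.

[source, scala]
----
// The conversion properties are enabled by default (this read assumes a Hive-enabled session)
println(spark.conf.get("spark.sql.hive.convertMetastoreParquet"))

// A Parquet-backed Hive table (hypothetical name p1)
sql("CREATE TABLE IF NOT EXISTS p1 (id LONG) STORED AS PARQUET")

// With the conversion enabled, expect a LogicalRelation (HadoopFsRelation),
// not a HiveTableRelation, in the plan
println(spark.table("p1").queryExecution.optimizedPlan.numberedTreeString)
----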
HiveTableRelation is partitioned when it has at least one partition column.
[[MultiInstanceRelation]] HiveTableRelation is a MultiInstanceRelation.
HiveTableRelation is converted (resolved) to the following:

- HiveTableScanExec.md[HiveTableScanExec] physical operator in the HiveTableScans.md[HiveTableScans] execution planning strategy
- InsertIntoHiveTable.md[InsertIntoHiveTable] command in the HiveAnalysis.md[HiveAnalysis] logical resolution rule
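The following spark-shell session demonstrates how a Hive table ends up as a HiveTableRelation once the analysis rules have done their work.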
[source, scala]
----
val tableName = "h1"

// Make the example reproducible
val db = spark.catalog.currentDatabase
import spark.sharedState.{externalCatalog => extCatalog}
extCatalog.dropTable(
  db, table = tableName, ignoreIfNotExists = true, purge = true)

// sql("CREATE TABLE h1 (id LONG) USING hive")
import org.apache.spark.sql.types.StructType
spark.catalog.createTable(
  tableName,
  source = "hive",
  schema = new StructType().add($"id".long),
  options = Map.empty[String, String])

val h1meta = extCatalog.getTable(db, tableName)
scala> println(h1meta.provider.get)
hive

// Looks like we've got the testing space ready for the experiment
val h1 = spark.table(tableName)

import org.apache.spark.sql.catalyst.dsl.plans._
val plan = table(tableName).insertInto("t2", overwrite = true)
scala> println(plan.numberedTreeString)
00 'InsertIntoTable 'UnresolvedRelation `t2`, true, false
01 +- 'UnresolvedRelation `h1`

// ResolveRelations logical rule first to resolve UnresolvedRelations
import spark.sessionState.analyzer.ResolveRelations
val rrPlan = ResolveRelations(plan)
scala> println(rrPlan.numberedTreeString)
00 'InsertIntoTable 'UnresolvedRelation `t2`, true, false
01 +- 'SubqueryAlias h1
02    +- 'UnresolvedCatalogRelation `default`.`h1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

// FindDataSourceTable logical rule next to resolve UnresolvedCatalogRelations
import org.apache.spark.sql.execution.datasources.FindDataSourceTable
val findTablesRule = new FindDataSourceTable(spark)
val planWithTables = findTablesRule(rrPlan)

// At long last...
// Note HiveTableRelation in the logical plan
scala> println(planWithTables.numberedTreeString)
00 'InsertIntoTable 'UnresolvedRelation `t2`, true, false
01 +- SubqueryAlias h1
02    +- HiveTableRelation `default`.`h1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [id#13L]
----
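Building on the session above, the following sketch (the hiveRel value is introduced here just for illustration) extracts the HiveTableRelation from the resolved plan and exercises the MultiInstanceRelation contract mentioned earlier: newInstance gives a copy of the relation with fresh expression IDs.

[source, scala]
----
// Continue in the same spark-shell session as above
import org.apache.spark.sql.catalyst.catalog.HiveTableRelation

// Pick the HiveTableRelation out of the resolved plan
val hiveRel = planWithTables.collectFirst { case r: HiveTableRelation => r }.get

// As a MultiInstanceRelation, the relation can re-create itself with
// fresh expression IDs (used to disambiguate self-joins)
println(hiveRel.output)
println(hiveRel.newInstance().output)
----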
The metadata of a HiveTableRelation (in a catalog) has to meet the following requirements:

- The database is defined
- The partition schema is of the same type as the partition columns
- The data schema is of the same type as the data columns
[[output]] HiveTableRelation has the output attributes made up of the data columns followed by the partition columns.
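Continuing the sketch above (and reusing the hiveRel value), the ordering of the output attributes can be checked directly:

[source, scala]
----
// Data columns first, then partition columns
println(hiveRel.output == hiveRel.dataCols ++ hiveRel.partitionCols)  // true
----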
=== [[computeStats]] Computing Statistics -- computeStats Method
[source, scala]
----
computeStats(): Statistics
----
NOTE: computeStats is part of the ../LeafNode.md#computeStats[LeafNode Contract] to compute statistics for the ../cost-based-optimization/index.md[cost-based optimizer].
computeStats takes the table statistics from the table metadata, if defined, and converts them to Spark statistics (with the output columns).

If the table statistics are not available, computeStats reports an IllegalStateException:

----
table stats must be specified.
----
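As a rough, hedged illustration that reuses the h1 table from the session above, gathering table-level statistics with ANALYZE TABLE makes computeStats return them for the relation in the analyzed plan (the relForStats value is made up for this sketch):

[source, scala]
----
import org.apache.spark.sql.catalyst.catalog.HiveTableRelation

// Gather table-level statistics so they end up in the table metadata
sql("ANALYZE TABLE h1 COMPUTE STATISTICS")

// Pick the HiveTableRelation out of the analyzed plan and compute its statistics
val relForStats = spark.table("h1").queryExecution.analyzed
  .collectFirst { case r: HiveTableRelation => r }
  .get
println(relForStats.computeStats())
----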
## Creating Instance
HiveTableRelation takes the following when created:
- [[tableMeta]] Table metadata
- [[dataCols]] Data columns (as a collection of AttributeReferences)
- [[partitionCols]] Partition columns (as a collection of AttributeReferences)
=== [[partition-columns]] Partition Columns
When created, HiveTableRelation is given the partition columns.
FindDataSourceTable.md[FindDataSourceTable] logical evaluation rule creates a HiveTableRelation based on a table specification (from a catalog).
The partition columns come from the partition schema of the table metadata (from the catalog).
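For example, the following sketch (reusing the spark session from the demo above; the h1_part table name is made up for illustration) shows that a partitioned Hive table gives a HiveTableRelation with non-empty partition columns:

[source, scala]
----
import org.apache.spark.sql.catalyst.catalog.HiveTableRelation

// A partitioned Hive text table (not subject to the Parquet/ORC conversion mentioned above)
sql("CREATE TABLE IF NOT EXISTS h1_part (id LONG) PARTITIONED BY (p STRING) STORED AS TEXTFILE")

val partRel = spark.table("h1_part").queryExecution.analyzed
  .collectFirst { case r: HiveTableRelation => r }
  .get
println(partRel.partitionCols)   // the partition column p
println(partRel.isPartitioned)   // true
----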
=== [[isPartitioned]] isPartitioned Method
[source, scala]
----
isPartitioned: Boolean
----
isPartitioned is true when there is at least one partition column.
[NOTE]
====
isPartitioned is used when:

- HiveMetastoreCatalog is requested to HiveMetastoreCatalog.md#convertToLogicalRelation[convert a HiveTableRelation to a LogicalRelation over a HadoopFsRelation]
- RelationConversions.md[RelationConversions] logical posthoc evaluation rule is executed (on an RelationConversions.md#apply-InsertIntoTable[InsertIntoTable])
====