# LogicalRelation Leaf Logical Operator

`LogicalRelation` is a leaf logical operator that represents a `BaseRelation` in a logical query plan.

`LogicalRelation` is a `MultiInstanceRelation`.
## Creating Instance

`LogicalRelation` takes the following to be created:

- `BaseRelation`
- Output schema (`AttributeReference`s)
- Optional `CatalogTable`
- `isStreaming` flag

`LogicalRelation` is created using the `apply` factory.
## apply Utility

```scala
apply(
  relation: BaseRelation,
  isStreaming: Boolean = false): LogicalRelation
apply(
  relation: BaseRelation,
  table: CatalogTable): LogicalRelation
```

`apply` wraps the given `BaseRelation` into a `LogicalRelation` (so it can be used in a logical query plan).

`apply` creates a `LogicalRelation` for the given `BaseRelation` (with an optional `CatalogTable` and the `isStreaming` flag).

```scala
import org.apache.spark.sql.sources.BaseRelation
val baseRelation: BaseRelation = ???
val data = spark.baseRelationToDataFrame(baseRelation)
```
`apply` is used when:

- `SparkSession` is requested for a DataFrame for a `BaseRelation`
- `CreateTempViewUsing` command is executed
- `FallBackFileSourceV2` logical resolution rule is executed
- `ResolveSQLOnFile` and `FindDataSourceTable` logical evaluation rules are executed
- `HiveMetastoreCatalog` is requested to convert a `HiveTableRelation`
- `FileStreamSource` (Spark Structured Streaming) is requested to `getBatch`
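The shape of the factory can be sketched in plain Scala with stand-in types (these are simplified stand-ins, not the actual Spark classes): `apply` wraps a relation and derives the operator's output from the relation's schema, with no catalog table and batch mode by default.

```scala
// Stand-in types for illustration only (not the real Spark SQL classes)
case class BaseRelation(schema: Seq[String])
case class CatalogTable(identifier: String)

case class LogicalRelation(
    relation: BaseRelation,
    output: Seq[String],
    catalogTable: Option[CatalogTable],
    isStreaming: Boolean)

object LogicalRelation {
  // Mirrors the batch/streaming variant: no catalog table
  def apply(relation: BaseRelation, isStreaming: Boolean = false): LogicalRelation =
    LogicalRelation(relation, relation.schema, catalogTable = None, isStreaming)

  // Mirrors the catalog-table variant: always batch
  def apply(relation: BaseRelation, table: CatalogTable): LogicalRelation =
    LogicalRelation(relation, relation.schema, Some(table), isStreaming = false)
}

val rel = BaseRelation(Seq("id", "name"))
val plan = LogicalRelation(rel)
// plan.output mirrors the relation's schema; no catalog table, batch mode
```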
## refresh

```scala
refresh(): Unit
```

`refresh` is part of the `LogicalPlan` abstraction.

`refresh` requests the `FileIndex` (of the `HadoopFsRelation`) to refresh.

!!! note
    `refresh` does the work for `HadoopFsRelation` relations only.
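The note above can be sketched as a pattern match over the relation type, again with stand-in types rather than the real Spark classes: only a `HadoopFsRelation` carries a `FileIndex` to refresh, and every other relation is a no-op.

```scala
// Stand-in types for illustration only (not the real Spark SQL classes)
class FileIndex {
  var refreshed = false
  def refresh(): Unit = refreshed = true
}
sealed trait BaseRelation
case class HadoopFsRelation(location: FileIndex) extends BaseRelation
case object SomeOtherRelation extends BaseRelation

// Only file-based relations have a FileIndex to refresh
def refresh(relation: BaseRelation): Unit = relation match {
  case HadoopFsRelation(location) => location.refresh()
  case _                          => // no-op for non-file-based relations
}

val index = new FileIndex
refresh(HadoopFsRelation(index)) // index.refreshed is now true
refresh(SomeOtherRelation)       // does nothing
```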
## Simple Text Representation

```scala
simpleString(
  maxFields: Int): String
```

`simpleString` is part of the `QueryPlan` abstraction.

`simpleString` is made up of the output schema (truncated to `maxFields`) and the relation:

```text
Relation[[output]] [relation]
```
## Demo

```text
val q = spark.read.text("README.md")
val logicalPlan = q.queryExecution.logical

scala> println(logicalPlan.simpleString)
Relation[value#2] text
```
## computeStats

```scala
computeStats(): Statistics
```

`computeStats` is part of the `LeafNode` abstraction.

`computeStats` takes the optional `CatalogTable`.

If available, `computeStats` requests the `CatalogTable` for the `CatalogStatistics` that, if available, is requested to `toPlanStats` (with the `planStatsEnabled` flag enabled when either `spark.sql.cbo.enabled` or `spark.sql.cbo.planStats.enabled` is enabled).

Otherwise, `computeStats` creates a `Statistics` with the `sizeInBytes` only, taken from the `sizeInBytes` of the `BaseRelation`.
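The decision above can be sketched in plain Scala with simplified stand-in types (the real `toPlanStats` has a different signature, and the flag names here just follow the description above):

```scala
// Stand-in types for illustration only (not the real Spark SQL classes)
case class Statistics(sizeInBytes: BigInt, rowCount: Option[BigInt] = None)
case class CatalogStatistics(sizeInBytes: BigInt, rowCount: Option[BigInt]) {
  // Simplified: row-level stats are exposed only when plan stats are enabled
  def toPlanStats(planStatsEnabled: Boolean): Statistics =
    if (planStatsEnabled) Statistics(sizeInBytes, rowCount)
    else Statistics(sizeInBytes)
}
case class CatalogTable(stats: Option[CatalogStatistics])

def computeStats(
    catalogTable: Option[CatalogTable],
    relationSizeInBytes: BigInt,
    cboEnabled: Boolean,
    planStatsEnabled: Boolean): Statistics =
  catalogTable
    .flatMap(_.stats)                                   // catalog stats, if any
    .map(_.toPlanStats(cboEnabled || planStatsEnabled)) // CBO-aware plan stats
    .getOrElse(Statistics(sizeInBytes = relationSizeInBytes)) // fall back to the relation's size
```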
## Demo

The following are two logically-equivalent batch queries described using different Spark APIs: Scala and SQL.

```text
val format = "csv"
val path = "../datasets/people.csv"

val q = spark
  .read
  .option("header", true)
  .format(format)
  .load(path)

scala> println(q.queryExecution.logical.numberedTreeString)
00 Relation[id#16,name#17] csv
```

```text
val q = sql(s"select * from `$format`.`$path`")

scala> println(q.queryExecution.optimizedPlan.numberedTreeString)
00 Relation[_c0#74,_c1#75] csv
```