# LogicalRelation Leaf Logical Operator
LogicalRelation is a leaf logical operator that represents a BaseRelation in a logical query plan.
LogicalRelation is a MultiInstanceRelation.
## Creating Instance

LogicalRelation takes the following to be created:

- BaseRelation
- Output Schema (AttributeReferences)
- Optional CatalogTable
- isStreaming flag
LogicalRelation is created using the apply factory.
## apply Utility

```scala
apply(
  relation: BaseRelation,
  isStreaming: Boolean = false): LogicalRelation

apply(
  relation: BaseRelation,
  table: CatalogTable): LogicalRelation
```
apply wraps the given BaseRelation into a LogicalRelation (so it can be used in a logical query plan), optionally with a CatalogTable and the isStreaming flag.
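To make the factory's role concrete, the following is a toy sketch of what apply does, not Spark's actual implementation: the types (BaseRelation, AttributeReference, CatalogTable, LogicalRelation) are simplified stand-ins, assuming the output attributes are derived from the relation's schema and isStreaming defaults to false.

```scala
// Toy model (not Spark's code) of the two apply factories.
object ApplySketch {
  case class AttributeReference(name: String)
  trait BaseRelation { def schema: Seq[String] }   // simplified schema
  case class CatalogTable(identifier: String)

  case class LogicalRelation(
      relation: BaseRelation,
      output: Seq[AttributeReference],
      catalogTable: Option[CatalogTable],
      isStreaming: Boolean)

  // apply(relation, isStreaming): no CatalogTable
  def apply(relation: BaseRelation, isStreaming: Boolean = false): LogicalRelation =
    LogicalRelation(relation, relation.schema.map(AttributeReference(_)), None, isStreaming)

  // apply(relation, table): batch relation with a CatalogTable
  def apply(relation: BaseRelation, table: CatalogTable): LogicalRelation =
    LogicalRelation(relation, relation.schema.map(AttributeReference(_)), Some(table), isStreaming = false)
}
```

Either overload produces a leaf operator that carries the relation and its output attributes, which is all a logical query plan needs to reference the data source.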
```scala
import org.apache.spark.sql.sources.BaseRelation
val baseRelation: BaseRelation = ???
val data = spark.baseRelationToDataFrame(baseRelation)
```
apply is used when:

- SparkSession is requested for a DataFrame for a BaseRelation
- CreateTempViewUsing command is executed
- FallBackFileSourceV2 logical resolution rule is executed
- ResolveSQLOnFile and FindDataSourceTable logical evaluation rules are executed
- HiveMetastoreCatalog is requested to convert a HiveTableRelation
- FileStreamSource (Spark Structured Streaming) is requested to getBatch
## refresh

```scala
refresh(): Unit
```

refresh is part of the LogicalPlan abstraction.

refresh requests the FileIndex (of the HadoopFsRelation) to refresh.

Note

refresh works for HadoopFsRelation relations only (and is a no-op otherwise).
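The dispatch above can be sketched with simplified stand-in types (this is a toy model, not Spark's code; BaseRelation, FileIndex and HadoopFsRelation below are minimal mocks):

```scala
// Toy model (not Spark's code): only HadoopFsRelation carries a FileIndex
// worth refreshing; every other relation is a no-op.
object RefreshSketch {
  trait BaseRelation
  trait FileIndex { def refresh(): Unit }
  case class HadoopFsRelation(location: FileIndex) extends BaseRelation
  case object OtherRelation extends BaseRelation  // hypothetical non-file relation

  def refresh(relation: BaseRelation): Unit = relation match {
    case HadoopFsRelation(index) => index.refresh()  // delegate to the FileIndex
    case _                       => ()               // no-op for anything else
  }
}
```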
## Simple Text Representation

```scala
simpleString(
  maxFields: Int): String
```

simpleString is part of the QueryPlan abstraction.

simpleString is made up of the output schema (truncated to maxFields) and the relation:

```text
Relation[[output]] [relation]
```
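A toy sketch of how such a string could be assembled (not Spark's actual implementation; the truncation helper only approximates Spark's truncatedString utility):

```scala
// Toy model (not Spark's code) of "Relation[[output]] [relation]".
object SimpleStringSketch {
  // Keep at most maxFields entries; summarize the rest.
  def truncatedString(fields: Seq[String], maxFields: Int): String =
    if (fields.length <= maxFields) fields.mkString(",")
    else fields.take(maxFields - 1).mkString(",") +
      s",... ${fields.length - maxFields + 1} more fields"

  def simpleString(output: Seq[String], relation: String, maxFields: Int): String =
    s"Relation[${truncatedString(output, maxFields)}] $relation"
}
```

For example, a single-column text relation would render as `Relation[value#2] text`.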
## Demo

```scala
val q = spark.read.text("README.md")
val logicalPlan = q.queryExecution.logical

scala> println(logicalPlan.simpleString)
Relation[value#2] text
```
## computeStats

```scala
computeStats(): Statistics
```

computeStats is part of the LeafNode abstraction.
computeStats takes the optional CatalogTable.

If available, computeStats requests the CatalogTable for the CatalogStatistics that, if available, is requested to toPlanStats (with the planStatsEnabled flag on when either spark.sql.cbo.enabled or spark.sql.cbo.planStats.enabled configuration property is enabled).

Otherwise, computeStats creates a Statistics with only the sizeInBytes, which is the sizeInBytes of the BaseRelation.
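The decision above can be sketched as follows; this is a toy model under simplified types (Stats and CatalogStats are stand-ins for Spark's Statistics and CatalogStatistics), not Spark's actual implementation:

```scala
// Toy model (not Spark's code) of the computeStats fallback chain:
// catalog statistics (when plan stats are allowed) win over the
// relation's raw sizeInBytes.
object ComputeStatsSketch {
  case class Stats(sizeInBytes: BigInt, rowCount: Option[BigInt] = None)

  case class CatalogStats(sizeInBytes: BigInt, rowCount: Option[BigInt]) {
    // Only expose the row count when plan statistics are enabled.
    def toPlanStats(planStatsEnabled: Boolean): Stats =
      if (planStatsEnabled) Stats(sizeInBytes, rowCount) else Stats(sizeInBytes)
  }

  def computeStats(
      catalogStats: Option[CatalogStats],
      relationSizeInBytes: BigInt,
      cboEnabled: Boolean,
      planStatsEnabled: Boolean): Stats =
    catalogStats match {
      case Some(s) => s.toPlanStats(cboEnabled || planStatsEnabled)
      case None    => Stats(sizeInBytes = relationSizeInBytes)  // fall back to the relation
    }
}
```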
## Demo

The following are two logically-equivalent batch queries expressed using different Spark APIs: Scala and SQL.

```scala
val format = "csv"
val path = "../datasets/people.csv"

val q = spark
  .read
  .option("header", true)
  .format(format)
  .load(path)

scala> println(q.queryExecution.logical.numberedTreeString)
00 Relation[id#16,name#17] csv
```

```scala
val q = sql(s"select * from `$format`.`$path`")

scala> println(q.queryExecution.optimizedPlan.numberedTreeString)
00 Relation[_c0#74,_c1#75] csv
```