LogicalRelation Leaf Logical Operator

LogicalRelation is a leaf logical operator that represents a BaseRelation in a logical query plan.

LogicalRelation is a MultiInstanceRelation.

Creating Instance

LogicalRelation takes the following to be created: a BaseRelation, the output schema (AttributeReferences), an optional CatalogTable, and the isStreaming flag.

LogicalRelation is created using the apply utility.

apply Utility

apply(
  relation: BaseRelation,
  isStreaming: Boolean = false): LogicalRelation
apply(
  relation: BaseRelation,
  table: CatalogTable): LogicalRelation

apply wraps the given BaseRelation into a LogicalRelation (so it can be used in a logical query plan), with either the isStreaming flag or a CatalogTable.
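For example, SparkSession.baseRelationToDataFrame uses apply under the covers to turn a BaseRelation into a DataFrame: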

import org.apache.spark.sql.sources.BaseRelation

// A concrete BaseRelation is left out for brevity
val baseRelation: BaseRelation = ???

// baseRelationToDataFrame wraps the BaseRelation into a LogicalRelation
// before turning it into a DataFrame
val data = spark.baseRelationToDataFrame(baseRelation)

apply is used when SparkSession is requested to baseRelationToDataFrame (as in the example above), among other places.

refresh

refresh(): Unit

refresh is part of the LogicalPlan abstraction.

refresh requests the FileIndex (of the HadoopFsRelation) to refresh.

Note

refresh does its work for HadoopFsRelation relations only and is a no-op for all other relations.
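A minimal sketch of that logic, modeled on the behavior described above (the exact source may differ across Spark versions):

// Sketch of LogicalRelation.refresh
override def refresh(): Unit = relation match {
  // only file-based relations track a FileIndex that can be refreshed
  case fs: HadoopFsRelation => fs.location.refresh()
  case _ => // no-op for all other BaseRelations
}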

Simple Text Representation

simpleString(
  maxFields: Int): String

simpleString is part of the QueryPlan abstraction.

simpleString is made up of the output schema (truncated to maxFields) and the relation:

Relation[[output]] [relation]
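A sketch of how the text is put together, assuming the truncatedString utility (of Spark's org.apache.spark.sql.catalyst.util); the exact source may differ across versions:

// Sketch of LogicalRelation.simpleString
override def simpleString(maxFields: Int): String =
  s"Relation[${truncatedString(output, ",", maxFields)}] $relation"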

Demo

val q = spark.read.text("README.md")
val logicalPlan = q.queryExecution.logical

scala> println(logicalPlan.simpleString)
Relation[value#2] text

computeStats

computeStats(): Statistics

computeStats is part of the LeafNode abstraction.

computeStats uses the optional CatalogTable this LogicalRelation was created with.

If the CatalogTable is available, computeStats requests it for the CatalogStatistics that, if defined, are converted to plan statistics with toPlanStats (with the planStatsEnabled flag on when either spark.sql.cbo.enabled or spark.sql.cbo.planStats.enabled configuration property is enabled).

Otherwise, computeStats creates a Statistics with sizeInBytes only, which is the sizeInBytes of the BaseRelation.
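A minimal sketch of that logic, modeled on the description above (names such as conf.cboEnabled and conf.planStatsEnabled are assumed from SQLConf and may differ across versions):

// Sketch of LogicalRelation.computeStats
override def computeStats(): Statistics =
  catalogTable
    .flatMap(_.stats.map(_.toPlanStats(
      output,
      // plan statistics kick in when CBO or planStats is enabled
      conf.cboEnabled || conf.planStatsEnabled)))
    .getOrElse(Statistics(sizeInBytes = relation.sizeInBytes))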

Demo

The following are two batch queries over the same CSV dataset described using different Spark APIs: Scala and SQL. Note that the column names differ since the SQL variant cannot pass the header option, so the CSV data source falls back to the positional _c0-style names.

val format = "csv"
val path = "../datasets/people.csv"
val q = spark
  .read
  .option("header", true)
  .format(format)
  .load(path)
scala> println(q.queryExecution.logical.numberedTreeString)
00 Relation[id#16,name#17] csv

val q = sql(s"select * from `$format`.`$path`")
scala> println(q.queryExecution.optimizedPlan.numberedTreeString)
00 Relation[_c0#74,_c1#75] csv