
CacheTableAsSelectExec Physical Operator

CacheTableAsSelectExec is a BaseCacheTableExec physical operator that represents the CACHE TABLE SQL command with an AS query (a CacheTableAsSelect logical operator) at execution time.

CACHE [LAZY] TABLE identifierReference
  [OPTIONS key=value (, key=value)*]
  [[AS] query]

When executed, CacheTableAsSelectExec uses a CreateViewCommand logical operator followed by the SparkSession.table operator to create the LogicalPlan to cache.

In other words, CacheTableAsSelectExec is a shortcut for executing a CREATE TEMPORARY VIEW SQL command (or the corresponding Dataset operator, e.g. Dataset.createTempView) followed by CACHE TABLE (which boils down to requesting the session-wide CacheManager to cache the LogicalPlan).
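The equivalence can be sketched with the public Spark API. This is a minimal example, assuming a recent Spark 3.x release with spark-sql on the classpath; the object name CacheTableAsSelectDemo and the view names nums1 and nums2 are made up for illustration. The two paths are only roughly equivalent: a non-lazy CACHE TABLE materializes the cache right away, while Catalog.cacheTable materializes it on the first action.

```scala
import org.apache.spark.sql.SparkSession

object CacheTableAsSelectDemo {
  // Returns (is nums1 cached, is nums2 cached, row count of nums1)
  def run(): (Boolean, Boolean, Long) = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("cache-table-as-select")
      .getOrCreate()
    try {
      // One statement: registers the temporary view nums1 and caches it
      spark.sql("CACHE TABLE nums1 AS SELECT id FROM range(5)")

      // Roughly the same in two steps: create a temporary view, then cache it
      spark.range(5).createOrReplaceTempView("nums2")
      spark.catalog.cacheTable("nums2")

      (
        spark.catalog.isCached("nums1"),
        spark.catalog.isCached("nums2"),
        spark.table("nums1").count()
      )
    } finally {
      spark.stop()
    }
  }
}
```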

Creating Instance

CacheTableAsSelectExec takes the following to be created:

  • The name of the temporary view
  • The LogicalPlan of the query
  • Original SQL Text
  • isLazy flag
  • Options (Map[String, String])
  • Referred temporary functions (Seq[String])
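To make the list concrete, here is a hypothetical, simplified Scala model of these arguments. It is not Spark's actual class: CacheTableAsSelectArgs is an invented name, and the LogicalPlan is replaced by a plain String so the sketch runs without Spark. The example values illustrate what a CACHE LAZY TABLE statement with an OPTIONS clause and an AS query could map to under that assumption.

```scala
// Hypothetical stand-in for CacheTableAsSelectExec's constructor arguments
final case class CacheTableAsSelectArgs(
  tempViewName: String,               // name of the temporary view
  query: String,                      // stands in for the LogicalPlan of the query
  originalText: String,               // original SQL text of the query
  isLazy: Boolean,                    // true for CACHE LAZY TABLE
  options: Map[String, String],       // OPTIONS key=value pairs
  referredTempFunctions: Seq[String]  // temporary functions the query refers to
)

object CacheTableAsSelectArgsDemo {
  // Roughly what this statement would carry:
  //   CACHE LAZY TABLE nums OPTIONS ('storageLevel' 'DISK_ONLY') AS SELECT id FROM range(5)
  val example: CacheTableAsSelectArgs = CacheTableAsSelectArgs(
    tempViewName = "nums",
    query = "Project [id] <- Range(0, 5)",
    originalText = "SELECT id FROM range(5)",
    isLazy = true,
    options = Map("storageLevel" -> "DISK_ONLY"),
    referredTempFunctions = Seq.empty
  )
}
```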

CacheTableAsSelectExec is created when:

  • DataSourceV2Strategy execution planning strategy is executed (to plan a CacheTableAsSelect logical operator)
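The planning step amounts to a pattern match inside a strategy. The sketch below is a toy model with hypothetical names (LogicalOp, CacheTableAsSelectOp, PhysicalOp, CacheTableAsSelectExecOp, ToyDataSourceV2Strategy), not Spark's DataSourceV2Strategy code. It only shows the shape: a strategy maps a matching logical operator to a physical one and returns Nil for anything it does not handle, leaving that to other strategies.

```scala
// Hypothetical, simplified logical operators
sealed trait LogicalOp
final case class CacheTableAsSelectOp(tempViewName: String, query: String, isLazy: Boolean) extends LogicalOp
final case class OtherOp(name: String) extends LogicalOp

// Hypothetical, simplified physical operators
sealed trait PhysicalOp
final case class CacheTableAsSelectExecOp(tempViewName: String, query: String, isLazy: Boolean) extends PhysicalOp

object ToyDataSourceV2Strategy {
  // Like a Catalyst strategy: return candidate physical plans, or Nil if not applicable
  def apply(plan: LogicalOp): Seq[PhysicalOp] = plan match {
    case CacheTableAsSelectOp(name, query, isLazy) =>
      Seq(CacheTableAsSelectExecOp(name, query, isLazy))
    case _ =>
      Nil
  }
}
```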

Relation Name

BaseCacheTableExec
relationName: String

relationName is part of the BaseCacheTableExec abstraction.

relationName is the name of the temporary view.

LogicalPlan to Cache

BaseCacheTableExec
planToCache: LogicalPlan

planToCache is part of the BaseCacheTableExec abstraction.

Lazy Value

planToCache is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.

Learn more in the Scala Language Specification.
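The once-only initialization can be observed with a plain Scala class (no Spark needed). LazyHolder is a hypothetical name, and the counter exists only to show how many times the initializer runs.

```scala
final class LazyHolder {
  private var evaluations = 0

  // The block runs on first access only; later accesses return the cached value
  lazy val value: String = {
    evaluations += 1
    s"computed (evaluation #$evaluations)"
  }

  def evaluationCount: Int = evaluations
}
```

Accessing value twice yields the same string, and the counter stays at 1.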

planToCache creates a CreateViewCommand logical operator and executes it immediately, which registers the temporary view.

CreateViewCommand is created with the following:

  • Table name: the name of the temporary view
  • Original text: the original SQL text
  • Logical query plan: the LogicalPlan of the query
  • View type: LocalTempView

In the end, planToCache requests dataFrameForCachedPlan for its logical plan.
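The flow of planToCache and dataFrameForCachedPlan can be modelled without Spark. This is a minimal sketch under stated assumptions: ToySession, ToyCreateViewCommand and ToyCacheTableAsSelectExec are hypothetical stand-ins, a String plays the role of a LogicalPlan, and a mutable map plays the role of the session's temporary views. Because planToCache is a lazy value, accessing it again does not re-run the view-creating command (which in this toy would fail on the duplicate name).

```scala
import scala.collection.mutable

// Hypothetical toy session: temporary view name -> query "plan" (a String)
final class ToySession {
  val tempViews: mutable.Map[String, String] = mutable.Map.empty

  // Stands in for SparkSession.table: looks up a temporary view by name
  def table(name: String): String =
    tempViews.getOrElse(name, throw new IllegalArgumentException(s"Table or view not found: $name"))
}

// Hypothetical stand-in for CreateViewCommand with a local temporary view type
final case class ToyCreateViewCommand(name: String, originalText: String, plan: String) {
  def run(session: ToySession): Unit = {
    // This toy refuses to overwrite an existing temporary view
    require(!session.tempViews.contains(name), s"Temporary view '$name' already exists")
    session.tempViews(name) = plan
  }
}

final class ToyCacheTableAsSelectExec(
    session: ToySession,
    tempViewName: String,
    query: String,
    originalText: String) {

  // Mirrors dataFrameForCachedPlan: read the temporary view back by name
  lazy val dataFrameForCachedPlan: String = session.table(tempViewName)

  // Mirrors planToCache: register the view first, then use the plan read back from it
  lazy val planToCache: String = {
    ToyCreateViewCommand(tempViewName, originalText, query).run(session)
    dataFrameForCachedPlan
  }
}
```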

dataFrameForCachedPlan

BaseCacheTableExec
dataFrameForCachedPlan: DataFrame

dataFrameForCachedPlan is part of the BaseCacheTableExec abstraction.

Lazy Value

dataFrameForCachedPlan is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.

Learn more in the Scala Language Specification.

dataFrameForCachedPlan uses the SparkSession.table operator (with the name of the temporary view) to create a DataFrame that represents loading data from the temporary view.