Generate Unary Logical Operator

Generate is a unary logical operator that represents the following high-level operators in logical query plans (among other use cases):

Creating Instance

Generate takes the following to be created:

Generate is created when:

  • GeneratorBuilder is requested to build (a Generate logical operator)
  • TableFunctionRegistry is requested to generator
  • RewriteExceptAll logical optimization is executed (on Except logical operator with isAll enabled)
  • RewriteIntersectAll logical optimization is executed (on Intersect logical operator with isAll enabled)
  • AstBuilder is requested to withGenerate
  • Dataset.explode (deprecated) is used
  • UserDefinedPythonTableFunction (PySpark) is requested to builder
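
As a rough illustration of two of the entry points above (the explode standard function in a select and a LATERAL VIEW clause in SQL), the following sketch collects Generate operators from analyzed logical plans. The local session and the names t, id, tags, tag and x are assumptions made up for this example; in spark-shell the provided spark session can be used instead.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.Generate
import org.apache.spark.sql.functions.explode

// Assumption: a throwaway local session (spark-shell already provides `spark`)
val spark = SparkSession.builder().master("local[1]").appName("generate-demo").getOrCreate()
import spark.implicits._

// Hypothetical sample data: an id and an array column to explode
val df = Seq((1, Seq("a", "b")), (2, Seq("c"))).toDF("id", "tags")
df.createOrReplaceTempView("t")

// Dataset API: a generator expression in a select
val viaApi = df.select($"id", explode($"tags").as("tag"))

// SQL: LATERAL VIEW with a table alias (x) and a column alias (tag)
val viaSql = spark.sql("SELECT id, tag FROM t LATERAL VIEW explode(tags) x AS tag")

// Collect the Generate operators from the analyzed logical plans
val generatesApi = viaApi.queryExecution.analyzed.collect { case g: Generate => g }
val generatesSql = viaSql.queryExecution.analyzed.collect { case g: Generate => g }

println(viaApi.queryExecution.analyzed.numberedTreeString)
println(viaSql.queryExecution.analyzed.numberedTreeString)
```

Both plans should include a Generate node below the top-level Project; the exact tree rendering differs between Spark versions.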

Catalyst DSL

generate(
  generator: Generator,
  unrequiredChildIndex: Seq[Int] = Nil,
  outer: Boolean = false,
  alias: Option[String] = None,
  outputNames: Seq[String] = Nil): LogicalPlan

Catalyst DSL defines generate operator to create a Generate logical operator.

import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.types._
import org.apache.spark.sql.catalyst.dsl.expressions._  // <-- gives 'key.int and the String-to-Literal conversion
val lr = LocalRelation('key.int, 'values.array(StringType))

// JsonTuple generator
import org.apache.spark.sql.catalyst.expressions.JsonTuple
import org.apache.spark.sql.catalyst.expressions.Expression
val children: Seq[Expression] = Seq("e")
val json_tuple = JsonTuple(children)

import org.apache.spark.sql.catalyst.dsl.plans._  // <-- gives generate
val plan = lr.generate(
  generator = json_tuple,
  unrequiredChildIndex = Nil,  // replaces the join flag of older Spark versions (the output below predates the change)
  outer = true,
  alias = Some("alias"),
  outputNames = Seq.empty)
scala> println(plan.numberedTreeString)
00 'Generate json_tuple(e), true, true, alias
01 +- LocalRelation <empty>, [key#0, values#1]

Node Patterns

TreeNode
nodePatterns: Seq[TreePattern]

nodePatterns is part of the TreeNode abstraction.

nodePatterns is a single-element collection with the GENERATE tree pattern.
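
Since nodePatterns itself is protected, one way to observe the pattern from the outside is the containsPattern method of a plan. A minimal sketch, assuming a Spark version with tree patterns (3.2 or later) and using a made-up relation with id and tags columns:

```scala
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._
import org.apache.spark.sql.catalyst.expressions.Explode
import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
import org.apache.spark.sql.catalyst.trees.TreePattern.GENERATE
import org.apache.spark.sql.types.StringType

// Hypothetical relation for illustration only
val relation = LocalRelation('id.int, 'tags.array(StringType))

// Wrap the relation in an (unresolved) Generate using the Catalyst DSL
val withGenerate = relation.generate(Explode('tags))

// The pattern bits of a plan include the patterns of all its nodes
val hasGenerate = withGenerate.containsPattern(GENERATE)
val relationHasGenerate = relation.containsPattern(GENERATE)

println(s"with Generate: $hasGenerate, relation only: $relationHasGenerate")
```

Rules that only care about Generate operators can use this kind of check to skip plans without one.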

Execution Planning

Generate logical operator is resolved to GenerateExec unary physical operator in BasicOperators execution planning strategy.
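
The planning step can be observed by looking for GenerateExec in the physical plan of a query that uses a generator. A minimal sketch, assuming a local session and made-up column names (id, tags, tag):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.GenerateExec
import org.apache.spark.sql.functions.explode

// Assumption: a throwaway local session (spark-shell already provides `spark`)
val spark = SparkSession.builder().master("local[1]").appName("generate-exec-demo").getOrCreate()
import spark.implicits._

// Hypothetical query with an explode generator
val q = Seq((1, Seq("a", "b")))
  .toDF("id", "tags")
  .select($"id", explode($"tags").as("tag"))

// sparkPlan is the physical plan before preparation rules (e.g. whole-stage codegen) are applied
val generateExecs = q.queryExecution.sparkPlan.collect { case g: GenerateExec => g }

println(q.queryExecution.sparkPlan.numberedTreeString)
```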