Generate Unary Logical Operator
Generate is a unary logical operator that represents the following high-level operators in logical query plans (among other use cases):
- LATERAL VIEW in SELECT or FROM clauses in SQL
- Dataset.explode (deprecated)
- Generator or GeneratorOuter expressions (by the ExtractGenerator logical evaluation rule)
Creating Instance
Generate takes the following to be created:
- Generator
- Unrequired Child Index (Seq[Int])
- outer flag
- Qualifier
- Generator Output Attributes
- Child Logical Operator
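To make the Unrequired Child Index and outer parameters more concrete, here is a minimal plain-Scala sketch (no Spark dependency) of the rows a Generate operator produces. It assumes rows are modelled as Vector[Any] and a generator as a function from one input row to zero or more rows; GenerateModel, Row and generatorWidth are hypothetical names, not part of Spark's API. The sketch assumes the output is the required child columns followed by the generator's columns, and that outer keeps an input row (with nulls for the generator columns) when the generator produces nothing, which is how explode_outer behaves.

```scala
// A plain-Scala model (no Spark dependency) of the rows a Generate operator produces.
// GenerateModel, Row and generatorWidth are hypothetical names used only for illustration.
object GenerateModel {
  // A row is modelled as a Vector of column values; null stands in for SQL NULL
  type Row = Vector[Any]

  def generate(
      child: Seq[Row],
      generator: Row => Seq[Row], // stand-in for a Catalyst Generator expression
      unrequiredChildIndex: Seq[Int],
      outer: Boolean,
      generatorWidth: Int): Seq[Row] =
    child.flatMap { row =>
      // Keep only the child columns whose positions are not listed as unrequired
      val required: Row = row.zipWithIndex.collect {
        case (value, i) if !unrequiredChildIndex.contains(i) => value
      }
      val produced = generator(row)
      if (produced.isEmpty && outer) {
        // outer = true: keep the input row and fill the generator columns with nulls
        Seq(required ++ Vector.fill[Any](generatorWidth)(null))
      } else {
        // Output = required child columns followed by the generator's columns
        produced.map(genRow => required ++ genRow)
      }
    }

  def main(args: Array[String]): Unit = {
    val rows: Seq[Row] = Seq(Vector(1, Seq("a", "b")), Vector(2, Seq.empty[String]))
    // explode-like generator over column 1 (an array of strings)
    val explodeValues: Row => Seq[Row] =
      row => row(1).asInstanceOf[Seq[String]].map(v => Vector[Any](v))
    // Column 1 (the array itself) is not needed after the explode
    println(generate(rows, explodeValues, unrequiredChildIndex = Seq(1), outer = false, generatorWidth = 1))
    println(generate(rows, explodeValues, unrequiredChildIndex = Seq(1), outer = true, generatorWidth = 1))
  }
}
```

With unrequiredChildIndex = Seq(1) the array column is dropped, so the first call yields the rows (1, a) and (1, b); with outer = true the row with key 2 survives as (2, null).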
Generate is created when:
- GeneratorBuilder is requested to build (a Generate logical operator)
- TableFunctionRegistry is requested to generator
- RewriteExceptAll logical optimization is executed (on an Except logical operator with isAll enabled)
- RewriteIntersectAll logical optimization is executed (on an Intersect logical operator with isAll enabled)
- AstBuilder is requested to withGenerate
- Dataset.explode (deprecated) is used
- UserDefinedPythonTableFunction (PySpark) is requested to builder
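RewriteExceptAll and RewriteIntersectAll use Generate to emit rows a computed number of times. The following is a minimal plain-Scala sketch of that idea (no Spark dependency), assuming the rewrites compute each distinct row's multiplicity with a Union plus an Aggregate, drop rows with no occurrences left, and then have Generate replicate every remaining row that many times through Catalyst's ReplicateRows generator. ReplicateRowsModel and replicateRows are hypothetical names that only mimic this behaviour; they are not Spark's implementation.

```scala
// A plain-Scala model (no Spark dependency) of the EXCEPT ALL / INTERSECT ALL rewrites.
// ReplicateRowsModel and replicateRows are hypothetical names; replicateRows only mimics
// the role of the ReplicateRows generator that Generate evaluates in the rewritten plans.
object ReplicateRowsModel {
  // Emit `row` exactly n times (what a row-replicating generator does per input row)
  def replicateRows[R](n: Long, row: R): Seq[R] = Seq.fill(n.toInt)(row)

  // EXCEPT ALL: each row appears max(leftCount - rightCount, 0) times
  def exceptAll[R](left: Seq[R], right: Seq[R]): Seq[R] = {
    // Union of both sides with a virtual column: +1 for left rows, -1 for right rows
    val tagged: Seq[(R, Long)] = left.map(r => (r, 1L)) ++ right.map(r => (r, -1L))
    // Aggregate: sum of the virtual column per distinct row
    val sums: Map[R, Long] =
      tagged.groupBy(_._1).map { case (row, group) => row -> group.map(_._2).sum }
    // Filter (sum > 0), then replicate each surviving row `sum` times
    tagged.map(_._1).distinct.flatMap { row =>
      val n = sums(row)
      if (n > 0) replicateRows(n, row) else Seq.empty[R]
    }
  }

  // INTERSECT ALL: each row appears min(leftCount, rightCount) times
  def intersectAll[R](left: Seq[R], right: Seq[R]): Seq[R] = {
    val leftCounts: Map[R, Long] =
      left.groupBy(identity).map { case (row, rs) => row -> rs.size.toLong }
    val rightCounts: Map[R, Long] =
      right.groupBy(identity).map { case (row, rs) => row -> rs.size.toLong }
    // Keep rows present on both sides, replicated min(count) times
    left.distinct.flatMap { row =>
      val n = math.min(leftCounts(row), rightCounts.getOrElse(row, 0L))
      if (n >= 1) replicateRows(n, row) else Seq.empty[R]
    }
  }

  def main(args: Array[String]): Unit = {
    println(exceptAll(Seq(1, 1, 1, 2, 3), Seq(1, 2, 2)))
    println(intersectAll(Seq(1, 1, 1, 2, 3), Seq(1, 1, 2, 2, 4)))
  }
}
```

For left = [1, 1, 1, 2, 3] and right = [1, 2, 2], exceptAll keeps 1 twice and 3 once; intersectAll against [1, 1, 2, 2, 4] yields 1, 1, 2.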
Catalyst DSL
generate(
  generator: Generator,
  unrequiredChildIndex: Seq[Int] = Nil,
  outer: Boolean = false,
  alias: Option[String] = None,
  outputNames: Seq[String] = Nil): LogicalPlan
The Catalyst DSL defines the generate operator to create a Generate logical operator.
import org.apache.spark.sql.catalyst.dsl.expressions._ // <-- implicits for 'key.int and String-to-Literal conversion
import org.apache.spark.sql.catalyst.dsl.plans._       // <-- gives generate
import org.apache.spark.sql.catalyst.expressions.{Expression, JsonTuple}
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.types._
val lr = LocalRelation('key.int, 'values.array(StringType))
// JsonTuple generator ("e" becomes a Literal through the DSL implicits)
val children: Seq[Expression] = Seq("e")
val json_tuple = JsonTuple(children)
// All child columns are kept (unrequiredChildIndex defaults to Nil)
val plan = lr.generate(
  generator = json_tuple,
  outer = true,
  alias = Some("alias"),
  outputNames = Seq.empty)
scala> println(plan.numberedTreeString)
00 'Generate json_tuple(e), true, alias
01 +- LocalRelation <empty>, [key#0, values#1]
Node Patterns
nodePatterns is a single GENERATE tree pattern.
Execution Planning
The Generate logical operator is planned as a GenerateExec unary physical operator by the BasicOperators execution planning strategy.