Generate Unary Logical Operator¶
Generate is a unary logical operator that represents the following high-level operators in logical query plans (among other use cases):
- LATERAL VIEW in SELECT or FROM clauses in SQL
- Dataset.explode (deprecated)
- Generator or GeneratorOuter expressions (by ExtractGenerator logical evaluation rule)
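As a rough illustration of the LATERAL VIEW case, the sketch below parses a query with CatalystSqlParser and collects the Generate operators from the unresolved plan. The table, column, and alias names (t, arr, tbl, x) are made up for the example, and the snippet assumes a spark-shell (or any Scala REPL with spark-catalyst on the classpath).

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.catalyst.plans.logical.Generate

// Hypothetical table `t` with an array column `arr`
// (parsing alone does not need the table to exist)
val parsed = CatalystSqlParser.parsePlan(
  "SELECT x FROM t LATERAL VIEW explode(arr) tbl AS x")

// Collect every Generate operator in the parsed (still unresolved) plan
val generates = parsed.collect { case g: Generate => g }
val lateralView = generates.head

// LATERAL VIEW without OUTER leaves the outer flag disabled,
// and the view alias becomes the qualifier
println(s"outer=${lateralView.outer}, qualifier=${lateralView.qualifier}")
```

Using LATERAL VIEW OUTER in the query should enable the outer flag instead.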
Creating Instance¶
Generate takes the following to be created:
- Generator
- Unrequired Child Index (Seq[Int])
- outer flag
- Qualifier
- Generator Output Attributes
- Child Logical Operator
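A minimal sketch of how these arguments fit together, assuming the constructor parameter names of recent Spark versions (generator, unrequiredChildIndex, outer, qualifier, generatorOutput, child). The column names and the "exploded" qualifier are illustrative only.

```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Explode}
import org.apache.spark.sql.catalyst.plans.logical.{Generate, LocalRelation}
import org.apache.spark.sql.types.{ArrayType, IntegerType, StringType}

// Child operator with two columns: key (index 0) and values (index 1)
val key = AttributeReference("key", IntegerType)()
val values = AttributeReference("values", ArrayType(StringType))()
val child = LocalRelation(key, values)

val generate = Generate(
  generator = Explode(values),
  unrequiredChildIndex = Seq(1), // leave the `values` column out of the output
  outer = false,
  qualifier = Some("exploded"),
  generatorOutput = Seq(AttributeReference("value", StringType)()),
  child = child)

// Output = child columns that are still required ++ generator output attributes
val outputNames = generate.output.map(_.name)
println(outputNames.mkString(", "))
```

With an empty unrequiredChildIndex, all child columns would be kept in front of the generator output.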
Generate is created when:
- GeneratorBuilder is requested to build (a Generate logical operator)
- TableFunctionRegistry is requested to generator
- RewriteExceptAll logical optimization is executed (on Except logical operator with isAll enabled)
- RewriteIntersectAll logical optimization is executed (on Intersect logical operator with isAll enabled)
- AstBuilder is requested to withGenerate
- Dataset.explode (deprecated) is used
- UserDefinedPythonTableFunction (PySpark) is requested to builder
Catalyst DSL¶
generate(
  generator: Generator,
  unrequiredChildIndex: Seq[Int] = Nil,
  outer: Boolean = false,
  alias: Option[String] = None,
  outputNames: Seq[String] = Nil): LogicalPlan
Catalyst DSL defines generate operator to create a Generate logical operator.
import org.apache.spark.sql.catalyst.plans.logical._
import org.apache.spark.sql.types._
// dsl.expressions gives the 'key.int syntax and the String-to-Literal conversion
import org.apache.spark.sql.catalyst.dsl.expressions._
val lr = LocalRelation('key.int, 'values.array(StringType))
// JsonTuple generator
import org.apache.spark.sql.catalyst.expressions.JsonTuple
import org.apache.spark.sql.catalyst.expressions.Expression
val children: Seq[Expression] = Seq("e")
val json_tuple = JsonTuple(children)
import org.apache.spark.sql.catalyst.dsl.plans._ // <-- gives generate
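val plan = lr.generate(
  generator = json_tuple,
  unrequiredChildIndex = Nil, // takes the place of the join flag of earlier Spark versions
  outer = true,
  alias = Some("alias"),
  outputNames = Seq.empty)
// Output captured with the earlier join-based signature (hence the two flags)
scala> println(plan.numberedTreeString)
00 'Generate json_tuple(e), true, true, alias
01 +- LocalRelation <empty>, [key#0, values#1]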
Node Patterns¶
nodePatterns is a single GENERATE tree pattern.
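The following sketch shows how the pattern can be used to check cheaply whether a plan contains a Generate operator. It assumes a Spark version that has TreePattern and containsPattern (3.2 or later, as far as I know); the column names are illustrative.

```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Explode}
import org.apache.spark.sql.catalyst.plans.logical.{Generate, LocalRelation}
import org.apache.spark.sql.catalyst.trees.TreePattern.GENERATE
import org.apache.spark.sql.types.{ArrayType, StringType}

val values = AttributeReference("values", ArrayType(StringType))()
val relation = LocalRelation(values)
val plan = Generate(
  generator = Explode(values),
  unrequiredChildIndex = Nil,
  outer = false,
  qualifier = None,
  generatorOutput = Seq(AttributeReference("value", StringType)()),
  child = relation)

// Tree patterns propagate up the tree, so the check works on the root operator
val planHasGenerate = plan.containsPattern(GENERATE)
val relationHasGenerate = relation.containsPattern(GENERATE)
println(s"plan: $planHasGenerate, relation: $relationHasGenerate")
```

Rules can use such a check to skip subtrees that cannot contain a Generate operator.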
Execution Planning¶
Generate logical operator is planned as GenerateExec unary physical operator by BasicOperators execution planning strategy.
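One way to observe this, sketched under the assumption of a local SparkSession: run a query with a generator expression and look for GenerateExec in the physical plan. The application name and the query are made up for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.GenerateExec

// Reuses the active session when one exists (e.g. in spark-shell)
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("generate-demo") // hypothetical application name
  .getOrCreate()

// explode is a generator, so the logical plan contains a Generate operator
val df = spark.range(1).selectExpr("explode(array(id, id + 1)) AS n")

// sparkPlan is the physical plan before preparation rules (e.g. codegen) are applied
val generateExec = df.queryExecution.sparkPlan.collectFirst {
  case g: GenerateExec => g
}
println(generateExec.map(_.nodeName))
```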