Skip to content

Generator Expressions

Generator is a <> for Catalyst expressions that can <> zero or more rows given a single input row.

NOTE: Generator corresponds to SQL's sql/[LATERAL VIEW].

[[dataType]] dataType in Generator is simply an ArrayType of <>.

[[foldable]] [[nullable]] Generator is not[foldable] and not[nullable] by default.

[[supportCodegen]] Generator supports Java code generation conditionally, i.e. only when a physical operator is not marked as CodegenFallback.

[[terminate]] Generator uses terminate to inform that there are no more rows to process, clean up code, and additional rows can be made here.

[source, scala]

terminate(): TraversableOnce[InternalRow] = Nil

[[generator-implementations]] .Generators [width="100%",cols="1,2",options="header"] |=== | Name | Description

[[GeneratorOuter]] GeneratorOuter
[[HiveGenericUDTF]] HiveGenericUDTF

| [[Inline]][Inline] | Corresponds to inline and inline_outer functions.


| [[UnresolvedGenerator]][UnresolvedGenerator] a| Represents an unresolved <>.

Created when AstBuilder sql/[creates] Generate unary logical operator for LATERAL VIEW that corresponds to the following:

generatorFunctionName (arg1, arg2, ...)
AS? col1, col2, ...

UnresolvedGenerator is resolved to Generator by ResolveFunctions logical evaluation rule.

| [[UserDefinedGenerator]] UserDefinedGenerator | Used exclusively in the deprecated explode operator |===

[[lateral-view]] [NOTE] ==== You can only have one generator per select clause that is enforced by ExtractGenerator logical evaluation rule, e.g.

scala>$"xs"), explode($"ys")).show
org.apache.spark.sql.AnalysisException: Only one generator allowed per select clause but found 2: explode(xs), explode(ys);
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator$$anonfun$apply$20.applyOrElse(Analyzer.scala:1670)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator$$anonfun$apply$20.applyOrElse(Analyzer.scala:1662)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:62)

If you want to have more than one generator in a structured query you should use LATERAL VIEW which is supported in SQL only, e.g.

[source, scala]

val arrayTuple = (Array(1,2,3), Array("a","b","c")) val ncs = Seq(arrayTuple).toDF("ns", "cs")

scala> +---------+---------+ | ns| cs| +---------+---------+ |[1, 2, 3]|[a, b, c]| +---------+---------+

scala> ncs.createOrReplaceTempView("ncs")

val q = """ SELECT n, c FROM ncs LATERAL VIEW explode(ns) nsExpl AS n LATERAL VIEW explode(cs) csExpl AS c """

scala> sql(q).show +---+---+ | n| c| +---+---+ | 1| a| | 1| b| | 1| c| | 2| a| | 2| b| | 2| c| | 3| a| | 3| b| | 3| c| +---+---+


=== [[contract]] Generator Contract

[source, scala]

package org.apache.spark.sql.catalyst.expressions

trait Generator extends Expression { // only required methods that have no implementation def elementSchema: StructType def eval(input: InternalRow): TraversableOnce[InternalRow] }

.(Subset of) Generator Contract [cols="1,2",options="header",width="100%"] |=== | Method | Description

| [[elementSchema]] elementSchema | Schema of the elements to be generated

[[eval]] eval