GroupingSets Unary Logical Operator¶
GroupingSets
is a spark-sql-LogicalPlan.md#UnaryNode[unary logical operator] that represents SQL's sql/AstBuilder.md#withAggregation[GROUPING SETS] variant of GROUP BY
clause.
val q = sql("""
SELECT customer, year, SUM(sales)
FROM VALUES ("abc", 2017, 30) AS t1 (customer, year, sales)
GROUP BY customer, year
GROUPING SETS ((customer), (year))
""")
scala> println(q.queryExecution.logical.numberedTreeString)
00 'GroupingSets [ArrayBuffer('customer), ArrayBuffer('year)], ['customer, 'year], ['customer, 'year, unresolvedalias('SUM('sales), None)]
01 +- 'SubqueryAlias t1
02 +- 'UnresolvedInlineTable [customer, year, sales], [List(abc, 2017, 30)]
GroupingSets
operator is resolved to an Aggregate
logical operator at <
scala> println(q.queryExecution.analyzed.numberedTreeString)
00 Aggregate [customer#8, year#9, spark_grouping_id#5], [customer#8, year#9, sum(cast(sales#2 as bigint)) AS sum(sales)#4L]
01 +- Expand [List(customer#0, year#1, sales#2, customer#6, null, 1), List(customer#0, year#1, sales#2, null, year#7, 2)], [customer#0, year#1, sales#2, customer#8, year#9, spark_grouping_id#5]
02 +- Project [customer#0, year#1, sales#2, customer#0 AS customer#6, year#1 AS year#7]
03 +- SubqueryAlias t1
04 +- LocalRelation [customer#0, year#1, sales#2]
NOTE: GroupingSets
can only be created using SQL.
NOTE: GroupingSets
is not supported on Structured Streaming's spark-sql-LogicalPlan.md#isStreaming[streaming Datasets].
[[resolved]] GroupingSets
is never resolved (as it can only be converted to an Aggregate
logical operator).
[[output]] The catalyst/QueryPlan.md#output[output schema] of a GroupingSets
are exactly the attributes of <
=== [[analyzer]] Analysis Phase
GroupingSets
operator is resolved at analysis phase in the following logical evaluation rules:
-
ResolveAliases for unresolved aliases in <
>
GroupingSets
operator is resolved to an Aggregate.md[Aggregate] with Expand.md[Expand] logical operators.
[source, scala]¶
val spark: SparkSession = ... // using q from the example above val plan = q.queryExecution.logical
scala> println(plan.numberedTreeString) 00 'GroupingSets [ArrayBuffer('customer), ArrayBuffer('year)], ['customer, 'year], ['customer, 'year, unresolvedalias('SUM('sales), None)] 01 +- 'SubqueryAlias t1 02 +- 'UnresolvedInlineTable [customer, year, sales], [List(abc, 2017, 30)]
// Note unresolvedalias for SUM expression // Note UnresolvedInlineTable and SubqueryAlias
// FIXME Show the evaluation rules to get rid of the unresolvable parts¶
=== [[creating-instance]] Creating GroupingSets Instance
GroupingSets
takes the following when created:
- [[selectedGroupByExprs]] expressions/Expression.md[Expressions] from
GROUPING SETS
clause - [[groupByExprs]] Grouping expressions/Expression.md[expressions] from
GROUP BY
clause - [[child]] Child spark-sql-LogicalPlan.md[logical plan]
- [[aggregations]] Aggregate expressions/NamedExpression.md[named expressions]