DecimalAggregates Logical Optimization¶
DecimalAggregates
is a base logical optimization that <Sum
and Average
aggregate functions on fixed-precision DecimalType
values to use UnscaledValue
(unscaled Long) values in spark-sql-Expression-WindowExpression.md[WindowExpression] and AggregateExpression expressions.
DecimalAggregates
is part of the Decimal Optimizations fixed-point batch in the standard batches of the Logical Optimizer.
DecimalAggregates
is simply a <Rule[LogicalPlan]
.
[TIP]¶
Import DecimalAggregates
and apply the rule directly on your structured queries to learn how the rule works.
[source, scala]¶
import org.apache.spark.sql.catalyst.optimizer.DecimalAggregates val da = DecimalAggregates(spark.sessionState.conf)
// Build analyzed logical plan // with sum aggregate function and Decimal field import org.apache.spark.sql.types.DecimalType val query = spark.range(5).select(sum($"id" cast DecimalType(1,0)) as "sum") scala> val plan = query.queryExecution.analyzed plan: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan = Aggregate [sum(cast(id#91L as decimal(1,0))) AS sum#95] +- Range (0, 5, step=1, splits=Some(8))
// Apply DecimalAggregates rule // Note MakeDecimal and UnscaledValue operators scala> da.apply(plan) res27: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan = Aggregate [MakeDecimal(sum(UnscaledValue(cast(id#91L as decimal(1,0)))),11,0) AS sum#95] +- Range (0, 5, step=1, splits=Some(8))
====
=== [[example-sum-decimal]] Example: sum Aggregate Function on Decimal with Precision Smaller Than 9
[source, scala]¶
// sum aggregate with Decimal field with precision <= 8 val q = "SELECT sum(cast(id AS DECIMAL(5,0))) FROM range(1)"
scala> sql(q).explain(true) == Parsed Logical Plan == 'Project [unresolvedalias('sum(cast('id as decimal(5,0))), None)] +- 'UnresolvedTableValuedFunction range, [1]
== Analyzed Logical Plan == sum(CAST(id AS DECIMAL(5,0))): decimal(15,0) Aggregate [sum(cast(id#104L as decimal(5,0))) AS sum(CAST(id AS DECIMAL(5,0)))#106] +- Range (0, 1, step=1, splits=None)
== Optimized Logical Plan == Aggregate [MakeDecimal(sum(UnscaledValue(cast(id#104L as decimal(5,0)))),15,0) AS sum(CAST(id AS DECIMAL(5,0)))#106] +- Range (0, 1, step=1, splits=None)
== Physical Plan == *HashAggregate(keys=[], functions=[sum(UnscaledValue(cast(id#104L as decimal(5,0))))], output=[sum(CAST(id AS DECIMAL(5,0)))#106]) +- Exchange SinglePartition +- *HashAggregate(keys=[], functions=[partial_sum(UnscaledValue(cast(id#104L as decimal(5,0))))], output=[sum#108L]) +- *Range (0, 1, step=1, splits=None)
=== [[example-avg-decimal]] Example: avg Aggregate Function on Decimal with Precision Smaller Than 12
[source, scala]¶
// avg aggregate with Decimal field with precision <= 11 val q = "SELECT avg(cast(id AS DECIMAL(10,0))) FROM range(1)"
scala> val q = "SELECT avg(cast(id AS DECIMAL(10,0))) FROM range(1)" q: String = SELECT avg(cast(id AS DECIMAL(10,0))) FROM range(1)
scala> sql(q).explain(true) == Parsed Logical Plan == 'Project [unresolvedalias('avg(cast('id as decimal(10,0))), None)] +- 'UnresolvedTableValuedFunction range, [1]
== Analyzed Logical Plan == avg(CAST(id AS DECIMAL(10,0))): decimal(14,4) Aggregate [avg(cast(id#115L as decimal(10,0))) AS avg(CAST(id AS DECIMAL(10,0)))#117] +- Range (0, 1, step=1, splits=None)
== Optimized Logical Plan == Aggregate [cast((avg(UnscaledValue(cast(id#115L as decimal(10,0)))) / 1.0) as decimal(14,4)) AS avg(CAST(id AS DECIMAL(10,0)))#117] +- Range (0, 1, step=1, splits=None)
== Physical Plan == *HashAggregate(keys=[], functions=[avg(UnscaledValue(cast(id#115L as decimal(10,0))))], output=[avg(CAST(id AS DECIMAL(10,0)))#117]) +- Exchange SinglePartition +- *HashAggregate(keys=[], functions=[partial_avg(UnscaledValue(cast(id#115L as decimal(10,0))))], output=[sum#120, count#121L]) +- *Range (0, 1, step=1, splits=None)
Executing Rule¶
apply(plan: LogicalPlan): LogicalPlan
apply
...FIXME
apply
is part of the Rule abstraction.