== SimplifyCasts Logical Optimization
SimplifyCasts is a base logical optimization that eliminates redundant casts in the following cases:

. The input is already the type to cast to.
. The input is of `ArrayType` or `MapType` type and contains no `null` elements.
SimplifyCasts is part of the Operator Optimization before Inferring Filters fixed-point batch in the standard batches of the Logical Optimizer.
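The effect of this batch membership can be observed directly in spark-shell. The following is a minimal sketch (not part of the original text; it assumes the implicit `spark` SparkSession, and attribute IDs as well as the exact plan strings depend on the Spark version): running the session's logical optimizer over an analyzed plan removes the redundant cast and, afterwards, the alias-only Project.

[source, scala]
----
// Sketch only: the standard optimizer batches (which include SimplifyCasts)
// strip the redundant cast from an analyzed plan. Attribute IDs are illustrative.
val analyzed = spark.range(1).selectExpr("CAST (id AS long)").queryExecution.analyzed
// Project [cast(id#0L as bigint) AS id#2L]
// +- Range (0, 1, step=1, splits=Some(8))

val optimized = spark.sessionState.optimizer.execute(analyzed)
// Range (0, 1, step=1, splits=Some(8))
----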
SimplifyCasts is simply a Catalyst rule for transforming logical plans, i.e. `Rule[LogicalPlan]`.
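Being a `Rule[LogicalPlan]`, SimplifyCasts can also be executed on a logical plan on its own, outside the Optimizer. A minimal sketch (not from the original text; attribute IDs and the number of partitions will differ per session):

[source, scala]
----
import org.apache.spark.sql.catalyst.optimizer.SimplifyCasts

// CAST (id AS long) is redundant since id is already a bigint (long)
val analyzed = spark.range(1).selectExpr("CAST (id AS long)").queryExecution.analyzed
// Project [cast(id#0L as bigint) AS id#2L]
// +- Range (0, 1, step=1, splits=Some(8))

// Applying the rule alone removes the cast but keeps the (now alias-only) Project
SimplifyCasts(analyzed)
// Project [id#0L AS id#2L]
// +- Range (0, 1, step=1, splits=Some(8))
----

The spark-shell sessions below trace the rule (with TRACE logging enabled for SparkOptimizer) through each of the cases above.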
[source, scala]
----
// Case 1. The input is already the type to cast to
scala> val ds = spark.range(1)
ds: org.apache.spark.sql.Dataset[Long] = [id: bigint]

scala> ds.printSchema
root
 |-- id: long (nullable = false)

scala> ds.selectExpr("CAST (id AS long)").explain(true)
...
TRACE SparkOptimizer:
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.SimplifyCasts ===
!Project [cast(id#0L as bigint) AS id#7L]   Project [id#0L AS id#7L]
 +- Range (0, 1, step=1, splits=Some(8))    +- Range (0, 1, step=1, splits=Some(8))

TRACE SparkOptimizer:
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.RemoveAliasOnlyProject ===
!Project [id#0L AS id#7L]                   Range (0, 1, step=1, splits=Some(8))
!+- Range (0, 1, step=1, splits=Some(8))

TRACE SparkOptimizer: Fixed point reached for batch Operator Optimizations after 2 iterations.
DEBUG SparkOptimizer:
=== Result of Batch Operator Optimizations ===
!Project [cast(id#0L as bigint) AS id#7L]   Range (0, 1, step=1, splits=Some(8))
!+- Range (0, 1, step=1, splits=Some(8))
...
== Parsed Logical Plan ==
'Project [unresolvedalias(cast('id as bigint), None)]
+- Range (0, 1, step=1, splits=Some(8))

== Analyzed Logical Plan ==
id: bigint
Project [cast(id#0L as bigint) AS id#7L]
+- Range (0, 1, step=1, splits=Some(8))

== Optimized Logical Plan ==
Range (0, 1, step=1, splits=Some(8))

== Physical Plan ==
*Range (0, 1, step=1, splits=Some(8))
// Case 2A. The input is of ArrayType type and contains no null elements.
scala> val intArray = Seq(Array(1)).toDS
intArray: org.apache.spark.sql.Dataset[Array[Int]] = [value: array<int>]

scala> intArray.printSchema
root
 |-- value: array (nullable = true)
 |    |-- element: integer (containsNull = false)

scala> intArray.map(arr => arr.sum).explain(true)
...
TRACE SparkOptimizer:
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.SimplifyCasts ===
 SerializeFromObject [input[0, int, true] AS value#36]   SerializeFromObject [input[0, int, true] AS value#36]
 +- MapElements ...

TRACE SparkOptimizer: Fixed point reached for batch Operator Optimizations after 2 iterations.
DEBUG SparkOptimizer:
=== Result of Batch Operator Optimizations ===
 SerializeFromObject [input[0, int, true] AS value#36]   SerializeFromObject [input[0, int, true] AS value#36]
 +- MapElements ...
...
== Analyzed Logical Plan ==
value: int
SerializeFromObject [input[0, int, true] AS value#36]
+- MapElements ...

== Optimized Logical Plan ==
SerializeFromObject [input[0, int, true] AS value#36]
+- MapElements ...

== Physical Plan ==
*SerializeFromObject [input[0, int, true] AS value#36]
+- *MapElements ...
// Case 2B. The input is of MapType type and contains no null elements.
scala> val mapDF = Seq(("one", 1), ("two", 2)).toDF("k", "v").withColumn("m", map(col("k"), col("v")))
mapDF: org.apache.spark.sql.DataFrame = [k: string, v: int ... 1 more field]

scala> mapDF.printSchema
root
 |-- k: string (nullable = true)
 |-- v: integer (nullable = false)
 |-- m: map (nullable = false)
 |    |-- key: string
 |    |-- value: integer (valueContainsNull = false)

scala> mapDF.selectExpr("""CAST (m AS map<string, int>)""").explain(true)
...
== Analyzed Logical Plan ==
m: map<string,int>
...
== Optimized Logical Plan ==
LocalRelation [m#272]

== Physical Plan ==
LocalTableScan [m#272]
----
=== [[apply]] Executing Rule -- apply Method
[source, scala]
----
apply(plan: LogicalPlan): LogicalPlan
----
apply...FIXME
apply is part of the Rule abstraction.
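The Spark source is not reproduced here, but the transformation can be sketched from the two cases listed at the top of this page. The following is an illustrative re-implementation, not the actual SimplifyCasts code: it rewrites every expression in the plan, dropping a Cast whose target type equals the input type, and a Cast over a non-nullable ArrayType or MapType input that only relaxes element or value nullability.

[source, scala]
----
// Illustrative sketch only -- not the actual Spark implementation of SimplifyCasts.
import org.apache.spark.sql.catalyst.expressions.Cast
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.types.{ArrayType, MapType}

object SimplifyCastsSketch extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    // Case 1: the input is already the type to cast to
    case c: Cast if c.child.dataType == c.dataType => c.child
    // Case 2: a cast over a non-nullable array/map input that only relaxes
    // containsNull / valueContainsNull is redundant as well
    case c: Cast =>
      (c.child.dataType, c.dataType) match {
        case (ArrayType(from, false), ArrayType(to, true)) if from == to => c.child
        case (MapType(fk, fv, false), MapType(tk, tv, true)) if fk == tk && fv == tv => c.child
        case _ => c
      }
  }
}
----

Since the simplification only rewrites expressions, the sketch uses transformAllExpressions and leaves the plan's operators untouched.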