AnalyzeTableCommand Logical Command¶
AnalyzeTableCommand
is a LeafRunnableCommand that computes statistics (and stores them in a metastore).
Creating Instance¶
AnalyzeTableCommand
takes the following to be created:
- Multi-part table identifier
-
noScan
flag (default:true
that indicates whether NOSCAN option was used or not)
AnalyzeTableCommand
is created when:
- ResolveSessionCatalog logical resolution rule is executed (and resolves an AnalyzeTable logical operator with no
PARTITION
s)
Executing Command¶
RunnableCommand
run(
sparkSession: SparkSession): Seq[Row]
run
is part of the RunnableCommand abstraction.
run
returns an empty collection.
Demo¶
AnalyzeTableCommand¶
// Seq((0, 0, "zero"), (1, 1, "one")).toDF("id", "p1", "p2").write.partitionBy("p1", "p2").saveAsTable("t1")
val sqlText = "ANALYZE TABLE t1 COMPUTE STATISTICS NOSCAN"
val plan = spark.sql(sqlText).queryExecution.logical
import org.apache.spark.sql.execution.command.AnalyzeTableCommand
val cmd = plan.asInstanceOf[AnalyzeTableCommand]
scala> println(cmd)
AnalyzeTableCommand `t1`, false
CREATE TABLE¶
sql("CREATE TABLE demo_cbo VALUES (0, 'zero'), (1, 'one') AS t1(id, name)")
scala> sql("desc extended demo_cbo").show(numRows = Integer.MAX_VALUE, truncate = false)
+----------------------------+----------------------------------------------------------+-------+
|col_name |data_type |comment|
+----------------------------+----------------------------------------------------------+-------+
|id |int |null |
|name |string |null |
| | | |
|# Detailed Table Information| | |
|Database |default | |
|Table |demo_cbo | |
|Owner |jacek | |
|Created Time |Sat Feb 12 19:51:15 CET 2022 | |
|Last Access |UNKNOWN | |
|Created By |Spark 3.2.1 | |
|Type |MANAGED | |
|Provider |hive | |
|Table Properties |[transient_lastDdlTime=1644691875] | |
|Statistics |13 bytes | |
|Location |file:/Users/jacek/dev/oss/spark/spark-warehouse/demo_cbo | |
|Serde Library |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | |
|InputFormat |org.apache.hadoop.mapred.TextInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat| |
|Storage Properties |[serialization.format=1] | |
|Partition Provider |Catalog | |
+----------------------------+----------------------------------------------------------+-------+
ANALYZE TABLE COMPUTE STATISTICS¶
sql("ANALYZE TABLE demo_cbo COMPUTE STATISTICS")
Note the extra Statistics
row.
scala> sql("desc extended demo_cbo").show(numRows = Integer.MAX_VALUE, truncate = false)
+----------------------------+----------------------------------------------------------+-------+
|col_name |data_type |comment|
+----------------------------+----------------------------------------------------------+-------+
|id |int |null |
|name |string |null |
| | | |
|# Detailed Table Information| | |
|Database |default | |
|Table |demo_cbo | |
|Owner |jacek | |
|Created Time |Sat Feb 12 19:51:15 CET 2022 | |
|Last Access |UNKNOWN | |
|Created By |Spark 3.2.1 | |
|Type |MANAGED | |
|Provider |hive | |
|Table Properties |[transient_lastDdlTime=1644691915] | |
|Statistics |13 bytes, 2 rows | |
|Location |file:/Users/jacek/dev/oss/spark/spark-warehouse/demo_cbo | |
|Serde Library |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | |
|InputFormat |org.apache.hadoop.mapred.TextInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat| |
|Storage Properties |[serialization.format=1] | |
|Partition Provider |Catalog | |
+----------------------------+----------------------------------------------------------+-------+
FOR ALL COLUMNS¶
sql("ANALYZE TABLE demo_cbo COMPUTE STATISTICS FOR ALL COLUMNS")
// Use describeColName
scala> sql("desc extended demo_cbo id").show(numRows = Integer.MAX_VALUE, truncate = false)
+--------------+----------+
|info_name |info_value|
+--------------+----------+
|col_name |id |
|data_type |int |
|comment |NULL |
|min |0 |
|max |1 |
|num_nulls |0 |
|distinct_count|2 |
|avg_col_len |4 |
|max_col_len |4 |
|histogram |NULL |
+--------------+----------+
scala> sql("desc extended demo_cbo name").show(numRows = Integer.MAX_VALUE, truncate = false)
+--------------+----------+
|info_name |info_value|
+--------------+----------+
|col_name |name |
|data_type |string |
|comment |NULL |
|min |NULL |
|max |NULL |
|num_nulls |0 |
|distinct_count|2 |
|avg_col_len |4 |
|max_col_len |4 |
|histogram |NULL |
+--------------+----------+