DescribeColumnCommand Logical Command

DescribeColumnCommand is a logical command for the DESCRIBE TABLE SQL command with a single column only (i.e. no PARTITION specification).


[source, scala]

// Make the example reproducible
val tableName = "t1"
import org.apache.spark.sql.catalyst.TableIdentifier
val tableId = TableIdentifier(tableName)

val sessionCatalog = spark.sessionState.catalog
sessionCatalog.dropTable(tableId, ignoreIfNotExists = true, purge = true)

val df = Seq((0, 0.0, "zero"), (1, 1.4, "one")).toDF("id", "p1", "p2")
df.write.saveAsTable("t1")

// DescribeColumnCommand represents DESC EXTENDED tableName colName SQL command
val descExtSQL = "DESC EXTENDED t1 p1"
val plan = spark.sql(descExtSQL).queryExecution.logical
import org.apache.spark.sql.execution.command.DescribeColumnCommand
val cmd = plan.asInstanceOf[DescribeColumnCommand]
scala> println(cmd)
DescribeColumnCommand t1, [p1], true

scala> spark.sql(descExtSQL).show
+--------------+----------+
|     info_name|info_value|
+--------------+----------+
|      col_name|        p1|
|     data_type|    double|
|       comment|      NULL|
|           min|      NULL|
|           max|      NULL|
|     num_nulls|      NULL|
|distinct_count|      NULL|
|   avg_col_len|      NULL|
|   max_col_len|      NULL|
|     histogram|      NULL|
+--------------+----------+

// Run ANALYZE TABLE...FOR COLUMNS SQL command to compute the column statistics
val allCols = df.columns.mkString(",")
val analyzeTableSQL = s"ANALYZE TABLE $tableName COMPUTE STATISTICS FOR COLUMNS $allCols"
spark.sql(analyzeTableSQL)

scala> spark.sql(descExtSQL).show
+--------------+----------+
|     info_name|info_value|
+--------------+----------+
|      col_name|        p1|
|     data_type|    double|
|       comment|      NULL|
|           min|       0.0|
|           max|       1.4|
|     num_nulls|         0|
|distinct_count|         2|
|   avg_col_len|         8|
|   max_col_len|         8|
|     histogram|      NULL|
+--------------+----------+

[[output]] DescribeColumnCommand defines the output schema with the following columns:

  • info_name with "name of the column info" comment
  • info_value with "value of the column info" comment
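The output schema can be inspected on the result of a DESC query. The following is a minimal sketch, assuming a local SparkSession, a temporary warehouse directory and a hypothetical demo table `t1` (none of which are prescribed by DescribeColumnCommand itself). It is meant to be run as a Scala script or pasted into `spark-shell`.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Assumption: a throwaway local session with a temporary warehouse directory.
// In spark-shell, getOrCreate() simply returns the existing `spark` session.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("desc-column-schema")
  .config("spark.sql.warehouse.dir", Files.createTempDirectory("warehouse").toString)
  .getOrCreate()
import spark.implicits._

// Hypothetical demo table with the same shape as the t1 table used on this page
spark.sql("DROP TABLE IF EXISTS t1")
Seq((0, 0.0, "zero"), (1, 1.4, "one")).toDF("id", "p1", "p2").write.saveAsTable("t1")

val described = spark.sql("DESC EXTENDED t1 p1")

// Names of the output columns, in order
val outputNames = described.schema.fieldNames.toSeq

// Comments attached to the output columns (if the schema carries them)
val outputComments = described.schema.fields.map(f => f.name -> f.getComment()).toMap

println(outputNames)
println(outputComments)
```

`StructField.getComment()` returns an `Option[String]`, so a column without a comment shows up as `None` rather than failing.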

=== describeTable Labeled Alternative

DescribeColumnCommand is described by the describeTable labeled alternative in the statement expression in SqlBaseParser.g4 and is parsed using SparkSqlParser.
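To see what the parser alone produces for such a statement, the session's SQL parser can be asked to parse the text without executing it. This is only a sketch: it assumes a local SparkSession, and the concrete class of the parsed plan depends on the Spark version (some versions produce DescribeColumnCommand directly, others an intermediate plan that is converted later).

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a throwaway local session; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("desc-column-parse")
  .getOrCreate()

// Parsing only: the table does not have to exist at this point
val parsed = spark.sessionState.sqlParser.parsePlan("DESC EXTENDED t1 p1")

// The simple class name of the parsed logical plan (version-dependent)
val parsedClassName = parsed.getClass.getSimpleName
println(parsedClassName)
```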

=== [[run]] Executing Logical Command (Describing Column with Optional Statistics) -- run Method

[source, scala]

run(session: SparkSession): Seq[Row]

NOTE: run is part of the RunnableCommand contract to execute (run) a logical command.

run resolves the column name in the table and makes sure that it is a "flat" field (i.e. not of a nested data type).

run requests the SessionCatalog for the table metadata.

NOTE: run uses the input SparkSession to access the SessionState, which in turn is used to access the SessionCatalog.

run takes the column statistics from the table statistics if available.

NOTE: Column statistics are available (in the table statistics) only after the ANALYZE TABLE FOR COLUMNS SQL command was run.
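The availability of column statistics can be checked directly against the table metadata in the SessionCatalog. The sketch below assumes a local SparkSession with a temporary warehouse directory and a hypothetical demo table `t1`; `hasColumnStats` is a helper introduced here for illustration only and is not part of Spark.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.TableIdentifier

// Assumption: a throwaway local session with a temporary warehouse directory.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("desc-column-stats")
  .config("spark.sql.warehouse.dir", Files.createTempDirectory("warehouse").toString)
  .getOrCreate()
import spark.implicits._

// Hypothetical demo table
spark.sql("DROP TABLE IF EXISTS t1")
Seq((0, 0.0, "zero"), (1, 1.4, "one")).toDF("id", "p1", "p2").write.saveAsTable("t1")

// Hypothetical helper: are statistics recorded for the column in the table metadata?
def hasColumnStats(table: String, column: String): Boolean = {
  val metadata = spark.sessionState.catalog.getTableMetadata(TableIdentifier(table))
  metadata.stats.exists(_.colStats.contains(column))
}

val statsBeforeAnalyze = hasColumnStats("t1", "p1")
spark.sql("ANALYZE TABLE t1 COMPUTE STATISTICS FOR COLUMNS p1")
val statsAfterAnalyze = hasColumnStats("t1", "p1")

println(s"before ANALYZE: $statsBeforeAnalyze, after ANALYZE: $statsAfterAnalyze")
```

Table statistics are an `Option` in the table metadata, which is why the helper uses `exists` instead of assuming that statistics are always present.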

run adds comment metadata if available for the column.

run gives the following rows (in that order):

. col_name
. data_type
. comment

If DescribeColumnCommand was executed with the isExtended flag enabled, run gives the following additional rows (in that order):

. min
. max
. num_nulls
. distinct_count
. avg_col_len
. max_col_len
. histogram

run gives NULL as the value of the comment or of any statistic that is not available.
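The row assembly described above can be modelled in plain Scala. This is a simplified sketch and not Spark's implementation: `ColInfo`, `ColStats`, `orNull` and `describeColumn` are hypothetical names, rows are modelled as string pairs instead of `Row`, and the histogram is reduced to a single optional string.

```scala
// Hypothetical, simplified stand-ins for a column's metadata and statistics
final case class ColInfo(name: String, dataType: String, comment: Option[String] = None)

final case class ColStats(
    min: Option[String] = None,
    max: Option[String] = None,
    numNulls: Option[Long] = None,
    distinctCount: Option[Long] = None,
    avgColLen: Option[Long] = None,
    maxColLen: Option[Long] = None,
    histogram: Option[String] = None)

// A missing value is rendered as the string NULL
def orNull[A](value: Option[A]): String = value.map(_.toString).getOrElse("NULL")

def describeColumn(
    col: ColInfo,
    stats: Option[ColStats],
    isExtended: Boolean): Seq[(String, String)] = {
  // The three rows that are always present
  val basic = Seq(
    "col_name" -> col.name,
    "data_type" -> col.dataType,
    "comment" -> orNull(col.comment))

  if (!isExtended) basic
  else {
    // No statistics at all behaves like statistics with every field missing
    val s = stats.getOrElse(ColStats())
    basic ++ Seq(
      "min" -> orNull(s.min),
      "max" -> orNull(s.max),
      "num_nulls" -> orNull(s.numNulls),
      "distinct_count" -> orNull(s.distinctCount),
      "avg_col_len" -> orNull(s.avgColLen),
      "max_col_len" -> orNull(s.maxColLen),
      "histogram" -> orNull(s.histogram))
  }
}

// Usage: an extended description of a column without statistics
val p1 = ColInfo("p1", "double")
val extendedRows = describeColumn(p1, stats = None, isExtended = true)
val basicRows = describeColumn(p1, stats = None, isExtended = false)
extendedRows.foreach { case (name, value) => println(s"$name: $value") }
```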

=== [[histogramDescription]] histogramDescription Internal Method

[source, scala]

histogramDescription(histogram: Histogram): Seq[Row]


NOTE: histogramDescription is used exclusively when DescribeColumnCommand is executed with the EXTENDED or FORMATTED option turned on.

=== [[creating-instance]] Creating DescribeColumnCommand Instance

DescribeColumnCommand takes the following when created:

  • [[table]] TableIdentifier
  • [[colNameParts]] Column name
  • [[isExtended]] isExtended flag that indicates whether the EXTENDED or FORMATTED option was used or not