InsertIntoDataSourceCommand Logical Command¶

InsertIntoDataSourceCommand is a RunnableCommand that <> (per <> flag).

InsertIntoDataSourceCommand is <> exclusively when DataSourceAnalysis logical resolution is executed (and resolves an <> unary logical operator with a <> on an InsertableRelation).

sql("DROP TABLE IF EXISTS t2")
sql("CREATE TABLE t2(id long)")

val query = "SELECT * FROM RANGE(1)"
// Using INSERT INTO SQL statement so we can access QueryExecution
// DataFrameWriter.insertInto returns no value
val q = sql("INSERT INTO TABLE t2 " + query)
val logicalPlan = q.queryExecution.logical
scala> println(logicalPlan.numberedTreeString)
00 'InsertIntoTable 'UnresolvedRelation `t2`, false, false
01 +- 'Project [*]
02    +- 'UnresolvedTableValuedFunction RANGE, [1]

val analyzedPlan = q.queryExecution.analyzed
scala> println(analyzedPlan.numberedTreeString)
00 InsertIntoHiveTable `default`.`t2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, false, false, [id#6L]
01 +- Project [id#6L]
02    +- Range (0, 1, step=1, splits=None)

[[innerChildren]] InsertIntoDataSourceCommand returns the <> when requested for the inner nodes (that should be shown as an inner nested tree of this node).

val query = "SELECT * FROM RANGE(1)"
val sqlText = "INSERT INTO TABLE t2 " + query
val plan = spark.sessionState.sqlParser.parsePlan(sqlText)
scala> println(plan.numberedTreeString)
00 'InsertIntoTable 'UnresolvedRelation `t2`, false, false
01 +- 'Project [*]
02    +- 'UnresolvedTableValuedFunction RANGE, [1]

Creating Instance¶

InsertIntoDataSourceCommand takes the following to be created:

[[logicalRelation]] <> leaf logical operator
[[query]] <>
[[overwrite]] overwrite flag

=== [[run]] Executing Logical Command (Inserting or Overwriting Data in InsertableRelation) -- run Method

[source, scala]¶

run( session: SparkSession): Seq[Row]

run is part of the RunnableCommand abstraction.

run takes the InsertableRelation (that is the <> of the <>).

run then <> for the <> and the input SparkSession.

run requests the DataFrame for the <> that in turn is requested for the RDD (of the structured query). run requests the <> for the <>.

With the RDD and the output schema, run creates <> that is the RDD[InternalRow] with the schema applied.

run requests the InsertableRelation to insert or overwrite data.

In the end, since the data in the InsertableRelation has changed, run requests the CacheManager to recacheByPlan with the <>.

NOTE: run requests the SparkSession for <> that is in turn requested for the CacheManager.