
CreateDataSourceTableAsSelectCommand Logical Command

CreateDataSourceTableAsSelectCommand is a DataWritingCommand.md[data-writing logical command] that creates a DataSource table with the data from a structured query (the AS query).

NOTE: A DataSource table is a Spark SQL native table that uses any data source but Hive (per USING clause).

CreateDataSourceTableAsSelectCommand is created when the DataSourceAnalysis post-hoc logical resolution rule is executed (and resolves a CreateTable.md[CreateTable] logical operator for a Spark table with an AS query).

NOTE: CreateDataSourceTableCommand.md[CreateDataSourceTableCommand] is used instead when a CreateTable.md[CreateTable] logical operator is used with no AS query.

[source,plaintext]
----
val ctas = """
  CREATE TABLE users
  USING csv
  COMMENT 'users table'
  LOCATION '/tmp/users'
  AS SELECT * FROM VALUES ((0, "jacek"))
"""

scala> sql(ctas)
...
WARN HiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider csv. Persisting data source table default.users into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
----

Since logical commands are executed eagerly when sql creates a Dataset, running the CTAS again fails because the table already exists.

[source,plaintext]
----
scala> val plan = sql(ctas).queryExecution.logical.numberedTreeString
org.apache.spark.sql.AnalysisException: Table default.users already exists. You need to drop it first.;
  at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:159)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
  at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:115)
  at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:194)
  at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:3370)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:78)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3370)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
  at org.apache.spark.sql.Dataset.ofRows(Dataset.scala:79)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
  ... 49 elided
----
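The command can also be observed without executing the statement: parse it and run the analyzer over the parsed plan so the DataSourceAnalysis post-hoc rule kicks in. The following is a minimal spark-shell sketch, assuming the internal sessionState, sqlParser and analyzer APIs of Spark 2.4 (the table names are arbitrary).

[source, scala]
----
// Parse (but do not execute) the statements, then analyze the parsed plans
// so the DataSourceAnalysis post-hoc resolution rule is applied.
val parser = spark.sessionState.sqlParser
val analyzer = spark.sessionState.analyzer

// CREATE TABLE ... AS SELECT (an AS query) => CreateDataSourceTableAsSelectCommand
val ctasPlan = parser.parsePlan(
  "CREATE TABLE users2 USING csv AS SELECT * FROM VALUES (0, 'jacek')")
println(analyzer.execute(ctasPlan).numberedTreeString)

// CREATE TABLE with no AS query => CreateDataSourceTableCommand
val createPlan = parser.parsePlan(
  "CREATE TABLE users3 (id INT, name STRING) USING csv")
println(analyzer.execute(createPlan).numberedTreeString)
----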


Creating Instance

CreateDataSourceTableAsSelectCommand takes the following to be created:

* CatalogTable
* SaveMode
* LogicalPlan (the AS query)
* Output column names (Seq[String])

=== [[run]] Executing Data-Writing Logical Command -- run Method

[source, scala]
----
run(
  sparkSession: SparkSession,
  child: SparkPlan): Seq[Row]
----


NOTE: run is part of the DataWritingCommand.md#run[DataWritingCommand] contract.

run...FIXME

run throws an AssertionError when the tableType of the CatalogTable is VIEW or the provider is undefined.
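The conditions behind that AssertionError can be reproduced in isolation. A minimal sketch follows, using a hand-made CatalogTable that happens to be a VIEW with no provider (the table and its shape are made up purely for illustration).

[source, scala]
----
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable, CatalogTableType}
import org.apache.spark.sql.types.StructType

// A made-up CatalogTable that is a VIEW and has no provider
val viewTable = CatalogTable(
  identifier = TableIdentifier("demo_view"),
  tableType = CatalogTableType.VIEW,
  storage = CatalogStorageFormat.empty,
  schema = new StructType())

// The conditions behind the AssertionError (per the description above);
// either check fails here with java.lang.AssertionError
assert(viewTable.tableType != CatalogTableType.VIEW)
assert(viewTable.provider.isDefined)
----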

saveDataIntoTable

[source, scala]
----
saveDataIntoTable(
  session: SparkSession,
  table: CatalogTable,
  tableLocation: Option[URI],
  physicalPlan: SparkPlan,
  mode: SaveMode,
  tableExists: Boolean): BaseRelation
----

saveDataIntoTable creates a BaseRelation for...FIXME

saveDataIntoTable...FIXME
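As a rough outline of what the FIXMEs above are getting at, the general flow can be sketched as follows. This is not the actual implementation, just a hedged sketch loosely based on the Spark 2.4 sources: a DataSource is created for the table's provider (with the table's partitioning, bucketing, options and the optional location as a path option) and is asked to write out the query's data, returning a BaseRelation. The name saveDataIntoTableSketch and the extra query and outputColumnNames parameters are mine; in the command itself they correspond to constructor arguments.

[source, scala]
----
import java.net.URI

import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogUtils}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.datasources.DataSource
import org.apache.spark.sql.sources.BaseRelation

// A sketch only -- not the actual method. `query` and `outputColumnNames`
// stand in for the command's own constructor arguments.
def saveDataIntoTableSketch(
    session: SparkSession,
    table: CatalogTable,
    tableLocation: Option[URI],
    physicalPlan: SparkPlan,
    mode: SaveMode,
    tableExists: Boolean,
    query: LogicalPlan,
    outputColumnNames: Seq[String]): BaseRelation = {
  // Expose the optional table location to the data source as its "path" option
  val pathOption = tableLocation.map(uri => "path" -> CatalogUtils.URIToString(uri))

  // A DataSource for the table's provider with the table's partitioning, bucketing and options
  val dataSource = DataSource(
    session,
    className = table.provider.get,
    partitionColumns = table.partitionColumnNames,
    bucketSpec = table.bucketSpec,
    options = table.storage.properties ++ pathOption,
    catalogTable = if (tableExists) Some(table) else None)

  // Write the query's data out and get back a BaseRelation describing what was written
  dataSource.writeAndRead(mode, query, outputColumnNames, physicalPlan)
}
----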