
DataflowGraphRegistry

DataflowGraphRegistry is a registry of Dataflow Graphs.

Scala object

DataflowGraphRegistry is a Scala object, which means it is a class with exactly one instance (itself). A Scala object is created lazily, when it is referenced for the first time.

Learn more in Tour of Scala.
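
The lazy creation described above can be observed with a small, self-contained sketch. All names here (InitLog, DemoRegistry, LazyObjectDemo) are hypothetical and unrelated to Spark; they only illustrate when an object's body runs.

```scala
// Hypothetical names throughout; this only illustrates lazy object initialization.
object InitLog {
  // Records the order in which things happen.
  val events = scala.collection.mutable.ArrayBuffer.empty[String]
}

object DemoRegistry {
  // The object body runs once, on first access to the object.
  InitLog.events += "DemoRegistry initialized"
  val name: String = "demo"
}

object LazyObjectDemo {
  def main(args: Array[String]): Unit = {
    InitLog.events += "before first reference"
    val n = DemoRegistry.name // first reference triggers initialization
    InitLog.events += s"after first reference: $n"
    InitLog.events.foreach(println)
  }
}
```

Running LazyObjectDemo should list the initialization event between the other two, since DemoRegistry is not initialized until its name member is read.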

Demo

import org.apache.spark.sql.connect.pipelines.DataflowGraphRegistry

val graphId = DataflowGraphRegistry.createDataflowGraph(
  defaultCatalog = spark.catalog.currentCatalog(),
  defaultDatabase = spark.catalog.currentDatabase,
  defaultSqlConf = Map.empty)

Dataflow Graphs

dataflowGraphs: ConcurrentHashMap[String, GraphRegistrationContext]

DataflowGraphRegistry starts with an empty dataflowGraphs collection that maps graph IDs (UUIDs) to their GraphRegistrationContexts.
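
A minimal sketch of such a UUID-keyed registry follows. It is an illustration under assumptions, not the Spark source: GraphContextSketch and RegistrySketch are hypothetical stand-ins, and how the real registry populates the map is not described on this page.

```scala
import java.util.UUID
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for GraphRegistrationContext (the real class is not shown here).
final case class GraphContextSketch(
    defaultCatalog: String,
    defaultDatabase: String,
    defaultSqlConf: Map[String, String])

object RegistrySketch {
  // Starts empty; entries are keyed by a generated UUID string.
  private val graphs = new ConcurrentHashMap[String, GraphContextSketch]()

  // Assumption: registers a new context under a fresh UUID and returns that ID.
  def create(catalog: String, database: String, conf: Map[String, String]): String = {
    val graphId = UUID.randomUUID().toString
    graphs.put(graphId, GraphContextSketch(catalog, database, conf))
    graphId
  }

  // Number of registered contexts.
  def size: Int = graphs.size()
}
```

A ConcurrentHashMap lets multiple threads register and look up entries without external locking, which is a plausible reason for choosing it in a shared registry.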

createDataflowGraph

createDataflowGraph(
  defaultCatalog: String,
  defaultDatabase: String,
  defaultSqlConf: Map[String, String]): String

createDataflowGraph...FIXME


createDataflowGraph is used when:

Find Dataflow Graph (or Throw SparkException)

getDataflowGraphOrThrow(
  dataflowGraphId: String): GraphRegistrationContext

getDataflowGraphOrThrow looks up the GraphRegistrationContext for the given dataflowGraphId or throws a SparkException if it does not exist.

Dataflow graph with id [graphId] could not be found
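
The look-up-or-throw behavior can be sketched as follows. This is a hedged illustration only: GraphNotFoundSketch is a hypothetical stand-in for SparkException (so the example runs without Spark on the classpath), and plain strings stand in for GraphRegistrationContext.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for SparkException.
final class GraphNotFoundSketch(message: String) extends RuntimeException(message)

object LookupOrThrowSketch {
  // String values stand in for GraphRegistrationContext instances.
  val graphs = new ConcurrentHashMap[String, String]()

  // Returns the registered value, or throws with a message modeled on the one above.
  def getOrThrow(graphId: String): String =
    Option(graphs.get(graphId)).getOrElse {
      throw new GraphNotFoundSketch(s"Dataflow graph with id $graphId could not be found")
    }
}
```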

getDataflowGraphOrThrow is used when:

Find Dataflow Graph

getDataflowGraph(
  graphId: String): Option[GraphRegistrationContext]

getDataflowGraph finds the GraphRegistrationContext for the given graphId (in this dataflowGraphs registry).
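
Returning an Option lets callers handle a missing graph without exceptions. The sketch below assumes a ConcurrentHashMap-backed registry like the one declared above; OptionLookupSketch and its members are hypothetical names, and plain strings stand in for GraphRegistrationContext.

```scala
import java.util.concurrent.ConcurrentHashMap

object OptionLookupSketch {
  // String values stand in for GraphRegistrationContext instances.
  val graphs = new ConcurrentHashMap[String, String]()

  // ConcurrentHashMap.get returns null for a missing key; Option(...) turns that into None.
  def find(graphId: String): Option[String] = Option(graphs.get(graphId))

  // A caller can pattern match on both outcomes.
  def describe(graphId: String): String =
    find(graphId) match {
      case Some(ctx) => s"found: $ctx"
      case None      => s"no graph registered under $graphId"
    }
}
```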


getDataflowGraph is used when: