Skip to content

FlowAnalysis Utility

Singleton Object

FlowAnalysis is a Scala object which is a class that has exactly one instance. It is created lazily when it is referenced, like a lazy val.

Learn more in Tour of Scala.

createFlowFunctionFromLogicalPlan

createFlowFunctionFromLogicalPlan(
  plan: LogicalPlan): FlowFunction

createFlowFunctionFromLogicalPlan takes a LogicalPlan (that represents one of the supported logical commands) and creates a FlowFunction.

When executed, this FlowFunction creates a FlowAnalysisContext.

FlowFunction uses this FlowAnalysisContext to set the SQL configs (given to the FlowFunction being defined).

FlowFunction analyze this LogicalPlan (with the FlowAnalysisContext). This gives the result data (as a DataFrame).

In the end, FlowFunction creates a FlowFunctionResult with the result data (as a DataFrame) and the others (from the FlowAnalysisContext).


createFlowFunctionFromLogicalPlan is used when:

Analyze Logical Command

analyze(
  context: FlowAnalysisContext,
  plan: LogicalPlan): DataFrame

CTEs

analyze resolves pipeline-specific TVFs and CTEs.

SELECT ... FROM STREAM(t1)
SELECT ... FROM STREAM t1

Developers can define CTEs within their CREATE statements:

CREATE STREAMING TABLE a
WITH b AS (
   SELECT * FROM STREAM upstream
)
SELECT * FROM b

analyze...FIXME

Read Batch Input

readBatchInput(
  context: FlowAnalysisContext,
  name: String,
  batchReadOptions: BatchReadOptions): DataFrame

readBatchInput...FIXME


readBatchInput is used when:

  • FlowAnalysis is requested to analyze

Read External Batch Input

readExternalBatchInput(
  context: FlowAnalysisContext,
  inputIdentifier: ExternalDatasetIdentifier,
  name: String): DataFrame

readExternalBatchInput...FIXME

Read Stream Input

readStreamInput(
  context: FlowAnalysisContext,
  name: String,
  streamReader: DataStreamReader,
  streamingReadOptions: StreamingReadOptions): DataFrame

readStreamInput resolves the given name (in the given FlowAnalysisContext).

For an InternalDatasetIdentifier (that is defined by the current pipeline), readStreamInput readGraphInput.

For an ExternalDatasetIdentifier (that is external to the current pipeline), readStreamInput readExternalStreamInput.


readStreamInput is used when:

  • FlowAnalysis is requested to analyze

Read External Stream Input

readExternalStreamInput(
  context: FlowAnalysisContext,
  inputIdentifier: ExternalDatasetIdentifier,
  streamReader: DataStreamReader,
  name: String): DataFrame

readExternalStreamInput...FIXME

Read Graph Input

readGraphInput(
  ctx: FlowAnalysisContext,
  inputIdentifier: InternalDatasetIdentifier,
  readOptions: InputReadOptions): DataFrame

Load DataFrame

It is up to the Input (for the given InternalDatasetIdentifier) to load a DataFrame that may either be batch or streaming.

readGraphInput records the input dataset identifier in the given FlowAnalysisContext.

SparkException

For a dataset not defined in the dataflow graph (the given InternalDatasetIdentifier not being available in the FlowAnalysisContext), readGraphInput reports a SparkException.

readGraphInput finds the Input for the given InternalDatasetIdentifier (in the FlowAnalysisContext).

readGraphInput requests the Input to load a DataFrame (with the given InputReadOptions).

readGraphInput records a ResolvedInput in the FlowAnalysisContext (in streamingInputs or batchInputs for StreamingReadOptions or BatchReadOptions, respectively).

In the end, readGraphInput creates a (streaming or batch) Dataset.


readGraphInput is used when:

  • FlowAnalysis is requested to read batch and stream input