FlowAnalysis Utility¶
Singleton Object
FlowAnalysis is a Scala object which is a class that has exactly one instance. It is created lazily when it is referenced, like a lazy val.
Learn more in Tour of Scala.
createFlowFunctionFromLogicalPlan¶
createFlowFunctionFromLogicalPlan(
plan: LogicalPlan): FlowFunction
createFlowFunctionFromLogicalPlan takes a LogicalPlan (that represents one of the supported logical commands) and creates a FlowFunction.
When executed, this FlowFunction creates a FlowAnalysisContext.
FlowFunction uses this FlowAnalysisContext to set the SQL configs (given to the FlowFunction being defined).
FlowFunction analyze this LogicalPlan (with the FlowAnalysisContext). This gives the result data (as a DataFrame).
In the end, FlowFunction creates a FlowFunctionResult with the result data (as a DataFrame) and the others (from the FlowAnalysisContext).
createFlowFunctionFromLogicalPlan is used when:
PipelinesHandleris requested to define a flowSqlGraphRegistrationContextis requested to handle the following queries:
Analyze Logical Command¶
analyze(
context: FlowAnalysisContext,
plan: LogicalPlan): DataFrame
CTEs
analyze resolves pipeline-specific TVFs and CTEs.
SELECT ... FROM STREAM(t1)
SELECT ... FROM STREAM t1
Developers can define CTEs within their CREATE statements:
CREATE STREAMING TABLE a
WITH b AS (
SELECT * FROM STREAM upstream
)
SELECT * FROM b
analyze...FIXME
Read Batch Input¶
readBatchInput(
context: FlowAnalysisContext,
name: String,
batchReadOptions: BatchReadOptions): DataFrame
readBatchInput...FIXME
readBatchInput is used when:
FlowAnalysisis requested to analyze
Read External Batch Input¶
readExternalBatchInput(
context: FlowAnalysisContext,
inputIdentifier: ExternalDatasetIdentifier,
name: String): DataFrame
readExternalBatchInput...FIXME
Read Stream Input¶
readStreamInput(
context: FlowAnalysisContext,
name: String,
streamReader: DataStreamReader,
streamingReadOptions: StreamingReadOptions): DataFrame
readStreamInput resolves the given name (in the given FlowAnalysisContext).
For an InternalDatasetIdentifier (that is defined by the current pipeline), readStreamInput readGraphInput.
For an ExternalDatasetIdentifier (that is external to the current pipeline), readStreamInput readExternalStreamInput.
readStreamInput is used when:
FlowAnalysisis requested to analyze
Read External Stream Input¶
readExternalStreamInput(
context: FlowAnalysisContext,
inputIdentifier: ExternalDatasetIdentifier,
streamReader: DataStreamReader,
name: String): DataFrame
readExternalStreamInput...FIXME
Read Graph Input¶
readGraphInput(
ctx: FlowAnalysisContext,
inputIdentifier: InternalDatasetIdentifier,
readOptions: InputReadOptions): DataFrame
Load DataFrame
It is up to the Input (for the given InternalDatasetIdentifier) to load a DataFrame that may either be batch or streaming.
readGraphInput records the input dataset identifier in the given FlowAnalysisContext.
SparkException
For a dataset not defined in the dataflow graph (the given InternalDatasetIdentifier not being available in the FlowAnalysisContext), readGraphInput reports a SparkException.
readGraphInput finds the Input for the given InternalDatasetIdentifier (in the FlowAnalysisContext).
readGraphInput requests the Input to load a DataFrame (with the given InputReadOptions).
readGraphInput records a ResolvedInput in the FlowAnalysisContext (in streamingInputs or batchInputs for StreamingReadOptions or BatchReadOptions, respectively).
In the end, readGraphInput creates a (streaming or batch) Dataset.
readGraphInput is used when: