DataStreamReader¶
DataStreamReader
is an interface that Spark developers use to describe how to load data from a streaming data source.
Accessing DataStreamReader¶
DataStreamReader
is available using SparkSession.readStream
method.
import org.apache.spark.sql.SparkSession
val spark: SparkSession = ...
val streamReader = spark.readStream
Built-in Formats¶
DataStreamReader
supports loading streaming data from the following built-in data sources:
csv
json
orc
parquet
(default)text
textFile
Tip
Use spark.sql.sources.default
configuration property to change the default data source.
Note
Hive data source can only be used with tables, and it is an AnalysisException
when specified explicitly.
Loading Data¶
load(): DataFrame
load(
path: String): DataFrame
DataStreamReader
gives specialized methods for built-in data systems and formats.
In order to plug in a custom data source, DataStreamReader
gives format
and load
methods.
load
creates a streaming DataFrame
that represents a "loading" streaming data node (and is internally a logical plan with a StreamingRelationV2 or StreamingRelation leaf logical operators).
load
uses DataSource.lookupDataSource
utility to look up the data source by source
alias.
Tip
Learn more about DataSource.lookupDataSource utility in The Internals of Spark SQL online book.
SupportsRead Tables with MICRO_BATCH_READ or CONTINUOUS_READ¶
For a TableProvider
(that is not a FileDataSourceV2
), load
requests it for a Table
.
Tip
Learn more about TableProvider in The Internals of Spark SQL online book.
For a Table
with SupportsRead
with MICRO_BATCH_READ
or CONTINUOUS_READ
capabilities, load
creates a DataFrame
with StreamingRelationV2 leaf logical operator.
Tip
Learn more about Table, SupportsRead and capabilities in The Internals of Spark SQL online book.
If the DataSource
is a StreamSourceProvider, load
creates the StreamingRelationV2
with a StreamingRelation leaf logical operator.
For other Table
s, load
creates a DataFrame
with a StreamingRelation leaf logical operator.
Other Data Sources¶
load
creates a DataFrame
with a StreamingRelation leaf logical operator.