DataStreamReader¶
DataStreamReader is an interface that Spark developers use to describe how to load data from a streaming data source.

Accessing DataStreamReader¶
DataStreamReader is available using SparkSession.readStream method.
import org.apache.spark.sql.SparkSession
val spark: SparkSession = ...
val streamReader = spark.readStream
Built-in Formats¶
DataStreamReader supports loading streaming data from the following built-in data sources:
csvjsonorcparquet(default)texttextFile
Tip
Use spark.sql.sources.default configuration property to change the default data source.
Note
Hive data source can only be used with tables, and it is an AnalysisException when specified explicitly.
Loading Data¶
load(): DataFrame
load(
path: String): DataFrame
DataStreamReader gives specialized methods for built-in data systems and formats.
In order to plug in a custom data source, DataStreamReader gives format and load methods.
load creates a streaming DataFrame that represents a "loading" streaming data node (and is internally a logical plan with a StreamingRelationV2 or StreamingRelation leaf logical operators).
load uses DataSource.lookupDataSource utility to look up the data source by source alias.
Tip
Learn more about DataSource.lookupDataSource utility in The Internals of Spark SQL online book.
SupportsRead Tables with MICRO_BATCH_READ or CONTINUOUS_READ¶
For a TableProvider (that is not a FileDataSourceV2), load requests it for a Table.
Tip
Learn more about TableProvider in The Internals of Spark SQL online book.
For a Table with SupportsRead with MICRO_BATCH_READ or CONTINUOUS_READ capabilities, load creates a DataFrame with StreamingRelationV2 leaf logical operator.
Tip
Learn more about Table, SupportsRead and capabilities in The Internals of Spark SQL online book.
If the DataSource is a StreamSourceProvider, load creates the StreamingRelationV2 with a StreamingRelation leaf logical operator.
For other Tables, load creates a DataFrame with a StreamingRelation leaf logical operator.
Other Data Sources¶
load creates a DataFrame with a StreamingRelation leaf logical operator.