DataFrameReader¶
DataFrameReader is a high-level API for Spark SQL developers to describe the "read path" of a structured query (that loads data as an untyped DataFrame).

DataFrameReader is used to describe an input node in a data processing graph.

DataFrameReader is used to specify the input data source format to be used to "load" data from a data source (e.g. files, Hive tables, JDBC, or Dataset[String]).

DataFrameReader merely describes a process of loading data (a load specification) and does not trigger a Spark job until an action is called (unlike DataFrameWriter).

DataFrameReader is available using the SparkSession.read operator.
Creating Instance¶
DataFrameReader takes the following to be created:

SparkSession
Demo¶
import org.apache.spark.sql.SparkSession
assert(spark.isInstanceOf[SparkSession])
val reader = spark.read
import org.apache.spark.sql.DataFrameReader
assert(reader.isInstanceOf[DataFrameReader])
format¶
format(
  source: String): DataFrameReader

format specifies the input data source format.
Built-in data source formats:
json
csv
parquet
orc
text
jdbc
libsvm
Use spark.sql.sources.default configuration property to specify the default format.
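A minimal sketch of format together with load (the CSV file path people.csv and the header option are assumptions for illustration):

// Hypothetical example: read a CSV file that has a header row
val people = spark.read
  .format("csv")
  .option("header", "true")
  .load("people.csv")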
Loading Data¶
load(): DataFrame
load(
  path: String): DataFrame
load(
  paths: String*): DataFrame

load loads a dataset from a data source (with optional support for multiple paths) as an untyped DataFrame.
Internally, load requests the DataSource utility to lookupDataSource for the data source format. load then branches off per its type (i.e. whether it is of the DataSourceV2 marker type or not).

For a "DataSource V2" data source, load...FIXME

Otherwise (for a non-"DataSource V2" data source), load uses loadV1Source.
load throws an AnalysisException when the source is hive:

Hive data source can only be used with tables, you can not read files of Hive data source directly.
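A quick illustration of the load variants (the parquet directories below are hypothetical):

// Hypothetical example: a single path and multiple paths
val single   = spark.read.format("parquet").load("data/part1")
val multiple = spark.read.format("parquet").load("data/part1", "data/part2")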
loadV1Source¶
loadV1Source(
  paths: String*): DataFrame

loadV1Source creates a DataSource and requests it to resolve the underlying relation (as a BaseRelation).

In the end, loadV1Source requests the SparkSession to create a DataFrame from the BaseRelation.
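A rough sketch of that flow, assuming the internal org.apache.spark.sql.execution.datasources.DataSource API (a simplified paraphrase for illustration, not the actual implementation; names and signatures may differ across Spark versions):

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.execution.datasources.DataSource

// Hypothetical helper that mirrors the V1 load path
def loadV1Like(
    spark: SparkSession,
    source: String,
    options: Map[String, String],
    paths: String*): DataFrame = {
  // Create a DataSource for the given format, paths and options
  val dataSource = DataSource(
    sparkSession = spark,
    className = source,
    paths = paths,
    options = options)
  // Resolve the underlying BaseRelation and create a DataFrame from it
  spark.baseRelationToDataFrame(dataSource.resolveRelation())
}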