RowEncoder
RowEncoder is part of the Encoder framework and is used as the encoder of DataFrames (Datasets of Rows).
Note
DataFrame type is a type alias for Dataset[Row] that expects an Encoder[Row] available in scope, which is RowEncoder itself.
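The alias in the note can be checked at compile time. The following is a minimal sketch (the name sameType is only illustrative) that compiles only when DataFrame and Dataset[Row] are the same type:

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row}

// Type-equality evidence exists only when both types are identical,
// which is the case here because DataFrame is a type alias for Dataset[Row]
val sameType = implicitly[DataFrame =:= Dataset[Row]]
```

Because the two names denote one type, any method that accepts a Dataset[Row] accepts a DataFrame unchanged.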
Creating ExpressionEncoder of Rows
apply(
  schema: StructType): ExpressionEncoder[Row]
apply creates a BoundReference to a nullable field of ObjectType (with Row).
apply creates a serializer for input objects, using the BoundReference as the input object and the given schema as the input type.
apply creates a deserializer for a GetColumnByOrdinal expression with the given schema.
In the end, apply creates an ExpressionEncoder for Rows with the serializer and deserializer.
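As a rough illustration of what apply returns, the following sketch builds an encoder and round-trips a Row through its serializer and deserializer. It assumes Spark 3.0 to 3.4, where RowEncoder(schema) yields an ExpressionEncoder[Row] that exposes objSerializer, objDeserializer, createSerializer and createDeserializer; other versions may differ. The value names (toInternalRow, fromInternalRow) are only illustrative.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types._

val schema = StructType(
  StructField("id", LongType, nullable = false) ::
  StructField("name", StringType, nullable = false) :: Nil)
val encoder = RowEncoder(schema)

// The two expression trees assembled by apply
println(encoder.objSerializer)
println(encoder.objDeserializer)

// Row -> InternalRow (copy, since the serializer may reuse its output row)
val toInternalRow = encoder.createSerializer()
val internalRow = toInternalRow(Row(1L, "one")).copy()

// InternalRow -> Row (the deserializer has to be resolved and bound first)
val fromInternalRow = encoder.resolveAndBind().createDeserializer()
val row = fromInternalRow(internalRow)
```

The printed trees should show the BoundReference feeding the serializer and the GetColumnByOrdinal expression feeding the deserializer, as described above.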
serializerFor
serializerFor(
  inputObject: Expression,
  inputType: DataType): Expression
serializerFor is a recursive method that decomposes non-primitive (complex) DataTypes (e.g. ArrayType, MapType and StructType) to primitives.
For a StructType, serializerFor creates a CreateNamedStruct with serializer expressions for the (inner) fields.
For native types, serializerFor returns the given inputObject expression (which ends the recursion).
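The shape of that recursion can be sketched with a small traversal over a DataType. This is not the Spark implementation and does not call serializerFor; describe is a hypothetical helper that only mirrors the pattern of recursing into complex types and stopping at everything else.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper: recurse into complex types, stop at any other type
def describe(dataType: DataType): String = dataType match {
  case StructType(fields) =>
    fields
      .map(f => s"${f.name}: ${describe(f.dataType)}")
      .mkString("struct(", ", ", ")")
  case ArrayType(elementType, _) =>
    s"array(${describe(elementType)})"
  case MapType(keyType, valueType, _) =>
    s"map(${describe(keyType)}, ${describe(valueType)})"
  case other =>
    // Base case that ends the recursion
    other.typeName
}

val nested = StructType(
  StructField("id", LongType) ::
  StructField("tags", ArrayType(StringType)) :: Nil)
println(describe(nested))
```

Where this sketch returns a String, serializerFor returns a Catalyst Expression, with a CreateNamedStruct playing the role of the struct(...) wrapper.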
serializerFor handles the following DataTypes in a custom way (in the given order):
PythonUserDefinedType
UserDefinedType
TimestampType
DateType
DayTimeIntervalType
YearMonthIntervalType
DecimalType
StringType
ArrayType
MapType
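One way to see why some of these types need dedicated handling is to compare the external objects in a Row with what ends up in the serialized InternalRow. The sketch below assumes Spark 3.0 to 3.4 with default settings (in particular, java.sql.Timestamp as the external type of TimestampType); the value names are illustrative only.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types._

val schema = StructType(
  StructField("ts", TimestampType) ::
  StructField("name", StringType) ::
  StructField("tags", ArrayType(StringType)) :: Nil)

val toInternalRow = RowEncoder(schema).createSerializer()
val internalRow = toInternalRow(
  Row(Timestamp.valueOf("2021-01-01 00:00:00"), "one", Seq("a", "b")))

// External objects are converted to Catalyst's internal representations
val micros = internalRow.getLong(0)      // timestamp stored as a Long (microseconds)
val name = internalRow.getUTF8String(1)  // String stored as a UTF8String
val tags = internalRow.getArray(2)       // Seq stored as ArrayData
```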
Demo
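import org.apache.spark.sql.types._
// A two-column schema with non-nullable fields
val schema = StructType(
  StructField("id", LongType, nullable = false) ::
  StructField("name", StringType, nullable = false) :: Nil)

import org.apache.spark.sql.catalyst.encoders.RowEncoder
val encoder = RowEncoder(schema)
// flat is a field of ExpressionEncoder in Spark 2.x; Spark 3.x no longer has it
// (there, encoder.isSerializedAsStruct is the closest check)
assert(encoder.flat == false, "RowEncoder is never flat")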