RowEncoder¶
RowEncoder is part of the Encoder framework and is used as the encoder of DataFrames (Datasets of Rows).
Note
DataFrame type is a type alias for Dataset[Row] that expects an Encoder[Row] available in scope which is RowEncoder itself.
Creating ExpressionEncoder of Rows¶
apply(
schema: StructType): ExpressionEncoder[Row]
apply creates a BoundReference to a nullable field of ObjectType (with Row).
apply creates a serializer for input objects as the BoundReference with the given schema.
apply creates a deserializer for a GetColumnByOrdinal expression with the given schema.
In the end, apply creates an ExpressionEncoder for Rows with the serializer and deserializer.
serializerFor¶
serializerFor(
inputObject: Expression,
inputType: DataType): Expression
serializerFor is a recursive method that decomposes non-primitive (complex) DataTypes (e.g. ArrayType, MapType and StructType) to primitives.
For a StructType, creates a CreateNamedStruct with serializer expressions for the (inner) fields.
For native types, serializerFor returns the given inputObject expression (that ends recursion).
serializerFor handles the following DataTypes in a custom way (and the given order):
PythonUserDefinedType- UserDefinedType
TimestampTypeDateTypeDayTimeIntervalTypeYearMonthIntervalTypeDecimalTypeStringType- ArrayType
MapType
Demo¶
import org.apache.spark.sql.types._
val schema = StructType(
StructField("id", LongType, nullable = false) ::
StructField("name", StringType, nullable = false) :: Nil)
import org.apache.spark.sql.catalyst.encoders.RowEncoder
val encoder = RowEncoder(schema)
assert(encoder.flat == false, "RowEncoder is never flat")