ExpressionEncoder¶
ExpressionEncoder[T]
is the only built-in Encoder.
Important
ExpressionEncoder
is the only supported Encoder which is enforced when Dataset
is created (even though Dataset
data structure accepts a bare Encoder[T]
).
Creating Instance¶
ExpressionEncoder
takes the following to be created:
- Expression for object serialization
- Expression for object deserialization
-
ClassTag[T]
(Scala)
ExpressionEncoder
is created when:
Encoders
utility is used for genericSerializerExpressionEncoder
utility is used to create one, javaBean and tupleRowEncoder
utility is used to create one
Serializer¶
serializer: Seq[NamedExpression]
ExpressionEncoder
creates the serializer
(to be NamedExpressions) when created.
Encoders Utility¶
Encoders utility contains the ExpressionEncoder
for Scala and Java primitive types, e.g. boolean
, long
, String
, java.sql.Date
, java.sql.Timestamp
, Array[Byte]
.
Demo¶
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
val stringEncoder = ExpressionEncoder[String]
scala> val row = stringEncoder.toRow("hello world")
row: org.apache.spark.sql.catalyst.InternalRow = [0,100000000b,6f77206f6c6c6568,646c72]
import org.apache.spark.sql.catalyst.expressions.UnsafeRow
scala> val unsafeRow = row match { case ur: UnsafeRow => ur }
unsafeRow: org.apache.spark.sql.catalyst.expressions.UnsafeRow = [0,100000000b,6f77206f6c6c6568,646c72]
Creating ExpressionEncoder¶
apply[T : TypeTag](): ExpressionEncoder[T]
apply
creates an ExpressionEncoder
with the following:
Creating ExpressionEncoder for Scala Tuples¶
tuple[T](
e: ExpressionEncoder[T]): ExpressionEncoder[Tuple1[T]]
tuple[T1, T2](
e1: ExpressionEncoder[T1],
e2: ExpressionEncoder[T2]): ExpressionEncoder[(T1, T2)]
tuple[T1, T2, T3](
e1: ExpressionEncoder[T1],
e2: ExpressionEncoder[T2],
e3: ExpressionEncoder[T3]): ExpressionEncoder[(T1, T2, T3)]
tuple[T1, T2, T3, T4](
e1: ExpressionEncoder[T1],
e2: ExpressionEncoder[T2],
e3: ExpressionEncoder[T3],
e4: ExpressionEncoder[T4]): ExpressionEncoder[(T1, T2, T3, T4)]
tuple[T1, T2, T3, T4, T5](
e1: ExpressionEncoder[T1],
e2: ExpressionEncoder[T2],
e3: ExpressionEncoder[T3],
e4: ExpressionEncoder[T4],
e5: ExpressionEncoder[T5]): ExpressionEncoder[(T1, T2, T3, T4, T5)]
tuple(
encoders: Seq[ExpressionEncoder[_]]): ExpressionEncoder[_]
tuple
...FIXME
tuple
is used when:
Dataset
is requested to selectUntyped, select, joinWithKeyValueGroupedDataset
is requested to aggUntypedEncoders
utility is used to tupleReduceAggregator
is requested forbufferEncoder
Creating ExpressionEncoder for Java Bean¶
javaBean[T](
beanClass: Class[T]): ExpressionEncoder[T]
javaBean
...FIXME
javaBean
is used when:
Encoders
utility is used to bean
resolveAndBind¶
resolveAndBind(
attrs: Seq[Attribute] = schema.toAttributes,
analyzer: Analyzer = SimpleAnalyzer): ExpressionEncoder[T]
resolveAndBind
creates a deserializer for a LocalRelation with the given Attributes (to create a dummy query plan).
resolveAndBind
...FIXME
resolveAndBind
is used when:
Dataset
is requested for resolvedEnc- others
Demo¶
case class Person(id: Long, name: String)
import org.apache.spark.sql.Encoders
val schema = Encoders.product[Person].schema
import org.apache.spark.sql.catalyst.encoders.{RowEncoder, ExpressionEncoder}
import org.apache.spark.sql.Row
val encoder: ExpressionEncoder[Row] = RowEncoder.apply(schema).resolveAndBind()
val deserializer = encoder.deserializer
import org.apache.spark.sql.catalyst.InternalRow
val input = InternalRow(1, "Jacek")
scala> deserializer.eval(input)
java.lang.UnsupportedOperationException: Only code-generated evaluation is supported
at org.apache.spark.sql.catalyst.expressions.objects.CreateExternalRow.eval(objects.scala:1105)
... 54 elided
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
val ctx = new CodegenContext
val code = deserializer.genCode(ctx).code