
SerializeFromObjectExec Unary Physical Operator

SerializeFromObjectExec is a unary physical operator that supports Java code generation.

SerializeFromObjectExec supports Java code generation with the <>, <> and <> methods.

SerializeFromObjectExec is a <>.

SerializeFromObjectExec is created exclusively when the BasicOperators execution planning strategy is executed.

[[inputRDDs]] [[outputPartitioning]] SerializeFromObjectExec uses the <<child,child>> physical operator when requested for the input RDDs and the output partitioning.

[[output]] SerializeFromObjectExec uses the <<serializer,serializer>> for the output attributes.
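The planning and delegation described above can be checked from a Spark application. The following is a minimal sketch, assuming Spark 3.x with `spark-sql` on the classpath; the object and method names (`SerializeFromObjectPlanSketch`, `findSerializeOps`) are hypothetical, and the typed `map` query is just one convenient way to get the operator planned.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.SerializeFromObjectExec

// Hypothetical helper object, not part of Spark.
object SerializeFromObjectPlanSketch {

  // Plans a typed map over a local Dataset and collects every
  // SerializeFromObjectExec from the planner's output (sparkPlan),
  // that is, the plan before the preparation rules are applied.
  def findSerializeOps(spark: SparkSession): Seq[SerializeFromObjectExec] = {
    import spark.implicits._
    val ds = Seq(1, 2, 3).toDS().map(_ + 1)
    ds.queryExecution.sparkPlan.collect { case s: SerializeFromObjectExec => s }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("serialize-from-object-plan-sketch")
      .getOrCreate()
    try {
      findSerializeOps(spark).foreach { op =>
        // Output attributes are derived from the serializer expressions.
        println(op.output == op.serializer.map(_.toAttribute))
        // Output partitioning is taken from the child operator.
        println(op.outputPartitioning == op.child.outputPartitioning)
      }
    } finally {
      spark.stop()
    }
  }
}
```

The sketch reads `sparkPlan` rather than `executedPlan` because that is the plan produced directly by the execution planning strategies.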

=== [[creating-instance]] Creating SerializeFromObjectExec Instance

SerializeFromObjectExec takes the following when created:

  • [[serializer]] Serializer (as Seq[NamedExpression])
  • [[child]] Child physical operator (that supports Java code generation)

=== [[doExecute]] Executing Physical Operator (Generating RDD[InternalRow]) -- doExecute Method

[source, scala]
----
doExecute(): RDD[InternalRow]
----

doExecute is part of the SparkPlan abstraction.

doExecute requests the <<child,child>> physical operator to execute (that triggers physical query planning and generates an RDD[InternalRow]) and transforms the result by executing the following function on the internal rows of every partition, together with the partition index (using RDD.mapPartitionsWithIndexInternal, which creates another RDD):

. Creates an UnsafeProjection for the <<serializer,serializer>>

. Requests the UnsafeProjection to initialize (for the partition index)

. Executes the UnsafeProjection on all internal binary rows in the partition
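The three steps above can be reproduced by hand against a planned operator. This is a rough sketch rather than Spark's own code. It assumes Spark 3.x, relies on internal Catalyst classes that may change between versions, and uses the public `RDD.mapPartitionsWithIndex` in place of `mapPartitionsWithIndexInternal`, which is `private[spark]`. The names `DoExecuteSketch`, `serialize` and `run` are hypothetical.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.UnsafeProjection
import org.apache.spark.sql.execution.SerializeFromObjectExec

// Hypothetical helper object, not part of Spark.
object DoExecuteSketch {

  // Mirrors the steps listed above for a given operator.
  def serialize(op: SerializeFromObjectExec): RDD[InternalRow] = {
    // Copy to a local value so the closure does not capture the operator itself.
    val serializer = op.serializer
    op.child.execute().mapPartitionsWithIndex { (index, rows) =>
      // 1. Create an UnsafeProjection for the serializer expressions.
      val projection = UnsafeProjection.create(serializer)
      // 2. Initialize it for the partition index.
      projection.initialize(index)
      // 3. Apply it to every row in the partition.
      rows.map(projection)
    }
  }

  // Plans a typed map, serializes its rows by hand and reads the single Int column.
  def run(spark: SparkSession): Array[Int] = {
    import spark.implicits._
    val ds = Seq(1, 2, 3).toDS().map(_ + 1)
    val op = ds.queryExecution.executedPlan
      .collectFirst { case s: SerializeFromObjectExec => s }
      .getOrElse(sys.error("No SerializeFromObjectExec in the plan"))
    // The projection may reuse its output row, so values are extracted
    // immediately instead of collecting the rows themselves.
    serialize(op).map(_.getInt(0)).collect()
  }
}
```

Reading the column inside `map` avoids holding references to rows that the projection may overwrite on the next call.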

NOTE: doExecute (by RDD.mapPartitionsWithIndexInternal) adds a new MapPartitionsRDD to the RDD lineage. Use RDD.toDebugString to see the additional MapPartitionsRDD.
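As an illustration of the note above, the lineage can be printed for a small typed query. This is a sketch assuming Spark 3.x; `LineageSketch` and `lineage` are hypothetical names. Whole-stage code generation is switched off here on the assumption that, with it enabled, the generated code path runs instead of doExecute.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, not part of Spark.
object LineageSketch {

  // Returns the RDD lineage of a typed map query as text.
  def lineage(spark: SparkSession): String = {
    import spark.implicits._
    // Assumption: disabling whole-stage codegen makes the operators' doExecute run.
    spark.conf.set("spark.sql.codegen.wholeStage", "false")
    val ds = Seq(1, 2, 3).toDS().map(_ + 1)
    ds.queryExecution.executedPlan.execute().toDebugString
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("lineage-sketch")
      .getOrCreate()
    try {
      // The printed lineage should list MapPartitionsRDD entries,
      // one of them added by SerializeFromObjectExec.
      println(lineage(spark))
    } finally {
      spark.stop()
    }
  }
}
```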