PandasGroupUtils

PandasGroupUtils utility is used by the following physical operators when executed:

executePython

executePython[T](
  data: Iterator[T],
  output: Seq[Attribute],
  runner: BasePythonRunner[T, ColumnarBatch]): Iterator[InternalRow]

executePython requests the given BasePythonRunner to compute the (partition) data, passing in the partition ID and the current task's TaskContext.
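The call described above can be sketched without any Spark dependency. This is only an illustration under stated assumptions: FakeTaskContext and FakeRunner are hypothetical stand-ins for TaskContext and BasePythonRunner, and flattening each returned batch into rows is inferred from the signature (a runner of ColumnarBatch in, an Iterator[InternalRow] out), not taken from the Spark source.

```scala
// Hypothetical stand-in for Spark's TaskContext (only the partition ID is modelled).
final case class FakeTaskContext(partitionId: Int)

// Hypothetical stand-in for BasePythonRunner[IN, OUT].
trait FakeRunner[IN, OUT] {
  def compute(
      data: Iterator[IN],
      partitionIndex: Int,
      context: FakeTaskContext): Iterator[OUT]
}

object ExecutePythonSketch {
  // Hands the partition data to the runner together with the partition ID
  // and the task context, then flattens every returned batch into rows.
  // A batch is modelled as a plain Seq[R] (an assumption of this sketch).
  def executePython[T, R](
      data: Iterator[T],
      context: FakeTaskContext,
      runner: FakeRunner[T, Seq[R]]): Iterator[R] =
    runner.compute(data, context.partitionId, context).flatMap(_.iterator)
}
```

In this sketch the task context is passed in explicitly so the example stays self-contained; the text above says the real method uses the current task's TaskContext.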

executePython...FIXME


executePython is used when:

groupAndProject

groupAndProject(
  input: Iterator[InternalRow],
  groupingAttributes: Seq[Attribute],
  inputSchema: Seq[Attribute],
  dedupSchema: Seq[Attribute]): Iterator[(InternalRow, Iterator[InternalRow])]

groupAndProject creates a GroupedIterator for the input iterator (of InternalRows), the groupingAttributes and the inputSchema.
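To give a feel for the grouping step, here is a minimal, Spark-free sketch of turning an iterator of rows into an iterator of (key, rows) pairs. It is not the GroupedIterator implementation: groupConsecutive is a hypothetical helper, rows are plain Scala values, the input is assumed to be already clustered by the grouping key, and each group is buffered eagerly for simplicity.

```scala
import scala.collection.mutable.ArrayBuffer

object GroupAndProjectSketch {
  // Groups consecutive elements that share the same key.
  // Assumes the input is already sorted (or clustered) by that key;
  // equal keys that are not adjacent end up in separate groups.
  def groupConsecutive[K, V](input: Iterator[V])(key: V => K): Iterator[(K, Iterator[V])] =
    new Iterator[(K, Iterator[V])] {
      private val buffered = input.buffered

      def hasNext: Boolean = buffered.hasNext

      def next(): (K, Iterator[V]) = {
        val currentKey = key(buffered.head)
        val group = ArrayBuffer.empty[V]
        // Consume rows while they belong to the current group.
        while (buffered.hasNext && key(buffered.head) == currentKey) {
          group += buffered.next()
        }
        (currentKey, group.iterator)
      }
    }
}
```

A possible usage, with tuples standing in for rows and the first field as the grouping key:

```scala
val rows = Iterator(("a", 1), ("a", 2), ("b", 3))
val groups = GroupAndProjectSketch.groupConsecutive(rows)(_._1)
groups.foreach { case (k, vs) => println(s"$k -> ${vs.toList}") }
```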

groupAndProject...FIXME


groupAndProject is used when: