PythonArrowOutput¶
PythonArrowOutput is an extension of the BasePythonRunner abstraction for vectorized (ColumnarBatch) runners.
Scala Definition
trait PythonArrowOutput[OUT <: AnyRef] {
  self: BasePythonRunner[_, OUT] =>
  // ...
}
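The self-type in the definition above restricts where the trait can be mixed in. The following is a minimal sketch of that pattern only. BaseRunner, ArrowOutput, DemoRunner and the describe method are hypothetical stand-ins, not Spark classes, and the sketch does not reproduce any Spark behavior.

```scala
// Hypothetical stand-in for BasePythonRunner, reduced to a single member.
abstract class BaseRunner[IN, OUT] {
  def name: String
}

// Hypothetical stand-in for PythonArrowOutput.
trait ArrowOutput[OUT <: AnyRef] {
  // Self-type: this trait can only be mixed into a BaseRunner that produces OUT.
  self: BaseRunner[_, OUT] =>

  // Members of BaseRunner (here: name) are accessible because of the self-type.
  def describe: String = s"$name produces Arrow output"
}

// Compiles because DemoRunner is a BaseRunner[_, String] and mixes in ArrowOutput[String].
class DemoRunner extends BaseRunner[Int, String] with ArrowOutput[String] {
  val name = "DemoRunner"
}

object SelfTypeDemo {
  def main(args: Array[String]): Unit =
    println(new DemoRunner().describe)
}
```

Mixing ArrowOutput[String] into a class that is not a BaseRunner with String output would be rejected at compile time, which is the point of the self-type.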
Contract¶
Deserializing ColumnarBatch¶
deserializeColumnarBatch(
  batch: ColumnarBatch,
  schema: StructType): OUT
See:
Used when:
- PythonArrowOutput is requested to newReaderIterator (after a batch is loaded)
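As a rough illustration of how such a hook can be used, the sketch below passes every loaded batch through an abstract deserialization method before it reaches the caller. Batch, Schema, ArrowOutputSketch, readAll, PassThrough and RowCounts are all hypothetical names introduced here. They are simplified stand-ins, not Spark's ColumnarBatch, StructType or newReaderIterator.

```scala
// Hypothetical stand-ins for ColumnarBatch and StructType.
final case class Batch(rows: Vector[Int])
final case class Schema(fieldNames: Seq[String])

trait ArrowOutputSketch[OUT <: AnyRef] {
  // The contract: turn a loaded batch into the runner's output type.
  protected def deserializeBatch(batch: Batch, schema: Schema): OUT

  // Simplified reader (assumption): every loaded batch goes through the hook.
  def readAll(loaded: Iterator[Batch], schema: Schema): Iterator[OUT] =
    loaded.map(batch => deserializeBatch(batch, schema))
}

// An implementation that hands the batch back unchanged.
object PassThrough extends ArrowOutputSketch[Batch] {
  protected def deserializeBatch(batch: Batch, schema: Schema): Batch = batch
}

// An implementation that converts each batch into a different output type.
object RowCounts extends ArrowOutputSketch[String] {
  protected def deserializeBatch(batch: Batch, schema: Schema): String =
    s"${batch.rows.size} rows of ${schema.fieldNames.mkString(",")}"
}
```

Keeping the conversion behind a single abstract method lets each implementation choose its own OUT type while sharing the same reading loop.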
Performance Metrics¶
pythonMetrics: Map[String, SQLMetric]
SQLMetrics (Spark SQL):
- pythonNumRowsReceived
- pythonDataReceived
Used when:
- PythonArrowOutput is requested to newReaderIterator (after a batch is loaded)
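The sketch below shows one way the two metrics could be updated each time a batch is loaded. It is an assumption about the bookkeeping, not a copy of Spark's code. Metric is a hypothetical stand-in for SQLMetric, and onBatchLoaded with its numRows and numBytes parameters is introduced here purely for illustration.

```scala
// Hypothetical stand-in for SQLMetric: a simple accumulating counter.
final class Metric {
  private var total = 0L
  def add(v: Long): Unit = total += v
  def value: Long = total
}

object MetricsSketch {
  // Same shape as the contract: metrics looked up by name.
  val pythonMetrics: Map[String, Metric] = Map(
    "pythonNumRowsReceived" -> new Metric,
    "pythonDataReceived" -> new Metric)

  // Assumed bookkeeping, called once per loaded batch.
  def onBatchLoaded(numRows: Long, numBytes: Long): Unit = {
    pythonMetrics("pythonNumRowsReceived").add(numRows)
    pythonMetrics("pythonDataReceived").add(numBytes)
  }
}
```

For example, calling MetricsSketch.onBatchLoaded(3, 128) and then MetricsSketch.onBatchLoaded(2, 64) leaves the row counter at 5 and the byte counter at 192.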
Implementations¶
- ApplyInPandasWithStatePythonRunner
- BasicPythonArrowOutput