PythonRDD¶
PythonRDD is an RDD (RDD[Array[Byte]]) that uses PythonRunner to compute a partition.
Creating Instance¶
PythonRDD takes the following to be created:
- Parent RDD
- PythonFunction
- preservePartitoning flag
- isFromBarrier flag (default: false)

PythonRDD is created when...FIXME
runJob¶
runJob(
  sc: SparkContext,
  rdd: JavaRDD[Array[Byte]],
  partitions: JArrayList[Int]): Array[Any]

runJob...FIXME
collectAndServe¶
collectAndServe[T](
  rdd: RDD[T]): Array[Any]

collectAndServe...FIXME
collectAndServeWithJobGroup¶
collectAndServeWithJobGroup[T](
  rdd: RDD[T],
  groupId: String,
  description: String,
  interruptOnCancel: Boolean): Array[Any]

collectAndServeWithJobGroup...FIXME
serveIterator Utility¶
serveIterator(
  items: Iterator[_],
  threadName: String): Array[Any]

serveIterator calls serveToStream with a writer function that...FIXME
serveIterator is used when:
- PythonRDD utility is used to runJob, collectAndServe and collectAndServeWithJobGroup
- Dataset is requested to collectToPython, tailToPython, getRowsToPython
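
As a rough illustration of what a writer function for an iterator of byte arrays might do, the following Python sketch frames each item as a 4-byte big-endian length followed by the payload, then reads the frames back. The helper names write_framed and read_framed are hypothetical, and the framing is an assumption made for this example rather than a statement of the actual wire format.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator


def write_framed(items: Iterable[bytes], out: BinaryIO) -> None:
    """Write every item as a 4-byte big-endian length followed by its bytes."""
    for item in items:
        out.write(struct.pack(">i", len(item)))
        out.write(item)


def read_framed(inp: BinaryIO) -> Iterator[bytes]:
    """Yield framed items until the stream is exhausted."""
    while True:
        header = inp.read(4)
        if len(header) < 4:
            return  # end of stream
        (length,) = struct.unpack(">i", header)
        payload = inp.read(length)
        if len(payload) != length:
            raise EOFError("stream ended in the middle of an item")
        yield payload


# Round trip through an in-memory stream (assumed stand-in for a socket stream).
buffer = io.BytesIO()
write_framed([b"a", b"bc", b""], buffer)
buffer.seek(0)
items = list(read_framed(buffer))
```

Length-prefixed framing lets the reader recover item boundaries without a delimiter, so it also works for arbitrary binary payloads such as pickled records.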
serveToStream Utility¶
serveToStream(
  threadName: String)(
  writeFunc: OutputStream => Unit): Array[Any]

serveToStream calls SocketAuthServer.serveToStream with the authHelper and the input arguments.
serveToStream is used when:
- PythonRDD utility is used to serveIterator
- Dataset is requested to collectAsArrowToPython
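
The general pattern of serving data through a local socket guarded by a secret can be sketched roughly as follows. This is a simplified Python illustration, not Spark's implementation: the names serve_to_stream and read_from, the line-based handshake, and the ok/err replies are all assumptions of this sketch.

```python
import hmac
import secrets
import socket
import threading
from typing import BinaryIO, Callable, Tuple


def serve_to_stream(
    thread_name: str, write_func: Callable[[BinaryIO], object]
) -> Tuple[int, str]:
    """Serve the output of write_func to one authenticated local client.

    Returns the port to connect to and the secret the client must present.
    """
    secret = secrets.token_hex(32)
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def handle() -> None:
        try:
            conn, _ = server.accept()
            with conn, conn.makefile("rwb") as stream:
                # The client must send the secret (one line) before any data flows.
                received = stream.readline().rstrip(b"\n")
                if not hmac.compare_digest(received, secret.encode("utf-8")):
                    stream.write(b"err\n")
                    return
                stream.write(b"ok\n")
                write_func(stream)
                stream.flush()
        finally:
            server.close()  # single-use server: one connection, then done

    threading.Thread(target=handle, name=thread_name, daemon=True).start()
    return port, secret


def read_from(port: int, secret: str) -> bytes:
    """Connect, authenticate with the secret, and read everything served."""
    with socket.create_connection(("127.0.0.1", port)) as conn, \
            conn.makefile("rwb") as stream:
        stream.write(secret.encode("utf-8") + b"\n")
        stream.flush()
        if stream.readline() != b"ok\n":
            raise PermissionError("authentication failed")
        return stream.read()


port, secret = serve_to_stream("serve-demo", lambda out: out.write(b"hello"))
data = read_from(port, secret)
```

Handing back only a port and a secret keeps the caller decoupled from the serving thread: whoever holds both values can fetch the data once, and a client that presents the wrong secret gets no payload.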
SocketAuthHelper¶
PythonRDD uses a SocketAuthHelper.