PythonRDD¶
PythonRDD is an RDD (RDD[Array[Byte]]) that uses PythonRunner (to compute a partition).
Creating Instance¶
PythonRDD takes the following to be created:
- Parent
RDD - PythonFunction
-
preservePartitoningflag -
isFromBarrierflag (default:false)
PythonRDD is created when...FIXME
runJob¶
runJob(
sc: SparkContext,
rdd: JavaRDD[Array[Byte]],
partitions: JArrayList[Int]): Array[Any]
runJob...FIXME
collectAndServe¶
collectAndServe[T](
rdd: RDD[T]): Array[Any]
collectAndServe...FIXME
collectAndServeWithJobGroup¶
collectAndServeWithJobGroup[T](
rdd: RDD[T],
groupId: String,
description: String,
interruptOnCancel: Boolean): Array[Any]
collectAndServeWithJobGroup...FIXME
serveIterator Utility¶
serveIterator(
items: Iterator[_],
threadName: String): Array[Any]
serveIterator serveToStream with a writer function that...FIXME
serveIterator is used when:
PythonRDDutility is used to runJob, collectAndServe and collectAndServeWithJobGroupDatasetis requested tocollectToPython,tailToPython,getRowsToPython
serveToStream Utility¶
serveToStream(
threadName: String)(
writeFunc: OutputStream => Unit): Array[Any]
serveToStream serveToStream with the authHelper and the input arguments.
serveToStream is used when:
PythonRDDutility is used to serveIteratorDatasetis requested tocollectAsArrowToPython
SocketAuthHelper¶
PythonRDD uses a SocketAuthHelper.