PythonRDD

PythonRDD is an RDD of byte arrays (RDD[Array[Byte]]) that uses a PythonRunner to compute a partition.

Creating Instance

PythonRDD takes the following to be created:

PythonRDD is created when...FIXME

runJob(
  sc: SparkContext,
  rdd: JavaRDD[Array[Byte]],
  partitions: JArrayList[Int]): Array[Any]

runJob...FIXME

collectAndServe[T](
  rdd: RDD[T]): Array[Any]

collectAndServe...FIXME

collectAndServeWithJobGroup[T](
  rdd: RDD[T],
  groupId: String,
  description: String,
  interruptOnCancel: Boolean): Array[Any]

collectAndServeWithJobGroup...FIXME

serveIterator(
  items: Iterator[_],
  threadName: String): Array[Any]

serveIterator calls serveToStream with a writer function that...FIXME

serveIterator is used when:

PythonRDD utility is used to runJob, collectAndServe and collectAndServeWithJobGroup
Dataset is requested to collectToPython, tailToPython and getRowsToPython
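
The writer function that serveIterator hands to serveToStream is left as FIXME above. As a rough illustration only, a writer of type OutputStream => Unit could frame each item as a length followed by its bytes. The names FramedWriterSketch, writeFramed and readFramed are hypothetical, and the framing itself is an assumption made for this sketch, not something the text above states about PythonRDD.

```scala
import java.io.{ByteArrayInputStream, DataInputStream, DataOutputStream, OutputStream}

// Hypothetical sketch, not Spark code.
object FramedWriterSketch {

  // Writes every item as a 4-byte big-endian length followed by the raw bytes.
  // The second parameter list makes `writeFramed(items)` usable as an
  // `OutputStream => Unit` writer function.
  def writeFramed(items: Iterator[Array[Byte]])(out: OutputStream): Unit = {
    val dataOut = new DataOutputStream(out)
    items.foreach { bytes =>
      dataOut.writeInt(bytes.length)
      dataOut.write(bytes)
    }
    dataOut.flush()
  }

  // Reads the frames back as UTF-8 strings, as a consumer of the stream might.
  def readFramed(bytes: Array[Byte]): List[String] = {
    val in = new DataInputStream(new ByteArrayInputStream(bytes))
    val result = List.newBuilder[String]
    while (in.available() > 0) {
      val length = in.readInt()
      val payload = new Array[Byte](length)
      in.readFully(payload)
      result += new String(payload, "UTF-8")
    }
    result.result()
  }
}
```

Because the items are consumed through an Iterator, the writer never needs the whole result in memory at once, which is presumably why the signature takes Iterator[_] rather than a collection.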

serveToStream(
  threadName: String)(
  writeFunc: OutputStream => Unit): Array[Any]

serveToStream delegates to SocketAuthServer.serveToStream, passing the authHelper and the input arguments.
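
The general shape of serving data over an authenticated local socket can be sketched without any Spark classes. Everything below is a simplified assumption: ServeToStreamSketch and fetch are hypothetical names, the handshake (the client sends a shared secret first) merely stands in for whatever the authHelper does, and returning the port and the secret in the Array[Any] is an illustrative choice.

```scala
import java.io.{BufferedOutputStream, DataInputStream, DataOutputStream, OutputStream}
import java.net.{InetAddress, ServerSocket, Socket}
import scala.io.Source

// Hypothetical sketch, not Spark code.
object ServeToStreamSketch {

  // Starts a one-shot server on a daemon thread and returns Array(port, secret).
  def serveToStream(threadName: String, secret: String)(
      writeFunc: OutputStream => Unit): Array[Any] = {
    // Port 0 asks the OS for a free port; bind to the loopback interface only.
    val server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress)
    val port = server.getLocalPort
    val thread = new Thread(threadName) {
      override def run(): Unit = {
        try {
          val socket = server.accept()
          try {
            // Assumed handshake: the client sends the secret before any data flows.
            val in = new DataInputStream(socket.getInputStream)
            if (in.readUTF() == secret) {
              val out = new BufferedOutputStream(socket.getOutputStream)
              writeFunc(out)
              out.flush()
            }
          } finally socket.close()
        } finally server.close()
      }
    }
    thread.setDaemon(true)
    thread.start()
    Array(port, secret)
  }

  // Client side: authenticate, then read everything until the server closes.
  def fetch(port: Int, secret: String): String = {
    val socket = new Socket(InetAddress.getLoopbackAddress, port)
    try {
      val out = new DataOutputStream(socket.getOutputStream)
      out.writeUTF(secret)
      out.flush()
      Source.fromInputStream(socket.getInputStream, "UTF-8").mkString
    } finally socket.close()
  }
}
```

In this sketch the caller gets back only connection details, and the actual bytes travel over the socket once a client connects and authenticates. A client that sends the wrong secret receives an empty stream because the server closes the connection without invoking writeFunc.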

serveToStream is used when:

PythonRDD uses a SocketAuthHelper.