Skip to content

PythonRDD

PythonRDD is an RDD (RDD[Array[Byte]]) that uses PythonRunner (to compute a partition).

Creating Instance

PythonRDD takes the following to be created:

  • Parent RDD
  • PythonFunction
  • preservePartitoning flag
  • isFromBarrier flag (default: false)

PythonRDD is created when...FIXME

runJob

runJob(
  sc: SparkContext,
  rdd: JavaRDD[Array[Byte]],
  partitions: JArrayList[Int]): Array[Any]

runJob...FIXME

collectAndServe

collectAndServe[T](
  rdd: RDD[T]): Array[Any]

collectAndServe...FIXME

collectAndServeWithJobGroup

collectAndServeWithJobGroup[T](
  rdd: RDD[T],
  groupId: String,
  description: String,
  interruptOnCancel: Boolean): Array[Any]

collectAndServeWithJobGroup...FIXME

serveIterator Utility

serveIterator(
  items: Iterator[_],
  threadName: String): Array[Any]

serveIterator serveToStream with a writer function that...FIXME

serveIterator is used when:

serveToStream Utility

serveToStream(
  threadName: String)(
  writeFunc: OutputStream => Unit): Array[Any]

serveToStream serveToStream with the authHelper and the input arguments.

serveToStream is used when:

  • PythonRDD utility is used to serveIterator
  • Dataset is requested to collectAsArrowToPython

SocketAuthHelper

PythonRDD uses a SocketAuthHelper.