PythonWorkerFactory¶
Creating Instance¶
PythonWorkerFactory takes the following to be created:
- Python Executable
- Environment Variables (Map[String, String])
PythonWorkerFactory is created when SparkEnv is requested to createPythonWorker (when BasePythonRunner is requested to compute a partition).
useDaemon Flag¶
PythonWorkerFactory uses the useDaemon internal flag, which is the value of the spark.python.use.daemon configuration property, to decide whether to use lighter daemon workers or non-daemon workers.
The useDaemon flag is used when PythonWorkerFactory is requested to create, stop or release a worker, and to stop the daemon module.
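The way the flag could be derived from configuration can be sketched as follows. This is a minimal illustration, not the actual Spark code: a plain Map stands in for SparkConf, resolveUseDaemon is a hypothetical helper, and both the default of true and the Windows check are assumptions based on how Spark is generally understood to behave.

```scala
// A sketch only: a plain Map stands in for SparkConf,
// and resolveUseDaemon is a hypothetical helper name.
object UseDaemonSketch {
  def resolveUseDaemon(conf: Map[String, String], osName: String): Boolean = {
    // Assumption: the property is treated as true when it is not set.
    val enabled = conf.getOrElse("spark.python.use.daemon", "true").toBoolean
    // Assumption: daemon mode is skipped on Windows, which cannot fork.
    enabled && !osName.startsWith("Windows")
  }
}
```

For example, UseDaemonSketch.resolveUseDaemon(Map("spark.python.use.daemon" -> "false"), "Linux") selects non-daemon workers.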
Python Daemon Module¶
PythonWorkerFactory uses spark.python.daemon.module configuration property to define the Python Daemon Module.
The Python Daemon Module is used when PythonWorkerFactory is requested to create and start a daemon module.
Python Worker Module¶
PythonWorkerFactory uses spark.python.worker.module configuration property to specify the Python Worker Module.
The Python Worker Module is used when PythonWorkerFactory is requested to create and start a worker.
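How the daemon and worker modules might be resolved and turned into a launch command can be sketched as follows. This is an illustration under stated assumptions: a plain Map stands in for SparkConf, ModuleSketch and its methods are hypothetical names, the fallbacks pyspark.daemon and pyspark.worker are assumed defaults, and launching the module with the Python -m option is an assumption about how the process is started.

```scala
// A sketch only: ModuleSketch and its methods are hypothetical names.
object ModuleSketch {
  // Assumption: pyspark.daemon is used when the property is not set.
  def daemonModule(conf: Map[String, String]): String =
    conf.getOrElse("spark.python.daemon.module", "pyspark.daemon")

  // Assumption: pyspark.worker is used when the property is not set.
  def workerModule(conf: Map[String, String]): String =
    conf.getOrElse("spark.python.worker.module", "pyspark.worker")

  // Assumption: the module is launched as `<python executable> -m <module>`.
  def launchCommand(pythonExec: String, module: String): Seq[String] =
    Seq(pythonExec, "-m", module)
}
```

With an empty configuration, ModuleSketch.launchCommand("python3", ModuleSketch.daemonModule(Map.empty)) yields the arguments for running the assumed default daemon module.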
Creating Python Worker¶
create(): Socket
create ...FIXME
create is used when SparkEnv is requested to createPythonWorker.
Creating Daemon Worker¶
createThroughDaemon(): Socket
createThroughDaemon ...FIXME
createThroughDaemon is used when PythonWorkerFactory is requested to create a Python worker (with useDaemon flag enabled).
Starting Python Daemon Process¶
startDaemon(): Unit
startDaemon ...FIXME
Creating Simple Non-Daemon Worker¶
createSimpleWorker(): Socket
createSimpleWorker ...FIXME
createSimpleWorker is used when PythonWorkerFactory is requested to create a Python worker (with useDaemon flag disabled).
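The choice between the two creation paths can be sketched as a simple dispatch on the useDaemon flag. This is a minimal illustration, not the actual implementation: WorkerFactorySketch is a hypothetical stand-in for PythonWorkerFactory, the two function parameters are placeholders for createThroughDaemon and createSimpleWorker, and the real create may do more than this.

```scala
import java.net.Socket

// A sketch only: WorkerFactorySketch is a hypothetical stand-in for
// PythonWorkerFactory. The two functions are placeholders for
// createThroughDaemon and createSimpleWorker.
class WorkerFactorySketch(
    useDaemon: Boolean,
    createThroughDaemon: () => Socket,
    createSimpleWorker: () => Socket) {

  // Dispatch on the flag: daemon path when enabled, simple worker otherwise.
  def create(): Socket =
    if (useDaemon) createThroughDaemon() else createSimpleWorker()
}
```

Passing the creators as functions keeps the sketch free of any process or network setup, so only the dispatch itself is shown.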
Logging¶
Enable ALL logging level for org.apache.spark.api.python.PythonWorkerFactory logger to see what happens inside.
Add the following line to conf/log4j.properties:
log4j.logger.org.apache.spark.api.python.PythonWorkerFactory=ALL
Refer to Logging.