PythonGatewayServer
PythonGatewayServer is a command-line application (process) that starts a Py4JServer on an ephemeral port.
PythonGatewayServer is the Python runner for the pyspark shell script (Spark Core).
main
main creates a Py4JServer and requests it to start.
main then asks the Py4JServer for its listening port (boundPort) and prints out the following DEBUG message to the logs:
Started PythonGatewayServer on port [boundPort]
main uses the _PYSPARK_DRIVER_CONN_INFO_PATH environment variable as the path of a connection info file, to which it writes the listening port and the secret for the associated Python process to read.
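The connection info file is a small binary record. The Python sketch below mirrors the layout as it appears in Spark's PythonGatewayServer source and the matching read in java_gateway.py (treat the exact layout as an assumption to verify against your Spark version): a 4-byte big-endian port, a 4-byte big-endian secret length, then the secret as UTF-8 bytes. The record is written to a temporary file in the same directory first and then renamed, so a reader never sees a half-written file. The helper names `write_conn_info` and `read_conn_info` are hypothetical.

```python
import os
import struct
import tempfile
from typing import Tuple


def write_conn_info(path: str, port: int, secret: str) -> None:
    """Hypothetical helper: write (port, secret) the way the JVM side is assumed to."""
    secret_bytes = secret.encode("utf-8")
    # Write into a temporary file next to the final path first...
    fd, tmp_path = tempfile.mkstemp(prefix="connection", suffix=".info",
                                    dir=os.path.dirname(path))
    with os.fdopen(fd, "wb") as out:
        out.write(struct.pack("!i", port))               # 4-byte big-endian port
        out.write(struct.pack("!i", len(secret_bytes)))  # 4-byte big-endian length
        out.write(secret_bytes)                          # UTF-8 encoded secret
    # ...then rename it, so the file appears under its final name only when complete.
    os.rename(tmp_path, path)


def read_conn_info(path: str) -> Tuple[int, str]:
    """Hypothetical helper: read (port, secret) back, as the Python side is assumed to."""
    with open(path, "rb") as info:
        (port,) = struct.unpack("!i", info.read(4))
        (length,) = struct.unpack("!i", info.read(4))
        secret = info.read(length).decode("utf-8")
    return port, secret
```

The `!i` format matches Java's `DataOutputStream.writeInt`, which always writes a big-endian 32-bit integer.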
main then pauses (blocks) until the Python driver finishes. It does so by reading from the system input, which blocks until input data is available, the end of the stream is detected, or an exception is thrown.
In the end, once the Python driver finishes, main prints out the following DEBUG message to the logs:
Exiting due to broken pipe from Python driver
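The "block until the driver goes away" idea can be sketched in Python as a loop that reads and discards input until end-of-stream. This is an illustration of the technique, not Spark's code: on the JVM side the loop reads `System.in` until `read()` returns -1, whereas here `read(1)` returning `b""` plays that role. The names `block_until_eof` and `demo` are hypothetical, and the pipe in `demo` merely simulates a driver that closes its end.

```python
import os
from typing import BinaryIO


def block_until_eof(stream: BinaryIO) -> int:
    """Read and discard bytes until EOF; return how many bytes were consumed."""
    consumed = 0
    # read(1) blocks until a byte arrives, and returns b"" once the writer is gone.
    while stream.read(1):
        consumed += 1
    return consumed


def demo() -> int:
    """Simulate a driver that writes a few bytes and then closes its end of the pipe."""
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"abc")
    os.close(write_fd)  # the "driver" exits, so the reader will see EOF
    with os.fdopen(read_fd, "rb") as gateway_stdin:
        return block_until_eof(gateway_stdin)
```

In a real process, `block_until_eof(sys.stdin.buffer)` would return only after the parent closes the pipe connected to the child's standard input, which is what ties the gateway's lifetime to the Python driver's.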
main prints out the following ERROR message to the logs and exits when the listening port is -1 (i.e., the server failed to bind):
[server] failed to bind; exiting
_PYSPARK_DRIVER_CONN_INFO_PATH
PythonGatewayServer uses the _PYSPARK_DRIVER_CONN_INFO_PATH environment variable for the path of a connection info file used for communication between the JVM (PythonGatewayServer) and the Python process.
_PYSPARK_DRIVER_CONN_INFO_PATH is set when the java_gateway.py module is requested to launch_gateway.
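On the Python side, the launcher has to pick a file path that does not exist yet, pass it to the child process through _PYSPARK_DRIVER_CONN_INFO_PATH, and wait for the gateway to create the file. The sketch below follows that shape as it appears in java_gateway.py, but the helper names `prepare_conn_info_env` and `wait_for_conn_info` are hypothetical, and the `timeout` parameter is an addition of this sketch that the original polling loop may not have.

```python
import os
import tempfile
import time
from typing import Callable, Dict, Tuple


def prepare_conn_info_env() -> Tuple[str, str, Dict[str, str]]:
    """Hypothetical helper: choose a fresh path and expose it via the environment."""
    conn_info_dir = tempfile.mkdtemp()
    fd, conn_info_file = tempfile.mkstemp(dir=conn_info_dir)
    os.close(fd)
    os.unlink(conn_info_file)  # the gateway, not the launcher, must create this file
    env = dict(os.environ)
    env["_PYSPARK_DRIVER_CONN_INFO_PATH"] = conn_info_file
    return conn_info_dir, conn_info_file, env


def wait_for_conn_info(path: str, child_alive: Callable[[], bool],
                       timeout: float = 10.0, interval: float = 0.1) -> bool:
    """Poll until the file appears, the child exits, or the (assumed) timeout expires."""
    deadline = time.monotonic() + timeout
    while child_alive() and not os.path.isfile(path):
        if time.monotonic() >= deadline:
            break
        time.sleep(interval)
    return os.path.isfile(path)
```

A launcher would start the gateway with something like `proc = subprocess.Popen(command, stdin=subprocess.PIPE, env=env)` (the command itself is not shown here) and call `wait_for_conn_info(path, lambda: proc.poll() is None)`. Opening stdin as a pipe is what lets the gateway notice a broken pipe and exit when the Python driver dies.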
Logging
Enable ALL logging level for the org.apache.spark.api.python.PythonGatewayServer logger to see what happens inside.
Add the following lines to conf/log4j2.properties:
logger.PythonGatewayServer.name = org.apache.spark.api.python.PythonGatewayServer
logger.PythonGatewayServer.level = all
Refer to Logging.