Spark Python QA

Q: PySpark: java.lang.OutOfMemoryError: Java heap space

java.lang.OutOfMemoryError: Java heap space

A:

spark_conf.setAppName("recommend").setMaster("local[*]") \
    .set('spark.executor.memory', '12g') \
    .set('spark.driver.memory', '14g')
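One caveat: in local and client deploy modes the driver JVM is already running by the time a `SparkConf` set in code takes effect, so `spark.driver.memory` may be ignored there; it can instead be passed via `spark-submit --driver-memory 14g` or `conf/spark-defaults.conf`. A minimal sketch of supplying both settings before the session is created (the 12g/14g values come from the answer above; adjust to your machine):

```python
from pyspark.sql import SparkSession

# spark.driver.memory only takes effect if the driver JVM has not started
# yet, so configure it before the first getOrCreate() call.
spark = (
    SparkSession.builder
    .appName("recommend")
    .master("local[*]")
    .config("spark.driver.memory", "14g")    # heap for the driver JVM
    .config("spark.executor.memory", "12g")  # heap per executor
    .getOrCreate()
)
```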

Q: Please install psutil to have better support with spilling

UserWarning: Please install psutil to have better support with spilling

A:

pip install psutil

Q: {0}.{1} does not exist in the JVM

"{0}.{1} does not exist in the JVM".format(self._fqn, name))

A:

The installed pyspark package does not match the Spark version; install a pyspark release whose version matches the Spark installation.
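Aligning the versions means matching at least the major.minor components: compare `pyspark.__version__` against what `spark-submit --version` reports, and reinstall with e.g. `pip install pyspark==3.2.0` if they disagree. A small illustrative sketch of the check (the helper name `versions_match` is hypothetical, not a PySpark API):

```python
def versions_match(pyspark_version: str, spark_version: str) -> bool:
    """Return True when the major.minor components agree, e.g. 3.2.1 vs 3.2.0."""
    return pyspark_version.split(".")[:2] == spark_version.split(".")[:2]

print(versions_match("3.2.1", "3.2.0"))  # True  -- same major.minor
print(versions_match("3.1.2", "3.2.0"))  # False -- mismatch triggers the JVM error
```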

Q: Python worker failed to connect back

22/06/16 12:20:00 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1) (192.168.3.96 executor 0): org.apache.spark.SparkException: Python worker failed to connect back.
at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:188)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:108)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:121)
at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:162)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
A:

Set these environment variables so the workers launch the same Python interpreter as the driver:
PYSPARK_PYTHON=D:\ProgramData\Anaconda3\python.exe
PYSPARK_DRIVER_PYTHON=D:\ProgramData\Anaconda3\python.exe
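The same variables can also be set from Python before the session is created; using `sys.executable` (an assumption here, not part of the original answer) avoids hard-coding an absolute path like the Anaconda one above:

```python
import os
import sys

# The workers must be able to launch the same interpreter the driver runs;
# sys.executable points at the current Python, so no hard-coded path is needed.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

# Set these BEFORE creating the session, e.g.:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.master("local[*]").getOrCreate()
```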