I'm afraid that's not entirely correct. PySpark interacts with the JVM's memory through the py4j module, and you can also reach that memory via the Java gateway.
For example, let's inspect the Spark JVM's classpath (these are the Java classes that may end up loaded into "user memory"):

$ pyspark
Python 3.11.2 (main, Feb 17 2023, 09:28:16) [GCC 8.5.0 20210514 (Red Hat 8.5.0-18)] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.4.0
      /_/
Using Python version 3.11.2 (main, Feb 17 2023 09:28:16)
Spark context Web UI available at http://xxx.xxx.xxx:4040
Spark context available as 'sc' (master = yarn, app id = application_9999999999999_9999).
SparkSession available as 'spark'.
>>>
>>> cp = spark._jvm.System.getProperty("java.class.path")
>>> for jar in sorted(cp.split(":")): print(jar)
...
/etc/hive/conf/
/opt/spark340/lib/hadoop/client/avro.jar
/opt/spark340/lib/hadoop/client/aws-java-sdk-bundle-1.12.599.jar
/opt/spark340/lib/hadoop/client/aws-java-sdk-bundle.jar
/opt/spark340/lib/hadoop/client/azure-data-lake-store-sdk-2.3.6.jar
/opt/spark340/lib/hadoop/client/azure-data-lake-store-sdk.jar
/opt/spark340/lib/hadoop/client/checker-qual-2.8.1.jar
:
:
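
Beyond the classpath, the same py4j gateway can query the JVM's own memory accounting. A minimal sketch, run from the same shell (the numbers will differ per cluster, and the integer division is just to print MiB):

>>> rt = spark._jvm.java.lang.Runtime.getRuntime()
>>> print(rt.maxMemory() // (1024 * 1024), "MiB max heap,",
...       rt.totalMemory() // (1024 * 1024), "MiB currently allocated")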
To add one more point: your diagram (a link to its source would have helped!) is neither accurate nor complete. Spark has used a unified memory model since version 2.0, so there is no hard split between execution and storage memory (the corresponding configuration settings were deprecated and removed in 3.0). In PySpark there is additionally an option to allocate memory specifically for Python. For the full list of configuration settings, see https://spark.apache.org/docs/latest/configuration.html#application-properties
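
As a hedged sketch of those knobs (the values below are placeholders, not tuning advice): spark.memory.fraction sizes the unified pool, spark.memory.storageFraction sets the soft boundary between storage and execution inside it, and spark.executor.pyspark.memory is the Python-specific allocation mentioned above. Memory settings have to be in place before the JVM starts, so pass them at launch rather than from a running shell:

$ pyspark \
    --conf spark.memory.fraction=0.6 \
    --conf spark.memory.storageFraction=0.5 \
    --conf spark.executor.pyspark.memory=1g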