Question

K4liber

Asked: 2024-08-03 16:11:39 +0800 CST

What causes Python 3.13.0b3 (compiled with the GIL disabled) to be slower than 3.12.0?

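I ran a simple performance test comparing Python 3.12.0 with Python 3.13.0b3 compiled with the --disable-gil flag. The program computes Fibonacci numbers using either ThreadPoolExecutor or ProcessPoolExecutor. The PEP that introduces the GIL-free build says there is some overhead, mainly due to biased reference counting and per-object locking (https://peps.python.org/pep-0703/#performance), but it puts that overhead at roughly 5-8% on the pyperformance benchmark suite. My simple benchmark shows a far larger difference. Python 3.13 without the GIL does use all CPUs with ThreadPoolExecutor, yet it is much slower than Python 3.12 with the GIL. Judging by CPU utilization and elapsed time, Python 3.13 spends several times more clock cycles than 3.12.

Program code: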

from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import datetime
from functools import partial
import sys
import logging
import multiprocessing

logging.basicConfig(
    format='%(levelname)s: %(message)s',
)
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
cpus = multiprocessing.cpu_count()
pool_executor = ProcessPoolExecutor if len(sys.argv) > 1 and sys.argv[1] == '1' else ThreadPoolExecutor
python_version_str = f'{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}'
logger.info(f'Executor={pool_executor.__name__}, python={python_version_str}, cpus={cpus}')


def fibonacci(n: int) -> int:
    if n < 0:
        raise ValueError("Incorrect input")
    elif n == 0:
        return 0
    elif n == 1 or n == 2:
        return 1
    else:
        return fibonacci(n-1) + fibonacci(n-2)

start = datetime.datetime.now()

with pool_executor(8) as executor:
    for task_id in range(30):
        executor.submit(partial(fibonacci, 30))

    executor.shutdown(wait=True)

end = datetime.datetime.now()
elapsed = end - start
logger.info(f'Elapsed: {elapsed.total_seconds():.2f} seconds')
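
The script above discards the futures returned by executor.submit, so an exception raised inside a worker would go unnoticed. Below is a minimal sketch of a variant that collects the results and times the run with time.perf_counter. The name run_benchmark and its parameters are hypothetical and not part of the original program, and the demo call uses smaller values than the original 30 tasks of fibonacci(30) so that it finishes quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fibonacci(n: int) -> int:
    """Same naive recursive definition as in the question."""
    if n < 0:
        raise ValueError("Incorrect input")
    if n == 0:
        return 0
    if n in (1, 2):
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)


def run_benchmark(executor_cls, workers: int, tasks: int, n: int):
    """Hypothetical helper: run `tasks` calls of fibonacci(n) and time them."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as executor:
        futures = [executor.submit(fibonacci, n) for _ in range(tasks)]
        # result() re-raises any exception that occurred in a worker.
        results = [future.result() for future in futures]
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    # Smaller than the question's workload (30 tasks, n=30) to keep the demo short.
    results, elapsed = run_benchmark(ThreadPoolExecutor, workers=8, tasks=8, n=25)
    print(f"{len(results)} tasks finished in {elapsed:.2f} seconds")
```

Leaving the with block already waits for all submitted tasks, so a separate executor.shutdown(wait=True) call is not needed in this variant.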

Test results:

# TEST Linux 5.15.0-58-generic, Ubuntu 20.04.6 LTS

INFO: Executor=ThreadPoolExecutor, python=3.12.0, cpus=2
INFO: Elapsed: 10.54 seconds

INFO: Executor=ProcessPoolExecutor, python=3.12.0, cpus=2
INFO: Elapsed: 4.33 seconds

INFO: Executor=ThreadPoolExecutor, python=3.13.0b3, cpus=2
INFO: Elapsed: 22.48 seconds

INFO: Executor=ProcessPoolExecutor, python=3.13.0b3, cpus=2
INFO: Elapsed: 22.03 seconds

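Can anyone explain why the overhead I measure is so much larger than the overhead reported for the pyperformance benchmark suite?

Edit 1

I tried replacing pool_executor(8) with pool_executor(cpus) and still got similar results.
I watched this video https://www.youtube.com/watch?v=zWPe_CUR4yU and ran the following test: https://github.com/ArjanCodes/examples/blob/main/2024/gil/main.py

Results: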

Version of python: 3.12.0a7 (main, Oct  8 2023, 12:41:37) [GCC 9.4.0]
GIL cannot be disabled
Single-threaded: 78498 primes in 6.67 seconds
Threaded: 78498 primes in 7.89 seconds
Multiprocessed: 78498 primes in 5.85 seconds

Version of python: 3.13.0b3 experimental free-threading build (heads/3.13.0b3:7b413952e8, Jul 27 2024, 11:19:31) [GCC 9.4.0]
GIL is disabled
Single-threaded: 78498 primes in 61.42 seconds
Threaded: 78498 primes in 32.29 seconds
Multiprocessed: 78498 primes in 39.85 seconds

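So on my machine this test again shows performance that is several times worse. By the way, the video shows overhead results similar to those described in the PEP.

Edit 2

As suggested by @ekhumoro, I configured the build with the following flags:
./configure --disable-gil --enable-optimizations
It seems the --enable-optimizations flag makes a significant difference in the benchmarks considered. The previous build was configured with:
./configure --with-pydebug --disable-gil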

Test results:

Fibonacci benchmark:

INFO: Executor=ThreadPoolExecutor, python=3.12.0, cpus=2
INFO: Elapsed: 10.25 seconds

INFO: Executor=ProcessPoolExecutor, python=3.12.0, cpus=2
INFO: Elapsed: 4.27 seconds

INFO: Executor=ThreadPoolExecutor, python=3.13.0, cpus=2
INFO: Elapsed: 6.94 seconds

INFO: Executor=ProcessPoolExecutor, python=3.13.0, cpus=2
INFO: Elapsed: 6.94 seconds

Primes benchmark:

Version of python: 3.12.0a7 (main, Oct  8 2023, 12:41:37) [GCC 9.4.0]
GIL cannot be disabled
Single-threaded: 78498 primes in 5.77 seconds
Threaded: 78498 primes in 7.21 seconds
Multiprocessed: 78498 primes in 3.23 seconds

Version of python: 3.13.0b3 experimental free-threading build (heads/3.13.0b3:7b413952e8, Aug  3 2024, 14:47:48) [GCC 9.4.0]
GIL is disabled
Single-threaded: 78498 primes in 7.99 seconds
Threaded: 78498 primes in 4.17 seconds
Multiprocessed: 78498 primes in 4.40 seconds

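So the general benefit of moving from Python 3.12 multiprocessing to Python 3.13 no-GIL multithreading is a significant memory saving (there is only a single process).

Comparing the CPU overhead on a machine with only 2 cores:

[Fibonacci] Python 3.13 multithreading vs. Python 3.12 multiprocessing: (6.94 - 4.27) / 4.27 * 100% ~= 63% overhead

[Primes] Python 3.13 multithreading vs. Python 3.12 multiprocessing: (4.17 - 3.23) / 3.23 * 100% ~= 29% overhead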
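
The two overhead figures above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; relative_overhead is a hypothetical helper name, and the timings are the ones reported in the results above.

```python
def relative_overhead(candidate_s: float, baseline_s: float) -> float:
    """Return how much slower the candidate is than the baseline, in percent."""
    return (candidate_s - baseline_s) / baseline_s * 100.0


# Timings in seconds, taken from the Edit 2 results.
fibonacci_overhead = relative_overhead(6.94, 4.27)  # 3.13 threads vs 3.12 processes
primes_overhead = relative_overhead(4.17, 3.23)     # 3.13 threads vs 3.12 processes

print(f"Fibonacci: {fibonacci_overhead:.0f}% overhead")
print(f"Primes: {primes_overhead:.0f}% overhead")
```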

2 Answers

gill bates · Answer 1 · 2024-08-03T17:02:56+08:00

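Your code is a heavily recursive function with exponential time complexity. Every call to fibonacci(30) can result in many redundant computations. It is not an efficient way to compute Fibonacci numbers; the work is CPU-bound and computationally expensive.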
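
To make the redundancy concrete, here is a small sketch that counts the calls made by the naive recursion and contrasts it with a linear-time loop. Both function names are hypothetical and not taken from the question. With the question's base cases, a single fibonacci(30) makes roughly 1.66 million calls.

```python
def count_recursive_calls(n: int) -> int:
    """Count the calls the question's naive recursive fibonacci(n) makes."""
    if n < 0:
        raise ValueError("Incorrect input")
    if n <= 2:
        # Base cases (0, 1, 2) return immediately: exactly one call.
        return 1
    # One call for this frame plus the calls made by both branches.
    return 1 + count_recursive_calls(n - 1) + count_recursive_calls(n - 2)


def fibonacci_iterative(n: int) -> int:
    """Compute the same value with n loop iterations instead of recursion."""
    if n < 0:
        raise ValueError("Incorrect input")
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

The iterative version is useful as a correctness check, but the recursive one remains a reasonable CPU-bound workload for a threading benchmark, which is what the question uses it for.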

https://www.youtube.com/watch?v=Q83nN97LVOU

1

ekhumoro · Answer 2 · 2024-08-03T21:35:52+08:00

Best Answer

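Judging from the latest edit to the question, the Python 3.13 build used for testing appears to have been built with debug mode enabled and without optimizations. The former flag in particular has a large impact on performance tests, while the latter has a much smaller but still significant effect. In general, it is best to avoid drawing any conclusions about performance when testing with development versions of Python.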
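
Before benchmarking, it can help to confirm how the interpreter was built. The sketch below gathers a few indicators from the standard library; describe_build is a hypothetical helper name. It assumes that sys.gettotalrefcount is present only on --with-pydebug builds and that sys._is_gil_enabled is available only on Python 3.13 and later, so the code falls back gracefully on older versions.

```python
import sys
import sysconfig


def describe_build() -> dict:
    """Collect build properties that strongly affect benchmark results."""
    # sys._is_gil_enabled() exists only on Python 3.13+.
    gil_check = getattr(sys, "_is_gil_enabled", None)
    return {
        "version": sys.version,
        # Py_GIL_DISABLED is 1 on free-threaded builds, otherwise 0 or None.
        "free_threaded_build": bool(sysconfig.get_config_var("Py_GIL_DISABLED")),
        # Older interpreters always run with the GIL enabled.
        "gil_enabled_at_runtime": gil_check() if gil_check else True,
        # Assumption: this attribute is only defined on debug builds.
        "debug_build": hasattr(sys, "gettotalrefcount"),
        # ./configure arguments recorded at build time (may be None on some platforms).
        "configure_args": sysconfig.get_config_var("CONFIG_ARGS"),
    }


if __name__ == "__main__":
    for key, value in describe_build().items():
        print(f"{key}: {value}")
```

If debug_build is true, or configure_args lacks --enable-optimizations, timings from that interpreter are unlikely to be comparable with a release build.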

1
