Questions asked by Ξένη Γήινος - coding

Ξένη Γήινος

Asked: 2025-04-14 22:18:46 +0800 CST

What is a faster way to find all unique partitions of an integer and their weights?

6

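I have seen many posts on this topic, but none of them is what I am looking for.

I want to find all the ways to express a positive integer N greater than 1 as a sum of at most N integers from 1 to N, as usual.

For example, in the standard notation, these are all the partitions of 6: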

[(1, 1, 1, 1, 1, 1),
 (1, 1, 1, 1, 2),
 (1, 1, 1, 2, 1),
 (1, 1, 1, 3),
 (1, 1, 2, 1, 1),
 (1, 1, 2, 2),
 (1, 1, 3, 1),
 (1, 1, 4),
 (1, 2, 1, 1, 1),
 (1, 2, 1, 2),
 (1, 2, 2, 1),
 (1, 2, 3),
 (1, 3, 1, 1),
 (1, 3, 2),
 (1, 4, 1),
 (1, 5),
 (2, 1, 1, 1, 1),
 (2, 1, 1, 2),
 (2, 1, 2, 1),
 (2, 1, 3),
 (2, 2, 1, 1),
 (2, 2, 2),
 (2, 3, 1),
 (2, 4),
 (3, 1, 1, 1),
 (3, 1, 2),
 (3, 2, 1),
 (3, 3),
 (4, 1, 1),
 (4, 2),
 (5, 1),
 (6,)]

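Now, this notation has very low entropy. First, every occurrence of a number increases the size of the partition, which is inefficient, and it is hard to count occurrences when a number appears multiple times. I want to replace all occurrences of a number with a two-element tuple, in which the first element is the number and the second is the count. For example, (1, 1, 1, 1, 1, 1) is equivalent to (1, 6): both contain the same information, but the latter is clearly more concise.

Second, the output contains many duplicates. For example, there are five partitions that each contain four 1s and one 2, and they are counted as five separate elements. This is also inefficient: addition is commutative, so changing the order of the numbers does not change the result, and they are all equivalent, all the same element.

However, if we replace all five of them with a single element, we lose information.

I want to replace it with the following format: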

Counter({((1, 2), (2, 2)): 6,
         ((1, 1), (2, 1), (3, 1)): 6,
         ((1, 4), (2, 1)): 5,
         ((1, 3), (3, 1)): 4,
         ((1, 2), (4, 1)): 3,
         ((1, 1), (5, 1)): 2,
         ((2, 1), (4, 1)): 2,
         ((1, 6),): 1,
         ((2, 3),): 1,
         ((3, 2),): 1,
         ((6, 1),): 1})

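So I want the result to be a Counter in which the keys are the unique partitions and the values are the number of ways the numbers can be arranged.

Yes, I have already written a function for this, using brute force and memoization. It turned out to be quite efficient.

First, here is an implementation that outputs the standard format; I post it here for comparison: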

def partitions(number: int) -> list[tuple[int, ...]]:
    result = []
    stack = [(number, ())]

    while stack:
        remaining, path = stack.pop()
        if not remaining:
            result.append(path)
        else:
            stack.extend((remaining - i, path + (i,)) for i in range(remaining, 0, -1))

    return result

Finding all partitions of 20 takes 582 ms in CPython and 200 ms in PyPy3:

CPython

In [22]: %timeit partitions(20)
582 ms ± 4.22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

PyPy3

In [36]: %timeit partitions(20)
199 ms ± 3.17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Now, the brute-force approach with memoization, which outputs the intended format:

from collections import Counter

PARTITION_COUNTERS = {}


def partition_counter(number: int) -> Counter:
    if result := PARTITION_COUNTERS.get(number):
        return result
    
    result = Counter()
    for i in range(1, number):
        for run, count in partition_counter(number - i).items():
            new_run = []
            added = False
            for a, b in run:
                if a == i:
                    new_run.append((a, b + 1))
                    added = True
                else:
                    new_run.append((a, b))
            
            if not added:
                new_run.append((i, 1))
            
            result[tuple(sorted(new_run))] += count
    
    result[((number, 1),)] = 1
    PARTITION_COUNTERS[number] = result
    return result

CPython

In [23]: %timeit PARTITION_COUNTERS.clear(); partition_counter(20)
10.4 ms ± 72.1 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

PyPy3

In [37]: %timeit PARTITION_COUNTERS.clear(); partition_counter(20)
9.75 ms ± 58.3 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

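Finding all partitions of 20 takes only about 10 ms, which is much faster than the first function, and PyPy3 does not make it any faster.

But how can we do better? After all, I am just using brute force. I know there are many smart algorithms for integer partitioning, but none of them generates output in the intended format.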
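
One possible direction, sketched here under assumptions rather than offered as a measured improvement: enumerate each unordered partition exactly once as runs of (value, count), and compute its weight directly as the multinomial coefficient m! / (c1! * c2! * ...), where m is the total number of parts and the c's are the run counts. The function name partition_weights is hypothetical, and this has not been benchmarked against partition_counter.

```python
from collections import Counter
from math import factorial


def partition_weights(number: int) -> Counter:
    """Map each unordered partition, written as ((value, count), ...),
    to the number of distinct orderings of its parts."""
    result = Counter()

    def extend(remaining: int, max_part: int, runs: tuple, parts: int, denom: int) -> None:
        if remaining == 0:
            # Multinomial coefficient: parts! / (count_1! * count_2! * ...)
            result[runs] = factorial(parts) // denom
            return
        # Values strictly decrease from one level to the next,
        # so every partition is generated exactly once.
        for value in range(min(remaining, max_part), 0, -1):
            for count in range(1, remaining // value + 1):
                extend(
                    remaining - value * count,
                    value - 1,
                    ((value, count),) + runs,  # prepend to keep runs sorted by value
                    parts + count,
                    denom * factorial(count),
                )

    extend(number, number, (), 0, 1)
    return result


weights = partition_weights(6)
print(weights[((1, 2), (2, 2))])  # 4! / (2! * 2!) = 6
```

The keys use the same layout as the Counter shown in the question, so the two results can be compared with `==`.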

Ξένη Γήινος

Asked: 2025-04-12 15:38:21 +0800 CST

Why do these nearly identical functions perform so differently?

7

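I have written four functions that modify a square 2D array in place. Each one reflects the half of the array bounded by two meeting sides and the corresponding 45-degree diagonal onto the other half, which is separated from it by the same diagonal.

I wrote one function for each of the four possible cases, reflecting product(('upper', 'lower'), ('left', 'right')) to product(('lower', 'upper'), ('right', 'left')).

They are JIT-compiled with Numba and parallelized with numba.prange, so they are much faster than the methods provided by NumPy: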

In [2]: sqr = np.random.randint(0, 256, (1000, 1000), dtype=np.uint8)

In [3]: %timeit x, y = np.tril_indices(1000); sqr[x, y] = sqr[y, x]
9.16 ms ± 30.9 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

As you can see, the code above takes a long time to execute.

import numpy as np
import numba as nb


@nb.njit(cache=True, parallel=True, nogil=True)
def triangle_flip_LL2UR(arr: np.ndarray) -> None:
    height, width = arr.shape[:2]
    if height != width:
        raise ValueError("argument arr must be a square")

    for i in nb.prange(height):
        arr[i, i:] = arr[i:, i]


@nb.njit(cache=True, parallel=True, nogil=True)
def triangle_flip_UR2LL(arr: np.ndarray) -> None:
    height, width = arr.shape[:2]
    if height != width:
        raise ValueError("argument arr must be a square")

    for i in nb.prange(height):
        arr[i:, i] = arr[i, i:]


@nb.njit(cache=True, parallel=True, nogil=True)
def triangle_flip_LR2UL(arr: np.ndarray) -> None:
    height, width = arr.shape[:2]
    if height != width:
        raise ValueError("argument arr must be a square")

    last = height - 1
    for i in nb.prange(height):
        arr[i, last - i :: -1] = arr[i:, last - i]


@nb.njit(cache=True, parallel=True, nogil=True)
def triangle_flip_UL2LR(arr: np.ndarray) -> None:
    height, width = arr.shape[:2]
    if height != width:
        raise ValueError("argument arr must be a square")

    last = height - 1
    for i in nb.prange(height):
        arr[i:, last - i] = arr[i, last - i :: -1]

In [4]: triangle_flip_LL2UR(sqr)

In [5]: triangle_flip_UR2LL(sqr)

In [6]: triangle_flip_LR2UL(sqr)

In [7]: triangle_flip_UL2LR(sqr)

In [8]: %timeit triangle_flip_LL2UR(sqr)
194 μs ± 634 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [9]: %timeit triangle_flip_UR2LL(sqr)
488 μs ± 3.26 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [10]: %timeit triangle_flip_LR2UL(sqr)
196 μs ± 501 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [11]: %timeit triangle_flip_UL2LR(sqr)
486 μs ± 855 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

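Why do their execution times differ so much? Although they are nearly identical, two of them take about 200 microseconds and the other two take about 500 microseconds.

I found something: triangle_flip_UR2LL(arr) is the same as triangle_flip_LL2UR(sqr.T), and vice versa.

Now, if I transpose the array before calling the functions, the performance trend is reversed: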

In [109]: %timeit triangle_flip_UR2LL(sqr.T)
196 μs ± 1.15 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [110]: %timeit triangle_flip_LL2UR(sqr.T)
490 μs ± 1.24 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Why does this happen?
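
One thing that can be checked directly, without claiming it is the whole explanation, is what transposing does to the memory layout. The sketch below assumes a freshly created C-ordered array of the same shape and dtype as sqr. It shows that sqr.T is a view with swapped strides, so a row slice of one is laid out like a column slice of the other.

```python
import numpy as np

sqr = np.zeros((1000, 1000), dtype=np.uint8)

# A C-ordered array stores each row contiguously: moving one column
# over is a 1-byte step, moving one row down is a 1000-byte step.
print(sqr.flags["C_CONTIGUOUS"], sqr.strides)      # True (1000, 1)

# The transpose is a view on the same buffer with the strides swapped.
print(sqr.T.flags["C_CONTIGUOUS"], sqr.T.strides)  # False (1, 1000)

row = sqr[0, :]  # contiguous in memory
col = sqr[:, 0]  # one element every 1000 bytes
print(row.strides, col.strides)                    # (1,) (1000,)
```

Functions that write along rows and read along columns therefore do the opposite on the transposed view, which matches the reversal seen in the timings.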

Ξένη Γήινος

Asked: 2025-04-11 21:35:57 +0800 CST

Why does this fast function using Numba JIT slow down if I JIT-compile another function?

6

So I have this function:

import numpy as np
import numba as nb


@nb.njit(cache=True, parallel=True, nogil=True)
def triangle_half_UR_LL(size: int, swap: bool = False) -> tuple[np.ndarray, np.ndarray]:
    total = (size + 1) * size // 2
    x_coords = np.full(total, 0, dtype=np.uint16)
    y_coords = np.full(total, 0, dtype=np.uint16)
    offset = 0
    side = np.arange(size, dtype=np.uint16)
    for i in nb.prange(size):
        offset = i * size - (i - 1) * i // 2
        end = offset + size - i
        x_coords[offset:end] = i
        y_coords[offset:end] = side[i:]
    
    return (x_coords, y_coords) if not swap else (y_coords, x_coords)

What it does is not important; the point is that it is JIT-compiled with Numba and is therefore very fast:

In [2]: triangle_half_UR_LL(10)
Out[2]:
(array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2,
        2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5,
        5, 6, 6, 6, 6, 7, 7, 7, 8, 8, 9], dtype=uint16),
 array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 2, 3, 4,
        5, 6, 7, 8, 9, 3, 4, 5, 6, 7, 8, 9, 4, 5, 6, 7, 8, 9, 5, 6, 7, 8,
        9, 6, 7, 8, 9, 7, 8, 9, 8, 9, 9], dtype=uint16))

In [3]: %timeit triangle_half_UR_LL(1000)
166 μs ± 489 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [4]: %timeit triangle_half_UR_LL(1000)
166 μs ± 270 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [5]: %timeit triangle_half_UR_LL(1000)
166 μs ± 506 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
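
For readers who do want to know what it computes: judging from the output above, the coordinates are the row and column indices of the upper triangle (diagonal included), which is what np.triu_indices returns. The sketch below repeats the loop in plain NumPy without Numba (the name triangle_half_reference is hypothetical) and compares the two.

```python
import numpy as np


def triangle_half_reference(size: int) -> tuple[np.ndarray, np.ndarray]:
    """Plain-NumPy copy of the loop in triangle_half_UR_LL (no Numba, no prange)."""
    total = (size + 1) * size // 2
    x_coords = np.zeros(total, dtype=np.uint16)
    y_coords = np.zeros(total, dtype=np.uint16)
    side = np.arange(size, dtype=np.uint16)
    for i in range(size):
        # Row i starts after the i previous rows of lengths size, size - 1, ...
        offset = i * size - (i - 1) * i // 2
        end = offset + size - i
        x_coords[offset:end] = i
        y_coords[offset:end] = side[i:]
    return x_coords, y_coords


x, y = triangle_half_reference(10)
rows, cols = np.triu_indices(10)
print(np.array_equal(x, rows) and np.array_equal(y, cols))  # True
```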

Now, if I define another function and JIT-compile it with Numba, the performance of the fast function inexplicably drops:

In [6]: @nb.njit(cache=True)
   ...: def dummy():
   ...:     pass

In [7]: dummy()

In [8]: %timeit triangle_half_UR_LL(1000)
980 μs ± 20 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [9]: %timeit triangle_half_UR_LL(1000)
976 μs ± 9.9 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [10]: %timeit triangle_half_UR_LL(1000)
974 μs ± 3.11 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

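This is real. I have reproduced the issue many times, successfully every time. I start a new interpreter session, paste the code, and it runs fast. I define a dummy function, call it, and the previously fast function inexplicably slows down.

Screenshot as proof: (screenshot not preserved)

I am using Windows 11, and I have absolutely no idea what is going on.

Is there an explanation for this? How can I avoid the problem?

Interestingly, if I remove the nogil argument without changing anything else, the problem magically disappears: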

In [1]: import numpy as np
   ...: import numba as nb
   ...:
   ...:
   ...: @nb.njit(cache=True, parallel=True)
   ...: def triangle_half_UR_LL(size: int, swap: bool = False) -> tuple[np.ndarray, np.ndarray]:
   ...:     total = (size + 1) * size // 2
   ...:     x_coords = np.full(total, 0, dtype=np.uint16)
   ...:     y_coords = np.full(total, 0, dtype=np.uint16)
   ...:     offset = 0
   ...:     side = np.arange(size, dtype=np.uint16)
   ...:     for i in nb.prange(size):
   ...:         offset = i * size - (i - 1) * i // 2
   ...:         end = offset + size - i
   ...:         x_coords[offset:end] = i
   ...:         y_coords[offset:end] = side[i:]
   ...:
   ...:     return (x_coords, y_coords) if not swap else (y_coords, x_coords)

In [2]: %timeit triangle_half_UR_LL(1000)
186 μs ± 47.9 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [3]: %timeit triangle_half_UR_LL(1000)
167 μs ± 1.61 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [4]: %timeit triangle_half_UR_LL(1000)
166 μs ± 109 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [5]: @nb.njit(cache=True)
   ...: def dummy():
   ...:     pass

In [6]: dummy()

In [7]: %timeit triangle_half_UR_LL(1000)
167 μs ± 308 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [8]: %timeit triangle_half_UR_LL(1000)
166 μs ± 312 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [9]: %timeit triangle_half_UR_LL(1000)
167 μs ± 624 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

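Why does this happen?

But that is not the whole story: if I define other functions, the first function slows down again. The easiest way to reproduce the problem is to redefine it: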

In [7]: dummy()

In [8]: %timeit triangle_half_UR_LL(1000)
168 μs ± 750 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [9]: import numpy as np

In [10]: %timeit triangle_half_UR_LL(1000)
167 μs ± 958 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [11]: import numba as nb

In [12]: %timeit triangle_half_UR_LL(1000)
167 μs ± 311 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [13]: @nb.njit(cache=True, parallel=True)
    ...: def triangle_half_UR_LL(size: int, swap: bool = False) -> tuple[np.ndarray, np.ndarray]:
    ...:     total = (size + 1) * size // 2
    ...:     x_coords = np.full(total, 0, dtype=np.uint16)
    ...:     y_coords = np.full(total, 0, dtype=np.uint16)
    ...:     offset = 0
    ...:     side = np.arange(size, dtype=np.uint16)
    ...:     for i in nb.prange(size):
    ...:         offset = i * size - (i - 1) * i // 2
    ...:         end = offset + size - i
    ...:         x_coords[offset:end] = i
    ...:         y_coords[offset:end] = side[i:]
    ...:
    ...:     return (x_coords, y_coords) if not swap else (y_coords, x_coords)

In [14]: %timeit triangle_half_UR_LL(1000)
1.01 ms ± 94.3 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [15]: %timeit triangle_half_UR_LL(1000)
964 μs ± 2.02 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

It also slows down if I define the following function and call it:

@nb.njit(cache=True)
def Farey_sequence(n: int) -> np.ndarray:
    a, b, c, d = 0, 1, 1, n
    result = [(a, b)]
    while 0 <= c <= n:
        k = (n + b) // d
        a, b, c, d = c, d, k * c - a, k * d - b
        result.append((a, b))

    return np.array(result, dtype=np.uint64)

In [6]: %timeit triangle_half_UR_LL(1000)
166 μs ± 296 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [7]: %timeit Farey_sequence(16)
The slowest run took 6.25 times longer than the fastest. This could mean that an intermediate result is being cached.
6.03 μs ± 5.72 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [8]: %timeit Farey_sequence(16)
2.77 μs ± 50.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [9]: %timeit triangle_half_UR_LL(1000)
966 μs ± 6.48 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Ξένη Γήινος

Asked: 2025-04-09 22:19:47 +0800 CST

How to find all grid points in a square that correspond to non-reduced fractions?

13

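Given a positive integer N, we can label all grid points inside an N x N square, starting from 1. The total number of grid points is N x N, and the grid points are list(itertools.product(range(1, N + 1), repeat=2)).

Now, I want to find all tuples (x, y) for which x/y is a non-reduced fraction. The following is a brute-force implementation that is guaranteed to be correct, but it is very inefficient: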

import math
from itertools import product


def find_complex_points(lim: int) -> list[tuple[int, int]]:
    return [
        (x, y)
        for x, y in product(range(1, lim + 1), repeat=2)
        if math.gcd(x, y) > 1
    ]

Now, the next function is slightly smarter, but it generates duplicates, so it is only somewhat faster, not dramatically so:

def find_complex_points_1(lim: int) -> set[tuple[int, int]]:
    lim += 1
    return {
        (x, y)
        for mult in range(2, lim)
        for x, y in product(range(mult, lim, mult), repeat=2)
    }

In [255]: %timeit find_complex_points(1024)
233 ms ± 4.44 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [256]: %timeit find_complex_points_1(1024)
194 ms ± 1.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

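Is there a better way to do this?

(My goal is simple: I want to create a 2D NumPy array of dtype uint8 with shape (N, N), fill it with 255, and set every pixel (x, y) to 0 if (x + 1)/(y + 1) is a non-reduced fraction.)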
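
The stated goal can be written down directly with NumPy's gcd ufunc and broadcasting. This is a minimal sketch for comparison, not a benchmarked answer to the question, and the function name composite_mask is hypothetical.

```python
import numpy as np


def composite_mask(n: int) -> np.ndarray:
    """Return an (n, n) uint8 image: 0 where (x + 1)/(y + 1) is non-reduced, else 255."""
    values = np.arange(1, n + 1)
    # Broadcasting a column against a row gives the gcd of every pair at once.
    common = np.gcd(values[:, None], values[None, :])
    img = np.full((n, n), 255, dtype=np.uint8)
    img[common > 1] = 0
    return img


img = composite_mask(10)
print(int((img == 0).sum()))  # 37 pairs in 1..10 share a factor greater than 1
```

Because each pair is visited once, no duplicates are produced, at the cost of computing a gcd for every cell.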

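I have devised a method that is smarter than my previous ones and also much faster, but it still generates duplicates. I chose not to use a set here so that you can copy and paste the code as is, run some tests, and see the exact output in the order it is generated: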

def find_complex_points_2(lim: int) -> list[tuple[int, int]]:
    stack = dict.fromkeys(range(lim, 1, -1))
    lim += 1
    points = []
    while stack:
        x, _ = stack.popitem()
        points.append((x, x))
        mults = []
        for y in range(x * 2, lim, x):
            stack.pop(y, None)
            mults.append(y)
            points.extend([(x, y), (y, x)])
        
        for i, x in enumerate(mults):
            points.append((x, x))
            for y in mults[i + 1:]:
                points.extend([(x, y), (y, x)])
    
    return points

In [292]: sorted(set(find_complex_points_2(1024))) == find_complex_points(1024)
Out[292]: True

In [293]: %timeit find_complex_points_2(1024)
58.9 ms ± 580 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [294]: %timeit find_complex_points(1024)
226 ms ± 3.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

To clarify, the output of find_complex_points_2(10) is:

In [287]: find_complex_points_2(10)
Out[287]:
[(2, 2),
 (2, 4),
 (4, 2),
 (2, 6),
 (6, 2),
 (2, 8),
 (8, 2),
 (2, 10),
 (10, 2),
 (4, 4),
 (4, 6),
 (6, 4),
 (4, 8),
 (8, 4),
 (4, 10),
 (10, 4),
 (6, 6),
 (6, 8),
 (8, 6),
 (6, 10),
 (10, 6),
 (8, 8),
 (8, 10),
 (10, 8),
 (10, 10),
 (3, 3),
 (3, 6),
 (6, 3),
 (3, 9),
 (9, 3),
 (6, 6),
 (6, 9),
 (9, 6),
 (9, 9),
 (5, 5),
 (5, 10),
 (10, 5),
 (10, 10),
 (7, 7)]

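As you can see, (6, 6) and (10, 10) each appear twice. I want to avoid the redundant computation.

This also happens in find_complex_points_1. If I did not use a set there, the result would contain many duplicates, because the method inevitably generates them repeatedly. Using a set still involves the unnecessary computation; it merely does not collect the duplicates.

No, I actually want to replace each coordinate with the sum of all the numbers before it, so N is replaced by (N² + N) / 2.

I have just implemented the image generation to better illustrate what I want: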

import numpy as np
import numba as nb


@nb.njit(cache=True)
def resize_img(img: np.ndarray, h_scale: int, w_scale: int) -> np.ndarray:
    height, width = img.shape
    result = np.empty((height, h_scale, width, w_scale), np.uint8)
    result[...] = img[:, None, :, None]
    return result.reshape((height * h_scale, width * w_scale))


def find_composite_points(lim: int) -> set[tuple[int, int]]:
    stack = dict.fromkeys(range(lim, 1, -1))
    lim += 1
    points = set()
    while stack:
        x, _ = stack.popitem()
        points.add((x, x))
        mults = []
        for y in range(x * 2, lim, x):
            stack.pop(y, None)
            mults.append(y)
            points.update([(x, y), (y, x)])

        for i, x in enumerate(mults):
            points.add((x, x))
            for y in mults[i + 1 :]:
                points.update([(x, y), (y, x)])

    return points


def natural_sum(n: int) -> int:
    return (n + 1) * n // 2


def composite_image(lim: int, scale: int) -> np.ndarray:
    length = natural_sum(lim)
    img = np.full((length, length), 255, dtype=np.uint8)
    for x, y in find_composite_points(lim):
        x1, y1 = natural_sum(x - 1), natural_sum(y - 1)
        img[x1 : x1 + x, y1 : y1 + y] = 0

    return resize_img(img, scale, scale)

composite_image(12, 12)

Ξένη Γήινος

Asked: 2025-03-22 21:38:07 +0800 CST

Why does the continued fraction expansion of arctangent combined with the half-angle formula not work with Machin-like series?

8

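Sorry for the long title. I don't know whether this is more of a math question or a programming question, but I think my math is very rusty and I am better at programming.

So I have this continued fraction expansion of arctangent:

I got it from Wikipedia.

I tried to find a simple algorithm to compute it: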
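
(The formula images are not preserved here. Judging from the arctan_cf code below, the expansion is presumably the classical continued fraction for arctangent, and the algorithm is the standard three-term recurrence for its convergents with z = y/x, scaled by x to stay in integers. The symbols A_n and B_n are introduced here for illustration only.)

```latex
\arctan z = \cfrac{z}{1 + \cfrac{(1z)^2}{3 + \cfrac{(2z)^2}{5 + \cfrac{(3z)^2}{7 + \ddots}}}}

% Convergents A_n / B_n for z = y/x, as computed by arctan_cf:
A_0 = y, \quad B_0 = x, \qquad A_1 = 3xy, \quad B_1 = 3x^2 + y^2,

A_n = (2n+1)\,x\,A_{n-1} + n^2 y^2 A_{n-2}, \qquad
B_n = (2n+1)\,x\,B_{n-1} + n^2 y^2 B_{n-2} \qquad (n \ge 2),

\arctan\frac{y}{x} \approx \frac{A_n}{B_n}.
```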

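And I did it: I wrote an arbitrary-precision implementation of the continued fraction expansion without using any libraries, using only basic integer arithmetic: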

import json
import math
import random
from decimal import Decimal, getcontext
from typing import Callable, List, Tuple


Fraction = Tuple[int, int]


def arctan_cf(y: int, x: int, lim: int) -> Fraction:
    y_sq = y**2
    a1, a2 = y, 3 * x * y
    b1, b2 = x, 3 * x**2 + y_sq
    odd = 5
    for i in range(2, 2 + lim):
        t1, t2 = odd * x, i**2 * y_sq
        a1, a2 = a2, t1 * a2 + t2 * a1
        b1, b2 = b2, t1 * b2 + t2 * b1
        odd += 2

    return a2, b2

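And it converges faster than the Newton arctangent series I used previously.

Now I thought that if I combined it with the half-angle formula for arctangent, it should converge even faster.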

def half_arctan_cf(y: int, x: int, lim: int) -> Fraction:
    c = (x**2 + y**2) ** 0.5
    a, b = c.as_integer_ratio()
    a, b = arctan_cf(a - b * x, b * y, lim)
    return 2 * a, b

And indeed it does converge faster:

def test_accuracy(lim: int) -> dict:
    result = {}
    for _ in range(lim):
        x, y = random.sample(range(1024), 2)
        while not x or not y:
            x, y = random.sample(range(1024), 2)

        atan2 = math.atan2(y, x)
        entry = {"atan": atan2}
        for fname, func in zip(
            ("arctan_cf", "half_arctan_cf"), (arctan_cf, half_arctan_cf)
        ):
            i = 1
            while True:
                a, b = func(y, x, i)
                if math.isclose(deci := a / b, atan2):
                    break

                i += 1

            entry[fname] = (i, deci)

        result[f"{y} / {x}"] = entry

    return result


print(json.dumps(test_accuracy(8), indent=4))

for v in test_accuracy(128).values():
    assert v["half_arctan_cf"][0] <= v["arctan_cf"][0]

{
    "206 / 136": {
        "atan": 0.9872880750087898,
        "arctan_cf": [
            16,
            0.9872880746658675
        ],
        "half_arctan_cf": [
            6,
            0.9872880746018052
        ]
    },
    "537 / 308": {
        "atan": 1.0500473287277563,
        "arctan_cf": [
            18,
            1.0500473281360896
        ],
        "half_arctan_cf": [
            7,
            1.0500473288158192
        ]
    },
    "331 / 356": {
        "atan": 0.7490241118247137,
        "arctan_cf": [
            10,
            0.7490241115996227
        ],
        "half_arctan_cf": [
            5,
            0.749024111913438
        ]
    },
    "744 / 613": {
        "atan": 0.8816364228048325,
        "arctan_cf": [
            13,
            0.8816364230439662
        ],
        "half_arctan_cf": [
            6,
            0.8816364227495634
        ]
    },
    "960 / 419": {
        "atan": 1.1592605364805093,
        "arctan_cf": [
            24,
            1.1592605359263286
        ],
        "half_arctan_cf": [
            7,
            1.1592605371181872
        ]
    },
    "597 / 884": {
        "atan": 0.5939827714677137,
        "arctan_cf": [
            7,
            0.5939827719895824
        ],
        "half_arctan_cf": [
            4,
            0.59398277135389
        ]
    },
    "212 / 498": {
        "atan": 0.40246578425167584,
        "arctan_cf": [
            5,
            0.4024657843859885
        ],
        "half_arctan_cf": [
            3,
            0.40246578431841773
        ]
    },
    "837 / 212": {
        "atan": 1.322727785860997,
        "arctan_cf": [
            41,
            1.322727786922624
        ],
        "half_arctan_cf": [
            8,
            1.3227277847674388
        ]
    }
}

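For a large number of samples the assertion block takes quite a long time to run, but it never raises an exception.

So I thought I could use the continued fraction expansion of arctangent with a Machin-like series to compute π. (I used the last series in the linked section because it converges the fastest.)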

def sum_fractions(fractions: List[Fraction]) -> Fraction:
    while (length := len(fractions)) > 1:
        stack = []
        for i in range(0, length - (odd := length & 1), 2):
            num1, den1 = fractions[i]
            num2, den2 = fractions[i + 1]
            stack.append((num1 * den2 + num2 * den1, den1 * den2))

        if odd:
            stack.append(fractions[-1])

        fractions = stack

    return fractions[0]


MACHIN_SERIES = ((44, 57), (7, 239), (-12, 682), (24, 12943))


def approximate_loop(lim: int, func: Callable) -> List[Fraction]:
    fractions = []
    for coef, denom in MACHIN_SERIES:
        dividend, divisor = func(1, denom, lim)
        fractions.append((coef * dividend, divisor))

    return fractions


def approximate_1(lim: int) -> List[Fraction]:
    return approximate_loop(lim, arctan_cf)


def approximate_2(lim: int) -> List[Fraction]:
    return approximate_loop(lim, half_arctan_cf)


approx_funcs = (approximate_1, approximate_2)


def calculate_pi(lim: int, approx: bool = 0) -> Fraction:
    dividend, divisor = sum_fractions(approx_funcs[approx](lim))
    dividend *= 4
    return dividend // (common := math.gcd(dividend, divisor)), divisor // common

getcontext().rounding = 'ROUND_DOWN'
def to_decimal(dividend: int, divisor: int, places: int) -> str:
    getcontext().prec = places + len(str(dividend // divisor))
    return str(Decimal(dividend) / Decimal(divisor))


def get_accuracy(lim: int, approx: bool = 0) -> Tuple[int, str]:
    length = 12
    fraction = calculate_pi(lim, approx)
    while True:
        decimal = to_decimal(*fraction, length)
        for i, e in enumerate(decimal):
            if Pillion[i] != e:
                return (max(0, i - 2), decimal[:i])

        length += 10


with open("D:/Pillion.txt", "r") as f:
    Pillion = f.read()

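Pillion.txt contains the first 1000001 digits of π; Pi + Million = Pillion.

It does work, but only partially. The basic continued fraction expansion works well with the Machin-like formula, but combined with the half-angle formula I only ever get 9 correct decimal places. In fact, I get 9 correct digits on the very first iteration, and after that the result never improves: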

In [2]: get_accuracy(16)
Out[2]:
(73,
 '3.1415926535897932384626433832795028841971693993751058209749445923078164062')

In [3]: get_accuracy(32)
Out[3]:
(138,
 '3.141592653589793238462643383279502884197169399375105820974944592307816406286208998628034825342117067982148086513282306647093844609550582231')

In [4]: get_accuracy(16, 1)
Out[4]: (9, '3.141592653')

In [5]: get_accuracy(32, 1)
Out[5]: (9, '3.141592653')

In [6]: get_accuracy(1, 1)
Out[6]: (9, '3.141592653')

But the digits do change:

In [7]: to_decimal(*calculate_pi(1, 1), 32)
Out[7]: '3.14159265360948500093515231500093'

In [8]: to_decimal(*calculate_pi(2, 1), 32)
Out[8]: '3.14159265360945286794831052938917'

In [9]: to_decimal(*calculate_pi(3, 1), 32)
Out[9]: '3.14159265360945286857612896472974'

In [10]: to_decimal(*calculate_pi(4, 1), 32)
Out[10]: '3.14159265360945286857611676794770'

In [11]: to_decimal(*calculate_pi(5, 1), 32)
Out[11]: '3.14159265360945286857611676818392'

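Why does the half-angle continued fraction formula not work with the Machin-like formula? Is it possible to make it work? If so, how? I would like either a proof that it is impossible or a working example showing that it is possible.

Just as a sanity check, using π/4 = arctan(1) I was able to get digits of π from half_arctan_cf, but it converges much more slowly: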

def approximate_3(lim: int) -> List[Fraction]:
    return [half_arctan_cf(1, 1, lim)]


approx_funcs = (approximate_1, approximate_2, approximate_3)

In [28]: get_accuracy(16, 2)
Out[28]: (15, '3.141592653589793')

In [29]: get_accuracy(16, 0)
Out[29]:
(73,
 '3.1415926535897932384626433832795028841971693993751058209749445923078164062')

The same problem occurs again: the maximum precision of 15 digits is reached at the 10th iteration:

In [37]: get_accuracy(9, 2)
Out[37]: (14, '3.14159265358979')

In [38]: get_accuracy(10, 2)
Out[38]: (15, '3.141592653589793')

In [39]: get_accuracy(11, 2)
Out[39]: (15, '3.141592653589793')

In [40]: get_accuracy(32, 2)
Out[40]: (15, '3.141592653589793')
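
A plateau at about 15 digits is the precision one would expect from a double. One thing that can be checked directly is that half_arctan_cf computes c with `** 0.5`, which yields a float, and that as_integer_ratio() then returns the exact ratio of that float rather than of the true square root. The sketch below measures that error against a high-precision reference built with math.isqrt; the helper name sqrt_relative_error and the 200-bit reference width are assumptions made for illustration. It does not by itself prove where every lost digit goes, but it shows the input to arctan_cf is already inexact.

```python
import math
from fractions import Fraction


def sqrt_relative_error(x: int, y: int, bits: int = 200) -> Fraction:
    """Relative error of the float sqrt(x**2 + y**2) against a `bits`-bit reference."""
    n = x * x + y * y
    a, b = (n ** 0.5).as_integer_ratio()  # exact value of the float, as in half_arctan_cf
    float_value = Fraction(a, b)
    # floor(sqrt(n) * 2**bits) / 2**bits, computed with integers only
    reference = Fraction(math.isqrt(n << (2 * bits)), 1 << bits)
    return abs(float_value - reference) / reference


# First term of MACHIN_SERIES: arctan(1/57), so x = 57 and y = 1.
print(float(sqrt_relative_error(57, 1)))
```

In half_arctan_cf this square root is then reduced by b * x, which is almost as large when y is much smaller than x, so the relative error of the difference is larger than the figure printed here.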

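I have just rewritten the arctangent continued fraction implementation so that it avoids redundant computation.

In my code, t1 increases by 2 * x on each iteration, so instead of repeatedly multiplying x by the odd number, I just use an accumulator with a step of 2 * x.

And the difference between each pair of consecutive squares is exactly an odd number, so for t2 I can use an accumulator of an accumulator, with a step of 2 * y_sq.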

def arctan_cf_0(y: int, x: int, lim: int) -> Fraction:
    y_sq = y**2
    a1, a2 = y, 3 * x * y
    b1, b2 = x, 3 * x**2 + y_sq
    odd = 5
    for i in range(2, 2 + lim):
        t1, t2 = odd * x, i**2 * y_sq
        a1, a2 = a2, t1 * a2 + t2 * a1
        b1, b2 = b2, t1 * b2 + t2 * b1
        odd += 2

    return a2, b2


def arctan_cf(y: int, x: int, lim: int) -> Fraction:
    y_sq = y**2
    a1, a2 = y, 3 * x * y
    b1, b2 = x, 3 * x**2 + y_sq
    t1_step, t3_step = 2 * x, 2 * y_sq
    t1, t2 = 5 * x, 4 * y_sq
    t3 = t2 + y_sq
    for _ in range(lim):
        a1, a2 = a2, t1 * a2 + t2 * a1
        b1, b2 = b2, t1 * b2 + t2 * b1
        t1 += t1_step
        t2 += t3
        t3 += t3_step

    return a2, b2

In [301]: arctan_cf_0(4, 3, 100) == arctan_cf(4, 3, 100)
Out[301]: True

In [302]: %timeit arctan_cf_0(4, 3, 100)
58.6 μs ± 503 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [303]: %timeit arctan_cf(4, 3, 100)
54.3 μs ± 816 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Although this does not improve the speed significantly, it is definitely an improvement.

Ξένη Γήινος

Asked: 2025-03-08 20:03:28 +0800 CST

How to count the first N natural numbers in binary?

6

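This may seem trivial, but I haven't found a good way to solve it. I even found this: Fastest way to generate all n-bit binary numbers. But I haven't found an exact duplicate.

The problem is simple: given a positive integer limit N, generate in order the binary representations of the first N natural numbers (N excluded; the first natural number is 0, so N - 1 is the largest number to be represented) as tuples, each padded with leading zeros so that all representations have the same length.

For example, if N is 4, the output should be [(0, 0), (0, 1), (1, 0), (1, 1)].

At this point the problem is indeed trivial, but there is a catch: things like bin(n) and f'{n:b}' are not allowed. The algorithm should operate entirely in the binary domain, because as far as I know everything in a computer (text, photos, music, videos...) is binary digits, so converting representations back and forth adds unnecessary computation. These base conversions are completely redundant and should be eliminated to produce the most efficient program (this is to confine the problem to as few domains as possible, so that we operate only on those domains).

I wrote a simple program that does what I described: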

from typing import Generator, Tuple

def count_in_binary(n: int) -> Generator[Tuple[int, ...], None, None]:
    if not isinstance(n, int) or n < 1:
        raise ValueError("The argument n must be a positive integer")

    l = (n - 1).bit_length() if n > 1 else 1
    numeral = [0] * l
    maxi = l - 1
    for _ in range(n):
        yield tuple(numeral)
        i = maxi
        while True:
            if not (d := numeral[i]):
                numeral[i] = 1
                break
            else:
                numeral[i] = 0
                i -= 1

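But I am not sure this is the most efficient way to do it in Python. I haven't used bit operations much, and computers already represent numbers in binary, so there should be a faster way to do this.

The question is: what is a faster way?

For comparison, here is an approach that uses f'{n:b}'. It is more concise, but it is actually slower and dumber: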

def count_in_binary1(n: int) -> Generator[Tuple[int, ...], None, None]:
    if not isinstance(n, int) or n < 1:
        raise ValueError("The argument n must be a positive integer")

    l = len(f'{n-1:b}')
    for i in range(n):
        yield tuple(map(int, f'{i:0{l}b}'))

In [50]: %timeit list(count_in_binary(256))
59.9 μs ± 209 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [51]: %timeit list(count_in_binary1(256))
452 μs ± 3.68 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
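
For reference, the bit-operation variant alluded to above could look like the sketch below: it reads each digit with a shift and a mask, with no string conversion. The name count_in_binary_shift is hypothetical, and this has not been timed against the two functions above, so no speed claim is made.

```python
from typing import Iterator, Tuple


def count_in_binary_shift(n: int) -> Iterator[Tuple[int, ...]]:
    """Yield 0 .. n - 1 as fixed-width tuples of bits, most significant bit first."""
    if not isinstance(n, int) or n < 1:
        raise ValueError("The argument n must be a positive integer")

    # Width of the largest value, n - 1; at least one digit so that 0 is (0,).
    width = max((n - 1).bit_length(), 1)
    shifts = range(width - 1, -1, -1)
    for i in range(n):
        # (i >> s) & 1 extracts the bit at position s.
        yield tuple((i >> s) & 1 for s in shifts)


print(list(count_in_binary_shift(4)))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```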

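Edit

I didn't test the original function much; I just assumed it would work. It is fixed now.

And no, the scope is limited to pure Python, so NumPy is not allowed.

Edit 2

Now I think there are no exceptions.

I have upvoted and accepted the second answer because it does answer the original question, although the original question did not include all the relevant information, so the problem I was trying to solve by posting it remains unsolved.

I have posted a new question with all the relevant information; please answer it: Fastest way to find all permutations of 0, 1 of width n without itertools in pure Python?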

Ξένη Γήινος

Asked: 2025-01-18 13:11:57 +0800 CST

How to correctly implement Fermat's factorization in Python?

7

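I am trying to implement efficient prime factorization algorithms in Python. This is not homework or work-related; it is purely out of curiosity.

I have learned that prime factorization is very hard: (image not preserved)

I want to implement efficient algorithms for it as a self-imposed challenge. I have decided to implement Fermat's factorization method first, since it seems simple enough.

Python code translated directly from the pseudocode: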

def Fermat_Factor(n):
    a = int(n ** 0.5 + 0.5)
    b2 = abs(a**2 - n)
    while int(b2**0.5) ** 2 != b2:
        a += 1
        b2 = a**2 - n

    return a - b2**0.5, a + b2**0.5

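(I had to use abs, otherwise b2 can easily be negative, and the int conversion will fail with a TypeError because the root is complex.)

As you can see, it returns two integers whose product equals the input, but it only returns two outputs and does not guarantee that the factors are prime. I don't know how efficient this algorithm is, but factoring semiprimes with it is much more efficient than the trial division I used in my previous question: Why factorization of products of close primes is much slower than products of dissimilar primes.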

In [20]: %timeit FermatFactor(3607*3803)
2.1 μs ± 28.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [21]: FermatFactor(3607*3803)
Out[21]: [3607, 3803]

In [22]: %timeit FermatFactor(3593 * 3671)
1.69 μs ± 31 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [23]: FermatFactor(3593 * 3671)
Out[23]: [3593, 3671]

In [24]: %timeit FermatFactor(7187 * 7829)
4.94 μs ± 47.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [25]: FermatFactor(7187 * 7829)
Out[25]: [7187, 7829]

In [26]: %timeit FermatFactor(8087 * 8089)
1.38 μs ± 12.9 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [27]: FermatFactor(8087 * 8089)
Out[27]: [8087, 8089]

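So I want to use this algorithm to generate all prime factors of any given integer (of course I know it only works for odd integers, but that is not a problem, since powers of 2 can be factored out trivially with bit hacks). The simplest way I can think of is to call Fermat_Factor recursively until n is prime. I don't know how to check whether a number is prime within this algorithm, but I noticed something: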

In [3]: Fermat_Factor(3)
Out[3]: (1.0, 3.0)

In [4]: Fermat_Factor(5)
Out[4]: (1.0, 3.0)

In [5]: Fermat_Factor(7)
Out[5]: (1.0, 7.0)

In [6]: Fermat_Factor(11)
Out[6]: (1.0, 11.0)

In [7]: Fermat_Factor(13)
Out[7]: (1.0, 13.0)

In [8]: Fermat_Factor(17)
Out[8]: (3.0, 5.0)

In [9]: Fermat_Factor(19)
Out[9]: (1.0, 19.0)

In [10]: Fermat_Factor(23)
Out[10]: (1.0, 23.0)

In [11]: Fermat_Factor(29)
Out[11]: (3.0, 7.0)

In [12]: Fermat_Factor(31)
Out[12]: (1.0, 31.0)

In [13]: Fermat_Factor(37)
Out[13]: (5.0, 7.0)

In [14]: Fermat_Factor(41)
Out[14]: (1.0, 41.0)

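For many primes, the first number output by this algorithm is 1, but not for all of them, so it cannot be used to determine when the recursion should stop. I learned this the hard way.

So I decided to use a membership check against a pre-generated set of primes instead. Naturally, this results in RecursionError: maximum recursion depth exceeded when the input is a prime larger than the maximum of the set. Since I have limited memory, this is to be considered an implementation detail.

So I have implemented a version that works (for some inputs), but for some valid inputs (products of primes within the limit) the algorithm fails to give the correct output: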

import numpy as np
from itertools import cycle

TRIPLE = ((4, 2), (9, 6), (25, 10))
WHEEL = ( 4, 2, 4, 2, 4, 6, 2, 6 )
def prime_sieve(n):
    primes = np.ones(n + 1, dtype=bool)
    primes[:2] = False
    for square, double in TRIPLE:
        primes[square::double] = False
    
    wheel = cycle(WHEEL)
    k = 7
    while (square := k**2) <= n:
        if primes[k]:
            primes[square::2*k] = False
        
        k += next(wheel)
    
    return np.flatnonzero(primes)
    
PRIMES = list(map(int, prime_sieve(1048576)))
PRIME_SET = set(PRIMES)
TEST_LIMIT = PRIMES[-1] ** 2

def FermatFactor(n):
    if n > TEST_LIMIT:
        raise ValueError('Number too large')
    
    if n in PRIME_SET:
        return [n]
    
    a = int(n ** 0.5 + 0.5)
    if a ** 2 == n:
        return FermatFactor(a) + FermatFactor(a)
    
    b2 = abs(a**2 - n)
    while int(b2**0.5) ** 2 != b2:
        a += 1
        b2 = a**2 - n
    
    return FermatFactor(factor := int(a - b2**0.5)) + FermatFactor(n // factor)

It works for many inputs:

In [18]: FermatFactor(255)
Out[18]: [3, 5, 17]

In [19]: FermatFactor(511)
Out[19]: [7, 73]

In [20]: FermatFactor(441)
Out[20]: [3, 7, 3, 7]

In [21]: FermatFactor(3*5*823)
Out[21]: [3, 5, 823]

In [22]: FermatFactor(37*333667)
Out[22]: [37, 333667]

In [23]: FermatFactor(13 * 37 * 151 * 727 * 3607)
Out[23]: [13, 37, 727, 151, 3607]

But not all:

In [25]: FermatFactor(5 * 53 * 163)
Out[25]: [163, 13, 2, 2, 5]

In [26]: FermatFactor(3*5*73*283)
Out[26]: [17, 3, 7, 3, 283]

In [27]: FermatFactor(3 * 11 * 29 * 71 *  137)
Out[27]: [3, 11, 71, 61, 7, 3, 3]

Why does this happen? How can I fix it?
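
Tracing FermatFactor(5 * 53 * 163) by hand points at the combination of rounding a down and taking abs. The first split gives 163 and 265. For 265, a = int(265 ** 0.5 + 0.5) = 16, so a**2 - n = -9, and abs turns that into 9, which passes the perfect-square test and yields the bogus "factor" 16 - 3 = 13. The sketch below is the textbook form of the method with a starting at the ceiling of the square root and exact integer square roots via math.isqrt, so b2 is never negative. The names fermat_split and fermat_factorize are hypothetical, and this is a minimal illustration, not a fast factorizer: the loop can run for a long time on large primes.

```python
import math


def fermat_split(n: int) -> tuple[int, int]:
    """Split an odd positive n into (p, q) with p * q == n; p == 1 means n is 1 or prime."""
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a = math.isqrt(n)
    if a * a < n:
        a += 1  # start at ceil(sqrt(n)), so a*a - n >= 0 and abs is not needed
    while True:
        b2 = a * a - n
        b = math.isqrt(b2)
        if b * b == b2:
            return a - b, a + b
        a += 1


def fermat_factorize(n: int) -> list[int]:
    """Prime factors of n >= 1 in ascending order, using only Fermat splits."""
    factors = []
    while n % 2 == 0:
        factors.append(2)
        n //= 2
    stack = [n] if n > 1 else []
    while stack:
        p, q = fermat_split(stack.pop())
        if p == 1:
            factors.append(q)  # only the trivial split exists, so q is prime
        else:
            stack.extend((p, q))
    return sorted(factors)


print(fermat_factorize(5 * 53 * 163))  # [5, 53, 163]
```

With exact arithmetic, a first component of 1 is a reliable stopping test for odd inputs, so no precomputed prime set is required.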

Ξένη Γήινος

Asked: 2025-01-18 01:58:27 +0800 CST

Why is factorization of products of close primes much slower than products of dissimilar primes

6

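This is a purely academic question without any practical considerations. It is not homework; I dropped out of school long ago. I am just curious, and I can't sleep well without knowing why.

I was messing around with Python. I decided to factorize big integers and measure the runtime of the call for each input.

I used a number of integers and found that some take much longer to factorize than others.

I then decided to investigate further and quickly wrote a prime sieve function to generate primes for testing. I found that a product of a pair of moderately large primes (two four-digit primes) takes much longer to factorize than a product of one very large prime (six digits or more) and one small prime (three digits or fewer).

At first I thought my first simple test function was inefficient, which is indeed the case, so I wrote a second function that takes primes directly from a pre-generated list. The second function is indeed more efficient, but strangely it exhibits the same pattern.

Here are some of the numbers I used: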

13717421 == 3607 * 3803
13189903 == 3593 * 3671
56267023 == 7187 * 7829
65415743 == 8087 * 8089

12345679 == 37 * 333667
38760793 == 37 * 1047589
158202851 == 151 * 1047701
762312571 == 727 * 1048573

Code:

import numpy as np
from itertools import cycle

def factorize(n):
    factors = []
    while not n % 2:
        factors.append(2)
        n //= 2

    i = 3
    while i**2 <= n:
        while not n % i:
            factors.append(i)
            n //= i
        i += 2
    
    return factors if n == 1 else factors + [n]

TRIPLE = ((4, 2), (9, 6), (25, 10))
WHEEL = ( 4, 2, 4, 2, 4, 6, 2, 6 )
def prime_sieve(n):
    primes = np.ones(n + 1, dtype=bool)
    primes[:2] = False
    for square, double in TRIPLE:
        primes[square::double] = False
    
    wheel = cycle(WHEEL)
    k = 7
    while (square := k**2) <= n:
        if primes[k]:
            primes[square::2*k] = False
        
        k += next(wheel)
    
    return np.flatnonzero(primes)
    
PRIMES = list(map(int, prime_sieve(1048576)))
TEST_LIMIT = PRIMES[-1] ** 2

def factorize_sieve(n):
    if n > TEST_LIMIT:
        raise ValueError('Number too large')

    factors = []
    for p in PRIMES:
        if p**2 > n:
            break
        while not n % p:
            factors.append(p)
            n //= p
    
    return factors if n == 1 else factors + [n]

Test results:

In [2]: %timeit factorize(13717421)
279 μs ± 4.29 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [3]: %timeit factorize(12345679)
39.6 μs ± 749 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [4]: %timeit factorize_sieve(13717421)
64.1 μs ± 688 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [5]: %timeit factorize_sieve(12345679)
12.6 μs ± 146 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [6]: %timeit factorize_sieve(13189903)
64.6 μs ± 964 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [7]: %timeit factorize_sieve(56267023)
117 μs ± 3.88 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [8]: %timeit factorize_sieve(65415743)
130 μs ± 1.38 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [9]: %timeit factorize_sieve(38760793)
21.1 μs ± 232 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [10]: %timeit factorize_sieve(158202851)
21.4 μs ± 385 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [11]: %timeit factorize_sieve(762312571)
22.1 μs ± 409 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

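It is clear that factorizing the product of two medium-sized primes takes much longer than factorizing the product of two primes of very different sizes. Why is that?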
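One way to look into this is to count how many candidate divisors the trial division loop actually tests. The sketch below is an instrumented copy of the loop in factorize; the name count_trial_divisions is a hypothetical helper added for illustration, and it assumes that runtime is roughly proportional to the number of candidates tested.

```python
def count_trial_divisions(n):
    """Count the odd candidates tested by the loop in factorize(n)."""
    steps = 0
    while not n % 2:
        n //= 2

    i = 3
    while i * i <= n:
        steps += 1
        while not n % i:
            n //= i  # shrinking n also lowers the i*i <= n bound
        i += 2

    return steps

close = count_trial_divisions(13717421)  # 3607 * 3803
far = count_trial_divisions(12345679)    # 37 * 333667
print(close, far)
```

For 13717421 the loop has to walk all the way up to 3607 before anything divides, which is 1803 candidates. For 12345679 it finds 37 almost immediately, and the remaining cofactor 333667 only needs candidates up to about 577, for 288 candidates in total. That ratio is in line with the timings measured above.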

Ξένη Γήινος

Asked: 2024-05-25 19:32:04 +0800 CST

How to unroll a for loop using an index sequence?

5

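I am trying to convert unsigned integer data to its binary representation in memory as efficiently as possible.

I wrote four template functions to convert integers to little-endian and big-endian; two of them use bit manipulation and the other two use pointers to copy the data.

They are verified to be correct and are quite efficient: I have determined that the little-endian functions are as fast as std::memcpy, but the big-endian functions somehow take longer.

The functions are: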

#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using std::vector;
typedef vector<uint8_t> bytes;

template<class T>
inline bytes LittleEndian(const T& data) {
    size_t size = sizeof(T);
    bytes _bytes(size);
    uint8_t mask = 255;
    for (size_t i = 0, shift = 0; i < size; i++, shift += 8) {
        _bytes[i] = (data >> shift) & mask;
    }
    return _bytes;
}

template<class T>
inline bytes BigEndian(const T& data) {
    size_t size = sizeof(T);
    bytes _bytes(size);
    uint8_t mask = 255;
    for (size_t i = size, shift = 0; i-- > 0; shift += 8) {
        _bytes[i] = (data >> shift) & mask;
    }
    return _bytes;
}

template<class T>
inline bytes CPU_Endian(const T& data) {
    size_t size = sizeof(T);
    bytes _bytes(size);
    uint8_t* dst = (uint8_t *)_bytes.data(), * src = (uint8_t *) & data;
    for (size_t i = 0; i < size; i++) {
        *dst++ = *src++;
    }
    return _bytes;
}

template<class T>
inline bytes Flip_CPU_Endian(const T& data) {
    size_t size = sizeof(T);
    bytes _bytes(size);
    uint8_t* dst = (uint8_t *)_bytes.data(), * src = (uint8_t *)&data + size - 1;
    for (size_t i = 0; i < size; i++) {
        *dst++ = *src--;
    }
    return _bytes;
}
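As a language-neutral cross-check of the shift-and-mask logic in LittleEndian and BigEndian, the same loops can be sketched in Python and compared against the built-in int.to_bytes. The function names little_endian and big_endian are hypothetical, and the byte size is passed explicitly because Python integers have no fixed width.

```python
def little_endian(value, size):
    """Least significant byte first, like the LittleEndian template."""
    return bytes((value >> (8 * i)) & 0xFF for i in range(size))

def big_endian(value, size):
    """Most significant byte first, like the BigEndian template."""
    return bytes((value >> (8 * i)) & 0xFF for i in reversed(range(size)))

value = 0x12345678
print(little_endian(value, 4).hex(), big_endian(value, 4).hex())
```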

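I want to unroll the for loops using std::index_sequence, and because the cases are related I am putting them in one question. They involve three things: doing something repeatedly N times, making an index sequence that decreases rather than increases, and using the index to set values.

I tried to do it myself, but it doesn't work: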

template<class T>
inline bytes CPU_Endian2(const T& data) {
    size_t size = sizeof(T);
    bytes _bytes(size);
    uint8_t* dst = (uint8_t*)_bytes.data(), * src = (uint8_t*)&data;
    [&]<std::size_t...N>(std::index_sequence<N...>){
        ((*dst++ = *src++),...);
    }(std::make_index_sequence<size>{});
    return _bytes;
}

It fails to compile. Error log:

Build started at 18:54...
1>------ Build started: Project: hexlify_test, Configuration: Release x64 ------
1>hexlify_test.cpp
1>C:\Users\Estranger\source\repos\hexlify_test\hexlify_test.cpp(98,3): error C7515: a fold expression must contain an unexpanded parameter pack
1>C:\Users\Estranger\source\repos\hexlify_test\hexlify_test.cpp(99,3): error C3878: syntax error: unexpected token '(' following 'expression'
1>C:\Users\Estranger\source\repos\hexlify_test\hexlify_test.cpp(99,3): message : error recovery skipped: '( identifier ::  . . . {'
1>C:\Users\Estranger\source\repos\hexlify_test\hexlify_test.cpp(99,35): error C2760: syntax error: '}' was unexpected here; expected ';'
1>Done building project "hexlify_test.vcxproj" -- FAILED.
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
========== Build completed at 18:54 and took 01.796 seconds ==========

How can I convert these functions into ones that use std::index_sequence to unroll the for loops?

Adding constexpr to size_t size = sizeof(T); did not make it compile.

Ξένη Γήινος

Asked: 2024-05-23 19:23:04 +0800 CST

C++ function that converts a byte sequence to its string representation produces garbage output [duplicate]

6

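I wrote a few simple C++ functions to convert byte sequences to their string representation.

It is fairly straightforward, and I believe my logic is correct. I thought it would be very easy, until I started printing the strings and found the output to be garbage: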

#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

using std::vector;
typedef vector<uint8_t> bytes;
using std::string;
using std::cout;
using namespace std::literals;

string DIGITS = "0123456789abcdef"s;

static inline string hexlify(bytes arr) {
    string repr = ""s;
    for (auto& chr : arr) {
        repr += " " + DIGITS[(chr & 240) >> 4] + DIGITS[chr & 15];
    }
    repr.erase(0, 1);
    return repr;
}

bytes text = {
    84, 111, 32, 98, 101, 32,
    111, 114, 32, 110, 111, 116,
    32, 116, 111, 32, 98, 101
}; // "To be or not to be"

int main() {
    cout << hexlify(text);
}

2♠
÷82♠
÷82♠
÷82♠
÷

Why does this happen?

I know my logic is correct; here is a direct translation to Python:

digits = "0123456789abcdef"
def bytes_string(data):
    s = ""
    for i in data:
        s += " " + digits[(i & 240) >> 4] + digits[i & 15]
    return s[1:]

It works fine:

>>> bytes_string(b"To be or not to be")
'54 6f 20 62 65 20 6f 72 20 6e 6f 74 20 74 6f 20 62 65'
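For reference, the same string can be produced with the built-in bytes.hex, which makes a handy cross-check of the expected output. This is only a sketch of the target result, not part of the original code, and the separator argument requires Python 3.8 or later.

```python
data = b"To be or not to be"

# bytes.hex accepts an optional separator placed between each byte
expected = data.hex(" ")
print(expected)
```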

But why doesn't it work in C++?

I am using Visual Studio 2022 V17.9.7, with these compiler flags:

/permissive- /ifcOutput "hexlify_test\x64\Release\" /GS /GL /W3 /Gy /Zc:wchar_t /Zi /Gm- /O2 /sdl /Fd"hexlify_test\x64\Release\vc143.pdb" /Zc:inline /fp:precise /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /errorReport:prompt /WX- /Zc:forScope /std:c17 /Gd /Oi /MD /std:c++20 /FC /Fa"hexlify_test\x64\Release\" /EHsc /nologo /Fo"hexlify_test\x64\Release\" /Ot /Fp"hexlify_test\x64\Release\hexlify_test.pch" /diagnostics:column

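Update: I just found that the garbage output only occurs in Debug mode. After implementing the fix, I compiled for C++20 in Debug mode, and the code somehow still produced garbage output there; switching to Release mode resolved the problem.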

Ξένη Γήινος

Asked: 2023-09-24 20:47:04 +0800 CST

How to pass functions with different parameter and return types as arguments in C++

6

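I found the basic math functions in cmath, such as pow, exp, and log, to be extremely inefficient, so I decided to implement my own versions that are both accurate and efficient. I did.

I approximate the functions with ninth-order polynomials on the interval [1, 2) or [0, 1). I used numpy.polyfit to obtain the polynomials and std::index_sequence to evaluate them, which is very fast. Although my methods are not as accurate as the library's, I get at least 13 correct decimal places with fp:precise, and with fp:fast my methods give the same accuracy as the library code.

Now I want a function that measures the execution time of these functions and determines their accuracy.

Previously I just copy-pasted the same lines of code for every new function I wanted to benchmark, which is not very tidy and is error-prone.

My functions are simple: they all return a single floating-point value, either float or double. They take one or two arguments. The second argument is a number that can be int, float, or double, and its type makes no difference. The type of the first argument does matter, though: it is either float or double, it is the same as the function's return type, and it must exactly match the type declared in the function because I use std::bit_cast.

As an example, here is what I came up with to measure the execution time of a function that takes a single float argument and returns a float: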

#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using std::chrono::steady_clock;
using std::chrono::duration;
using std::vector;
using std::function;

double timeit(const function<float(float)>& func, const vector<float>& values, int runs = 1048576){
    auto start = steady_clock::now();
    size_t len = values.size();
    for (int64_t i = 0; i < runs; i++) {
        func(values[i % len]);
    }
    auto end = steady_clock::now();
    duration<double, std::nano> time = end - start;
    return time.count() / runs;
}

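I haven't added validation yet; I will add it later.

I'd like to know how to reuse the same function to benchmark functions that return a double. I know about function overloading, but that would mean functions with identical bodies that differ only in their parameter lists. I suppose I will have to use another function for the two-argument functions, but I don't want to use overloads for the one-argument functions.

I know about std::variant, but I want to compare the values, and as far as I know it doesn't support comparison with its constituent types.

How can I solve this?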
