Questions asked by Pro Q - coding

Pro Q

Asked: 2025-02-27 17:15:50 +0800 CST

How to extend the type hints of a class in a stub file

5

I have this code, but it bothers me that I have to cast f twice:

    with h5py.File(computed_properties_path, "r") as f:
        # get the set of computed metrics
        computed_metrics = set()
        # iterating through the file iterates through the keys which are dataset names
        f = cast(Iterable[str], f)
        dataset_name: str
        for dataset_name in f:
            # re-cast it as a file
            f = cast(h5py.File, f)
            dataset_group = index_hdf5(f, [dataset_name], h5py.Group)
            for metric_name in dataset_group:
                logger.info(f"Dataset: {dataset_name}, Metric: {metric_name}")

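I just want to tell the static type checker that if I iterate over a file, I will get strings (they are the keys of the groups and datasets in the file).

I tried creating this .pyi stub to define a class that does this, but I get an error saying that File is not defined. I am guessing this is because Pylance now relies solely on my stub instead of looking in the original file for the additional definitions.

I have tried a lot of different options with Claude and ChatGPT, but I cannot figure out how to extend the type hints so that Pylance knows that iterating over an h5py.File object yields strings.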
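One way to avoid the double cast, sketched here under assumptions, is to wrap the cast in a small helper so that f itself is never reassigned. The helper name iter_keys is hypothetical, and a plain dict stands in for the h5py.File object so that the example runs without an HDF5 file. This sidesteps the stub file rather than extending it.

```python
from typing import Any, Iterable, Iterator, cast


def iter_keys(container: Any) -> Iterator[str]:
    """Iterate over a mapping-like object, typing each key as str.

    Hypothetical helper: the cast only informs the type checker and
    performs no runtime conversion or validation.
    """
    return iter(cast(Iterable[str], container))


# Assumption: a dict stands in for the h5py.File object in this sketch.
fake_file = {"dataset_a": {"metric_1": 0.5}, "dataset_b": {"metric_2": 0.7}}

names = []
for dataset_name in iter_keys(fake_file):
    # fake_file keeps its original type here, so no second cast is needed
    names.append((dataset_name, sorted(fake_file[dataset_name])))

print(names)
```

Because the cast happens on a temporary value inside the helper, the variable holding the file keeps its declared type for later calls such as index_hdf5.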

Pro Q

Asked: 2024-09-19 11:25:17 +0800 CST

How to type hint a dynamically created dataclass

6

I hate writing things twice, so I came up with a decent way to avoid it. However, this seems to break my type hints:

from enum import Enum
from dataclasses import make_dataclass, field, dataclass

class DatasetNames(Enum):
    test1 = "test1_string"
    test2 = "test2_string"
    test3 = "test3_string"

def get_path(s: str) -> str:
    return s + "_path"

# the normal way to do this, but I have to type every new dataset name twice
# and there's a lot of duplicate code
@dataclass(frozen=True)
class StaticDatasetPaths:
    test1 = get_path("test1_string")
    test2 = get_path("test2_string")
    test3 = get_path("test3_string")

# mypy recognizes that `StaticDatasetPaths` is a class
# mypy recognizes that `StaticDatasetPaths.test2` is a string
print(StaticDatasetPaths.test2) # 'test2_string_path'

# this is my way of doing it, without having to type every new dataset name twice and no duplicate code
DynamicDatasetPaths = make_dataclass(
    'DynamicDatasetPaths', 
    [
        (
            name.name,
            str,
            field(default=get_path(name.value))
        )
        for name in DatasetNames
    ],
    frozen=True
)

# mypy thinks `DynamicDatasetPaths` is a `variable` of type `type`
# mypy thinks that `DynamicDatasetPaths.test2` is a `function` of type `Unknown`
print(DynamicDatasetPaths.test2) # 'test2_string_path'

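How can I let mypy know that DynamicDatasetPaths is a frozen dataclass whose attributes are strings?

Normally, when I run into a case like this, I just use a cast and tell mypy what the correct type is, but I do not know the correct type for "a frozen dataclass whose attributes are strings".

(Also, if there is a better way to avoid the duplicate code, I would be happy to hear it.)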
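One possible alternative that avoids the duplication, offered as a sketch and not as the way to type make_dataclass, is to hang the derived path off the enum itself. The property name path is an assumption introduced for illustration. A static checker can see the property's declared return type, so each dataset name is written only once.

```python
from enum import Enum


def get_path(s: str) -> str:
    return s + "_path"


class DatasetNames(Enum):
    test1 = "test1_string"
    test2 = "test2_string"
    test3 = "test3_string"

    @property
    def path(self) -> str:
        # Derived from the member's value, so each name is written only once
        return get_path(self.value)


print(DatasetNames.test2.path)
```

The trade-off is that access becomes DatasetNames.test2.path instead of DynamicDatasetPaths.test2.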

Pro Q

Asked: 2024-07-01 23:02:53 +0800 CST

Why is no warning raised when indexing a Series of values with a boolean Series that is too long?

6

I have the following code:

import pandas as pd

series_source = pd.Series([1, 2, 3, 4], dtype=int)
normal_index = pd.Series([True, False, True, True], dtype=bool)
big_index = pd.Series([True, False, True, True, False, True], dtype=bool)

# Both indexes give back the values [1, 3, 4] (rows 0, 2, and 3)
# no warnings are raised!
assert (series_source[normal_index] == series_source[big_index]).all() 

df_source = pd.DataFrame(
    [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]
    ]
)

# no warning - works as expected: grabs rows 0, 2, and 3
df_normal_result = df_source[normal_index]

# UserWarning: Boolean Series key will be reindexed to match DataFrame index.
# (but still runs)
df_big_result = df_source[big_index]

# passes - they are equivalent
assert df_normal_result.equals(df_big_result)
print("Complete")

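Why does indexing series_source with big_index not raise a warning, even though the big index has more values than the source? What is pandas doing under the hood to perform Series indexing?

(In contrast, indexing df_source raises an explicit warning that big_index needs to be reindexed for the operation to work.)

The indexing documentation claims:

Using a boolean vector to index a Series works exactly as in a NumPy ndarray

However, if I do this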

import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([True, False, True, True, False])
c = np.array([True, False, True, True, False, True, True])

# returns an ndarray of [1,3, 4] as expected
print(a[b])

# raises IndexError: boolean index did not match indexed array along axis 0;
# size of axis is 5 but size of corresponding boolean axis is 7
print(a[c])

So this behavior does not seem to match NumPy, despite what the documentation says. What is going on?

(My versions are pandas==2.2.2 and numpy==2.0.0.)
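The difference can be probed with a small experiment. The sketch below assumes a recent pandas 2.x release. It contrasts a boolean Series key, which carries index labels, with a plain NumPy boolean array, which does not. The variable names shifted, unalignable_raised, and length_raised are introduced only for illustration.

```python
import pandas as pd
from pandas.errors import IndexingError

series_source = pd.Series([1, 2, 3, 4], dtype=int)
big_index = pd.Series([True, False, True, True, False, True], dtype=bool)

# A boolean Series key is matched to the source by index label, so the
# extra labels 4 and 5 are simply not used.
aligned = series_source[big_index]
print(aligned.to_list())

# A boolean Series whose labels do not cover the source cannot be aligned.
shifted = pd.Series([True, False, True, True], index=[10, 11, 12, 13])
try:
    series_source[shifted]
    unalignable_raised = False
except IndexingError:
    unalignable_raised = True

# A plain NumPy boolean array has no labels, so its length must match.
try:
    series_source[big_index.to_numpy()]
    length_raised = False
except IndexError:
    length_raised = True

print(unalignable_raised, length_raised)
```

In other words, the NumPy-like length check appears to apply only when the key has no index of its own to align on.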

Pro Q

Asked: 2024-06-28 10:14:41 +0800 CST

How to predict the resulting type after indexing a Pandas DataFrame

7

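I have a Pandas DataFrame defined as follows:

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Aritra'],
                   'Age': [25, 30, 35],
                   'Location': ['Seattle', 'New York', 'Kona']},
                  index=([10, 20, 30]))

However, when I index into this DataFrame, I cannot accurately predict what type of object the indexing will produce: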

# (1) str
df.iloc[0, df.columns.get_loc('Name')]
# (2) Series
df.iloc[0:1, df.columns.get_loc('Name')]

# (3) Series
df.iloc[0:2, df.columns.get_loc('Name')]
# (4) DataFrame
df.iloc[0:2, df.columns.get_loc('Name'):df.columns.get_loc('Age')]

# (5) Series
df.iloc[0, df.columns.get_loc('Name'):df.columns.get_loc('Location')]
# (6) DataFrame
df.iloc[0:1, df.columns.get_loc('Name'):df.columns.get_loc('Location')]

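Note that each pair above contains the same data. (For example, (2) is a Series containing a single string, (4) is a DataFrame containing a single column, and so on.)

Why do they output different types of objects? How can I predict which type of object will be output?

Judging from the data, the rule seems to be based on how many slices (colons) are in the index:

0 slices ((1)): scalar value
1 slice ((2), (3), (5)): Series
2 slices ((4), (6)): DataFrame

However, I am not sure whether this is always true, and even if it is, I would like to know the mechanism behind it.

I have spent some time looking through the indexing documentation, but it does not seem to describe this behavior clearly. The documentation for the iloc function does not describe the return type either.

I am also interested in the same question for loc instead of iloc, but since loc is inclusive, the results are not as confusing. (That is, you cannot get pairs of indexes with different types where the indexes should pull out exactly the same data.)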
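The pattern observed above can be restated as "an integer indexer drops its axis, while a slice or list keeps it". The sketch below checks that reading on a few cases. It is an empirical check on a small frame, not a quotation from the pandas documentation, and the variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Aritra"], "Age": [25, 30, 35]},
    index=[10, 20, 30],
)

# Two integers: both axes are dropped, leaving a scalar.
both_dropped = df.iloc[0, 0]
# One integer, one slice or list: one axis is dropped, leaving a Series.
row_dropped = df.iloc[0, 0:1]
col_dropped = df.iloc[[0], 0]
# Slices or lists on both axes: nothing is dropped, leaving a DataFrame.
none_dropped = df.iloc[[0], [0]]

for result in (both_dropped, row_dropped, col_dropped, none_dropped):
    print(type(result).__name__)
```

Under this reading the result type depends on the kind of indexer used on each axis, not on how much data the indexer happens to select.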

Pro Q

Asked: 2023-09-24 17:57:54 +0800 CST

Number is in a df column, but not in the list version of that column

6

I have the following code:

    if 0 in df[RATING_COL]:
        rating_col_list = df[RATING_COL].to_list()
        assert 0 in rating_col_list

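The assertion is raising an AssertionError. How is this possible? Why is there a 0 in the column, but when I convert the column to a list, the 0 disappears?

The DataFrame I am loading is based on MovieLens-1M and looks like this: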

user_id,item_id,rating
1,1193000,2
1,1193001,3
1,1193002,4
1,1193003,5
1,1193004,6
1,1193005,7
1,1193006,8
1,1193007,9
1,1193008,10
1,661000,6
1,661001,7
1,661002,8
1,661003,9
1,661004,10
1,661005,9
1,661006,8
1,661007,7
1,661008,6

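In this format, 1,1193008,10 means that user 1 gave item 1193 a rating of 8. The 10 indicates that this row is the actual rating, and all other rows for items starting with 1193 have a value below 10. (So 1,661004,10 means that user 1 gave item 661 a rating of 4.)

(Also, I have checked with CTRL-F: there are no 0 ratings in the rating column.)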
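A likely explanation worth checking is that the in operator on a Series tests the index labels, not the values. The sketch below uses a small made-up ratings Series, not the MovieLens data, to show the difference.

```python
import pandas as pd

# Made-up data: no rating is 0, but the default index starts at label 0.
ratings = pd.Series([2, 3, 4, 5], name="rating")

# `in` on a Series tests the index labels (0, 1, 2, 3 here), not the values.
label_hit = 0 in ratings
# Converting to a list or array tests the values instead.
list_hit = 0 in ratings.to_list()
values_hit = 0 in ratings.to_numpy()

print(label_hit, list_hit, values_hit)
```

If that is what is happening here, testing against df[RATING_COL].to_numpy() or using (df[RATING_COL] == 0).any() would check the values directly.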

Pro Q

Asked: 2023-09-24 16:36:45 +0800 CST

Why is torch.cuda.OutOfMemoryError not a valid error class?

5

I have the following code:

        try:
            # faster, but requires more memory
            G = self.sparse.to_dense().t() @ self.sparse.to_dense()
        except torch.cuda.OutOfMemoryError:
            # slower, but requires less memory
            G = torch.sparse.mm(self.sparse.t(), self.sparse)

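My Pylance seems to think that torch.cuda.OutOfMemoryError is not a valid error class. (See image.)

However, when I run the code, the torch.sparse.mm branch runs, which shows that the exception is being caught.

Why does Pylance think it is invalid when it is clearly valid?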
