How do I split a for loop into 3 separate DataFrames?

Question

RKIDEV

Asked: 2025-04-29 12:26:20 +0800 CST

Merge DataFrames if 2 or more exist and are initialized

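I'm trying to merge three DataFrames using intersection(). How can I check that all of the DataFrames exist and are initialized before running intersection(), without using multiple if-else blocks? If any DataFrame has not been assigned, it should be left out of the intersection(). I sometimes get the error: UnboundLocalError: local variable 'df_2' referenced before assignment, because file2 does not exist.

Or is there another simple way to achieve the goal below?

Here is my approach: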

if os.path.exists(file1):
        df_1 = pd.read_csv(file1, header=None, names=header_1, sep=',', index_col=None)
if os.path.exists(file2):
        df_2 = pd.read_csv(file2, header=None, names=header_2, sep=',', index_col=None)
if os.path.exists(file3):
        df_3 = pd.read_csv(file3, header=None, names=header_3, sep=',', index_col=None)

common_columns = df_1.columns.intersection(df_2.columns).intersection(df_3.columns)
filtered_1 = df_1[common_columns]
filtered_2 = df_2[common_columns]
filtered_3 = df_3[common_columns]
concatenated_df = pd.concat([filtered_1, filtered_2, filtered_3], ignore_index=True)
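
To illustrate the error described above: a name that is assigned only inside an if branch stays unbound when the branch is skipped. This is a minimal sketch, assuming the code runs inside a function; the function name load_optional, the placeholder string, and the path are hypothetical and not from the original post.

```python
import os
import tempfile


def load_optional(path):
    # df is assigned only when the file exists
    if os.path.exists(path):
        df = "loaded"  # stand-in for pd.read_csv(path, ...)
    return df  # raises UnboundLocalError when the branch above was skipped


# A path inside a fresh temporary directory, so it cannot exist yet
missing_path = os.path.join(tempfile.mkdtemp(), "missing.csv")

try:
    load_optional(missing_path)
except UnboundLocalError as exc:
    error_name = type(exc).__name__
    print(error_name)  # UnboundLocalError
```

Collecting the loaded DataFrames in a list, rather than in separately named variables, avoids this situation because nothing ever refers to a name that was never assigned.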

2 Answers

Raymond · Answer 1 · 2025-04-29T12:44:52+08:00

Best Answer

Your code is already fine, but the current version has a lot of repetition. To make it more concise, you can use a list comprehension, e.g. [function(x) for x in a_list].

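import os
import pandas as pd

files = [file1, file2, file3]
headers = [header_1, header_2, header_3]

# Read only the files that exist, so no DataFrame name is left unassigned
dfs = [pd.read_csv(f, header=None, names=h, sep=',') for f, h in zip(files, headers) if os.path.exists(f)]

if dfs:
    common = set.intersection(*(set(df.columns) for df in dfs))
    # Sets are unordered, so follow the column order of the first DataFrame
    common_columns = [c for c in dfs[0].columns if c in common]
    concatenated_df = pd.concat([df[common_columns] for df in dfs], ignore_index=True)
else:
    concatenated_df = pd.DataFrame()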
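
To see the list-comprehension approach skip a missing file, here is a small self-contained sketch. The temporary files, their contents, and the header lists are made up for illustration; only the loading and intersection pattern comes from the answer. Ordering the common columns by the first DataFrame is a choice made in this sketch so the output is deterministic.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data: two files exist, the third is never created
tmp_dir = tempfile.mkdtemp()
file1 = os.path.join(tmp_dir, "file1.csv")
file2 = os.path.join(tmp_dir, "file2.csv")
file3 = os.path.join(tmp_dir, "file3.csv")  # missing on purpose

with open(file1, "w") as fh:
    fh.write("1,alice,2025-01-01\n2,bob,2025-01-02\n")
with open(file2, "w") as fh:
    fh.write("3,9.5,2025-01-03\n")

header_1 = ["id", "name", "date"]
header_2 = ["id", "value", "date"]
header_3 = ["id", "category", "notes"]

files = [file1, file2, file3]
headers = [header_1, header_2, header_3]

# Only existing files are read; the missing one is simply left out
dfs = [
    pd.read_csv(f, header=None, names=h)
    for f, h in zip(files, headers)
    if os.path.exists(f)
]

common = set.intersection(*(set(df.columns) for df in dfs))
# Keep the column order of the first loaded DataFrame
common_columns = [c for c in dfs[0].columns if c in common]
concatenated_df = pd.concat([df[common_columns] for df in dfs], ignore_index=True)

print(len(dfs))              # 2
print(common_columns)        # ['id', 'date']
print(len(concatenated_df))  # 3
```

Because file3 does not exist, header_3 never takes part in the intersection, so the result keeps both id and date instead of shrinking to id alone.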

suhail · Answer 2 · 2025-04-29T12:38:43+08:00

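When working with files that may not always be available, you can use this approach to safely merge the DataFrames that do exist without running into initialization errors. The solution loads whatever data is available, identifies the columns common to all successfully loaded datasets, and merges them: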

import os
import pandas as pd
from functools import reduce

# Update these with your actual file paths and column headers
file_config = [
    ('data/source1.csv', ['id', 'name', 'date']),
    ('data/source2.csv', ['id', 'value', 'date']),
    ('data/source3.csv', ['id', 'category', 'notes'])
]

def safe_dataframe_merge():
    """Handles DataFrame merging with missing file tolerance"""
    loaded_sets = []
    
    # Load available files
    for path, headers in file_config:
        if os.path.exists(path):
            loaded_sets.append(
                pd.read_csv(path, header=None, names=headers)
            )
    
    # Exit early if no data found
    if not loaded_sets:
        return pd.DataFrame()
    
    # Find columns common to all loaded DataFrames
    common_fields = reduce(
        lambda x, y: x.intersection(y),
        (df.columns for df in loaded_sets)
    )
    
    # Combine data while preserving structure
    return pd.concat(
        [df[common_fields] for df in loaded_sets],
        ignore_index=True
    )

# Usage example
merged_data = safe_dataframe_merge()

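This implementation checks whether each file exists before loading it, skips missing data sources entirely, and ensures that only columns present in every available dataset are merged. The reduce call finds the columns shared by all loaded DataFrames, while collecting the frames in a list prevents reference errors on unassigned variables. If none of the files exist, it returns an empty DataFrame instead of raising an error. You can edit the file_config list to add or remove data sources without changing the core logic.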
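
The reduce step can be tried on its own without any files. The sketch below uses hypothetical in-memory DataFrames in place of the loaded CSV data; it folds Index.intersection over the column indexes and also shows that reduce returns a single input unchanged, which is why the approach still works when only one file exists.

```python
from functools import reduce

import pandas as pd

# Hypothetical in-memory frames standing in for the loaded CSV files
loaded_sets = [
    pd.DataFrame({"id": [1], "name": ["a"], "date": ["2025-01-01"]}),
    pd.DataFrame({"id": [2], "value": [3.5], "date": ["2025-01-02"]}),
]

# Fold Index.intersection over the column indexes, pair by pair
common_fields = reduce(
    lambda x, y: x.intersection(y),
    (df.columns for df in loaded_sets),
)

merged = pd.concat([df[common_fields] for df in loaded_sets], ignore_index=True)

print(sorted(common_fields))  # ['date', 'id']
print(merged.shape)           # (2, 2)

# With a single DataFrame, reduce returns its columns without calling the lambda
only_one = reduce(lambda x, y: x.intersection(y), [loaded_sets[0].columns])
print(list(only_one))         # ['id', 'name', 'date']
```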
