
Question

TRK

Asked: 2025-02-09 06:26:15 +0800 CST

How to customize column names when grouping and aggregating?


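I have a dataframe with the following columns: region_id, name, parent, parent_name, t2m, d2m, and tp.

I want to group and aggregate the column values in a specific way. To do that, I have defined the following lists: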

w_params = ['t2m', 't2m', 't2m', 'd2m', 'tp']
operation = ['max', 'min', 'mean', 'mean', 'sum']
common_cols = ['region_id', 'name', 'parent', 'parent_name']

I have written the function agg_daily to group by date and region_id and aggregate the column values.

def agg_daily(df, common_cols, w_params, operation):
    """
    Aggregate the data for each day.

    Parameters
    ----------
    df : pandas dataframe
        Dataframe containing daily data.

    Returns
    -------
    agg_daily_df : pandas dataframe
        Dataframe containing aggregated data for each day.

    """
    agg_daily_df = df.groupby(['date', 'region_id']).agg(
        name=('name', 'first'),
        parent=('parent', 'first'),
        parent_name=('parent_name', 'first'),
        t2m_max=('t2m', 'max'),
        t2m_min=('t2m', 'min'),
        t2m_mean=('t2m', 'mean'),
        d2m=('d2m', 'mean'),
        tp=('tp', 'sum')
    ).reset_index()
    agg_daily_df = agg_daily_df.sort_values(['region_id', 'date'], ascending=[True, True]).reset_index(drop=True)
    return agg_daily_df

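However, note that the arguments to agg inside agg_daily (e.g. t2m_max, t2m_min, t2m_mean) are hardcoded. Instead, I would like to pass common_cols, w_params, and operation as parameters to agg_daily, so that nothing is hardcoded and the function still performs the desired operations.

Note that for the columns in common_cols, I do not want new column names in the final output. For the columns in w_params, however, I want each output column to be named after the operation performed on it.

Can someone help me write a customizable function?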

2 Answers

sammywemmy · Answer 1 · 2025-02-09T11:58:35+08:00

Pair w_params with operation to build a dictionary, then unpack it into the named aggregation:

def agg_daily(df, common_cols, w_params, operation):
    mapped = zip(w_params, operation)
    mapped = {f"{col}_{func}": (col, func) for col, func in mapped}
    outcome = df.groupby(common_cols, as_index=False).agg(**mapped)
    return outcome

Applied:

import pandas as pd

data = {'model': {0: 'Mazda RX4', 1: 'Mazda RX4 Wag', 2: 'Datsun 710'},
 'mpg': {0: 21.0, 1: 21.0, 2: 22.8},
 'cyl': {0: 6, 1: 6, 2: 4},
 'disp': {0: 160.0, 1: 160.0, 2: 108.0},
 'hp': {0: 110, 1: 110, 2: 93},
 'drat': {0: 3.9, 1: 3.9, 2: 3.85},
 'wt': {0: 2.62, 1: 2.875, 2: 2.32},
 'qsec': {0: 16.46, 1: 17.02, 2: 18.61},
 'vs': {0: 0, 1: 0, 2: 1},
 'am': {0: 1, 1: 1, 2: 1},
 'gear': {0: 4, 1: 4, 2: 4},
 'carb': {0: 4, 1: 4, 2: 1}}

mtcars = pd.DataFrame(data)
agg_daily(df=mtcars, 
          common_cols='cyl', 
          w_params=['disp','hp','drat'], 
          operation=['min','max','min'])
   cyl  disp_min  hp_max  drat_min
0    4     108.0      93      3.85
1    6     160.0     110      3.90

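Ideally you would add some checks - the length of w_params should match that of operation, the entries in operation should be strings (if they are not, you have to think about how to get the name - the .__name__ attribute, possibly), ...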
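To make those checks concrete, here is a minimal sketch, not a definitive implementation. It assumes you still want to group by date and region_id as in the question, so the group_cols parameter is my own addition and not part of the original signature. Columns in common_cols keep their names via 'first', and every w_params column gets an operation suffix (so d2m becomes d2m_mean, unlike the hardcoded version in the question).

```python
import pandas as pd


def agg_daily(df, common_cols, w_params, operation,
              group_cols=("date", "region_id")):  # group_cols is an assumed extra parameter
    """Group by group_cols, keep common_cols via 'first', aggregate w_params."""
    if len(w_params) != len(operation):
        raise ValueError("w_params and operation must have the same length")

    group_cols = list(group_cols)

    # Group keys are already preserved by groupby, so skip them here.
    named = {col: (col, "first") for col in common_cols if col not in group_cols}

    for col, func in zip(w_params, operation):
        # A string names itself; for a callable, fall back to its __name__ attribute.
        func_name = func if isinstance(func, str) else func.__name__
        named[f"{col}_{func_name}"] = (col, func)

    out = df.groupby(group_cols, as_index=False).agg(**named)

    # With the default group_cols this sorts by region_id, then date, as in the question.
    return out.sort_values(group_cols[::-1]).reset_index(drop=True)
```

Called with the lists from the question, this should give the columns date, region_id, name, parent, parent_name, t2m_max, t2m_min, t2m_mean, d2m_mean, tp_sum. If two (column, operation) pairs produce the same output name, the later one silently replaces the earlier one in the dictionary, so you may want a check for that as well.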

Metin AKTAŞ · Answer 2 · 2025-02-09T08:58:18+08:00

Does this work for you?

w_params = ['t2m', 't2m', 't2m', 'd2m', 'tp']
operation = ['max', 'min', 'mean', 'mean', 'sum']
common_cols = ['name', 'parent', 'parent_name']

def agg_df(df, common_cols, w_params, operation):

    # get list of col methods
    cols = pd.DataFrame(zip(w_params, operation), columns=["col","method"]).groupby("col").agg(list).reset_index()
    # create agg_dict and add common_cols methods
    aggs = pd.concat([cols,pd.DataFrame(zip(common_cols,len(common_cols)*[["first"]]), columns=["col","method"])], ignore_index=True)
    # aggregation with created dict
    result_df = df.groupby(['date', 'region_id']).agg(aggs.set_index("col").method).sort_values(['region_id', 'date'], ascending=[True, True]).reset_index()
    # flatten the MultiIndex columns, adding the method suffix only where a column has several methods
    have_multi_method_cols = aggs[aggs.method.apply(len) > 1].col.tolist()
    result_df.columns = result_df.columns.map(lambda x: x[0] if x[0] not in have_multi_method_cols else "_".join(x))
    # return df
    return result_df

agg_df(data, common_cols, w_params, operation)
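In case the renaming step is unclear: passing lists of methods to agg produces two-level (MultiIndex) column labels, which then have to be flattened. Here is a small standalone sketch of just that step. The sample values and the `multi` set are made up for illustration and are not taken from the question's data.

```python
import pandas as pd

# Made-up sample data, for illustration only.
df = pd.DataFrame({
    "region_id": [1, 1, 2],
    "name": ["A", "A", "B"],
    "t2m": [1.0, 3.0, 7.0],
    "tp": [0.1, 0.2, 0.4],
})

# A dict whose values are lists of methods yields two-level column labels,
# e.g. ('t2m', 'max'), ('t2m', 'min'), ('tp', 'sum'), ('name', 'first').
result = df.groupby("region_id").agg(
    {"t2m": ["max", "min"], "tp": ["sum"], "name": ["first"]}
)

# Only columns aggregated with more than one method get the method as a suffix.
multi = {"t2m"}
result.columns = [
    f"{col}_{func}" if col in multi else col
    for col, func in result.columns
]
result = result.reset_index()
```

After flattening, the columns are region_id, t2m_max, t2m_min, tp, name. Note that this approach leaves single-method columns such as tp without a suffix, which differs from the named-aggregation answer where every w_params column is suffixed.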
