AskOverflow.Dev
khteh
Asked: 2025-02-19 17:32:13 +0800 CST

Pandas: group by multiple columns, aggregate some of them, and add a per-group count column [duplicate]

This question already has an answer here:
Pandas groupBy multiple columns and aggregation (1 answer)
Closed 20 hours ago.

The data I'm working with:

data (140631115432592), ndim: 2, size: 3947910, shape: (232230, 17)
VIN (1-10)                                            object
County                                                object
City                                                  object
State                                                 object
Postal Code                                          float64
Model Year                                             int64
Make                                                  object
Model                                                 object
Electric Vehicle Type                                 object
Clean Alternative Fuel Vehicle (CAFV) Eligibility     object
Electric Range                                       float64
Base MSRP                                            float64
Legislative District                                 float64
DOL Vehicle ID                                         int64
Vehicle Location                                      object
Electric Utility                                      object
2020 Census Tract                                    float64
dtype: object
   VIN (1-10)    County      City State  Postal Code  ...  Legislative District DOL Vehicle ID             Vehicle Location                               Electric Utility 2020 Census Tract
0  2T3YL4DV0E      King  Bellevue    WA      98005.0  ...                  41.0      186450183   POINT (-122.1621 47.64441)  PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA)      5.303302e+10
1  5YJ3E1EB6K      King   Bothell    WA      98011.0  ...                   1.0      478093654  POINT (-122.20563 47.76144)  PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA)      5.303302e+10
2  5UX43EU02S  Thurston   Olympia    WA      98502.0  ...                  35.0      274800718  POINT (-122.92333 47.03779)                         PUGET SOUND ENERGY INC      5.306701e+10
3  JTMAB3FV5R  Thurston   Olympia    WA      98513.0  ...                   2.0      260758165  POINT (-122.81754 46.98876)                         PUGET SOUND ENERGY INC      5.306701e+10
4  5YJYGDEE8M    Yakima     Selah    WA      98942.0  ...                  15.0      236581355  POINT (-120.53145 46.65405)                                     PACIFICORP      5.307700e+10

The same data in CSV format:

VIN (1-10),County,City,State,Postal Code,Model Year,Make,Model,Electric Vehicle Type,Clean Alternative Fuel Vehicle (CAFV) Eligibility,Electric Range,Base MSRP,Legislative District,DOL Vehicle ID,Vehicle Location,Electric Utility,2020 Census Tract
2T3YL4DV0E,King,Bellevue,WA,98005,2014,TOYOTA,RAV4,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,103,0,41,186450183,POINT (-122.1621 47.64441),PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA),53033023604
5YJ3E1EB6K,King,Bothell,WA,98011,2019,TESLA,MODEL 3,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,220,0,1,478093654,POINT (-122.20563 47.76144),PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA),53033022102
5UX43EU02S,Thurston,Olympia,WA,98502,2025,BMW,X5,Plug-in Hybrid Electric Vehicle (PHEV),Clean Alternative Fuel Vehicle Eligible,40,0,35,274800718,POINT (-122.92333 47.03779),PUGET SOUND ENERGY INC,53067011902
JTMAB3FV5R,Thurston,Olympia,WA,98513,2024,TOYOTA,RAV4 PRIME,Plug-in Hybrid Electric Vehicle (PHEV),Clean Alternative Fuel Vehicle Eligible,42,0,2,260758165,POINT (-122.81754 46.98876),PUGET SOUND ENERGY INC,53067012332
5YJYGDEE8M,Yakima,Selah,WA,98942,2021,TESLA,MODEL Y,Battery Electric Vehicle (BEV),Eligibility unknown as battery range has not been researched,0,0,15,236581355,POINT (-120.53145 46.65405),PACIFICORP,53077003200
3C3CFFGE1G,Thurston,Olympia,WA,98501,2016,FIAT,500,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,84,0,22,294762219,POINT (-122.89166 47.03956),PUGET SOUND ENERGY INC,53067010802
5YJ3E1EA4J,Snohomish,Marysville,WA,98271,2018,TESLA,MODEL 3,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,215,0,39,270125096,POINT (-122.1677 48.11026),PUGET SOUND ENERGY INC,53061052808
5YJ3E1EA3K,King,Seattle,WA,98102,2019,TESLA,MODEL 3,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,220,0,43,238776492,POINT (-122.32427 47.63433),CITY OF SEATTLE - (WA)|CITY OF TACOMA - (WA),53033006600
1N4AZ0CP5E,Thurston,Yelm,WA,98597,2014,NISSAN,LEAF,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,84,0,2,257246118,POINT (-122.60735 46.94239),PUGET SOUND ENERGY INC,53067012421

Filtering and grouping:

filt = (data["Model Year"] >= 2018) & (data["Electric Vehicle Type"] == "Battery Electric Vehicle (BEV)")
data = data[filt].groupby(["State", "Make"], sort=False, observed=True, as_index=False).agg(
    avg_electric_range=pd.NamedAgg(column="Electric Range", aggfunc="mean"),
    oldest_model_year=pd.NamedAgg(column="Model Year", aggfunc="min"))

It currently produces the following table:

  State       Make  avg_electric_range  oldest_model_year
0    WA      TESLA           52.143448               2018
1    WA     NISSAN           60.051874               2018
<snip>
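To make the pipeline reproducible without the full 232,230-row file, here is a minimal sketch that runs the same filter and aggregation on the nine-row CSV sample. Assumption: the sample is trimmed to the five columns the code actually touches, and `csv`/`result` are illustrative names. On this sample only four TESLA rows pass the filter, so the output has a single group.

```python
import io

import pandas as pd

# Assumption: the CSV sample from the question, trimmed to the columns
# that the filter and the aggregation actually use
csv = io.StringIO("""\
State,Make,Model Year,Electric Vehicle Type,Electric Range
WA,TOYOTA,2014,Battery Electric Vehicle (BEV),103
WA,TESLA,2019,Battery Electric Vehicle (BEV),220
WA,BMW,2025,Plug-in Hybrid Electric Vehicle (PHEV),40
WA,TOYOTA,2024,Plug-in Hybrid Electric Vehicle (PHEV),42
WA,TESLA,2021,Battery Electric Vehicle (BEV),0
WA,FIAT,2016,Battery Electric Vehicle (BEV),84
WA,TESLA,2018,Battery Electric Vehicle (BEV),215
WA,TESLA,2019,Battery Electric Vehicle (BEV),220
WA,NISSAN,2014,Battery Electric Vehicle (BEV),84
""")
data = pd.read_csv(csv)

# Same boolean mask as in the question: BEVs from model year 2018 onwards
filt = (data["Model Year"] >= 2018) & (data["Electric Vehicle Type"] == "Battery Electric Vehicle (BEV)")

# Same grouping and named aggregations as in the question
result = data[filt].groupby(["State", "Make"], sort=False, observed=True, as_index=False).agg(
    avg_electric_range=pd.NamedAgg(column="Electric Range", aggfunc="mean"),
    oldest_model_year=pd.NamedAgg(column="Model Year", aggfunc="min"),
)
print(result)
```

The mean here is (220 + 0 + 215 + 220) / 4 = 163.75; the zero range of the "eligibility unknown" row pulls it down.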

How can I add a Count column showing the number of rows in each group, so that I can filter on it further? Note: no `apply`; everything should stay in pandas land.

python

2 Answers

  1. Best Answer
    mozway
    2025-02-19T17:43:14+08:00

    Your question would benefit from a minimal reproducible example.

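    That said, as long as there are no missing values, the count does not depend on any particular column, because `count` only skips NaN. So pick any column that has no missing values and add one more aggregation (you can use one of the grouping columns, or Model Year, since you know it must always be a valid number):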

    out = (data[filt].groupby(["State", "Make"], sort=False, observed=True, as_index=False)
            .agg(avg_electric_range=pd.NamedAgg(column="Electric Range", aggfunc="mean"),
                 oldest_model_year=pd.NamedAgg(column="Model Year", aggfunc="min"),
                 count=pd.NamedAgg(column="Model Year", aggfunc="count"),
                )
           )
    

    Example output:

      State Make  avg_electric_range  oldest_model_year  count
    0    WA    X                 0.5               2018      2
    1    WA    Y                 3.0               2018      3
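A minimal sketch of this answer on hypothetical toy data (the frame below is invented for illustration and is not from the question). It adds both a `"count"` aggregation on a column that contains a NaN and a `"size"` aggregation, to show why the choice of column matters, and then filters on the new column as the question intends. `n_with_range` is a hypothetical column name.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: Make "Y" has one missing Electric Range
data = pd.DataFrame({
    "State": ["WA", "WA", "WA", "WA", "WA"],
    "Make": ["X", "X", "Y", "Y", "Y"],
    "Model Year": [2018, 2020, 2018, 2019, 2021],
    "Electric Range": [0.0, 1.0, 2.0, 4.0, np.nan],
})

out = (
    data.groupby(["State", "Make"], sort=False, as_index=False)
    .agg(
        avg_electric_range=pd.NamedAgg(column="Electric Range", aggfunc="mean"),
        oldest_model_year=pd.NamedAgg(column="Model Year", aggfunc="min"),
        # "count" skips NaN, so it undercounts group Y on this column
        n_with_range=pd.NamedAgg(column="Electric Range", aggfunc="count"),
        # "size" counts every row in the group, regardless of NaN
        count=pd.NamedAgg(column="Model Year", aggfunc="size"),
    )
)
print(out)

# The new column can be used for further filtering, e.g. groups with >= 3 rows
print(out[out["count"] >= 3])
```

Using `"size"` sidesteps the "no missing values" caveat entirely, while `"count"` on a NaN-free column such as Model Year gives the same number.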
    
  2. IlBuonTini
    2025-02-19T17:42:52+08:00

    I'm not sure this is what you want, but you can write a custom aggregation function, for example:

    pd.NamedAgg(column="Model Year", aggfunc=lambda x: np.size(x))
    

    or

    pd.NamedAgg(column="Model Year", aggfunc=lambda x: len(x))
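A minimal sketch comparing these custom callables with pandas' built-in `"size"` alias, on hypothetical toy data invented for illustration. All three produce the same per-group row counts; the string alias typically avoids calling a Python function once per group, which keeps the work inside pandas, in the spirit of the question's "no `apply`" constraint.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with the question's grouping columns
data = pd.DataFrame({
    "State": ["WA"] * 5,
    "Make": ["X", "X", "Y", "Y", "Y"],
    "Model Year": [2018, 2020, 2018, 2019, 2021],
})

grouped = data.groupby(["State", "Make"], sort=False, as_index=False)

# Custom callables, as in this answer: pandas calls them once per group
via_len = grouped.agg(count=pd.NamedAgg(column="Model Year", aggfunc=lambda x: len(x)))
via_np = grouped.agg(count=pd.NamedAgg(column="Model Year", aggfunc=lambda x: np.size(x)))

# Built-in string alias giving the same row counts
via_size = grouped.agg(count=pd.NamedAgg(column="Model Year", aggfunc="size"))

print(via_size)
```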
    
