我正在处理的数据:
data (140631115432592), ndim: 2, size: 3947910, shape: (232230, 17)
VIN (1-10) object
County object
City object
State object
Postal Code float64
Model Year int64
Make object
Model object
Electric Vehicle Type object
Clean Alternative Fuel Vehicle (CAFV) Eligibility object
Electric Range float64
Base MSRP float64
Legislative District float64
DOL Vehicle ID int64
Vehicle Location object
Electric Utility object
2020 Census Tract float64
dtype: object
VIN (1-10) County City State Postal Code ... Legislative District DOL Vehicle ID Vehicle Location Electric Utility 2020 Census Tract
0 2T3YL4DV0E King Bellevue WA 98005.0 ... 41.0 186450183 POINT (-122.1621 47.64441) PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA) 5.303302e+10
1 5YJ3E1EB6K King Bothell WA 98011.0 ... 1.0 478093654 POINT (-122.20563 47.76144) PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA) 5.303302e+10
2 5UX43EU02S Thurston Olympia WA 98502.0 ... 35.0 274800718 POINT (-122.92333 47.03779) PUGET SOUND ENERGY INC 5.306701e+10
3 JTMAB3FV5R Thurston Olympia WA 98513.0 ... 2.0 260758165 POINT (-122.81754 46.98876) PUGET SOUND ENERGY INC 5.306701e+10
4 5YJYGDEE8M Yakima Selah WA 98942.0 ... 15.0 236581355 POINT (-120.53145 46.65405) PACIFICORP 5.307700e+10
csv 格式的数据:
VIN (1-10),County,City,State,Postal Code,Model Year,Make,Model,Electric Vehicle Type,Clean Alternative Fuel Vehicle (CAFV) Eligibility,Electric Range,Base MSRP,Legislative District,DOL Vehicle ID,Vehicle Location,Electric Utility,2020 Census Tract
2T3YL4DV0E,King,Bellevue,WA,98005,2014,TOYOTA,RAV4,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,103,0,41,186450183,POINT (-122.1621 47.64441),PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA),53033023604
5YJ3E1EB6K,King,Bothell,WA,98011,2019,TESLA,MODEL 3,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,220,0,1,478093654,POINT (-122.20563 47.76144),PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA),53033022102
5UX43EU02S,Thurston,Olympia,WA,98502,2025,BMW,X5,Plug-in Hybrid Electric Vehicle (PHEV),Clean Alternative Fuel Vehicle Eligible,40,0,35,274800718,POINT (-122.92333 47.03779),PUGET SOUND ENERGY INC,53067011902
JTMAB3FV5R,Thurston,Olympia,WA,98513,2024,TOYOTA,RAV4 PRIME,Plug-in Hybrid Electric Vehicle (PHEV),Clean Alternative Fuel Vehicle Eligible,42,0,2,260758165,POINT (-122.81754 46.98876),PUGET SOUND ENERGY INC,53067012332
5YJYGDEE8M,Yakima,Selah,WA,98942,2021,TESLA,MODEL Y,Battery Electric Vehicle (BEV),Eligibility unknown as battery range has not been researched,0,0,15,236581355,POINT (-120.53145 46.65405),PACIFICORP,53077003200
3C3CFFGE1G,Thurston,Olympia,WA,98501,2016,FIAT,500,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,84,0,22,294762219,POINT (-122.89166 47.03956),PUGET SOUND ENERGY INC,53067010802
5YJ3E1EA4J,Snohomish,Marysville,WA,98271,2018,TESLA,MODEL 3,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,215,0,39,270125096,POINT (-122.1677 48.11026),PUGET SOUND ENERGY INC,53061052808
5YJ3E1EA3K,King,Seattle,WA,98102,2019,TESLA,MODEL 3,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,220,0,43,238776492,POINT (-122.32427 47.63433),CITY OF SEATTLE - (WA)|CITY OF TACOMA - (WA),53033006600
1N4AZ0CP5E,Thurston,Yelm,WA,98597,2014,NISSAN,LEAF,Battery Electric Vehicle (BEV),Clean Alternative Fuel Vehicle Eligible,84,0,2,257246118,POINT (-122.60735 46.94239),PUGET SOUND ENERGY INC,53067012421
过滤和分组:
filt = (data["Model Year"] >= 2018) & (data["Electric Vehicle Type"] == "Battery Electric Vehicle (BEV)")
data = data[filt].groupby(["State", "Make"], sort=False, observed=True, as_index=False).agg( avg_electric_range=pd.NamedAgg(column="Electric Range", aggfunc="mean"), oldest_model_year=pd.NamedAgg(column="Model Year", aggfunc="min"))
目前它产生下表:
State Make avg_electric_range oldest_model_year
0 WA TESLA 52.143448 2018
1 WA NISSAN 60.051874 2018
<snip>
如何添加一Count
列来显示每个组的数量,以便进一步过滤?注意:排除apply
所有东西,因为一切都应该留在 Pandas'land 中。
您的问题将受益于一个最少的可重现的例子。
也就是说,只要您没有缺失值,计数实际上并不依赖于特定的列,因此选择任何符合该标准的列并添加另一个聚合(您可以使用其中一个分组列,或者
Model Year
因为您知道它必须是一个有效数字):示例输出:
我不确定这是否是您想要的,但您可以创建一个自定义聚合函数,例如:
或者