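I'm trying to work out how to build the equivalent of SUMIF in Python. My current solution works, but it is far too slow: it takes 20 minutes to run.
What is the most efficient way to get the result I want?
Here is what I'm currently doing, boiled down to a very simple form. The "real" code has more conditions.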
**sales_data customer_1**
Transactions | Product Dimension 4 | Product Dimension 2 | Product Dimension 3 | sum_of_sales
-------------- | ------------------- | ------------------- | --------------------| -------------
1 | 50 | F80 | ETQ546 | 80
2 | 50 | F80 | SAS978 | 20
3 | 50 | C36 | JBH148 | 10
4 | 50 | F80 | ETQ546 | 80
5 | 50 | F80 | SAS978 | 20
6 | 50 | C36 | JBH148 | 10
7 | 20 | A20 | OPW269 | 15
8 | 20 | A20 | DUW987 | 65
9 | 20 | v90 | OWQ897 | 47
**condition_types BEFORE ADDING SUMIF TO TABLE**
Transactions | Type | Product Dimensions |
-------------- | ------------------- | ------------------- |
customer_1 | ABC | 50 |
customer_1 | DEF | F80 |
customer_1 | GHI | JBH148 |
**condition_types AFTER ADDING SUMIF TO TABLE**
Transactions | Type | Product Dimensions | sum_of_sales
-------------- | ------------------- | ------------------- | -------------
customer_1 | ABC | 50 | 220
customer_1 | DEF | F80 | 200
customer_1 | GHI | JBH148 | 20
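To make the example reproducible, the tables above can be written out as DataFrames. This is only a sketch: the column names follow the ones used in the code (`Product_dimension_4` and so on) rather than the table headers, and storing every dimension value as a string is an assumption, made so that `"50"` compares equal in both frames.

```python
import pandas as pd

# Sample data copied from the sales_data table above.
# Assumption: all dimension values are strings, so "50" matches in both frames.
sales_data = pd.DataFrame({
    "Transactions": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "Product_dimension_4": ["50"] * 6 + ["20"] * 3,
    "Product_dimension_2": ["F80", "F80", "C36", "F80", "F80", "C36",
                            "A20", "A20", "v90"],
    "Product_dimension_3": ["ETQ546", "SAS978", "JBH148", "ETQ546", "SAS978",
                            "JBH148", "OPW269", "DUW987", "OWQ897"],
    "sum_of_sales": [80, 20, 10, 80, 20, 10, 15, 65, 47],
})

# condition_types before the SUMIF column is added.
condition_types = pd.DataFrame({
    "Transactions": ["customer_1"] * 3,
    "Type": ["ABC", "DEF", "GHI"],
    "Product Dimensions": ["50", "F80", "JBH148"],
})
```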
Define the sumif function:

    def sumif(row, value_column):
        # The Type of the condition row decides which sales_data column is matched.
        if row['Type'] == "ABC":
            filtered_data = sales_data.loc[
                (sales_data['Product_dimension_4'] == row['Product Dimensions'])
            ]
        elif row['Type'] == "DEF" and row['Product Dimensions'] in sales_data['Product_dimension_2'].unique():
            filtered_data = sales_data.loc[
                (sales_data['Product_dimension_2'] == row['Product Dimensions'])
            ]
        elif row['Type'] == "GHI" and row['Product Dimensions'] in sales_data['Product_dimension_3'].unique():
            filtered_data = sales_data.loc[
                (sales_data['Product_dimension_3'] == row['Product Dimensions'])
            ]
        else:
            return 0  # Return 0 instead of an empty string for consistency
        # Sum the requested value column over the matching sales rows.
        return filtered_data[value_column].sum()
Apply the sumif function (which filters with loc) to every row of condition_types:

    condition_types['sum_of_sales'] = condition_types.apply(
        lambda row: sumif(row, value_column="sum_of_sales"), axis=1
    )
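For comparison, the row-by-row version rescans `sales_data` (and calls `.unique()`) once per condition row. One possible way around that is to aggregate once per dimension column with `groupby` and then look the totals up with `Series.map`. The following is a minimal sketch, not a drop-in replacement for the real code: `type_to_column` and `sumif_vectorized` are hypothetical names, the dimension values are assumed to be strings, and it only covers the three single-column conditions shown here.

```python
import pandas as pd

# Small sample mirroring the tables in the question (values assumed to be strings).
sales_data = pd.DataFrame({
    "Product_dimension_4": ["50"] * 6 + ["20"] * 3,
    "Product_dimension_2": ["F80", "F80", "C36", "F80", "F80", "C36",
                            "A20", "A20", "v90"],
    "Product_dimension_3": ["ETQ546", "SAS978", "JBH148", "ETQ546", "SAS978",
                            "JBH148", "OPW269", "DUW987", "OWQ897"],
    "sum_of_sales": [80, 20, 10, 80, 20, 10, 15, 65, 47],
})
condition_types = pd.DataFrame({
    "Transactions": ["customer_1"] * 3,
    "Type": ["ABC", "DEF", "GHI"],
    "Product Dimensions": ["50", "F80", "JBH148"],
})

# Hypothetical lookup: which sales_data column each Type is matched against
# (taken from the branches of sumif).
type_to_column = {
    "ABC": "Product_dimension_4",
    "DEF": "Product_dimension_2",
    "GHI": "Product_dimension_3",
}


def sumif_vectorized(conditions, sales, value_column):
    """Return one total per condition row; unmatched rows and unknown Types get 0."""
    result = pd.Series(0.0, index=conditions.index)
    for type_name, column in type_to_column.items():
        # One pass over sales per dimension column instead of one per condition row.
        totals = sales.groupby(column)[value_column].sum()
        mask = conditions["Type"] == type_name
        # map() looks each key up in the precomputed totals; missing keys become NaN.
        looked_up = conditions.loc[mask, "Product Dimensions"].map(totals)
        result.loc[mask] = looked_up.fillna(0)
    return result


condition_types["sum_of_sales"] = sumif_vectorized(
    condition_types, sales_data, value_column="sum_of_sales"
)
print(condition_types)
```

With this layout the work on `sales_data` grows with the number of Types rather than the number of condition rows. Conditions that combine several columns would need a multi-column `groupby` or a `merge` instead.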
I hope this is clear enough and that the example isn't too complicated.