I have created the following DataFrame in my PySpark code:
+---------------+-------------+---------------+------+
|TransactionDate|AccountNumber|TransactionType|Amount|
+---------------+-------------+---------------+------+
| 2023-01-01| 100| Credit| 1000|
| 2023-01-02| 100| Credit| 1500|
| 2023-01-03| 100| Debit| 1000|
| 2023-01-02| 200| Credit| 3500|
| 2023-01-03| 200| Debit| 2000|
| 2023-01-04| 200| Credit| 3500|
| 2023-01-13| 300| Credit| 4000|
| 2023-01-14| 300| Debit| 4500|
| 2023-01-15| 300| Credit| 5000|
+---------------+-------------+---------------+------+
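For context, the sample data can be reproduced roughly like this (this is my own sketch; the column types are assumptions, with TransactionDate kept as a string here):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sample rows matching the table above
data = [
    ('2023-01-01', 100, 'Credit', 1000),
    ('2023-01-02', 100, 'Credit', 1500),
    ('2023-01-03', 100, 'Debit', 1000),
    ('2023-01-02', 200, 'Credit', 3500),
    ('2023-01-03', 200, 'Debit', 2000),
    ('2023-01-04', 200, 'Credit', 3500),
    ('2023-01-13', 300, 'Credit', 4000),
    ('2023-01-14', 300, 'Debit', 4500),
    ('2023-01-15', 300, 'Credit', 5000),
]
df_new = spark.createDataFrame(data, ['TransactionDate', 'AccountNumber', 'TransactionType', 'Amount'])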
I need to add another column, CurrentBalance, that shows the running balance for each account.
Expected output:
+---------------+-------------+---------------+------+--------------+
|TransactionDate|AccountNumber|TransactionType|Amount|CurrentBalance|
+---------------+-------------+---------------+------+--------------+
| 2023-01-01| 100| Credit| 1000| 1000|
| 2023-01-02| 100| Credit| 1500| 2500|
| 2023-01-03| 100| Debit| 1000| 1500|
| 2023-01-02| 200| Credit| 3500| 3500|
| 2023-01-03| 200| Debit| 2000| 1500|
| 2023-01-04| 200| Credit| 3500| 5000|
| 2023-01-13| 300| Credit| 4000| 4000|
| 2023-01-14| 300| Debit| 4500| -500|
| 2023-01-15| 300| Credit| 5000| 1000|
+---------------+-------------+---------------+------+--------------+
I have tried calculating the credits and debits by finding the minimum date and passing the date in a condition, but it doesn't seem to work.
from pyspark.sql import functions as f

# Find the minimum date in the TransactionDate column, grouped by AccountNumber
df_new.groupBy('AccountNumber').agg(f.min('TransactionDate').alias('min_date'))
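One approach I am considering instead (untested sketch, not my original attempt) is to negate the Debit amounts and take a cumulative sum over a window partitioned by AccountNumber and ordered by TransactionDate:

from pyspark.sql import functions as f
from pyspark.sql.window import Window

# Credits add to the balance, Debits subtract from it
signed_amount = f.when(f.col('TransactionType') == 'Debit', -f.col('Amount')).otherwise(f.col('Amount'))

# Running total per account, from the first transaction up to the current row
w = (Window.partitionBy('AccountNumber')
           .orderBy('TransactionDate')
           .rowsBetween(Window.unboundedPreceding, Window.currentRow))

df_result = df_new.withColumn('CurrentBalance', f.sum(signed_amount).over(w))
df_result.show()

Is this the right way to get the expected output above, or is there a better approach?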