I have a table like the one below, showing each customer's payments on different dates:

| stream_datetime | customer_id | order_status | rn |
|:----------------:|:-----------:|:------------:|:--:|
| 04/06/20 11:19AM | 1 | completed | 1 |
| 05/06/20 10:54AM | 1 | completed | 2 |
| 06/06/20 10:59AM | 1 | completed | 3 |
| 08/06/20 09:27AM | 2 | failed | 1 |
| 09/06/20 11:02AM | 2 | failed | 2 |
| 01/11/20 05:59PM | 3 | completed | 1 |
| 02/11/20 05:59PM | 3 | completed | 2 |
| 03/11/20 10:01AM | 3 | cancelled | 3 |
| 04/11/20 09:20AM | 3 | completed | 4 |
| 05/11/20 10:25AM | 3 | completed | 5 |
| 01/13/20 03:29PM | 4 | completed | 1 |
| 02/13/20 03:29PM | 4 | completed | 2 |
| 03/13/20 03:29PM | 4 | cancelled | 3 |
| 04/13/20 03:29PM | 4 | completed | 4 |
| 05/13/20 03:29PM | 4 | completed | 5 |
| 06/13/20 03:29PM | 4 | completed | 6 |
| 07/13/20 03:29PM | 4 | completed | 7 |
| 08/13/20 03:29PM | 4 | cancelled | 8 |
| 06/20/20 03:29PM | 5 | failed | 1 |
| 07/20/20 03:29PM | 5 | completed | 2 |
| 08/20/20 03:29PM | 5 | completed | 3 |
| 09/20/20 03:29PM | 5 | failed | 4 |
| 10/20/20 03:29PM | 5 | completed | 5 |
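
I want to calculate the number of days between a customer's purchase and the moment he cancelled his plan.

The challenge is that a customer can cancel more than once, so customer 4 must be counted as a churned customer twice, while customer 3 is counted as a churned customer only once.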

I only want to consider customers that have an `order_status = completed` row followed later (not necessarily in the next month) by an `order_status = cancelled` row.

I also want to create a column named `purchase_day` that records the payment date.

Obs.: the `rn` column is the row number within a given customer.
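
For readers unfamiliar with `rn`, here is a minimal Python sketch of how such a per-customer row number can be derived. It assumes `rn` is assigned in `stream_datetime` order within each customer (which matches the sample data, and corresponds to `ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY stream_datetime)` in SQL) and that timestamps are MM/DD/YY with a 12-hour clock. The function name `add_row_numbers` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%m/%d/%y %I:%M%p"  # assumption: MM/DD/YY, 12-hour clock

def add_row_numbers(rows):
    """rows: iterable of (stream_datetime, customer_id, order_status).

    Returns the rows sorted by customer and time, each with a 1-based rn appended.
    """
    ordered = sorted(rows, key=lambda r: (r[1], datetime.strptime(r[0], FMT)))
    counter = defaultdict(int)  # customer_id -> rows seen so far
    numbered = []
    for ts, customer_id, status in ordered:
        counter[customer_id] += 1
        numbered.append((ts, customer_id, status, counter[customer_id]))
    return numbered

# Rows for customers 1 and 2 from the table, deliberately shuffled.
sample = [
    ("09/06/20 11:02AM", 2, "failed"),
    ("04/06/20 11:19AM", 1, "completed"),
    ("08/06/20 09:27AM", 2, "failed"),
    ("05/06/20 10:54AM", 1, "completed"),
]
numbered = add_row_numbers(sample)
for row in numbered:
    print(row)
```
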
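
Edit: Sorry, I made some mistakes when I first wrote the question.

It is possible for `order_status = cancelled` to come before `order_status = completed`. This is caused by an error in the business process, but it can happen. When it does, we cannot treat that customer as churned.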
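
The rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not a database query: the function name `churn_periods` is hypothetical, timestamps are read as MM/DD/YY, a period starts at the first `completed` row after the previous cancellation, `failed` rows are treated as neutral, and a `cancelled` row with no earlier `completed` row is ignored. Customer 6 is made up to show that last case.

```python
from datetime import datetime

FMT = "%m/%d/%y %I:%M%p"  # assumption: MM/DD/YY, 12-hour clock

def churn_periods(rows):
    """rows: iterable of (stream_datetime, customer_id, order_status).

    Returns (purchase_day, customer_id, lifetime_in_days) tuples,
    one per completed -> cancelled cycle.
    """
    parsed = sorted(
        (customer_id, datetime.strptime(ts, FMT), ts, status)
        for ts, customer_id, status in rows
    )
    open_period = {}  # customer_id -> (purchase_day string, parsed datetime)
    result = []
    for customer_id, when, ts, status in parsed:
        if status == "completed":
            # Only the first completed row of a cycle opens a period.
            open_period.setdefault(customer_id, (ts, when))
        elif status == "cancelled" and customer_id in open_period:
            start_ts, start = open_period.pop(customer_id)
            lifetime = (when.date() - start.date()).days
            result.append((start_ts, customer_id, lifetime))
        # "failed" rows and cancellations without an open period are skipped.
    return result

rows = [
    ("01/11/20 05:59PM", 3, "completed"), ("02/11/20 05:59PM", 3, "completed"),
    ("03/11/20 10:01AM", 3, "cancelled"), ("04/11/20 09:20AM", 3, "completed"),
    ("05/11/20 10:25AM", 3, "completed"),
    ("01/13/20 03:29PM", 4, "completed"), ("02/13/20 03:29PM", 4, "completed"),
    ("03/13/20 03:29PM", 4, "cancelled"), ("04/13/20 03:29PM", 4, "completed"),
    ("05/13/20 03:29PM", 4, "completed"), ("06/13/20 03:29PM", 4, "completed"),
    ("07/13/20 03:29PM", 4, "completed"), ("08/13/20 03:29PM", 4, "cancelled"),
    # Hypothetical customer 6: cancelled before any completed order.
    ("01/01/20 09:00AM", 6, "cancelled"), ("02/01/20 09:00AM", 6, "completed"),
]
result = churn_periods(rows)
for row in result:
    print(row)
# ('01/11/20 05:59PM', 3, 60)
# ('01/13/20 03:29PM', 4, 60)
# ('04/13/20 03:29PM', 4, 122)
```

Customer 6 produces no row because its only cancellation happens before any completed order.
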

So this is my expected result (it is correct now):

| purchase_day | customer_id | lifetime |
|:----------------:|:-----------:|:-------:|
| 01/11/20 05:59PM | 3 | 60 |
| 01/13/20 03:29PM | 4 | 60 |
| 04/13/20 03:29PM | 4 | 122 |
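
As you can see:

- Customer 1 never cancelled (so he does not need to appear in the result)
- Customer 2 never cancelled (so he does not need to appear in the result)
- Customer 3 cancelled once (his lifetime is 60 days)
- Customer 4 cancelled twice (once after 60 days and once after 122 days)
- Customer 5 never cancelled (so he does not need to appear in the result)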
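
The lifetime values are consistent with a calendar-day difference between the purchase date and the cancellation date, ignoring the time of day and reading the timestamps as MM/DD/YY (2020 is a leap year, which is why January 11 to March 11 is 60 days). A quick check in Python, where the helper name `day_diff` is just for illustration:

```python
from datetime import datetime

FMT = "%m/%d/%y %I:%M%p"  # assumption: MM/DD/YY, 12-hour clock

def day_diff(start: str, end: str) -> int:
    """Calendar-day difference between two timestamps, ignoring time of day."""
    s = datetime.strptime(start, FMT).date()
    e = datetime.strptime(end, FMT).date()
    return (e - s).days

print(day_diff("01/11/20 05:59PM", "03/11/20 10:01AM"))  # customer 3: 60
print(day_diff("01/13/20 03:29PM", "03/13/20 03:29PM"))  # customer 4: 60
print(day_diff("04/13/20 03:29PM", "08/13/20 03:29PM"))  # customer 4: 122
```

For customer 3 the elapsed time is slightly under 60 full days (the cancellation is at 10:01AM, the purchase at 05:59PM), so the expected value of 60 only comes out when dates are compared, not full timestamps.
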