I have two pl.DataFrame objects:
from datetime import date

import polars as pl

df1 = pl.DataFrame(
    {
        "symbol": [
            "sec1", "sec1", "sec1", "sec1", "sec1", "sec1",
            "sec2", "sec2", "sec2", "sec2", "sec2",
        ],
        "date": [
            date(2021, 9, 14),
            date(2021, 9, 15),
            date(2021, 9, 16),
            date(2021, 9, 17),
            date(2021, 8, 31),
            date(2020, 12, 31),
            date(2021, 9, 14),
            date(2021, 9, 15),
            date(2021, 8, 31),
            date(2021, 12, 30),
            date(2020, 12, 31),
        ],
        "price": range(11),
    }
)

df2 = pl.DataFrame(
    {
        "symbol": ["sec1", "sec2"],
        "current_date": [date(2021, 9, 17), date(2021, 9, 15)],
        "mtd": [date(2021, 8, 31), date(2021, 8, 31)],
        "ytd": [date(2020, 12, 31), date(2020, 12, 30)],
    }
)

# Print every row rather than the truncated default view
with pl.Config(tbl_rows=-1):
    print(df1)
    print(df2)
shape: (11, 3)
┌────────┬────────────┬───────┐
│ symbol ┆ date       ┆ price │
│ ---    ┆ ---        ┆ ---   │
│ str    ┆ date       ┆ i64   │
╞════════╪════════════╪═══════╡
│ sec1   ┆ 2021-09-14 ┆ 0     │
│ sec1   ┆ 2021-09-15 ┆ 1     │
│ sec1   ┆ 2021-09-16 ┆ 2     │
│ sec1   ┆ 2021-09-17 ┆ 3     │
│ sec1   ┆ 2021-08-31 ┆ 4     │
│ sec1   ┆ 2020-12-31 ┆ 5     │
│ sec2   ┆ 2021-09-14 ┆ 6     │
│ sec2   ┆ 2021-09-15 ┆ 7     │
│ sec2   ┆ 2021-08-31 ┆ 8     │
│ sec2   ┆ 2021-12-30 ┆ 9     │
│ sec2   ┆ 2020-12-31 ┆ 10    │
└────────┴────────────┴───────┘
shape: (2, 4)
┌────────┬──────────────┬────────────┬────────────┐
│ symbol ┆ current_date ┆ mtd        ┆ ytd        │
│ ---    ┆ ---          ┆ ---        ┆ ---        │
│ str    ┆ date         ┆ date       ┆ date       │
╞════════╪══════════════╪════════════╪════════════╡
│ sec1   ┆ 2021-09-17   ┆ 2021-08-31 ┆ 2020-12-31 │
│ sec2   ┆ 2021-09-15   ┆ 2021-08-31 ┆ 2020-12-30 │
└────────┴──────────────┴────────────┴────────────┘
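I need to filter the prices in df1 for each group (symbol), keeping only the rows whose date matches one of the corresponding dates in df2. All columns of type date have to be taken into account, and the number of such columns in df2 may not be fixed.

I am looking for the following result: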
shape: (6, 3)
┌────────┬────────────┬───────┐
│ symbol ┆ date       ┆ price │
│ ---    ┆ ---        ┆ ---   │
│ str    ┆ date       ┆ i64   │
╞════════╪════════════╪═══════╡
│ sec1   ┆ 2021-09-17 ┆ 3     │
│ sec1   ┆ 2021-08-31 ┆ 4     │
│ sec1   ┆ 2020-12-31 ┆ 5     │
│ sec2   ┆ 2021-09-15 ┆ 7     │
│ sec2   ┆ 2021-08-31 ┆ 8     │
│ sec2   ┆ 2020-12-30 ┆ 9     │
└────────┴────────────┴───────┘
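My idea was to filter df1 by symbol and then perform a join for each date column of df2, and afterwards concatenate the resulting dataframes. However, there is probably a more elegant solution.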
You can unpivot df2 and then join it with df1:
Output: