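I am migrating some code from Pandas to Polars. I tried to use `cut` from Polars, but it differs from the pandas version (there is no `bins` argument, so I have to compute the breakpoints myself).
However, I still don't understand how `labels` works in Polars.
I have to pass more labels than I want in order to get the same result as pandas.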
import numpy as np
import pandas as pd
import polars as pl
# Example Polars DataFrame
data = {
"value": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
}
df_pl = pl.DataFrame(data)
# Convert to a pandas DataFrame to obtain the breakpoints
df_pd = df_pl.to_pandas()
# Use retbins=True to get the breakpoints (from pandas)
df_pd["cut_label_pd"], breakpoints = pd.cut(df_pd["value"], 4, labels=["low", "medium", "hight", "very high"], retbins=True)
print(pl.from_pandas(df_pd))
shape: (10, 2)
┌───────┬──────────────┐
│ value ┆ cut_label_pd │
│ ---   ┆ ---          │
│ i64   ┆ cat          │
╞═══════╪══════════════╡
│ 1     ┆ low          │
│ 2     ┆ low          │
│ 3     ┆ low          │
│ 4     ┆ medium       │
│ 5     ┆ medium       │
│ 6     ┆ hight        │
│ 7     ┆ hight        │
│ 8     ┆ very high    │
│ 9     ┆ very high    │
│ 10    ┆ very high    │
└───────┴──────────────┘
print(breakpoints)
# [ 0.991 3.25 5.5 7.75 10. ]
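For reference, the printed breakpoints can be reproduced by hand. This is a minimal sketch in plain Python, assuming that pandas builds `n_bins + 1` equally spaced edges between the minimum and maximum and then lowers the first edge by 0.1% of the range so the minimum falls inside the first right-closed bin. That assumption is consistent with the output above, but it is not taken from the pandas source here.

```python
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
n_bins = 4

lo, hi = min(values), max(values)

# n_bins + 1 equally spaced edges from lo to hi
edges = [lo + i * (hi - lo) / n_bins for i in range(n_bins + 1)]

# Assumption: the lowest edge is nudged down by 0.1% of the range
# so that the minimum value lands in the first (right-closed) bin.
edges[0] -= (hi - lo) * 0.001

print([round(e, 3) for e in edges])
# [0.991, 3.25, 5.5, 7.75, 10.0]
```

Note that this gives five edges for four bins: the first and last are the outer boundaries, and only the three in the middle separate one bin from the next.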
Is there a better way? (Note the values in `labels` passed to Polars `cut`.)
# Cut in polars
labels = ["don't use it", "low", "medium", "hight", "very high", "don't use it too"]
df_pl = df_pl.with_columns(
pl.col("value").cut(breaks=breakpoints, labels=labels).alias("cut_label_pl")
)
print(df_pl)
shape: (10, 2)
┌───────┬──────────────┐
│ value ┆ cut_label_pl │
│ ---   ┆ ---          │
│ i64   ┆ cat          │
╞═══════╪══════════════╡
│ 1     ┆ low          │
│ 2     ┆ low          │
│ 3     ┆ low          │
│ 4     ┆ medium       │
│ 5     ┆ medium       │
│ 6     ┆ hight        │
│ 7     ┆ hight        │
│ 8     ┆ very high    │
│ 9     ┆ very high    │
│ 10    ┆ very high    │
└───────┴──────────────┘