
Question

Pablo

Asked: 2024-12-23 22:19:58 +0800 CST

Pandas DataFrame: create a new column and fill it with values based on two conditional statements on other columns


I wrote this script, which creates new columns from the values that satisfy two conditions.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df=pd.DataFrame()

df['variable 1']= np.arange(0,1.1,0.1)
df['variable 2']= 0.2*df['variable 1']
df['variable 3']= 0.4 -0.2*df['variable 1']


# Create new columns 

slope = [2, 1.5, 1, 0.5]

for i in range(len(slope)):

    df['slope = ' + str(slope[i])]=''
    for j in range(len(df['variable 1'])):
    # Calculating Scl_disp_sd with equation 1
        curve = 0.5 - slope[i]*df['variable 1'][j]
        df['slope = ' + str(slope[i])][j]= np.where((curve>df['variable 2'][j]) & (curve<df['variable 3'][j]), curve,np.nan)

display(df)

plt.plot(df['variable 1'], df['variable 2'], 'o', label='variable 2')
plt.plot(df['variable 1'], df['variable 3'], 'o', label='variable 3')
plt.plot(df['variable 1'], df.filter(like='slope =', axis=1), marker='.')
plt.legend()


The script works, but I get the following messages:

/var/folders/m0/_y1fs5x50xx99pjg2yf42y7r0000gp/T/ipykernel_1964/2618301266.py:11: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  df['slope = ' + str(slope[i])][j]= np.where((curve>df['variable 2'][j]) & (curve<df['variable 3'][j]),
/var/folders/m0/_y1fs5x50xx99pjg2yf42y7r0000gp/T/ipykernel_1964/2618301266.py:11: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['slope = ' + str(slope[i])][j]= np.where((curve>df['variable 2'][j]) & (curve<df['variable 3'][j]),
/var/folders/m0/_y1fs5x50xx99pjg2yf42y7r0000gp/T/ipykernel_1964/2618301266.py:11: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
...
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['slope = ' + str(slope[i])][j]= np.where((curve>df['variable 2'][j]) & (curve<df['variable 3'][j]),
Output is truncated. View as a scrollable element or open in a text editor. Adjust cell output settings...

I would appreciate any ideas for rewriting this script so that these messages no longer appear.

1 Answer


mozway · Answer 1 · 2024-12-23T22:34:09+08:00

There is no need for a nested loop. Apply the operation to the whole column at once, as a vectorized expression:

slope = [2, 1.5, 1, 0.5]

for i in range(len(slope)):
    curve = 0.5 - slope[i]*df['variable 1']
    df['slope = ' + str(slope[i])] = np.where((curve>df['variable 2'])
                                               & (curve<df['variable 3']),
                                              curve,np.nan)
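
If the element-by-element loop has to stay for some reason, the FutureWarning quoted in the question already names the fix: assign through a single `df.loc[row, col]` call rather than the chained `df[col][row]`. A minimal sketch under that assumption, rebuilding the question's DataFrame so that it runs on its own (the loop variable `s` and the name `col` are illustrative and do not come from the original post):

```python
import numpy as np
import pandas as pd

# Same input data as in the question
df = pd.DataFrame({'variable 1': np.arange(0, 1.1, 0.1)})
df['variable 2'] = 0.2 * df['variable 1']
df['variable 3'] = 0.4 - 0.2 * df['variable 1']

slope = [2, 1.5, 1, 0.5]

for s in slope:
    col = f'slope = {s}'
    # Start with a float column of NaN; NaN means "condition not met"
    df[col] = np.nan
    for j in df.index:
        curve = 0.5 - s * df.loc[j, 'variable 1']
        if df.loc[j, 'variable 2'] < curve < df.loc[j, 'variable 3']:
            # One indexing step on df itself, so no intermediate copy is involved
            df.loc[j, col] = curve
```

This keeps the original row-by-row logic, but it should be considerably slower than the vectorized forms on large frames.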

Alternatively, fully vectorized with NumPy broadcasting:

curve = 0.5 - slope*df['variable 1'].to_numpy()[:, None]
cols = [f'slope = {c}' for c in slope]
df[cols] = np.where(  (curve > df[['variable 2']].to_numpy())
                    & (curve < df[['variable 3']].to_numpy()),
                    curve, np.nan)

Output:

    variable 1  variable 2  variable 3  slope = 2  slope = 1.5  slope = 1  slope = 0.5
0          0.0        0.00        0.40        NaN          NaN        NaN          NaN
1          0.1        0.02        0.38        0.3         0.35        NaN          NaN
2          0.2        0.04        0.36        0.1         0.20        0.3          NaN
3          0.3        0.06        0.34        NaN          NaN        0.2          NaN
4          0.4        0.08        0.32        NaN          NaN        0.1         0.30
5          0.5        0.10        0.30        NaN          NaN        NaN         0.25
6          0.6        0.12        0.28        NaN          NaN        NaN         0.20
7          0.7        0.14        0.26        NaN          NaN        NaN         0.15
8          0.8        0.16        0.24        NaN          NaN        NaN          NaN
9          0.9        0.18        0.22        NaN          NaN        NaN          NaN
10         1.0        0.20        0.20        NaN          NaN        NaN          NaN
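
The broadcast version relies on array shapes lining up: `[:, None]` turns the 11 values of `variable 1` into an (11, 1) column, which combines with the 4 slopes into an (11, 4) grid, one column per slope. The sketch below spells out those shapes and checks that the broadcast result matches the column-by-column computation. The names `x`, `lower`, `upper`, `broadcast`, `per_column`, and `same` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Same input data as in the question
df = pd.DataFrame({'variable 1': np.arange(0, 1.1, 0.1)})
df['variable 2'] = 0.2 * df['variable 1']
df['variable 3'] = 0.4 - 0.2 * df['variable 1']

slope = np.array([2, 1.5, 1, 0.5])           # shape (4,)
x = df['variable 1'].to_numpy()[:, None]     # shape (11, 1)
lower = df[['variable 2']].to_numpy()        # shape (11, 1)
upper = df[['variable 3']].to_numpy()        # shape (11, 1)

# (4,) against (11, 1) broadcasts to (11, 4): rows are samples, columns are slopes
curve = 0.5 - slope * x
broadcast = np.where((curve > lower) & (curve < upper), curve, np.nan)

# Column-by-column computation, stacked side by side for comparison
per_column = np.column_stack([
    np.where((c > df['variable 2']) & (c < df['variable 3']), c, np.nan)
    for c in (0.5 - s * df['variable 1'] for s in slope)
])

# NaN entries count as equal so that the masked cells compare cleanly
same = np.allclose(broadcast, per_column, equal_nan=True)
print(curve.shape, same)
```

Selecting with double brackets, as in `df[['variable 2']]`, keeps the bounds two-dimensional, which is what lets them broadcast across the slope columns.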
