I am trying to find a way to train a LightGBM model that forces certain features to be used in splits, i.e. to "have feature importance", so that the predictions are actually influenced by those variables.
Below is a modeling code example that contains a useless variable (it is constant), but the idea is that, from a business point of view, there could be an important variable that does not end up among the features the model actually uses.
from lightgbm import LGBMRegressor
import pandas as pd
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# Generate a random regression dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.9, random_state=42)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]
# Convert to a DataFrame for readability
X = pd.DataFrame(X, columns=feature_names)
# Add a useless (constant) feature
X["useless_feature_1"] = 1
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define the LGBMRegressor model
model = LGBMRegressor(
    objective="regression",
    metric="rmse",
    random_state=1,
    n_estimators=100
)
# Train the model
model.fit(X_train, y_train, eval_set=[(X_test, y_test)])
# Predictions and evaluation
y_pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"Test RMSE: {rmse:.4f}")
# Feature importance
importance = pd.DataFrame({
    "feature": X.columns,
    "importance": model.feature_importances_
}).sort_values(by="importance", ascending=False)
print("\nFeature Importance:")
print(importance)
Expected solution: there are surely workarounds, but the most interesting answer would be one that uses a parameter of the fit method or of the regressor itself.
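For example, something along the lines of the following sketch, which relies on LightGBM's documented forcedsplits_filename parameter: it takes the path to a JSON file describing splits that are forced at the top of every tree, so the named feature is guaranteed to appear in the model. The file name forced_split.json and the threshold value are illustrative assumptions; note also that a truly constant column such as useless_feature_1 has only one bin and cannot produce a valid split, so the sketch forces a split on feature_0 instead:
import json
# Sketch: force the root split of every tree to use feature_0 (column index 0).
# Each node in the JSON gives the integer feature index, a threshold, and
# optional nested "left"/"right" child splits.
forced_split = {"feature": 0, "threshold": 0.0}
with open("forced_split.json", "w") as f:  # illustrative file name
    json.dump(forced_split, f)
forced_model = LGBMRegressor(
    objective="regression",
    metric="rmse",
    random_state=1,
    n_estimators=100,
    forcedsplits_filename="forced_split.json"
)
forced_model.fit(X_train, y_train, eval_set=[(X_test, y_test)])
A related documented parameter is feature_contri (alias feature_penalty), a per-feature multiplier applied to split gain; setting a value above 1 for a feature makes the learner more likely to pick it, although unlike a forced split this does not guarantee the feature is used.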