I am trying to find a way to train a LightGBM model that forces certain features to be used in splits, i.e. to "have feature importance", so that the predictions are actually influenced by those variables.
Below is a modeling code example that contains a useless variable (it is constant), but the idea is that, from a business point of view, there could be an important variable that does not end up among the features the model actually uses.
from lightgbm import LGBMRegressor
import pandas as pd
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# Generate a random regression dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.9, random_state=42)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]
# Convert to a DataFrame for readability
X = pd.DataFrame(X, columns=feature_names)
# Add a useless (constant) feature
X["useless_feature_1"] = 1
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define the LGBMRegressor model
model = LGBMRegressor(
    objective="regression",
    metric="rmse",
    random_state=1,
    n_estimators=100
)
# Train the model
model.fit(X_train, y_train, eval_set=[(X_test, y_test)])
# Predictions and evaluation
y_pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"Test RMSE: {rmse:.4f}")
# Feature importance
importance = pd.DataFrame({
    "feature": X.columns,
    "importance": model.feature_importances_
}).sort_values(by="importance", ascending=False)
print("\nFeature Importance:")
print(importance)
Expected solution: there are surely workarounds, but the most interesting answer would be one that uses a parameter of the fit method or of the regressor itself.
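For example, something along the lines of the following sketch, which relies on LightGBM's documented forcedsplits_filename parameter: it takes the path to a JSON file describing splits that are forced at the top of every tree, so the named feature is guaranteed to appear in the model. The file name forced_split.json and the threshold value are illustrative assumptions; note also that a truly constant column such as useless_feature_1 has only one bin and cannot produce a valid split, so the sketch forces a split on feature_0 instead:
import json
# Sketch: force the root split of every tree to use feature_0 (column index 0).
# Each node in the JSON gives the integer feature index, a threshold, and
# optional nested "left"/"right" child splits.
forced_split = {"feature": 0, "threshold": 0.0}
with open("forced_split.json", "w") as f:  # illustrative file name
    json.dump(forced_split, f)
forced_model = LGBMRegressor(
    objective="regression",
    metric="rmse",
    random_state=1,
    n_estimators=100,
    forcedsplits_filename="forced_split.json"
)
forced_model.fit(X_train, y_train, eval_set=[(X_test, y_test)])
A related documented parameter is feature_contri (alias feature_penalty), a per-feature multiplier applied to split gain; setting a value above 1 for a feature makes the learner more likely to pick it, although unlike a forced split this does not guarantee the feature is used.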