I implemented a simple linear model in both pytorch and keras to learn the basics of each library. Both are one-layer models for linear regression. Both seem to work and the loss decreases, but once the pytorch model reaches the minimum, its loss bounces up and down, while the keras loss stays stable. The data I use is synthetic and verified to be linear.
Here is the keras model:
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Load the synthetic data and hold out 25% for testing
IceCream = pd.read_csv("IceCreamData.csv")
x_values = IceCream[["Temperature"]]
y_values = IceCream["Revenue"]
x_train, x_test, y_train, y_test = train_test_split(x_values, y_values, test_size=0.25)

# A single Dense unit is plain linear regression
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(units=1,
                                kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.01),
                                bias_initializer=tf.keras.initializers.Zeros()))
model.compile(optimizer=tf.keras.optimizers.Adam(0.01, epsilon=1e-07), loss='mean_squared_error')
model.fit(x_train, y_train, epochs=25, batch_size=1)
which produces the following training output:
Epoch 1/25
375/375 [==============================] - 0s 617us/step - loss: 261208.2969
Epoch 2/25
375/375 [==============================] - 0s 568us/step - loss: 192060.6094
Epoch 3/25
375/375 [==============================] - 0s 577us/step - loss: 137438.0000
(...)
Epoch 20/25
375/375 [==============================] - 0s 536us/step - loss: 667.3316
Epoch 21/25
375/375 [==============================] - 0s 535us/step - loss: 665.7455
Epoch 22/25
375/375 [==============================] - 0s 535us/step - loss: 666.8908
Epoch 23/25
375/375 [==============================] - 0s 577us/step - loss: 665.0857
Epoch 24/25
375/375 [==============================] - 0s 536us/step - loss: 662.0533
Epoch 25/25
375/375 [==============================] - 0s 534us/step - loss: 661.3047
And here is the pytorch model:
import pandas as pd
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader, random_split

class RegressionDataset(Dataset):
    def __init__(self, x, y):
        super().__init__()
        self.x = torch.from_numpy(x.astype("float32"))
        self.y = torch.from_numpy(y.astype("float32"))

    def __len__(self):
        return len(self.x)

    def __getitem__(self, index):
        # unsqueeze the target to shape (1,) so it matches the model output
        return self.x[index], self.y[index].unsqueeze(0)

class LinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)
        self.loss_function = nn.MSELoss()
        self.optimizer_function = torch.optim.Adam(self.parameters(), lr=0.01, eps=1e-07)
        torch.nn.init.normal_(self.linear.weight, mean=0.0, std=1.0)

    def forward(self, inputs):
        return self.linear(inputs)

    def backward(self, train_loader, epoch, num_epochs):
        self.train()
        for x_values, y_values in train_loader:
            prediction = self.linear(x_values)
            loss = self.loss_function(prediction, y_values)
            loss.backward()
            self.optimizer_function.step()
            self.optimizer_function.zero_grad()
        # only the loss of the final batch of the epoch is printed here
        print(f"Epoch [{epoch + 1:03}/{num_epochs:3}] | Train Loss: {loss.item():.4f}")

    def validate(self, val_loader):
        self.eval()
        with torch.no_grad():
            for inputs, targets in val_loader:
                outputs = self.linear(inputs)
                loss = self.loss_function(outputs, targets)
        # likewise, only the last validation batch's loss is printed
        print(f'Validation Loss: {loss.item():.4f}')
data = pd.read_csv("./IceCreamData.csv", delimiter=",")
x_values = data[["Temperature"]].to_numpy()
y_values = data["Revenue"].to_numpy()
dataset = RegressionDataset(x_values, y_values)
train_dataset, test_dataset = random_split(dataset, lengths=[0.75, 0.25])
train_loader = DataLoader(dataset=train_dataset, batch_size=1, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=1, shuffle=True)
model = LinearRegressionModel()
num_epochs = 25
for epoch in range(num_epochs):
model.backward(train_loader, epoch, num_epochs)
model.validate(test_loader)
which produces the following training results:
Epoch [001/ 25] | Train Loss: 248788.2500
Validation Loss: 257732.9062
Epoch [002/ 25] | Train Loss: 96519.7422
Validation Loss: 110466.8281
Epoch [003/ 25] | Train Loss: 76869.0547
Validation Loss: 178772.9375
(...)
Epoch [020/ 25] | Train Loss: 679.7694
Validation Loss: 1674.3351
Epoch [021/ 25] | Train Loss: 2065.5454
Validation Loss: 1177.6052
Epoch [022/ 25] | Train Loss: 269.6078
Validation Loss: 595.9854
Epoch [023/ 25] | Train Loss: 115.4116
Validation Loss: 0.1172
Epoch [024/ 25] | Train Loss: 2134.9248
Validation Loss: 9816.9375
Epoch [025/ 25] | Train Loss: 37.1115
Validation Loss: 2869.8569
At first I thought the difference was caused by different initial weights, so I implemented initialization with an explicit standard deviation.
I also tested different learning-rate values, with no better results. According to the documentation, the other parameters (momentum, beta values, etc.) should be identical between the two optimizers; only epsilon differs, which I adjusted for in the code.
Why does the pytorch model's loss bounce up and down once it reaches the minimum, while the keras model's loss stays stable?
According to the Keras documentation, the loss value that `fit` reports is the average over the entire epoch. The torch code in your question, by contrast, prints the loss of only a single batch (in this case, a single sample). You can instead sum the loss over all batches and report the average; I did this and the loss decreased much more smoothly.
I made a few small modifications:
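A minimal sketch of that change, applied to the backward method from your class (the epoch_loss accumulator and the division by len(train_loader) are my reconstruction of the idea, not code from the original post):

    def backward(self, train_loader, epoch, num_epochs):
        self.train()
        epoch_loss = 0.0  # accumulator for the per-batch losses (illustrative name)
        for x_values, y_values in train_loader:
            prediction = self.linear(x_values)
            loss = self.loss_function(prediction, y_values)
            loss.backward()
            self.optimizer_function.step()
            self.optimizer_function.zero_grad()
            epoch_loss += loss.item()  # accumulate before moving to the next batch
        # report the mean per-batch loss, matching what Keras logs per epoch
        print(f"Epoch [{epoch + 1:03}/{num_epochs:3}] | "
              f"Train Loss: {epoch_loss / len(train_loader):.4f}")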
Alternatively, setting batch_size=len(test_dataset) on the test loader should give you a smoothly averaged validation loss (although it takes more memory and may be slower). This approach works less well for the training loss, since effective training usually calls for a relatively small batch size.
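As a sketch, that full-batch test loader could be built from the variables in your question like this (shuffle is disabled here since order does not matter when averaging):

    # A single batch holding the whole validation split: MSELoss averages over
    # all samples at once, so validate() prints one stable number per epoch.
    test_loader = DataLoader(dataset=test_dataset,
                             batch_size=len(test_dataset),
                             shuffle=False)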