在有和沒有重置狀態的情況下訓練 LSTM

July 26, 2017

我對深度學習和 Keras 很陌生，我想知道 LSTM RNN 的這兩種訓練方法有什麼區別。
1:
for i in range(10): #training
   model.fit(trainX, trainY, epochs=1, batch_size=batch_size, verbose=0, 
             shuffle=False)
   model.reset_states()

2:
model.fit(trainX, trainY, epochs=10, batch_size=batch_size, verbose=0, 
         shuffle=False)
在這兩種情況下，網絡不是在整個數據集上訓練 10 次嗎？我意識到，在示例中，我們可以在每個數據批次迭代中重置狀態，但即使我刪除了重置指令，結果也會大不相同。我糊塗了。

是的，你是對的。在這兩種情況下，模型都訓練了 10 個 epoch。在每個 epoch 中，訓練數據中的所有示例都流經網絡。批量大小決定了模型的權重或參數更新後的示例數量。

第一種情況和第二種情況的區別在於，第一種情況允許您fit()在 epoch 之間的方法之外執行一些處理，例如model.reset_states(). 但是，類似的處理也可以fit()通過自定義回調類應用於方法中的第二種情況，包括例如on_epoch_begin、、和函數。on_epoch_end``on_batch_begin``on_batch_end

model.reset_states()關於從第一種情況中刪除時兩種情況得到完全不同結果的問題：它不應該發生。如果您在一種情況下重置時代之間的模型狀態，但在另一種情況下不重置，您將從每種情況下得到不同的結果。如果您在兩個時期之間不重置任何一種情況下的狀態，則結果（一定數量的時期後的損失）將是相同的，在導入 Keras 之前初始化一個偽隨機數生成器並在運行這兩種情況之間重新啟動 Python 解釋器。我通過以下示例驗證了這一點，其目標是從嘈雜的波形中學習純正弦波。以下代碼片段已使用 Python 3.5、NumPy 1.12.1、Keras 2.0.4 和 Matplotlib 2.0.2 實現：
import numpy as np

# Needed for reproducible results
np.random.seed(1)

from keras.models import Sequential
from keras.layers import LSTM, Dense

# Generate example data
# -----------------------------------------------------------------------------
x_train = y_train = [np.sin(i) for i in np.arange(start=0, stop=10, step=0.01)]
noise = np.random.normal(loc=0, scale=0.1, size=len(x_train))
x_train += noise

n_examples = len(x_train)
n_features = 1
n_outputs = 1
time_steps = 1

x_train = np.reshape(x_train, (n_examples, time_steps, n_features))
y_train = np.reshape(y_train, (n_examples, n_outputs))

# Initialize LSTM
# -----------------------------------------------------------------------------
batch_size = 100
model = Sequential()
model.add(LSTM(units=10, input_shape=(time_steps, n_features),
              return_sequences=True, stateful=True, batch_size=batch_size))
model.add(LSTM(units=10, return_sequences=False, stateful=True))
model.add(Dense(units=n_outputs, activation='linear'))
model.compile(loss='mse', optimizer='adadelta')

# Train LSTM
# -----------------------------------------------------------------------------
epochs = 70

# Case 1
for i in range(epochs):
   model.fit(x_train, y_train, epochs=1, batch_size=batch_size, verbose=2,
             shuffle=False)

# !!! To get exactly the same results between the cases, do the following:
# !!! * To record the loss of the 1st case, run all the code until here.
# !!! * To record the loss of the 2nd case,
# !!! restart Python, comment out the 1st case and run all the code.

# Case 2
model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size, verbose=2,
         shuffle=False)
另外，以下是未重置狀態的任何一種情況的結果的可視化：
import matplotlib.pyplot as plt

plt.style.use('ggplot')
ax = plt.figure(figsize=(10, 6)).add_subplot(111)
ax.plot(x_train[:, 0], label='x_train', color='#111111', alpha=0.8, lw=3)
ax.plot(y_train[:, 0], label='y_train', color='#E69F00', alpha=1, lw=3)
ax.plot(model.predict(x_train, batch_size=batch_size)[:, 0],
       label='Predictions for x_train after %i epochs' % epochs,
       color='#56B4E9', alpha=0.8, lw=3)
plt.legend(loc='lower right')
在 Keras 網站上，RNN 的狀態性在循環層文檔和常見問題解答中進行了討論。

編輯：上述解決方案目前適用於 Theano 後端，但不適用於 TensorFlow 後端。

引用自：https://stats.stackexchange.com/questions/294501

在有和沒有重置狀態的情況下訓練 LSTM

相關問答

我如何使我的二元分類器偏愛假陽性錯誤而不是假陰性？

LSTM（長短期記憶）死了嗎？

為什麼神經網絡很難學習恆等函數？

為什麼單個 ReLU 不能學習 ReLU？

何時在神經網絡中“添加”層以及何時“連接”？

有沒有辦法將新數據合併到已經訓練好的神經網絡中，而無需對我在 Keras 中的所有數據進行重新訓練？