MNIST digit recognition: what is the best we can get with a fully connected NN only? (no CNN)

November 10, 2018

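To fully understand how it works internally, I'm rewriting a neural network from scratch, using only Python + numpy. (Since this is for learning purposes, performance is not an issue.)

Before moving on to convolutional networks (CNN) or more sophisticated tools, I'd like to determine the maximum accuracy we can hope for with only a standard NN (a few fully connected hidden layers + activation function) on the MNIST digit database.

The best accuracy I have reached is **~96.2%**, with:

Network structure: [784, 200, 80, 10]
Learning rate: 0.01
Epochs: 3
No bias used
Activation function: sigmoid (1/(1+exp(-x)))
Weight initialization: truncated normal distribution on [-1, 1]
Optimization: pure stochastic gradient descent

I've read in the past that it's possible to reach 98% even with a standard NN.

**Question: which parameters (like those listed above) would you use to get more than 98% accuracy on the MNIST digit database with a standard NN?** See the full code below.

What I have tried so far:

Replacing the weights with a normal distribution multiplied by various factors ("He et al. init method" or "Xavier" init); see also "What are good initial weights in a neural network?":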

wm = np.random.randn(nodes_out, nodes_in + bias_node) * np.sqrt(2/nodes_in)  # also tried with np.sqrt(1/nodes_in)

But it did not change anything significantly; I noticed the results were actually worse in this case.
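One way to see what these initialization schemes change is to compare the scale of the first layer's pre-activations. The following is only a sketch under stated assumptions: the input is random data standing in for normalized pixels (not real MNIST), a uniform distribution on [-1, 1] stands in for the truncated normal, and `preactivation_std` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
nodes_in, nodes_out = 784, 200

# Assumption: random values in [0, 1) stand in for one normalized image.
x = rng.random(nodes_in)

def preactivation_std(weights, x):
    """Standard deviation of the layer's pre-activations (weights @ x)."""
    return float(np.std(weights @ x))

# Weights spread over [-1, 1], roughly the scale used in the question.
# Assumption: uniform is a simple stand-in for the truncated normal.
w_wide = rng.uniform(-1.0, 1.0, size=(nodes_out, nodes_in))

# He-style scaling, as in the one-line snippet above.
w_he = rng.standard_normal((nodes_out, nodes_in)) * np.sqrt(2.0 / nodes_in)

std_wide = preactivation_std(w_wide, x)
std_he = preactivation_std(w_he, x)
print("pre-activation std, weights in [-1, 1]:", std_wide)
print("pre-activation std, He-style scaling:  ", std_he)
```

With weights on the order of 1 and 784 inputs, the pre-activations come out several times larger than with the scaled initialization. Whether that matters for final accuracy depends on the activation function and the learning rate.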

Replacing the sigmoid with ReLU:

def activation_function(x): 
   return np.maximum(0, x)

For some unknown reason, accuracy dropped to 10% with this `activation_function` (i.e. the NN is useless!).
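One thing worth checking here (a possible cause, not a confirmed diagnosis): the `train` method in the full code below hard-codes the sigmoid derivative `out_vector * (1.0 - out_vector)`, so swapping only `activation_function` leaves the backward pass using a derivative that no longer matches the forward pass. A minimal sketch of a ReLU together with its matching derivative; `relu_derivative` and `sigmoid_derivative` are hypothetical helper names, not part of the original code:

```python
import numpy as np

def relu(x):
    """ReLU activation: max(0, x), applied element-wise."""
    return np.maximum(0, x)

def relu_derivative(out):
    """Derivative of ReLU written in terms of the layer output:
    1 where the unit is active, 0 elsewhere."""
    return (out > 0).astype(float)

def sigmoid_derivative(out):
    """The formula hard-coded in train(); only valid for sigmoid outputs."""
    return out * (1.0 - out)

out = relu(np.array([-2.0, 0.5, 3.0]))  # layer output: 0, 0.5, 3
d_relu = relu_derivative(out)
d_wrong = sigmoid_derivative(out)
print(d_relu)
# Applied to ReLU outputs, the sigmoid formula gives 3 * (1 - 3) = -6 for
# the last entry, so the update for that unit has the wrong sign and size.
print(d_wrong)
```

If the activation is changed, the line computing `tmp` in `train` would need the matching derivative as well. A smaller weight scale may also be needed, since ReLU outputs are unbounded.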

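Self-contained code (about 100 lines) that you can run directly (mostly taken from https://www.python-course.eu/neural_network_mnist.php, but slightly rewritten); you only need to download mnist_train.csv and mnist_test.csv first: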

from __future__ import division
import matplotlib.pyplot as plt
import numpy as np
from scipy.special import expit as activation_function  # 1/(1+exp(-x)), sigmoid
from scipy.stats import truncnorm

if True:  # recreate MNIST arrays. Do it only once, after that modify to False
   train_data = np.loadtxt("mnist_train.csv", delimiter=",")
   test_data = np.loadtxt("mnist_test.csv", delimiter=",")
   train_imgs = np.asarray(train_data[:, 1:], dtype=float) / 255.0  # scale pixels to [0, 1]
   test_imgs = np.asarray(test_data[:, 1:], dtype=float) / 255.0
   train_labels = np.asarray(train_data[:, :1], dtype=float)  # first column is the digit label
   test_labels = np.asarray(test_data[:, :1], dtype=float)
   lr = np.arange(10)
   train_labels_one_hot = (lr == train_labels).astype(float)  # broadcast to shape (n, 10)
   test_labels_one_hot = (lr == test_labels).astype(float)
   for i, d in enumerate([train_imgs, test_imgs, train_labels, test_labels, train_labels_one_hot, test_labels_one_hot]):
       np.save('%i.array' % i, d)

(train_imgs, test_imgs, train_labels, test_labels, train_labels_one_hot, test_labels_one_hot) = [np.load('%i.array.npy' % i) for i in range(6)]

print('Data loaded.')

if False:  # show images
   for i in range(10):
       img = train_imgs[i].reshape((28,28))
       plt.imshow(img, cmap="Greys")
       plt.show()

class NeuralNetwork:
   def __init__(self, network_structure, learning_rate, bias=None):  
       self.structure = network_structure
       self.no_of_layers = len(self.structure)
       self.learning_rate = learning_rate 
       self.bias = bias
       self.create_weight_matrices()

   def create_weight_matrices(self):
       bias_node = 1 if self.bias else 0
       self.weights_matrices = []
       for k in range(self.no_of_layers-1):
           nodes_in = self.structure[k]
           nodes_out = self.structure[k+1]
           n = (nodes_in + bias_node) * nodes_out
           X = truncnorm(-1, 1,  loc=0, scale=1)
           #X = truncnorm(-1 / np.sqrt(nodes_in), 1 / np.sqrt(nodes_in),  loc=0, scale=1)  # accuracy is worse
           wm = X.rvs(n).reshape((nodes_out, nodes_in + bias_node))
           self.weights_matrices.append(wm)

   def train(self, input_vector, target_vector): 
       input_vector = np.array(input_vector, ndmin=2).T
       res_vectors = [input_vector]
       for k in range(self.no_of_layers-1):
           in_vector = res_vectors[-1]
           if self.bias:
               in_vector = np.concatenate((in_vector, [[self.bias]]))
               res_vectors[-1] = in_vector
           x = np.dot(self.weights_matrices[k], in_vector)
           out_vector = activation_function(x)
           res_vectors.append(out_vector)    

       target_vector = np.array(target_vector, ndmin=2).T
       output_errors = target_vector - out_vector  
       for k in range(self.no_of_layers-1, 0, -1):
           out_vector = res_vectors[k]
           in_vector = res_vectors[k-1]
           if self.bias and not k==(self.no_of_layers-1):
               out_vector = out_vector[:-1,:].copy()
           tmp = output_errors * out_vector * (1.0 - out_vector)  # sigma'(x) = sigma(x) (1 - sigma(x))
           tmp = np.dot(tmp, in_vector.T)
           self.weights_matrices[k-1] += self.learning_rate * tmp
           output_errors = np.dot(self.weights_matrices[k-1].T, output_errors)
           if self.bias:
               output_errors = output_errors[:-1,:]

   def run(self, input_vector):
       if self.bias:
           input_vector = np.concatenate((input_vector, [self.bias]))
       in_vector = np.array(input_vector, ndmin=2).T
       for k in range(self.no_of_layers-1):
           x = np.dot(self.weights_matrices[k], in_vector)
           out_vector = activation_function(x)
           in_vector = out_vector
           if self.bias:
               in_vector = np.concatenate((in_vector, [[self.bias]]))
       return out_vector

   def evaluate(self, data, labels):
       corrects, wrongs = 0, 0
       for i in range(len(data)):
           res = self.run(data[i])
           res_max = res.argmax()
           if res_max == labels[i]:
               corrects += 1
           else:
               wrongs += 1
       return corrects, wrongs

ANN = NeuralNetwork(network_structure=[784, 200, 80, 10], learning_rate=0.01, bias=None)

for epoch in range(3):
   for i in range(len(train_imgs)):
       if i % 1000 == 0:
           print('epoch: %i img number: %i / %i' % (epoch, i, len(train_imgs)))
       ANN.train(train_imgs[i], train_labels_one_hot[i])

corrects, wrongs = ANN.evaluate(test_imgs, test_labels)
print("accuracy (test):", corrects / (corrects + wrongs))

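Edit: with 10 epochs, structure [784, 400, 400, 10], and the other parameters unchanged, I finally got 97.8% accuracy! Is this a case of overfitting (as mentioned in the comments)?

Another test: 20 epochs, structure [784, 700, 500, 10], other parameters unchanged: 97.9% accuracy.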
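On the overfitting question, one common check is to hold out part of the training set as a validation set, so that the structure and the number of epochs are not chosen by looking at test accuracy. A minimal sketch, assuming the 60,000-row training arrays from the code above; `split_train_validation` and the 10,000-sample validation size are illustrative choices, not part of the original code:

```python
import numpy as np

def split_train_validation(n_samples, n_validation, seed=0):
    """Return (train_idx, val_idx): a random, non-overlapping split
    of the row indices 0 .. n_samples - 1."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    return order[n_validation:], order[:n_validation]

# Assumption: 60,000 training images, 10,000 of them held out.
train_idx, val_idx = split_train_validation(60000, 10000)
print(len(train_idx), len(val_idx))
```

With the arrays above, the network would be trained only on `train_imgs[train_idx]`, and `ANN.evaluate(train_imgs[val_idx], train_labels[val_idx])` would give the validation accuracy. Training accuracy far above validation accuracy would suggest overfitting.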

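Yann LeCun has compiled a big list of results (and the associated papers) on MNIST, which may be of interest.

The best non-convolutional neural net result is by Cireşan, Meier, Gambardella and Schmidhuber (2010) (arXiv), who reported an accuracy of 99.65%. As their abstract describes, their approach was essentially brute force:

Good old on-line back-propagation for plain multi-layer perceptrons yields a very low 0.35% error rate on the famous MNIST handwritten digits benchmark. All we need to achieve this best result as of 2010 are many hidden layers, many neurons per layer, numerous deformed training images, and graphics cards to greatly speed up learning.

The network itself was a six-layer MLP with 2500, 2000, 1500, 1000, 500, and 10 neurons per layer, and the training set was augmented with affine and elastic deformations. The only other secret ingredient was a lot of compute: the last few pages describe how they parallelized it.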
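To get a feel for the difference in scale, the number of trainable weights in the two architectures can be counted directly. This is a rough sketch: it assumes a 784-pixel input layer in front of the six layers listed above and one bias per neuron for that network, neither of which is stated here, and `count_weights` is a hypothetical helper.

```python
def count_weights(layer_sizes, with_bias=True):
    """Count the weights of a fully connected network given its layer
    sizes, optionally including one bias per non-input neuron."""
    extra = 1 if with_bias else 0
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += (n_in + extra) * n_out
    return total

# Network from the question (trained without bias).
question_net = [784, 200, 80, 10]
# Assumption: 784 inputs feeding the six layers described in the answer.
large_net = [784, 2500, 2000, 1500, 1000, 500, 10]

n_question = count_weights(question_net, with_bias=False)
n_large = count_weights(large_net, with_bias=True)
print(n_question)  # 173600
print(n_large)     # 11972510
```

Under these assumptions the large network has roughly 70 times as many weights as the [784, 200, 80, 10] network from the question.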

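A year later, the same group (Meier et al., 2011) reported similar results using an ensemble of 25 one-layer neural networks (0.39% test error*). These were individually smaller (800 hidden units), but the training strategy is a bit fancier. Similar strategies with convnets do slightly better (~0.23% test error*). Since MLPs are universal approximators, I can't see why a suitable MLP wouldn't be able to match that, though it might be very large and difficult to train.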
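The basic idea of combining several networks can be illustrated with plain output averaging. This is a simplified sketch, not the committee strategy from the paper: `committee_predict` is a hypothetical helper, and the scores are made-up numbers for three imaginary members and three classes.

```python
import numpy as np

def committee_predict(member_outputs):
    """Average the output vectors of all members and return the index
    of the class with the highest mean score.

    member_outputs: array of shape (n_members, n_classes).
    """
    mean_scores = np.mean(member_outputs, axis=0)
    return int(np.argmax(mean_scores))

# Made-up scores for illustration only: the second member alone would
# pick class 2, but the averaged scores favor class 1.
outputs = np.array([[0.1, 0.6, 0.3],
                    [0.2, 0.3, 0.5],
                    [0.1, 0.5, 0.4]])
prediction = committee_predict(outputs)
print(prediction)  # 1
```

With the `NeuralNetwork` class from the question, the rows would be the `run` outputs of independently trained networks for the same image.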

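* Annoyingly, very few of these papers report confidence intervals, standard errors, or anything similar, which makes it difficult to compare these results directly.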

Source: https://stats.stackexchange.com/questions/376312
