Why does my decision tree get 100% accuracy?

  • March 22, 2018

My decision tree gets 100% accuracy. What exactly am I doing wrong?

Here is my code:

import pandas as pd
import json
import numpy as np
import sklearn
import matplotlib.pyplot as plt


data = np.loadtxt("/Users/Nadjla/Downloads/allInteractionsnum.csv", delimiter=',')


x = data[0:14]
y = data[-1]


from sklearn.cross_validation import train_test_split

x_train = x[0:2635]
x_test = x[0:658]
y_train = y[0:2635]
y_test = y[0:658]


from sklearn.tree import DecisionTreeClassifier
tree = DecisionTreeClassifier()
tree.fit(x_train.astype(int), y_train.astype(int))


from sklearn.metrics import accuracy_score

y_predicted = tree.predict(x_test.astype(int))
accuracy_score(y_test.astype(int), y_predicted)

Your test sample is a subset of your training sample:

x_train = x[0:2635]
x_test = x[0:658]
y_train = y[0:2635]
y_test = y[0:658]

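This means you are evaluating your model on part of your training data, i.e., you are doing in-sample evaluation. In-sample accuracy is a notoriously poor indicator of out-of-sample accuracy, and maximizing in-sample accuracy leads to overfitting. A model should therefore always be evaluated on a true holdout sample that is completely independent of the training data.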
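To make the gap between in-sample and out-of-sample accuracy concrete, here is a minimal sketch on synthetic data (not the data from the question). The array shapes, the random seed, and the names `in_sample` and `out_of_sample` are assumptions chosen for illustration. The labels are pure noise, so there is nothing to learn, yet a fully grown tree still memorizes the rows it was trained on.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in data: 14 continuous features, labels that are pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(2635, 14))
y = rng.integers(0, 2, size=2635)

# Disjoint split: the first 658 rows are held out, the rest are used for training.
X_test, y_test = X[:658], y[:658]
X_train, y_train = X[658:], y[658:]

# An unpruned tree keeps splitting until every training row is classified correctly.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

in_sample = accuracy_score(y_train, tree.predict(X_train))
out_of_sample = accuracy_score(y_test, tree.predict(X_test))

print("in-sample accuracy:    ", in_sample)
print("out-of-sample accuracy:", out_of_sample)
```

The in-sample score is perfect because the tree has memorized its training rows, while the holdout score stays near chance level. A 100% score on rows the model has already seen therefore says very little about its quality.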

Make sure that your training and test data are disjoint, e.g.,

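# Python slices exclude the end index, so the test set is rows 0..657.
# Training starts at row 658 so that no row is dropped and none is shared.
x_train = x[658:2635]
x_test = x[0:658]
y_train = y[658:2635]
y_test = y[0:658]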
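Fixed slices like the ones above are fine when the rows are in no particular order. A common alternative is to let scikit-learn shuffle and split the data with `train_test_split`, which lives in `sklearn.model_selection` (the `sklearn.cross_validation` module imported in the question was removed in scikit-learn 0.20). The sketch below uses synthetic stand-in data. The row count, the assumption that the first 14 columns are features and the last column is the label, and the `row_ids` helper are all hypothetical and not taken from the original CSV.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Hypothetical stand-in for the CSV: 14 integer feature columns plus a label column.
rng = np.random.default_rng(0)
features = rng.integers(0, 10, size=(1000, 14))
labels = (features[:, 0] + features[:, 1] > 9).astype(int)
data = np.column_stack([features, labels])

# Assumption: features are the first 14 columns, the label is the last column.
x = data[:, :14]
y = data[:, -1]

# Row ids are split alongside the data only to verify that the two sets are disjoint.
row_ids = np.arange(len(data))
x_train, x_test, y_train, y_test, id_train, id_test = train_test_split(
    x, y, row_ids, test_size=0.2, random_state=0
)

tree = DecisionTreeClassifier(random_state=0)
tree.fit(x_train, y_train)

holdout_accuracy = accuracy_score(y_test, tree.predict(x_test))
print("shared rows:", len(set(id_train) & set(id_test)))
print("holdout accuracy:", holdout_accuracy)
```

Because every row lands in exactly one of the two sets, the reported accuracy is a genuine out-of-sample estimate. Setting `random_state` only makes the shuffle reproducible; it is not required.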

Source: https://stats.stackexchange.com/questions/336055