Classification

Why do naive Bayesian classifiers perform so well?

  • February 8, 2012

Naive Bayes classifiers are a popular choice for classification problems. There are many reasons for this, including:

  • “Zeitgeist” - widespread awareness after the success of spam filters about ten years ago
  • Easy to write
  • The classifier model is fast to build
  • The model can be updated with new training data without having to be rebuilt (see the sketch after this list)
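
To make the "easy to write" and "update without rebuilding" points concrete, here is a minimal multinomial naive Bayes sketch in Python (my own illustration, not any particular library's API). The model is nothing more than counts per class, so folding in a new training example is a cheap update rather than a retraining pass:

```python
from collections import defaultdict
import math

class NaiveBayes:
    """Minimal multinomial naive Bayes over token features (illustrative sketch)."""

    def __init__(self):
        self.class_counts = defaultdict(int)                        # documents per class
        self.token_counts = defaultdict(lambda: defaultdict(int))   # class -> token -> count
        self.vocab = set()

    def update(self, tokens, label):
        """Fold one labelled example into the counts; no rebuild needed."""
        self.class_counts[label] += 1
        for t in tokens:
            self.token_counts[label][t] += 1
            self.vocab.add(t)

    def predict(self, tokens):
        """Pick the class with the highest log prior + summed log likelihoods."""
        total_docs = sum(self.class_counts.values())
        best, best_score = None, float("-inf")
        for c, doc_count in self.class_counts.items():
            score = math.log(doc_count / total_docs)                # log prior
            token_total = sum(self.token_counts[c].values())
            for t in tokens:                                        # Laplace-smoothed likelihoods
                score += math.log((self.token_counts[c][t] + 1) /
                                  (token_total + len(self.vocab)))
            if score > best_score:
                best, best_score = c, score
        return best

nb = NaiveBayes()
nb.update(["cheap", "viagra", "now"], "spam")
nb.update(["meeting", "tomorrow", "agenda"], "ham")
nb.update(["cheap", "meds", "online"], "spam")
print(nb.predict(["cheap", "pills", "now"]))   # -> spam
nb.update(["agenda", "minutes"], "ham")        # new data later: just call update() again
```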

However, they are ‘naive’: they assume the features are conditionally independent given the class. This contrasts with other classifiers, such as Maximum Entropy classifiers, which do not make that assumption (but are slow to train).
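
Concretely, the naive assumption is that the class-conditional likelihood factorizes, so the posterior is scored as a simple product of per-feature terms:

```latex
P(C \mid x_1, \dots, x_n) \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C)
```

Each feature contributes an independent multiplicative vote for the class, which is exactly what makes the model so cheap to build and to update.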

The independence assumption rarely holds in practice, and in many (most?) cases, including the spam filter example, it is simply wrong.

So why does the Naive Bayes Classifier still perform very well in such applications, even when the features are not independent of each other?

This paper seems to prove (I can’t follow the math) that naive Bayes is good not only when the features are independent, but also when the dependences among features are distributed evenly across the classes or cancel each other out:

In this paper, we propose a novel explanation on the superb classification performance of naive Bayes. We show that, essentially, the dependence distribution; i.e., how the local dependence of a node distributes in each class, evenly or unevenly, and how the local dependencies of all nodes work together, consistently (supporting a certain classification) or inconsistently (canceling each other out), plays a crucial role. Therefore, no matter how strong the dependences among attributes are, naive Bayes can still be optimal if the dependences distribute evenly in classes, or if the dependences cancel each other out
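
To get some intuition for the "cancel each other out" / "distribute evenly" idea, here is a toy example of my own (not taken from the paper): two binary features that are perfectly dependent, but dependent in exactly the same way in both classes. Naive Bayes gets the posterior probabilities wrong, because it counts the same evidence twice, yet it makes the same class decision as the true model for every input, so its accuracy does not suffer:

```python
# Two classes with equal priors; x2 is always an exact copy of x1, so the features
# are maximally dependent -- but identically so in both classes.
P_CLASS = {"A": 0.5, "B": 0.5}
P_X1 = {"A": 0.8, "B": 0.3}          # P(x1 = 1 | class)

def true_posterior(x1, x2):
    """Exact posterior, using the fact that x2 deterministically equals x1."""
    scores = {}
    for c in P_CLASS:
        p_x1 = P_X1[c] if x1 == 1 else 1 - P_X1[c]
        p_x2_given_x1 = 1.0 if x2 == x1 else 0.0     # perfect dependence
        scores[c] = P_CLASS[c] * p_x1 * p_x2_given_x1
    total = sum(scores.values()) or 1.0
    return {c: s / total for c, s in scores.items()}

def naive_posterior(x1, x2):
    """Naive Bayes: pretends x1 and x2 are independent given the class."""
    scores = {}
    for c in P_CLASS:
        p_x1 = P_X1[c] if x1 == 1 else 1 - P_X1[c]
        p_x2 = P_X1[c] if x2 == 1 else 1 - P_X1[c]   # same evidence counted twice
        scores[c] = P_CLASS[c] * p_x1 * p_x2
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

for x in [(0, 0), (1, 1)]:
    t, n = true_posterior(*x), naive_posterior(*x)
    print(x, "true:", t, "naive:", n,
          "same decision:", max(t, key=t.get) == max(n, key=n.get))
```

The naive posterior is overconfident (for (1, 1) it gives class A about 0.88 instead of the true 0.73), but the argmax never changes, which is the sense in which the paper calls naive Bayes "optimal" despite the dependence.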

Quoted from: https://stats.stackexchange.com/questions/23490
