This chapter introduces the simplest and most common generative classifier—Naïve Bayes. As mentioned earlier, generative classifiers are supervised learning algorithms that attempt to fit a joint probability distribution p(X,Y) over two sets of events X and Y, representing the observed and hidden (or latent) variables, x and y, respectively.
In this chapter, you will learn, and hopefully appreciate, the simplicity of the Naïve Bayes technique through a concrete example. Then, you will learn how to build a Naïve Bayes classifier to predict stock price movements, given some prior technical indicators used in the analysis of financial markets.
Finally, you will learn how to apply Naïve Bayes to text mining by predicting stock prices using financial news feeds and press releases.
Let's start with a refresher course in basic statistics.
Given two events or observations X and Y, the joint probability of X and Y is defined as p(X,Y) = p(X∩Y). If the events X and Y are not related, an assumption known as independence, then p(X,Y) = p(X).p(Y). The conditional probability of an event Y, given X, is defined as p(Y|X) = p(X,Y)/p(X).
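These definitions can be exercised on a toy joint distribution. The following sketch is purely illustrative—the table of probabilities is invented—and uses exact fractions so that the marginal, conditional, and independence relations above can be checked directly:

```python
from fractions import Fraction

# Toy joint distribution p(X, Y) over two binary events,
# indexed by (x, y). The values are illustrative only.
p_xy = {
    (0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4),
}

def marginal_x(x):
    """p(X=x), obtained by summing the joint over all values of Y."""
    return sum(p for (xi, _), p in p_xy.items() if xi == x)

def marginal_y(y):
    """p(Y=y), obtained by summing the joint over all values of X."""
    return sum(p for (_, yi), p in p_xy.items() if yi == y)

def conditional_y_given_x(y, x):
    """p(Y=y | X=x) = p(X=x, Y=y) / p(X=x)."""
    return p_xy[(x, y)] / marginal_x(x)

# With this uniform joint, X and Y are independent:
# p(X, Y) = p(X).p(Y) for every pair of outcomes.
assert all(p_xy[(x, y)] == marginal_x(x) * marginal_y(y)
           for x in (0, 1) for y in (0, 1))
print(conditional_y_given_x(1, 0))  # 1/2
```

With a non-uniform joint table, the final assertion would fail, which is exactly the dependence that the conditional probability p(Y|X) captures.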
These two definitions are quite simple. However, probabilistic reasoning can be difficult to read in the case of large numbers of variables and sequences of conditional probabilities. As a picture is worth a thousand words, researchers introduced graphical models to describe a probabilistic relation between random variables using graphs [5:1].
There are two categories of graphs, and therefore graphical models: directed and undirected. Directed graphical models are directed acyclic graphs that were introduced to provide a simple, visual description of the dependencies between random variables.
A Bayesian network is a directed graphical model that defines a joint probability over a set of variables [5:2].
The two joint probabilities p(X,Y) and p(X,Y,Z) can be graphically modeled using Bayesian networks, as follows:
The conditional probability p(Y|X) is represented by an arrow directed from the output (or symptoms) Y to the input (or cause) X. Elaborate models can be described as large directed graphs between variables.
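What a Bayesian network encodes is a factorization of the joint along the graph: p(X,Y) = p(X).p(Y|X), and p(X,Y,Z) = p(X).p(Y|X).p(Z|X,Y). The following sketch spells this out for three binary variables; the probability tables are illustrative values, not taken from the text:

```python
# Chain-rule factorization encoded by a Bayesian network:
# p(X, Y, Z) = p(X) * p(Y | X) * p(Z | X, Y)
# All probability tables below are illustrative.
p_x = {0: 0.6, 1: 0.4}
# Conditional table p(Y=y | X=x), keyed by (y, x)
p_y_given_x = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}
# Conditional table p(Z=z | X=x, Y=y), keyed by (z, x, y)
p_z_given_xy = {(z, x, y): 0.5 for z in (0, 1) for x in (0, 1) for y in (0, 1)}

def joint(x, y, z):
    """p(X=x, Y=y, Z=z) computed from the factorized tables."""
    return p_x[x] * p_y_given_x[(y, x)] * p_z_given_xy[(z, x, y)]

# Summing the factorized joint over all outcomes yields 1,
# confirming that the factorization defines a valid distribution.
total = sum(joint(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1))
print(round(total, 10))  # 1.0
```

The factorization is the whole point of the graph: instead of storing 2³ joint probabilities, the network stores only the (smaller) conditional tables attached to each node.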
Here is a real-world example of a Bayesian network: the functioning of a smoke detector.
The flow diagram is as follows:
This representation may be a bit counterintuitive, as the edges are directed from the symptoms (or output) to the cause (or input). Directed graphical models are used in many different models, besides Bayesian networks [5:3].
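The diagnostic direction—reasoning from symptom back to cause—is where Bayes' rule does the work. As a sketch, assume a hypothetical two-node network with a cause Fire and a symptom Alarm; all the numbers below are invented for illustration:

```python
# Hypothetical two-node network: Fire (cause) -> Alarm (symptom).
# All probabilities are invented for illustration.
p_fire = 0.01                  # prior p(Fire)
p_alarm_given_fire = 0.95      # p(Alarm | Fire): detector sensitivity
p_alarm_given_no_fire = 0.02   # p(Alarm | no Fire): false-alarm rate

# Marginal probability of the symptom, by total probability.
p_alarm = (p_alarm_given_fire * p_fire
           + p_alarm_given_no_fire * (1 - p_fire))

# Diagnostic inference: p(Fire | Alarm) via Bayes' rule.
p_fire_given_alarm = p_alarm_given_fire * p_fire / p_alarm
print(round(p_fire_given_alarm, 3))  # 0.324
```

Even with a sensitive detector, the low prior p(Fire) keeps the posterior modest—a classic illustration of why the prior matters in Bayesian inference.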
The Naïve Bayes models are probabilistic models based on Bayes' theorem under the assumption of feature independence, as mentioned in the Generative models section under Supervised learning in Chapter 1, Getting Started.
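To make the feature-independence assumption concrete before the worked example, here is a minimal multinomial Naïve Bayes sketch with Laplace smoothing. The training samples and token names are invented for illustration and are not the financial data used later in the chapter:

```python
import math
from collections import Counter, defaultdict

# Minimal multinomial Naive Bayes with Laplace (add-one) smoothing.
# Each sample is a list of discrete feature tokens plus a label.
def train(samples):
    """samples: list of (tokens, label). Returns the fitted model."""
    label_counts = Counter(label for _, label in samples)
    token_counts = defaultdict(Counter)  # per-label token frequencies
    vocab = set()
    for tokens, label in samples:
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, token_counts, vocab, len(samples)

def predict(model, tokens):
    """Pick the label maximizing log p(label) + sum log p(token | label)."""
    label_counts, token_counts, vocab, n = model
    best, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / n)  # log prior
        total = sum(token_counts[label].values())
        for t in tokens:
            # Independence assumption: each token contributes its own
            # smoothed likelihood, regardless of the other tokens.
            score += math.log((token_counts[label][t] + 1)
                              / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

# Invented toy data: token features with "bull"/"bear" labels.
model = train([(["up", "high"], "bull"), (["down", "low"], "bear"),
               (["up", "low"], "bull"), (["down", "high"], "bear")])
print(predict(model, ["up"]))  # bull
```

The product of per-token likelihoods inside `predict` is precisely the "naïve" independence assumption; everything else is just Bayes' rule applied in log space to avoid numerical underflow.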