This chapter introduces the simplest and most common generative classifier—Naïve Bayes. As mentioned earlier, generative classifiers are supervised learning algorithms that attempt to fit a joint probability distribution p(X,Y) over two sets of events X and Y, representing the observed and hidden (or latent) variables, x and y, respectively.
In this chapter, you will learn, and hopefully appreciate, the simplicity of the Naïve Bayes technique through a concrete example. Then, you will learn how to build a Naïve Bayes classifier to predict stock price movements, given some prior technical indicators used in the analysis of financial markets.
Finally, you will learn how to apply Naïve Bayes to text mining by predicting stock prices using financial news feeds and press releases.
Let's start with a refresher course in basic statistics.
Given two events or observations X and Y, the joint probability of X and Y is defined as p(X,Y) = p(X∩Y). If the events X and Y are not related, an assumption known as independence, then p(X,Y) = p(X).p(Y). The conditional probability of an event Y, given X, is defined as p(Y|X) = p(X,Y)/p(X).
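These definitions can be exercised on a toy joint distribution. The following sketch is purely illustrative—the table of probabilities is invented—and uses exact fractions so that the marginal, conditional, and independence relations above can be checked directly:

```python
from fractions import Fraction

# Toy joint distribution p(X, Y) over two binary events,
# indexed by (x, y). The values are illustrative only.
p_xy = {
    (0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4),
}

def marginal_x(x):
    """p(X=x), obtained by summing the joint over all values of Y."""
    return sum(p for (xi, _), p in p_xy.items() if xi == x)

def marginal_y(y):
    """p(Y=y), obtained by summing the joint over all values of X."""
    return sum(p for (_, yi), p in p_xy.items() if yi == y)

def conditional_y_given_x(y, x):
    """p(Y=y | X=x) = p(X=x, Y=y) / p(X=x)."""
    return p_xy[(x, y)] / marginal_x(x)

# With this uniform joint, X and Y are independent:
# p(X, Y) = p(X).p(Y) for every pair of outcomes.
assert all(p_xy[(x, y)] == marginal_x(x) * marginal_y(y)
           for x in (0, 1) for y in (0, 1))
print(conditional_y_given_x(1, 0))  # 1/2
```

With a non-uniform joint table, the final assertion would fail, which is exactly the dependence that the conditional probability p(Y|X) captures.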
These two definitions are quite simple. However, probabilistic reasoning can be difficult to read in the case of large numbers of variables and sequences of conditional probabilities. As a picture is worth a thousand words, researchers introduced graphical models to describe a probabilistic relation between random variables using graphs [5:1].
There are two categories of graphs, and therefore graphical models: directed and undirected. Directed graphical models are directed acyclic graphs that were introduced to provide a simple, visual description of the dependencies between random variables.
A Bayesian network is a directed graphical model that defines a joint probability over a set of variables [5:2].
The two joint probabilities p(X,Y) and p(X,Y,Z) can be graphically modeled using Bayesian networks, as follows:
The conditional probability p(Y|X) is represented by an arrow directed from the output (or symptoms) Y to the input (or cause) X. Elaborate models can be described as large directed graphs between variables.
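What a Bayesian network encodes is a factorization of the joint along the graph: p(X,Y) = p(X).p(Y|X), and p(X,Y,Z) = p(X).p(Y|X).p(Z|X,Y). The following sketch spells this out for three binary variables; the probability tables are illustrative values, not taken from the text:

```python
# Chain-rule factorization encoded by a Bayesian network:
# p(X, Y, Z) = p(X) * p(Y | X) * p(Z | X, Y)
# All probability tables below are illustrative.
p_x = {0: 0.6, 1: 0.4}
# Conditional table p(Y=y | X=x), keyed by (y, x)
p_y_given_x = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}
# Conditional table p(Z=z | X=x, Y=y), keyed by (z, x, y)
p_z_given_xy = {(z, x, y): 0.5 for z in (0, 1) for x in (0, 1) for y in (0, 1)}

def joint(x, y, z):
    """p(X=x, Y=y, Z=z) computed from the factorized tables."""
    return p_x[x] * p_y_given_x[(y, x)] * p_z_given_xy[(z, x, y)]

# Summing the factorized joint over all outcomes yields 1,
# confirming that the factorization defines a valid distribution.
total = sum(joint(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1))
print(round(total, 10))  # 1.0
```

The factorization is the whole point of the graph: instead of storing 2³ joint probabilities, the network stores only the (smaller) conditional tables attached to each node.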
Here is a real-world example of a Bayesian network: the functioning of a smoke detector.
The flow diagram is as follows:
This representation may be a bit counterintuitive, as the edges are directed from the symptoms (or output) to the cause (or input). Directed graphical models are used in many different models, besides Bayesian networks [5:3].
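The diagnostic direction—reasoning from symptom back to cause—is where Bayes' rule does the work. As a sketch, assume a hypothetical two-node network with a cause Fire and a symptom Alarm; all the numbers below are invented for illustration:

```python
# Hypothetical two-node network: Fire (cause) -> Alarm (symptom).
# All probabilities are invented for illustration.
p_fire = 0.01                  # prior p(Fire)
p_alarm_given_fire = 0.95      # p(Alarm | Fire): detector sensitivity
p_alarm_given_no_fire = 0.02   # p(Alarm | no Fire): false-alarm rate

# Marginal probability of the symptom, by total probability.
p_alarm = (p_alarm_given_fire * p_fire
           + p_alarm_given_no_fire * (1 - p_fire))

# Diagnostic inference: p(Fire | Alarm) via Bayes' rule.
p_fire_given_alarm = p_alarm_given_fire * p_fire / p_alarm
print(round(p_fire_given_alarm, 3))  # 0.324
```

Even with a sensitive detector, the low prior p(Fire) keeps the posterior modest—a classic illustration of why the prior matters in Bayesian inference.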
The Naïve Bayes models are probabilistic models based on Bayes' theorem under the assumption of feature independence, as mentioned in the Generative models section under Supervised learning in Chapter 1, Getting Started.
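To make the feature-independence assumption concrete before the worked example, here is a minimal multinomial Naïve Bayes sketch with Laplace smoothing. The training samples and token names are invented for illustration and are not the financial data used later in the chapter:

```python
import math
from collections import Counter, defaultdict

# Minimal multinomial Naive Bayes with Laplace (add-one) smoothing.
# Each sample is a list of discrete feature tokens plus a label.
def train(samples):
    """samples: list of (tokens, label). Returns the fitted model."""
    label_counts = Counter(label for _, label in samples)
    token_counts = defaultdict(Counter)  # per-label token frequencies
    vocab = set()
    for tokens, label in samples:
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, token_counts, vocab, len(samples)

def predict(model, tokens):
    """Pick the label maximizing log p(label) + sum log p(token | label)."""
    label_counts, token_counts, vocab, n = model
    best, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / n)  # log prior
        total = sum(token_counts[label].values())
        for t in tokens:
            # Independence assumption: each token contributes its own
            # smoothed likelihood, regardless of the other tokens.
            score += math.log((token_counts[label][t] + 1)
                              / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

# Invented toy data: token features with "bull"/"bear" labels.
model = train([(["up", "high"], "bull"), (["down", "low"], "bear"),
               (["up", "low"], "bull"), (["down", "high"], "bear")])
print(predict(model, ["up"]))  # bull
```

The product of per-token likelihoods inside `predict` is precisely the "naïve" independence assumption; everything else is just Bayes' rule applied in log space to avoid numerical underflow.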