Initial attempts by topic models to improve on the vector space model (developed in the mid-1970s) applied linear algebra to reduce the dimensionality of the document-term matrix. This approach is similar to principal component analysis, which we discussed in Chapter 12, Unsupervised Learning. While effective, these models are difficult to evaluate in the absence of a benchmark model.
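The linear-algebra approach can be sketched with a truncated singular value decomposition (SVD) of a document-term matrix. The toy matrix, vocabulary, and rank below are invented for illustration; this is a minimal sketch, not a production pipeline:

```python
import numpy as np

# Toy document-term matrix: 4 documents x 5 terms (raw counts).
# Hypothetical terms: ["market", "stock", "goal", "team", "price"]
dtm = np.array([
    [3, 2, 0, 0, 1],   # finance-themed document
    [2, 3, 0, 0, 2],   # finance-themed document
    [0, 0, 3, 2, 0],   # sports-themed document
    [0, 1, 2, 3, 0],   # sports-themed document
], dtype=float)

# Truncated SVD: keep the k largest singular values to obtain a
# low-rank "semantic" representation of documents and terms.
k = 2
U, s, Vt = np.linalg.svd(dtm, full_matrices=False)
doc_embeddings = U[:, :k] * s[:k]      # documents in the k-dim latent space
term_embeddings = Vt[:k, :].T * s[:k]  # terms in the same latent space

print(doc_embeddings.shape)  # (4, 2)
```

In the reduced space, documents that share vocabulary end up close together even when they do not share every term, which is the "semantic" relationship the dimensionality reduction is meant to capture.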
In response, probabilistic models emerged that assume an explicit document generation process and provide algorithms to reverse engineer this process and recover the underlying topics.
This table highlights key milestones in the model evolution that we will address in more detail in the following sections:
| Model | Year | Description |
| --- | --- | --- |
| Latent Semantic Indexing (LSI) | 1988 | Reduces the word space dimensionality to capture semantic document-term relationships by applying singular value decomposition to the document-term matrix |
| Probabilistic Latent Semantic Analysis (pLSA) | 1999 | Reverse-engineers a generative process that assumes topics generate words and documents are a mix of topics |
| Latent Dirichlet Allocation (LDA) | 2003 | Adds a generative process for documents: a three-level hierarchical Bayesian model |