Topic modeling with MALLET

MALLET is a well-known Java toolkit for topic modeling; it also supports document classification and sequence tagging. More information about MALLET, along with the download (the latest version at the time of writing is 2.0.6), is available on the project's website. Once downloaded, extract the archive into a directory of your choice. The extracted MALLET directory includes sample data as .txt files under the sample-data/web/en path.

The first step is to import the files into MALLET's internal format. To do this, open a Command Prompt or Terminal, change to the MALLET directory, and run the following command:

mallet-2.0.6$ bin/mallet import-dir --input sample-data/web/en --output tutorial.mallet --keep-sequence --remove-stopwords

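This command generates the tutorial.mallet file. The --keep-sequence option stores each document as a sequence of word features rather than a vector of word counts, which MALLET's topic modeling requires, and --remove-stopwords drops MALLET's default list of common English words.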
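To make "MALLET's internal format" more concrete, here is a simplified, hypothetical bash sketch of the kind of preprocessing the import command performs. Each .txt file becomes one instance, the text is lowercased and split into words, stopwords are dropped, and every remaining word is replaced by an integer index from a vocabulary shared across all documents (MALLET calls this mapping an alphabet), with the original word order preserved. This is not MALLET's code and does not produce a .mallet file. The function names import_dir and import_file, the tiny stopword list, and the letters-only tokenization are assumptions made purely for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical, simplified sketch of the preprocessing that
#   bin/mallet import-dir --keep-sequence --remove-stopwords
# applies to each file. This is NOT MALLET's implementation.
# Requires bash 4+ for associative arrays.

# Tiny stand-in stopword list (assumption; MALLET ships its own English list).
STOPWORDS=" a an and the of on in to is it for with as was "

declare -A ALPHABET=()   # word -> feature index, shared by all documents
NEXT_INDEX=0

# Print "<file name>: <index> <index> ..." for a single document.
import_file() {
  local file=$1 word
  local -a indices=()
  while read -r word; do
    # --remove-stopwords: skip words on the stopword list
    if [[ $STOPWORDS == *" $word "* ]]; then
      continue
    fi
    # Unseen word: assign it the next free feature index
    if [[ -z ${ALPHABET[$word]+set} ]]; then
      ALPHABET[$word]=$NEXT_INDEX
      NEXT_INDEX=$((NEXT_INDEX + 1))
    fi
    # --keep-sequence: append indices in the original word order
    indices+=("${ALPHABET[$word]}")
  done < <(tr '[:upper:]' '[:lower:]' < "$file" | grep -oE '[a-z]+')
  echo "$(basename "$file"): ${indices[*]}"
}

# Treat every .txt file in a directory as one document (instance).
import_dir() {
  local file
  for file in "$1"/*.txt; do
    [[ -e $file ]] || continue   # directory contains no .txt files
    import_file "$file"
  done
}
```

For example, with two small documents, the word "cat" receives the same index in both because the vocabulary is shared:

```shell
tmp=$(mktemp -d)
printf 'The cat sat on the mat.\n' > "$tmp/doc1.txt"
printf 'A cat and a dog.\n' > "$tmp/doc2.txt"
import_dir "$tmp"
# doc1.txt: 0 1 2
# doc2.txt: 0 3
```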

