Categorization is quite useful, but it has limitations. In particular, it will likely return poor results in the following cases:
- Free-form text fields, typically written by humans. Examples include tweets, comments, emails, and notes.
- Log lines that should actually be parsed into proper name/value pairs, such as a web access log.
- Documents that contain a lot of multi-line text, XML, and so on.
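To illustrate the second case, here is a minimal sketch of parsing a web access log line into name/value pairs instead of categorizing it. It assumes the log follows the Common Log Format; the pattern, the field names, and the helper `parse_access_log_line` are hypothetical choices made for this example, and a real deployment would more likely do this parsing in the ingest pipeline.

```python
import re

# Assumed layout: Common Log Format. Real access logs may use a different
# format, in which case this pattern would need to be adjusted.
CLF_PATTERN = re.compile(
    r'(?P<client_ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_access_log_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert numeric fields so they can be aggregated or range-queried.
    fields["status"] = int(fields["status"])
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields


# Illustrative sample line, not taken from any real system.
sample = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326'
)
parsed = parse_access_log_line(sample)
print(parsed)
```

Once each line is a set of discrete fields such as `status`, `method`, and `path`, those fields can be analyzed directly, which is usually more informative than grouping the raw lines by textual similarity.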