Chapter 1. The case for the digital Babel fish
1.1. Understanding digital documents
1.1.1. A taxonomy of file formats
Chapter 2. Getting started with Tika
2.1. Working with Tika source code
2.3. Tika as an embedded library
Chapter 3. The information landscape
3.1. Measuring information overload
3.2. I’m feeling lucky—searching the information landscape
3.3. Beyond lucky: machine learning
Chapter 4. Document type detection
4.1.1. The parlance of media type names
4.2.1. The shared MIME-info database
5.4.1. Semantic structure of text
5.5. Context-sensitive parsing
Chapter 6. Understanding metadata
6.1. The standards of metadata
6.4. Practical uses of metadata
7.1. The most translated document in the world
7.2. Sounds Greek to me—theory of language detection
7.3. Language detection in Tika
8.1.1. HDF: a format for scientific data
8.1.2. Really Simple Syndication: a format for rapidly changing content
8.2. How Tika extracts content
8.2.1. Organization of content
Part 3. Integration and advanced use
9.2. Managing and mining information
Chapter 10. Tika and the Lucene search stack
11.2.1. The Detector interface
11.3.1. Customizing existing parsers
Chapter 12. Powering NASA science data systems
12.1. NASA’s Planetary Data System
12.2. NASA’s Earth Science Enterprise
Chapter 13. Content management with Apache Jackrabbit
13.1. Introducing Apache Jackrabbit
Chapter 14. Curating cancer research data with Tika
14.1. The NCI Early Detection Research Network
Chapter 15. The classic search engine example
15.1. The Public Terabyte Dataset Project
Appendix A. Tika quick reference
Appendix B. Supported metadata keys