Chapter 2. Getting started with Tika
Chapter 4. Document type detection
Listing 4.1. Basic MIME-info database file
Chapter 5. Content extraction
Chapter 6. Understanding metadata
Listing 6.1. Property class and support for XMP-like metadata
Listing 6.2. Metadata instances in Tika
Listing 6.3. Extending the Lucene indexer with generic metadata
Listing 6.4. Extending the Lucene indexer with content-specific metadata
Listing 6.5. Getting a list of recent files from the Lucene indexer
Chapter 7. Language detection
Listing 7.1. Source code of a language-detecting parser decorator
Chapter 8. What’s in a file?
Listing 8.1. Tika’s RSS feed parser exploiting RSS’s XML-based content structure
Listing 8.2. The latter half of the FeedParser’s parse method: extracting links
Listing 8.3. The Tika HDFParser’s parse method
Listing 8.4. The Tika HDFParser’s unravelStringMet method
Listing 8.5. Snippet of Tika’s HtmlHandler class that deals with meta tags
Listing 8.6. Leveraging directory information to extract file metadata
Listing 8.7. Tika’s LinkContentHandler class makes extracting file links a snap
Listing 8.8. A sample program to roll back a software version using Tika
Listing 8.9. A custom Tika Parser implementation for our deployment area
Chapter 10. Tika and the Lucene search stack
Chapter 11. Extending Tika
Listing 11.1. Custom type detector for encrypted prescription documents
Listing 11.2. An XMLParser subclass for parsing prescription documents
Listing 11.3. Parser class for encrypted prescription documents
Chapter 13. Content management with Apache Jackrabbit
Chapter 14. Curating cancer research data with Tika
Listing 14.1. Detecting file types prior to ingestion in the eCAS system
Listing 14.2. Making extracted MIME information available for search and retrieval
Chapter 15. The classic search engine example
Listing 15.1. Linking together link extraction and language detection