List of Listings

Chapter 2. Getting started with Tika

Listing 2.1. Built-in documentation of the Tika application

Listing 2.2. Simple text extractor example

Chapter 4. Document type detection

Listing 4.1. Basic MIME-info database file

Listing 4.2. C and C++ filename patterns

Listing 4.3. Simple type detector example

Chapter 5. Content extraction

Listing 5.1. Simple full-text indexer with Tika and Lucene

Listing 5.2. Tika parser interface

Chapter 6. Understanding metadata

Listing 6.1. Property class and support for XMP-like metadata

Listing 6.2. Metadata instances in Tika

Listing 6.3. Extending the Lucene indexer with generic metadata

Listing 6.4. Extending the Lucene indexer with content-specific metadata

Listing 6.5. Getting a list of recent files from the Lucene indexer

Listing 6.6. Using Tika metadata to convert to RSS

Chapter 7. Language detection

Listing 7.1. Source code of a language-detecting parser decorator

Chapter 8. What’s in a file?

Listing 8.1. Tika’s RSS feed parser exploiting RSS’s XML-based content structure

Listing 8.2. The latter half of the FeedParser’s parse method: extracting links

Listing 8.3. The Tika HDFParser’s parse method

Listing 8.4. The Tika HDFParser’s unravelStringMet method

Listing 8.5. Snippet of Tika’s HtmlHandler class that deals with meta tags

Listing 8.6. Leveraging directory information to extract file metadata

Listing 8.7. Tika’s LinkContentHandler class makes extracting file links a snap

Listing 8.8. A sample program to roll back a software version using Tika

Listing 8.9. A custom Tika Parser implementation for our deployment area

Chapter 10. Tika and the Lucene search stack

Listing 10.1. Integrating Tika into Open Relevance

Chapter 11. Extending Tika

Listing 11.1. Custom type detector for encrypted prescription documents

Listing 11.2. An XMLParser subclass for parsing prescription documents

Listing 11.3. Parser class for encrypted prescription documents

Listing 11.4. Overriding Parsers in Tika

Chapter 13. Content management with Apache Jackrabbit

Listing 13.1. Background text extraction task in Jackrabbit

Listing 13.2. Automatic type detection in Jackrabbit

Chapter 14. Curating cancer research data with Tika

Listing 14.1. Detecting file types prior to ingestion in the eCAS system

Listing 14.2. Making extracted MIME information available for search and retrieval

Chapter 15. The classic search engine example

Listing 15.1. Linking together link extraction and language detection
