Part 2. Tika in detail

By now you should have a fairly good understanding of what Tika is, what it can do, and where it fits in the bigger picture of information-processing systems. If you read through chapter 2 and tried out the examples, you’ve seen Tika in action and written your first Tika-based application. But if you’re anything like us, you’re wondering how this toolkit is put together and what programming APIs it provides. Wait no more, because that’s what we’ll be covering in this part of the book!

We’ll start in chapter 4 by describing the internet media type system and how Tika can detect the type of virtually any kind of document. Once the type is known, Tika can parse the document to extract its content and any associated metadata. Content extraction with Tika is covered in chapter 5, and metadata handling in chapter 6. In chapter 7, we’ll show how Tika can help deduce information like the natural language in which a document is written. Finally, chapter 8 looks at some of the more popular file formats and the details that you should know when dealing with such files.

That’s a lot of ground to cover, so let’s get started!
