About this Book

Natural Language Processing in Action is a practical guide to processing and generating natural language text in the real world. In this book we provide you with all the tools and techniques you need to build the backend NLP systems to support a virtual assistant (chatbot), spam filter, forum moderator, sentiment analyzer, knowledge base builder, natural language text miner, or nearly any other NLP application you can imagine.

Natural Language Processing in Action is aimed at intermediate to advanced Python developers. Readers already capable of designing and building complex systems will also find most of this book useful, since it provides numerous best-practice examples and insight into the capabilities of state-of-the art NLP algorithms. While knowledge of object-oriented Python development may help you build better systems, it’s not required to use what you learn in this book.

For special topics, we provide sufficient background material and cite resources (both text and online) for those who want to gain an in-depth understanding.

Roadmap

If you are new to Python and natural language processing, you should first read part 1 and then any of the chapters of part 3 that apply to your interests or on-the-job challenges. If you want to get up to speed on the new NLP capabilities that deep learning enables, you’ll also want to read part 2, in order. It builds your understanding of neural networks, incrementally ratcheting up the complexity and capability of those neural nets.

As soon as you find a chapter or section with a snippet that you can “run in your head,” you should run it for real on your machine. And if any of the examples look like they might run on your own text documents, you should put that text into a CSV or text file (one document per line) in the nlpia/src/nlpia/data/ directory. Then you can use the nlpia.data.loaders.get_data() function to retrieve that data and run the examples on your own data.

About this book

The chapters of part 1 deal with the logistics of working with natural language and turning it into numbers that can be searched and computed. This “blocking and tackling” of words comes with the reward of some surprisingly useful applications such as information retrieval and sentiment analysis. Once you master the basics, you’ll find that some very simple arithmetic, computed over and over and over in a loop, can solve some pretty important problems, such as spam filtering. Spam filters of the type you’ll build in chapters 2 through 4 are what saved the global email system from anarchy and stagnation. You’ll learn how to build a spam filter with better than 90% accuracy using 1990s era technology—calculating nothing more than the counts of words and some simple averages of those counts.

All this math with words may sound tedious, but it’s actually quite fun. Very quickly you’ll be able to build algorithms that can make decisions about natural language as well or better than you can (and certainly much faster). This may be the first time in your life that you have the perspective to fully appreciate the way that words reflect and empower your thinking. The high-dimensional vector-space view of words and thoughts will hopefully leave your brain spinning in recurrent loops of self-discovery.

That crescendo of learning may reach a high point toward the middle of this book. The core of this book in part 2 will be your exploration of the complicated web of computation and communication within neural networks. The network effect of small logical units interacting in a web of “thinking” has empowered machines to solve problems that only smart humans even bothered to attempt in the past, things such as analogy questions, text summarization, and translation between natural languages.

Yes, you’ll learn about word vectors, don’t worry, but oh so much more. You’ll be able to visualize words, documents, and sentences in a cloud of connected concepts that stretches well beyond the three dimensions you can readily grasp. You’ll start thinking of documents and words like a Dungeons and Dragons character sheet with a myriad of randomly selected characteristics and abilities that have evolved and grown over time, but only in our heads.

An appreciation for this intersubjective reality of words and their meaning will be the foundation for the coup-de-grace of part 3, where you learn how to build machines that converse and answer questions as well as humans.

About the code

This book contains many examples of source code both in numbered listings and in line with normal text. In both cases, source code is formatted in a fixed-width font like this to separate it from ordinary text. Sometimes code is also in bold to highlight code that has changed from previous steps in the chapter, such as when a new feature adds to an existing line of code.

In many cases, the original source code has been reformatted; we’ve added line breaks and reworked indentation to accommodate the available page space in the book. In rare cases, even this was not enough, and listings include line-continuation markers (). Additionally, comments in the source code have often been removed from the listings when the code is described in the text. Code annotations accompany many of the listings, highlighting important concepts.

The source code for all listings in this book is available for download from the Manning website at https://www.manning.com/books/natural-language-processing-in-action and from GitHub at https://github.com/totalgood/nlpia.

liveBook discussion forum

Purchase of Natural Language Processing in Action includes free access to a private web forum run by Manning Publications where you can make comments about the book, ask technical questions, and receive help from the authors and from other users. To access the forum, go to https://livebook.manning.com/#!/book/natural-language-processing-in-action/discussion. You can also learn more about Manning’s forums and the rules of conduct at https://livebook.manning.com/#!/discussion.

Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the authors can take place. It is not a commitment to any specific amount of participation on the part of the authors, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking the authors some challenging questions lest their interest stray! The forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset