Preface

Streaming data is one of the most important technologies to watch in data science and machine learning. As business needs become more demanding, many use cases require real-time analysis as well as real-time machine learning. This book will help you get up to speed with analytics for streaming data, with a strong focus on adapting machine learning and other analytical methods to the streaming setting.

You will first learn about architectures for streaming and real-time machine learning. You will then look at state-of-the-art frameworks for streaming data, such as River.

You will learn about various industrial use cases for streaming data, such as online anomaly detection. Next, you will dive into the challenges of working with streaming data and how to mitigate them. Finally, you will learn best practices that will help you use streaming data to generate real-time insights.

Upon completion of the book, you will be confident about using streaming data in your machine learning models.

Who this book is for

Data scientists and machine learning engineers who have a foundation in machine learning, are practice- and technology-oriented, and want to learn how to apply machine learning to streaming data through practical examples with modern technologies will benefit from this book. You will need to understand basic Python and machine learning concepts, but no prior knowledge of streaming is required.

What this book covers

Chapter 1, Introduction to Streaming Data, explains what streaming data is and why it is different from batch data. This chapter also explains the challenges that we should expect to encounter as well as the advantages of using streaming data.

Chapter 2, Architectures for Streaming and Real-Time Machine Learning, describes various architectures that can be used to set up streaming, and how they can be utilized.

Chapter 3, Data Analysis on Streaming Data, explores data analysis on streaming data, which includes real-time insights, real-time descriptive statistics, real-time visualizations, and basic alerting systems.

Chapter 4, Online Learning with River, covers the core concepts of online learning and introduces River, a Python library for online machine learning.

Chapter 5, Online Anomaly Detection, covers online anomaly detection, explains how it is useful, and also provides a use case that involves building a program for detecting anomalies in streaming data.

Chapter 6, Online Classification, covers online classification, explains how it is useful, and also provides a use case that involves building a program for classifying streaming data.

Chapter 7, Online Regression, covers online regression, explains how it is useful, and provides a use case that involves building a program for performing regression on streaming data.

Chapter 8, Reinforcement Learning, introduces you to reinforcement learning. We will explore some of the key algorithms and also explore some use cases for it using Python.

Chapter 9, Drift and Drift Detection, explains what drift is in online learning and shows how to build solutions to detect it.

Chapter 10, Feature Transformation and Scaling, shows us how to build a feature transformation pipeline that works with real-time and streaming data.

Chapter 11, Catastrophic Forgetting, explores what catastrophic forgetting is and shows how to deal with it through example use cases.

Chapter 12, Conclusion and Best Practices, reviews the book and brings together all the concepts explored throughout it so that you can revise and revisit them as needed.
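Chapter 3 mentions real-time descriptive statistics. As a minimal sketch of what "real-time" means in this context, assuming values arrive one at a time and past values are not kept in memory, the hypothetical RunningStats class below updates a mean and a sample variance incrementally using Welford's algorithm. It is an illustration only; it is not taken from the book's code or from River's API.

```python
import math


class RunningStats:
    """Hypothetical helper: mean and variance updated one value at a time (Welford's algorithm)."""

    def __init__(self):
        self.n = 0        # number of observations seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x):
        # Incorporate a single new observation without storing past values
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance; reported as 0.0 until at least two values have been seen
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


# Simulated stream: values arrive one at a time
stream = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
stats = RunningStats()
for value in stream:
    stats.update(value)
    print(f"n={stats.n} mean={stats.mean:.3f} std={math.sqrt(stats.variance):.3f}")
```

The design choice here is that each update costs constant time and memory, which is what allows statistics to be refreshed as every new observation arrives rather than recomputed over a stored batch.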

To get the most out of this book

To follow along with this book, you can use an online notebook environment such as Google Colab or Kaggle Notebooks, or your own local Jupyter Notebook environment with Python 3. A (free) AWS account is also needed for a small number of exercises.

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book's GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Machine-Learning-for-Streaming-Data-with-Python. If the code is updated, the changes will be made in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://packt.link/6rZ0m.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "There is no predict_many function here, so it is necessary to do a loop with predict_one repeatedly."

A block of code is set as follows:

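# Assumes `data` is a pandas DataFrame with boolean columns can_speak and has_feathers
def self_made_decision_tree(observation):
    # A hand-written "tree": only speaking, featherless observations are classed as human
    if observation.can_speak:
        if not observation.has_feathers:
            return 'human'
    return 'not human'


for i, row in data.iterrows():
    print(self_made_decision_tree(row))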

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

from sklearn.datasets import make_blobs
X, y = make_blobs(shuffle=True, centers=2, n_samples=2000)

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: "Select System info from the Administration panel."

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you've read Machine Learning for Streaming Data with Python, we'd love to hear your thoughts! Please visit the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.
