Title Page
Copyright and Credits
  Python Machine Learning By Example Second Edition
About Packt
  Why subscribe?
  Packt.com
Dedication
Foreword
Contributors
  About the author
  About the reviewer
  Packt is searching for authors like you
Preface
  Who this book is for
  What this book covers
  To get the most out of this book
  Download the example code files
  Download the color images
  Conventions used
  Get in touch
  Reviews
Section 1: Fundamentals of Machine Learning
Getting Started with Machine Learning and Python
  Defining machine learning and why we need it
  A very high-level overview of machine learning technology
  Types of machine learning tasks
  A brief history of the development of machine learning algorithms
  Core of machine learning – generalizing with data
  Overfitting, underfitting, and the bias-variance trade-off
  Avoiding overfitting with cross-validation
  Avoiding overfitting with regularization
  Avoiding overfitting with feature selection and dimensionality reduction
  Preprocessing, exploration, and feature engineering
  Missing values
  Label encoding
  One hot encoding
  Scaling
  Polynomial features
  Power transform
  Binning
  Combining models
  Voting and averaging
  Bagging
  Boosting
  Stacking
  Installing software and setting up
  Setting up Python and environments
  Installing the various packages
  NumPy
  SciPy
  Pandas
  Scikit-learn
  TensorFlow
  Summary
  Exercises
Section 2: Practical Python Machine Learning By Example
Exploring the 20 Newsgroups Dataset with Text Analysis Techniques
  How computers understand language - NLP
  Picking up NLP basics while touring popular NLP libraries
  Corpus
  Tokenization
  PoS tagging
  Named-entity recognition
  Stemming and lemmatization
  Semantics and topic modeling
  Getting the newsgroups data
  Exploring the newsgroups data
  Thinking about features for text data
  Counting the occurrence of each word token
  Text preprocessing
  Dropping stop words
  Stemming and lemmatizing words
  Visualizing the newsgroups data with t-SNE
  What is dimensionality reduction?
  t-SNE for dimensionality reduction
  Summary
  Exercises
Mining the 20 Newsgroups Dataset with Clustering and Topic Modeling Algorithms
  Learning without guidance – unsupervised learning
  Clustering newsgroups data using k-means
  How does k-means clustering work?
  Implementing k-means from scratch
  Implementing k-means with scikit-learn
  Choosing the value of k
  Clustering newsgroups data using k-means
  Discovering underlying topics in newsgroups
  Topic modeling using NMF
  Topic modeling using LDA
  Summary
  Exercises
Detecting Spam Email with Naive Bayes
  Getting started with classification
  Types of classification
  Applications of text classification
  Exploring Naïve Bayes
  Learning Bayes' theorem by examples
  The mechanics of Naïve Bayes
  Implementing Naïve Bayes from scratch
  Implementing Naïve Bayes with scikit-learn
  Classification performance evaluation
  Model tuning and cross-validation
  Summary
  Exercise
Classifying Newsgroup Topics with Support Vector Machines
  Finding separating boundary with support vector machines
  Understanding how SVM works through different use cases
  Case 1 – identifying a separating hyperplane
  Case 2 – determining the optimal hyperplane
  Case 3 – handling outliers
  Implementing SVM
  Case 4 – dealing with more than two classes
  The kernels of SVM
  Case 5 – solving linearly non-separable problems
  Choosing between linear and RBF kernels
  Classifying newsgroup topics with SVMs
  More example – fetal state classification on cardiotocography
  A further example – breast cancer classification using SVM with TensorFlow
  Summary
  Exercise
Predicting Online Ad Click-Through with Tree-Based Algorithms
  Brief overview of advertising click-through prediction
  Getting started with two types of data – numerical and categorical
  Exploring decision tree from root to leaves
  Constructing a decision tree
  The metrics for measuring a split
  Implementing a decision tree from scratch
  Predicting ad click-through with decision tree
  Ensembling decision trees – random forest
  Implementing random forest using TensorFlow
  Summary
  Exercise
Predicting Online Ad Click-Through with Logistic Regression
  Converting categorical features to numerical – one-hot encoding and ordinal encoding
  Classifying data with logistic regression
  Getting started with the logistic function
  Jumping from the logistic function to logistic regression
  Training a logistic regression model
  Training a logistic regression model using gradient descent
  Predicting ad click-through with logistic regression using gradient descent
  Training a logistic regression model using stochastic gradient descent
  Training a logistic regression model with regularization
  Training on large datasets with online learning
  Handling multiclass classification
  Implementing logistic regression using TensorFlow
  Feature selection using random forest
  Summary
  Exercises
Scaling Up Prediction to Terabyte Click Logs
  Learning the essentials of Apache Spark
  Breaking down Spark
  Installing Spark
  Launching and deploying Spark programs
  Programming in PySpark
  Learning on massive click logs with Spark
  Loading click logs
  Splitting and caching the data
  One-hot encoding categorical features
  Training and testing a logistic regression model
  Feature engineering on categorical variables with Spark
  Hashing categorical features
  Combining multiple variables – feature interaction
  Summary
  Exercises
Stock Price Prediction with Regression Algorithms
  Brief overview of the stock market and stock prices
  What is regression?
  Mining stock price data
  Getting started with feature engineering
  Acquiring data and generating features
  Estimating with linear regression
  How does linear regression work?
  Implementing linear regression
  Estimating with decision tree regression
  Transitioning from classification trees to regression trees
  Implementing decision tree regression
  Implementing regression forest
  Estimating with support vector regression
  Implementing SVR
  Estimating with neural networks
  Demystifying neural networks
  Implementing neural networks
  Evaluating regression performance
  Predicting stock price with four regression algorithms
  Summary
  Exercise
Section 3: Python Machine Learning Best Practices
Machine Learning Best Practices
  Machine learning solution workflow
  Best practices in the data preparation stage
  Best practice 1 – completely understanding the project goal
  Best practice 2 – collecting all fields that are relevant
  Best practice 3 – maintaining the consistency of field values
  Best practice 4 – dealing with missing data
  Best practice 5 – storing large-scale data
  Best practices in the training sets generation stage
  Best practice 6 – identifying categorical features with numerical values
  Best practice 7 – deciding on whether or not to encode categorical features
  Best practice 8 – deciding on whether or not to select features, and if so, how to do so
  Best practice 9 – deciding on whether or not to reduce dimensionality, and if so, how to do so
  Best practice 10 – deciding on whether or not to rescale features
  Best practice 11 – performing feature engineering with domain expertise
  Best practice 12 – performing feature engineering without domain expertise
  Best practice 13 – documenting how each feature is generated
  Best practice 14 – extracting features from text data
  Best practices in the model training, evaluation, and selection stage
  Best practice 15 – choosing the right algorithm(s) to start with
  Naïve Bayes
  Logistic regression
  SVM
  Random forest (or decision tree)
  Neural networks
  Best practice 16 – reducing overfitting
  Best practice 17 – diagnosing overfitting and underfitting
  Best practice 18 – modeling on large-scale datasets
  Best practices in the deployment and monitoring stage
  Best practice 19 – saving, loading, and reusing models
  Best practice 20 – monitoring model performance
  Best practice 21 – updating models regularly
  Summary
  Exercises
Other Books You May Enjoy
  Leave a review - let other readers know what you think