Python Machine Learning
by Wei-Meng Lee

Table of Contents
Cover
Introduction
CHAPTER 1: Introduction to Machine Learning
What Is Machine Learning?
Getting the Tools
Summary
CHAPTER 2: Extending Python Using NumPy
What Is NumPy?
Creating NumPy Arrays
Array Indexing
Reshaping Arrays
Array Math
Array Assignment
Summary
CHAPTER 3: Manipulating Tabular Data Using Pandas
What Is Pandas?
Pandas Series
Pandas DataFrame
Summary
CHAPTER 4: Data Visualization Using matplotlib
What Is matplotlib?
Plotting Line Charts
Plotting Bar Charts
Plotting Pie Charts
Plotting Scatter Plots
Plotting Using Seaborn
Summary
CHAPTER 5: Getting Started with Scikit‐learn for Machine Learning
Introduction to Scikit‐learn
Getting Datasets
Getting Started with Scikit‐learn
Data Cleansing
Summary
CHAPTER 6: Supervised Learning—Linear Regression
Types of Linear Regression
Linear Regression
Polynomial Regression
Summary
CHAPTER 7: Supervised Learning—Classification Using Logistic Regression
What Is Logistic Regression?
Using the Breast Cancer Wisconsin (Diagnostic) Data Set
Summary
CHAPTER 8: Supervised Learning—Classification Using Support Vector Machines
What Is a Support Vector Machine?
Kernel Trick
Types of Kernels
Using SVM for Real‐Life Problems
Summary
CHAPTER 9: Supervised Learning—Classification Using K‐Nearest Neighbors (KNN)
What Is K‐Nearest Neighbors?
Summary
CHAPTER 10: Unsupervised Learning—Clustering Using K‐Means
What Is Unsupervised Learning?
Using K‐Means to Solve Real‐Life Problems
Summary
CHAPTER 11: Using Azure Machine Learning Studio
What Is Microsoft Azure Machine Learning Studio?
Summary
CHAPTER 12: Deploying Machine Learning Models
Deploying ML
Case Study
Deploying the Model
Creating the Client Application to Use the Model
Summary
Index
End User License Agreement
List of Tables
Chapter 3
Table 3.1: Common DataFrame Operations
Chapter 4
Table 4.1: Location Strings and Corresponding Location Codes
List of Illustrations
Chapter 1
Figure 1.1: In traditional programming, the data and the program produce the ou...
Figure 1.2: In machine learning, the data and the output produce the program
Figure 1.3: Using regression to predict the expected selling price of a house
Figure 1.4: Using classification to categorize data into distinct classes
Figure 1.5: Plotting the unlabeled data
Figure 1.6: Clustering the points into distinct groups
Figure 1.7: Downloading Anaconda for Python 3
Figure 1.8: The Jupyter Notebook Home page
Figure 1.9: Jupyter Notebook showing the Home page
Figure 1.10: Creating a new Python 3 notebook
Figure 1.11: The Python 3 notebook created in Jupyter Notebook
Figure 1.12: The notebook with two cells
Figure 1.13: Running (executing) the code in the cell
Figure 1.14: The number displayed next to the cell indicates the order in which...
Figure 1.15: The notebook with three cells
Figure 1.16: Executing the cells in non‐linear order
Figure 1.17: Restarting the kernel
Figure 1.18: Exporting your notebook to a Python file
Figure 1.19: The tooltip displays help information
Figure 1.20: Expanding the tooltip to show more detail
Chapter 2
Figure 2.1: Writing the index for row and column in between the numbers
Figure 2.2: Performing slicing using the new approach
Figure 2.3: Writing the negative indices for rows and columns
Figure 2.4: Slicing returns a reference to the original array and not a copy
Figure 2.5: Using array addition for vector addition
Figure 2.6: Performing matrix multiplication on two arrays
Figure 2.7: Performing cumulative sums on columns and rows
Figure 2.8: Understanding the meaning of the result of the argsort() function
Chapter 3
Figure 3.1: A Pandas Series
Figure 3.2: A Pandas DataFrame
Chapter 4
Figure 4.1: A line graph plotted using matplotlib
Figure 4.2: The line chart with the title and the labels for the x‐ and ...
Figure 4.3: The chart with the ggplot style applied to it
Figure 4.4: The chart with the grayscale style applied to it
Figure 4.5: The chart with two line graphs
Figure 4.6: The chart with a legend displayed
Figure 4.7: Plotting a bar chart
Figure 4.8: Plotting two overlapping bar charts on the same figure
Figure 4.9: The bar chart with the alphabetically arranged x‐axis
Figure 4.10: The bar chart with the correct x‐axis
Figure 4.11: Plotting a pie chart
Figure 4.12: The pie chart with two exploded slices
Figure 4.13: Displaying the pie chart with custom colors
Figure 4.14: Setting the start angle for the pie chart
Figure 4.15: Displaying the legend on the pie chart
Figure 4.16: Plotting a scatter plot
Figure 4.17: Combining multiple scatter plots into a single chart
Figure 4.18: Combining two charts into a single figure
Figure 4.19: Displaying a factorplot showing the distribution of men and women ...
Figure 4.20: A factorplot showing the survival rate of men, women, and children...
Figure 4.21: An lmplot showing the relationship between the petal length and wi...
Figure 4.22: A swarmplot showing the distribution of salaries for men and women
Chapter 5
Figure 5.1: The petal and sepal of a flower
Figure 5.2: The fields in the Iris dataset and its target
Figure 5.3: Scatter plot showing the linearly distributed data points
Figure 5.4: Scatter plot showing the three clusters of data points generated
Figure 5.5: Scatter plot showing the two clusters of data points distributed in...
Figure 5.6: Plotting the weights against heights for a group of people
Figure 5.7: Plotting the linear regression line
Figure 5.8: The linear regression line
Figure 5.9: Calculating the Residual Sum of Squares for linear regression
Figure 5.10: The formula for calculating R‐Squared
Figure 5.11: Examples of finding the Interquartile Range (IQR)
Chapter 6
Figure 6.1: Some terminologies for features and label
Figure 6.2: The DataFrame containing all of the features
Figure 6.3: The DataFrame containing all of the features and the label
Figure 6.4: Scatter plot showing the relationship between LSTAT and MEDV
Figure 6.5: Scatter plot showing the relationship between RM and MEDV
Figure 6.6: The 3D scatter plot showing the relationship between LSTAT, RM, and...
Figure 6.7: A scatter plot showing the predicted prices vs. the actual prices
Figure 6.8: The hyperplane showing the predictions for the two features...
Figure 6.9: Rotating the chart to have a better view of the hyperplane
Figure 6.10: A scatter plot of points
Figure 6.11: The regression line fitting the points
Figure 6.12: A curved line trying to fit the points
Figure 6.13: A curved line trying to fit most of the points
Figure 6.14: The line now fits the points perfectly
Figure 6.15: The straight line can't fit all of the points, so the bias is...
Figure 6.16: The curvy line fits all of the points, so the bias is low
Figure 6.17: The straight line works well with unseen data, and its result does...
Figure 6.18: The curvy line does not work well with unseen data, and its result...
Figure 6.19: You should aim for a line that has high bias and low variance
Figure 6.20: The hyperplane in the polynomial multiple regression
Figure 6.21: Rotate the chart to see the different perspectives of the hyperpla...
Chapter 7
Figure 7.1: Some problems have binary outcomes
Figure 7.2: Using linear regression to solve the voting preferences problem lea...
Figure 7.3: Logistic regression predicts the probability of an outcome, rather ...
Figure 7.4: How to calculate the odds of an event happening
Figure 7.5: The formula for the logit function
Figure 7.6: The logit curve
Figure 7.7: Flipping the logit curve into a Sigmoid curve
Figure 7.8: The formula for the Sigmoid function
Figure 7.9: The sigmoid curve plotted using matplotlib
Figure 7.10: Expressing the sigmoid function using the intercept and coefficien...
Figure 7.11: The scatter plot showing the relationships between the mean radius...
Figure 7.12: Plotting three features using a 3D map
Figure 7.13: You can interact with the 3D plot when you run the application out...
Figure 7.14: Plotting a scatter plot based on one feature
Figure 7.15: The sigmoid curve fitting to the two sets of points
Figure 7.16: Splitting the dataset into training and test sets
Figure 7.17: The confusion matrix for the prediction
Figure 7.18: Formula for calculating accuracy
Figure 7.19: Formula for calculating precision
Figure 7.20: Formula for calculating recall
Figure 7.21: The point at threshold 0.5
Figure 7.22: The value of TPR and FPR for threshold 0
Figure 7.23: The value of TPR and FPR for threshold 1
Figure 7.24: Plotting the points for threshold 0, 0.5, and 1.0.
Figure 7.25: Plotting the ROC curve and calculating the AUC
Chapter 8
Figure 8.1: Using SVM to separate two classes of animals
Figure 8.2: A set of points that can be separated using SVM
Figure 8.3: Two possible ways to split the points into two classes
Figure 8.4: SVM seeks to split the two classes with the widest margin
Figure 8.5: Support vectors are points that lie on the margins
Figure 8.6: The formula for the hyperplane and its accompanying two margins
Figure 8.7: Plotting the points using Seaborn
Figure 8.8: Relationships between the variables in the formula and the variable...
Figure 8.9: The two intercepts for the hyperplane
Figure 8.10: The hyperplane and the two margins
Figure 8.11: A scatter plot of two groups of points distributed in circular fas...
Figure 8.12: Plotting the points in the three dimensions
Figure 8.13: The various perspectives on the same dataset in 3D
Figure 8.14: The formula for the hyperplane in 3D and its corresponding variabl...
Figure 8.15: Formula for finding the hyperplane in 3D
Figure 8.16: The hyperplane in 3D cutting through the two sets of points
Figure 8.17: A kernel function transforms your data from nonlinear spaces to li...
Figure 8.18: Scatter plot of the Iris dataset's first two features
Figure 8.19: Using the SVM linear kernel
Figure 8.20: A high C focuses more on getting the points correctly classified
Figure 8.21: A low C aims for the widest margin, but may classify some points i...
Figure 8.22: Using SVM with varying values of C
Figure 8.23: The Iris dataset trained using the RBF kernel
Figure 8.24: A set of points belonging to two classes
Figure 8.25: A low Gamma value allows every point to have equal reach
Figure 8.26: A high Gamma value focuses more on points close to the boundary
Figure 8.27: The effects of classifying the points using varying values of C an...
Figure 8.28: The classification of the Iris dataset using polynomial kernel of ...
Figure 8.29: Plotting the points on a scatter plot
Figure 8.30: Separating the points into two classes
Chapter 9
Figure 9.1: The classification of a point depends on the majority of its neighb...
Figure 9.2: Plotting the points visually
Figure 9.3: The classification of the yellow point based on the different value...
Figure 9.4: Plotting out the Sepal width against the Sepal length in a scatter ...
Figure 9.5: The classification boundary based on k = 1
Figure 9.6: Understanding the concept of overfitting, underfitting, and a good ...
Figure 9.7: The effects of varying the values of k
Figure 9.8: How cross‐validation works
Figure 9.9: The chart of miscalculations for each k
Figure 9.10: The optimal value of k at 13
Chapter 10
Figure 10.1: Labeled data
Figure 10.2: Unlabeled data
Figure 10.3: A set of unlabeled data points
Figure 10.4: Clustering the points into two distinct clusters
Figure 10.5: Measuring the distance of each point with respect to each centroid...
Figure 10.6: Groupings of the points after the first round of clustering
Figure 10.7: Repositioning the centroids by taking the average of all the point...
Figure 10.8: Measuring the distance between each centroid; if the distance is 0...
Figure 10.9: The scatter plot showing all the points
Figure 10.10: The scatter plot with the points and the three random centroids
Figure 10.11: The scatter plot showing the clustering of the points as well as ...
Figure 10.12: Using the KMeans class in Scikit‐learn to do the clustering
Figure 10.13: The set of points and their positions
Figure 10.14: The chart showing the various values of K and their corresponding...
Figure 10.15: The scatter plot showing the distribution of waist circumference ...
Figure 10.16: Clustering the points into two clusters
Figure 10.17: Clustering the points into four clusters
Chapter 11
Figure 11.1: You can download the training and testing datasets from Kaggle
Figure 11.2: Examining the data in Excel
Figure 11.3: Click the “Sign up here” link for first‐time...
Figure 11.4: You can choose from the various options available to use MAML
Figure 11.5: The left panel of MAML
Figure 11.6: Uploading a dataset to the MAML
Figure 11.7: Choose a file to upload as a dataset
Figure 11.8: Creating a new blank experiment in MAML
Figure 11.9: The canvas representing your experiment
Figure 11.10: Naming your experiment
Figure 11.11: Using the dataset that you have uploaded
Figure 11.12: Dragging and dropping the dataset onto the canvas
Figure 11.13: Visualizing the content of the dataset
Figure 11.14: Viewing the dataset
Figure 11.15: Viewing the Survived column
Figure 11.16: Use the Select Columns in Dataset module to filter columns
Figure 11.17: Selecting the fields that you want to use as features
Figure 11.18: Making specific fields categorical
Figure 11.19: Removing rows that have missing values in the Age column
Figure 11.20: Viewing the cleaned and filtered dataset
Figure 11.21: Splitting the data into training and testing datasets
Figure 11.22: Training your model using the Two‐Class Logistic Regressi...
Figure 11.23: Scoring your model using the testing dataset and the trained mode...
Figure 11.24: Viewing the confusion matrix for the learning model
Figure 11.25: Using another algorithm for training the alternative model
Figure 11.26: Evaluating the performance of the two models
Figure 11.27: Viewing the metrics for the two learning algorithms
Figure 11.28: Publishing the learning model as a web service
Figure 11.29: The test page for the web service
Figure 11.30: Testing the web service with some data
Figure 11.31: The Consume link at the top of the web service page
Figure 11.32: The sample code for accessing the web service written in the thre...
Figure 11.33: Creating a new notebook in MAML
Figure 11.34: Testing the code in the Python notebook
Figure 11.35: The result returned by the web service
Chapter 12
Figure 12.1: Deploying your machine learning model as a REST API allows fro...
Figure 12.2: Matrix showing the various correlation factors
Figure 12.3: Heatmap produced by Seaborn showing the correlation factors
Figure 12.4: Ranking the performance of the various algorithms