Symbols
A
action-value 473
activation function 256
activation functions
linear 268
ReLU 268
sigmoid 268
softmax 268
tanh 268
ad click-through
predicting, with logistic regression 165, 166
ad click-through prediction 110
with decision tree 134, 136, 137, 138, 139, 140
adjusted R² 245
AI-based assistance 4
AI plus human intelligence 4
AlphaGo 3
Anaconda 38
reference link 37
Apache Hadoop
URL 355
Arcene Dataset 97
area under the curve (AUC) 68
Artificial General Intelligence (AGI) 452
Artificial Intelligence (AI) 8
artificial masterpieces, Google Arts & Culture
reference link 261
artificial neural networks (ANNs) 11, 254
association 315
attributes 315
automation
versus machine learning 5
averaging 32
B
Backpropagation Through Time (BPTT) 420
bag of words (BoW) 362
Bag of Words (BoW) model 301
basic linear algebra
reference link 8
Bayes 48
Bayes' theorem
Bellman optimality equation
reference link 460
bias-variance trade-off 17, 18
Bidirectional Encoder Representations from Transformers (BERT) 448
bigrams 289
binarization 360
binning 31
Blackjack environment
reference link 469
bootstrap aggregating 140
bootstrapping 32
Box-Cox transformation 31
C
C4.5 116
categorical features
converting, to numerical features 148, 150, 151
categorical variables
feature engineering, with Spark 203
categories 44
chain rule 259
Chebyshev distance 316
Chi-squared Automatic Interaction Detector (CHAID) 116
classes 44
binary classification 45
multiclass classification 46, 47
multi-label classification 47, 48
Classification and Regression Tree (CART) 116
classification performance
click-through rate (CTR) 110
clothing Fashion-MNIST
reference link 388
clothing image classifier
improving, with data augmentation 406, 407, 408, 409
clothing image dataset 388, 389, 391
clothing images, classifying with CNNs 392
CNN model, architecting 392, 393, 394
CNN model, fitting 395, 396, 397, 398
convolutional filters, visualizing 398, 399, 400
clustering 315
CNN 382
architecting, for classification 387, 388
convolutional layer 382, 383, 384
nonlinear layer 384
CNN classifier
boosting, with data augmentation 400
color restoration 261
computation graphs 40
computer vision 260
conda 37
confusion matrix 66
Continuous Bag of Words (CBOW) 363
convex function 154
reference link 155
convolutional layer 382, 383, 384
cost function 9, 155, 157, 158
Cross-Industry Standard Process for Data Mining (CRISP-DM) 25
business understanding 26
data preparation 26
data understanding 26
deployment phase 26
evaluation phase 26
modeling phase 26
URL 25
cross-validation
used, for avoiding overfitting 19, 20, 21
used, for tuning models 70, 72, 73
cumulative rewards 455
D
data
acquiring 222, 223, 224, 225, 226
classifying, with logistic regression 151
data augmentation
clothing image classifier, improving 406, 407, 408, 409
CNN classifier, boosting 400
DataFrames 185
data preparation stage
best practices 349, 350, 351, 352, 353, 354, 355
data preprocessing 355
data technology (DT) 6
decision hyperplane 78
decision tree
ad click-through prediction 134, 136, 137, 138, 139, 140
ensembling 140, 142, 143, 144, 145
implementing 124, 125, 127, 128, 129, 131, 132
implementing, with scikit-learn 133, 134
decision tree module
reference link 133
decision tree regression
estimating with 234
implementing 237, 238, 240, 241
decoder 446
deep learning 11
deep learning (DL) 254
deep neural networks 30
deployment and monitoring stage
best practices 374, 375, 376, 377, 378
dimensionality reduction 25, 307, 308
used, for avoiding overfitting 24
discretization 361
distributed computing 294
document frequency 334
Dorothea Dataset 97
dot product 382
Dow Jones Industrial Average (DJIA) 217
downsampling layer 385
dynamic programming
FrozenLake environment, solving 457
E
edges 255
Elbow method 331
encoder 446
environment 453
episode 457
epsilon-greedy policy 482
Euclidean distance 316
evidence 52
exploitation 482
exploration 482
exploration phase 26
F
f1 score 66
face image dataset
face images
classifying, with SVMs 98
feature 24
feature-based bagging 141
feature crossing. See also feature interaction
feature engineering 30, 204, 218, 219, 220, 221, 355
on categorical variables, with Spark 203
feature hashing. See also hashing trick
feature interaction 207, 209, 210
feature map 382
feature projection 25
features
generating 222, 223, 224, 225, 226
feature selection 170
L1 regularization, examining for 170, 171
used, for avoiding overfitting 24
feedforward neural network 256
fetal state classification
on cardiotocography 104, 105, 106
forget gate 422
FrozenLake
solving, with policy iteration algorithm 464, 465, 466, 467, 468
solving, with value iteration algorithm 460, 461, 462, 463, 464
FrozenLake environment
solving, with dynamic programming 457
fundamental analysis 214
G
Gated Recurrent Unit (GRU) 420
Gaussian kernel 93
Generative Pre-trained Transformer (GPT) 448
genetic algorithms (GA) 11
Gensim
URL 286
Georgetown-IBM experiment
reference link 283
Gini Impurity 117, 118, 119, 120
Google Cloud Storage
reference link 355
Google Neural Machine Translation (GNMT) 261
gradient boosted trees (GBT) 142, 144, 145
gradient boosting machines 142
gradient descent 158
ad click-through, predicting with logistic regression 165, 166
logistic regression model, training 158, 159, 160, 161, 163, 164
gradients 41
Graphics Processing Units (GPUs) 11
Graphviz
URL 133
GraphX 185
H
Hadoop Distributed File System (HDFS) 192
handwritten digit recognition 46
handwritten digits MNIST dataset
reference link 388
harmonic mean 66
hashing categorical
hashing collision 205
hashing trick 204
Heterogeneity Activity Recognition Dataset 97
HIGGS Dataset 97
high-order polynomial function 22
high variance 15
holdout method 21
horizontal flipping
for data augmentation 400, 401, 402, 403
hyperplane 76
I
image-based search engines 261
image classification performance
boosting, with PCA 103, 104
ImageDataGenerator module
reference link 400
image recognition 261
IMDb
URL 423
imputing 27
Information Gain 120, 121, 122
inner cross-validation 21
input gate 422
interaction 30
intercept 154
Internet of Things (IoT) 6
interquartile range 29
Iterative Dichotomiser 3 (ID3) 116
K
k
value, selecting 331, 332, 333
Kaggle
URL 8
k equal-sized folds 20
Keras
URL 266
kernel coefficient 93
kernel function 93
kernels
linearly non-separable problems, solving 91, 92, 93, 94, 96
k-fold cross-validation 20
k-means
implementing 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329
implementing, with scikit-learn 329, 330, 331
used, for clustering newsgroups data 316, 333, 334, 335, 336, 337
k-means clustering
k-nearest neighbors (KNN) 359
L
L1 regularization 169
examining, for feature selection 170, 171
L2 regularization 169
labeled data 315
Labeled Faces in the Wild (LFW) people dataset
reference link 98
label encoding 28
labels 44
Laplace smoothing 54
Lasso 169
latent Dirichlet allocation (LDA)
using, for topic modeling 342, 343, 344, 345
layer 255
layers
adding, to neural network 260
leaf 112
Leaky ReLU 268
learning_curve module
reference link 373
learning rate 158
Leave-One-Out-Cross-Validation (LOOCV) 20
liblinear
reference link 80
libsvm
reference link 80
likelihood 52
linear function 268
linear kernel 96
linearly non-separable problems
solving, with kernels 91, 92, 93, 94, 96
linear regression
estimating with 226
example 216
implementing 228, 229, 230, 231, 232
implementing, with scikit-learn 232
implementing, with TensorFlow 233, 234
LinearSVC
reference link 102
logarithmic loss 158
logic gate
reference link 421
logistic function 152, 153, 256
logistic regression 153, 154, 368
ad click-through, predicting 165, 166
data, classifying 151
implementing, with TensorFlow 178, 180
logistic regression model
training, with gradient descent 158, 159, 160, 161, 163, 164
training, with regularization 169, 170
training, with stochastic gradient descent 166, 168, 169
log loss 158
London FTSE-100
reference link 218
Long Short-Term Memory
long-term dependencies, overcoming 420, 421
Long Short-Term Memory (LSTM) 420
loss function 9
low bias 14
LSTM recurrent cell
forget gate 422
input gate 422
memory unit 422
output gate 422
M
machine 363
machine learning 2
core 13
prerequisites 7
reinforcement learning 9
supervised learning 9
types 8
unsupervised learning 9
versus automation 5
versus traditional programming 5
machine learning algorithms
machine learning library (MLlib) 185
machine learning regression
problems 216
machine learning solution
machine vision 261
Manhattan distance 316
many-to-many (synced) RNNs 416, 417
many-to-many (unsynced) RNNs 417, 418
margin 78
massive click logs
data, caching 196
learning, with Spark 192
Massive Open Online Courses (MOOCs) 8
Matplotlib 40
matplotlib package
reference link 299
maximum-margin 79
mean absolute error (MAE) 245
mean squared error (MSE) 18, 154, 227, 258
memory unit 422
Miniconda 37
reference link 37
missing data imputation 351
missing values
dealing with 27
MNIST (Modified National Institute of Standards and Technology) 46
model-free approach 468
models
combining 31
tuning, with cross-validation 70, 72, 73
model training, evaluation, and selection stage
best practices 367, 369, 370, 371, 372, 373, 374
Monte Carlo learning
performing 468
Monte Carlo policy evaluation
Moore's law 12
MovieLens
URL 60
movie rating dataset
reference link 60
movie recommender
building, with Naïve Bayes 60, 62, 63, 64, 65
movie review sentiment, analyzing with RNNs 423
data analysis 423, 424, 425, 426
data preprocessing 423, 424, 425, 426
multiple LSTM layers, stacking 429, 430, 431
simple LSTM network, building 426, 428
multiclass classification 46, 47, 268
multi-head attention 447
multi-label classification 47, 48
multi-layer perceptron (MLP) 265
multinomial classification 46
multinomial logistic regression 175
multiple classes
dealing with 85, 87, 88, 89, 90, 91
N
Naïve 48
Naïve Bayes
implementing, with scikit-learn 59
movie recommender, building 60, 62, 63, 64, 65
named entities 285
named entity recognition (NER) 285
NASDAQ Composite
reference link 218
natural language 282
natural language processing (NLP) 261, 282, 283
history 283
Natural Language Toolkit (NLTK) 285
negative hyperplane 78
NER 292
nested cross-validation 21
neural machine translation system, Facebook
reference link 283
neural networks 370
building 262
demystifying 254
fine-tuning 273, 274, 275, 276, 277, 278, 279
implementing 262, 263, 264, 265
implementing, with scikit-learn 265
implementing, with TensorFlow 266, 267
layers, adding 260
overfitting, preventing 269
stock prices, predicting 271
newsgroups
underlying topics, discovering 337
newsgroups data
clustering, with k-means 316, 333, 334, 335, 336, 337
visualizing, with t-SNE 307
n-grams 289
NLP libraries
nltk
URL 286
NLTK 40
node 112
nodes 255
no free lunch theorem
reference link 8
non-convex function 154
reference link 155
non-exhaustive scheme 20
nonlinear layer 384
non-negative matrix factorization (NMF) 308
used, for topic modeling 338, 339, 340, 341
numerical features
categorical features, converting to 148, 150, 151
NumPy 39
URL 38
O
one-hot encoding categorical features 196, 198, 199, 200
one-to-many RNNs 416
online learning
large datasets, training 172, 174, 175
on-policy approach 473
on-policy Monte Carlo control
performing 473, 474, 475, 476, 477
ontology 284
OpenAI
URL 452
OpenAI Gym
URL 452
optimal hyperplane
ordinal feature 111
outer cross-validation 21
outliers
output gate 422
overfitting
avoiding, with cross-validation 19, 20, 21
avoiding, with dimensionality reduction 24
avoiding, with feature selection 24
avoiding, with regularization 22, 24
preventing, in neural networks 269
P
pandas library 39
part-of-speech (PoS) tagging 291, 412
pickle
plot_learning_curve function
reference link 373
policy 456
policy evaluation step 456
policy iteration algorithm
FrozenLake, solving 464, 465, 466, 467, 468
polynomial transformation 30, 361
positive hyperplane 78
posterior 52
Power transforms 30
precision 66
principal component analysis (PCA) 308, 358
image classification performance, boosting 103, 104
reference link 103
prior 52
probability 101
reference link 8
Project Gutenberg
URL 432
projection 315
PySpark 40
programming 189, 190, 191, 192
Python 36
setting up 37
Python Imaging Library (PIL) 99
Python packages
installing 38
PyTorch 40
references 451
Q
Q-learning algorithm
developing 482, 483, 484, 485, 486
Taxi problem, solving 477
qualitative features 111
quantitative features 112
Q-value 473
R
R² 245
radial basis function (RBF) kernel 93
random access memory (RAM) 185
random forest
using, for feature selection 180, 181
RBF kernel 96
recall 66
receiver operating characteristic (ROC) 68
receptive fields 384
Rectified Linear Unit (ReLU) 256, 268
recurrent neural networks (RNNs) 412
many-to-many (synced) RNNs 416, 417
many-to-many (unsynced) RNNs 417, 418
one-to-many RNNs 416
regression algorithms
stock prices, predicting 246, 247, 248, 249, 250
regression forest
implementing 242
regression performance
regression trees 234, 235, 236, 237
regularization
used, for avoiding overfitting 22, 24
used, for training logistic regression model 169, 170
reinforcement learning
approaches 456
deterministic 456
policy-based approach 456
stochastic 456
value-based approach 456
reinforcement learning, elements
action 454
agent 454
environment 453
rewards 454
states 454
ReLU function 258
Resilient Distributed Datasets (RDD) 189
reference link 189
returns 455
ridge 169
RNN architecture
learning 412
RNN model
RNN text generator
training 438, 439, 440, 441, 444
root 112
root mean squared error (RMSE) 245
rotation
for data augmentation 404
Russell 2000 (RUT) index
reference link 218
S
S3, Amazon Web Services
reference link 355
scaling 29
scikit-learn
decision tree, implementing 133, 134
k-means, implementing with 329, 330, 331
linear regression, implementing 232
Naïve Bayes, implementing 59
neural networks, implementing 265
URL 38
scikit-learn library 40
SciPy 39
Seaborn 40
seaborn package
reference link 299
self-attention 446
semantics 294
semi-supervised learning 10
separating boundary
finding, with SVM 76
separating hyperplane
identifying 77
sequence 412
sequence modeling 412
sequential learning 412
shifting
for data augmentation 405
sigmoid function 152, 256, 268
similarity querying 294
SimpleImputer class
reference link 351
single-layer neural network 254
skip-gram 363
softmax function 268
softmax regression 175
S&P 500 index
reference link 218
spaCy
URL 286
Spark
download link 186
fundamentals 184
massive click logs, learning with 192
used, for feature engineering on categorical variables 203
Spark, cluster mode approaches
Apache Hadoop YARN 188
Apache Mesos 188
Kubernetes 188
standalone cluster mode 188
Spark, components 184
GraphX 185
MLlib 185
Spark Core 185
Spark SQL 185
Spark Streaming 185
Spark Core 185
Spark, documentation and tutorials
reference link 185
Spark programs
deploying 187
launching 187
Spark SQL 185
Spark Streaming 185
stacking 36
statistical learning 11
steepest descent 158
step size 158
stochastic gradient descent
used, for training logistic regression model 166, 168, 169
stochastic gradient descent (SGD) 232
stock index 217
stock market 214
stock price data
stock prices 214
predicting, with neural networks 271
predicting, with regression algorithms 246, 247, 248, 249, 250
stop words
Storage, in Microsoft Azure
reference link 355
sum of squared errors (SSE) 332
sum of within-cluster distances 332
supervised learning 9
support vector machine (SVM) 48, 242
support vector regression
SVM 370
face images, classifying 98
separating boundary, finding 76
SVM-based image classifier
SVR
implementing 244
T
targets 315
target variables 315
Taxi environment
reference link 477
simulating 477, 478, 479, 480, 481, 482
Taxi problem
solving, with Q-learning algorithm 477
Tay
reference link 284
t-distributed Stochastic Neighbor Embedding (t-SNE)
for dimensionality reduction 308, 309, 310, 311
newsgroups data, visualizing 307
technical analysis 214
TensorFlow 40
linear regression, implementing 233, 234
logistic regression, implementing 178, 180
neural networks, implementing 266, 267
URL 38
TensorFlow 2 40
term frequency-inverse document frequency (tf-idf) 335, 362
terminal node 112
testing samples 13
testing sets 13
TextBlob 285
URL 287
text data, features 301
inflectional and derivational forms of words, reducing 305, 306, 307
occurrence, counting of word token 301, 302, 303, 304
text preprocessing 304
text datasets, NLTK
reference link 287
text preprocessing 304
tokens 289
topic 342
topic model 337
with latent Dirichlet allocation (LDA) 342, 343, 344, 345
with non-negative matrix factorization (NMF) 338, 339, 340, 341
Torch
URL 450
traditional programming
versus machine learning 5
training samples 13
training sets 13
training sets generation stage
best practices 355, 356, 357, 358, 359, 360, 361, 362, 363, 364
Transformer model 444
transition matrix 460
true positive rate 66
Turing test
reference link 283
U
underfitting 16
unigrams 289
units 255
unsupervised learning 308
association 315
clustering 315
projection 315
types 315
URL Reputation Dataset 97
V
validation samples 13
validation sets 13
value iteration algorithm 460
FrozenLake, solving 460, 461, 462, 463, 464
vanishing gradient problem 420
variance 17
voting 32
W
War and Peace, writing with RNNs 431
RNN text generator, building 436, 437, 438
RNN text generator, training 438, 439, 440, 442, 443, 444
training data, acquiring 432, 433
training data, analyzing 432, 433
training set, constructing for RNN text generator 433, 435, 436
weak learners 34
weights 153
word embedding
with pre-trained models 364, 365, 366, 367
word token
occurrence, counting 301, 302, 303, 304
word_tokenize function 290
word vectorization 294
working environment
setting up 450
X
XGBoost package
reference link 144
XOR gate
reference link 96
Y
Yet Another Resource Negotiator (YARN) 188
YouTube Multiview Video Games Dataset 97