Index
Symbols
1-gram model 280
5x2 cross-validation 219
7-Zip
URL 276
A
accuracy
versus classification error 59
accuracy 222
action-value function
about 712
greedy policy, computing from 720
action-value function estimation
with Monte Carlo (MC) 719
activation functions
Rectified linear unit (ReLU) 497, 498
reference link 499
selecting, for multilayer neural networks 491
softmax function 494
activation functions, selecting via tf.keras.activations
reference link 516
activations
computing, in RNN 603, 604, 605
AdaBoost
applying, scikit-learn used 269, 270, 271
AdaBoost recognition
about 264
Adaline
about 56
implementing, in Python 42, 43, 44, 45
Adaline implementation
converting, into algorithm for logistic regression 70, 71, 73
adaptive boosting
weak learner, leveraging via 264, 265
Adaptive Boosting (AdaBoost)
about 264
ADAptive LInear NEuron (Adaline)
Adaline, implementing in Python 42, 43, 44, 45
cost functions, minimizing with gradient descent 39, 40, 41
gradient descent, improving through feature scaling 46, 47, 48
large-scale machine learning 48, 49, 50, 52, 53
stochastic gradient descent 48, 49, 50, 52, 53
advanced RNN models 613
agent
agent-environment interface
agglomerative clustering
applying, via scikit-learn 401
agglomerative hierarchical clustering 390
AI winters
about 411
algorithms
debugging, with learning curves 209
debugging, with validation curves 209
selecting, with nested cross-validation 218, 219, 220
AlphaFold
reference link 2
Anaconda
about 15
Anaconda installer
download link 15
Anaconda quick start guide
reference link 15
artificial neural networks
complex functions, modeling 410, 411
artificial intelligence (AI) 411
about 1
artificial neural network
training 441
artificial neurons
perceptron learning rule 23, 24, 25, 26
attention mechanism 643
autoencoders
based on size of latent space 651
connection, with dimensionality reduction 650
automatic differentiation
reference link 446
average linkage 391
average-pooling 559
B
backpropagation
intuition, developing 445, 446
reference link 446
used, for training neural network 447, 448, 449, 450
bagging
applying, to classify examples in Wine dataset 258, 261, 263, 264
model ensembles 258
bag-of-words model 278
Basic Linear Algebra Subprograms (BLAS) 30
batch() method 468
batch normalization (BN)
reference link 424
Bellman equation
about 705
for dynamic programming 714
bias 79
bias problems
diagnosing, with learning curves 209, 210, 211, 212, 214
bias unit 21
bias-variance tradeoff 79
bidirectional RNN 624
Bidirectional wrapper
reference link 624
bigger data
working with 292, 293, 294, 295
binary cross-entropy 568
binomial coefficient 238
boosting
working 265, 266, 267, 268, 269
border point 402
backpropagation through time (BPTT)
used, for training RNNs 606
Breast Cancer Wisconsin dataset
reference link 199
C
candidate value 612
Cascading Style Sheets (CSS) 314
categorical cross-entropy 568
categorical data
encoding, with pandas 120
handling 119
CelebA dataset
cell state 611
centroid 377
character-level language modeling, TensorFlow
about 629
character-level RNN model, building 636, 637
dataset, preprocessing 630, 632, 633, 634, 635, 636
text passages, generating 638, 639, 640, 641, 642
classification
about 3
class labels, predicting 4
classification algorithm
selecting 55
classification error
about 95
versus accuracy 59
classification model
precision, optimizing 222, 223, 224
recall, optimizing 222, 223, 224
classification task 3
classifiers
combining, via majority vote 239
class imbalance
class labels
class membership probabilities
from decision trees 247
class probabilities
estimating, in multiclass classification 494
modeling, via logistic regression 62
cluster inertia 379
clustering
subgroups, finding with 7
clusters
about 7
grouping, in bottom-up fashion 391, 392
organizing, as hierarchical tree 390
CNN gender classifier
training 587, 588, 589, 590, 591, 593
CNN layers
configuring, in Keras 573, 574
coefficient
estimating, of regression model via scikit-learn 350, 351
coefficient of determination 358
collinearity 173
color channels
comma-separated values (CSV) file 114
complete linkage
about 391
complex functions
modeling, with artificial neural networks 410, 411
multilayer neural network architecture 414, 415, 416, 417
single-layer neural network recap 412, 413, 414
computational performance
improving, with function decorators 506, 507, 508
conditional probabilities 64
confusion matrix
continuing task
versus episodic task 709
convergence
in neural network 451
convergence of learning 38, 39
convolutional neural networks (CNNs)
about 546
constructing, in Keras 574, 576, 577, 578
fundamentals 547
gender classification, from face images 579
implementing 561
implementing, with TensorFlow Keras API 573
convolution output
size, determining of 554
correlation matrix
cost functions
minimizing, with gradient descent 39, 40, 41
cross-correlation 552
cross entropy 689
curse of dimensionality 144, 151, 619
about 111
custom Estimator
creating, from existing Keras model 542, 544
custom Keras layers
CycleGAN 698
D
data augmentation 581
data frame
Housing dataset, loading 337, 338
datasets
characteristics, visualizing 339, 341, 342
creating, from files 470, 471, 472, 473
fetching, from tensorflow_datasets library 474, 475, 476, 477, 478, 479
partitioning, into separate training and test sets 125, 126, 128
data storage
SQLite database, setting up for 307, 308, 309
data type
manipulating, of tensors 460, 461
DB Browser for SQLite app
URL 309
decision regions 60
decision tree
building 99, 100, 101, 102, 104
decision tree regression 369, 370
decoder network 650
deconvolution
versus transposed convolution 676
deep artificial neural network 415
deep convolutional GAN (DCGAN) 675
deep convolutional neural network, with TensorFlow
data, loading 572
data, preprocessing 573
implementing 571
multilayer CNN architecture 571, 572
deep learning 416
deep neural network (DNN) 410
deep Q-learning 738
deep Q-learning algorithm
implementing 741, 742, 744, 745, 746
deep Q-network (DQN) 738
dendrograms
about 390
attaching, to heat map 398, 399, 401
density-based clustering 377
density-based spatial clustering of applications with noise (DBSCAN)
regions of high density, locating 402, 403, 404, 405, 406, 408
dimensionality reduction
about 140
for data compression 8
discount factor 710
discrete convolution
performing, in 2D 555, 556, 557
discrete convolutions
in one dimension 550, 551, 552
performing 549
discriminability 173
discriminator
about 654
implementing 680, 681, 683, 684, 685
discriminator network
dissimilarity measures
between distributions 686, 687, 688, 689, 690
distance matrix
hierarchical clustering, performing on 392, 397, 398
divisive hierarchical clustering 390
document classification
logistic regression model, training for 289, 290, 292
documents
processing, into tokens 286, 287
DQN model
training 739
dropout
neural network, regularizing with 565, 566, 567, 568
dynamic programming (DP)
used, for predicting value function 716
with Bellman equation 714
E
eager execution 502
Eigendecomposition
in NumPy 155
Elastic Net 359
elbow method
about 377
used, for finding optimal number of clusters 384, 385
element-wise product 611
element-wise summation 611
embedding 619
EM distance
about 689
encoder network 650
Endianness
reference link 423
ensemble classifier
evaluating 250, 251, 252, 253, 255
tuning 250, 251, 252, 253, 255
ensemble methods 235
ensembles
building, with stacking 256
working with 235, 236, 237, 238, 239
entropy 95
environment
about 703
episode 709
episodic task
about 709
versus continuing task 709
epochs 25
error (ERR) 222
estimated value function
used, for improving policy 717
estimators 118
Estimators
about 531
using, for MNIST hand-written digit classification 540, 541
exhaustive search algorithms 140
expectation-maximization (EM) algorithm 299
experience replay 698
explanatory variable 335
exploitation 704
exploration 704
Exploratory data analysis (EDA) 339
F
F1 score 220
false negative (FN) 221
false positive (FP) 221
false positive rate (FPR) 223
feature columns
working with 531, 532, 534, 535
feature hierarchy 548
feature importance
assessing, with random forests 146, 147, 148
feature maps 548
feature normalization
about 80
features
about 9
feature scaling 380
feature selection algorithms
sequential 140, 141, 142, 143, 144, 145
feature selection methods
reference link 145
feature vectors
words, transforming into 279
feedforward 419
filter 550
Fisher LDA 167
fitted scikit-learn estimators
serializing 302, 303, 304, 305, 307
Flask
about 309
web application, developing 309
Flask web application
directory structure, setting up 313
form validation 312
macro, implementing with Jinja2 templating engine 314
rendering 312
result page, creating 316, 317
style, adding via CSS 314, 315
forget gate 612
forward propagation
about 418
used, for activating neural network 418
fractionally strided convolution 676
fully connected layers 548
functional API
used, for making model building flexible 524, 525
function decorators
used, for improving computational performance 506, 507, 508
fuzzifier 383
fuzziness coefficient 383
fuzzy clustering 382
fuzzy C-means (FCM) algorithm 382
fuzzy k-means 382
G
GAN models
loss functions, for generator and discriminator networks 655, 656, 657
training 667, 668, 671, 673, 675
training, on Google Colab 657, 658, 660
gates 611
Gaussian kernel 90
gender classification from face images, with CNN
about 579
CelebA dataset, loading 580, 581
CNN gender classifier, training 587, 588, 589, 590, 591, 593
generalized policy iteration (GPI) 718
generative adversarial networks (GANs)
about 649
applications 698
implementing 657
training dataset, defining 665, 666
generative models
about 653
generator
about 653
implementing 680, 681, 683, 684, 685
generator network
implementing 660, 661, 663, 664
Gini impurity 95
global interpreter lock (GIL) 455
Google Colab
GAN models, training on 657, 658, 660
gradient boosting 273
gradient computations
resources, keeping for 514, 515
gradient descent (GD)
improving, through feature scaling 46, 47, 48
regression, solving for regression parameters with 345, 346, 347, 348, 349
gradient descent learning algorithm
for logistic regression 74
gradient descent optimization 128
gradient penalty (GP) 691, 692
gradients
computing, with respect to non-trainable tensors 514
gradients of loss, with respect to trainable variables
graph
creating, in TensorFlow v1.x 503
migrating, to TensorFlow v2 504
graph-based clustering 408
graphics processing units (GPUs) 453
Graphviz
URL 101
greedy algorithms 140
greedy policy
computing, from action-value function 720
grid search
about 216
hyperparameters, tuning via 216, 218
machine learning models, fine-tuning via 216
grid world environment
implementing, in OpenAI Gym 727, 728, 732, 733
grid world problem
solving, with Q-learning 734
Gym environments
H
handwritten digits
classifying 420
hard clustering
versus soft clustering 382, 383, 384
heat map
dendrograms, attaching to 398, 399, 401
hidden-recurrence
versus output-recurrence 606, 607, 609
hidden structures
discovering, with unsupervised learning 7
hierarchical-based clustering 377
hierarchical clustering
about 390
performing, on distance matrix 392, 395, 397, 398
hierarchical tree
clusters, organizing as 390
high value 713
high variance 131
holdout cross-validation
about 203
holdout method
Housing dataset
exploring 337
features 337
loading, into data frame 337, 338
nonlinear relationships, modeling in 365, 367, 368
HTML basics
reference link 311
HTML parser module
reference link 285
human visual cortex 547
hyperbolic tangent
about 495
output spectrum, broadening 495, 496, 497
hyperparameters
tuning, via grid search 216, 218
I
IID (independent and identically distributed) 739
image file
reading 562
IMDb movie review data
preparing, for text processing 275, 276
impurity measure 95
independent and identically distributed (IID) 597
information gain (IG)
maximizing 94, 95, 96, 97, 98, 99
initial cluster centroids
placing, k-means++ used 381, 382
inliers 352
input gate 612
input padding
size of output feature maps, controlling 552, 553
input pipelines
building, tf.data used 464
installation and setup process, TensorFlow
reference link 459
instance-based learning
about 108
intelligent machines
building, to transform data into knowledge 1, 2
interactive problems
solving, with reinforcement learning 6
Internet Movie Database (IMDb) 275
Iris dataset
about 56
multilayer perceptron, building for flower classification 486, 487, 488, 489
reference link 32
J
Jinja2
URL 314
Jinja2 templating engine
used, for implementing macro 314
joblib
NumPy arrays, serializing 304
reference link 304
joint dataset
tensors, combining into 466, 467
Jupyter Notebook
about 658
Jupyter Notebook GUI
reference link 658
K
Keras
about 480
CNN, constructing in 574, 576, 577, 578
CNN layers, configuring in 573, 574
Keras API
implementations, simplifying of common architectures 515, 516, 517, 518
Keras layers
reference link 487
Keras model
custom Estimator, creating from 542, 544
kernel functions
about 90
using 178
kernel matrix
deriving 181
kernel methods
for linearly inseparable data 87, 88, 89
kernel principal component analysis implementation, Python
concentric circles, separating 188, 190
half-moon shapes, separating 185, 186, 187, 188
kernel principal component analysis (KPCA)
about 150
data points, projecting 191, 192, 193, 194
implementing, in Python 183, 185
in scikit-learn 195
kernel functions 178, 180, 181, 182, 183
kernel trick 178, 179, 180, 181, 182, 183
used, for nonlinear mappings 177, 178
kernels
about 550
hyperbolic tangent (sigmoid) kernel 182
polynomial kernel 182
radial basis function (RBF) or Gaussian kernel 182
kernel SVM
used, for solving nonlinear problems 87
kernel trick
about 89
used, for finding separating hyperplanes in high-dimensional space 90, 91, 92, 93
k-fold cross-validation
about 203
used, for assessing model performance 203
K-fold cross-validation
KL divergence 688
k-means
objects, grouping by similarity 376
k-means++ 381
used, for placing initial cluster centroids 381, 382
k-means clustering
with scikit-learn 377, 378, 379, 381
k-nearest neighbors (KNN) 107, 108, 109, 110, 111, 129
L
L1 regularization
about 132
sparse solutions with 135, 136, 137, 138
L2 regularization
geometric interpretation 132, 133, 134
Lancaster stemmer 287
language modeling
about 629
large-scale machine learning 48, 49, 50, 52, 53
Latent Dirichlet Allocation (LDA)
about 296
text documents, decomposing 297
with scikit-learn 297, 299, 300, 301
layer parameters, initializing via tf.keras.initializers
reference link 516
lazy learner
about 107
leaky ReLU activation function 662
learning by interaction concept 701
learning curves
algorithms, debugging with 209
bias problems, diagnosing with 209, 210, 211, 212, 214
variance problems, diagnosing with 209, 210, 211, 212, 214
least absolute shrinkage and selection operator (LASSO) 359
leave-one-out cross-validation (LOOCV) 207
lemmas 288
lemmatization 288
limited-memory Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm
URL 77
Linear Algebra Package (LAPACK) 30
Linear Algebra Review and Reference
reference link 22
linear discriminant analysis (LDA)
about 150
feature space examples, projecting 175
scatter matrices, computing 169, 170, 171
selecting, for new feature subspace 172, 173, 174
supervised data compression, performing via 167
working 168
linear least squares 345
linearly inseparable data
linearly separable classes 22
LinearRegression class, implementing in MLxtend
reference link 351
linear regression models
building 480, 481, 482, 483, 484, 485
performance, evaluating 356, 357, 358, 359
turning, into curve 361
linkage matrix 393
local receptive field 548
local storage disk
dataset, creating from files 470, 471, 472, 473
logistic cost function
logistic regression
about 63
Adaline implementation, converting into algorithm for 70, 71, 72
class probabilities, modeling via 62
for multiple classes 64
resource 81
versus SVMs 86
logistic regression model
about 66
training, for document classification 289, 290, 292
training, with scikit-learn 75, 76, 77, 78
logistic sigmoid function
about 65
logit function 64
log-likelihood function
about 68
long-range interactions
learning challenges 610
long short-term memory cells 611, 612
loss functions
for classification 568, 569, 570
for discriminator networks 655, 656, 657
for generator networks 655, 656, 657
loss functions, via tf.keras.losses
reference link 517
low-level features 548
M
machine learning
Python, using for 14
terminology 11
types 2
with pre-made Estimators 536, 537, 538, 539, 540
machine learning models
fine-tuning, via grid search 216
machine learning systems
building 11
models, evaluating 14
predictive model, selecting 13
predictive model, training 13
unseen data instances, predicting 14
macro
implementing, Jinja2 templating engine used 314
majority vote
classifiers, combining via 239
majority voting principle
about 236
using, to make predictions 246, 247, 250
manifold learning
about 196
reference link 196
margin 83
Markov decision processes (MDPs)
about 705
mathematical formulation 706, 707, 708
Markov process
visualizing 708
mathematical operations
applying, to tensors 461, 462, 463
Matplotlib
reference link 27
matrix multiplication 44
maximum margin classification
with support vector machines 82
max-pooling 559
MC control
used, for finding optimal policy 720
McCulloch-Pitts (MCP) 20
McCulloch-Pitts neuron model 411
mean imputation 116
mean-pooling 559
mean squared error (MSE) 358, 482
median absolute deviation (MAD)
about 354
medoid 377
memory cell 611
metrics
scoring, for multiclass classification 229
microframework
about 309
minibatch gradient descent 49
mini-batch learning 49
min-max scaling 129
mirrored projections 159
missing data
dealing with 113
missing values
features, eliminating 115, 116
identifying, in tabular data 114, 115
training examples, eliminating 115, 116
Mixed National Institute of Standards and Technology (MNIST) 421
MLxtend library
URL 339
MNIST dataset
obtaining 421, 422, 423, 424, 425, 427, 428
preparing 421, 422, 423, 424, 425, 427, 428
reference link 421
MNIST hand-written digit classification
Estimators, using for 540, 541
MNIST, loading with scikit-learn
reference link 428
model-based reinforcement learning 707
model ensembles
with bagging 258
model-free reinforcement learning 707
model performance
assessing, with k-fold cross-validation 203
models
features, selecting 131
implementing, based on Model class 526
model selection 204
model training
via .compile() method 485, 486
Monte Carlo (MC)
action-value function estimation 719
reinforcement learning 718
state-value function estimation 719
movie classifier
movie classifier application
uploading 328
movieclassifier code files
obtaining 320
movie dataset
preprocessing, into convenient format 276, 278
movie review classifier
embedding, into web application 296
movie review classifier, turning into web application
main application, implementing as app.py 320, 321, 323
result page template, creating 324, 325, 326
review form, setting up 323, 324
movie review dataset
download link 276
obtaining 276
multiclass classification
about 4
class probabilities, estimating in 494
multi-head attention (MHA) 646
multilayer CNN architecture 571, 572
multilayer neural network architecture 414, 415, 416, 417
multilayer neural networks
activation functions, selecting for 491
multilayer perceptron (MLP)
about 415
building, to classify flower 486, 487, 488, 489
implementing 428, 435, 437, 439, 440, 441
multinomial logistic regression 64
multiple decision trees
combining, via random forests 104, 105, 106, 107
multiple linear regression
multiprocessing
via n_jobs parameter 290
MurmurHash3 function
reference link 294
N
naïve Bayes classifier 292
National Institute of Standards and Technology (NIST) 421
natural language processing (NLP) 275
Natural Language Toolkit (NLTK)
about 303
URL 287
nested cross-validation
algorithm, selecting with 218, 219, 220
neural network
activating, via forward propagation 418, 420
implementing 452
regularizing, with dropout 565, 566, 567, 568
training, via backpropagation 447, 448, 449, 450
n-gram models 280
NLTK book
reference link 287
NN model
building, in TensorFlow 479
no free lunch theorem 55
noise point 402
nominal features
about 119
one-hot encoding, performing 122, 123, 124
nonlinearly separable case
nonlinear mappings
kernel principal component analysis, using 177, 178
nonlinear problems
solving, kernel SVM used 87
nonlinear relationships
modeling, in Housing dataset 365, 367, 368
non-overlapping pooling
versus overlapping pooling 560
nonparametric models
versus parametric models 108
normal equation
about 351
reference link 351
normalization 129
NumPy
Eigendecomposition 155
NumPy array indexing 29
NumPy arrays
serializing, with joblib 304
NumPy's savez function
reference link 427
O
objective function 39
object-oriented perceptron API
objects
grouping, by similarity 376
odds 64
off-policy TD control (Q-learning) 723
offsets 336
one-hot encoding
about 123
performing, on nominal features 122, 123, 124
online learning 49
on-policy TD control (SARSA) 722
OpenAI Gym
about 724
grid world environment, implementing 727, 728, 732, 733
grid world example 726
URL 724
operations and functions, TensorFlow
reference link 464
opinion mining 275
optimal policy
about 712
finding, with MC control 720
optimizers, via tf.keras.optimizers
reference link 517
ordinal features
about 119
ordinary least squares linear regression model
implementing 345
ordinary least squares (OLS) 345
out-of-core learning 292
output gate 612
output-recurrence
versus hidden-recurrence 606, 607, 609
output spectrum
broadening, with hyperbolic tangent 495, 496, 497
overfitting
addressing, with validation curves 214, 215
tackling, via regularization 78, 79, 80, 81
overlapping pooling
versus non-overlapping pooling 560
P
packages
for data science 16
for machine learning 16
for scientific computing 16
padding 550
pandas
categorical data, encoding with 120
reference link 26
parameter-sharing 564
parametric models
versus nonparametric models 108
Pearson product-moment correlation coefficient (Pearson's r) 342
perceptron
reference link 25
settings 62
perceptron convergence
reference link 38
perceptron hyperparameters 44
perceptron learning algorithm
implementing, in Python 26
object-oriented perceptron API, using 26, 29
perceptron model, training on Iris dataset 30, 31, 32, 33, 35, 36, 38
perceptron learning rule 23, 24, 25, 26
perceptron model
training, on Iris dataset 30, 31, 32, 33, 35, 36, 38
perceptron rule 56
performance challenges 455, 456
performance evaluation metrics
about 220
performance metrics, via tf.keras.metrics
reference link 517
pickle module
about 303
reference link 303
security risk 305
pip
reference link 15
pipelines
transformers, combining with estimators 201, 202, 203
workflows, streamlining with 198
policy
about 711
improving, with estimated value function 717
policy evaluation 716
policy iteration 717
polynomial regression 361
polynomial terms
adding, with scikit-learn 362, 364
pooling
advantages 559
pooling layers 549
pooling size 559
Porter stemmer algorithm 287
precision (PRE)
about 223
optimizing, of classification model 222, 223, 224
precision-recall curves
about 225
reference link 225
predicted class label 24
predictions
making, majority voting principle used 246, 247, 250
principal component analysis (PCA) 201
explained variance 156
feature transformation 157, 159, 160
in scikit-learn 160, 162, 163, 165, 167
total variance 156
unsupervised dimensionality reduction, performing via 150, 151
versus LDA 187
versus linear discriminant analysis 167, 168
prototype-based clustering 377
public server
web application, deploying to 327
Python
about 14
kernel principal component analysis, implementing 183
perceptron learning algorithm, implementing 26
URL 14
using, for machine learning 14
Python 3
URL 15
PythonAnywhere
URL 327
PythonAnywhere account
creating 327
Python Progress Indicator (PyPrind)
reference link 277
Q
Q-learning
grid world problem, solving 734
Q-learning algorithm
implementing 734, 735, 737, 738
quality of clustering
quantifying, via silhouette plots 386, 388, 390
quality of synthesized images
improving, convolutional and Wasserstein GAN used 675
R
radial basis function (RBF) 90, 182
random forest
used, for assessing feature importance 146, 147, 148
random forest regression
random forests
about 128
multiple decision trees, combining via 104, 105, 106, 107
used, for dealing with nonlinear relationships 368
RandomizedSearchCV class, usage
reference link 218
RANdom SAmple Consensus (RANSAC)
about 352
used, for fitting robust regression model 352, 354, 356
raw term frequencies 280
RBF KPCA
recall (REC)
about 223
optimizing, of classification model 222, 223, 224
receiver operating characteristic (ROC)
Rectified linear unit (ReLU) 497, 498
recurrent edge 601
recursive backward elimination 145
regex library 285
regions of high density
locating, via DBSCAN 402, 403, 404, 405, 406, 408
regression
continuous outcomes, predicting 4, 5
regularized methods, using 359, 360, 361
solving, for regression parameters with gradient descent 345, 346, 347, 348, 349
regression analysis 4
about 334
regression line 336
regression model
coefficient, estimating via scikit-learn 350, 351
regression parameters
regression, solving with gradient descent 345, 346, 347, 348, 349
regular expressions
about 286
reference link 286
regularization
about 80
overfitting, tackling via 78, 79, 80, 81
regularized methods
using, for regression 359
reinforcement learning
interactive problems, solving 6
theoretical foundations 705
with Monte Carlo (MC) 718
reinforcement learning algorithms
about 715
dynamic programming 715
relationships 342, 343, 344, 345
re module
reference link 286
repeat() method 469
Residual plots 357
residuals 336
resources
keeping, for multiple gradient computations 514, 515
return function 709, 710, 711, 713
reward signal
Ridge Regression 359
RL algorithm
implementing 723
recurrent neural network (RNN)
activations, computing in 603, 604, 605
for modeling sequences 600
implementing, for sequence modeling 613
looping mechanism 600, 601, 602
training, BPTT used 606
type of output, determining from 601
robust regression model
fitting, with RANSAC 352, 354, 356
RobustScaler
reference link 131
ROC area under the curve (ROC AUC) 225
S
samples
generating, with GANs 653, 654
sampling 105
scatterplot matrix 339
scikit-learn
agglomerative clustering, applying via 401
alternative implementations 86, 87
coefficient, estimating of regression model via 350, 351
for k-means clustering 377, 378, 379, 381
kernel principal component analysis 195
logistic regression model, training 75, 76, 77, 78
principal component analysis (PCA) 160, 162, 163, 165, 167
reference link 62
used, for adding polynomial terms 362, 364
used, for applying AdaBoost 269, 270, 271
scikit-learn estimator API
self-attention mechanism
parameterizing 645
sentiment analysis 275
sentiment of IMDb movie reviews prediction project
about 614
layers, embedding for sentence encoding 619, 620, 621
movie review data, preparing 614, 615, 616, 617, 618
RNN model, building for sentiment analysis task 623, 624, 625, 626, 627, 629
sepal width 94
sequence modeling
categories 599
many-to-many 600
many-to-one 599
one-to-many 599
RNNs, implementing for 613
sequences 597
sequential backward selection (SBS) 140, 150
sequential data
about 597
modeling 597
representing 598
versus time-series data 597, 598
shape
manipulating, of tensors 460, 461
shift 552
shuffle() method 468
sigmoid function
about 65
signal 550
silhouette analysis 386
silhouette coefficient 386
silhouette plots
about 377
quality of clustering, quantifying via 386, 388, 390
similarity
objects, grouping by 376
similarity function 90
simple linear regression
simple majority vote classifier
implementing 240, 241, 244, 246
simulated experience 719
single instruction, multiple data (SIMD) 30
single-layer neural network recap 413, 414
single linkage
about 391
slack variables
used, for dealing with nonlinearly separable case 84, 85
Snowball stemmer 287
soft clustering
versus hard clustering 382, 383, 384
soft k-means 382
soft-margin classification 84
softmax function
class probabilities, estimating in multiclass classification 494
softmax regression 64
sparse-connectivity 564
sparse solutions
with L1 regularization 135, 136, 137, 138
splits
reference link 486
SQLite
URL 307
sqlite3
reference link 307
SQLite database
setting up, for data storage 307, 308, 309
squared error derivative 41
squared Euclidean distance 378
stacking
used, for building ensembles 256
state transition probability 708
state-value function 712
state-value function estimation
with Monte Carlo (MC) 719
Statsmodels
reference link 351
stemming algorithms 287
stochastic gradient descent (SGD) 48, 49, 50, 52, 53, 293, 345, 414
stop-word removal 288
stop-words 288
stride 552
subgroups
finding, with clustering 7
subsampling 559
sum of squared errors (SSE) 39, 133, 345, 379, 413
supervised learning
predictions, making about future 3
support vector machine (SVM)
maximum margin classification 82
reference link 374
versus logistic regression 86
support vectors 82
Synthetic Minority Over-sampling Technique (SMOTE)
about 234
reference link 234
T
tabular data
missing values, identifying 114, 115
tanh 495
target values
determining, for computing loss 740, 741
temporal difference (TD) learning
TensorFlow
installation, troubleshooting 459
key features 501
learning 458
NN model, building 479
RNNs, implementing for sequence modeling 613
training performance 455
used, for implementing deep convolutional neural network 571
TensorFlow Dataset
creating, from existing tensors 465, 466
TensorFlow Dataset API 465
tensorflow_datasets library
datasets, fetching from 474, 475, 476, 477, 478, 479
TensorFlow Keras API
about 480
used, for implementing CNN 573
TensorFlow library
reference link 452
TensorFlow style guide
reference link 479
TensorFlow v1
input data, loading into model 505
TensorFlow v1.x
graph, creating 503
TensorFlow v2
graph, migrating to 504
input data, loading into model 505
TensorFlow Variable objects
model parameters, storing 508, 509, 511
model parameters, updating 508, 509, 511
tensors
combining, into joint dataset 466, 467
data type, manipulating of 460, 461
mathematical operations, applying to 461, 462, 463
shape, manipulating of 460, 461
TensorFlow Dataset, creating from 465, 466
term frequency-inverse document frequency (tf-idf)
word relevancy, assessing 281, 282, 284
test dataset
trained model, evaluating on 490
text classification
with recurrent neural networks 293
text data
text documents
decomposing, with LDA 297
text processing
IMDb movie review data, preparing for 275, 276
tf.data
used, for building input pipelines 464
tf.image module 471
tf.io module 471
tf.keras 480
tf.keras.regularizers
reference link 516
time-series data
versus sequential data 597, 598
tokens
documents, processing into 286, 287
topic modeling
about 296
with Latent Dirichlet Allocation 296
trained model
evaluating, on test dataset 490
reloading 490
saving 490
transformer 117
Transformer architecture 643
transposed convolution
versus deconvolution 676
true class label 24
true negative (TN) 220
true positive rate (TPR) 223
true positive (TP) 220
U
underfitting
about 78
addressing, with validation curves 214, 215
unigram model 280
unit step function 21
unsupervised classification
about 7
unsupervised dimensionality reduction
unsupervised learning
hidden structures, discovering with 7
V
validation curves
algorithms, debugging with 209
overfitting, addressing with 214, 215
underfitting, addressing with 214, 215
validation dataset 143
value function
predicting, with dynamic programming 716
value iteration 718
variable sequence lengths
dealing with 621
variance 79
variance explained ratios 156
variance problems
diagnosing, with learning curves 209, 210, 211, 212, 214
variance reduction 370
variational autoencoders (VAEs) 653
Vectorization 30
W
Ward's linkage 391
Wasserstein GAN (WGAN) 675
weak learners
leveraging, via adaptive boosting 264, 265
web application
deploying, to public server 327
developing, with Flask 309
movie review classifier, embedding into 296
WGAN-GP
about 691
implementing, to train DCGAN model 692, 693, 695, 696
Widrow-Hoff rule 38
wine cultivars
reference link 127
Wine dataset
about 125
obtaining 259
obtaining, reference link 153
reference link 126
Winograd's minimal filtering algorithm 559
within-node variance 370
word2vec model
about 295
reference link 296
word capitalization
dealing with 285
word relevancy
assessing, via term frequency-inverse document frequency 281, 282, 284
words
transforming, into feature vectors 279, 280
word stemming 287
workflows
streamlining, with pipelines 198
X
Xavier initialization 510
XOR classification problem
Z
zero-padding 550