Index

  • A
  • accuracy, computing of, 168–171
  • algorithms
    • categories of in ML, 5
    • comparing ML algorithms, 258–260
    • evaluating ML algorithms, 260–261, 277–279
    • supervised learning algorithms, 5
    • Two‐Class Decision Jungle algorithm, 258, 259, 260
    • Two‐Class Logistic Regression algorithm, 258, 259, 260
    • Two‐Class Support Vector Machine algorithm, 258
    • unsupervised learning algorithms, 5, 7
  • Anaconda, 8
  • apply() function, 57, 58, 59
  • arange() function, 20
  • area under the curve (AUC), 174
  • argsort() function, 33
  • array assignment, 34–38
  • array indexing, 22–26
  • array math, 27–34
  • arrays
    • copying by reference, 34–35
    • copying by value (deep copy), 37
    • copying by view (shallow copy), 36–37
    • creating NumPy arrays, 20–21
    • reshaping of, 26–27
    • slicing of, 23–25
  • asmatrix() function, 30
  • auc() function, 174
  • Auto MPG Data Set, 98
  • Azure Machine Learning Studio (MAML)
    • comparing against other algorithms, 258–260
    • creating experiment, 248–252
    • evaluating machine learning algorithms, 260–261
    • example using Titanic experiment, 244–246
    • filtering data and making fields categorical, 252–254
    • introduction, 243
    • programmatically accessing web service, 263–266
    • publishing experiment, 261–263
    • publishing learning model as web service, 261–262
    • removing missing data, 254
    • splitting data for training and testing, 254–256
    • testing web service, 263
    • training a model, 256–258
    • uploading dataset, 247–248
    • use of, 246–266
  • B
  • Bagging, 143
  • bar chart
    • defined, 73
    • plotting of, 73–77
  • bar() function, 73
  • bias, 141–144
  • Boolean indexing, 22–23
  • Boosting, 143, 144
  • bootstrap aggregation, 143
  • Boston dataset, 120–124, 144–146
  • Breast Cancer Wisconsin (Diagnostic) Data Set, 156–174
  • C
  • C parameter, 194–196
  • case study in machine learning (ML)
    • cleaning data, 271–273
    • evaluating algorithms, 277–279
    • examining correlation between features, 273–274
    • introduction, 270–271
    • loading data, 271
    • plotting correlation between features, 274–276
    • selecting best performing algorithm, 279
    • training and saving the model, 279–282
  • catplot() function, 87
  • Census Income Data Set, 98
  • charts
    • bar chart, 73–77
    • line chart, 68–73
    • pie chart, 77–82
  • classes
    • DataFrame() class, 45
    • KMeans class, 230, 232, 239
    • KNeighborsClassifier class, 213
    • LinearRegression class, 101–102, 131, 139, 145
    • LogisticRegression class, 162
    • MinMaxScaler class, 112
    • PolynomialFeatures class, 138
    • Series class, 41
    • SVC class, 182, 192
  • classification problems, described, 4
  • classification_report() function, 170
  • clustering. See K‐Means
  • clusters, defined, 222
  • coefficient of determination, 105
  • coefficient of multiple determination for multiple regression, 105
  • conda, 8
  • confusion matrix, 166–168, 261
  • confusion_matrix() function, 168
  • Constrained Optimization, 181
  • contourf() function, 193
  • copy() function, 37
  • corr() function, 126, 273, 274
  • correlation
    • examining correlation between features in ML case study, 273–274
    • negative correlation, 127
    • plotting correlation between features in ML case study, 274–276
    • positive correlation, 127
  • cross_val_score() function, 217
  • crosstab, 63–64
  • crosstab() function, 64, 166, 167
  • cross‐validation, 216
  • cumsum() function, 31
  • cumulative sum, 31–32
  • D
  • data
    • cleaning data in clustering using K‐Means, 237–238
    • cleaning data in ML case study, 271–273
    • data cleansing in linear regression, 125–126
    • data cleansing in Scikit‐learn, 106–117
    • filtering data and making fields categorical in MAML, 252–254
    • importing data in clustering using K‐Means, 237
    • labeled data, 221
    • loading data in ML case study, 271
    • manipulation of tabular data using Pandas, 39–65
    • removing missing data in MAML, 254
    • sorting of in Pandas DataFrame, 55–57
    • splitting data for training and testing in MAML, 254–256
    • unlabeled data, 221, 222
  • data cleansing, 107–117, 125
  • data visualization, using matplotlib, 67–91
  • DataFrame, Pandas. See Pandas DataFrame
  • DataFrame() class, 45
  • datasets
    • Boston dataset, 120–124, 144–146
    • getting datasets in Scikit‐learn, 94–100
    • Iris dataset. See Iris dataset
    • Kaggle dataset, 97, 244, 270
    • labeled dataset, 5
    • uploading of in MAML, 247–248
  • date_range() function, 42, 43
  • decorator, defined, 282
  • dependent variable, 119
  • describe() function, 48
  • dot() function, 29
  • dot product, 29–30
  • drop() function, 61, 62
  • drop_duplicates() function, 111
  • dropna() function, 109
  • dump() function, 107
  • duplicated() function, 110, 111
  • E
  • ensemble learning, 143, 144
  • euclidean_distance() function, 208
  • Evaluate Model, 260
  • explanatory variable, 120
  • explode parameter, 78–79
  • eye() function, 21
  • F
  • F1 Score, 170
  • False Negative (FN), 167, 261
  • False Positive (FP), 167, 261
  • False Positive Rate (FPR), 170, 171–172, 173
  • features
    • in case study, 273–276
    • independent variables as, 119
    • in linear regression, 126–128
    • in logistic regression, 156–174
    • in Titanic experiment with MAML, 252
  • Fisher, Ronald (biologist), 94
  • fit() function, 102, 230
  • Flask micro‐framework, 280
  • flatten() function, 27
  • full() function, 21
  • functions
    • apply() function, 57, 58, 59
    • applying of to DataFrame, 57–60
    • arange() function, 20
    • argsort() function, 33
    • asmatrix() function, 30
    • auc() function, 174
    • bar() function, 73
    • catplot() function, 87
    • classification_report() function, 170
    • confusion_matrix() function, 168
    • contourf() function, 193
    • copy() function, 37
    • corr() function, 126, 273, 274
    • cross_val_score() function, 217
    • crosstab() function, 64, 166, 167
    • cumsum() function, 31
    • date_range() function, 42, 43
    • describe() function, 48
    • dot() function, 29
    • drop() function, 61, 62
    • drop_duplicates() function, 111
    • dropna() function, 109
    • dump() function, 107
    • duplicated() function, 110, 111
    • euclidean_distance() function, 208
    • eye() function, 21
    • fit() function, 102, 230
    • flatten() function, 27
    • full() function, 21
    • get_feature_names() function, 139
    • head() function, 49
    • heatmap() function, 275
    • info() function, 125, 271
    • isnull() function, 108, 125
    • kernel function, 192
    • knn() function, 208, 209
    • legend() function, 72, 81
    • lmplot() function, 89
    • load() function, 107
    • load_boston() function, 121
    • load_breast_cancer() function, 156
    • load_dataset() function, 88
    • logit function, 153–154
    • make_blobs() function, 98
    • make_circles() function, 100, 187
    • make_regression() function, 98
    • matshow() function, 274
    • mean() function, 48, 234
    • metrics.silhouette_samples() function, 234
    • metrics.silhouette_score() function, 234
    • np.add() function, 28
    • np.concatenate() function, 189
    • np.dot() function, 31
    • np.where() function, 114
    • outliers_iqr() function, 114, 115
    • outliers_z_score() function, 116
    • pie() function, 81, 82
    • plot() function, 68, 71, 85
    • plot_surface() function, 134, 190
    • polynomial function, 120, 138, 139, 145, 149
    • predict() function, 102, 163, 231
    • predict_proba() function, 163
    • Radial Basis Function (RBF), 196–197, 277, 278–279
    • randn() function, 45
    • random() function, 21
    • ravel() function, 27, 106
    • read_csv() function, 46
    • reset_index() function, 109
    • reshape() function, 26–27, 35
    • roc_curve() function, 173
    • savefig() function, 82
    • scatter() function, 85
    • score() function, 106, 170
    • Sigmoid function, 155, 156
    • sns.get_dataset_names() function, 88
    • sort() function, 33
    • sort_index() function, 55, 56, 61
    • sort_values() function, 55, 56
    • sq() function, 57, 58
    • sq_root() function, 57, 58, 59
    • subplot() function, 85
    • sum() function, 59
    • tail() function, 49
    • title() function, 69
    • train_test_split() function, 131, 164
    • transpose() function, 54
    • view() function, 36
    • xlabel() function, 69
    • xticks() function, 76–77
    • ylabel() function, 69
    • zeros() function, 20
  • G
  • Gamma, 197–199
  • Gaussian Kernel, 196–197
  • get_feature_names() function, 139
  • The Grammar of Graphics: Statistics and Computing (Wilkinson), 70
  • H
  • harmonic mean of precision and recall, 170
  • head() function, 49
  • heatmap() function, 275
  • high bias, 143
  • high variance, 143
  • hyperplane
    • defined, 179
    • formula for in SVM, 180–181
    • plotting of, 184–185, 189–191
    • 3D hyperplane, 133–135, 136, 146–147, 189–191
  • I
  • independent variable, 119
  • info() function, 125, 271
  • Iris dataset, 94–97
  • isnull() function, 108, 125
  • J
  • Jupyter Notebook (formerly known as IPython Notebook), 8, 9–18, 67, 68, 69, 134, 160, 264, 283
  • K
  • k
    • exploring different values of, 212–215
    • finding optimal k, 218–219, 234–236
    • visualizing different values of, 209–211
  • Kaggle dataset, 97, 244, 270
  • kernel function, 192
  • kernel trick, 186–191
  • kernels, 191–200
  • k‐folds, 216
  • K‐Means
    • calculating Silhouette Coefficient, 233–234
    • cleaning data, 237–238
    • clustering using, 239–240
    • evaluating cluster size using Silhouette Coefficient, 232–236
    • finding optimal k, 234–236
    • finding optimal size classes, 240–241
    • how it works, 222–225
    • implementing of in Python, 225–230
    • importing data, 237
    • plotting scatter plot, 238
    • unsupervised learning using, 222
    • using of in Scikit‐learn, 230–232
    • using of to solve real‐life problems, 236–241
    • what is unsupervised learning? 221–226
  • KMeans class, 230, 232, 239
  • K‐Nearest Neighbors (KNN)
    • calculating distance between points, 207–208
    • cross‐validation, 216
    • described, 205–219
    • evaluation of in ML case study, 277–278
    • exploring different values of k, 212–215
    • finding optimal k, 218–219
    • implementing of in Python, 206–211
    • making predictions, 209
    • parameter‐tuning k, 217–218
    • using Scikit‐learn's KNeighborsClassifier class for, 211–219
    • visualizing different values of k, 209–211
  • KNeighborsClassifier class, 213
  • knn() function, 208, 209
  • L
  • label (dependent variable), 119
  • labeled data, 221
  • labeled dataset, 5
  • Lagrange Multipliers, 181
  • legend() function, 72, 81
  • line chart, plotting of, 68–73
  • linear kernel, 182, 192, 194, 195, 196, 199, 201, 278, 279
  • linear regression
    • data cleansing, 125–126
    • defined, 100, 120
    • feature selection, 126–128
    • formula for polynomial regression, 138
    • getting gradient and intercept of linear regression line, 103–104
    • getting intercept and coefficients, 133
    • multiple regression, 128–130
    • plotting 3D hyperplane, 133–135, 146–147
    • plotting linear regression line, 102–103
    • polynomial regression, 135–147
    • training the model, 131–132
    • types of, 119–120
    • understanding bias and variance, 141–144
    • using polynomial multiple regression on Boston dataset, 144–146
  • LinearRegression class, 101–102, 131, 139, 145
  • list data type, 19–20
  • lmplot, 88–89
  • lmplot() function, 89
  • load() function, 107
  • load_boston() function, 121
  • load_breast_cancer() function, 156
  • load_dataset() function, 88
  • logistic regression
    • computing accuracy, recall, precision, and other metrics, 168–171
    • defined, 151–153
    • evaluation of in ML case study, 277
    • examining relationship between features, 156–161
    • finding intercept and coefficient, 162
    • getting the confusion matrix, 166–168
    • logit function, 153–154
    • making predictions, 163–164
    • plotting features in 2D, 157–158
    • plotting in 3D, 158–160
    • plotting ROC and finding area under the curve (AUC), 174
    • plotting sigmoid curve, 162–163
    • Receiver Operating Characteristic (ROC) curve, 171–174
    • sigmoid curve, 154–156
    • testing the model, 166
    • training the model using all features, 164–174
    • training using one feature, 161–164
    • Two‐Class Logistic Regression algorithm, 258, 259, 260
    • understanding odds, 153
    • using Breast Cancer Wisconsin (Diagnostic) Data Set, 156–174
  • LogisticRegression class, 162
  • logit function, 153–154
  • low variance, 143
  • M
  • machine learning (ML)
    • case study
      • cleaning data, 271–273
      • evaluating algorithms, 277–279
      • examining correlation between features, 273–274
      • introduction, 270–271
      • loading data, 271
      • plotting correlation between features, 274–276
      • selecting best performing algorithm, 279
      • training and saving the model, 279–280
    • categories of algorithms in, 5
    • creating client application to use the model, 283–284
    • defined, 1, 3
    • deployment of, 269–270
    • deployment of model of
      • introduction, 280–282
      • testing model, 282–283
    • described, 3
    • disciplines of, 3
    • main goal of, 269
  • make_blobs() function, 98
  • make_circles() function, 100, 187
  • make_regression() function, 98
  • mathematics, as discipline of machine learning, 3
  • matplotlib
    • defined, 67
    • plotting bar charts
      • adding another bar to chart, 74–75
      • changing tick marks, 75–77
      • introduction, 73–74
    • plotting line charts
      • adding legend, 72–73
      • adding title and labels, 69
      • introduction, 68–69
      • plotting multiple lines in same chart, 71–72
      • styling, 69–71
    • plotting pie charts
      • displaying custom colors, 79–80
      • displaying legend, 81
      • exploding slices, 78–79
      • introduction, 77–78
      • location strings and corresponding location codes, 82
      • rotating pie charts, 80
      • saving chart, 82
    • plotting scatter plots
      • combining plots, 83–84
      • introduction, 83
      • subplots, 84–85
    • plotting using Seaborn
      • displaying categorical plots, 86–88
      • displaying lmplots, 88–89
      • displaying swarmplots, 90–91
      • introduction, 85–86
  • matrix class, 30–31
  • matrix multiplication, 30
  • matshow() function, 274
  • mean() function, 48, 234
  • meshgrid, 214
  • metrics.silhouette_samples() function, 234
  • metrics.silhouette_score() function, 234
  • Microsoft Azure Machine Learning Studio (MAML)
    • comparing against other algorithms, 258–260
    • creating experiment, 248–252
    • evaluating machine learning algorithms, 260–261
    • example using Titanic experiment, 244–246
    • filtering data and making fields categorical, 252–254
    • introduction, 243
    • programmatically accessing web service, 263–266
    • publishing experiment, 261–263
    • publishing learning model as web service, 261–262
    • removing missing data, 254
    • splitting data for training and testing, 254–256
    • testing web service, 263
    • training a model, 256–258
    • uploading dataset, 247–248
    • use of, 246–266
  • MinMaxScaler class, 112
  • misclassification error (MSE), 218
  • model, a.k.a. program, 3
  • multi‐class classification problem, 4
  • multiple linear regression, 120
  • multiple regression, 120, 128–130
  • N
  • ndarray (n‐dimensional array), 20, 31
  • negative correlation, 127
  • normalization, 112–113
  • np.add() function, 28
  • np.concatenate() function, 189
  • np.dot() function, 31
  • np.where() function, 114
  • NumPy
    • array assignment, 34–38
    • array indexing, 22–26
    • array math, 27–34
    • creating NumPy arrays, 20–21
    • described, 19–20
    • NumPy slice as reference, 25
    • reshaping arrays, 26–27
    • slicing arrays, 23–25
    • sorting in, 32–34
  • O
  • odds, understanding of, 153
  • optimal k, 218–219, 232, 234–236
  • outliers, 113–117
  • outliers_iqr() function, 114, 115
  • outliers_z_score() function, 116
  • overfitting, 143, 214–215
  • P
  • Pandas, described, 39–40
  • Pandas DataFrame
    • adding/removing rows/columns in, 60–63
    • applying functions to, 57–60
    • checking to see if result is DataFrame or Series, 55
    • common DataFrame operations, 65
    • creation of, 45–46
    • defined, 45
    • examples of, 124
    • extracting from, 49–54
    • generating crosstab, 63–64
    • selecting based on cell value, 54
    • selecting single cell in, 54
    • sorting data in, 55–57
    • specifying index in, 46–47
    • transformation of, 54–55
  • Pandas Series
    • accessing elements in, 41–42
    • creation of using specified index, 41
    • date ranges, 43–44
    • defined, 40
    • generating descriptive statistics on, 47–48
    • specifying datetime range as index of, 42–43
  • penalty parameter of the error term, 195
  • pie chart
    • defined, 77
    • plotting of, 77–82
  • pie() function, 81, 82
  • plot() function, 68, 71, 85
  • plot_surface() function, 134, 190
  • plotting
    • of bar charts, 73–77
    • of correlation between features in ML case study, 274–276
    • of hyperplane, 184–185, 189–191
    • of line charts, 68–73
    • of linear regression line, 102–103
    • of pie charts, 77–82
    • plotting features in 2D (logistic regression), 157–158
    • of ROC and finding area under the curve (AUC) (logistic regression), 174
    • of scatter plots, 83–85, 238
    • of sigmoid curve (logistic regression), 162–163
    • of 3D hyperplane (linear regression), 133–135, 146–147
    • in 3D (logistic regression), 158–160
    • using Seaborn, 85–91, 182
  • polynomial function, 120, 138, 139, 145, 149
  • polynomial kernel, 199–200
  • polynomial multiple regression, 120, 144–146
  • polynomial regression, 120, 135–147
  • PolynomialFeatures class, 138
  • positive correlation, 127
  • precision, computing of, 168–171
  • predict() function, 102, 163, 231
  • predict_proba() function, 163
  • predictions
    • making of in KNN, 209
    • making of in logistic regression, 163–164
    • making of in Scikit‐learn, 102
    • making of in SVM, 185–186
  • Q
  • quadratic regression, 138
  • R
  • Radial Basis Function (RBF), 196–197, 277, 278–279
  • randn() function, 45
  • random() function, 21
  • ravel() function, 27, 106
  • read_csv() function, 46
  • recall, computing of, 168–171
  • Receiver Operating Characteristic (ROC) curve, 171–174
  • regression. See linear regression; logistic regression; polynomial regression
  • Regularization, 143
  • reset_index() function, 109
  • reshape() function, 26–27, 35
  • Residual Sum of Squares (RSS), 104–105, 141, 143
  • REST (Representational State Transfer) API, 269–270, 280, 283
  • ROC (Receiver Operating Characteristic) curve, 171–174
  • roc_curve() function, 173
  • R‐squared method, 105, 132
  • S
  • savefig() function, 82
  • scatter() function, 85
  • scatter plot, plotting of, 83–85, 238
  • scientific computing, as discipline of machine learning, 3
  • Scikit‐learn
    • data cleansing
      • cleaning rows with NaNs, 108
      • introduction, 106–107
      • normalizing columns, 112–113
      • removing duplicate rows, 110–112
      • removing outliers, 113–117
      • removing rows, 109
      • replacing NaN with mean of column, 109
    • getting datasets
      • clustered dataset, 98–99
      • clustered dataset distributed in circular fashion, 100
      • generating your own, 98
      • introduction, 94
      • linearly distributed dataset, 98
      • using Kaggle dataset, 97
      • using Scikit‐learn dataset, 94–97
      • using UCI (University of California, Irvine) Machine Learning Repository, 97–98
    • getting started with
      • evaluating model using test dataset, 105–106
      • examining performance of model by calculating Residual Sum of Squares (RSS), 104–105
      • getting gradient and intercept of linear regression line, 103–104
      • introduction, 100–101
      • making predictions, 102
      • persisting the model, 106–107
      • plotting linear regression line, 102–103
      • using LinearRegression class for fitting model, 101–102
    • introduction to, 93–100
    • polynomial regression in, 138–141
    • use of for SVM, 181–183
    • use of KNeighborsClassifier class for KNN, 211–219
    • using K‐Means in, 230–232
  • score() function, 106, 170
  • Score Model, 256
  • Seaborn
    • defined, 85
    • plotting points using, 182
    • plotting using, 85–91
  • Series, Pandas. See Pandas Series
  • Series class, 41
  • shallow copy, 36
  • sigmoid curve, 154–156, 162–163
  • Sigmoid function, 155, 156
  • Silhouette Coefficient, 232–236
  • slope, 184
  • sns.get_dataset_names() function, 88
  • sort() function, 33
  • sort_index() function, 55, 56, 61
  • sort_values() function, 55, 56
  • sq() function, 57, 58
  • sq_root() function, 57, 58, 59
  • statistics, as discipline of machine learning, 3
  • StatLib library, 120
  • Student Performance Data Set, 98
  • subplot() function, 85
  • sum() function, 59
  • supervised learning
    • classification using K‐Nearest Neighbors (KNN)
      • calculating distance between points, 207–208
      • cross‐validation, 216
      • described, 205–219
      • exploring different values of k, 212–215
      • finding optimal k, 218–219
      • implementation of, 208–209
      • implementing KNN in Python, 206–211
      • making predictions, 209
      • parameter‐tuning k, 217–218
      • using Scikit‐learn's KNeighborsClassifier class for, 211–219
      • visualizing different values of k, 209–211
    • classification using Support Vector Machines (SVM)
      • adding third dimension, 187–188
      • C parameter, 194–196
      • formula for hyperplane, 180–181
      • Gamma, 197–199
      • introduction, 177–186
      • kernel trick, 186–191
      • making predictions, 185–186
      • maximum separability, 178–179
      • plotting 3D hyperplane, 189–191
      • plotting hyperplane and margins, 184–185
      • polynomial kernel, 199–200
      • Radial Basis Function (RBF), 196–197
      • support vectors, 179–180
      • types of kernels, 191–200
      • using Scikit‐learn for, 181–183
      • using SVM for real‐life problems, 200–203
    • linear regression
      • data cleansing, 125–126
      • defined, 120
      • feature selection, 126–128
      • formula for polynomial regression, 138
      • getting intercept and coefficients, 133
      • multiple regression, 128–130
      • plotting 3D hyperplane, 133–135, 146–147
      • polynomial regression, 135–147
      • polynomial regression in Scikit‐learn, 138–141
      • training the model, 131–132
      • types of, 119–120
      • understanding bias and variance, 141–144
      • using Boston dataset, 120–124
      • using polynomial multiple regression on Boston dataset, 144–146
    • logistic regression
      • defined, 151–153
      • examining relationship between features, 156–161
      • finding intercept and coefficient, 162
      • getting the confusion matrix, 166–168
      • logit function, 153–154
      • making predictions, 163–164
      • plotting features in 2D, 157–158
      • plotting in 3D, 158–160
      • plotting ROC and finding area under the curve (AUC), 174
      • plotting sigmoid curve, 162–163
      • Receiver Operating Characteristic (ROC) curve, 171–174
      • sigmoid curve, 154–156
      • testing the model, 166
      • training the model using all features, 164–174
      • training using one feature, 161–164
      • understanding odds, 153
      • using Breast Cancer Wisconsin (Diagnostic) Data Set, 156–174
    • supervised learning algorithms, 5–6
  • Support Vector Classification (SVC), 183
  • Support Vector Machines (SVM)
    • adding third dimension, 187–188
    • C parameter, 194–196
    • formula for hyperplane, 180–181
    • Gamma, 197–199
    • introduction, 177–186
    • kernel trick, 186–191
    • making predictions, 185–186
    • maximum separability, 178–179
    • plotting 3D hyperplane, 189–191
    • plotting hyperplane and margins, 184–185
    • polynomial kernel, 199–200
    • Radial Basis Function (RBF), 196–197, 277, 278–279
    • support vectors, 179–180
    • types of kernels, 191–200
    • use of for real‐life problems, 200–203
    • using Scikit‐learn for, 181–183
  • support vectors, 179–180
  • SVC class, 182, 192
  • swarmplots, 90–91
  • T
  • tabular data, manipulation of using Pandas, 39–65
  • tail() function, 49
  • targets, 120
  • 3D hyperplane, 133–135, 136, 146–147, 189–191
  • threshold, 152, 163
  • Titanic, use of as experiment, 244–246
  • title() function, 69
  • traditional programming, described, 2
  • train_test_split() function, 131, 164
  • transpose() function, 54
  • True Negative (TN), 167, 261
  • True Positive Rate (TPR), 168, 171–172, 173
  • True Positive (TP), 167, 260
  • Tukey Fences, 113–115
  • two‐class classification problem, 4
  • Two‐Class Decision Jungle algorithm, 258, 259, 260
  • Two‐Class Logistic Regression algorithm, 258, 259, 260
  • Two‐Class Support Vector Machine algorithm, 258
  • U
  • UCI Machine Learning Repository, 97–98
  • underfitting, 143, 214–215
  • unlabeled data, 221, 222
  • unsupervised learning
    • clustering using K‐Means
      • calculating Silhouette Coefficient, 233–234
      • cleaning data, 237–238
      • clustering using K‐Means, 239–240
      • evaluating cluster size using Silhouette Coefficient, 232–236
      • finding optimal k, 234–236
      • finding optimal size classes, 240–241
      • how it works, 222–225
      • implementing K‐Means in Python, 225–230
      • importing data, 237
      • plotting scatter plot, 238
      • unsupervised learning using K‐Means, 222
      • using K‐Means in Scikit‐learn, 230–232
      • using K‐Means to solve real‐life problems, 236–241
      • what is unsupervised learning? 221–226
    • unsupervised learning algorithms, 5, 7
  • V
  • variables
    • dependent variable, 119
    • explanatory variable, 120
    • independent variable, 119
  • variance, 141–144
  • view() function, 36
  • W
  • Wilkinson, Leland (author)
    • The Grammar of Graphics: Statistics and Computing, 70
  • X
  • xlabel() function, 69
  • xticks() function, 76–77
  • Y
  • y‐intercept, 184
  • ylabel() function, 69
  • Z
  • zeros() function, 20
  • Z‐score, 116–117