Index


  • Abbeel, P., 223
  • Abel, E., 66
  • Accounting automation avenues and investment management, 265
  • Accuracy, data, 7–8
    • issues, 10–11
  • Actions in holistic workflow framework, 74–78
    • production data stage, 77–78
    • raw data stage, 74–76
      • creating metadata, 75–76
      • data ingestion, 75
    • refined data stage, 76–77
  • Adam optimizer, 222
  • Aggregate function, 85, 86, 87
  • Aggregation, 78
  • Ahmed, F., 225
  • AI-based self-driving car,
    • about the model, 283, 285
    • introduction, 275–277
      • algorithm used, 279–280
      • environment overview, 277–279
    • preprocessing the image/frame, 285–286
    • real-time lane detection and obstacle avoidance, 283
    • self-driving car simulation, 281
  • Alexa, 238
  • Altair Monarch, 60, 61f
  • Altman, R.B., 161
  • Alto, 308
  • Amazon, 4
  • Amazon Web Services, 99
  • Analogue-to-digital conversion, 199
  • Analytical input, 201–204
  • Analytics,
    • big data. see Big data analytics in real time
    • and business intelligence in optimization, role, 44–45
    • data science, 189
    • defined, 189
    • descriptive, predictive, diagnostic, and prescriptive, 100
    • express, using data wrangling process, 106
    • self-service, 50
  • AnoGAN, 227
  • Anomaly detection algorithm, 227, 244
  • Antilock brakes in automobiles, 4
  • Anzo, 60, 61, 62f
  • Apache Marvin AI, 248
  • Architecture of data wrangling, 56–59
  • Arjovsky, M., 221, 225
  • Array, data structure in R, 125, 136–138
  • array() function, 136
  • Artés-Rodríguez, A., 55, 67
  • Art-GAN, 227
  • Artificial control and effective fiduciaries, 264–265
  • Artificial intelligence (AI),
    • application of, 243
    • evolution, 235
    • type, 235
  • Artificial intelligence in accounting and finance,
    • applications of, 256–257
      • in consumer finance, 257
      • in corporate finance, 257–258
      • in personal finance, 257
    • benefits and advantages of, 258–259
      • accounting automation avenues and investment management, 265
      • active insights help drive better decisions, 261–262
      • AI machines make accounting tasks easier, 260–261
      • artificial control and effective fiduciaries, 264–265
      • build trust through better financial protection and control, 261
      • changing the human mindset, 259
      • consider the “Runaway Effect,” 264
      • fighting misrepresentation, 260
      • fraud protection, auditing, and compliance, 262–263
      • intelligent investments, 264
      • invisible accounting, 261
      • machines as financial guardians, 263
      • machines imitate the human brain, 260
    • challenges of, 265–267
      • cyber and data privacy, 267
      • data quality and management, 267
      • institutional issues, 270
      • legal risks, liability, and culture transformation, 267–268
      • limits of machine learning and AI, 269
      • practical challenges, 268
      • roles and skills, 269–270
    • changing the human mindset, 258–259
    • future scope of study, 272
    • introduction, 252–254
    • suggestions and recommendation, 271
    • uses of,
      • AI-driven chatbots, 255–256
      • audits, 255
      • monthly, quarterly cash flows, and expense management, 255
      • pay and receive processing, 254
      • supplier onboarding and procurement, 255
  • Artificial neural network (ANN), 276
  • Artwork, 227
  • Arús-Pous, J., 227
  • Ashok Leyland, 292
  • Association, unsupervised learning for, 237
  • Attacks, type, 37
  • Audits, 255
  • Authentication, data, 35
  • Auto-encoders, 150, 176–178
  • Automotive industry,
    • China, 301
    • European Union, 301
    • Indian; see also Suppliers network on SCM of Indian auto industry
      • COVID-19 on automotive sector, 301–305
      • global, 298, 300
      • prior pandemic, 294–296
    • Japan, 301
    • United States, 300–301
  • Auxiliary data, 57
  • AVERAGEIF(S) function, 28
  • AWS, 22
  • Backup, data, 35
  • Bar graph, 87, 88–89
  • Barrejón, D., 55, 67
  • Bartenhagen, C., 150
  • Batch normalization, concept of, 221
  • Bengio, Y., 214
  • Berret, C., 67
  • Bessel kernel, 165
  • Between-class scatter matrix, 163
  • Bhatt, P., 293
  • Big data, 17, 45
    • challenges of, 113
    • cost-effective manipulations of, 54
    • processing, 99
    • 4 V’s of, 2
  • Big data analytics in real time,
    • applications in commercial surroundings, 196–207
      • IoT and data science, 197–204
      • predictive analysis for corporate enterprise, 204–207
    • aspiration for meaningful analysis, 193–196
    • design, structure, and techniques, 191–192
    • fundamental infrastructure of, 192
    • information management to valuation offerings, transition from, 195–196
    • from information to guidance, 194–195
    • insights’ constraints, 207–209
      • data, fragmented and imprecise, 208
      • extensibility, 208
      • implementation in real time scenarios, 208–209
      • representation of data, 207–208
      • technological developments, 207
    • IoT and, 190–191
    • overview, 188–190
    • visualization tools, 193–196
  • Binning method, 103
  • Biometric authentication, 246
  • Bixby, 238
  • Bjerrum, E.J., 227
  • Blind Source Separation (BSS), 171
  • BMW, 292
  • Bors, C., 54–55
  • Boston Consulting Group, 291
  • Bottou, L., 221, 225
  • Braun, M.T., 54
  • Breaching, data. see Data breaching
  • #BreakTheChain, 294
  • Bridgewater Associates, 264
  • Brzozowski, M., 224
  • Buono, P., 54, 81
  • Business insights, 32
  • Business intelligence (BI) programs, 190
  • Business intelligence,
    • analytics, 11
    • benefits of, 195
    • data wrangling-based, 190
    • effectiveness of, 191
    • in optimization, role, 44–45
    • possibilities of, 192
    • real-time, 193
    • tools, 191
  • Cab booking, apps for, 238, 240f
  • Caffe, 247
  • Canny edge extraction, 276
  • Capacity planning, 36
  • Carreras, C., 55
  • Ceusters, W., 67
  • c() function, 127–128
  • CGANs (conditional GANs), 218–219
  • Character type of atomic vector, 126
  • Chatbots, 252, 255–256, 257, 258, 260
  • Chen, H., 227
  • Chen, X., 223
  • Cheung, V., 225
  • China, COVID-19 on automotive sector, 301
  • Chintala, S., 220, 221, 225, 226
  • CIFAR-10 dataset, 221, 225
  • City operations map visualizations, Uber’s, 46–47
  • Civili, C., 66
  • class() function, 127–128
  • Classification algorithms, 243, 244f
  • Classifiers, used, 179
  • Classroom, 31–32
  • Cleaning data, 2, 15, 58, 79, 92, 95, 100, 111, 200–201
  • Cloud DBA, 22
  • Clustering, unsupervised learning for, 237
  • Clustering algorithms, 245
  • Clustering method, 103, 149
  • Clustering technique, 276–277
  • Cohan, A., 66
  • Colon operator, vectors using, 126
  • Column(s),
    • addition of, 144–145
    • in dataset, changing order of, 82, 83f
    • orthonormal matrices, 175
    • in relational database, 6, 7
  • Complex type of atomic vector, 126
  • Compound annual growth rate (CAGR), 290, 295, 306
  • Computational modeling, 205
  • Computerized reasoning, 253
  • CONCATENATE function, 28
  • Conditional GANs (cGANs), 218–219
  • Conditional-LSTM GAN, 227
  • Confirmatory factor analysis, 175
  • Conformal Isomap (C-Isomap), 173
  • Consolidating data, 100
  • Core profiling, types, 79–80
    • individual values profiling, 80
    • set-based profiling, 80
  • Courville, A.C., 214, 225
  • Covariance matrix of data, 158, 159, 161, 167, 176
  • COVID-19 pandemic, 290, 291, 292, 293, 300
    • on automotive sector, 300
    • effect on Indian automobile industry, 301–305
    • global automobile industry, 298, 300–301
    • MSIL during, 296–297
    • post COVID-19 recovery, automobile industry scenario, 306
    • thump on automobile sector, 294–296
    • worldwide economic impact of epidemic, 298, 299t
  • Cross-validation folds, data preparation within, 104
  • CSV file, data in, 5
  • CSVKit, 17, 110, 115, 120
  • Customer connection management software, 206
  • Custom metadata creation, defined, 6
  • Cyber and data privacy, 267
  • Cybercriminals, 37, 38, 40
  • CycleGANs, 218
  • Dashboarding, 11
  • Data,
    • defined, 2
    • design and preparation, 9
    • direct value from, 3, 4
    • documentation & reproducibility, 111, 114
    • extracting insights from, 100
    • filtering/scrubbing, 17
    • fragmented and imprecise, 208
    • indirect value, 3
    • input, 5–6
    • learnings from, 48
    • merging & linking of, 111
    • mishandling and its consequences, 39–41
    • processing and organizing, 99–100
    • quality, 110–111
    • representation of, 201, 207–208
    • stages
      • produced. see Production data
      • raw, 4–8, 73, 74–76
      • refined. see Refined data
    • structuring, 15, 78, 95
    • utilization, 92
    • warehouse administrator, 21
    • workflow structure, 4
  • Data accessing, 58
  • Data accuracy, 7–8
  • Data administrators, 56, 67, 68, 110, 113, 114, 115, 194
    • defined, 20
    • goal, 29
    • practical problems faced by, 54
    • responsibilities, 20, 34–37
      • capacity planning, 36
      • data authentication, 35
      • data backup and recovery, 35
      • database tuning, 36–37
      • data extraction, transformation, and loading, 34
      • data handling, 35
      • data security, 35
      • effective use of human resource, 36
      • security and performance monitoring, 36
      • software installation and maintenance, 34
      • troubleshooting, 36
    • roles, 20, 21–22
    • skills required, 22–34
  • Data analysis, 206–207
    • use, 191–192
  • Data analysts. see Data administrators
  • Database administrator (DBA),
    • Cloud DBA, 22
    • concerns for, 37–39
    • responsibility, 21, 34–37
      • capacity planning, 36
      • data authentication, 35
      • data backup and recovery, 35
      • database tuning, 36–37
      • data extraction, transformation, and loading, 34
      • data security, 35
      • effective use of human resource, 36
      • security and performance monitoring, 36
      • software installation and maintenance, 34
      • troubleshooting, 36
    • role, 20, 21–22
  • Database systems, data wrangling in, 66
  • Database tuning, 36–37
  • Data breaching, 37–39, 40
    • laws, 41
    • long-term effect of, 42
    • phases of, 40–41
  • Data cleaning, 2, 15, 58, 79, 92, 95, 100, 111, 200–201
  • Data collection, 199, 200
  • Data deluge, 110
  • Data discovery, 14, 111
  • Data enrichment, 15, 59, 78–79, 111
  • Data errors, 118–119
  • Data extraction, 58
  • Data frame, 23, 125, 144–145
    • accessing, 145
    • addition of column, 144–145
    • creation, 144
  • data.frame() function, 144
  • Data gathering, 17
  • Data inconsistency, 101
  • Data ingestion, 75
  • Data integrity, 191
  • Data Lake, 110
  • Data leakage, 39
    • in deep learning, 101–102
    • in machine learning, 101–102, 103–104, 113
    • in ML for medical treatment, 93–94
  • Data management, defined, 110
  • Data manipulation, 117, 118–119
  • Datamation, 100
  • Datameer, 63, 64f
  • Data munging. see Data wrangling
  • Data optimization, 13
  • Data organization, 111
  • Data preparation, 92, 93
    • within cross-validation folds, 104
  • Data preprocessing, 92, 93
    • performance of, 102
    • use of, 100–101
  • Data projects, workflow framework for, 72–74
  • Data publishing, 16, 59, 95–96, 111
  • Data quality and management, 267
  • Data refinement, 13
  • Data remediation. see Data wrangling
  • Data reshaping, 55
  • Data science,
    • analytics, 189
    • applications in production industry, 197–204
      • data transformation, 199–204
      • interlinked devices, 199
    • defined, 188
    • IoT and, 189
  • Data scientists, role, 20
  • Dataset(s),
    • CIFAR-10, 221, 225
    • columns, changing order of, 82, 83f
    • drug trial, 8
    • Fashion MNIST, 225
    • granularity, 7
    • ImageNet, 225
    • MIR Flickr, 219
    • MNIST, 219, 223
    • red-wine quality, 178, 179, 180t
    • scope, 8
    • structure, 6–7
    • temporality, 8
    • training and test, 237
    • used, 178
    • validation, 104
    • Wikiart, 227
    • Wisconsin breast cancer, 178, 179, 181t
    • YFCC100M, 219
  • Data sources, 57
  • Data structure in R,
    • classification, 124–125
    • heterogeneous, 138–145
      • dataframe, 144–145
      • defined, 138
      • list, 139–143
    • homogeneous, 124, 125–138
      • array, 136–138
      • factor, 131–132
      • matrix, 132–136
      • vectors, 125–131
    • overview, 123–125
  • Data structuring, 58
  • Data theft, 40
  • Data transformation, 2, 34, 54, 63, 199–204
    • analytical input, 201–204
    • cleaning and processing of data, 200–201
    • information collection and storage, 200
    • representing data, 201
  • Data validation, 15, 59, 95, 111
  • Data visualizations, 45, 48–49
    • producing, 24
  • DataWrangler, 115
  • Data wrangling,
    • aims, 3
    • application areas, 65–67
      • in database systems, 66
      • journalism data, 67
      • medical data, 67
      • open government data, 66
      • traffic data, 66–67
    • defined, 2, 54, 110
    • do’s for, 16
    • entails, 110–111
    • goals, 114–115
    • obstacles surrounding, 113–114
    • overview, 2–4
    • stages, 94–96
      • cleaning, 95
      • discovery, 94
      • improving, 95
      • publishing, 95–96
      • structuring, 95
      • validation, 95
    • steps, 14–16, 111–114
    • tools for, 16–17, 59–65, 115–116
    • ways for effective, 116–119
  • Data wrangling dynamics,
    • architecture, 56–59
      • accessing, 58
      • auxiliary data, 57
      • cleaning, 58
      • enriching, 59
      • extraction, 58
      • publication, 59
      • sources, 57
      • structuring, 58
      • validation, 59
    • challenges, 55–56
    • overview, 53–54
    • related work, 54–55
    • tools, 59–65
  • DDoS attacks, 37
  • Decision making, 114
  • Decision trees, 246
  • Decoder, 177
  • Deep Belief Network (DBN), 215
  • Deep Boltzmann Machine (DBM), 215
  • Deep Convolutional GANs (DCGANs), 218, 220–221
  • Deep learning, 8, 20
    • -based techniques, for image processing, 246
    • data leakage in, 101–102
    • in ERP, 91–92, 93
    • GANs, 214, 215
    • generative and discriminative models, 216–217
  • DeepMind, 226, 227
  • DeepRay, 226
  • De la Torre, F., 168
  • De-noising images, 168
  • .describe() function, 83, 84f, 86
  • Descriptive analytics, 100
  • DeShon, R.P., 54
  • Diagnostic analytics, 100
  • Digital Vidya, 100
  • Dijkstra’s algorithm, 173
  • Dimensionality,
    • curse of, 148
    • intrinsic, 148
    • reduction. see Dimension reduction techniques in distributional semantics
  • Dimension reduction techniques in distributional semantics
  • Discover cross domain relations with GANs (DiscoGANs), 218
  • Discovering data, 14
  • Discovery, 94
  • Discriminative modeling, generative modeling vs, 216–217
  • Documentation of data, 111, 114
  • Double type of atomic vector, 126
  • Downey, D., 66
  • Dplyr, 116
  • Droom, 297
  • Drug trial datasets, 8
  • Duan, Y., 223
  • Dumoulin, V., 225
  • DVDGAN, 226
  • E-commerce market, 300
  • Economist Intelligence Unit, 194
  • E-diagnostics, 292
  • EmguCV, 247
  • Encoder, 177
  • Energy-based GAN, 222
  • Engkvist, O., 227
  • Eno, 257
  • Enrichment, data, 15, 59, 78–79, 111
  • Enterprise resource planning (ERP), 91–92, 93
  • Enterprise(s),
    • applications, big data analytics in real time for. see Big data analytics in real time
    • best practices for, 41
    • corporate, predictive analysis for, 204–207
  • Esmaeilzadeh, H., 224
  • Essentials of data wrangling,
    • actions in holistic workflow framework, 74–78
      • production data stage, 77–78
      • raw data stage, 74–76
      • refined data stage, 76–77
    • case study, 80–84
    • core profiling, types, 79–80
      • individual values profiling, 80
      • set-based profiling, 80
    • graphical representation, 86–89
    • overview, 71–72
    • quantitative analysis, 84–86
      • maximum number of fires, 84–85
      • statistical summary, 86
      • total number of fires, 85–86
    • transformation tasks, 78–79
      • cleansing, 79
      • enriching, 78–79
      • structuring, 78
    • workflow framework for data projects, 72–74
  • Etaati, L., 55
  • ETL (extract, transform and load) techniques, 2, 21, 26–27, 34, 54, 66, 71, 117
  • Euclidean distance, 161, 172, 173, 174
  • European Union, COVID-19 on automotive sector, 301
  • Excel, 7, 26, 27, 28, 29, 49, 55, 59–60, 61, 63, 80–81, 99, 100, 115
  • Exfiltrate, 41
  • Exploratory factor analysis, 175
  • Exploratory modelling and forecasting, 11
  • Express analytics using data wrangling process, 106
  • Extract, transform and load (ETL) techniques, 2, 21, 26–27, 34, 54, 66, 71, 117
  • Extruct, 99
  • ‘EY Global FAAS,’ 266
  • Facebook, 119, 194, 240, 247
  • Face recognition, 168, 240
  • Factor, data structure in R, 124–125, 131–132
  • Factor analysis (FA), 150, 175–176
  • factor() function, 131–132
  • Fan, H., 224
  • Fashion MNIST, 225
  • Feature extraction in speech recognition, 169
  • Feldman, S., 66
  • Fields of record, 6–7
  • Fisher GAN, 225
  • #FlattenTheCurve, 294
  • Flexible discriminant analysis (FDA), 165
  • FlexiGan, 224
  • Flipkart, 4
  • Floyd-Warshall shortest path algorithm, 173
  • Ford, 292, 304t, 305t, 306
  • Fraud detection, 240, 241f
  • Frequency outliers, defined, 7–8
  • Furche, T., 54
  • Gaming with virtual reality experience, 246
  • GANs. see Generative adversarial networks (GANs)
  • Gartner, 190
  • GauGAN, 227
  • Gaussian kernel, 165, 166
  • #GearUpForTomorrow, 294
  • Geiger, A., 225
  • General Motors, 293
  • Generative adversarial networks (GANs),
    • anatomy, 217–218
    • architecture of, 217f
    • areas of application, 226–228
    • background, 215–217
    • generative modeling vs discriminative modeling, 216–217
    • overview, 214–215
    • shortcomings of, 224–226
    • supervised vs unsupervised learning, 215–216
    • types, 218–224
      • cGANs, 218–219
      • DCGAN, 220–221
      • InfoGANs, 223–224
      • LSGANs, 222–223
      • StackGANs, 222
      • WGAN, 221–222
  • Generative modeling vs discriminative modeling, 216–217
  • Generic metadata, creation of, 6, 76
  • Genetic algorithms, 246
  • Genomic dataset, 194
  • Gen Zers, 272
  • Geodesic distance, defined, 173
  • Geopandas, 98
  • GeoTab, 292
  • Ghodrati, S., 224
  • GitHub, 120
  • Global automobile industry, 298, 300–301
  • Goharian, N., 66
  • Gong, B., 226
  • Goodfellow, I.J., 214, 225
  • Google, 238, 247
  • Google Analytics, 26
  • Google Assistant, 236
  • Google BigQuery, 99
  • Google DataPrep, 115
  • Google Scholar, 214
  • Google Sheets, 99
  • Google translator, 242
  • Gool, L.V., 226
  • Gopalan, R., 276
  • “Gosurge” for surge pricing, 44
  • Gottlob, G., 54
  • Gradient penalty, LSGANs with, 223
  • Granularity,
    • of dataset, 7
    • issues, refined data, 10
  • Graphical representation, 86–89
  • Graphs, creating, 24
  • Gross value added (GVA) growth, 299t
  • groupby() function, 85, 86–87
  • Gschwandtner, T., 54–55
  • Gulrajani, I., 225
  • Gutmann, M.U., 224
  • GV, 263
  • Handling, data, 35
  • .head() function, 82, 83f, 85
  • Heer, J., 54, 55, 81
  • Hellerstein, J.M., 55
  • Hero MotoCorp, 294
  • Hessian LLE (HLLE), 170
  • Heterogeneous data structure, 124, 125, 138–145
    • dataframe, 144–145
    • defined, 138
    • list, 139–143
      • creation, 139
      • elements, accessing, 140–142
      • elements, manipulating, 142
      • elements, merging, 142–143
      • elements, naming, 139–140
  • Hidden layer(s), 176, 177, 178
  • Hillel, A.B., 276
  • Homogeneous data structures, 124, 125–138
    • array, 136–138
    • factor, 131–132
    • matrix, 132–136
      • assigning rows and columns names, 133
      • computation, 135–136
      • creation, 132–133
      • elements, accessing, 134
      • elements, updating, 134–135
      • transposition, 136
    • vectors, 125–131
      • arithmetic operations, 129–130
      • atomic vectors, types, 125–126
      • element recycling, 130
      • elements, accessing, 128–129
      • nesting of, 129
      • sorting of, 130–131
      • using c() function, 127–128
      • using colon operator, 126
      • using sequence (seq) operator, 127
  • Honda, 291, 301, 304t, 305t
  • Hortonworks, 50
  • Hotstar, 4
  • Hough line transformation, 286
  • Hough transform, 283
  • Houthooft, R., 223
  • Hsu, C.Y., 67
  • Human resource, effective use of, 36
  • Hyperbolic tangent kernel, 165
  • #HyundaiCares, 294
  • Hyundai Motor Company, 290, 293, 294, 297, 304t, 305t
  • Hyundai Motor India Ltd (HMIL), 290
  • Hyundai Motors, 290, 301, 306
  • iAlert, 292
  • IBM Cognos Analytics, 100
  • ImageNet, 223, 225
  • Imagenet-1k, 221
  • Image processing, 173
    • ML in, 246–248
      • frameworks and libraries for, 246–248
  • Image sharpening, 246
  • Image synthesis, 226
  • Image thresholding, 283
  • IM (isometric mapping (Isomap)), 150, 172–173
  • Independent component analysis (ICA), 150, 171–172
  • India Energy Storage Alliance (IESA), 290
  • Indian auto industry, suppliers network on SCM of. see Suppliers network on SCM of Indian auto industry
  • Individual values profiling
    • semantic constraints, 80
    • syntactic constraints, 80
  • Industrial revolution 4.0, 189, 197
  • Industrial sector, predictive analysis for corporate enterprise applications in, 204–207
  • Industry 4.0, data wrangling in
    • future directions, 119–120
    • goals, 114–115
    • overview, 110–111
    • steps in, 111–114
    • tools and techniques, 115–116
    • ways for effective, 116–119
  • Informatica Cloud, 75
  • Information, defined, 2
  • Information collection and storage, 200
  • Information management to valuation offerings, transition from, 195–196
  • Information maximizing GANs (InfoGANs), 218, 223–224
  • Information-theory concept, 223
  • Information to guidance, 194–195
  • Ingestion process, 75
  • Integer type of atomic vector, 126
  • International Organization of Motor Vehicle Manufacturers, 291
  • Internet of Things (IoT),
    • adoption of, 198
    • applications in production industry, 197–204
      • data transformation, 199–204
      • interlinked devices, 199
    • big data and, 190–191
    • data science and, 189
    • defined, 188
    • revenue production, 190
    • use of, 194
  • Intrinsic dimensionality, 148
  • Inverse perspective mapping (IPM), 276–277
  • IoT. see Internet of Things (IoT)
  • iPython, 24, 25
  • Ishida, S., 293
  • Isomap (isometric mapping), 150, 172–173
  • Japan, COVID-19 on automotive sector, 301
  • Japanese ATR database, 169
  • Java EE, 21
  • JDBC, 21, 27
  • Jensen-Shannon divergence, 221
  • Jia, X., 226
  • Johansson, S.V., 227
  • Joins, 79
  • Journalism data, 67
  • JPMorgan Chase, 257
  • JSON, data format, 7
  • JSOnline, 116
  • Jupyter notebooks, 24
  • Just in time (JIT) system, 310–311
  • Kamenshchikov, I., 225
  • Kandel, S., 54, 55, 81
  • Kasica, S., 67
  • Kennedy, J., 54, 81
  • Kernel matrix, 167
  • Kernel principal component analysis (KPCA), 150, 161, 165–169
  • Kernel trick, 167, 168
  • Khaleghi, B., 224
  • Kia, 290, 291, 302, 304t, 305t
  • Kim, N.S., 224
  • Kitamura, T., 168–169
  • Kivy packages, 277
  • #0KMPH, 294
  • Koehler, M., 66
  • Kohonen, T., 173
  • Konstantinou, N., 66
  • Kotsias, P., 227
  • KPCA (kernel principal component analysis), 150, 161, 165–169
  • KPMG Worldwide, 209, 291
  • Krauledat, M., 225
  • Krishnaveni, M., 292
  • Kuljanin, G., 54
  • Kullback-Leibler divergence, 221
  • Landmark Isomap (L-Isomap), 173
  • Lane detection, 277
  • Langs, G., 227
  • Laplacian kernel, 165, 166
  • Large audiences, 32
  • Large scale scene understanding (LSUN), 221
  • Latent factors, 175
  • LatentGAN, 227
  • Lau, R.Y., 222
  • LDA (linear discriminant analysis), 150, 161–165
  • Leakage of data, 93–94, 101–102, 103–104
  • Lean manufacturing, 311
  • Learning rate decay, 174
  • Learnings from data, 48
  • Least Square GANs (LSGANs), 218, 222–223
  • LeCun, Y., 214
  • Lee, H., 222
  • Legal risks, liability, and culture transformation, 267–268
  • length() function, 141
  • Li, H., 222
  • Li, Q., 222
  • Libkin, L., 54
  • Libraries,
    • importing, 81–82
    • for ML image processing, 246–248
  • Lidar, 276
  • Lima, A., 168–169
  • Linear dimensionality reduction techniques, 178
  • Linear dimension reduction techniques, 148, 150
  • Linear discriminant analysis (LDA), 150, 161–165
  • Linear kernel, 165
  • Line graph, 86, 87f
  • List, data structure in R, 125, 139–143
    • creation, 139
    • elements,
      • accessing, 140–142
      • manipulating, 142
      • merging, 142–143
      • naming, 139–140
  • Listening skills, 33
  • list() function, 139
  • Liu, K., 224
  • Liu, S., 224
  • Liu, Z., 226
  • LLE (locally linear embedding), 150, 169–171, 172
  • Loading, data, 2, 21, 26–27, 34, 54, 66, 71, 117
  • Locally linear embedding (LLE), 150, 169–171, 172
  • Local smoothing, 103
  • Logeswaran, L., 222
  • Logical type of atomic vector, 126
  • Logistic regression, disadvantages of, 162
  • Loss function, least square, 222–223
  • LSGANs (Least Square GANs), 218, 222–223
  • Lu, W., 226
  • Luk, W., 224
  • Ma, L., 226
  • MacAvaney, S., 66
  • Machine learning (ML) for medical treatment,
    • data leakage, 93–94, 101–102, 103–104, 113
    • data preparation within cross-validation folds, 104
    • data preprocessing
      • performance of, 102
      • use of, 100–101
    • data wrangling, 93–94
      • enhancement of express analytics, 106
      • examples, 96
      • significance of, 96
      • tools and methods, 99–100
      • tools for python, 96–99
      • use of, 101–104
    • data wrangling, stages, 94–96
      • cleaning, 95
      • discovery, 94
      • improving, 95
      • publishing, 95–96
      • structuring, 95
      • validation, 95
    • overview, 91–92
    • types, 105
  • Machine learning (ML) frameworks, in image processing
    • application, 236
    • frameworks and libraries for, 246–248
    • in image processing, 246–248
    • overview, 235–236
    • solution to problem using, 243–246
      • anomaly detection algorithm, 244
      • classification algorithms, 243, 244f
      • clustering algorithms, 245
      • regression algorithm, 244, 245
      • reinforcement algorithms, 245, 246
    • techniques, applications of, 238, 240–243
      • fraud detection, 240, 241f
      • Google translator, 242
      • personal assistants, 238, 240f
      • predictions, 238, 240f
      • product recommendations, 242
      • social media, 240, 241f
      • video surveillance, 243
    • types, 236–238
      • reinforcement learning (RL), 236, 238, 239t
      • supervised learning (SL), 236–237, 239t
      • unsupervised learning (UL), 236, 237, 239t
  • Magrittr, 116
  • Mahindra First Choice Wheels, 297
  • Mahindra & Mahindra, 290, 291, 302, 304t, 305t
  • Malsburg, C. von der, 173
  • Malware attacks, 39
  • Mao, X., 222
  • Map, defined, 174
  • Mapping applications for City Ops teams, Uber, 46–47
  • Marketplace forecasting, Uber, 47
  • Markov decision process (MDP), 279–280
  • Maruti 800, 308
  • Maruti Production System (MPS), 311
  • Maruti Suzuki India Limited (MSIL); see also Suppliers network on SCM of Indian auto industry
    • competitive dimensions, 306–307
    • during COVID-19, 296–297, 302, 304t, 305t
    • distributors network, 311
    • logistics management, 312
    • manufacturing, 310–311
    • operations and SCM, 308–309
    • strategies, 307–308
    • suppliers network, 309–310
  • Maruti Suzuki True Value, 297
  • Maruti Udyog Limited, 290
  • MATLAB, 27
    • toolbox for image processing, 247
  • Matplotlib, 24, 81, 89, 116
  • Matrix, data structure in R, 125, 132–136
    • assigning rows and columns names, 133
    • computation, 135–136
    • creation, 132–133
    • elements
      • accessing, 134
      • updating, 134–135
    • transposition, 136
  • matrix() function, 132
  • .max() function, 84, 85f
  • Medical data, 67
  • Medicine, 227
  • Meng, J., 224
  • Mescheder, L.M., 225
  • Metadata, creation of, 75–76
  • Metal gauge sensor, 199
  • Metaxas, D.N., 225
  • Metz, L., 220, 221, 226
  • Miao, X., 276
  • Microsoft Azure, 22
  • Microsoft SQL, 21
  • MidiNet, 227
  • Miksch, S., 54–55
  • MIR Flickr dataset, 219
  • Mirza, M., 214, 218
  • Mishandling of data, 39–41
  • Missing data (inaccurate data), 100–101
  • MNIST dataset, 219, 223
  • Modelling and forecasting analysis, 11
  • Monthly, quarterly cash flows, and expense management, 255
  • Mp4 video format, 286
  • Mroueh, Y., 225–226
  • MS Access database, 204
  • MSIL. see Maruti Suzuki India Limited (MSIL)
  • Multiclass classification, 243
  • Multidimensional scaling (MDS), 172, 173
  • Munzner, T., 67
  • Murray, P., 164
  • Music, 227
  • MyDoom, 38
  • MySQL, 21, 100, 204
  • Nankaku, Y., 168–169
  • Natural language processing (NLP), 242, 263
  • Nayak, J., 293
  • Nearest neighbors, 246
  • Neighbourhood size, 174
  • .NET, 21
  • Netflix, 3, 4
  • Network-based attack, 40
  • NetworkX, 97, 98f
  • Neumayr, B., 66
  • Neural language processing, 238
  • Neural machine translation, 242
  • Neural nets, 246
  • Neural networks (NN), 176, 280
    • applications, 247
    • generative adversarial, 227
  • Ng, H., 224
  • Nguyen, M.H., 168
  • Nissan, 291
  • Niu, X., 224
  • Noisy data,
    • presence of, 101
    • process of handling, 103
  • Non-linear dimensionality reduction techniques, 148, 149, 150, 179
  • Non-linear mapping function, 165
  • Non-linear PCA, 161, 165
  • Novelty detection, 168
  • Nowozin, S., 225
  • Numerical Python (NumPy), 23, 81, 115, 279, 285
  • Nvidia, 226, 227
  • Nym Health, 263
  • Object detection, 276
  • ObjGAN, 226
  • Obstacle avoidance, 283
  • ODBC, 21, 27
  • Odena, A., 225
  • Olmos, P.M., 55, 67
  • One-on-one, form of presentation, 31
  • Online analytical processing (OLAP), 192
  • Online shopping websites, 242
  • OpenCV, 247, 283–284
  • Open government data, 66
  • OpenRefine, 115
  • Optimization, data, 13
  • Oracle, 21, 100
  • Original equipment manufacturers (OEMs), 292
  • Orsi, G., 54
  • Osindero, S., 218
  • Output actions,
    • at produced stage, 13–14
    • at raw data stage, 6
    • at refined stage, 11–12
  • Ozair, S., 214
  • Pandas, 22, 23–24, 25, 81, 85, 97, 116
  • Pan-India automobile market, 306
  • Parallel transport unfolding, 173
  • PassGAN, 228
  • Patil, M.D., 148
  • Paton, N.W., 54
  • Pattern recognition, 170, 173, 194, 236
  • Paxata, 63, 64f
  • Pay and receive processing, 254
  • PCA (principal component analysis), 148, 149, 150, 158–161
  • PepsiCo (case study), 48–50
  • Performance monitoring, 36
  • Perl, 80–81
  • Personal assistants, 238, 240f
  • Phased manufacturing program (PMP), 310
  • Pie chart, 86, 87, 88f
  • Pivoting, 78
  • Plaisant, C., 54, 81
  • Plotly, 116
  • Plots, creating, 24
  • Polynomial kernel, 165, 166
  • Pouget-Abadie, J., 214
  • Power BI, 29–30, 55
  • Power query editor, 55
  • Predictions, apps for, 238, 240f
  • Predictive analysis for corporate enterprise, 204–207
  • Predictive analytics, 100
    • primary goal of, 190
  • Prescriptive analytics, 100
  • Presentation skills, 31–32
  • Principal component analysis (PCA), 148, 149, 150, 158–161
  • Probabilistic PCA, 161
  • Production data, 12–14, 73, 74
    • data optimization, 13
    • output actions, 13–14
    • stage actions, 77–78
  • Production industry, IoT and data science applications in, 196–207
    • data transformation, 199–204
      • analytical input, 201–204
      • cleaning and processing of data, 200–201
      • information collection and storage, 200
      • representing data, 201
    • interlinked devices, 199
    • predictive analysis for corporate enterprise, 204–207
  • Product recommendations, 242
  • Profiling, core, 79–80
    • individual values profiling, 80
    • set-based profiling, 80
  • Prykhodko, O., 227
  • Publishing, data, 16, 59, 95–96, 111
  • Publishing skills, 32–33
  • Purrr, 116
  • PwC report, 42
  • Python, as programming language, 22–25, 96–99, 115–116, 120
  • PyTorch, 247, 279
  • Qiu, G., 224
  • Q-learning, 280
  • Quadratic discriminant analysis (QDA), 165
  • Que, Z., 224
  • R, managing data structure in
    • heterogeneous data structures, 138–145
      • dataframe, 144–145
      • defined, 138
      • list, 139–143
    • homogeneous data structures, 124, 125–138
      • array, 136–138
      • factor, 131–132
      • matrix, 132–136
      • vectors, 125–131
    • overview, 123–125
  • Radford, A., 220, 221, 225, 226
  • Radial Basis Function (RBF) kernel, 165, 166
  • Random forest algorithm, 92
  • Rattenbury, T., 55
  • Raw data, defined, 110
  • Raw data stage, 4–8, 73, 74–76
  • Raw type of atomic vector, 126
  • Raychaudhuri, S., 161
  • Real-time business intelligence, 193
  • Real-time lane detection and obstacle avoidance, 283
  • Records, dataset’s, 6–7
  • Recovery, data, 35
  • Recycle GAN, 226
  • Red-wine quality dataset, 178, 179, 180t
  • Reed, Z.A., 222
  • Reed gauge, 199
  • Refined data, 9–12, 73, 74
    • accuracy issues, 10–11
    • design and preparation, 9
    • granularity issues, 10
    • output actions at refined stage, 11–12
    • scope issues, 11
    • stage actions, 76–77
    • structure issues, 9
  • Regression-based algorithms, 103, 244, 245
  • Regularised discriminant analysis (RDA), 165
  • Reinforcement algorithms, 245, 246
  • Reinforcement learning (RL), 236, 238, 239t
  • Relational database, 6
  • ReLU activation function, 221
  • Renault, 302, 304t, 305t
  • Representational consistency, defined, 6
  • Representation of data, 201, 207–208
  • Reproducibility of data, 111, 114
  • Reputation, diminished, 42
  • Resende, F.G., 168–169
  • Resource chain management, 206
  • Response without thinking, 33
  • Responsibilities as database administrator, 20, 34–37
    • capacity planning, 36
    • data authentication, 35
    • data backup and recovery, 35
    • database tuning, 36–37
    • data extraction, transformation, and loading, 34
    • data handling, 35
    • data security, 35
    • effective use of human resource, 36
    • security and performance monitoring, 36
    • software installation and maintenance, 34
    • troubleshooting, 36
  • REST, 21
  • Riche, N.H., 54, 81
  • Riegling, M., 48–49
  • RL (reinforcement learning), 236, 238, 239t
  • Robotic Process Automation (RPA), 258
  • Robust KPCA, 168
  • Robust PCA, 161
  • Rows, in relational database, 6, 7
  • R programming language, 25–26, 80–81, 116
  • RStudio, 120
  • Runaway effect, 264
  • Russell, C., 224
  • SAGAN, 225
  • Saini, O., 178
  • Salimans, T., 225
  • Sallinger, E., 66
  • Samadi, K., 224
  • Sane, S.S., 148
  • Sarveniaza, A., 150
  • Saxena, G.A., 173
  • Scala, 27–28
  • Schiele, B., 222, 226
  • Schlegl, T., 227
  • Schmidt-Erfurth, U., 227
  • Schulman, J., 223
  • Scikit-learn, 22, 25
  • SciPy, 24–25
  • Scipy.integrate, 24
  • Scipy.linalg, 24
  • Scipy.optimize, 24
  • Scipy.signal, 24
  • Scipy.sparse, 25
  • Scipy.stats, 25
  • SCM (supply chain management) of Indian auto industry. see Suppliers network on SCM of Indian auto industry
  • Scope of dataset, 8
    • issues, 11
  • Security, 227–228
    • data, 35
    • performance monitoring and, 36
  • Seeböck, P., 227
  • Self-driving car simulation, 281
  • Self-driving technology, 246
  • Self-organising maps (SOMs), 150, 173–174
  • Self-service analytics, 50
  • Semantic constraints, 80
  • Sensors, 199
  • Sequence (seq) operator, vectors using, 127
  • Sercu, T., 225–226
  • Service Mandi, 292
  • Set-based profiling, 80
  • Shah, M., 276
  • Shah, M.K., 294
  • Sigmoid kernel, 165, 166
  • Single element vector, 125–126
  • Singular value decomposition (SVD), 150, 174–175
  • Siri, 236, 238
  • Skills and responsibilities of data wrangler,
    • case studies, 42–50
      • PepsiCo, 48–50
      • Uber, 42–48
    • data administrators
      • responsibilities, 34–37
      • roles, 20, 21–22
    • database administrator (DBA), role, 20, 21–22
    • overview, 20
    • soft skills, 31–34
      • business insights, 32
      • issues, 33–34
      • presentation skills, 31–32
      • response without thinking, 33
      • speaking and listening skills, 33
      • storytelling, 32
      • writing/publishing skills, 32–33
    • technical skills, 22–30
      • Excel, 28
      • MATLAB, 27
      • Power BI, 29–30
      • Python, 22–25
      • R programming language, 25–26
      • Scala, 27–28
      • SQL, 26–27
      • Tableau, 28–29
  • SL (supervised learning), 236–237, 239t
  • Small intimate groups, 31
  • Smart intelligence, examples of, 193
  • Smart production, 194
  • Smith, B., 67
  • Smolley, S.P., 222
  • Snore-GAN, 227
  • Social attack, 40–41
  • Social media using phone, 240, 241f
  • Society of Indian Automobile Manufacturers (SIAM), 295, 301
  • Soft skills, of data wrangler, 31–34
    • business insights, 32
    • issues, 33–34
    • presentation skills, 31–32
    • response without thinking, 33
    • speaking and listening skills, 33
    • storytelling, 32
    • writing/publishing skills, 32–33
  • Software installation and maintenance, 34
  • Solvexia, 114
  • SOMs (self-organising maps), 150, 173–174
  • sort() function, 130–131
  • Spark, 27, 28
  • Sparse KPCA, 168, 169
  • Sparse PCA, 161
  • Speaking and listening skills, 33
  • Spectral normalization, 225
  • Spectral regularization technique (SR-GAN), 224–225
  • Speech recognition, 168
  • Spline kernel, 165
  • Splitstackshape, 116
  • SQL, 26–27, 55, 117
  • SQL DBA, 21
  • SQLJ, 21
  • Srivastava, A., 224
  • SSGAN, 228
  • StackGANs, 218, 222
  • Statsmodels, 25
  • #Stayhomestaysafe, 294
  • StormWorm, 38
  • Storytelling, 32
  • str() function, 132, 141
  • Structuring data, 15, 78, 95
  • Stuart, J.M., 161
  • StyleGAN, 226
  • summary() function, 141–142
  • Sun, Q., 226
  • Supervised dimensionality reduction, 161
  • Supervised learning (SL), 236–237, 239t
  • Supervised machine learning algorithms, 99, 105
  • Supervised vs unsupervised learning, 215–216
  • Supplier onboarding and procurement, 255
  • Suppliers network on SCM of Indian auto industry
    • discussion, 306–312
      • competitive dimensions, 306–307
      • MSIL distributors network, 311
      • MSIL logistics management, 312
      • MSIL manufacturing, 310–311
      • MSIL operations and SCM, 308–309
      • MSIL strategies, 307–308
      • MSIL suppliers network, 309–310
    • findings, 298–306
      • effect on Indian automobile industry, 301–305
      • global automobile industry, 298, 300–301
      • post COVID-19 recovery, 306
      • worldwide economic impact of epidemic, 298, 299t
    • literature review, 292–297
    • methodology, 297–298
    • MSIL during COVID-19, 296–297
    • overview, 290–292
    • prior pandemic automobile industry, 294–296
  • Supply chain management (SCM) of Indian auto industry. see Suppliers network on SCM of Indian auto industry
  • Surge pricing, 44–45
  • Sutskever, I., 223
  • Sutton, C.A., 224
  • Suzuki Inc. (Japan), 307
  • Suzuki Motor Corporation, 290
  • SVD (singular value decomposition), 150, 174–175
  • Syntactic constraints, 80
  • Tableau, 28–29, 49, 50, 100
  • Tabula, 61, 62f, 115
  • .tail( ) function, 83, 84f
  • Talend, 65, 75
  • Tang, W., 224
  • TanH activation function, 221
  • Tata Motors, 290–291, 296, 302, 304t, 305t, 306
  • Tata Nano, 308
  • Technical skills, of data wrangler, 22–30
    • Excel, 28
    • MATLAB, 27
    • Power BI, 29–30
    • Python, 22–25
    • R programming language, 25–26
    • Scala, 27–28
    • SQL, 26–27
    • Tableau, 28–29
  • Temporal difference (TD), 280
  • Temporality, 8
  • Tenenbaum, J.B., 149
  • TensorFlow, 247
  • TensorFlow K-NN classification technique, 194
  • Tesla, 292
  • Test dataset, 237
  • Text mining, 192
  • t() function, 136
  • Theano, 116
  • Theft, data, 40
  • Thermal imaging sensor, 199
  • Tokuda, K., 168–169
  • Tomer, S., 294
  • Tools, data wrangling, 59–65
    • Altair Monarch, 60, 61f
    • Anzo, 60, 61, 62f
    • basic data munging tools, 115
    • cleaning and consolidating data, 100
    • Datameer, 63, 64f
    • Excel, 59–60
    • extracting insights from data, 100
    • Paxata, 63, 64f
    • processing and organizing data, 99–100
    • for Python, 96–99, 115–116
    • R tool, 116
    • Tabula, 61, 62f, 115
    • Talend, 65
    • Trifacta, 61, 63
  • Toyota, 290, 291, 294, 301, 302, 304t, 305t
  • #ToyotaWithIndia, 294
  • Traffic data, 66–67
  • Training dataset, 237
  • Transformation, data, 2, 21, 26–27, 34, 54, 63, 66, 71, 117
  • Transformation tasks, in data wrangling, 78–79
    • cleansing, 79
    • enriching, 78–79
    • structuring, 78
  • Transpose of matrix, 136
  • Trifacta, 49, 50, 55, 61, 63
  • Trifacta Wrangler, 55, 61, 66
  • Troubleshooting, 36
  • Trust, loss of, 42
  • Tuytelaars, T., 226
  • Twitter, 119, 194
  • Uber (case study), 42–48
  • UberPOOL, 46
  • UL (unsupervised learning), 236, 237, 239t, 245
  • Unions, 79
  • United States, COVID-19 on automotive sector, 300–301
  • Unsupervised learning (UL), 236, 237, 239t, 245
    • supervised vs, 215–216
  • Unsupervised machine learning algorithms, 99, 105
  • VAEs (variational autoencoders), 67, 215, 224
  • Validation,
  • Valkov, L., 224
  • Valuation offerings, information management to, 195–196
  • Value-added data system (VADA), 66
  • van der Maaten, L.J.P., 148
  • van Ham, F., 54, 81
  • Varghese, S., 293
  • Variances, defined, 159
  • Variational autoencoders (VAEs), 67, 215, 224
  • Vectors, data structure in R, 124, 125–131
    • arithmetic operations, 129–130
    • atomic vectors, types, 125–126
    • element recycling, 130
    • elements, accessing, 128–129
    • nesting of, 129
    • sorting of, 130–131
    • using c() function, 127–128
    • using colon operator, 126
    • using sequence (seq) operator, 127
  • VEEGAN, 224
  • Verizon, 42
  • Videos, 226
    • surveillance, 243
  • Vidya, R., 292
  • Visa exchange, 257
  • Visualization,
  • VLOOKUP function, 28
  • Volkswagen, 293, 306
  • Waldstein, S.M., 227
  • Wang, L., 226
  • Wang, Z., 222
  • WannaCry, 38
  • Warde-Farley, D., 214
  • Warehouse administrator, 21
  • Wasserstein distance, 221
  • Wasserstein GANs (WGANs), 218, 221–222
  • WebGazer, 247–248
  • Websites, online shopping, 242
  • Wei, X., 226
  • #WePledgeToBeSafe, 294
  • WGANs (Wasserstein GANs), 218, 221–222
  • Wikiart dataset, 227
  • Wisconsin breast cancer dataset, 178, 179, 181t
  • Within-class scatter matrix, 163, 164
  • Wood inspection, 173
  • Workflow framework, holistic,
    • actions in, 74–78
      • production data stage, 77–78
      • raw data stage, 74–76
      • refined data stage, 76–77
    • for data projects, 72–74
  • World Health Organization (WHO), 294
  • Wrangler Edge, 61
  • Wrangler Enterprise, 61
  • Writing skills, 32–33
  • Xero, 261
  • Xie, H., 222
  • XML, data format, 7
  • Xu, B., 214
  • Xu, T., 222
  • Xu, Z., 293
  • Yan, X., 222
  • Yates, A., 66
  • Yazdanbakhsh, A., 224
  • YFCC100M dataset, 219
  • Yoo, H., 276