
AI systems

and heuristics, 148

types of, 157–61

AI-based methods, 56–58


Algorithms

graph, 48–50

Amazon Echo, 55

Amazon’s MXNet, 179

Analytica, 71

Anomaly detection, 40–41

Artificial creativity, 56, 60, 163

Artificial Intelligence (AI), 18, 24

importance of, 156–57

in data science, 155

trends in, 176

Artificial Neural Networks (ANN), 54, 157, 166

Association rules, 46

Autoencoders, 159, 166

Automated data exploration, 59

Automated data exploration methods, 45

Automatic translation systems, 164

Big data, 13, 15–17, 24, 176

4 V’s of, 16–17

Bootstrap method, 109

Bootstrapping, 109–10

Business intelligence, 14

Butterfly effect, 40, 108–9, 114

Chatbots, 55, 60, 162–63

Classification, 38, 40

Classification methodology

model for, 138–39

Classifier, 38

Clustering methods, 44, 46–47

Collaborative filtering method, 42, 43, 44

Collaborative projects, 181–82

Computer vision, 161, 166

Conditional Random Field, 53

Confidence, 101–3, 105

Confidentiality, 168, 170, 171, 174

Content-based systems, 42, 43

Convolution layers, 159

Convolutional Neural Networks (CNNs), 166

Cosine distance, 43

CVS, 78

D3.js, 73–74


Data

pairing with model, 137–38

Data analytics software, 70–72

Data anonymization, 170, 174

Data discovery, 30, 35

Data engineering, 20, 24, 26, 35

Data exploration method, 28, 35, 45, 46, 49

Data frame, 27

Data governance, 75, 80

Data governance software, 74–75

Data learning, 31, 35

Data mining method, 45–46

Data modeling, 21, 24, 29–30, 35

Data preparation, 27, 35

Data product creation, 32–33, 35

Data representation, 28–29, 35

Data science, 13, 24

and AI, 155, 161

and AI consideration, 165

future trends and, 175–83

heuristics in, 145, 149

methodologies in, 37–58

programming languages for, 64–67

Data science ethics, 167

importance of, 167–68

Data science pipeline, 34

Data science process, 24

mistakes in, 125, 131

Data science professionals

need for, 22–23

Data science research, 180

Data science technologies, 185

Data science vs. business intelligence

vs. statistics, 13

Data scientists

continuous education and, 181

functions of, 20–21

inability of, 21–22

need for, 20, 22–23

tools and software of, 61

versatility of, 179–80

Data security, 171–72, 174

Database platforms, 61

Databases, 62

and data science, 79

Deep learning networks, 158–59, 166

Dimensionless space, 48

Discrete target variable

and classification, 38

Distance functions, 47

DL networks, 178

Ensemble, 151

Ensemble setting, 139

Entity recognition, 53

Ethics, 173–74

data science, 167

importance of, 167–68

Experiment conclusions

sensitivity analysis and, 107

Experiments, 97

and predictive analytics system, 100–101

constructing, 98–99

evaluating the results, 103–4

Extreme learning machines (ELMs), 160

Feature creation, 127

Feature evaluation

and heuristics, 150

Feature set, 90–91

Fuzzy logic systems, 160

GenEx, 54

Git, 77

GitHub, 78, 182

Global sensitivity analysis, 109

GPUs, 178


Graph

composition of, 47

Graph algorithms, 48–50

Graph analysis, 54, 59

Graph analytics, 47

Graph modeling, 50

Graph processing, 50

Graph-based databases, 63

Graphs, 59

uses of, 50

Hadoop, 75–76, 178


Heuristic

anatomy of, 151–53

Heuristics, 145–46, 153–54

and AI systems, 148

and feature evaluation, 150

applications and, 151

in data science, 145, 149

solving problems with, 146–48

High-level mistakes

coping with, 135–36

Hypothesis, 85–86, 95

Inductive/deductive classifiers, 38


on big data, 17

Information distillation, 21, 24, 32, 35

Insight, deliverance, and visualization

of findings, 33, 35

Internet Movie Database (IMDb), 42

Internet of Things (IoT), 176

Jaccard distance, 43

Jackknife, 110–11

JavaScript library, 73

Julia, 65, 67

libraries for, 69–70

KPSpotter, 54

Latent Dirichlet Allocation (LDA), 52

Libraries, 68–70

Licensing matters, 172–73, 174

Local sensitivity analysis, 112

Machine Learning (ML), 18, 24

Machine learning processes

heuristics and, 149

Massive Open Online Courses (MOOCs), 23

Mathematica, 72, 74


Mentor, 132

value of, 131

Mentoring, 182

Minimum Spanning Tree, 49

Mistakes, 132

common types, 126–29

high-level, 135–36

Mistakes vs. bugs, 125–26


Model

and data pairing, 137–38

choosing, 129–30, 136, 140–41

Monte Carlo, 111

Natural Language Processing (NLP), 51, 59

Navigation systems, 164

Neo4j, 50

NLP methods, 54

NMF algorithm, 44

Non-negative matrix factorization (NMF or NNMF), 43, 44

NoSQL databases, 62–63

Novelty detection, 40–41

Null Hypothesis, 86

Object-Oriented Programming (OOP), 177

Octave, 71

Open-source programming platforms, 79

Outlier prediction, 41

Packages, 68–70

Parallelization, 178

Permutation methods, 110

Pipeline, 197, 73

Pooling layers, 159

Predictive analytics, 37, 58

Predictive analytics method, 43

Predictive analytics methodology, 41

Privacy, 168–69, 174

Programming bugs, 117–22

common types, 119–22

considerations on, 122

coping with, 133–35

understanding, 117–18

where they are, 117–18

Programming languages, 79

choosing, 67

for data science, 64–67

Julia, 65

Python, 65–66

Programming paradigms, 177

Python, 65–66

libraries for, 68

Query Language (SQL), 62

Questions, 95

and common cases, 86

and data science, 83

for predicting a variable, 89–90

not to ask, 94

relationships and, 87–88

what to ask, 84–85

R, 66

Randomization technique, 110

Recommendation systems, 42

Recommender systems, 42, 43, 58, 137

Recurrent Neural Networks (RNNs), 166

Regression, 39

Regularization, 44

Resampling methods, 109, 114


application of, 38

Scala, 67

Scalability, 152

Scilab, 71

Sensitivity analysis, 107, 112–13

importance of, 107

Sentiment analysis, 51, 53, 60

Slack, 182

Spark, 75, 76

SQL-based databases, 62

Statistics, 14

Stopwords, 53

Storm, 76

Tableau, 74

TCP tunneling technique, 169

Tensor Processing Units, 178

Text prediction, 41–42

Text summarization, 53–54, 60

Time-series analysis, 40

Topic extraction, 60

Topic Extraction/Modeling, 52

Transductive classifiers, 38

Turney, 54

Utility matrix, 42, 44


of big data, 17

Variety, 17, 24

Velocity, 16, 24

Veracity, 17, 24

Versatilist data scientist, 179–80

Version control systems (VCS) software, 77–79, 80

Virtual private network (VPN), 169

Visualization options, 79

Visualization software, 73

Volume, 16, 24

WolframAlpha, 74

Workflow. See Pipeline

Yahoo, 169
