AI systems
and heuristics, 148
types of, 157–61
AI-based methods, 56–58
Algorithms
graph, 48–50
Amazon Echo, 55
Amazon’s MXNet, 179
Analytica, 71
Anomaly detection, 40–41
Artificial creativity, 56, 60, 163
Artificial Intelligence (AI), 18, 24
importance of, 156–57
in data science, 155
trends in, 176
Artificial Neural Networks (ANN), 54, 157, 166
Association rules, 46
Autoencoders, 159, 166
Automated data exploration, 59
Automated data exploration methods, 45
Automatic translation systems, 164
Big data, 13, 15–17, 24, 176
4 V’s of, 16–17
Bootstrap method, 109
Bootstrapping, 109–10
Business intelligence, 14
Butterfly effect, 40, 108–9, 114
Chatbots, 55, 60, 162–63
Classification, 38, 40
Classification methodology
model for, 138–39
Classifier, 38
Clustering methods, 44, 46–47
Collaborative filtering method, 42–44
Collaborative projects, 181–82
Computer vision, 161, 166
Conditional Random Field, 53
Confidence, 101–3, 105
Confidentiality, 168, 170, 171, 174
Content-based systems, 42, 43
Convolution layers, 159
Convolutional Neural Networks (CNNs), 166
Cosine distance, 43
CVS, 78
D3.js, 73–74
Data
pairing with model, 137–38
Data analytics software, 70–72
Data anonymization, 170, 174
Data discovery, 30, 35
Data engineering, 20, 24, 26, 35
Data exploration method, 28, 35, 45, 46, 49
Data frame, 27
Data governance, 75, 80
Data governance software, 74–75
Data learning, 31, 35
Data mining method, 45–46
Data modeling, 21, 24, 29–30, 35
Data preparation, 27, 35
Data product creation, 32–33, 35
Data representation, 28–29, 35
Data science, 13, 24
and AI, 155, 161
and AI consideration, 165
future trends and, 175–83
heuristics in, 145, 149
methodologies in, 37–58
programming languages for, 64–67
Data science ethics, 167
importance of, 167–68
Data science pipeline, 34
Data science process, 24
mistakes in, 125, 131
Data science professionals
need for, 22–23
Data science research, 180
Data science technologies, 185
Data science vs. business intelligence
vs. statistics, 13
Data scientists
continuous education and, 181
functions of, 20–21
inability of, 21–22
need for, 20, 22–23
tools and software of, 61
versatility of, 179–80
Data security, 171–72, 174
Database platforms, 61
Databases, 62
and data science, 79
Deep learning networks, 158–59, 166
Dimensionless space, 48
Discrete target variable
and classification, 38
Distance functions, 47
DL networks, 178
Ensemble, 151
Ensemble setting, 139
Entity recognition, 53
Ethics, 173–74
data science, 167
importance of, 167–68
Experiment conclusions
sensitivity analysis and, 107
Experiments, 97
and predictive analytics system, 100–101
constructing, 98–99
evaluating the results, 103–4
Extreme learning machines (ELMs), 160
Feature creation, 127
Feature evaluation
and heuristics, 150
Feature set, 90–91
Fuzzy logic systems, 160
GenEx, 54
Git, 77
GitHub, 78, 182
Global sensitivity analysis, 109
GPUs, 178
Graph
composition of, 47
Graph algorithms, 48–50
Graph analysis, 54, 59
Graph analytics, 47
Graph modeling, 50
Graph processing, 50
Graph-based databases, 63
Graphs, 59
uses of, 50
Hadoop, 75–76, 178
Heuristic
anatomy of, 151–53
Heuristics, 145–46, 153–54
and AI systems, 148
and feature evaluation, 150
applications and, 151
in data science, 145, 149
solving problems with, 146–48
High-level mistakes
coping with, 135–36
Hypothesis, 85–86, 95
Inductive/deductive classifiers, 38
Infographic
on big data, 17
Information distillation, 21, 24, 32, 35
Insight, deliverance, and visualization
of findings, 33, 35
Internet Movie Database (IMDb), 42
Internet of Things (IoT), 176
Jaccard distance, 43
Jackknife, 110–11
JavaScript library, 73
Julia, 65, 67
libraries for, 69–70
KPSpotter, 54
Latent Dirichlet Allocation (LDA), 52
Libraries, 68–70
Licensing matters, 172–73, 174
Local sensitivity analysis, 112
Machine Learning (ML), 18, 24
Machine learning processes
heuristics and, 149
Massive Open Online Courses (MOOCs), 23
Mathematica, 72, 74
MATLAB, 71
Mentor, 132
value of, 131
Mentoring, 182
Minimum Spanning Tree, 49
Mistakes, 132
common types, 126–29
high-level, 135–36
Mistakes vs. bugs, 125–26
Model
and data pairing, 137–38
choosing, 129–30, 136, 140–41
Monte Carlo, 111
Natural Language Processing (NLP), 51, 59
Navigation systems, 164
Neo4j, 50
NLP methods, 54
NMF algorithm, 44
Non-negative matrix factorization (NMF or NNMF), 43, 44
NoSQL databases, 62–63
Novelty detection, 40–41
Null hypothesis, 86
Object-Oriented Programming (OOP), 177
Octave, 71
Open-source programming platforms, 79
Outlier prediction, 41
Packages, 68–70
Parallelization, 178
Permutation methods, 110
Pipeline, 197
Plot.ly, 73
Pooling layers, 159
Predictive analytics, 37, 58
Predictive analytics method, 43
Predictive analytics methodology, 41
Privacy, 168–69, 174
Programming bugs, 117–22
common types, 119–22
considerations on, 122
coping with, 133–35
understanding, 117–18
where they are, 117–18
Programming languages, 79
choosing, 67
for data science, 64–67
Julia, 65
Python, 65–66
Programming paradigms, 177
Python, 65–66
libraries for, 68
Structured Query Language (SQL), 62
Questions, 95
and common cases, 86
and data science, 83
for predicting a variable, 89–90
not to ask, 94
relationships and, 87–88
what to ask, 84–85
R, 66
Randomization technique, 110
Recommendation systems, 42
Recommender systems, 42, 43, 58, 137
Recurrent Neural Networks (RNNs), 166
Regression, 39
Regularization, 44
Resampling methods, 109, 114
Rules
application of, 38
Scala, 67
Scalability, 152
Scilab, 71
Sensitivity analysis, 107, 112–13
importance of, 107
Sentiment analysis, 51, 53, 60
Slack, 182
Spark, 75, 76
SQL-based databases, 62
Statistics, 14
Stopwords, 53
Storm, 76
Tableau, 74
TCP tunneling technique, 169
Tensor Processing Units, 178
Text prediction, 41–42
Text summarization, 53–54, 60
Time-series analysis, 40
Topic extraction, 60
Topic extraction/modeling, 52
Transductive classifiers, 38
Turney, 54
Utility matrix, 42, 44
Value
of big data, 17
Variety, 17, 24
Velocity, 16, 24
Veracity, 17, 24
Versatilist data scientist, 179–80
Version control systems (VCS) software, 77–79, 80
Virtual private network (VPN), 169
Visualization options, 79
Visualization software, 73
Volume, 16, 24
WolframAlpha, 74
Workflow. See Pipeline
Yahoo, 169