As this ebook edition doesn't have fixed pagination, the page numbers below are hyperlinked for reference only, based on the printed edition of this book.
A
access requirements 242
role-based access 242
user group-based access 242
ad hoc reports 190
aggregate variables 78
all-purpose tools 196
alpha 151
type I error, interacting with 154, 155
type II error, interacting with 154, 155
alphanumeric data 22
alternative hypothesis (H1) 149
versus null hypothesis (H0) 149, 150
Anaconda
URL 93
analysis
assumptions 103
list, making 103
selecting 102
selecting, challenges 102
analysis type
analytical tools 194
application programming interface (API) 33, 34
asynchronous API 33
audio files
format 25
automated checks 258
B
bell curve 111
Bernoulli distribution 113
binary variables 23
binomial distribution 114
Bottom Line Up Front (BLUF) 212
bubble charts 233
business services tools 196
C
cardinality 249
many-to-many cardinality 250
one-to-many cardinality 249
one-to-one cardinality 249
casewise deletion 58
categorical variables 23
binary variables 23
nominal variables 24
ordinal variables 24
categories
recoding, into numbers 80
chi-square goodness of fit test 168
chi-square statistic 167
chi-square test for independence 168
C-level executives 209
Cognitive Biases
reference link 38
color theory 221
reference link 222
Comma Separated Values (CSV) 26
complex frequency tables 130-132
compliance reports 192
CompTIA 4
CompTIA Data+ 3
examinee 7
format 6
confidence intervals 137
contingency table 130
continuous variables 23
examples 23
correlation
calculating 170
code 174
negative correlation 171
positive correlation 171
cross-validation 256
currency data 22
D
dashboard report 191
data 22
shaping, with common functions 82
transposing 83
Data+: DAO-001 4
data accuracy 255
data acquisition 254
Data Analysis 5
data appending 76
data attribute limitations 256
data audits 257
database archetypes 10
data blending 75
data classifications 246
Payment Card Industry (PCI) 247
personal health information (PHI) 247
personally identifiable information (PII) 246, 247
data collection 34
surveys 35
web scraping 34
data completeness 255
Data Concepts and Environments 5
data consistency 255
data constraints 249
Data Governance, Quality, and Control 6
data integrity 241
data lakes 18
data linkage 248
data marts 18
data merging
join, using 71
Data Mining 5
data pipeline 40
extraction 40
loading 40
transformation 40
data profiling 257
types 257
data quality dimensions 255
data accuracy 255
data attribute limitations 256
data completeness 255
data consistency 255
data quality rules and metrics 256
data refresh date 214
data schemas 14
data security 242
access requirements 242
requirements 243
data transformation 254
data type validation 63
acceptable use policy 244
data breach 245
data deletion 245
data processing 244
data retention 246
data warehouses 18
data wrangling 69
dates
working with 82
defined rows and columns 10
degrees of freedom (dfs) 138
delta load 42
dependent t-test 162
dependent variables 24
derived variable
dimension reduction 92
dimensions 211
dimension tables 14
discrete variables 23
examples 23
distributions
Bernoulli distribution 113
binomial distribution 114
exponential distribution 112
leptokurtic distribution 115
normal distribution 111
platykurtic distribution 115
Poisson distribution 112
skew 114
Document Object Model (DOM) 35
drop-down menus 36
dummy coding 81
managing 51
dynamic reports
cons 189
pros 188
versus static reports 189
E
encryption requirements
reference link 243
entity relationship requirements 248
cardinality 249
data constraints 249
handling 248
record link restrictions 248
exam domains 5
Data Analysis 5
Data Concepts and Environments 5
Data Governance, Quality, and Control 6
Data Mining 5
Visualization 6
exploratory data analysis (EDA) 90
exploring 90
types 90
exploratory data analysis (EDA), types
descriptive statistics 91
relationships 92
exponential distribution 112
Extensible Markup Language (XML) 27
Extract, Load, Transform (ELT) 41, 42
Extract, Transform, Load (ETL) 41
F
feature reduction 92
field definitions 212
file types 24
audio files 25
flat files 26
images 25
text 25
video files 26
website 26
filtering, stages
query filtering 206
report filtering 206
final product 255
flat files
Comma Separated Values (CSV) 26
Tab Separated Values (TSV) 26
forecasting 101
complex frequency tables 130-132
frequency table 130
Frequently Asked Questions (FAQs) 214
full load 42
G
General Data Protection Regulation (GDPR)
reference link 245
general public 210
graph databases 13
H
Health Insurance Portability and Accountability Act (HIPAA)
reference link 247
histograms 229
homoscedasticity 178
Hypertext Markup Language (HTML) 27
hypothesis testing
need for 146
I
imputation 59
independent t-test 162
independent variables 24
example 226
reference link 226
International Organization for Standardization (ISO) 22
interpolation 60
interquartile range (IQR) 65, 121
J
JavaScript Object Notation (JSON) 27
joins
types 71
K
Kaggle 32
URL 32
key 13
key performance indicators (KPIs) 77, 100
Key Table 70
key-value pairs 10
K Nearest Neighbors (KNN) 60
kurtosis 116
L
left joins 74
leptokurtic distribution 115
line charts 230
link analysis 102
listwise deletion 58
M
machine data 12
machine learning (ML) algorithm 22, 60
management 210
map 233
master data management (MDM) 258
master data management (MDM), processes 259
consolidation 259
data dictionary 260
standardization 260
master data management (MDM), usage
streamlining data access 259
mean 116
example 116
using 118
measures 211
measures of central tendency 91, 116
measures of dispersion 91
measures of frequency 91
measures of position 91
median 117
example 117
using 119
merge 71
Missing at Random (MAR) 56
Missing Completely at Random (MCAR) 56
missing data
dealing with 55
dealing, with MNAR 60
deletion 58
imputation 59
interpolation 60
types 56
missing data, deletion
filtering 59
listwise deletion 58
pairwise deletion 58
variable deletion 59
missing data, imputation
hot deck imputation 59
mean 59
median 59
mode imputation 59
missing data, types
Missing at Random (MAR) 56
Missing Completely at Random (MCAR) 56
Missing Not at Random (MNAR) 57
Missing Not at Random (MNAR) 57
dealing with 60
mode 118
example 118
using 119
multicollinearity 53
multiple-choice answers 36
N
natural language processing (NLP) 35, 79
negative correlation 171
nominal variables 24
non-negative matrix factorization (NMF) 92
non-relational databases 12, 13
normal distribution 111
null hypothesis (H0) 149
versus alternative hypothesis (H1) 149, 150
numbers
recoding, into categories 80
numeric data 22
O
one-sample t-test 162
online transactional processing (OLTP) 42, 43
operating system (OS) 25
operational reports 193
ordinal variables 24
outer joins 73
outliers
P
paired t-test 162
pairwise deletion 58
parameterization 45
Pareto charts 231
Payment Card Industry (PCI) 247
Pearson’s chi-square 168
percent change
performance analysis 99
key performance indicators (KPIs) 100
process analytics 101
project management 100
personal health information (PHI) 247
personally identifiable financial information (PIFI) 247
personally identifiable information (PII) 246, 247
platykurtic distribution 115
point-in-time 187
Poisson distribution 112
positive correlation 171
principal component analysis (PCA) 92
process analytics 101
programming language tools 195
project management 100
public data sources
utilizing 31
p-value 150
Q
quality control 253
check 254
data quality dimensions 255
data quality rules and metrics 256
quality control techniques
data acquisition 254
data manipulation 254
data transformation 254
final product 255
quality validation 256
automated checks 258
cross-validation 256
data audits 257
data profiling 257
reasonable expectations 257
sample/spot checks 257
query 43
query structure
optimizing 43
query tools 195
R
ranges 119
real-time 187
reasonable expectations 257
record linkage 248
record link restrictions 248
recurring reports 192
compliance reports 192
operational reports 193
risk and regulatory reports 193
reduction variables
calculating 78
managing 51
regulations
levels 193
regulatory reports 192
relational database management system (RDBMS) 195
release approval 243
report development process 201, 202
plan approval, obtaining 203
report, creating 203
report preparation
business requirements 204
considerations 204
dashboard-specific requirements 211
report preparation, business requirements
filtering 206
frequency 208
report preparation, dashboard-specific requirements
data attributes 211
data sources 211
report run date 214
reports
designing 216
reports design
branding 216
color theory 221
return on investment (ROI) 100
risk and regulatory reports 193
role-based access 242
S
sample/spot checks 257
scatter plots 232
security requirements 243
data encryption 243
data transmission 243
de-identification/masking of data 243
simple linear regression 174-177
single-choice answers 36
cons 17
pros 16
Spearman’s correlation 172
specification mismatch 62
spreadsheet tools 195
SQL database 13
stakeholders 210
example 125
standard scores 139
cons 15
pros 15
static reports
cons 188
pros 188
versus dynamic reports 189
stored data
number of recorded variables, modifying 20, 21
record, updating with up-to-date value 19, 20
updating 19
Structured Query Language (SQL) 12, 195
Student’s t-test 162
study 39
subqueries 46
survey answers, types
drop-down menus 36
multiple-choice answers 36
single-choice answers 36
text-based answers 35
surveys 35
survey answers, types 35
synchronous API 33
system functions 83
T
Tab Separated Values (TSV) 26
tactical reports 190
tags 77
technical experts 210
text-based answers 35
time series analysis 101
tokenization 79
tools
learning, to use 197
tree maps 235
trend analysis 101
t-test 162
assumptions 163
best practice 163
Twitter API
reference link 34
type I error 153
interacting, with alpha 154, 155
type II error 153
interacting, with alpha 154, 155
U
undefined field 11
unpaired t-test 162
unstructured databases 11
machine data 12
undefined field 11
use requirements 244
user group-based access 242
V
variable deletion 59
variables
recoding 79
variable types 22
categorical variables 23
continuous variables 23
dependent variables 24
discrete variables 23
independent variables 24
video files
format 26
views 206
READ-ONLY 206
virtual machine (VM) 40
visualization 6
visualization tools 196
W
waterfall charts 229
web scraping 34
website files 26
HTML 27
JSON file 27
XML 27
Windows Media Audio (WMA) 25
reference link 227
World Health Organization (WHO) 32
Z