Index

As this ebook edition doesn't have fixed pagination, the page numbers below are hyperlinked for reference only, based on the printed edition of this book.

A

access requirements 242

role-based access 242

user group-based access 242

ad hoc reports 190

aggregate variables 78

all-purpose tools 196

alpha 151

type I error, interacting with 154, 155

type II error, interacting with 154, 155

alphanumeric data 22

alternative hypothesis (H1) 149

versus null hypothesis (H0) 149, 150

Anaconda

URL 93

analysis

assumptions 103

list, making 103

selecting 102

selecting, challenges 102

analysis type

selecting 104, 105

analytical tools 194

application programming interface (API) 33, 34

asynchronous API 33

audio files

format 25

automated checks 258

B

bar charts 227, 228

bell curve 111

Bernoulli distribution 113

bias 38, 39

binary variables 23

binomial distribution 114

Bottom Line Up Front (BLUF) 212

bubble charts 233

business services tools 196

C

cardinality 249

many-to-many cardinality 250

one-to-many cardinality 249

one-to-one cardinality 249

casewise deletion 58

categorical variables 23

binary variables 23

nominal variables 24

ordinal variables 24

categories

recoding, into numbers 80

chi-square 167, 168

assumptions 168, 169

code 169, 170

chi-square goodness of fit test 168

chi-square statistic 167

chi-square test for independence 168

C-level executives 209

Cognitive Biases

reference link 38

color theory 221

reference link 222

Comma Separated Values (CSV) 26

complex frequency tables 130-132

compliance reports 192

CompTIA 4

CompTIA Data+ 3

examinee 7

format 6

concatenation 75, 76

conditional operators 82, 83

confidence intervals 137

discovering 137-139

contingency table 130

continuous variables 23

examples 23

correlations 92, 170-172

assumptions 173, 174

calculating 170

code 174

negative correlation 171

positive correlation 171

cross-validation 256

currency data 22

D

dashboard report 191

data 22

merging 69, 70

shaping, with common functions 82

transposing 83

Data+: DA0-001 4

data accuracy 255

data acquisition 254

Data Analysis 5

data appending 76

data attribute limitations 256

data audits 257

database archetypes 10

data blending 75

data classifications 246

Payment Card Industry (PCI) 247

personal health information (PHI) 247

personally identifiable information (PII) 246, 247

data collection 34

observation 39, 40

surveys 35

web scraping 34

data completeness 255

data concatenation 75, 76

Data Concepts and Environments 5

data consistency 255

data constraints 249

Data Governance, Quality, and Control 6

data integrity 241

data lakes 18

data linkage 248

data manipulation 69, 254

data marts 18

data merging

join, using 71

key variables, using 70, 71

Data Mining 5

data parsing 78, 79

data pipeline 40

extraction 40

loading 40

transformation 40

data profiling 257

types 257

data quality dimensions 255

data accuracy 255

data attribute limitations 256

data completeness 255

data consistency 255

data quality rules and metrics 256

data refresh date 214

data schemas 14

snowflake schema 16, 17

star schema 14-16

data science 4, 5

data security 242

access requirements 242

requirements 243

data transformation 254

data types 21, 22

data type validation 63

data use agreement 242, 244

acceptable use policy 244

data breach 245

data deletion 245

data processing 244

data retention 246

data warehouses 18

data wrangling 69

dates

working with 82

defined rows and columns 10

degrees of freedom (dfs) 138

delta load 42

dependent t-test 162

dependent variables 24

derived variable

calculating 76-78

dimension reduction 92

dimensions 211

dimension tables 14

discrete variables 23

examples 23

distributions 109, 110

Bernoulli distribution 113

binomial distribution 114

exponential distribution 112

leptokurtic distribution 115

normal distribution 111

platykurtic distribution 115

Poisson distribution 112

skew 114

uniform distribution 111, 112

Document Object Model (DOM) 35

drop-down menus 36

dummy coding 81

duplicate data 52, 53

managing 51

dynamic reports 187, 188

cons 189

pros 188

versus static reports 189

E

encryption requirements

reference link 243

entity relationship requirements 248

cardinality 249

data constraints 249

handling 248

record link restrictions 248

exam domains 5

Data Analysis 5

Data Concepts and Environments 5

Data Governance, Quality, and Control 6

Data Mining 5

Visualization 6

execution plan 46, 47

exploratory data analysis (EDA) 90

example 93-99

exploring 90

types 90

exploratory data analysis (EDA), types

descriptive statistics 91

dimension reduction 92, 93

relationships 92

exponential distributions 112

Extensible Markup Language (XML) 27

Extract, Load, Transform (ELT) 41, 42

Extract, Transform, Load (ETL) 41

F

feature reduction 92

field definitions 212

file types 24

audio files 25

flat files 26

images 25

text 25

video files 26

website 26

filtering 43, 44

filtering, stages

query filtering 206

report filtering 206

final product 255

flat files

Comma Separated Values (CSV) 26

Tab Separated Values (TSV) 26

forecasting 101

frequencies 129, 130

complex frequency tables 130-132

frequency table 130

Frequently Asked Questions (FAQs) 214

full load 42

G

General Data Protection Regulation (GDPR)

reference link 245

general public 210

geographic maps 236, 237

graph databases 13

H

Health Insurance Portability and Accountability Act (HIPAA)

reference link 247

heat maps 234, 235

histograms 229

homoscedasticity 178

Hypertext Markup Language (HTML) 27

hypothesis testing 145, 146

need for 146

process 147, 148

questions, writing 155, 156

I

imputation 59

independent t-test 162

independent variables 24

indexing 44, 45

infographics 225, 226

example 226

reference link 226

inner join 71, 72

International Organization for Standardization (ISO) 22

interpolation 60

interquartile range (IQR) 65, 121

invalid data 61, 62

J

JavaScript Object Notation (JSON) 27

joins

inner joins 71, 72

left joins 73, 74

outer joins 72, 73

right joins 74, 75

types 71

K

Kaggle 32

URL 32

key 13

key performance indicators (KPIs) 77, 100

Key Table 70

key-value pairs 10

K Nearest Neighbors (KNN) 60

kurtosis 116

L

left joins 74

leptokurtic distribution 115

Likert scales 37, 38

line charts 230

link analysis 102

listwise deletion 58

M

machine data 12

machine learning (ML) algorithm 22, 60

management 210

map 233

master data management (MDM) 258

master data management (MDM), processes 259

consolidation 259

data dictionary 260

standardization 260

master data management (MDM), usage

streamlining data access 259

mean 116

example 116

using 118

measures 211

measures of central tendency 91, 116

measures of dispersion 91

measures of frequency 91

measures of position 91

median 117

example 117

using 119

merge 71

Missing at Random (MAR) 56

Missing Completely at Random (MCAR) 56

missing data

dealing with 55

dealing, with MNAR 60

deletion 58

imputation 59

interpolation 60

types 56

missing data, deletion

filtering 59

listwise deletion 58

pairwise deletion 58

variable deletion 59

missing data, imputation

hot deck imputation 59

mean 59

median 59

mode imputation 59

missing data, types

Missing at Random (MAR) 56

Missing Completely at Random (MCAR) 56

Missing Not at Random (MNAR) 57

Missing Not at Random (MNAR) 57

dealing with 60

mode 118

example 118

using 119

multicollinearity 53

multiple-choice answers 36

N

natural language processing (NLP) 35, 79

negative correlation 171

nominal variables 24

non-negative matrix factorization (NMF) 92

non-parametric data 63, 64

non-relational databases 12, 13

normal distribution 111

null hypothesis (H0) 149

versus alternative hypothesis (H1) 149, 150

numbers

recoding, into categories 80

numeric data 22

O

observation 39, 40

one-sample t-test 162

one-tailed tests 151, 152

online transactional processing (OLTP) 42, 43

open source datasets 32, 33

operating system (OS) 25

operational reports 193

ordinal variables 24

outer joins 73

outliers

searching 64, 65

P

paired t-test 162

pairwise deletion 58

parameterization 45

Pareto charts 231

Payment Card Industry (PCI) 247

Pearson’s chi-square 168

percentages 129, 132-134

percent change

calculating 134, 135

percent difference 136, 137

calculating 136, 137

performance analysis 99

key performance indicators (KPIs) 100

process analytics 101

project management 100

personal health information (PHI) 247

personally identifiable financial information (PIFI) 247

personally identifiable information (PII) 246, 247

pie charts 231, 232

platykurtic distribution 115

point-in-time 187

Poisson distribution 112

positive correlation 171

principal component analysis (PCA) 92

process analytics 101

programming language tools 195

project management 100

public databases 31, 32

public data sources

utilizing 31

p-value 150

Q

quality control 253

check 254

data quality dimensions 255

data quality rules and metrics 256

quality control techniques

data acquisition 254

data manipulation 254

data transformation 254

final product 255

quality validation 256

automated checks 258

cross-validation 256

data audits 257

data profiling 257

reasonable expectations 257

sample/spot checks 257

quartiles 119, 120

calculating 120, 121

query 43

query structure

optimizing 43

query tools 195

R

ranges 119

calculating 119, 120

real-time 187

reasonable expectations 257

record linkage 248

record link restrictions 248

recurring reports 192

compliance reports 192

operational reports 193

risk and regulatory reports 193

reduction variables

calculating 78

redundant data 53-55

managing 51

regulations

levels 193

regulatory reports 192

relational database management system (RDBMS) 195

relational databases 12, 13

release approval 243

report delivery 214-216

report development process 201, 202

plan approval, obtaining 203

plan, creating 202, 203

report, creating 203

report, delivering 203, 204

report elements 212-214

report preparation

business requirements 204

considerations 204

dashboard-specific requirements 211

report preparation, business requirements

audience 209, 210

data content 204, 205

data range 207, 208

filtering 206

frequency 208

views 206, 207

report preparation, dashboard-specific requirements

data attributes 211

data sources 211

report run date 214

reports

designing 216

reports design

branding 216

color theory 221

fonts 217-219

key chart elements 220, 221

layouts 219, 220

research reports 190, 191

return on investment (ROI) 100

right joins 74, 75

risk and regulatory reports 193

role-based access 242

S

sample/spot checks 257

scatter plots 232

security requirements 243

data encryption 243

data transmission 243

de-identification/masking of data 243

self-service reports 191, 192

simple linear regression 174-177

assumptions 178, 179

code 179-181

single-choice answers 36

skew 114-116

snowflake schema 16, 17

cons 17

pros 16

sorting 44, 45

Spearman’s correlation 172

specification mismatch 62

spreadsheet tools 195

SQL database 13

stacked charts 228, 229

stakeholders 210

standard deviation 121-124

example 125

standard scores 139

star schema 14-16

cons 15

pros 15

static reports 187, 188

cons 188

pros 188

versus dynamic reports 189

stored data

number of recorded variables, modifying 20, 21

record, updating with up-to-date value 19, 20

updating 19

structured databases 10, 11

Structured Query Language (SQL) 12, 195

Student’s t-test 162

study 39

subqueries 46

subsets 43, 44

survey answers, types

drop-down menus 36

Likert scales 37, 38

multiple-choice answers 36

single-choice answers 36

text-based answers 35

surveys 35

bias 38, 39

survey answers, types 35

synchronous API 33

system functions 83

T

Tab Separated Values (TSV) 26

tactical reports 190

tags 77

technical experts 210

temporary tables 45, 46

text-based answers 35

time series analysis 101

tokenization 79

tools

learning, to use 197

tree maps 235

trend analysis 101

t-test 162

assumptions 163

best practice 163

code 163-167

Twitter API

reference link 34

two-tailed tests 151, 152

type I error 153

interacting, with alpha 154, 155

type II error 153

interacting, with alpha 154, 155

U

undefined field 11

uniform distribution 111, 112

unpaired t-test 162

unstructured databases 11

machine data 12

undefined field 11

use requirements 244

user group-based access 242

V

variable deletion 59

variables

recoding 79

variable types 22

categorical variables 23

continuous variables 23

dependent variables 24

discrete variables 23

independent variables 24

variance 121, 122

example 122, 123

video files

format 26

views 206

READ-ONLY 206

virtual machine (VM) 40

visualization 6

visualization tools 196

W

waterfall charts 229

web scraping 34

web services 33, 34

website files 26

HTML 27

JSON file 27

XML 27

Windows Media Audio (WMA) 25

word clouds 225, 227

reference link 227

World Health Organization (WHO) 32

Z

z-scores 139-141
