ABAC (attribute-based access control) 229, 279
access control
    supporting attribute-level 228
additive homomorphic encryption 255
agglomerative hierarchical clustering 168
AI (artificial intelligence) 3 – 5
AMI (adjusted mutual information) score 75
anonymization 151 – 155
    private information sharing vs. privacy concerns 151 – 152
    using k-anonymity against re-identification attacks 152 – 154
association rule hiding 20, 217 – 218
attacks
    challenges of privacy protection in big data analytics 15 – 16
    de-anonymization or re-identification attacks 15
    membership inference attacks 13 – 15
    model inversion attacks 12 – 13
    targeting data confidentiality 223 – 224
    targeting data privacy 224 – 225
    attacker's perspective of 10 – 11
    real-world scenario involving 11 – 12
attribute-based access control (ABAC) 229, 279
attribute-level access control 228
binary mechanism (randomized response) 35 – 37
CA (continuous authentication) applications 256
composition properties 51 – 55, 296 – 298
    parallel composition 54 – 55, 297
    sequential composition 51 – 52, 296 – 297
compressive privacy 21 – 22, 237
confidentiality of data 223 – 224
sequential composition of 52 – 54
CP (compressive privacy) 233 – 267, 269
    other dimensionality reduction methods 238 – 239
    privacy-preserving PCA and DCA on horizontally partitioned data 251 – 266
        achieving privacy preservation on horizontally partitioned data 253 – 254
        evaluating efficiency and accuracy of 263 – 266
        how privacy-preserving computation works 258 – 263
        overview of proposed approach 256 – 257
        recapping dimensionality reduction approaches 254 – 255
        using additive homomorphic encryption 255
    using for ML applications 239 – 251
        accuracy of utility task 246 – 248
        implementing compressive privacy 240 – 246
CRUD (create, read, update, and delete) operations 286
cryptographic-based approaches 284
CSP (crypto service provider) 253, 255, 257
CSP (crypto service provider)-based trust model 276 – 277
DAC (discretionary access control) 279
privacy-preserving PCA and DCA on 251 – 266
how data is processed inside ML algorithms 6
problem of private data in clear 9
implementing data sanitization operations in Python 189 – 193
storage with NoSQL database 280 – 282
evaluating performance of 171 – 176
data management
    considerations for designing customizable privacy-preserving database system 228 – 231
    how likely to leak private information 222
    threats and vulnerabilities 221 – 222
    modifying data mining output 216 – 220
        association rule hiding 217 – 218
        inference control in statistical databases 219 – 220
        reducing accuracy of data mining operation 218 – 219
    privacy protection beyond k-anonymity 204 – 215
        implementing privacy models with Python 211 – 215
    privacy protection in data processing and mining 203 – 204
data mining 19 – 21, 179 – 201
    different approaches to data publishing 20
    how to protect privacy on data mining algorithms 20 – 21
    importance of privacy preservation in 180 – 182
    modifying data mining output 216 – 220
        association rule hiding 217 – 218
        inference control in statistical databases 219 – 220
        reducing accuracy of data mining operation 218 – 219
    on privacy-preserving data collection 20
    privacy protection in 183 – 184, 203 – 204
        impact of privacy regulatory requirements 184
        what is data mining and how can it help 183
    implementing data sanitization operations in Python 189 – 193
data perturbation approach 226
data processing and mining 204
targeting data confidentiality 223 – 224
targeting data privacy 224 – 225
database systems
    considerations for designing customizable privacy-preserving database system 228 – 231
        implementing fine-grained access control to data 229
        keeping rich set of privacy-related metadata 228
        maintaining privacy-preserving information flow 231
        protecting against insider attacks 231
        supporting attribute-level access control mechanisms 228
    how likely to leak private information 222
    threats and vulnerabilities 221 – 222
        data protection schemes currently employed by industry 221
        privacy assurance as challenge 221 – 222
DataHub
    integrating privacy and security technologies into 280 – 288
        data storage with cloud-based secure NoSQL database 280 – 282
        privacy-preserving data collection with LDP 282 – 284
        privacy-preserving query processing 286 – 287
        using synthetic data generation 287 – 288
    research collaboration workspace 272 – 280
        architectural design 275 – 276
        blending different trust models 276 – 278
        configuring access control mechanisms 278 – 280
    significance of research data protection and sharing platform 270 – 272
        motivation behind DataHub 270 – 271
DCA (discriminant component analysis) 143
    on horizontally partitioned data 251 – 266
        achieving privacy preservation on horizontally partitioned data 253 – 254
        evaluating efficiency and accuracy of 263 – 266
        how privacy-preserving computation works 258 – 263
        overview of proposed approach 256 – 257
        recapping dimensionality reduction approaches 254 – 255
        using additive homomorphic encryption 255
differential privacy. See DP
differentially private distributed PCA (DPDPCA) protocol 85, 285
diffprivlib (IBM's Differential Privacy Library) 41, 67
dimensionality reduction (DR) 238 – 239, 254 – 255
direct encoding (DE) 104 – 110, 140
discretionary access control (DAC) 279
discriminant component analysis. See DCA
DLPA (distributed Laplace perturbation algorithm) 287
downgrading classifier effectiveness 20
DP (differential privacy) 15, 17 – 18, 25 – 55, 234 – 235, 268, 285, 291 – 298
    composition properties of 296 – 298
        sequential composition 296 – 297
    for synthetic data generation 155 – 167
        DP synthetic histogram representation generation 156 – 159
        DP synthetic multi-marginal data generation 162 – 167
        DP synthetic tabular data generation 160 – 162
    formal definition of 291 – 292
    formulating solution for private company scenario 32 – 35
    mechanisms 35 – 48, 292 – 295
        binary mechanism (randomized response) 35 – 37
    composition properties 51 – 55
    group privacy property 50 – 51
    postprocessing property 48 – 49
DPDPCA (differentially private distributed PCA) protocol 85, 285
DR (dimensionality reduction) 238 – 239, 254 – 255
EDBs (encrypted databases) 224
EFB (Equal Frequency Binning) 138
EMD (earth mover distance) 209
European Union’s GDPR (General Data Protection Regulation) 8
EVD (eigenvalue decomposition) 237, 259
EWD (Equal-Width Discretization) 137
feature-level micro-aggregation case study 168 – 176
    evaluating performance of generated synthetic data 171 – 176
        datasets used for experiments 172
        performance evaluation and results 172 – 176
    generating synthetic data 169 – 170
    how data preprocessing works 169 – 170
    using hierarchical clustering and micro-aggregation 168 – 169
FERPA (Family Educational Rights and Privacy Act) 8
fine-grained access control 229
GDPR (General Data Protection Regulation) 8
GEVD (generalized eigenvalue decomposition) 262
grayscale representation scheme 10
group privacy property 50 – 51
hierarchical clustering 168 – 169
HIPAA (Health Insurance Portability and Accountability Act of 1996) 8
histogram
    DP synthetic histogram representation generation 156 – 159
    queries 40 – 41
horizontally partitioned data
    privacy-preserving PCA and DCA on 251 – 266
        achieving privacy preservation on horizontally partitioned data 253 – 254
        evaluating efficiency and accuracy of 263 – 266
        how privacy-preserving computation works 258 – 263
        overview of proposed approach 256 – 257
        recapping dimensionality reduction approaches 254 – 255
        using additive homomorphic encryption 255
identification attacks 16, 225
maintaining privacy-preserving information flow 231
private information sharing vs. privacy concerns 151 – 152
k-anonymity 18 – 19, 193 – 198, 204 – 215, 287
    anonymization beyond 154 – 155
    does not always work 195 – 198
    implementing in Python 198 – 200
    implementing privacy models with Python 211 – 215
    using against re-identification attacks 152 – 154
    what is k and how to apply 194 – 195
l-diversity 18, 205 – 207, 287
LDA (linear discriminant analysis) 59, 255
LDP (local differential privacy) 18, 95 – 122
    scenario with survey 100 – 101
    privacy-preserving data collection with 282 – 284
    randomized response for 101 – 104
linear discriminant analysis (LDA) 59, 255
MAC (mandatory access control) 279
machine learning. See ML
MCDO (multiple-class data owners) 260 – 261
MDA (multiple discriminant analysis) 59
MDAV (maximum distance to average record) 169
MDR (multiclass discriminant ratio) 239, 255
membership inference attacks 13 – 15
MGGM (multivariate Gaussian generative model) 287
minutiae representation scheme 10
ML (machine learning) 3 – 25, 124, 146, 269
    privacy-preserving data mining techniques 19 – 21
    privacy-preserving synthetic data generation 18 – 19
    privacy complications in AI era 4 – 5
    threat of learning beyond intended purpose 5 – 8
        how data is processed inside ML algorithms 6
        importance of privacy protection in ML 7
        regulatory requirements and utility vs. privacy tradeoff 7 – 8
    threats and attacks for 8 – 16
        challenges of privacy protection in big data analytics 15 – 16
        de-anonymization or re-identification attacks 15
        membership inference attacks 13 – 15
        model inversion attacks 12 – 13
        problem of private data in clear 9
accuracy of utility task 246 – 248
MLaaS (Machine Learning as a Service) 4
model inversion attacks 12 – 13
multi-marginal data generation, DP synthetic 162 – 167
multiple-class data owners (MCDO) 260 – 261
multivariate Gaussian generative model (MGGM) 287
non-CSP-based trust model 277 – 278
OLAP (online analytical processing) 225
OSDC (Open Science Data Cloud) 272
OT (oblivious transfer) techniques 254
OUE (optimal unary encoding) 117, 140
parallel composition 54 – 55, 297
PCA (principal component analysis) 143, 237, 284
    on horizontally partitioned data 251 – 266
        achieving privacy preservation on horizontally partitioned data 253 – 254
        evaluating efficiency and accuracy of 263 – 266
        how privacy-preserving computation works 258 – 263
        overview of proposed approach 256 – 257
        recapping dimensionality reduction approaches 254 – 255
        using additive homomorphic encryption 255
PCI DSS (Payment Card Industry Data Security Standard) 8
perturbation-based approaches 285 – 286
phase representation scheme 10
motivation behind DataHub 270 – 271
postprocessing property 48 – 49
PPDM (privacy-preserving data mining) 19
PPE (property-preserving encryption) 224
PPML (privacy-preserving machine learning) 16 – 22, 284 – 286
    cryptographic-based approaches 284
    perturbation-based approaches 285 – 286
    privacy-preserving data mining techniques 19 – 21
        different approaches to data publishing 20
        how to protect privacy on data mining algorithms 20 – 21
        techniques on privacy-preserving data collection 20
    privacy-preserving synthetic data generation 18 – 19
principal component analysis. See PCA
privacy
    privacy-preserving data mining techniques 19 – 21
    privacy-preserving synthetic data generation 18 – 19
    privacy complications in AI era 4 – 5
    threat of learning beyond intended purpose 5 – 8
        how data is processed inside ML algorithms 6
        importance of privacy protection in ML 7
        regulatory requirements and utility vs. privacy tradeoff 7 – 8
    threats and attacks for ML systems 8 – 16
        challenges of privacy protection in big data analytics 15 – 16
        de-anonymization or re-identification attacks 15
        membership inference attacks 13 – 15
        model inversion attacks 12 – 13
        problem of private data in clear 9
implementing data sanitization operations in Python 189 – 193
working with categorical values 191
working with continuous values 191 – 193
implementing in Python 198 – 200
k-anonymity does not always work 195 – 198
what is k and how to apply 194 – 195
Python
    data sanitization operations in 189 – 193
    implementing privacy models with 211 – 215
    k-anonymity implementation in 198 – 200
    sequential composition of 52 – 54
privacy-preserving query processing 286 – 287
query (or data) restriction technique 227
query auditing and restriction 21
randomized response (RR) 18, 101 – 104
RAPPOR (Randomized Aggregatable Privacy-Preserving Ordinal Response) 5, 97
RBAC (role-based access control) 228, 279 – 280
re-identification attacks
    using k-anonymity against 152 – 154
reconstruction (leakage-abuse) attacks 224
    attacker's perspective of 10 – 11
    real-world scenario involving 11 – 12
regulatory requirements 7 – 8, 184
research collaboration workspace 272 – 280
    architectural design 275 – 276
    blending different trust models 276 – 278
        CSP-based trust model 276 – 277
        non-CSP-based trust model 277 – 278
    configuring access control mechanisms 278 – 280
research data protection 270 – 272
    motivation behind DataHub 270 – 271
role-based access control (RBAC) 228, 279 – 280
sanitization operations 189 – 193
    working with categorical values 191
    working with continuous values 191 – 193
SCDO (single-class data owner) 260 – 261
scikit-learn load_digits dataset 75
SDB (statistical database) systems
    inference control in 219 – 220
    privacy-preserving techniques in 225 – 227
sequential composition 51 – 52, 296 – 297
SHE (summation with histogram encoding) 113 – 114, 140
skeleton representation scheme 10
spectral decomposition of the center-adjusted scatter matrix 237
SUE (symmetric unary encoding) 117, 121, 140
suppression technique 152 – 153
SVMs (support vector machines) 62, 172
synthetic data generation 18 – 19, 146 – 176
    application aspects of using for privacy preservation 149 – 150
    assuring privacy via data anonymization 151 – 155
        anonymization beyond k-anonymity 154 – 155
        private information sharing vs. privacy concerns 151 – 152
        using k-anonymity against re-identification attacks 152 – 154
    DP synthetic histogram representation generation 156 – 159
    DP synthetic multi-marginal data generation 162 – 167
    DP synthetic tabular data generation 160 – 162
    private synthetic data release via feature-level micro-aggregation case study 168 – 176
        evaluating performance of generated synthetic data 171 – 176
        generating synthetic data 169 – 170
        using hierarchical clustering and micro-aggregation 168 – 169
synthetic multi-marginal data 163
t-closeness 18, 208 – 211, 287
tabular data generation, DP synthetic 160 – 162
TDE (Transparent Data Encryption) 221
THE (thresholding with histogram encoding) 114 – 117, 140
“This Is Your Digital Life” quiz (Kogan) 4
threats, database systems 221 – 222
TLS (Transport Layer Security) 221
trust models 276 – 278
    CSP-based trust model 276 – 277
    non-CSP-based trust model 277 – 278
utility task feature space 236
VMs (virtual machines) 222, 224, 271
VPNs (virtual private networks) 221
vulnerabilities, database systems 221 – 222
W3C’s (World Wide Web Consortium’s) P3P (Platform for Privacy Preferences Project) 228