ABAC (attribute-based access control) 229, 279
access control
    supporting attribute-level 228
additive homomorphic encryption 255
agglomerative hierarchical clustering 168
AI (artificial intelligence) 3 – 5
AMI (adjusted mutual information) score 75
anonymization 151 – 155
    private information sharing vs. privacy concerns 151 – 152
    using k-anonymity against re-identification attacks 152 – 154
association rule hiding 20, 217 – 218
attacks
    challenges of privacy protection in big data analytics 15 – 16
    de-anonymization or re-identification attacks 15
    membership inference attacks 13 – 15
    model inversion attacks 12 – 13
    targeting data confidentiality 223 – 224
    targeting data privacy 224 – 225
    attacker's perspective of 10 – 11
    real-world scenario involving 11 – 12
attribute-based access control (ABAC) 229, 279
attribute-level access control 228
binary mechanism (randomized response) 35 – 37
CA (continuous authentication) applications 256
composition properties 51 – 55, 296 – 298
    parallel composition 54 – 55, 297
    sequential composition 51 – 52, 296 – 297
compressive privacy 21 – 22, 237
confidentiality of data 223 – 224
sequential composition of 52 – 54
CP (compressive privacy) 233 – 267, 269
    other dimensionality reduction methods 238 – 239
    privacy-preserving PCA and DCA on horizontally partitioned data 251 – 266
        achieving privacy preservation on horizontally partitioned data 253 – 254
        evaluating efficiency and accuracy of 263 – 266
        how privacy-preserving computation works 258 – 263
        overview of proposed approach 256 – 257
        recapping dimensionality reduction approaches 254 – 255
        using additive homomorphic encryption 255
    using for ML applications 239 – 251
        accuracy of utility task 246 – 248
        implementing compressive privacy 240 – 246
CRUD (create, read, update, and delete) operations 286
cryptographic-based approaches 284
CSP (crypto service provider) 253, 255, 257
CSP (crypto service provider)-based trust model 276 – 277
DAC (discretionary access control) 279
privacy-preserving PCA and DCA on 251 – 266
how data is processed inside ML algorithms 6
problem of private data in clear 9
implementing data sanitization operations in Python 189 – 193
storage with NoSQL database 280 – 282
evaluating performance of 171 – 176
data management
    considerations for designing customizable privacy-preserving database system 228 – 231
    how likely to leak private information 222
    threats and vulnerabilities 221 – 222
    modifying data mining output 216 – 220
        association rule hiding 217 – 218
        inference control in statistical databases 219 – 220
        reducing accuracy of data mining operation 218 – 219
    privacy protection beyond k-anonymity 204 – 215
        implementing privacy models with Python 211 – 215
    privacy protection in data processing and mining 203 – 204
data mining 19 – 21, 179 – 201
    different approaches to data publishing 20
    how to protect privacy on data mining algorithms 20 – 21
    importance of privacy preservation in 180 – 182
    modifying data mining output 216 – 220
        association rule hiding 217 – 218
        inference control in statistical databases 219 – 220
        reducing accuracy of data mining operation 218 – 219
    on privacy-preserving data collection 20
    privacy protection in 183 – 184, 203 – 204
        impact of privacy regulatory requirements 184
        what is data mining and how can it help 183
    implementing data sanitization operations in Python 189 – 193
data perturbation approach 226
data processing and mining 204
targeting data confidentiality 223 – 224
targeting data privacy 224 – 225
database systems
    considerations for designing customizable privacy-preserving database system 228 – 231
        implementing fine-grained access control to data 229
        keeping rich set of privacy-related metadata 228
        maintaining privacy-preserving information flow 231
        protecting against insider attacks 231
        supporting attribute-level access control mechanisms 228
    how likely to leak private information 222
    threats and vulnerabilities 221 – 222
        data protection schemes currently employed by industry 221
        privacy assurance as challenge 221 – 222
DataHub
    integrating privacy and security technologies into 280 – 288
        data storage with cloud-based secure NoSQL database 280 – 282
        privacy-preserving data collection with LDP 282 – 284
        privacy-preserving query processing 286 – 287
        using synthetic data generation 287 – 288
    research collaboration workspace 272 – 280
        architectural design 275 – 276
        blending different trust models 276 – 278
        configuring access control mechanisms 278 – 280
    significance of research data protection and sharing platform 270 – 272
        motivation behind DataHub 270 – 271
DCA (discriminant component analysis) 143
    on horizontally partitioned data 251 – 266
        achieving privacy preservation on horizontally partitioned data 253 – 254
        evaluating efficiency and accuracy of 263 – 266
        how privacy-preserving computation works 258 – 263
        overview of proposed approach 256 – 257
        recapping dimensionality reduction approaches 254 – 255
        using additive homomorphic encryption 255
differential privacy. See DP
differentially private distributed PCA (DPDPCA) protocol 85, 285
diffprivlib (IBM's Differential Privacy Library) 41, 67
dimensionality reduction (DR) 238 – 239, 254 – 255
direct encoding (DE) 104 – 110, 140
discretionary access control (DAC) 279
discriminant component analysis. See DCA
DLPA (distributed Laplace perturbation algorithm) 287
downgrading classifier effectiveness 20
DP (differential privacy) 15, 17 – 18, 25 – 55, 234 – 235, 268, 285, 291 – 298
    composition properties of 296 – 298
        sequential composition 296 – 297
    for synthetic data generation 155 – 167
        DP synthetic histogram representation generation 156 – 159
        DP synthetic multi-marginal data generation 162 – 167
        DP synthetic tabular data generation 160 – 162
    formal definition of 291 – 292
    formulating solution for private company scenario 32 – 35
    mechanisms 35 – 48, 292 – 295
        binary mechanism (randomized response) 35 – 37
    composition properties 51 – 55
    group privacy property 50 – 51
    postprocessing property 48 – 49
DPDPCA (differentially private distributed PCA) protocol 85, 285
DR (dimensionality reduction) 238 – 239, 254 – 255
EDBs (encrypted databases) 224
EFB (Equal Frequency Binning) 138
EMD (earth mover distance) 209
European Union’s GDPR (General Data Protection Regulation) 8
EVD (eigenvalue decomposition) 237, 259
EWD (Equal-Width Discretization) 137
feature-level micro-aggregation case study 168 – 176
    evaluating performance of generated synthetic data 171 – 176
        datasets used for experiments 172
        performance evaluation and results 172 – 176
    generating synthetic data 169 – 170
    how data preprocessing works 169 – 170
    using hierarchical clustering and micro-aggregation 168 – 169
FERPA (Family Educational Rights and Privacy Act) 8
fine-grained access control 229
GDPR (General Data Protection Regulation) 8
GEVD (generalized eigenvalue decomposition) 262
grayscale representation scheme 10
group privacy property 50 – 51
hierarchical clustering 168 – 169
HIPAA (Health Insurance Portability and Accountability Act of 1996) 8
histogram
    DP synthetic histogram representation generation 156 – 159
    queries 40 – 41
horizontally partitioned data
    privacy-preserving PCA and DCA on 251 – 266
        achieving privacy preservation on horizontally partitioned data 253 – 254
        evaluating efficiency and accuracy of 263 – 266
        how privacy-preserving computation works 258 – 263
        overview of proposed approach 256 – 257
        recapping dimensionality reduction approaches 254 – 255
        using additive homomorphic encryption 255
identification attacks 16, 225
maintaining privacy-preserving information flow 231
private information sharing vs. privacy concerns 151 – 152
k-anonymity 18 – 19, 193 – 198, 204 – 215, 287
    anonymization beyond 154 – 155
    does not always work 195 – 198
    implementing in Python 198 – 200
    implementing privacy models with Python 211 – 215
    using against re-identification attacks 152 – 154
    what is k and how to apply 194 – 195
l-diversity 18, 205 – 207, 287
LDA (linear discriminant analysis) 59, 255
LDP (local differential privacy) 18, 95 – 122
    scenario with survey 100 – 101
    privacy-preserving data collection with 282 – 284
    randomized response for 101 – 104
linear discriminant analysis (LDA) 59, 255
MAC (mandatory access control) 279
machine learning. See ML
MCDO (multiple-class data owners) 260 – 261
MDA (multiple discriminant analysis) 59
MDAV (maximum distance to average record) 169
MDR (multiclass discriminant ratio) 239, 255
membership inference attacks 13 – 15
MGGM (multivariate Gaussian generative model) 287
minutiae representation scheme 10
ML (machine learning) 3 – 25, 124, 146, 269
    privacy-preserving data mining techniques 19 – 21
    privacy-preserving synthetic data generation 18 – 19
    privacy complications in AI era 4 – 5
    threat of learning beyond intended purpose 5 – 8
        how data is processed inside ML algorithms 6
        importance of privacy protection in ML 7
        regulatory requirements and utility vs. privacy tradeoff 7 – 8
    threats and attacks for 8 – 16
        challenges of privacy protection in big data analytics 15 – 16
        de-anonymization or re-identification attacks 15
        membership inference attacks 13 – 15
        model inversion attacks 12 – 13
        problem of private data in clear 9
accuracy of utility task 246 – 248
MLaaS (Machine Learning as a Service) 4
model inversion attacks 12 – 13
multi-marginal data generation, DP synthetic 162 – 167
multiple-class data owners (MCDO) 260 – 261
multivariate Gaussian generative model (MGGM) 287
non-CSP-based trust model 277 – 278
OLAP (online analytical processing) 225
OSDC (Open Science Data Cloud) 272
OT (oblivious transfer) techniques 254
OUE (optimal unary encoding) 117, 140
parallel composition 54 – 55, 297
PCA (principal component analysis) 143, 237, 284
    on horizontally partitioned data 251 – 266
        achieving privacy preservation on horizontally partitioned data 253 – 254
        evaluating efficiency and accuracy of 263 – 266
        how privacy-preserving computation works 258 – 263
        overview of proposed approach 256 – 257
        recapping dimensionality reduction approaches 254 – 255
        using additive homomorphic encryption 255
PCI DSS (Payment Card Industry Data Security Standard) 8
perturbation-based approaches 285 – 286
phase representation scheme 10
motivation behind DataHub 270 – 271
postprocessing property 48 – 49
PPDM (privacy-preserving data mining) 19
PPE (property-preserving encryption) 224
PPML (privacy-preserving machine learning) 16 – 22, 284 – 286
    cryptographic-based approaches 284
    perturbation-based approaches 285 – 286
    privacy-preserving data mining techniques 19 – 21
        different approaches to data publishing 20
        how to protect privacy on data mining algorithms 20 – 21
        techniques on privacy-preserving data collection 20
    privacy-preserving synthetic data generation 18 – 19
principal component analysis. See PCA
privacy
    privacy-preserving data mining techniques 19 – 21
    privacy-preserving synthetic data generation 18 – 19
    privacy complications in AI era 4 – 5
    threat of learning beyond intended purpose 5 – 8
        how data is processed inside ML algorithms 6
        importance of privacy protection in ML 7
        regulatory requirements and utility vs. privacy tradeoff 7 – 8
    threats and attacks for ML systems 8 – 16
        challenges of privacy protection in big data analytics 15 – 16
        de-anonymization or re-identification attacks 15
        membership inference attacks 13 – 15
        model inversion attacks 12 – 13
        problem of private data in clear 9
implementing data sanitization operations in Python 189 – 193
working with categorical values 191
working with continuous values 191 – 193
implementing in Python 198 – 200
k-anonymity does not always work 195 – 198
what is k and how to apply 194 – 195
Python
    data sanitization operations in 189 – 193
    implementing privacy models with 211 – 215
    k-anonymity implementation in 198 – 200
    sequential composition of 52 – 54
privacy-preserving query processing 286 – 287
query (or data) restriction technique 227
query auditing and restriction 21
randomized response (RR) 18, 101 – 104
RAPPOR (Randomized Aggregatable Privacy-Preserving Ordinal Response) 5, 97
RBAC (role-based access control) 228, 279 – 280
re-identification attacks
    using k-anonymity against 152 – 154
reconstruction (leakage-abuse) attacks 224
    attacker's perspective of 10 – 11
    real-world scenario involving 11 – 12
regulatory requirements 7 – 8, 184
research collaboration workspace 272 – 280
    architectural design 275 – 276
    blending different trust models 276 – 278
        CSP-based trust model 276 – 277
        non-CSP-based trust model 277 – 278
    configuring access control mechanisms 278 – 280
research data protection 270 – 272
    motivation behind DataHub 270 – 271
role-based access control (RBAC) 228, 279 – 280
sanitization operations 189 – 193
    working with categorical values 191
    working with continuous values 191 – 193
SCDO (single-class data owner) 260 – 261
scikit-learn load_digits dataset 75
SDB (statistical database) systems
    inference control in 219 – 220
    privacy-preserving techniques in 225 – 227
sequential composition 51 – 52, 296 – 297
SHE (summation with histogram encoding) 113 – 114, 140
skeleton representation scheme 10
spectral decomposition of the center-adjusted scatter matrix 237
SUE (symmetric unary encoding) 117, 121, 140
suppression technique 152 – 153
SVMs (support vector machines) 62, 172
synthetic data generation 18 – 19, 146 – 176
    application aspects of using for privacy preservation 149 – 150
    assuring privacy via data anonymization 151 – 155
        anonymization beyond k-anonymity 154 – 155
        private information sharing vs. privacy concerns 151 – 152
        using k-anonymity against re-identification attacks 152 – 154
    DP synthetic histogram representation generation 156 – 159
    DP synthetic multi-marginal data generation 162 – 167
    DP synthetic tabular data generation 160 – 162
    private synthetic data release via feature-level micro-aggregation case study 168 – 176
        evaluating performance of generated synthetic data 171 – 176
        generating synthetic data 169 – 170
        using hierarchical clustering and micro-aggregation 168 – 169
synthetic multi-marginal data 163
t-closeness 18, 208 – 211, 287
tabular data generation, DP synthetic 160 – 162
TDE (Transparent Data Encryption) 221
THE (thresholding with histogram encoding) 114 – 117, 140
“This Is Your Digital Life” quiz (Kogan) 4
threats, database systems 221 – 222
TLS (Transport Layer Security) 221
trust models 276 – 278
    CSP-based trust model 276 – 277
    non-CSP-based trust model 277 – 278
utility task feature space 236
VMs (virtual machines) 222, 224, 271
VPNs (virtual private networks) 221
vulnerabilities, database systems 221 – 222
W3C’s (World Wide Web Consortium’s) P3P (Platform for Privacy Preferences Project) 228