Note: Page numbers followed by “b”, “f” and “t” indicate boxes, figures and tables respectively.
Alignment Comparator for Multi-valued Attributes (ACMA), 215–217, 216f
Ambiguous representation, 24, 24f
American National Standards Institute (ANSI), 191
Application programming interface (API), 93
Approximate string match (ASM), 47
Jaro String Comparator, 212
Levenshtein edit comparator, 210–211
confirmation assertions, 74–77
correction assertions, 71–74
initial login screen, 79f
Attribute-based resolution
identity capture and update for, 188–190
iterative update process for ER system, 189f
Attribute-level matching, 46
ER and MDM comparators, 47
variation in string values, 47
Automated update process, 66, 67f
clerical review indicators, 67
entity resolution and record linking, 67–68
ER outcome analysis and root cause analysis, 68
quality assurance validation processes, 68
cluster-level review indicators, 69–70
new entity references, 66
pair-level review indicators, 69
managed entity identifiers, 91–92
unmanaged entity identifiers, 91–92
Best record version, 55, 55f
value-added proposition, 14
problem of similarity functions, 152–153
Bring-Your-Own-Identifier (BYOI), 53–54
“Brute force” method, 126
Capture, Store, Resolution, Update, Dispose model (CSRUD model), 28, 161
attribute-based resolution, 188–190
large-scale ER with single match key blocking, 161–163
persistent entity identifiers, 181–182
capture based on match keys transitive closure, 183f
transitive closure of references, 183
single index generator, 162f
transitive closure problem, 163–165
update problem identification, 180–181
attribute
building foundation, 32–33
data matching strategies, 46–50
ER results assessment, 37–46
identity attributes selection, 34–37
equivalent references, 41
fundamental law of ER, 41
references with sets of links, 40t
true and false positives and negatives, 41
Cluster Comparison method, 45–46
True link and ER link, 44, 45t
Capture process implementation, 50
Chief data officer (CDO), 116
Chief information officer (CIO), 116
Clerical review indicators, 67
entity resolution and record linking, 67–68
ER outcome analysis and root cause analysis, 68
quality assurance validation processes, 68
Closed universe models, 99–100
Cluster Comparison method, 45–46
Cluster-level matching, 50
Cluster-level review indicators, 69–70
Cluster-to-cluster classification, 122, 126
record-based projection, 123
reference-to-cluster
unique reference assumption, 125–126
Common Object Request Broker Architecture (CORBA), 94
Compressed Document Set Architecture (CoDoSA), 163
depth and degree of match, 97–99
Confirmation assertions, 74
reference-to-reference assertion, 76, 77f
reference-to-structure assertion, 77, 77f
Conformance to data specifications, 199–200
message and supporting references, 201
message referencing data specification, 201f
multiple-record schema, 200f
single-record message structure, 200f
Correction assertions, 71
reference-transfer assertion, 74, 74f
structure-split assertion, 72, 73f
synchronization of identifiers, 73
structure-to-structure assertion, 71, 72f
set of assertion transactions, 72
Critical data elements (CDEs), 34
automated update configuration, 180–181
update problem identification, 180–181
Customer data integration (CDI), 55
Customer relationship management (CRM), 6–7, 55
Customer satisfaction, 6–8
Data
Data governance program (DG program), 9–10
data stewardship model, 10
Data matching strategies, 46
attribute-level matching, 46
ER and MDM comparators, 47
variation in string values, 47
capture process implementation, 50
cluster-level matching, 50
reference-level matching, 47
asserted resolution, 71–77
automated update process, 66–70
EIS visualization tools, 77–83
entity identifiers management, 84–87
root cause of information quality issues, 65
Data warehousing (DW), 6–7
Database administrator (DBA), 9–10
Dedicated MDM systems, 55–58
Depth and degree of match, 97–99
Distributed resolution, 165
references and match keys as graph, 166–167
transitive closure as graph problem, 165–166
Electronic Commerce Code Management Association (ECCMA), 191
Entity identifiers management, 84
problem of association information latency, 84–85
ER and data structures
identity information
life cycle management models, 27
“matching” records
“merge-purge” operation
OYSTER open source ER system
time aspect
ambiguous representation, 24, 24f
culture and expectation, 25
incomplete state, 25, 26f
MDM
meaningless state, 25, 25f
duplicate record filter, 57
dedicated MDM systems, 55–58
with duplicate record filter, 57f
with exemplar record, 58f
with record filter and exemplar record, 58f
storing vs. sharing, 59–60
survivor record strategy, 55
best record version, 55, 55f
visualization tools, 77–78
assertion management, 78–80
negative resolution review mode, 81–82, 83f
positive resolution review mode, 83, 85f
appropriate algorithm selection, 126–145
cluster-to-cluster classification, 122–126
comparators
phonetic comparators, 218
with consistent classification, 5f
de-duplication applications, 3–4
exact match and standardization, 207
overcoming variation in string values, 208–209
scanning comparators, 209
information quality
key data cleansing process
outcomes measurements, 42
results assessment, 37–46
Entity-relation database model (E-R database model), 11
entity-based data integration, 6–8
eXtensible Business Reporting Language (XBRL), 197
Extensible markup language (XML), 191
External reference architecture, 60–61, 61f
Fellegi-Sunter Theory of Record Linking, 67–68, 105
context and constraints of record linkage, 105–106
fundamental Fellegi-Sunter theorem, 108–110
attribute level weights and, 110–111
frequency-based weights and, 112
Frequency-based weights, 112
Garbage-in-garbage-out rule (GIGO rule), 92
Global Justice XML Data Model (GJXML), 197
Hadoop Map/Reduce framework, 161–162
IAIDQ Domains of Information Quality, 192
Identification Guide (IG), 203
Identity, internal vs. external view, 19–20
occupancy history, 20, 20f
internal view of identity, 20
primary identity attributes, 34–35
supporting identity attributes, 35
interactive identity resolution, 92–93, 93f
Identity Visualization System (IVS), 78, 79f
Incomplete state, 25, 26f
Information Quality Certified Professional (IQCP), 192
Information retrieval (IR), 155
Interactive identity resolution, 92–93, 93f
International Association for Information and Data Quality (IAIDQ), 192
International Organization for Standardization (ISO), 191
data quality vs. information quality, 191–193
equivalent references, 41
fundamental law of ER, 41
references with sets of links, 40t
true and false positives and negatives, 41
ISO 8000-110 standard, 191
conformance to data specifications, 199–202
general requirements, 196
message referencing a data specification, 201f
multiple-record schema, 200f
single-record message structure, 200f
ISO 22745 standard industrial systems and integration, 203
simple and strong compliance with, 202–203
unambiguous and portable data, 193
Jaro String Comparator, 212
Key-value pairs, decoding, 163
“Large entity” problem, 150
Large-scale ER with single match key blocking, 161
decoding key-value pairs, 163
Hadoop Map/Reduce framework, 162
single index generator, 162f
Latent semantic analysis, 218
Levenshtein edit comparator, 210–211
Levenshtein Edit Distance comparator, 47
Managed entity identifiers, 91–92
Master data
Master data management (MDM), 1–4
external reference architecture, 60–61, 61f
reconciliation engine, 63
registry architecture, 61–63
transaction hub architecture, 63–64
business case for
better service
cost reduction of poor data quality
customer satisfaction and entity-based data integration, 6–8
data stewardship model, 10
policies
system using background and foreground operations, 59f
closed universe models, 99–100
preresolution blocking with multiple, 154–155
problem of similarity functions, 152–153
“Matching” records
Meaningless state, 25, 25f
Merge-purge
Metadata
Multiple-index resolution, 165
references and match keys as graph, 166–167
transitive closure as graph problem, 165–166
Natural language processing (NLP), 14
Negative resolution review mode, 81–82, 83f
North Atlantic Treaty Organization (NATO), 193, 203
Occupancy history, 20, 20f
Once-and-Done MDM (O&D MDM), 54
Open Technical Dictionary (OTD), 203
OYSTER open source ER system, 7f
Pair-level review indicators, 69
Phonetic comparators, 218
Phonetic encoding algorithms, 151
Point-of-sale (POS), 92–93
Positive resolution review mode, 83, 85f
Preprocess standardization, 207–208
Primary identity attributes, 34–35
q-Gram Tetrahedral Ratio algorithm (qTR algorithm), 211–212
Radio frequency tag identification (RFID), 54
Reconciliation engine, 63
Record-based projection, 123, 165
references and match keys as graph, 166–167
transitive closure as graph problem, 165–166
Reference
Reference data management (RDM)
Reference-level matching, 47
Reference-to-cluster classification, 124–125
Reference-to-reference assertion, 76, 77f
Reference-to-structure assertion, 77, 77f
Reference-transfer assertion, 74, 74f
Registry architecture, 61
trusted broker architecture, 62
Representational State Transfer (REST), 94
Return-on-investment (ROI), 11
Root mean square (RMS), 216
Scanning comparators, 209
attribute level weights and, 110–111
frequency-based weights and, 112
Service level agreement (SLA), 89–90, 196
Shannon’s Schematic for Communication, 18
Social security number (SSN), 34–35, 158
Software-as-a-service (SaaS), 10
Soundex algorithm, 47, 218
Structured query language (SQL), 179
Structure-split assertion, 72, 73f
synchronization of identifiers, 73
Structure-to-structure assertion, 71, 72f
set of assertion transactions, 72
Supporting identity attributes, 35
Survivor record strategy, 55
best record version, 55, 55f
Systems of record (SOR)
Taguchi’s Loss Function
Talburt-Wang Index (TWi), 43–44
True link and ER link, 44, 45t
Technical Committee (TC), 191
term frequency-inverse document frequency (tf-idf), 214
Theoretical foundations
Fellegi-Sunter Theory Of Record Linkage, 105–112
Transaction hub architecture, 63–64
iterative, nonrecursive algorithm for, 167–168
distributed processing, 168
Hadoop implementation example, 175–177
match key generators, 164
Trusted broker architecture, 62
U.S. Technical Advisory Group (TAG), 191
Uniform resource identifiers (URI), 198
Universal Product Code (UPC), 19–20
Unmanaged entity identifiers, 91–92
Variation in string values, 208–209
Very large database system (VLDBS), 59–60