Please note that index links point to page beginnings from the print edition. Locations are approximate in e-readers, and you may need to page down one or more times after clicking a link to get to the indexed material.
References to figures are in italics.
$JAVA_HOME, 118
@IF function, 109–110
A
“A Relational Model of Data for Large Shared Data Banks” (Codd), 19
ACID, 242–243
Ada, 34
Advanced Replication, 79
After (A) image, 103
agents, 58
See also gateways
Agile, integrations with Agile development, 182–183
Apache Hadoop. See Hadoop
Apache Oozie, 245
Apache Pig, 245
APEX, 226–230
application integrations, 6
apply (replicat) process, 94–95
architecture
GoldenGate, 91–95
Oracle Data Integrator, 114–118, 147–153
archive logs, 77
ASCII-formatted files, 102–104
asynchronous mode, 79
atomicity, 242
auditing, 223
B
bad files, 49
BASE, 243
Before (B) image, 103
Berkeley DB, 243
Big Data
defined, 234
and GoldenGate, 246–247
and Oracle Data Integrator, 244–245
overview, 234–236
querying of, 20
volume, variety, and velocity (the three V’s), 234–235
Big Data Appliance
Cloudera, 237
NoSQL, 237
Oracle Big Data SQL, 240
Oracle Loader for Hadoop, 238
Oracle R distribution, 238
Oracle SQL Connector for Hadoop, 239–240
Oracle XQuery for Hadoop, 238
overview, 236–237
Big Data Connectors, 244
big endian systems, 74
business purpose, 187–189
C
capture (extract) process, 92–94
CDBs. See multitenant container databases (CDBs)
challenges
business purpose, 187–189
change, 184–187
data problems, 190–192
designing for integrations, 181–184
integrations with Agile development, 182–183
latency, 194–195
managing mapping tables, 198–200
overview, 180–181
performance, 201–202
standardization, 189–190
synchronizing data and copies, 192–194
testing, 200–201
tool issues, 195–197
change, 184–187
Cloudera, 237
Codd, Edgar, 19
common and uniform access, 5
common data, 6
common data storage, 6
Common Infrastructure Object, 130
communication, 11–12
of the business purpose, 187–189
complete refresh, 61
components needed for integration, 8–11
connected users, 55
consistency, 243
consistent naming, 189–190
consolidation use-cases, 89, 90
COPY command, 41–43
Create Table As Select command, 32, 42
CTAS. See Create Table As Select command
Cutting, Doug, 240
D
data classification, 9
data cleansing
automated updates, 210–211
defined, 206–207
governing data sources, 217–221
manual updates, 210–211
steps for, 212–213
using Oracle Data Integrator, 222–225
See also data quality; master data management (MDM)
data context, 218–219
Data Control Language. See DCL
Data Definition Language. See DDL
data distribution use-cases, 90
data integration
components needed for, 8–11
defined, 3–4
designing systems for, 7
history of, 4–6
today, 6–11
data lakes, 235
Data Manipulation Language. See DML
data marts, 5
data merge, 9
data migrations. See migrations
data owners, 11
data pools, 235
data pump (extract) process, 94
Data Pump utility, 69–72
data quality, 10, 190–192, 206
assessing the data, 210
cleansing the data, 210–211
establishing data context, 210
evolving business requirements, 210
monitoring, 211–212
using Oracle Data Integrator, 222–225
See also data cleansing; master data management (MDM)
data replication. See replication
data sources, 217–221
data stewards, 220
data types, 10
database links, 54–57
DataNodes, 241
DBMS_HS_PASSTHROUGH package, 59
DCL, 33
DDL, 31–32
Create Table As Select command, 32
TRUNCATE command, 28
WHERE clause, 32
decision flow chart, 11–13
Definition Generator (DEFGEN) utility, 95–96
configuring, 96–97
running, 98–99
DELETE statements, 27–28
delimiter separated values (DSV) files, 100
designing systems for integration, 7, 181–184
DML
DELETE statements, 27–28
INSERT statements, 24–25
MERGE statements, 28–29
overview, 23
ROLLBACK statements, 30
rollbacks, 29
transactions, 29–31
VALUES clause, 24
WHERE clause, 25–26
durability, 243
E
endianness, 74
ETL, 9
See also ODI agent
event triggers, 40–41
export/import utility
full database export, 68
invoking export from the command line, 65
invoking export interactively from the command line, 65–66
invoking export via parameter files, 67–69
external data sources, 3
Extract, Transform, and Load. See ETL
F
federated queries, 193
File System, 241
fixed user links, 55
fixed-length format, 47
flat files
configuring a flat file physical schema, 148–151
delimiter separated values (DSV) files, 100
extract parameters to write, 101
generating, 100–102
generating an ASCII-formatted file, 102–104
length separated values (LSV) files, 100
megabyte clause, 102
parameters, 102
types of, 100
force logging, 93–94
FORMATASCII parameter, 102, 105
full database export, 68
See also export/import utility
functions, PL/SQL, 34–36
Fusion Middleware Configuration Wizard, 135–140
G
gateways, 58–60
GoldenGate, 14, 82–83, 88, 114
After (A) image, 103
adaptors, 246–247
apply (replicat) process, 94–95
architecture, 91–95
Before (B) image, 103
benefits of using, 88
and Big Data, 246–247
capture (extract) process, 92–94
changing data using @IF function, 109–110
compressed updates (V), 103
creating native database loader files, 104–106
data pump (extract) process, 94
Definition Generator (DEFGEN) utility, 95–99
extracting for database utility usage, 105–106
flat files, 99–104
force logging, 93–94
functions, 107–109
supplemental logging, 93–94
supported format parameters, 104
testing data with, 107–110
use-cases, 88–91
user exit functions, 106
Google Corporation, 241
GROUP BY clause, 21
H
Hadoop, 192
clusters, 20
Oracle Loader for Hadoop, 238
Oracle SQL Connector for Hadoop, 239–240
Oracle XQuery for Hadoop, 238
overview, 240–241
Hadoop Distributed File System, 239, 241
HAVING clause, 21
HBase, 244
heterogeneous data, 191
heterogeneous platforms, 221
Heterogeneous Services, 58
Hive, 245
Hive KMs, 245
Hive Query Language (HQL), 245
HQL, 245
I
import. See export/import utility
incremental refresh, 61
INSERT statements, 24–25
instead-of triggers, 40
integrated knowledge modules (IKMs), 171
See also Knowledge Modules (KMs)
integrating data. See data integration
Integrator Studio. See ODI Studio
internal data, 3
Internet of Things, 234
IoT, 234
isolation, 243
J
Java Development Kit (JDK), 136–137
Java EE agents, 117
Java Message Service, 99
Java Virtual Machine (JVM), 117, 134–135
K
Knowledge Modules (KMs), 244–245
See also integrated knowledge modules (IKMs)
L
latency, 194–195
LCRs. See logical change records (LCRs)
length separated values (LSV) files, 100
little endian systems, 74
Loading Knowledge Modules (LKMs), 244–245
logical change records (LCRs), 80, 83–84
logical schemas, 152–153, 162–163
LogMiner, 76–78
Lovelace, Ada, 34
M
manual integrations, 5
mappings
managing mapping tables, 198–200
running, 172–176
simulation, 174–175
step-by-step execution, 175–176
master data management (MDM), 9, 190, 199, 224
overview, 208–209
process, 209–213
See also data cleansing; data quality; metadata management
master repository, 116
materialized views, 60–64
MDM. See master data management (MDM)
MERGE statements, 28–29
metadata management, 10–11, 198–200
migrations
near-zero-downtime migrations, 89, 90–91
planning for, 185–186
transportable tablespaces, 72–75
multitenant container databases (CDBs), 75–76
multitenant databases. See multitenant container databases (CDBs)
N
NameNode, 241
near-zero-downtime migrations, 89, 90–91
net service name, 56
NonStop, 82
NoSQL
querying of, 20
null, 24
Nutch, 241
O
ODI. See Oracle Data Integrator
ODI agent, 134–141
scripting startup, 141
starting manually, 141
OGG. See GoldenGate
OLTP, 80
online transaction processing. See OLTP
Oozie, 245
Oracle Application Express (APEX), 226–230
Oracle Big Data Appliance. See Big Data Appliance
Oracle Big Data SQL, 240
Oracle Data Integrator, 14, 224
adding a database data model, 163–166
architecture, 114–118, 147–153
and Big Data, 244–245
configuring a flat file physical schema, 148–151
configuring a topology, 147–153
configuring the ODI agent, 134–141
Console, 118
context menus, 154
creating a database logical schema, 162–163
creating a database physical schema, 159–162
creating a new model, 154–155
creating a project, 167–172
as a data flow modeler, 153–166
defining a datastore for a model, 155–158
deploying the binaries, 118–124
designing models, 153–166
initial connection and wallet configuration, 143–146
installation, 118–124
interacting with Oracle Enterprise Manager 12c, 118
Java EE agents, 117
logical architecture, 151–153
mappings, 166–176
master repository, 116
overview, 114
physical architecture, 147–151
preparing the repository, 124–133
repositories, 115–116
Repository Creation Utility (RCU), 124–133
rules for data quality and cleansing, 222–225
running mappings, 172–176
run-time agents, 117
setting connections, 142–143
setting up the target side of the integration, 159–166
standalone agents, 117
standalone co-located agents, 117
starting, 141–146
users, 116
validating a data integration, 176–177
verifying the repository, 133–134
work repository, 115–116
Oracle Enterprise Data Quality, 224
See also data quality
Oracle Enterprise Manager 12c, 118
Oracle Enterprise Metadata Management (OEMM), 200
Oracle GoldenGate. See GoldenGate
Oracle Inventory, 119
Oracle Loader for Hadoop, 238
Oracle Master Data Management, 224
See also master data management (MDM)
Oracle R distribution, 238
Oracle SQL Connector for Hadoop, 239–240
Oracle Streams, 79–82
Oracle Technology Network, 226
Oracle Universal Installer, 119, 120
Oracle Wallet Manager (OWM), 145
Oracle XQuery for Hadoop, 238
ORDER BY clause, 21–22
OUI. See Oracle Universal Installer
outbound servers, 84
P
packages, 37–38
par files. See parameter files
parameter files, 50
PDBs. See pluggable databases
performance, 201–202
physical schemas, 159–162
Pig, 245
planning a data integration, 180–181
anticipating other uses, 184
business purpose, 187–189
for change, 184–187
data quality, 190–192
designing for integrations, 181–184
integrations with Agile development, 182–183
involving the business and data owners, 183
latency, 194–195
managing mapping tables, 198–200
performance, 201–202
reference data, 183
standardization, 189–190
synchronizing data and copies, 192–194
testing, 200–201
tool issues, 195–197
PL/SQL
functions, 34–36
overview, 33–34
packages, 37–38
stored procedures, 36–37
SYSDATE operator, 37
triggers, 38–41
See also SQL
pluggable databases, 54, 75, 144–145
moving, 76
procedures. See stored procedures
PySpark, 245
Q
queries, 19
federated queries, 193
subqueries, 23
R
RCU wizard, 124–133
RDBMS systems, 19
record indicators, 104
redo logs, 77
refreshing materialized views, 61–62
remote databases, 55
hiding, 57
replication, 9
repositories, 115–116, 124–133
verifying, 133–134
Repository Creation Utility (RCU), 124–133
ROLLBACK statements, 30
rollbacks, 29
row triggers, 39
run-time agents, 117
S
scrubbing data. See data cleansing
SELECT statement, 19–23, 35–36
snapshots, 60
See also materialized views
Spark, 245
SPOOL command, 43–45
SQL
DCL, 33
DDL, 31–32
DML, 23–31
external tables, 20
fields, 20
GROUP BY clause, 21
HAVING clause, 21
joins, 22
ORDER BY clause, 21–22
overview, 19
queries, 19
ROWID, 22
SELECT statement, 19–23, 35–36
semicolons, 20
subqueries, 23
tables, 20
views, 20
See also PL/SQL
SQL Developer Data Modeler, 153
SQL Parser, 60
SQL*Loader
control file, 45–49
dealing with the bad file, 49
invoking, 49–50
overview, 45
SQL*Plus, 14
COPY command, 41–43
SPOOL command, 43–45
SQLLOADER, 105
SQLService, 58
Sqoop, 245
standalone agents, 117
standalone co-located agents, 117
standardization, 189–190
standards
field naming for, 215
outliers, 216
when standards don’t work, 215–217
statement triggers, 39–40
store and forward, 79
stored procedures, 36–37
Streams. See Oracle Streams
structured data, 10
Structured Query Language. See SQL
supplemental logging, 78, 93–94
synchronizing data and copies, 192–194
synchronous mode, 79
SYSDATE operator, 37
T
tablespace migrations. See transportable tablespaces
Tandem space, 82
testing, 200–201
with GoldenGate, 107–110
timing, 12–13
tnsname. See TNSNAMES.ora file
TNSNAMES.ora file, 56
tools
combining, 224
data cleansing, 221–225, 226–230
data quality, 219–220
transactions, 29–31
transport databases, 75–76
transportable tablespaces, 72–75
triggers
event triggers, 40–41
instead-of triggers, 40
overview, 38
row triggers, 39
statement triggers, 39–40
TRUNCATE command, 28
U
unidirectional use-cases, 90
unstructured data, 10
UPSERT statement. See MERGE statements
use-cases
data distribution, 90
near-zero-downtime migrations, 89, 90–91
overview, 88–89
unidirectional, 89
users, and Oracle Data Integrator, 116
V
validation. See data validation
value of data, 2–3
volume of data, 192
W
WebLogic Domain, 135
WebLogic Server, 135
WHERE clause, 20–21, 25–26, 32, 35
work repository, 115–116
X
XML, 191
XStream API
overview, 83
XStream Out, 83–84