Chapter 7. Surrounding Oracle with open source software


In addition to Oracle workloads, IBM z Systems offers you the opportunity to combine the best of your enterprise databases and open source technology, including several open source database managers that are supported on IBM z Systems.

This chapter includes the following topics:

•Open source database managers on IBM z Systems

•Summary

7.1 Open source database managers on IBM z Systems

The question is often asked, “Why are open source database managers gaining popularity?”

Through mergers and acquisitions, experimentation and IT trends, most IT shops seem to have more than one database model and more than one database manager. The innovation in the open source software community created experimentation and even production use of open source database managers (OSDBMs) to handle new data formats that are driven by cloud computing, mobile computing, and the API economy.

Most information technology organizations run their OSDBMs on a separate system from their production Oracle databases. However, this configuration is not necessary with IBM z Systems. IBM z Systems allows organizations to securely run multiple database managers and database models on the same system, without compromising service level agreements for response time and availability.

This section introduces several popular OSDBMs that can be deployed on IBM z Systems alongside your production Oracle databases. Each OSDBM that is described in this chapter was selected based on the significant potential business value from being deployed on IBM z Systems.

7.1.1 Database types

The reason for having multiple database managers is not only to serve different applications with different sets of data and different requirements for qualities of service. Some types of data are not well-managed by traditional data models.

Several models are available for storing and managing data. Today, database models and how programs access the data are in two broad categories: SQL (or relational) and NoSQL (Not only SQL or non-relational).

SQL

This section describes relational database managers, which use structured query language (SQL) as the programming language for inserting, deleting, updating, and querying data. This relational model for database management was invented by E.F. Codd of IBM’s San Jose Research Laboratory and is the theoretical basis for relational databases, such as the Oracle Database and other proprietary or closed-source SQL relational database managers that run on IBM z Systems, such as IBM DB2®, IBM Informix®, Microsoft SQL Server, and SAP Adaptive Server Enterprise (formerly Sybase Adaptive Server Enterprise and Sybase SQL Server).

Many other open source relational database managers run on IBM z Systems, including the following examples:

•Apache Derby (previously IBM Cloudscape)

•MariaDB

•MySQL (owned by Oracle)

•PostgreSQL (object-relational)

NoSQL

This category describes databases that were created to meet the demands of the Web 2.0 era, including modern applications that generate large volumes of structured, semi-structured, or unstructured data. NoSQL databases help companies manage this type of data through a flexible schema that improves agility and performance, often at the cost of relaxing the Atomicity, Consistency, Isolation, and Durability (ACID) guarantees that relational databases provide for transactions.

Therefore, depending on your business needs, data can be stored in document-oriented databases, graph databases, wide column databases, or key-value databases, which are the main types of NoSQL databases.

Several closed-source non-relational databases run on IBM z Systems, including IBM Domino® server, IBM FileNet® ECM, and Software AG Adabas. The following open source NoSQL database managers run on IBM z Systems:

•Alfresco

•Apache Cassandra

•Apache Geode

•Apache ZooKeeper (key-value store)

•Apache CouchDB (document-oriented DB)

•Couchbase (formerly Membase, a document-oriented DB)

•MongoDB

•Neo4j (graph database)

•PostgreSQL (PostgreSQL also supports several NoSQL models)

•Redis (key-value store)

Two of these OSDBMs (PostgreSQL and MongoDB) are described next.

7.1.2 PostgreSQL

PostgreSQL is an open source object-relational database management system (ORDBMS). It conforms to current SQL standards and ensures the ACID database properties. This ORDBMS features a large and active community of contributors and is used by companies of all sizes around the world.

Many companies choose to use PostgreSQL to build their IT solutions for the following reasons:

•Performance, robustness, and stability

•Standards-compliance to ensure data reliability and integrity

•Optimizing database cost savings

•Compatibility with the main open source and proprietary databases

•Reuse skills from other databases

Added value of running PostgreSQL on IBM z Systems

IBM z Systems capabilities improve PostgreSQL performance and maximize the security of sensitive data, including advanced encryption features. In addition to a significant improvement of the throughput compared to alternative platforms, IBM z Systems vertical scaling reduces the complexity of managing different boxes and enhances business agility.

IBM works closely with the PostgreSQL open source community to port and validate PostgreSQL packages on IBM z Systems while ensuring that PostgreSQL uses all IBM z Systems capabilities. In addition, a partnership between IBM and 2ndQuadrant provides PostgreSQL enterprise support for clients (PostgreSQL running on RHEL or SUSE). For more information about this support, see the 2ndQuadrant website.

Where to obtain PostgreSQL open source packages

On IBM z Systems, PostgreSQL can be run on RHEL, SUSE, or Ubuntu. Table 7-1 lists the PostgreSQL open source packages that are available for IBM z Systems. By using this information, you can get the PostgreSQL open source packages that are tested and validated by the IBM open source team, or instructions to build the binary files.

Table 7-1 PostgreSQL open source packages

Linux distribution	PostgreSQL 9.6 packages	PostgreSQL 9.4 packages
RHEL	Link for building PostgreSQL for the three distributions	RHEL6, RHEL7
SUSE	Link for building PostgreSQL for the three distributions	SUSE Linux Enterprise Server 12, SUSE Linux Enterprise Server 11
Ubuntu	Link for building PostgreSQL for the three distributions	Ubuntu 16: Available by default in Ubuntu distribution

7.1.3 MongoDB overview

MongoDB is a document-oriented NoSQL database. It stands out from relational databases by using dynamic schemas, with which records can be inserted without first creating a schema to define the data structure. Therefore, fields and their values can be easily modified to reflect application changes without interruption. This OSDBM is widely used for mobile apps, real-time analytics, product catalogs, and content management systems. It can also be used for many other use cases, such as storing streams of data from Internet of Things (IoT) devices.

MongoDB stores data in JSON-like documents (aligning data storage formats with modern programming languages that are used by developers) instead of the columns and rows of relational databases. Each document can contain different fields, and each field holds a value of a given data type, including subdocuments and arrays.

Instead of representing related data in different tables, MongoDB stores linked objects in the same document, which eliminates the cost of joining separate tables. This configuration reduces complexity and simplifies data access. Similar documents are organized into collections in the database.

Figure 7-1 shows the difference between MongoDB document and relational data models.

Figure 7-1 MongoDB document model versus relational data model

Table 7-2 lists some of the concepts that are used in MongoDB and their counterparts in relational databases, such as Oracle Database.

Table 7-2 MongoDB versus Oracle database

MongoDB	Oracle database
JSON-like document	Row
Collection	Table
Embedded subdocuments and Linking	Join
Index	Index
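
The mapping in Table 7-2 can be illustrated with a minimal sketch in plain Scala. It is not MongoDB or Oracle code: the case classes, the sample customers, and the function names are hypothetical and exist only to contrast rows that are linked by a key with a document that embeds its related data.

```scala
// Relational shape: two flat "tables" whose rows are linked by customerId.
case class CustomerRow(customerId: Int, name: String)
case class AddressRow(customerId: Int, city: String)

// Document shape: the related addresses are embedded as an array of subdocuments.
case class Address(city: String)
case class CustomerDoc(customerId: Int, name: String, addresses: Seq[Address])

// Hypothetical sample data for the relational model.
val customerTable = Seq(CustomerRow(1, "Ada"), CustomerRow(2, "Grace"))
val addressTable = Seq(AddressRow(1, "Paris"), AddressRow(1, "Lyon"), AddressRow(2, "Nice"))

// The same data in the document model: one collection, one document per customer.
val customerCollection = Seq(
  CustomerDoc(1, "Ada", Seq(Address("Paris"), Address("Lyon"))),
  CustomerDoc(2, "Grace", Seq(Address("Nice")))
)

// Relational read: a join on customerId is needed to reassemble one customer.
def citiesByJoin(id: Int): Seq[String] =
  for {
    c <- customerTable if c.customerId == id
    a <- addressTable if a.customerId == c.customerId
  } yield a.city

// Document read: a single lookup returns the customer with its embedded addresses.
def citiesByDocument(id: Int): Seq[String] =
  customerCollection.find(_.customerId == id).toSeq.flatMap(_.addresses.map(_.city))
```

Both functions return the same cities for a customer. The difference is that the document read touches one document, while the relational read must match rows across two tables.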

Added value of running MongoDB on IBM z Systems

With IBM z Systems, many cores can be used (scale 1 - 141 cores), which allows MongoDB to scale vertically (up to a 2 TB single-node MongoDB) without sharding, the horizontal scaling technique that splits a large amount of data across different servers. Scaling vertically removes the overhead of sharding and reduces the complexity of managing different MongoDB shards.

When shards are used, the network latency between them is also drastically reduced (to near zero) by IBM HiperSockets technology, which IBM z Systems uses to accelerate communication between LPARs.

The enterprise edition of MongoDB (called MongoDB Enterprise Advanced) is supported on IBM z Systems.

In addition, the capabilities of IBM z Systems in terms of security (EAL5+ certification), high availability, and resiliency can be used to enhance MongoDB’s capabilities and meet the most stringent market requirements.

Obtaining MongoDB packages

For more information, see the following resources:

•The enterprise edition of MongoDB can be downloaded from the MongoDB Download Center website.

At the website, click the Enterprise Server tab. Then, find your Linux distribution (s390x) by using the Platforms drop-down list and click Download.

•The open source packages that are tested and validated by the IBM open source team and instructions to build the binaries can be found at the Building MongoDB website.

7.1.4 Integrating Apache Spark with Oracle and MongoDB databases that are running on IBM z Systems

IBM z Systems can be used to run open source databases on the same server as enterprise databases. At the same time, the same quality of service can be maintained and client requirements can be met in terms of performance, high availability, data security, and cost optimization.

One important need for every company is to extend the value of its internal and external data by extracting useful information at the correct time. Analytics, the data analysis process that meets this need, is necessary for the survival of any business.

As a result of this critical business need, IBM z Systems provides support for one of the most popular open source analytics frameworks, which is called Apache Spark. With Apache Spark on IBM z Systems, organizations can implement their critical analytics use cases without moving sensitive data, while taking advantage of IBM z Systems capabilities that are coupled with a powerful analytics framework. This combination simplifies and accelerates data analysis, especially when the different databases are running on the same server.

In this section, we consider a sample analytics use case that involves a data integration challenge for a client that is running Oracle and MongoDB databases. This use case shows how Apache Spark on IBM z Systems can help you perform analytics by easily combining and integrating SQL and NoSQL without any external data movement.

This example is intended for illustrative purposes only. You might be able to build your own use cases by mixing different sources of data from other databases that have not been mentioned here, with the possibility of including external data; for example, social data, depending on your business needs.

The following topics are described next:

•An overview of Apache Spark

•Use case overview: Integration between Oracle data and MongoDB data

•Use case implementation by using Apache Spark

Apache Spark overview

Apache Spark is an open source, in-memory cluster computing engine for fast and efficient execution of complex analytics. It includes the following features:

•Parallel processing compute engine: Spark core engine with high-level APIs in Scala, Java, Python, and R.

•Integrated tools (see Figure 7-2) for SQL and structured data processing, machine learning (MLlib), graph processing, and streaming.

Figure 7-2 shows a high-level overview of Apache Spark.

Figure 7-2 High-level overview of Apache Spark

Use case overview: Integration between Oracle data and MongoDB data

As a solution to new business needs, some clients choose to combine SQL and NoSQL data stores. In this example, the client chose to use MongoDB for its new web application to store the multiple types of products that are presented on the website. This configuration benefits from MongoDB’s simple and flexible data model, which is a better fit for building e-commerce product catalogs than an RDBMS. Oracle Database is still used as a system of record to perform complex transactions, such as order management.

For example, to obtain details about the type of product that is most purchased by women during a certain period for each region, the data analysis must go through Oracle tables, such as the orders table, and MongoDB documents, such as the products collection. This process appears simple, but it is meticulous work, and the analytics results can take a long time to arrive. Therefore, how can Apache Spark help to perform this data analytics with the best performance and the least effort possible?

Use case implementation by using Apache Spark

Among the concepts that Apache Spark uses to load data from different sources is the DataFrame. A DataFrame is an immutable distributed collection of data that is organized into named columns and partitioned across nodes in a cluster. It benefits from Spark SQL’s optimized execution engine and provides a relational view of data for analysis, which is especially useful for people with SQL skills.

In this case, you can use DataFrames to load the different data from Oracle and MongoDB databases that are running on IBM z Systems. After the DataFrames are created, you can easily use the different algorithms and high-level operators that are provided by Apache Spark to transform these DataFrames (map, filter, join, and so on).

Operations, such as complex joins between data from MongoDB and Oracle, can be performed in a short time without the complications and costs that are related to data movement between different physical servers. These operations also benefit from the in-memory processing of Apache Spark.
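
The following minimal sketch shows what such a filter, join, and aggregation might look like for the use case described earlier (the most purchased product type by women for each region). It is an illustration under stated assumptions, not the implementation used in this chapter: the two DataFrames are small in-memory stand-ins for the Oracle orders table and the MongoDB products collection, the column names PRODUCT_ID, GENDER, REGION, and TYPE are hypothetical, and a local Spark session is created so that the sketch can run without any database.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, desc, row_number}

// Local session for the sketch; inside spark-shell a session named spark already exists.
val spark = SparkSession.builder().master("local[*]").appName("orders-products-sketch").getOrCreate()

// Stand-in for the Oracle orders table (hypothetical columns and rows).
val orders = spark.createDataFrame(Seq(
  (1, "P1", "F", "North"),
  (2, "P2", "F", "North"),
  (3, "P1", "F", "North"),
  (4, "P3", "M", "North"),
  (5, "P2", "F", "South")
)).toDF("ORDER_ID", "PRODUCT_ID", "GENDER", "REGION")

// Stand-in for the MongoDB products collection (hypothetical fields).
val products = spark.createDataFrame(Seq(
  ("P1", "shoes"), ("P2", "books"), ("P3", "games")
)).toDF("PRODUCT_ID", "TYPE")

// Keep orders placed by women, join them to the products, and count purchases per region and type.
val purchasesByType = orders
  .filter(col("GENDER") === "F")
  .join(products, Seq("PRODUCT_ID"))
  .groupBy("REGION", "TYPE")
  .count()
  .withColumnRenamed("count", "PURCHASES")

// Rank product types within each region and keep the most purchased one.
val byRegion = Window.partitionBy("REGION").orderBy(desc("PURCHASES"))
val topTypePerRegion = purchasesByType
  .withColumn("RANK", row_number().over(byRegion))
  .filter(col("RANK") === 1)
  .select("REGION", "TYPE", "PURCHASES")
```

Calling topTypePerRegion.show() prints one row per region. In a real deployment, orders would be read through the JDBC data source and products through the MongoDB Spark connector; the transformation steps stay the same.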

Figure 7-3 shows an overview of the use of Apache Spark (Spark SQL module) with Oracle and MongoDB databases to perform analytics operations on two types of data (SQL: Table “Order” and NoSQL: Document “Product”).

Figure 7-3 Data integration use case that uses Apache Spark

Loading data from Oracle database into Apache Spark

In this section, we describe an example of how to load data from Oracle Database 12c into Apache Spark 2.1.0 (both running on IBM z Systems) by using the JDBC driver and the DataFrame API.

Note: The IBM packages for Apache Spark can be found at the IBM developerWorks website.

Complete the following steps:

1. Download Oracle Database 12c JDBC driver from the Oracle Database 12.1.0.1 JDBC Driver and UCP Downloads website.

2. Go to the directory where you installed Spark. Then, start the Spark shell and specify the path location of the JDBC driver to use, by running the command shown in Example 7-1.

Example 7-1 Command to specify JDBC driver path location

sparkdv1:/opt/spark/spark-dk-2.0.0.0/spark/bin # ./spark-shell --driver-class-path ojdbc6.jar --jars ojdbc6.jar

After running the command shown in Example 7-1, you should see the Spark interactive shell in Scala, as shown in Figure 7-4.

Figure 7-4 Spark shell

Note: In this example, the Oracle Database JDBC driver is in the same directory from which we run the Spark shell.

3. Before establishing the connection with the database, ensure that the Oracle listener is running by using the lsnrctl status command.

Example 7-2 shows the command that is used to connect Apache Spark to an Oracle database by using the JDBC driver and read data from the Oracle table “ORDERS” into a “jdbcDF” DataFrame.

Example 7-2 Command to connect to an Oracle database and its resulting output

scala> var jdbcDF = spark.read.format("jdbc").option("url","jdbc:oracle:thin:@10.3.58.125:1521/orcl.mop.fr.ibm.com").option("dbtable","narjisse.ORDERS").option("user", "narjisse").option("password","****").load()

jdbcDF: org.apache.spark.sql.DataFrame = [ORDER_ID: decimal(6,0), STATUS: string ... 7 more fields]

A successful execution of this command returns a DataFrame that contains all data read from table “ORDERS” and that can be used for future SQL queries with other DataFrames.
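
As a hedged sketch of how such a DataFrame might then be queried with SQL, the following code collects the JDBC options in one place, loads the table, registers it as a temporary view, and counts orders by status. The function names oracleOptions and ordersByStatus and the view name orders are hypothetical. The STATUS column is taken from the schema shown in Example 7-2, and the Oracle JDBC driver is assumed to be on the classpath as in Example 7-1.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Build the JDBC options from caller-supplied values (hypothetical helper).
def oracleOptions(host: String, port: Int, service: String, table: String,
                  user: String, password: String): Map[String, String] = Map(
  "url" -> s"jdbc:oracle:thin:@$host:$port/$service", // same URL form as in Example 7-2
  "dbtable" -> table,
  "user" -> user,
  "password" -> password,
  "driver" -> "oracle.jdbc.OracleDriver" // driver class provided by the Oracle JDBC jar
)

// Load the table through JDBC, expose it as a temporary view, and query it with Spark SQL.
def ordersByStatus(spark: SparkSession, options: Map[String, String]): DataFrame = {
  val orders = spark.read.format("jdbc").options(options).load()
  orders.createOrReplaceTempView("orders")
  spark.sql("SELECT STATUS, COUNT(*) AS N FROM orders GROUP BY STATUS")
}
```

In the Spark shell, this sketch might be called as ordersByStatus(spark, oracleOptions(host, 1521, service, "narjisse.ORDERS", user, sys.env("ORACLE_PASSWORD"))).show(), where host, service, and user are your own values and ORACLE_PASSWORD is an assumed environment variable that keeps the password out of the shell history.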

A similar process can be used to read MongoDB documents into DataFrames by using the MongoDB Spark connector, which simplifies the integration of different data sources and enables rapid and efficient implementation of analytics use cases.
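
A minimal sketch of such a read follows. It assumes a 2.x release of the MongoDB Spark connector, which matches Spark 2.x and registers the data source name used below; later connector releases use different names and options. It also assumes that the connector is already on the Spark classpath. The helper names mongoUri and loadCollection are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Build a MongoDB connection URI of the form mongodb://host:port/database.collection.
def mongoUri(host: String, port: Int, database: String, collection: String): String =
  s"mongodb://$host:$port/$database.$collection"

// Read one collection into a DataFrame; the connector infers the schema by sampling documents.
def loadCollection(spark: SparkSession, uri: String): DataFrame =
  spark.read
    .format("com.mongodb.spark.sql.DefaultSource") // data source name in connector 2.x (assumption)
    .option("uri", uri)
    .load()
```

For example, loadCollection(spark, mongoUri(host, 27017, "shop", "products")) would return a DataFrame of product documents that can be joined with the DataFrame read from Oracle. The database name shop and the collection name products are placeholders.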

7.2 Summary

In a data-driven world, IBM offers a server that provides the flexibility and stability to run a wide range of open source products next to Oracle databases. This ability enables companies to benefit from the constant innovation that stems from the open source world.

By choosing IBM z Systems, organizations can benefit from a secure, highly available, and easy-to-manage environment to consolidate their data while optimizing overall costs.
