This chapter describes the fundamental concepts of integration, and is intended as an introduction to integration technology and terminology.
The term integration has a number of different meanings. A fundamental knowledge of the terms and concepts of integration is an essential part of an integration architect's toolkit. There are many ways of classifying the different types of integration. From an enterprise-wide perspective, a distinction is made between application-to-application (A2A), business-to-business (B2B), and business-to-consumer (B2C) integration. Portal, function, and data integration can be classified on the basis of tiers. Another possible grouping consists of integration based on semantics.
Fundamental integration concepts include Enterprise Application Integration (EAI), Enterprise Service Bus (ESB), middleware, and messaging. These were used to define the subject before the introduction of SOA, and still form the basis of many integration projects today. EAI is, in fact, a synonym for integration. In David Linthicum's original definition of EAI, it means the unrestricted sharing of data and business processes among any connected applications. The technological implementation of EAI systems is, in most cases, based on middleware. The main base technology of EAI is messaging, giving the option of implementing an integration architecture through asynchronous communication, using messages which are exchanged across a distributed infrastructure and a central message broker.
The fundamental integration architecture variants are:
A point-to-point architecture is a collection of independent systems, which are connected through a network.
Hub-and-spoke architecture represents a further stage in the evolution of application and system integration, in which a central hub takes over responsibility for communications.
In pipeline architecture, independent systems along the value-added chain are integrated using a message bus. The bus capability is the result of interfaces to the central bus being installed in a distributed manner through the communication network, which gives applications local access to a bus interface. Different applications are integrated to form a functioning whole by means of distributed and independent service calls that are orchestrated through an ESB and, if necessary, a process engine.
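The difference between the point-to-point and hub-and-spoke variants can be quantified with the well-known connection-count argument: fully meshing n systems requires n(n-1)/2 individual connections, while a hub-and-spoke topology needs only n, one per system. A short sketch:

```python
def point_to_point_links(n: int) -> int:
    """Every pair of systems needs its own connection: n*(n-1)/2."""
    return n * (n - 1) // 2

def hub_and_spoke_links(n: int) -> int:
    """Each system connects only to the central hub: n links."""
    return n

# The gap widens quadratically as the landscape grows.
for n in (5, 10, 20):
    print(n, point_to_point_links(n), hub_and_spoke_links(n))
# 10 systems: 45 point-to-point connections versus 10 hub connections
```

This is the main reason hub-and-spoke (and later bus) architectures displaced pure point-to-point integration as landscapes grew.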
A fundamental technique for integration is the usage of design patterns. These include process and workflow patterns in a service-oriented integration, federation, population, and synchronization of patterns in a data integration, and direct connection, broker, and router patterns, which form part of EAI and EII. It is important to be familiar with all of these patterns, in order to be able to use them correctly.
The most recent integration architectures are based on concepts such as event-driven architecture, grid computing, or extreme transaction processing (XTP). These technologies have yet to be tested in practice, but they are highly promising and of great interest for a number of applications, in particular for large corporations and organizations.
The Trivadis Integration Architecture Blueprint applies a clear and simple naming to each of the individual layers. However, in the context of integration, a wide range of different definitions and terms are used, which we will explain in this chapter.
Nowadays, business information systems in the majority of organizations consist of an application and system landscape, which has grown gradually over time. The increasing use of standard software (packaged applications) means that information silos will continue to exist. IT, however, should provide end-to-end support for business processes. This support cannot, and must not, stop at the boundaries of new or existing applications. For this reason, integration mechanisms are needed, which bring together individual island solutions to form a functioning whole. This happens not only at the level of an individual enterprise or organization, but also across different enterprises, and between enterprises and their customers. At an organizational level, a distinction is made between A2A, B2B, and B2C integration (Pape 2006). This distinction is shown in the image below. Each type of integration places specific requirements on the methods, technologies, products, and tools used to carry out the integration tasks. For example, the security requirements of B2B and B2C integrations are different from those of an A2A integration.
Modern concepts such as the Extended Enterprise integration across organizational boundaries, (Konsynski 1993) and the Virtual Enterprise (Hardwick, Bolton 1997) can be described using a combination of the different integration terms.
Integration projects are generally broken down into information portals, shared data integration, and shared function integration. Portals integrate applications at a user interface level. Shared data integration involves implementing integration architectures at a data level, and shared function integration at a function level.
The majority of business users need access to a range of systems in order to be able to run their business processes. They may need to answer specific questions (for example, a call center taking incoming customer calls must be able to access the latest customer data) or to initiate or implement certain business functions (for example, updating customer data). In these circumstances, employees often have to use several business systems at the same time. An employee may need to access an order system (on a host) in order to verify the status of a customer order and, at the same time, may also have to open a web-based order system to see the data entered by the customer. Information portals bring together information from multiple sources. They display it in one place so that users do not have to access several different systems (which might also require separate authentication) and can work as efficiently as possible (Kirchhof et al. 2003). Simple information portals divide the user's screen into individual areas, each of which displays the data from one backend system independently, without interacting with the others. More sophisticated systems allow for limited interaction between the individual areas, which makes it possible to synchronize them. For example, if the user selects a record in one area, the other areas are updated. Other portals use such advanced integration technology that the boundaries between the portal application and the integrated application become blurred (Nussdorfer, Martin 2006).
Shared databases, file replication, and data transfers fall in the category of integration using shared data (Gorton 2006).
In the same way that different business systems store redundant data, they also have a tendency to implement redundant business logic. This makes maintenance and adapting to new situations both difficult and costly. For example, different systems must be able to validate data using predefined, centrally managed business rules. It makes sense to manage such logic in a central place.
In many cases, EAI solutions have only been able to fulfill the expectations placed on them to either a limited extent, or in an unsatisfactory way. This is, among other things, due to the following factors (Rotem-Gal-Oz 2007):
If EAI solutions are used in combination with web services to link systems together, this is still not the equivalent of an SOA. Although the number of proprietary connection components between the systems being linked is reduced by the use of open WS-* standards, a "real" SOA involves a more extensive architectural approach, based on a (business) process-oriented perspective on integration problems.
While EAI is data driven and puts the emphasis on application interface integration, SOA is a (business) process-driven concept, which focuses on integrating service interfaces in compliance with open standards encapsulating the differences in individual integration approaches. As a result, it removes the barrier between the data integration and application integration approaches. However, SOA has one significant problem, which is that of semantic integration. Existing web services do not provide a satisfactory answer to this problem, but they do allow us to formulate the right questions in order to identify future solutions.
The challenge represented by semantic integration is based on the following problem:
This means that the data structure and data information (its meaning) are often not the same thing and, therefore, have to be interpreted (Inmon, Nesavich 2008).
The following example will help to make this clearer:
A date, such as "7 August 1973," forms part of the data. It is not clear whether this information is a text string or in a date format. It may even be in another format and will have to be calculated on the basis of reference data before runtime. This information is of no relevance to the user.
However, it might be important to know what this date refers to, in other words, its semantic meaning in its reference context. Is it a customer's birthday, or the date on which a record was created? This example can even be more complex.
Another example of a term that can be interpreted differently in different contexts is Caesar. Depending on the context, it could be the name of a pet, a brand of pet food, a well-known salad, a gambling casino, or a Roman emperor.
It is clear that data without a frame of reference is lacking any semantic information, causing the data to be ambiguous and possibly useless. Ontologically-oriented interfaces, as well as adaptive interfaces, can help to create such semantic reference and will become increasingly important in the field of autonomous B2B or B2C marketplaces in the future.
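The point can be sketched in a few lines: the same raw value becomes unambiguous only once an explicit reference context is attached to it (the `concept` labels below are purely hypothetical illustrations of semantic annotation, not a standard vocabulary):

```python
# The same raw value, without a frame of reference, is ambiguous:
# is "7 August 1973" a customer's birthday or a record-creation date?
raw_value = "7 August 1973"

# Attaching an explicit semantic reference (hypothetical concept names)
# disambiguates otherwise identical data.
annotated = [
    {"value": raw_value, "concept": "customer.birthDate"},
    {"value": raw_value, "concept": "record.createdDate"},
]

# The values are identical; only the semantic reference differs.
assert annotated[0]["value"] == annotated[1]["value"]
assert annotated[0]["concept"] != annotated[1]["concept"]
```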
One semantic integration approach is, for example, the use of model-based semantic repositories (Casanave 2007). These repositories store and maintain implementation and integration designs for applications and processes (Yuan et al. 2006). They access existing vocabularies and reference models, which enable a standardized modeling process to be used. Vocabularies create a semantic coupling between data and specific business processes, and it is through these vocabularies that the services and applications involved are supplied with semantic information in the surrounding technical context. The primary objective of future architectures must be to maintain the glossary and the vocabularies, in order to create a common language and, therefore, a common understanding of all the systems and partners involved. Semantic gaps must be avoided or bridged wherever possible, for example transforming input and output data by using canonical models and standardized formats for business documents. These models and formats can be predefined for different industries as reference models [EDI (FIPS 1993), RosettaNet (Damodaran 2004), and so on]. Transformation rules can be generated and stored on the basis of reference models, in the form of data cards and transformation cards. In the future, there will be a greater focus on the declarative description (what?) and less emphasis on describing the concrete software logic (how?) when defining integration architectures. In other words, the work involved in integration projects will move away from implementation, and towards a conceptual description in the form of a generative approach, where the necessary runtime logic is generated automatically.
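The canonical-model idea mentioned above can be illustrated with a minimal sketch, assuming two hypothetical source formats for the same customer record (the field names `full_name`, `CUST_NO`, and so on are invented for illustration): each source is mapped onto one canonical format, so consumers only ever deal with the canonical representation.

```python
# Minimal sketch of transformation onto a canonical model. Two hypothetical
# source formats (a CRM and an ERP system) converge on one canonical
# customer format, decoupling consumers from every individual source format.

def from_crm(record: dict) -> dict:
    """Map the (hypothetical) CRM format to the canonical model."""
    return {"customerId": record["id"], "name": record["full_name"]}

def from_erp(record: dict) -> dict:
    """Map the (hypothetical) ERP format to the canonical model."""
    return {"customerId": record["CUST_NO"], "name": record["CUST_NAME"]}

crm = {"id": 42, "full_name": "Ada Lovelace"}
erp = {"CUST_NO": 42, "CUST_NAME": "Ada Lovelace"}

# Both sources yield the same canonical record.
assert from_crm(crm) == from_erp(erp)
```

With n sources and m consumers, this reduces the number of transformations from n×m point-to-point mappings to n+m mappings to and from the canonical model.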
The term Enterprise Application Integration (EAI) has become popular with the increased importance of integration, and with more extensive integration projects. EAI is not a product or a specific integration framework, but can be defined as a combination of processes, software, standards, and hardware that allow for the end-to-end integration of several enterprise systems, and enable them to appear as a single system (Lam, Shankararaman 2007).
Definition of EAI
The use of EAI means the unrestricted sharing of data and business processes among any connected applications (Linthicum 2000).
From a business perspective, EAI can be seen as the competitive advantage that a company acquires when all its applications are integrated into one consistent information system. From a technical perspective, EAI is a process in which heterogeneous applications, functions, and data are integrated, in order to allow the shared use of data and the integration of business processes across all applications. The aim is to achieve this level of integration without major changes to the existing applications and databases, by using efficient methods that are cost and time effective.
In EAI, the focus is primarily on the technical integration of an application and system landscape. Middleware products are used as the integration tools, but, wherever possible, the design and implementation of the applications are left unchanged. Adapters enable information and data to be moved across the technologically heterogeneous structures and boundaries. The service concept is lacking, as well as the reduction of complexity and avoidance of redundancy offered by open standards. The service concept and the standardization only came later with the emergence of service-oriented architectures (SOA), which highlighted the importance of focusing on the functional levels within a company, and its business processes.
Nowadays, software products which support EAI are often capable of providing the technical basis for infrastructure components within an SOA. As they also support the relevant interfaces of an SOA, they can be used as the controlling instance for the orchestration, and help to bring heterogeneous subsystems together to form a whole. Depending on its strategic definition, EAI can be seen as a preliminary stage of SOA, or as a concept that competes with SOA.
SOA is now moving the concept of integration into a new dimension. Alongside the classic "horizontal" integration, involving the integration of applications and systems in the context of an EAI, which is also of importance in an SOA, SOA also focuses more closely on a "vertical" integration of the representation of business processes at an IT level (Fröschle, Reinheimer 2007).
SOAs are already a characteristic feature of the application landscape. It is advisable when implementing new solutions to ensure that they are SOA-compliant, even if there are no immediate plans to introduce an integration architecture, or an orchestration layer. This allows the transition to an SOA to be made in small, controllable steps, in parallel with the existing architecture and on the basis of the existing integration infrastructure.
Integration architectures are based on at least three or four integration levels (after Puschmann, Alt 2004 and Ring, Ward-Dutton 1999):
Message queues were introduced in the 1970s as a mechanism for synchronizing processes (Brinch Hansen 1970). Message queues allow for persistent messages and, therefore, for asynchronous communication and the guaranteed delivery of messages. Messaging decouples the producer and the consumer with the only common denominator being the queue.
The most important properties (quality attributes) of messaging are shown in the following table:
| Attribute | Comment |
| --- | --- |
| Availability | Physical queues with the same logical name can be replicated across several server instances. If one server fails, clients can send the message to another. |
| Failure handling | If communication between a client and a server fails, the client can send the message to another server instance via failover mechanisms. |
| Modifiability | Clients and servers are loosely coupled by the messaging concept, which means that they do not know each other. This makes it possible for both clients and servers to be modified without influencing the system as a whole. Another dependency between producer and consumer is the message format. This dependency can be reduced or removed altogether by introducing a self-descriptive, general message format (canonical message format). |
| Performance | Messaging can handle several thousand messages per second, depending on the size of the messages and the complexity of the necessary transformations. The quality of service also has a major influence on overall performance. Non-reliable messaging, which involves no buffering, provides better performance than reliable messaging, where the messages are stored (persisted) in the filesystem or in databases (local or remote) to ensure that they are not lost if a server fails. |
| Scalability | Replication and clustering make messaging a highly scalable solution. |
Publish/subscribe represents an evolution of messaging (Quema et al. 2002). A subscriber indicates, in a suitable form, its interest in a specific message or message type. The persistent queue guarantees secure delivery. The publisher simply puts its message in the message queue, and the queue distributes the message itself. This allows for many-to-many messaging:
The most important properties (quality attributes) of publish/subscribe are listed in the following table:
| Attribute | Comment |
| --- | --- |
| Availability | Physical topics with the same logical name can be replicated across several server instances. If one server fails, clients can send the message to another. |
| Failure handling | If one server fails, clients can send the message to another, replicated server. |
| Modifiability | The publisher and the subscriber are loosely coupled by the messaging concept, which means that they do not know each other. This makes it possible for both publisher and subscriber to be modified without influencing the system as a whole. Another dependency is the message format. This can be reduced or removed altogether by introducing a self-descriptive, general message format (canonical message format). |
| Performance | Publish/subscribe can process thousands of messages per second. Non-reliable messaging is faster than reliable messaging, because reliable messages have to be stored locally. If a publish/subscribe broker supports multicast/broadcast protocols, a message can be transmitted to several subscribers simultaneously rather than serially. |
| Scalability | Topics can be replicated across server clusters. This provides the necessary scalability for very large message throughputs. Multicast/broadcast protocols can also be scaled more effectively than point-to-point protocols. |
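The many-to-many character of publish/subscribe can be sketched in a few lines (the topic name and handlers below are illustrative assumptions): subscribers register interest in a topic, and the publisher addresses only the topic, never the subscribers themselves.

```python
from collections import defaultdict

# topic name -> list of subscriber callbacks
subscriptions = defaultdict(list)

def subscribe(topic, handler):
    """Register a subscriber's interest in a topic."""
    subscriptions[topic].append(handler)

def publish(topic, message):
    """Fan the message out to every subscriber of the topic."""
    for handler in subscriptions[topic]:
        handler(message)

billing, shipping = [], []
subscribe("order.created", billing.append)   # two independent subscribers
subscribe("order.created", shipping.append)

publish("order.created", {"orderId": 1})
print(billing, shipping)  # both subscribers received the same message
```

The publisher remains unchanged when subscribers are added or removed, which is precisely the modifiability attribute in the table above.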
A message broker is a central component, which is responsible for the secure delivery of messages (Linthicum 1999). The broker has logical ports for receiving and sending messages. It transports messages between the sender and the subscriber, and transforms them where necessary.
The most important tasks of a message broker, as shown in the preceding diagram, are implementing a hub-and-spoke architecture, the routing, and the transformation of messages.
The most important properties (quality attributes) of a message broker are listed in the following table:
| Attribute | Comment |
| --- | --- |
| Availability | To provide high availability, brokers must be replicated and operated in a cluster. |
| Failure handling | Brokers have different types of input ports that validate incoming messages to ensure that they have the correct format, and reject those with the wrong format. If one broker fails, clients can send the message to another, replicated broker. |
| Modifiability | Brokers separate transformation logic and routing logic from one another, and from senders and receivers. This improves modifiability, as the logic has no influence on senders and receivers. |
| Performance | Because of the hub-and-spoke approach, brokers can potentially be a bottleneck. This applies in particular in the case of a high volume of messages, large messages, and complex transformations. The throughput is typically lower than with simple reliable messaging. |
| Scalability | Broker clusters allow for high levels of scalability. |
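The broker's core tasks of validation, transformation, and routing can be sketched as follows. The class and field names are illustrative, not taken from any product; a real broker would add persistence, ports, and clustering.

```python
class Broker:
    """Minimal hub: validates incoming messages, transforms them,
    and routes them to the registered receiver."""

    def __init__(self):
        self.routes = {}  # message type -> (transform, receiver)

    def register(self, msg_type, transform, receiver):
        self.routes[msg_type] = (transform, receiver)

    def send(self, message):
        if "type" not in message:            # input-port validation
            raise ValueError("malformed message: missing 'type'")
        transform, receiver = self.routes[message["type"]]
        receiver(transform(message))         # transform, then route

inbox = []
broker = Broker()
broker.register(
    "order",
    transform=lambda m: {"orderId": m["id"]},  # format transformation
    receiver=inbox.append,                     # routed destination
)
broker.send({"type": "order", "id": 7})
print(inbox)  # [{'orderId': 7}]
```

Note how transformation and routing live inside the broker, so neither the sender nor the receiver changes when either is modified, matching the modifiability attribute above.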
A messaging infrastructure provides mechanisms for sending, routing, and converting data, between different applications running on different operating systems with different technologies, as shown in the following diagram (Eugster et al. 2003):
A messaging infrastructure involves the following parties/components:
An Enterprise Service Bus is an infrastructure that can be used to implement an EAI. The primary role of the ESB is to decouple client applications from services, as shown in the following diagram (Chappell 2004):
The encapsulation of services by the ESB means that the client application does not need to know anything about the location of the services, or about the different communication protocols used to call them. The ESB enables the shared, enterprise-wide, and even inter-enterprise use of services, and separates business processes from the relevant service implementations (Lee et al. 2003).
The major SOA vendors now offer specific Enterprise Service Bus products, which provide a series of core functions in one form or another, as shown in the following diagram:
The following diagram shows the basic structure of an ESB in a vendor-neutral way:
The names used by the different vendors of SOA products for the individual components will vary from those shown in the above diagram, but the products provide the following functions as a minimum (Craggs 2003):
In most cases the technological realization of EAI systems is done through what is commonly termed middleware. Middleware is also described as a communication infrastructure. It allows communication between software components, regardless of the programming language in which the components were created, the protocols used for communication, and the platform on which the components are running (Thomson 1997). A distinction is made between the different types of middleware according to the methods of communication used, and the base technology.
Communication methods for middleware can be broken down into five categories:
| Middleware type | Communication | Relationship | Synchronous/asynchronous | Interaction |
| --- | --- | --- | --- | --- |
| Peer-to-peer, API | Conversational | 1:1 | Synchronous | Blocking |
| Database gateways | Request/reply | 1:1 | Synchronous | Blocking |
| Database replication | Request/reply, message queue | 1:N | Synchronous, asynchronous | Blocking, non-blocking |
| Remote procedure calls | Request/reply | 1:1 | Mostly synchronous | Mostly blocking |
| Object request brokers | Request/reply | 1:1 | Mostly synchronous | Mostly blocking |
| Direct messaging | Message passing | 1:1 | Asynchronous | Non-blocking |
| Message queue systems | Message queue | M:N | Asynchronous | Non-blocking |
| Message infrastructure | Publish/subscribe | M:N | Asynchronous | Non-blocking |
Middleware can be broken down into the following base technologies:
Information can be routed in different ways within a network. Depending on the type of routing used, routing schemes can be broken down into the following four categories:
The unicast routing scheme sends data packets to a single destination. There is a 1:1 relationship between the network address and the network end point:
The broadcast routing scheme sends data packets in parallel to all the possible destinations in the network. If there is no support for this process, the data packets can be sent serially to all possible destinations. This produces the same results, but the performance is reduced. There is a 1:N relationship between the network address and the network end point.
The multicast routing scheme sends data packets to a specific selection of destinations. The destination set is a subset of all the possible destinations. There is a 1:N relationship between the network address and the network end point:
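The three routing schemes can be sketched as follows, with endpoints modeled as simple in-memory inboxes (the endpoint names are illustrative):

```python
# Named endpoints, each with an inbox that collects delivered packets.
endpoints = {"a": [], "b": [], "c": []}

def unicast(dest, packet):
    """1:1 - deliver to exactly one destination."""
    endpoints[dest].append(packet)

def broadcast(packet):
    """1:N - deliver to all possible destinations."""
    for inbox in endpoints.values():
        inbox.append(packet)

def multicast(group, packet):
    """1:N - deliver to a chosen subset of the destinations."""
    for dest in group:
        endpoints[dest].append(packet)

unicast("a", "u1")            # only 'a' receives it
broadcast("b1")               # 'a', 'b', and 'c' all receive it
multicast(["a", "c"], "m1")   # only the subset {'a', 'c'} receives it
print(endpoints)
```

As the broadcast description above notes, a network without native broadcast support can emulate it by sending serially to every destination, as this sketch does, at the cost of performance.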