Chapter 17. XML and EAI

 

The Internet is so big, so powerful and pointless that for some people it is a complete substitute for life.

 
 --Andrew Brown

With so many vendors promoting XML as the standard integration mechanism—the common data exchange format that everyone can agree upon—any discussion of EAI deserves a discussion of XML. However, while the coupling of XML with EAI exists, it is a tenuous coupling. Still, it is relevant enough to warrant devoting an entire chapter to it and its value within a typical EAI problem domain.

From the outset, it is important to remember that XML is in a state of flux. For more detailed and more current information (mostly pertaining to XML's application to the Web), see http://www.w3c.org/xml. The site lists the current state of XML, along with a large number of proposals to evolve XML, many looking to expand its use into the enterprise. The site also lists the XML strategies (or lack thereof) employed by middleware vendors. For example, IBM is listed as having an aggressive XML-enablement strategy for MQSeries message-queuing software.

The Rise of XML

XML provides a common data exchange format, encapsulating both metadata and data. This allows different applications and databases to exchange information without having to understand anything about each other. In order to communicate, a source system simply reformats a message, a piece of information moving from an interface, or a data record as XML-compliant text and moves that information to any other system that understands how to read XML (see Figure 17.1). Simple as one, two, three.
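
A minimal sketch of this exchange in Python, using the standard xml.etree.ElementTree module. It assumes a flat record whose field names are valid XML names; the customer root tag and the field names are hypothetical. The source turns the record into XML-compliant text, and the target reads it back by tag name without knowing anything about the sender.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record: dict) -> str:
    """Source side: reformat a flat data record as XML-compliant text."""
    root = ET.Element("customer")  # hypothetical root element name
    for field, value in record.items():
        # The field name becomes the tag (metadata); the value becomes its text (data).
        ET.SubElement(root, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_record(text: str) -> dict:
    """Target side: read the XML back by tag name, knowing nothing about the sender."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


message = record_to_xml({"id": 10042, "name": "ACME Corp", "city": "Denver"})
print(message)  # <customer><id>10042</id><name>ACME Corp</name><city>Denver</city></customer>
print(xml_to_record(message))
```

Note that every value comes back as text; the target must still know that id is a number if it needs one.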

Despite its value to EAI, XML was not originally designed for EAI. XML was created as a mechanism to publish data through the Web without the originator having to understand anything about the system receiving the data. As the enterprise integration problem became more evident, EAI architects and developers saw the value of applying XML to the problem domain in order to move information throughout an enterprise. Even more valuable, XML can serve as a common text format for moving information between enterprises, supporting supply chain integration efforts. As a result, many are calling XML the next EDI (Electronic Data Interchange).

While its benefit may appear revolutionary, XML itself is anything but revolutionary. It also falls short of being a panacea for EAI. Still, it offers real value for integration, even though realizing that value means stretching XML far beyond its original intention. The problem, of course, is that if it is stretched too far, XML may be applied in areas where it has little chance of success. Applying XML to so many areas of technology dilutes its real value and creates a great deal of unnecessary confusion. Moreover, many vendors are looking to take XML and recast it using their own sets of proprietary extensions. While some want to add value to XML, others are seeking only to lock in users.

Figure 17.1. XML provides a standard mechanism for data exchange.

What's XML?

Like HTML (Hypertext Markup Language), XML is derived from Standard Generalized Markup Language (SGML), a venerable standard for defining descriptions of structure and content in documents. HTML is an application of SGML, a single fixed document type, while XML is a simplified subset of SGML itself. Unlike HTML, which is limited to providing a universal method to display information on a page (without context or dynamic behavior), XML takes the next step, providing context and giving meaning to data.

XML redefines some of SGML's internal values and parameters while simultaneously removing many of the little-used features that make SGML so complex. XML maintains SGML's structural capabilities, letting middleware users define their own document types. It also introduces a new type of document, one in which it is unnecessary to define a document type at all.

XML has shown great promise in defining vertical market XML vocabularies for a particular industry, vocabularies that define a set of elements and describe where they should fit in the document structure. Both the automobile industry and the health care industry have become prominent players in the new world of XML vocabularies.

The use of industry-specific vocabularies is one of the most promising aspects of EAI and XML. Such vocabularies can define information interchange and even transformation characteristics to fit a specific industry. As a result, EAI solutions, inclusive of metadata and behavior, can be reused by companies within the same vertical market. Herein lies the future of EAI: the ability to leverage existing best-practice integration solutions in the same way packaged applications have been leveraged. Sometime in the not-too-distant future, it will be possible to purchase a best-of-breed integration solution rather than build it from the ground up.

Data Structures

XML is simple to understand and use. XML can take large chunks of information and consolidate them into an XML document made up of meaningful pieces that give the information structure and organization.

The primary building block of an XML document is the element, defined by tags (see the example in the next section). An element normally has both a beginning and an ending tag; an element with no content can instead use a single self-closing tag, such as <white-space/>. All XML documents contain an outermost element, known as the root element, in which all other elements are contained. XML also supports nested elements, or elements within elements. Nesting allows XML to support hierarchical structures and, as a result, the use of traditional hierarchical and object-oriented databases for XML document storage. Element names describe the content of the element, and the structure describes the relationships among the elements.
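
The sketch below, using Python's xml.etree.ElementTree, parses a small hypothetical order document and walks it from the root element down, showing how nesting produces a hierarchy. The element names and the outline helper are illustrative only.

```python
import xml.etree.ElementTree as ET

# A hypothetical order document: <order> is the root element, and the
# nested <line> elements form a hierarchy beneath it.
DOC = """<order>
  <customer>ACME Corp</customer>
  <lines>
    <line><sku>A-100</sku><qty>2</qty></line>
    <line><sku>B-200</sku><qty>5</qty></line>
  </lines>
</order>"""


def outline(element: ET.Element, depth: int = 0) -> list:
    """Walk the element tree depth-first, recording each tag indented by nesting level."""
    rows = ["  " * depth + element.tag]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(DOC)
print("root element:", root.tag)
print("\n".join(outline(root)))
```

Each nested element is reached only through its parent, which is why this structure maps naturally onto hierarchical and object-oriented storage.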

An XML document is considered to be "well formed" (able to be read and understood by an XML parser) if it complies with the XML specification, if it is properly marked up, and if its elements are properly nested. XML also lets you define attributes that describe characteristics of an element. Attributes are placed within the element's beginning tag.
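
A parser is the practical test of well-formedness. This minimal Python sketch, with a hypothetical is_well_formed helper, treats any ParseError from xml.etree.ElementTree as a sign that the text is not well formed, then reads the attributes from an element's beginning tag. It checks well-formedness only; it does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if an XML parser accepts the text as well formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = '<title font="Baskerville" size="24/30">Hello, world!</title>'
bad = "<title><b>Hello, world!</title></b>"  # improperly nested elements

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False

# Attributes live inside the beginning tag; ElementTree exposes them as a dict.
title = ET.fromstring(good)
print(title.attrib, title.text)
```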

DTDs

The Document Type Definition (DTD) determines the structure and elements of an XML document. When a validating parser receives a document that uses a DTD, it checks that the document conforms to the structure the DTD defines.

XML documents can be very simple, with no document type declaration:

<?xml version="1.0" standalone="yes"?>
<conversation>
<greeting>Hello, world!</greeting>
<response>Stop the planet, I want to get off!</response>
</conversation>

Or they can be more complicated. They may reference a DTD, contain an internal subset, and possess a more complex structure:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE titlepage SYSTEM
  "http://www.frisket.org/dtds/typo.dtd" 
[<!ENTITY % active.links "INCLUDE">]>
<titlepage>
<white-space type="vertical" amount="36"/>
<title font="Baskerville" size="24/30"
  alignment="centered">Hello, world!</title>
<white-space type="vertical" amount="12"/>
<!-- In some copies the following decoration is
  hand-colored, presumably by the author -->
<image location="http://www.foo.bar/fleuron.eps"
  type="URL" alignment="centered"/>
<white-space type="vertical" amount="24"/>
<author font="Baskerville" size="18/22"
  style="italic">Munde Salutem</author>
</titlepage>

[Source: W3C]

XML Parsers

The fundamental difference between HTML and XML is that XML defines the content rather than the presentation, so a program needs an XML parser to get at that content. For example, Microsoft Internet Explorer includes an XML parser, which is able to read an XML page and extract the data for access by another program.
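
As an illustration of a parser extracting data for another program, the sketch below uses Python's event-driven xml.sax interface rather than any browser's parser. The TextCollector handler is hypothetical, and it assumes flat, non-repeating element names so that a simple dictionary can hold the results.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """SAX handler that gathers each element's text so another program can use it."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None

    def startElement(self, name, attrs):
        self._current = name
        self.fields.setdefault(name, "")

    def characters(self, content):
        # The parser may deliver text in several chunks, so append rather than assign.
        if self._current is not None:
            self.fields[self._current] += content

    def endElement(self, name):
        self._current = None


doc = b"""<?xml version="1.0" standalone="yes"?>
<conversation>
<greeting>Hello, world!</greeting>
<response>Stop the planet, I want to get off!</response>
</conversation>"""

handler = TextCollector()
xml.sax.parseString(doc, handler)
print(handler.fields["greeting"])
print(handler.fields["response"])
```

An event-driven parser like this never holds the whole document in memory, which suits middleware that streams large messages.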

Parsers are becoming part of the middleware layer, able to process XML documents. Major middleware vendors, including IBM and BEA, are planning to offer XML parsers as part of their middleware offerings. Most middleware users will leverage these parsers to move XML documents in and out of the middleware layers (see Figure 17.2).

XML Metadata

XML metadata can be any attribute assignable to a piece of data, from the concrete to such abstract concepts as the industry associated with a particular document. XML can also be used to encode any number of existing metadata standards. The binding of data and metadata is a key feature that makes XML most applicable to information-sharing scenarios. This feature is consistent with the concept of a common enterprise metadata repository that's supported throughout an organization. XML is attempting to support common metadata standards throughout the Internet. In that context, supporting metadata standards within the enterprise should not pose an insurmountable problem.

Because XML doesn't depend on any particular type of metadata format, there is little risk that a particular technology vendor will define its own set of metadata tags. In other words, XML cannot be made proprietary to any particular type of data. When the Resource Description Framework (RDF, discussed in the section "RDF and EAI," later in this chapter) brings all the metadata initiatives together, the data can be shared.

Figure 17.2. Most middleware vendors will provide XML parsers with their products.

XML and Middleware

There is tremendous vendor competition here, with vendors staking early claims to XML dominance. In the process, they seem to be turning XML, which is really just a specification, into a technology, which may not be beneficial to XML at this time.

Each vendor looks at XML in a different way. Some vendors see XML as a common information exchange mechanism. Others see it as a database storage format. Some look to the Web attributes of XML, while the rest see XML as a mechanism to finally get metadata under control.

XML (and the technology that accompanies it) is an excellent text-processing facility. It follows a set of rules for the creation and organization of documents, resulting in a text format most users are willing to agree on. As a result, users of different types of middleware will be able to share information more easily. These benefits are possible because of the self-describing nature of XML data.

XML's benefit to middleware is clear. XML provides a generic format for the exchange of information. Middleware simply needs to move messages that encapsulate XML and have those messages understood by any source or target application that understands XML. Seems pretty straightforward. Unfortunately, things are not always as they seem, and a number of downsides have to be considered.
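
The sketch below reduces that arrangement to its essentials, with Python's queue.Queue standing in for message-oriented middleware. This is an assumption made for illustration: real products add persistence, routing, and delivery guarantees. The send and receive helpers and the invoice document are hypothetical.

```python
import queue
import xml.etree.ElementTree as ET

# Stand-in for message-oriented middleware (a deliberate simplification).
middleware = queue.Queue()


def send(payload: str) -> None:
    """Source application: place an XML document on the middleware as an opaque message."""
    middleware.put(payload.encode("utf-8"))


def receive() -> ET.Element:
    """Target application: take the next message and parse it as XML."""
    return ET.fromstring(middleware.get())


send('<invoice number="881"><amount currency="USD">250.00</amount></invoice>')
invoice = receive()
print(invoice.get("number"), invoice.findtext("amount"))
```

The middleware never inspects the payload; only the endpoints need to understand XML.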

There are XML and middleware "arguments" in which people state the value of moving information from system to system using XML because XML provides a "standard information format." Information coming from the source system is being reformatted into XML (as content and metadata). The XML is being moved using some sort of middleware, and then the information is again reformatted into something understandable by the target system (see Figure 17.3). What's the gain in all this reformatting? Not much.

This process requires reformatting twice, when it should be necessary to reformat only once. Moreover, because XML requires that the metadata come along for the ride with the information, the message grows larger. In the end, a lot of redundant and unnecessary information is being passed around with little gain other than the "warm, fuzzy feeling" that accompanies messages using a "standard" text format.
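
The cost of the double reformat can be made concrete. In this Python sketch, the source's pipe-delimited record format, the target's comma-separated format, and the field names are all hypothetical. Going through XML takes two conversions and produces a much larger message than mapping the source format straight to the target format.

```python
import xml.etree.ElementTree as ET

FIELDS = ("id", "name", "balance")  # hypothetical record layout


def source_to_xml(record: str) -> str:
    """First reformat: the source's native record becomes XML (data plus tag metadata)."""
    root = ET.Element("account")
    for tag, value in zip(FIELDS, record.split("|")):
        ET.SubElement(root, tag).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_target(text: str) -> str:
    """Second reformat: XML becomes the target's native record."""
    root = ET.fromstring(text)
    return ",".join(root.findtext(tag) for tag in FIELDS)


def source_to_target(record: str) -> str:
    """Single reformat: map the source format straight to the target format."""
    return ",".join(record.split("|"))


native = "10042|ACME Corp|250.00"
as_xml = source_to_xml(native)
print(len(native), len(as_xml))  # the XML form is much larger than the native record
print(xml_to_target(as_xml) == source_to_target(native))  # True
```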

A less warm and fuzzy but better approach is to translate a message, such as an MQSeries or MSMQ message, into XML only when needed. Most middleware vendors, such as IBM and BEA, are taking this approach.

Figure 17.3. Moving information using XML as the common format

Persistent XML

While there are products that provide persistent XML storage, XML itself does not provide a good database format for medium-to-large data sets. To be processed efficiently, portions of an XML document must be held in memory; otherwise, the document must be parsed and reparsed, which results in significant performance problems. Holding documents in memory may sound reasonable, but with typical organic database growth it demands an ever larger amount of memory over time. Moreover, pre- and post-processing is required to handle special characters (e.g., the ampersand) encapsulated within the XML document.
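
Python's standard library provides escape and unescape in xml.sax.saxutils, which can serve as the pre- and post-processing steps for special characters. The vendor name below is made up; the sketch shows that escaped text round-trips cleanly, while unescaped text is rejected by the parser.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

raw = "Smith & Sons <Wholesale>"  # hypothetical value containing special characters

# Pre-processing: escape characters that have special meaning in XML markup.
stored = f"<vendor>{escape(raw)}</vendor>"
print(stored)  # <vendor>Smith &amp; Sons &lt;Wholesale&gt;</vendor>

# Post-processing: the parser turns the entity references back into characters.
print(ET.fromstring(stored).text)  # Smith & Sons <Wholesale>
print(unescape(escape(raw)) == raw)  # True

# Skipping pre-processing produces text that is not well formed.
try:
    ET.fromstring(f"<vendor>{raw}</vendor>")
except ET.ParseError as err:
    print("rejected:", err)
```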

Database vendors are moving quickly to address this need. Virtually every major relational database vendor, such as IBM and Oracle, is pledging support for XML. Object-oriented database vendors, who have yet to see a significant market for their products, consider XML storage to be their salvation. XML is so easy to incorporate into products, due to the simplicity of the technology, that most vendors can join the XML playing field with minimal effort. There probably won't be a lot of product dollars to be made here by technology vendors, but with XML on the checklists at most major corporations, it cannot be ignored.

Data types represent another limitation of XML. As a text-based format, XML does not provide facilities for complex data types or binary data (e.g., multimedia data). To solve this problem, many vendors are proposing new standards to the World Wide Web Consortium (W3C) to bind XML to binary information. They are also coming out with their own XML hybrids before the W3C is able to react. Of course, the result is a smattering of products without any notion of standardization.
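
One widely used workaround, separate from the binding standards mentioned above, is to Base64-encode binary data as element text, at the cost of four characters for every three bytes. The Python sketch below assumes that convention; the thumbnail tag and the encoding="base64" attribute are hypothetical markers, not part of any standard.

```python
import base64
import xml.etree.ElementTree as ET


def embed_binary(tag: str, payload: bytes) -> str:
    """Wrap binary data in an XML element by Base64-encoding it as text."""
    element = ET.Element(tag, {"encoding": "base64"})  # hypothetical marker attribute
    element.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(element, encoding="unicode")


def extract_binary(text: str) -> bytes:
    """Recover the original bytes from the element's Base64 text."""
    return base64.b64decode(ET.fromstring(text).text)


# Arbitrary binary data, including a NUL byte that XML 1.0 text cannot contain directly.
sample = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])
doc = embed_binary("thumbnail", sample)
print(doc)
print(extract_binary(doc) == sample)  # True
```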

RDF and EAI

RDF (Resource Description Framework), a part of the XML story, provides interoperability between applications that exchange information. RDF is another standard defined for the Web that's finding use everywhere, including EAI. RDF was developed by the W3C to provide a foundation of metadata interoperability across different resource description communities.

RDF uses XML to define a foundation for processing metadata and to provide a standard metadata infrastructure for both the Web and the enterprise. The difference is that XML is used to transport data using a common format, while RDF layers on top of XML—defining a broad category of data. When the XML data is declared to be of the RDF format, applications are then able to understand the data without understanding who sent it.

RDF extends the XML model and syntax specifically for describing resources or collections of information. (RDF uses the XML namespace mechanism to point to a resource, identified by a URI, that scopes and uniquely identifies a set of properties known as the schema.)
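
To show RDF layered on XML, here is a small RDF/XML description parsed with Python's xml.etree.ElementTree. The RDF and Dublin Core namespace URIs are the real ones; the resource URI and property values are hypothetical. The statements helper handles only simple rdf:Description blocks and is no substitute for a full RDF parser.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF and Dublin Core; each prefix is scoped by its URI.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A hypothetical RDF/XML description of a document in an enterprise catalog.
RDF_DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.com/orders/10042">
    <dc:title>Purchase order 10042</dc:title>
    <dc:creator>Order Entry System</dc:creator>
  </rdf:Description>
</rdf:RDF>"""


def statements(text: str):
    """Yield (subject, property, value) triples from simple rdf:Description blocks."""
    root = ET.fromstring(text)
    about = "{%s}about" % NS["rdf"]
    for desc in root.findall("rdf:Description", NS):
        subject = desc.get(about)
        for prop in desc:
            # prop.tag is the namespace-qualified property name, e.g. {...dc...}title.
            yield subject, prop.tag, prop.text


for triple in statements(RDF_DOC):
    print(triple)
```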

RDF metadata can be applied to many areas, including EAI: for example, searching for data and cataloging data and relationships. RDF is also able to support new technologies, such as intelligent software agents and the exchange of content ratings.

RDF itself does not offer predefined vocabularies for authoring metadata. However, the W3C expects standard vocabularies to emerge once the infrastructure for metadata interoperability is in place. Anyone, or any industry, can design and implement a new vocabulary. The only requirement is that a URI designating the vocabulary be included in the metadata instances that use it.

RDF benefits EAI in that it supports the concept of a common metadata layer that's shareable throughout an enterprise or between enterprises. Thus, RDF can be used as a common mechanism for describing data within the EAI problem domain. However, RDF is still in its infancy, and middleware vendors will have to adapt their products to it in order for it to be useful.

XSL and EAI

XSL (Extensible Stylesheet Language) is a simple declarative language that programmers use to bind rules to elements in XML documents and so provide behavior. Extensible through JavaScript and written in XML, XSL builds on existing standards, such as the Document Style Semantics and Specification Language (DSSSL) and Cascading Style Sheets (CSS).

XSL's greatest benefit to EAI is its ability to add a rules-processing layer to XML, which allows for the creation of rules that define routing, flow, and even transformation. However, like XML, XSL is a Web-born standard, and its value for moving information within the enterprise is still not clear.
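
XSL itself cannot be run with Python's standard library, so this sketch only mimics the idea of declarative routing rules in Python. Each hypothetical rule pairs an element path and expected value with a destination queue name, and the first matching rule decides where a message goes.

```python
import xml.etree.ElementTree as ET

# Hypothetical routing rules: (element path, expected text, destination queue).
# Rules are data, not code, so they can be changed without touching route().
RULES = [
    ("./region", "EU", "eu.orders"),
    ("./priority", "high", "expedite.orders"),
]
DEFAULT_ROUTE = "standard.orders"


def route(message: str) -> str:
    """Return the destination of the first rule the message satisfies."""
    root = ET.fromstring(message)
    for path, expected, destination in RULES:
        if root.findtext(path) == expected:  # findtext returns None if the element is absent
            return destination
    return DEFAULT_ROUTE


print(route("<order><region>EU</region><priority>low</priority></order>"))   # eu.orders
print(route("<order><region>US</region><priority>high</priority></order>"))  # expedite.orders
print(route("<order><region>US</region></order>"))                           # standard.orders
```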

Unlike HTML tags, XML tags carry no built-in behavior for formatting their content. XSL was designed from the ground up to support more highly structured, data-rich XML documents, as well as transformation features. XSL can take an XML document and reformat it, changing both the schema and the content (see Figure 17.4), albeit in a very limited way.

Figure 17.4. XSL can transform an XML document.
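
Python's standard library has no XSLT processor, so the following sketch performs in plain Python the kind of limited transformation a stylesheet would express declaratively: it renames elements (a schema change) and normalizes one value (a content change). The source and target vocabularies in TAG_MAP are hypothetical.

```python
import xml.etree.ElementTree as ET

SOURCE = "<customer><custName>acme corp</custName><custCity>Denver</custCity></customer>"

# Hypothetical mapping from the source vocabulary to the target vocabulary.
TAG_MAP = {"customer": "Client", "custName": "Name", "custCity": "City"}


def transform(text: str) -> str:
    """Rename elements per TAG_MAP and upper-case the customer name."""
    src = ET.fromstring(text)
    out = ET.Element(TAG_MAP[src.tag])  # schema change: new root name
    for child in src:
        value = child.text or ""
        if child.tag == "custName":
            value = value.upper()  # content change: normalize names to upper case
        ET.SubElement(out, TAG_MAP[child.tag]).text = value
    return ET.tostring(out, encoding="unicode")


print(transform(SOURCE))  # <Client><Name>ACME CORP</Name><City>Denver</City></Client>
```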

XML and EAI

XML will add value to EAI. However, because it is over-applied in so many areas of technology, this new standard is actually devalued. It will be some time before it is clear where XML best fits within most EAI problem domains. Right now, it is difficult to cut through the hype and misinformation. It remains prudent to understand the features of XML and the possible emergence of XML-related standards (RDF, XSL, and so on) as standard EAI technologies. The trend is clear. We're just not there yet.

XML is text based and, therefore, easily understood by all platforms. Although this limits the types of data that can be encapsulated using XML, its simplicity remains its strength. XML's value is its ability to share information with systems that may be beyond the user's control (perhaps within the enterprise but still out of immediate control). In those situations, XML becomes the least common denominator for information, with XML parsers available on most platforms and middleware vendors beginning to support parsers. Still, too many trade-offs remain to be certain that XML will dominate the future of EAI.

XML is good at providing a format for messages and protocols. It makes sense in those roles because it requires that data be formatted in a way that target systems can easily process.

XML's ability to support simple, self-describing messages is a clear advantage in some situations. Its use on the Web is a clear fit because data is being sent to a browser that does not know the structure of the data unless that structure is sent along with it. Traditional message-oriented middleware rarely provides a self-describing message format. However, most enterprises are integrating tightly coupled systems, in which the developer using the middleware has already defined the format of the data.

Building on this idea, XML's real value is the ability to share information between loosely coupled systems, as in supply chain integration or merger and acquisition scenarios. Electronic Data Interchange (EDI, including ANSI X12) has dominated this type of communication for some time now. However, its complexity and expense could drive many toward XML. This is a shift just waiting to happen. EDI vendors see it coming, and they are scrambling to incorporate XML technology into their core product sets.
