CHAPTER 14 Conclusions

For those readers who are accustomed to various sorts of knowledge modeling, the Semantic Web looks familiar. The notions of classes, subclasses, properties, and instances have been the mainstay of knowledge modeling and object systems modeling for decades. It is not uncommon to hear a veteran of one of these technologies look at the Semantic Web and mutter, “Same old, same old,” indicating that there is nothing new going on here and that everything in the Semantic Web has already been done under some other name elsewhere.

As the old saying goes, “There is nothing new under the sun,” and to the extent that the saying is correct, so are these folks when they speak of the Semantic Web. The modeling structures we have examined in this book do have a strong connection to a heritage of knowledge modeling languages. But there is something new that has come along since the early days of expert systems and object-oriented programming; something that has had a far more revolutionizing effect on culture, business, commerce, education and society than any expert system designer ever dreamed of. It is something so revolutionary that it is often compared in cultural significance to the invention of the printing press. That something new is the World Wide Web.

The Semantic Web applies advanced technologies that were previously used in artificial intelligence, expert systems, and business rules execution to a world-wide web of information. The Semantic Web is not simply an application running on the Web somewhere; it is a part of the very infrastructure of the Web. It isn’t on the Web; it is the Web.

Why is this important? What is it that is so special about the Web? Why has it been so successful, more so than just about any computer system that has come before it?

In the early days of the commercial Web, there was a television ad for a search engine. In the ad, a woman driving a stylish sports car is pulled over by a traffic policeman for speeding. As he prepares to cite her, she outlines for him all the statistics about error rates in the various machines used by traffic policemen for detecting speeding. He is clearly thrown off his game and unsure of how to continue to cite her. She adds personal insult by quoting statistics on the effect of prolonged exposure to traffic radar machines on sperm count. The slogan “Knowledge is Power” scrolls over the screen, along with the name of the search engine.

What lesson can we learn from ads like this? This kind of advertising made a break from television advertising that had come before. Knowledge was seen not as nerdy or academic but as useful in everyday life, and even sexy. Or at least it is if you have the right knowledge at the right time. The Web differed from the information systems that preceded it by bringing information from many sources (indeed, sources from around the world) to one’s fingertips. In comparison to HyperCard stacks that had been around for decades, the Web was an open system. Anyone in the world could contribute, and everyone could benefit from that contribution. Having all that information available was more important than how well a small amount of information was organized.

The Semantic Web differs from expert systems in pretty much the same way. Compared to the knowledge representation systems that were developed in the context of expert systems, OWL is quite primitive. But this is appropriate for a web language. The power of the Semantic Web comes from the web aspect. Even a primitive knowledge modeling language can yield impressive results when it uses information from sources from around the world. In expert systems terms, the goals of the Semantic Web are also modest. The idea of an expert system was that it could behave in a problem-solving setting with a performance that would qualify as expert-level if a human were to accomplish it. What we learned from the World Wide Web (and the story of the woman beating the speeding ticket) is that typically people don’t want machines to behave like experts; they want to have access to information so they can exhibit expert performance at just the right time. As we saw in the ad, the World Wide Web was successful early on in making this happen, as long as someone was willing to read the relevant webpages, digest the information, and sift out what they needed.

The Semantic Web takes this idea one step further. The Web is effective at bringing any single resource to the attention of a web user, but if the information the user needs is not represented in a single place, the job of integration rests with the user. The Semantic Web doesn’t use expert system technology to replicate the behavior of an expert; it uses expert system technology to gather information so an individual can have integrated access to the web of information.

Being part of the web infrastructure is no simple matter. On the Web, any reference is a global reference. The issue of managing global names for anything we want to talk about is a fundamental web issue, not just a Semantic Web issue. To take advantage of the web infrastructure, the Semantic Web uses the URI as the globally resolvable reference to a resource. Most programming and modeling languages have a mechanism whereby names can be organized into spaces, so that you and I can use the same name in different ways but still keep them straight when our systems have to interface.

With the World Wide Web, a name in a namespace must be global across the entire Web. The URI is the web-standard mechanism to do this; hence, the Semantic Web uses the URI for global namespace identification. Using this approach allows the Semantic Web to borrow the modularity of the World Wide Web. Two models that were developed in isolation can be merged simply by referring to resources in both of them in the same statement. Since the names are always maintained as global identifiers, there is no need to integrate identifiers in an ad hoc way each time; the system for global identity is part of the infrastructure.
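To make the point about merging concrete, here is a minimal sketch in Python. It assumes that a model is nothing more than a set of (subject, predicate, object) triples whose names are full URIs. The namespaces, resources, and the helper functions `merge` and `objects` are all hypothetical, invented for illustration; they are not part of any standard library or Semantic Web toolkit.

```python
# Hypothetical namespaces; any globally unique URI prefixes would do.
GEO = "http://example.org/geo#"
BIZ = "http://example.com/business#"

# Two models developed in isolation, each a set of triples with full URIs.
geo_model = {
    (GEO + "Boston", GEO + "locatedIn", GEO + "Massachusetts"),
}
biz_model = {
    (BIZ + "AcmeCorp", BIZ + "employs", BIZ + "Alice"),
}

# One new statement that refers to resources from both models.
bridge = {
    (BIZ + "AcmeCorp", BIZ + "headquarteredIn", GEO + "Boston"),
}


def merge(*models):
    """Merge models by set union; global names need no renaming step."""
    merged = set()
    for model in models:
        merged |= model
    return merged


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


merged = merge(geo_model, biz_model, bridge)

# Follow a path that crosses from the business model into the geography model.
headquarters = objects(merged, BIZ + "AcmeCorp", BIZ + "headquarteredIn")
regions = set()
for city in headquarters:
    regions |= objects(merged, city, GEO + "locatedIn")
```

Because every name is already global, the merge is a plain union: no identifier mapping is written, yet a query can start in one model and finish in the other.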

An important contributor to the success of the World Wide Web is its openness. Anyone can contribute to the body of information, including people who, for one reason or another, might publish information that someone else would consider misleading, objectionable, or just incorrect. At first blush, a chaotic free-for-all of this sort seems insane. How could it ever be useful? The success of the Web in general (and information archiving sites like Wikipedia in particular) has shown that there is sufficient incentive to publish quality data to make the overall Web a useful and even essential structure.

This openness has serious ramifications in the Semantic Web, which go beyond considerations that were important for technologies like expert systems. One reason the Web was more successful than HyperCard is that the web infrastructure was resilient to missing or broken links (the “404 Error”). The Semantic Web must be resilient in a similar way. Thus, inferencing in the Semantic Web must be done very conservatively, according to the Open World assumption. At any time, new information could become available that could undermine conclusions that have already been made, and our inference policy must be robust in such situations.
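One way to picture this conservative policy is to compare two ways of answering a question against an incomplete set of triples. The Python sketch below is only an illustration under simplifying assumptions: the `ex:` names and the functions `ask_closed` and `ask_open` are hypothetical, and a real inference engine does far more than test membership in a set.

```python
# Hypothetical data: the one fact we happen to have gathered so far.
known = {("ex:Kildare", "ex:worksAt", "ex:BlairHospital")}


def ask_closed(graph, triple):
    """Closed-world reading: anything not stated is taken to be false."""
    return triple in graph


def ask_open(graph, triple):
    """Open-world reading: absence only means we do not know yet."""
    return "yes" if triple in graph else "unknown"


question = ("ex:Kildare", "ex:worksAt", "ex:CityClinic")

# Answers given before any further information is available.
before_closed = ask_closed(known, question)
before_open = ask_open(known, question)

# New information arrives from some other source on the Web.
known_later = known | {question}

after_closed = ask_closed(known_later, question)
after_open = ask_open(known_later, question)
```

The closed-world answer starts out as a firm "false" and has to be reversed once the new triple arrives. The open-world answer starts out as "unknown" and is only refined to "yes", so nothing that was concluded earlier is contradicted.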

In the World Wide Web, the openness of the system presents a potential problem. How does the heroine of the search engine commercial know that the information she has found about radar-based speed detection devices is correct? She might have learned it from a trusted source (say, a government study on these devices), or she might have cross-referenced the information with other sources until she had enough corroborating evidence to be certain. Or perhaps she doesn’t really care if it is correct but only that she can convince the traffic cop that it is. Trust in information on the Web is extended with a healthy dose of skepticism, but in much the same way as trust in other media like newspapers, books, and magazine articles.

In the case of the Semantic Web, trust issues are more subtle. Information from the Semantic Web is an amalgam of information from multiple sources. How do we judge our trust in such a result even if we know about all the sources? To some extent, the same principles apply. We can trust entities that we know or have experience with, and we can trust entities that have gone through some process of authorization and authentication. When we combine information, we must also understand the impact that each information source has on the outcome and what risk we are taking if we cannot trust that source. These important issues for understanding the reliability of the Semantic Web are still a subject of research.

In this book, we examined the modeling aspects of the Semantic Web: How do you represent information in such a way that it is responsive to a web environment? The basic principles underlying the Semantic Web—the AAA slogan, the Nonunique Naming assumption, and the Open World assumption—are constraints placed on a representation system if it wants to function as the foundation of a World Wide Web of information. These constraints have led to the main design decisions for the Semantic Web languages of RDF, RDFS, and OWL.

There is more to a web than just the information and how it is modeled. At some point, this information must be stored in a computer, accessed by end users, and transmitted across an information network. Furthermore, no triple store, and no inference engine, will ever be able to scale to the size of the World Wide Semantic Web. This is clearly impossible, since the Web itself grows continually. In the light of this observation, how can the World Wide Semantic Web ever come to pass?

The applications we discussed in this book demonstrate how a modest amount of information, represented flexibly so that it can be merged in novel ways, provides a new dynamic for information distribution and sharing. SKOS allows thesaurus managers around the globe to share, connect, and compare terminology. FEA RMO allows government agencies to operate autonomously while conforming to a central standard for enterprise architecture. The NCI ontology coordinates efforts of independent life sciences researchers around the globe.

How is it possible to get the benefit of a global network of data if no machine is powerful enough to store, inference over, and query the whole network? As we have seen, it isn’t necessary that a Semantic Web application be able to access and merge every page on the Web at once. The Semantic Web is useful as long as an application can access and merge any webpage. Since we can’t hold all the Semantic Web pages in one store at once, we have to proceed with the understanding that there could always be more information that we don’t have access to at any one point. This is why the Open World assumption is central to the infrastructure of the Semantic Web.

This book is about modeling in the context of the Semantic Web. What role does a model play in the big vision? The World Wide Web that we see every day is made up primarily of documents, which are read and digested by people browsing the Web. But behind many of these webpages, there are databases that contain far more information than is actually displayed on a page. To make all this information available as a global, integrated whole, we need a way to specify how information in one place relates to information somewhere else. Models on the Semantic Web play the role of the intermediaries that describe the relationships among information from various sources.

Look at the cover of this book. An engineering handbook for aquifers provides information about conduits, ducts, and channels sufficient to inform an engineer about the pieces of a dynamic fluid system that can control a series of waterways like these. The handbook won’t give final designs, but it will provide insight about how the pieces can be fit together to accomplish certain engineering goals. A creative engineer can use this information to construct a dynamic flow system for his own needs.

So it is with this book. The standard languages of RDF, RDFS, and OWL provide the framework for the pieces an engineer can use to build a model with dynamic behavior. Particular constructs like subClassOf and subPropertyOf provide mechanisms for specifying how information flows through the model. More advanced constructions like owl:Restriction provide ways to specify complex relations between other parts of the model. The examples from the “in the wild” chapters show how these pieces have been assembled by working ontologists into complex dynamic models that achieve particular goals. This is the craft of modeling in the Semantic Web: combining the building blocks in useful ways to create a dynamic system through which the data of the Semantic Web can flow.
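As a reminder of how information flows through subClassOf and subPropertyOf, the following Python sketch applies the familiar RDFS propagation patterns to a tiny model until nothing new can be added. It is a deliberately naive illustration, not how a production inference engine works. The `ex:` names and the function `rdfs_closure` are hypothetical, and prefixed strings stand in for full URIs.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"


def rdfs_closure(triples):
    """Repeatedly apply subClassOf/subPropertyOf rules until a fixed point."""
    graph = set(triples)
    while True:
        new = set()
        for (s, p, o) in graph:
            for (s2, p2, o2) in graph:
                # Type propagation: s is a C, and C subClassOf D, so s is a D.
                if p == TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, TYPE, o2))
                # Property propagation: s p o, and p subPropertyOf q, so s q o.
                if p2 == SUBPROP and p == s2:
                    new.add((s, o2, o))
                # Transitivity of subClassOf and of subPropertyOf.
                if p == p2 and p in (SUBCLASS, SUBPROP) and o == s2:
                    new.add((s, p, o2))
        if new <= graph:
            return graph
        graph |= new


# A hypothetical model: two class links, one property link, two data triples.
model = {
    ("ex:Surgeon", SUBCLASS, "ex:Physician"),
    ("ex:Physician", SUBCLASS, "ex:Person"),
    ("ex:hasMother", SUBPROP, "ex:hasParent"),
    ("ex:Kildare", TYPE, "ex:Surgeon"),
    ("ex:Kildare", "ex:hasMother", "ex:Anna"),
}

inferred = rdfs_closure(model)
derived_only = inferred - model
```

Here `derived_only` holds the triples that were not asserted but flowed out of the model: the instance's membership in the broader classes, the shortcut from the most specific class to the most general one, and the broader property between the two individuals.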

