CHAPTER 1

Getting Started: What Do We Need to Say?

1.1    WHAT IS AN ONTOLOGY? WHAT IS OWL?

There are countless definitions of “ontology” that you may wish to explore further, and whole papers were written on the subject in the formative years of the field. For our purposes, just think of an ontology as a model which represents some subject matter. We avoid the common usage of the term “domain” as a synonym for subject matter, because it has a formal meaning in OWL. An ontology communicates what kinds of things there are (for the subject matter of interest) and how they are related to each other. It is built so that automated reasoning software can draw conclusions resulting in new information.

An ontology is different from other models you may have seen in that it represents some (suitably scoped) subject matter as a whole, rather than a model of a particular thing (like an airplane) or a predictive model (of hurricanes, earthquakes, or climate). Also, an ontology can provide a structure for data like a database schema can. However, unlike the latter, an ontology can provide great value even in the absence of data.1

Many notations and languages have been used over the years to represent ontologies, some more rigorous than others. The Web Ontology Language (OWL), developed by the World Wide Web Consortium (W3C), is a language for representing ontologies that is based on formal logic, a discipline that evolved from philosophy and mathematics. It is the only standard for representing ontologies that is widely used both in academia and industry. This book uses examples from a variety of industries based on the commercial ontologies that have been developed for our clients.

In the remainder of this chapter, we explore some of the key concepts in a variety of industries that one might wish to model using OWL. We will discover the kinds of things we need to be able to say about the subject matter to be modeled and show many examples. In doing so, we identify requirements for what OWL must be able to express. In Chapter 2, we explain how to express them in OWL.

1.2    IN THE BEGINNING THERE ARE THINGS

An ontology is a model that represents some subject matter. For any subject, there are things that you care about and want to identify and express in an OWL ontology. What are some specific things in different industries? Say you are a healthcare provider. What are the most important things in healthcare? Stop reading for a moment and brainstorm. Write down a dozen or more of the things that come to mind. For now, focus mostly on things that are written down as nouns (or noun phrases). These nouns will correspond to kinds of things in your subject. Then ask yourself, what are the one, two, or three most fundamental things in healthcare that would concern you as a provider?

Say you are in the finance industry—perhaps a discount broker, or maybe the CIO of a full service investment bank. What are the things that come to mind? What are the most important things in this industry? If it helps, limit the scope to managing assets. Do the same exercise. Brainstorm, write down a dozen or more of the key things, and identify up to three of the most central things in this industry.

Say your job is to manage registration and documentation of ongoing changes for corporations and charities for some particular jurisdiction (e.g., the state of Washington). Go through the same exercise once more. What are the most central things of importance in this subject area?

If none of these examples stimulate you, think of others, ones you know a lot about and have passion for. Pick any subject matter that you would like to have an ontology for. Identify a dozen or so key things, and select the most important ones. In the database field, this activity of identifying what is important in some subject matter is part of data modeling.

Table 1.1: Core ideas for different subjects


Table 1.1 shows some of the things you might have written down. The items in bold are central to that subject.

1.3    KINDS OF THINGS VS. INDIVIDUAL THINGS

Whatever the purpose of your ontology, there is always a dance that goes on between thinking about a particular individual vs. the generic kind of thing that it is. For example, Google Inc. is an individual thing. The kind of thing it is, is Corporation. We need to be clear whether we are talking about an individual thing, or the kind of thing that individual is.

If John Doe saw the doctor on January 12, 2014, the kind of thing that “seeing the doctor” is, is called a “patient visit.” The kind of thing that John Doe is, is patient. If John was treated by Jill Smith, then the kind of thing Jill Smith is would be person, and more specifically, a doctor or nurse. The patient, John Doe, is also a person, and might also be a cancer patient.


Figure 1.1: Individual things and their kinds. Individuals have rounded corners; the rectangles depict kinds.

Figure 1.1 depicts individuals as shapes with rounded corners and their kinds as rectangles. Pick a few of the concepts listed in Table 1.1, especially those in your favorite subject. Then identify the kinds of things and, for each, identify some specific individuals.
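The distinction between individuals and their kinds can be sketched in plain Python, writing each statement as a (subject, predicate, object) tuple. This is only an illustrative sketch with made-up names, not OWL syntax (that comes in Chapter 2).

```python
# Illustrative sketch: statements as plain (subject, predicate, object)
# tuples; all names here are invented for the example.
facts = {
    ("Google Inc.", "is an instance of", "Corporation"),
    ("John Doe",    "is an instance of", "Patient"),
    ("Jill Smith",  "is an instance of", "Doctor"),
}

def kinds_of(individual):
    """Return the kinds that an individual is asserted to be an instance of."""
    return {o for (s, p, o) in facts
            if s == individual and p == "is an instance of"}

print(kinds_of("Google Inc."))  # {'Corporation'}
```

The same individual can be an instance of several kinds at once; a query like `kinds_of` simply collects all of them.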

1.4    NO THING IS AN ISLAND

By now, I hope you have written down a few dozen things in a few industries. The things that came to mind likely have important roles to play in that subject area, and thus will be related to a number of other things that are also important for that subject. Decide which of the things you wrote down are most important in the sense that they have the richest set of relationships with other things in the same subject area. These are likely to be the most important things you chose in the last exercise.

1.4.1    HEALTHCARE

Arguably the most central thing in healthcare is the event of a person receiving healthcare of some sort. After all, the primary goal of healthcare is to keep people healthy. Do you have something in your list that includes this? Perhaps a doctor’s visit or hospital stay. We will use the more general notion of a patient visit which links to both a patient and a healthcare provider (for example, a doctor). There is also the care that was provided, which often includes a diagnosis and treatment. Figure 1.2 illustrates some key things in healthcare and how they are related.

The dotted lines can be concluded from knowing the solid line connections. If you know who the care recipient and care provider are for a given patient visit, you can conclude who received care from whom (or, conversely, who gave care to whom).


Figure 1.2: Inter-related things in healthcare.
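How the dotted links follow from the solid ones can be sketched in a few lines of plain Python. The relationship names are invented for the example; this is a sketch of the idea, not a real reasoner.

```python
# Sketch: derive the dotted "received care from" links from the solid,
# directly asserted links on a patient visit.
asserted = {
    ("Visit-1", "has care recipient", "John Doe"),
    ("Visit-1", "has care provider",  "Jill Smith"),
    ("Visit-1", "has care provider",  "Jane"),
}

def received_care_from(facts):
    """Connect each visit's care recipient to each of its care providers."""
    recipients = {(v, r) for (v, p, r) in facts if p == "has care recipient"}
    providers  = {(v, c) for (v, p, c) in facts if p == "has care provider"}
    return {(r, "received care from", c)
            for (v1, r) in recipients
            for (v2, c) in providers
            if v1 == v2}
```

Running `received_care_from(asserted)` yields the two concluded links: John Doe received care from Jill Smith, and from Jane.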

1.4.2    FINANCE

In the asset management side of finance, the goal is to get a return on assets. This is accomplished by a series of financial transactions, most prominently trades. For example, you might purchase 50 shares of Google common stock (GOOG). You might have to sell a bond to free up assets to make that purchase. Each of these transactions is a trade, so the trade is central in asset management.

A trade is not an island. Like a patient visit, it is connected to a variety of other things. What is it related to? There is a buyer, a seller, and possibly a broker. Money changes hands, and ownership is transferred from one legal entity to another along with associated formal documentation. Figure 1.3 illustrates some key things in asset management and how they are related.


Figure 1.3: Inter-related things in finance.

1.4.3    CORPORATE REGISTRATIONS

Exercise 1: Draw a diagram highlighting the key concepts and relationships for the subject of registering corporations just like we did for finance and healthcare. The idea is to scan your list of items or the list in the third column of Table 1.1 and to pick what you think is the central concept. Then list a few of the key relationships that link it to other concepts that are in the table, or that you think are important but are not in the table.

1.5    THINGS CAN HAVE A VARIETY OF ATTRIBUTES

We can say quite a lot about a given thing by specifying relationships connecting it to other things, as in the above examples. But there is more that we want to say that is not so easily handled in that way. For example, many things have associated names and dates. We may wish to state how old someone is. For example, Jill Smith has the first name, “Jill” and her age is 32. Google’s official name is “Google LLC” and it was incorporated on September 4, 1998 (see Figure 1.4).

These kinds of statements are very important, but are different from the other statements we have been making. We have been talking about two individual things being related to one another. But capturing information about names and dates is specifying information about what characteristics or attributes a given individual has. Unlike connecting one individual to another, we are connecting something to a literal value, typically a string, a number or a date.

From another perspective, note that it makes sense to say a trade is related to the broker on that trade or that Jane is related to her patients. But it is quite awkward to say “Google is related to the string ‘Google LLC’ ”, or that Jane is in a relationship with the string “Smith.” Instead of connecting one individual to another, we are connecting an individual to a literal value. Things like “age,” “date of incorporation,” and “first name” are commonly referred to as attributes. Rather than saying one thing is in a relationship with another thing, we say that something has an attribute whose value is a literal of some kind.

The most common kind of literal is a string, which is used for names, descriptions, and many other things. Other kinds of literals include dates and different kinds of numbers like integer or decimal. Numbers will be used for measuring and counting things like weight and age. A date is a specially formatted item with a very specific meaning. From a modeling perspective, the main thing that characterizes a literal is that we won’t be saying anything more about particular literals such as “John” or the number 32. Literals can be thought of as pure values; they don’t have properties or attributes of their own (see Figure 1.4).


Figure 1.4: Some common attributes and their literal values.
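The difference between relationships and attributes can be made concrete in a small Python sketch: the object position holds another individual for a relationship, but a literal (string, number, or date) for an attribute. Attribute names here are invented for the example.

```python
import datetime

# Sketch of attributes: unlike relationships, which connect two individuals,
# an attribute connects an individual to a literal value.
attributes = {
    ("Jill Smith", "first name",            "Jill"),
    ("Jill Smith", "age",                   32),
    ("Google",     "official name",         "Google LLC"),
    ("Google",     "date of incorporation", datetime.date(1998, 9, 4)),
}

def value_of(individual, attribute):
    """Look up the literal value of an attribute for a given individual."""
    for (s, a, v) in attributes:
        if s == individual and a == attribute:
            return v
    return None
```

Notice that the literals are pure values: nothing else in the data says anything further about the string "Jill" or the number 32.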

1.6    MORE GENERAL THINGS AND MORE SPECIFIC THINGS

In our examples so far, we have come across different kinds of things, including patient visit, trade, person, doctor, nurse, patient, legal entity, and corporation. Notice that every corporation is a legal entity. Thus, a corporation is a specific kind of legal entity. What other kinds of legal entity are there? Persons are legal entities, as are many other kinds of organizations, e.g., partnerships, limited liability companies (LLCs), cooperatives, and some charities to name a few. We can think of a legal entity as generalizing these other kinds of things.

Consider the concepts doctor and nurse. There is a more general kind of thing that each of these can be seen as more specific variations of. What might that be? Both are healthcare providers, and both are also persons. What other examples can you find in the things we have seen so far, where one is more general or more specific than another? Have a look at Figure 1.5 that covers the three subject areas we have considered.


Figure 1.5: A hierarchy of different kinds of things.
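A hierarchy like the one in Figure 1.5 can be sketched as a set of (specific, general) pairs, with a helper that climbs from a kind to every more general kind above it. This is an illustrative plain-Python sketch; the pairs below are drawn from the examples in this chapter.

```python
# "is a kind of" links as (specific, general) pairs.
kind_of = {
    ("Corporation",    "Legal Entity"),
    ("Person",         "Legal Entity"),
    ("Doctor",         "Healthcare Provider"),
    ("Nurse",          "Healthcare Provider"),
    ("Doctor",         "Person"),
    ("Nurse",          "Person"),
    ("Cancer Patient", "Patient"),
    ("Patient",        "Person"),
}

def generalizations(kind):
    """Every kind that `kind` is a kind of, directly or indirectly."""
    found, frontier = set(), {kind}
    while frontier:
        frontier = {g for (s, g) in kind_of if s in frontier} - found
        found |= frontier
    return found
```

For example, `generalizations("Doctor")` climbs two links and returns healthcare provider, person, and legal entity.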

This concludes the discussion about what kinds of things you need to say to build an ontology. Recall that we said an ontology was built in such a way as to support drawing conclusions from existing information. We consider that next.

1.7    DRAWING CONCLUSIONS

Despite their remarkable capabilities, computers take things quite literally and at times seem rather dumb. They don’t know the simplest and most obvious things. Fortunately, that is changing. These days, once a man indicates he is male on an online health form, there is a good chance it won’t bother to ask him whether he is pregnant. This is a simple example of the computer doing something a little smarter. It was able to draw the conclusion that the man was not pregnant because it knows that he is male and males cannot be pregnant. Computer programs that are designed to draw conclusions that logically follow from an existing set of data or assertions are called automated reasoners or inference engines.

While this sort of thing can easily be accomplished through hard-coded rules, that approach does not scale. We want a more general way to tell the computer things and have it apply some general principles that allow it to draw a wide variety of interesting and useful conclusions.

Given our examples so far, can you think of some situations where you would want the computer to automatically draw some conclusions for you? Below are a couple of examples to get you started. They are depicted in Figure 1.6. Bold lines are for “is a kind of” links, thinner lines are for “is an instance of” links. Solid lines are directly asserted; dotted lines indicate the drawing of conclusions.

1.  If a cancer patient is a kind of patient, and a patient is a kind of person, then we want the computer to be able to figure out what common sense tells us, that a cancer patient is also a kind of person.

2.  If Google is a corporation, and a corporation is a kind of legal entity, then we want the computer to conclude that Google is in fact a legal entity, just as common sense tells us.


Figure 1.6: Drawing simple conclusions.
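The second conclusion above can be sketched as a tiny forward-chaining loop in plain Python: propagate “is an instance of” assertions up the “is a kind of” hierarchy until nothing new follows. This is only an illustration of the idea, not an actual OWL reasoner.

```python
# Sketch of conclusion 2: if Google is a corporation, and a corporation is
# a kind of legal entity, conclude that Google is a legal entity.
instance_of = {("Google", "Corporation")}
kind_of     = {("Corporation", "Legal Entity")}

def inferred_instances(instance_of, kind_of):
    """Conclude that an instance of a kind is also an instance of every
    more general kind, repeating until no new conclusions appear."""
    inferred = set(instance_of)
    while True:
        new = {(i, g) for (i, k) in inferred
                      for (s, g) in kind_of if k == s} - inferred
        if not new:
            return inferred
        inferred |= new
```

Because the loop repeats until nothing new follows, the same sketch also handles chains of reasoning that are several links long.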

When we say the system should be able to “figure out what common sense tells us” we mean it should be able to take the existing information it has and to conclude some new information that follows logically from that information. OWL is based on formal logic, which we will discuss in Chapter 3. It specifies exactly when something follows logically from something else and thus what conclusions should be drawn. It’s one thing to draw a conclusion that is immediately obvious to a human. Automated reasoning with OWL can also draw conclusions that logically follow through a chain of reasoning, even when the conclusions are not obvious.

One kind of conclusion is determining that there is a logical inconsistency. This tells you there is a bug in your ontology or in your data. For example, in Figure 1.7 the red line with an X in it denotes that nothing can be both a person and a corporation. If we already know that Google is a corporation, and someone comes along and mistakenly says that Google is also an instance of person, there is a problem that an automated reasoner can detect. The computer has been explicitly told that:

1.  nothing can be both a corporation and a person and

2.  Google is both a corporation and a person.


Figure 1.7: Reasoning helps to find an inconsistency.

This is an example of two kinds of things having no overlap. Look again at Table 1.1. See if you can find other examples of two kinds of things that cannot overlap.
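Detecting this kind of bug can also be sketched in plain Python: record which pairs of kinds cannot overlap, then flag any individual asserted to be an instance of both. Again, an illustrative sketch only.

```python
# Sketch of inconsistency detection: Figure 1.7 says nothing can be both
# a corporation and a person, so asserting both for Google is a bug.
disjoint    = {("Corporation", "Person")}
instance_of = {("Google", "Corporation"),
               ("Google", "Person")}        # the mistaken assertion

def inconsistencies(instance_of, disjoint):
    """Find individuals asserted to be instances of two non-overlapping kinds."""
    kinds = {}
    for (i, k) in instance_of:
        kinds.setdefault(i, set()).add(k)
    return {i for i, ks in kinds.items()
              for (a, b) in disjoint if a in ks and b in ks}
```

Here `inconsistencies(instance_of, disjoint)` flags Google, telling you that either the data or the ontology needs fixing.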

Exercise 2: How do you reconcile the U.S. Supreme Court’s Citizens United decision, which declared that a corporation is legally a person, with the above common sense example in the ontology? Can you think about it in such a way that there is in fact no contradiction?

1.8    DATA AND METADATA

To create a good model of the subject matter of interest, OWL needs to give modelers a way to say the following.

1.  There are individual things.

2.  There are kinds of things, some of which do not overlap.

3.  An individual is an instance of a certain kind of thing.

4.  There are more specific and more general kinds of things.

5.  There are relationships between things.

6.  Things have attributes with literal values.

These things must be said in a way that supports drawing conclusions both to add new information, and to detect and debug logical inconsistencies. They can be viewed as requirements for OWL, or any ontology modeling language, for that matter.

So far we have been very informal in describing different subjects. The diagrams I have been drawing are meant to be the kinds of diagrams you might draw on a whiteboard in a brainstorming session. Afterward, you would use them as the basis for creating an ontology using the more formal notation of OWL. That is covered in detail in the next chapter. We will now give a hint about what that will look like.

Once you have an ontology, it can be used to communicate meaning to other humans. While this is important in its own right, we will focus on how humans can get the computer to do useful things with the ontology. One of the main things you will want to do with an ontology is to use it as the basis for creating individuals and relationships between them and storing that as data. This is called populating the ontology.

The ontology provides a vocabulary for creating individuals and making statements about them. For example, consider the following statements in (not-too-stilted) English.

•  John Doe is an instance of Person.

•  Jill is an instance of Doctor.

•  Jane is an instance of Nurse.

•  John Doe’s visit to the doctor is an instance of Patient Visit.

•  John Doe’s visit to the doctor was on date 12Jan2014.

•  John Doe’s visit to the doctor has care recipient, John Doe.

•  Jill was a care provider on John Doe’s visit to the doctor.

•  Jane was a care provider on John Doe’s visit to the doctor.

•  John Doe received care from Jill.

•  John Doe received care from Jane.

The vocabulary for the subject matter of healthcare is in gold for kinds of things, blue for relationships connecting individuals, and green for attributes with literal values. While not as natural-sounding, it is technically accurate to say that attributes are relationships that connect individuals to literals.

The individuals are in burgundy, and the single literal is in black. Note the generic relationship, is an instance of; it is neither a created individual nor part of the vocabulary of healthcare. Rather, it is part of the vocabulary for modeling. The formal name for this in OWL is rdf:type. We will get into that in the next chapter.

Each of the above sentences in English has a subject, a predicate (i.e., verb), and an object and is asserting something to be true. Count the parts: one, two, three—each sentence is a triple. Some example triples are graphically depicted in Figure 1.8.


Figure 1.8: Assertions as triples.

So the ontology is the vocabulary for talking about the subject matter of interest. It is used to create and give meaning to data. That vocabulary is also represented as triples. For that reason the ontology is said to play the role of metadata for a database of triples.
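The English sentences above can be written down directly as (subject, predicate, object) triples. The sketch below uses plain Python tuples with invented identifiers, not actual OWL syntax (that comes in Chapter 2).

```python
# Each assertion is a triple: subject, predicate, object.
triples = {
    ("John Doe",       "is an instance of",  "Person"),
    ("Jill",           "is an instance of",  "Doctor"),
    ("Jane",           "is an instance of",  "Nurse"),
    ("John Doe Visit", "is an instance of",  "Patient Visit"),
    ("John Doe Visit", "on date",            "2014-01-12"),  # a literal
    ("John Doe Visit", "has care recipient", "John Doe"),
    ("John Doe Visit", "has care provider",  "Jill"),
    ("John Doe Visit", "has care provider",  "Jane"),
    ("John Doe",       "received care from", "Jill"),
    ("John Doe",       "received care from", "Jane"),
}

# Every assertion has exactly three parts.
assert all(len(t) == 3 for t in triples)
```

Note that the kinds, relationships, and attributes come from the ontology’s vocabulary, while the individuals and literals are the data it gives meaning to.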

In the next chapter, we describe how OWL meets the six requirements (at the beginning of this section) for describing subject matter in general, and how to create data using the subject matter vocabulary.

1.9    SUMMARY LEARNING

In this chapter, we learned what kinds of things we need to say when building an ontology.

What Is an Ontology?

An ontology is a model of some subject matter that you care about and OWL is a formal language for expressing the ontology. It communicates what kinds of things there are and how they are related to each other in a way that supports automated reasoning. An ontology can also be used informally to communicate shared meaning among humans.

What Do You Need to Say When You Build an Ontology?

The kinds of things we need to say to build an ontology are relatively few. They constitute an informal set of requirements for OWL. What you need to say is that there are:

1.  individual things;

2.  kinds of things (some of which do not overlap);

3.  individuals of a certain kind;

4.  more specific things and more general kinds of things;

5.  relationships that connect things to other things; and

6.  relationships that connect things to literals.

The things that are said are called assertions, and they are represented as triples. The ontology provides a vocabulary that plays the role of metadata.

Drawing Conclusions

If we are careful to say things very precisely, then the computer can draw conclusions for us. The act of drawing conclusions is referred to as performing inference. Special computer programs do this; they are called inference engines or reasoners. This helps in two ways: (1) it makes the computer seem smarter, and (2) the reasoner can detect and explain logical inconsistencies, which leads to better ontologies. Stating that two kinds of things do not overlap is a big help in spotting inconsistencies.

1   See: Ontologies and Database Schema: What’s the Difference?
