CHAPTER 8

Go Forth and Ontologize

This final chapter offers a variety of practical tips and guidelines to put the finishing touches on what you need to know to get out there and build some real-world ontologies. We’ll have a look at some modeling tools, principles, patterns, and pitfalls to speed you along your way.

Although the sections in this chapter have been thoughtfully ordered, there are relatively few dependencies among them. Feel free to skip around.

8.1    MODELING PRINCIPLES AND TOOLS

8.1.1    CONCEPTUAL AND OPERATIONAL

In traditional relational data modeling there are three distinct modeling levels: conceptual, logical, and physical. The first captures the semantics of the subject matter of the data but is not executable and therefore is not operational in any implementation. The physical model is executable and operational. Unfortunately, the process of going from conceptual to physical involves various optimizations, and some of the semantics are lost. It was always a bit of a fairy tale that the logical and conceptual models would be kept in sync with the physical model as the database evolved. It just didn’t happen.

This problem does not arise when you create triple store databases using ontologies as the schema. The ontology is loaded directly into a triple store and used in an operational system. Therefore it is both conceptual and operational.

Keep this in mind when designing your ontology. It’s a balancing act. You don’t want to just put more and more into your ontology because it is cool and seems important. Be guided by what you expect triples to look like in triple stores of various applications that will use the ontology. But don’t go so far as to include a lot of highly application-specific things that are not really about the subject being modeled.42

8.1.2    CONCEPTS, TERMS, AND NAMING CONVENTIONS

The saying “a rose by any other name is still a rose” is just as true in logic as it is in English or any other natural language. Although it makes no difference to the computer or the inference engine, choosing good names makes it easier for the human to understand and use the ontology. So choose names carefully.

The word “term” is commonly used to refer to a name of a concept. The informal term will typically be the label on the concept. The formal term will be the IRI, usually abbreviated by just using the local name. It can also have alternate labels to represent synonyms.

In principle, because OWL does not make the unique name assumption, a single concept can have more than one IRI. In practice that will not arise unless you are interacting with ontologies built by others.

Terms are important, but not as important as the concepts. In healthcare, a key concept is that of being a person who receives care during a patient visit; the term that names that concept is doe:Patient. By analogy, the word “rose” is the term that is used to name the concept we all know about in our world, described in detail in Wikipedia,43 and other places.

Ideally, when a person sees a term, the meaning you intend is the first one that comes to mind. Avoid ambiguous terms. For example, one definition of “loan” is: “an amount of money loaned at interest by a bank to a borrower, for a certain period of time.”44 Here, the loan is the amount of money. But the money is really just a part of the broader agreement between the borrowing and lending parties with a repayment schedule and other contract terms. Use of the ambiguous term “loan” could lead to wrong or inconsistent use of the ontology and should be avoided. Focus first on the key concept, and then give it a name. Here, the loan contract is key, and a good class name would be: LoanContract.

Avoiding ambiguous names often leads to longer names, as in this example. You can use the annotation property rdfs:label to provide a prettier name, e.g., “loan contract,” for use in the user interface of software driven by the ontology.

Don’t Let Terms Get in the Way

When modeling, it often happens that terms can get in the way. If you are focused on coming up with an OWL representation of the term “loan” and there are multiple parties that must agree, you may quickly run into trouble because people use the term differently. The same is true for the term “risk.” For example, in the sentence “there are several risks if we go off-piste skiing in the mountains today,” the word “risk” means “something bad happening.” If you are in the mountains and stop to consider the risk of avalanche if you ski down a steep gully, then the word “risk” is a synonym for probability.

Different meanings will come to mind for different people when they see terms like “loan” and “risk.” Don’t risk using such overloaded terms (so to speak).45

Naming Conventions

Other than blank nodes, resources are named with IRIs. The main things you will be naming are classes, properties, and individuals. However, each ontology also has its own IRI. You will also be naming the ontology files. It is good practice to name the ontology file to exactly echo the local name of the ontology IRI. For example, if the ontology IRI is http://ontology.myco.com/event, then the main part of the file name should be “event.” Think twice before including the file extension in the IRI because the same ontology can be saved out in various syntaxes or file extensions.

The following conventions for naming classes and properties are widely used.

1.  Classes: singular noun phrases in upper camel case, e.g., LoanContract

2.  Properties: tense neutral verbs in lower camel case, e.g., hasBorrower
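The casing part of these conventions is easy to check mechanically. The following is a minimal sketch in Python, not a standard tool: the function names are hypothetical, and the checks cover only the camel-case pattern and the file-name rule described above. Whether a name is a singular noun phrase or a tense-neutral verb still needs a human eye.

```python
import re

# Hypothetical helpers; they test casing only, not grammar.
UPPER_CAMEL = re.compile(r"^(?:[A-Z][a-z0-9]*)+$")
LOWER_CAMEL = re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*$")


def follows_class_convention(local_name: str) -> bool:
    """True if the local name is in upper camel case, e.g. LoanContract."""
    return bool(UPPER_CAMEL.match(local_name))


def follows_property_convention(local_name: str) -> bool:
    """True if the local name is in lower camel case, e.g. hasBorrower."""
    return bool(LOWER_CAMEL.match(local_name))


def ontology_file_stem(ontology_iri: str) -> str:
    """Main part of the file name: the local name of the ontology IRI."""
    return re.split(r"[/#]", ontology_iri.rstrip("/#"))[-1]


print(follows_class_convention("LoanContract"))      # True
print(follows_property_convention("hasBorrower"))    # True
print(follows_class_convention("loan_contract"))     # False
print(ontology_file_stem("http://ontology.myco.com/event"))  # event
```

A check like this could run whenever the ontology is saved, so that departures from the conventions are deliberate rather than accidental.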

There are no standard conventions for naming individuals. You can use the one we use in this book (prefixing with an underscore). Or you may wish to look around and choose among other conventions, or invent your own.

It is critically important that every IRI is globally unique. An important place to start is to use only those namespaces whose domains you control. For example, at Semantic Arts, http://ontologies.semanticarts.com/gist#Person is the IRI for the person class.

Generally, you should follow any conventions that are widely used independently from your organization. This makes it easier for others to understand your ontology. Only change things up if you need to, and document the decisions and rationale.

Meaningless IRIs

Although the practice of IRIs being suggestive of the meaning of the resource is nearly universal, there are compelling reasons to have the IRIs themselves be meaningless, and to use labels and other annotations to convey meaning to the humans. For example, instead of doe:DataSystem, you would have an IRI such as doe:UffcsEE3452X and the label would say: “data system.”

Why would anyone do this? Suppose someone decides that the term “data system” is misleading and what’s really meant is “data store.” If the IRI is meaningless, you can leave it as it is and just change the label. There is minimal impact.

If the IRI was doe:DataSystem, then you have an ontology update problem. All the triples using the old IRI need to change. All the SPARQL queries referring to the globally unique IRI in applications using that ontology need to change.

It’s a tradeoff. The problem with meaningless IRIs is that you never have any idea what they mean by looking at them. There always has to be some kind of tool support to show you the meaningful labels.
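The difference in impact can be sketched with a toy set of triples. This is an illustration only: triples are plain Python tuples rather than a real triple store, and the individuals doe:_sys1 and doe:_sys2 are made up for the example.

```python
RDFS_LABEL = "rdfs:label"
RDF_TYPE = "rdf:type"


def rename_label(triples, iri, new_label):
    """Return a copy of the triples in which only the label of `iri` changes."""
    return {
        (s, p, new_label) if (s == iri and p == RDFS_LABEL) else (s, p, o)
        for (s, p, o) in triples
    }


def count_uses(triples, iri):
    """Number of triples that mention `iri` in any position."""
    return sum(1 for triple in triples if iri in triple)


# Meaningless IRI: the class is doe:UffcsEE3452X, and the label carries the meaning.
opaque = {
    ("doe:UffcsEE3452X", RDFS_LABEL, "data system"),
    ("doe:_sys1", RDF_TYPE, "doe:UffcsEE3452X"),   # hypothetical individual
    ("doe:_sys2", RDF_TYPE, "doe:UffcsEE3452X"),   # hypothetical individual
}

updated = rename_label(opaque, "doe:UffcsEE3452X", "data store")

# Only the label triple is new; the instance data is untouched.
print(len(updated - opaque))                       # 1
# Had the IRI itself changed, every triple mentioning it would need rewriting.
print(count_uses(opaque, "doe:UffcsEE3452X"))      # 3
```

With three triples the difference is small. In a production triple store, and in the queries written against it, the second number is the one that grows.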

Summary: Naming Conventions

1.  Ensure that all IRIs are globally unique and resolvable as URLs.

2.  Follow widely used conventions when possible. Otherwise document your decisions and apply them consistently across your organization.

3.  Use IRIs and labels that are suggestive of the meaning of the concept. Err on the side of long and specific names to avoid ambiguity.

4.  Think about the tradeoffs of using meaningless IRIs and see what makes sense in your situation.

8.1.3    MODELING CHOICE: DATA OR OBJECT PROPERTY?

Always use an object property when you are relating two individuals with IRIs. Consider a data property when it is natural to think of the property as having a value, such as a string, date, or number. Most of the time it is pretty clear what to do. If you are not sure, ask the following question:

Will you need to say something about the value of the property?

If not, you can safely choose to use a data property. If so, then you will need an object property. See Section 4.8.2 for a detailed analysis of when to use datatype vs. object properties.

8.1.4    MODELING CHOICE: CLASS OR PROPERTY?

Most of the time it is clear whether a concept should be modeled as a class or as a property. Use a class to represent a kind of thing that is conveniently described using a noun phrase. Use a property to represent a relationship that is conveniently described using a verb phrase. For each of the terms in Table 8.1, decide whether you think you should represent the underlying concept as class or as a property.

Table 8.1: Classes or properties?

Image

Many of these are fairly straightforward. However, it is not always so obvious, especially when roles are involved. You could create an employs object property connecting an organization to the persons that it employs. If you need to track additional information about employment such as start and end dates, then you can use a class to model the concept behind the word “employs” that appears in the table. See Figures 7.7 and 8.8 for a standard way to model employment.

As another example, consider the borrowing and lending parties on a loan contract. Below are three ways to model this (see Figure 8.2).

Image

Figure 8.2: Three ways to model borrowing and lending.

Using properties: You could have hasBorrower and hasLender as properties that link the contract to the borrowing and lending parties (either persons or organizations). In this case, the concepts of borrowing and lending are represented as properties.

Using classes: You could have a more general property called hasParty that links to instances of classes Borrower and Lender. Here the concepts of borrowing and lending are represented as classes.

Using properties and classes: You can combine the two and represent the concepts of borrowing and lending both as classes and properties. Then the contract would be linked to the instances of Borrower and Lender using the properties hasBorrower and hasLender, respectively.

The questions below provide criteria for choosing a good approach.

1.  Which approach introduces the fewest new things to cover the same requirements? Avoid redundancy; less is more.

2.  Is the class a “true” kind of thing, or do individuals just happen to be members of the class as a result of being in a relationship with other things?

3.  Are you representing more meaning than you need to?

Ignoring Bank, Person, and hasParty, which are not specific to borrowing, the first approach introduces two properties, the second approach introduces two classes, and the third introduces all four. The third approach needlessly represents the semantics of borrowing both as classes and as properties, and thus scores low by the first criterion.

Representing borrowing as a class scores low on the second criterion because being a borrower is just a consequence of being the borrowing party on a loan. Lending is slightly different, because some lenders lend as a business. So far so good, by the second criterion. However, what about all the persons and organizations that are lenders only because they are the lending party on a loan? Having a class called Lender would be confusing. Does it refer only to organizations that are inherently lenders, or to any organization or person that happens to lend? Therefore, the second approach does not get high marks on the second criterion.

The third criterion is about meaning. Is there any meaning that needs to be expressed on a special borrower class that cannot be carried by a property relating the contract to the person? The answer depends on the circumstances. But chances are you don’t need it. When in doubt, leave it out.
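To make the first criterion concrete, the sketch below writes out the triples for one loan under each of the three approaches and counts the borrowing-specific terms each one needs. The individuals _loan1, _jane, and _acmeBank are hypothetical, and triples are plain Python tuples.

```python
# Terms that carry borrowing-specific meaning across the three approaches.
BORROWING_TERMS = {"hasBorrower", "hasLender", "Borrower", "Lender"}

# Hypothetical individuals: _loan1, _jane, _acmeBank.
using_properties = [
    ("_loan1", "hasBorrower", "_jane"),
    ("_loan1", "hasLender", "_acmeBank"),
]
using_classes = [
    ("_loan1", "hasParty", "_jane"),
    ("_jane", "rdf:type", "Borrower"),
    ("_loan1", "hasParty", "_acmeBank"),
    ("_acmeBank", "rdf:type", "Lender"),
]
using_both = [
    ("_loan1", "hasBorrower", "_jane"),
    ("_jane", "rdf:type", "Borrower"),
    ("_loan1", "hasLender", "_acmeBank"),
    ("_acmeBank", "rdf:type", "Lender"),
]


def borrowing_terms_used(triples):
    """Borrowing-specific classes and properties that appear in the triples."""
    return {term for triple in triples for term in triple if term in BORROWING_TERMS}


for name, triples in [
    ("properties", using_properties),
    ("classes", using_classes),
    ("both", using_both),
]:
    terms = borrowing_terms_used(triples)
    print(name, len(triples), "triples,", len(terms), "borrowing terms")
```

The first approach says the same thing with the fewest triples and the fewest new terms, which is the sense in which less is more.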

8.1.5    MODELING CHOICE: CLASS OR INDIVIDUAL?

The question of whether to model something as a class or an individual usually does not arise. Model it as a class if it is a kind of a thing which has instances. Model it as an individual if it has no instances. Again, it is not always so obvious. For example, Starbucks is an individual corporation. Is each Starbucks store an instance of Starbucks? Think for a moment about this. Should we consider representing Starbucks the corporation as a class with individuals?

The answer boils down to the nature of the relationship between Starbucks the corporation and an individual Starbucks store. Is the store an instance of Starbucks? If not, what is it? One thing we can say is that Starbucks the corporation owns each store. So what is each store an instance of? We could create a class called StarbucksStore which is a subclass of Store which is necessarily owned by Starbucks the corporation. It is now more clear that a specific person like you or me and the specific corporation Starbucks should be represented as individuals, not classes. Individual people or corporations are not kinds of things having instances.

In Section 7.1, we saw that a product model such as the iPhone can be viewed both as a member of some class, and also as a kind of thing that has individual members. The right choice depends on your specific needs (see Figure 7.2).

8.1.6    MODULARITY FOR REUSABILITY

Any ontology that will cover a useful part of your enterprise will have a mix of topics, some more specific than others. It is useful to create separate ontologies for different topics, and to have the more specific ontologies import the more generic ones. For example, most of the product specification ontology we built for electrical products had nothing to do with electrical products. We took great care to split out anything that was specific to electrical products into a separate ontology. That way the product specification ontology could be used for other products across a range of industries.

In another example, suppose your business includes tracking various kinds of entertainment including sports, cinema, and theater. The central concept is an event that people can attend, whether it is called a performance, a game, or a showing.

Let’s say you start with theater and you will be tracking performances, theater groups, venues, and attendance. You might start to put classes in the theater ontology for things like performance, venue, capacity, ticket sales, etc. However, most of this has nothing to do with theater per se. Venues are important for a wide range of events, outside of the entertainment arena. Venues involve geography and the people and/or organizations that operate them. Entertainment always involves people and organizations that are putting on and/or attending the events. One way to organize this subject matter into ontology modules is depicted in Figure 8.1.

As discussed in Section 3.1.11, the construct for importing is owl:imports. The result of ontology A importing ontology B is the union of the sets of triples for both ontologies.

Image

Figure 8.1: Ontology modules for entertainment.
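The effect of importing can be sketched as a set union that follows owl:imports links. This is a toy model, assuming each ontology is a set of tuples keyed by its IRI; the ex: names are hypothetical and are not taken from Figure 8.1. Because the imports of an imported ontology come along too, the helper follows the links transitively.

```python
def imports_closure(ontology_iri, ontologies):
    """Union of the triples of an ontology and everything it imports."""
    merged, seen, stack = set(), set(), [ontology_iri]
    while stack:
        iri = stack.pop()
        if iri in seen:
            continue  # already merged; also guards against import cycles
        seen.add(iri)
        triples = ontologies[iri]
        merged |= triples
        stack.extend(o for (s, p, o) in triples if s == iri and p == "owl:imports")
    return merged


# Hypothetical modules: a generic event ontology and a theater ontology.
ontologies = {
    "ex:event": {
        ("ex:event", "rdf:type", "owl:Ontology"),
        ("ex:Event", "rdf:type", "owl:Class"),
        ("ex:Venue", "rdf:type", "owl:Class"),
    },
    "ex:theater": {
        ("ex:theater", "rdf:type", "owl:Ontology"),
        ("ex:theater", "owl:imports", "ex:event"),
        ("ex:TheaterPerformance", "rdfs:subClassOf", "ex:Event"),
    },
}

merged = imports_closure("ex:theater", ontologies)
print(len(merged))  # 6
```

The generic module knows nothing about theater, so a sports or cinema module could import it in exactly the same way.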

8.1.7    ONTOLOGY EDITORS AND INFERENCE ENGINES

There are a variety of tools for building ontologies. The most commonly used are Protégé and TopBraid Composer. The former is open source. The latter is a commercial tool that has a free version. It’s a good idea to try each to see what you prefer. Both can be used to build OWL ontologies, covering most or all of what you are likely to need.

TopBraid Composer offers more robust support for RDF and SPARQL than Protégé, but the latter offers better support for OWL inference. Protégé has a variety of OWL DL reasoners built in and others are available as plugins. TopBraid Composer provides inferencing for some of the OWL 2 profiles, but none that will support all of OWL 2 DL.

Because OWL is a standard, either tool can load the output of the other. Therefore, you can choose to work with both tools, depending on your needs. There is a minor caveat: TopBraid Composer does not support files with a .owl extension. Use .rdf or .ttl instead.

Available inference engines include Hermit, Fact++, Pellet, and TrOWL. Pellet, Hermit, and Fact++ each behave differently: some go faster on some ontologies but slower on others. They support OWL 2 DL, but run slowly on large, complex ontologies. TrOWL goes dramatically faster (from minutes to seconds) because it leaves out inferences regarding datatypes.

If inference still takes too long, it is time to look at the ontology to see what you can change to speed things up. It is a bit of a black art. If you have lots of instances, try removing them. Sometimes too many inverse properties or disjoints used in certain ways can be a problem. In the worst case, it is a matter of trial and error.

Be careful about vendors claiming that they have ontology development tools that support OWL. The support can vary widely. Find out whether the tool was developed natively to support OWL or whether OWL support is an add-on. Find out what, if any, loss of information occurs when importing OWL 2 ontologies, and whether the tool supports round-tripping. Otherwise, once your ontology gets in, it can’t get out!

8.2    MODELING PATTERNS

8.2.1    GENUS DIFFERENTIA

Many of the classes we have defined use what is known as the genus + differentia style of definition. The genus is the existing class on which the new class is based, the differentia says what is different about the new class. This is good practice (see Table 8.2).

Table 8.2: Genus + differentia style definition

Class Being Defined        | Genus                | Differentia
Patient                    | Person               | Received care on a patient visit
PatientVisit               | Event                | There is a care provider and a care recipient; both are persons
SecurityAgreement          | WrittenContract      | Has an owned thing with estimated value as collateral
SemviaOrganization         | Organization         | Is part of the overall Semvia organization
InternalSemviaTransaction  | FinancialTransaction | Both the buyer and seller are Semvia organizations

8.2.2    ORPHAN CLASSES AND HIGH-LEVEL DISJOINTS

At the end of Section 2.3.5, we showed how stating that two classes are disjoint can help spot errors. This illustrated an important and useful pattern: have a relatively small number of high-level classes, and carefully specify the disjointness relationships that exist between them.

Look at the inconsistent ontology example in Figure 8.3. The problem is that the individual AdmitPatient is a member of two disjoint classes.

Image

Figure 8.3: High-level disjoints.

The highest-level classes are orphans, i.e., they have no parent class. In our simple example, there are five classes and just two orphans. You want a small number of orphans because that makes it far easier to use the high-level disjoints pattern. If, instead of the five classes depicted in Figure 8.3, the hierarchy were completely flat, you would need to make ten decisions, one for each pair of classes. With just two high-level classes, we only need to make one. Furthermore, this single decision results in 5 inferred disjointness axioms. Saying that Behavior is disjoint from Intention allows us to infer that:

Image

A realistic ontology will likely have dozens or possibly hundreds of classes. Take gist [46], for example, as depicted in Figure 6.14. This version has 131 classes and just 26 orphan classes in the asserted hierarchy, which reduces the number of decisions from 8,515 [47] in the case of a totally flat hierarchy to 325. After inference, there are just 18 orphan classes and only 153 pairs of classes to consider. That is manageable.
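The numbers of decisions quoted here are simply counts of unordered pairs: with n classes to compare, there are n(n-1)/2 pairs to consider. A short Python check (the function name is made up for the illustration) reproduces each figure.

```python
from math import comb


def disjointness_decisions(class_count: int) -> int:
    """Number of unordered pairs of classes, i.e. potential disjointness decisions."""
    return comb(class_count, 2)


print(disjointness_decisions(5))    # 10    flat hierarchy of five classes
print(disjointness_decisions(2))    # 1     two orphan classes
print(disjointness_decisions(131))  # 8515  all gist classes, flat
print(disjointness_decisions(26))   # 325   asserted orphans
print(disjointness_decisions(18))   # 153   orphans after inference
```

Because the count grows with the square of the number of classes, trimming the list of orphans pays off quickly.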

The next question is how best to ensure that there are a small number of orphan classes? First, you should make liberal use of the genus differentia pattern. The new class is inferred to be the subclass of the genus class.

Often it is easy to do this, but sometimes it is trickier. Ask yourself: what exactly is this thing? One way to make this task easier is to connect to an existing upper ontology. That is the next topic.

8.2.3    UPPER ONTOLOGIES

It is useful to connect your ontology to an upper ontology, which has generic concepts that are not specific to any subject, industry, or application. There are a number of advantages. First, you don’t have to reinvent the wheel, so it saves a lot of time. Second, you can build a better ontology by starting with a solid foundation and leveraging the axioms to find errors. Find one that has been developed by people you trust over a long period of time. Many of the wrinkles will have been worked out and it will be more stable and reliable.

What to look for in an upper ontology? In the enterprise context, look for the following characteristics:

1.  The upper ontology is easy to learn, understand, and use. This means manageable scope, being well documented and structured with plenty of axioms to specify the semantics.

2.  It matches your way of thinking and modeling your enterprise.

3.  It is mature and relatively stable, yet is still evolving and supported.

4.  It is suitable for business use, designed by business people for business people. An academically first rate ontology may not be a good fit for your enterprise.

5.  There is a community of users.

A comprehensive review of existing upper ontologies is beyond the scope of this book.48 A reasonable fit to the above ideal is gist. It is the only upper ontology specifically designed for the enterprise. It is free and open source, so long as you give appropriate attribution.

Non-business users will have different requirements. For example, there is an upper ontology that is widely used by scientists called BFO—Basic Formal Ontology.49

A note on terminology. The notion of “upper” is somewhat relative. The most useful way to think about it is whether an ontology has many of the higher level concepts that you can reuse. If you want an ontology for enterprise applications, you can think of gist as an “upper enterprise ontology.”

How an Upper Ontology Helps

The following deceptively simple example illustrates some of the benefits of importing an upper ontology. Suppose you are tracking data centers, each having a unique identifier. One straightforward way to model this is to create the two classes DataCenter and DataCenterID that are linked by the property hasIdentifier (with inverse identifierFor). A restriction requires every DataCenter to have a DataCenterID. Figure 8.4 shows what this looks like in Protégé.

Careful inspection of this model will reveal two problems. One illustrates a common mistake that arises when using OWL restrictions. The other is a mistake of omission: an important characteristic about identifiers is left unsaid. Neither of these two mistakes will be found by running inference, and they could impact downstream applications.

Image

Figure 8.4: Simple ontology for data centers.

We will now show how we can catch both of these errors by importing and mapping to a simple upper ontology like gist, which already models identifiers. The DataCenterID class is a subclass of gist:ID and DataCenter is a subclass of gist:Building. The property gist:identifies (with inverse gist:identifiedBy) could be used directly, but the datacenter ontology already has different IRIs for the same properties, as we saw above. The property identifierFor is equivalent to gist:identifies, and their respective inverses are also equivalent.

Image

Figure 8.5: Connecting to an upper ontology.

We say as much using owl:equivalentProperty, as described in Section 4.9. The datacenter properties now have all the characteristics of the gist properties that they are equivalent to (e.g., inverse functional). This is depicted in Figure 8.5. Figure 8.6 shows what this looks like in Protégé.

If we run inference, DataCenter becomes equivalent to owl:Nothing (which represents the empty set). That means DataCenter cannot possibly have any members. This is clearly an error. To find the mistake, you click on the question mark to get an explanation justifying the inference (see Figure 8.7).

The explanation is not very human-friendly, but it does offer clues. The root cause is that the restriction used the property the wrong way around (identifierFor instead of hasIdentifier). A more human-friendly explanation for how the error was caught is as follows.

1.  A DataCenter is a gist:Building which is, in turn, a gist:Place.

2.  The range of gist:identifiedBy (which is equivalent to hasIdentifier) is gist:ID.

3.  The restriction says that the DataCenter is the identifierFor at least one DataCenterID.

4.  Because identifierFor is equivalent to gist:identifies, which is the inverse of gist:identifiedBy, it has a domain of gist:ID.

5.  Therefore, DataCenter is inferred to be a gist:ID.

6.  But gist:ID is disjoint with gist:Place.

7.  Therefore, DataCenter is both a gist:Place and a gist:ID, but those two classes are disjoint, so DataCenter cannot have any members.

Image

Figure 8.6: Data centers in Protégé.
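The chain of reasoning in the seven steps above can be mimicked with a few dictionaries. This is a toy sketch, not how an OWL reasoner works internally: it encodes only the axioms mentioned in the text (the subclass links, the property equivalences, the domain of gist:identifies, and one disjointness), and the function names are hypothetical.

```python
# Only the axioms mentioned in the text are encoded here.
SUBCLASS_OF = {
    "DataCenter": {"gist:Building"},
    "gist:Building": {"gist:Place"},
    "DataCenterID": {"gist:ID"},
}
EQUIVALENT_PROPERTY = {
    "identifierFor": "gist:identifies",
    "hasIdentifier": "gist:identifiedBy",
}
DOMAIN = {"gist:identifies": "gist:ID"}
DISJOINT = [frozenset({"gist:ID", "gist:Place"})]


def superclasses(cls):
    """The class itself plus all of its asserted ancestors."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return found


def implied_classes(cls, restriction_property):
    """Classes every member of cls must be in, given that each member is the
    subject of restriction_property (the 'some values from' restriction)."""
    gist_property = EQUIVALENT_PROPERTY.get(restriction_property, restriction_property)
    implied = superclasses(cls)
    domain = DOMAIN.get(gist_property)
    if domain:
        implied |= superclasses(domain)
    return implied


def is_unsatisfiable(cls, restriction_property):
    """True if the implied classes include a disjoint pair."""
    implied = implied_classes(cls, restriction_property)
    return any(pair <= implied for pair in DISJOINT)


print(is_unsatisfiable("DataCenter", "identifierFor"))  # True: the property is backward
print(is_unsatisfiable("DataCenter", "hasIdentifier"))  # False: the intended direction
```

Without the imported domain and disjointness axioms, both calls would return False, which is why the original ontology passed inference with the mistake still in it.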

If you explicitly added a DataCenter instance, the ontology would become inconsistent, which is a worse problem than one class not being able to have any members. The mistake of omission is corrected automatically by connecting up the properties. An identifier must identify only one thing. This characteristic is inherited from the gist:hasA property which is inverse functional. In summary, by making a few simple connections to an existing upper ontology, the following benefits are possible.

1.  The existing model for identifier includes things that the modeler of the new ontology might or might not have thought of (e.g., disjoint classes, property characteristics).

2.  There is no need to model the concept of an identifier, it is already done for you.

3.  The inference catches logical inconsistencies, such as the common mistake of getting the property direction wrong on an OWL restriction.

4.  The accuracy and completeness of the ontology is improved, resulting in fewer mistakes in downstream applications.

5.  To the extent that other ontologies will be built in a similar manner, the problem of silos is much less likely to arise. The ontologies will have a shared core.

6.  It will be possible to query across multiple data stores without doing any additional mapping. For example, if someone is interested in getting a list of buildings in a given area, they may be surprised to see data centers turning up. This happened because DataCenter is a subclass of Building. The effort to connect to the upper ontology is modest compared to the work to map across silos after the fact.

Image

Figure 8.7: Detecting and resolving bugs.

In short, using an upper ontology helps you build better ontologies faster. It takes less time because you can reuse already modeled concepts. Your ontology is better because by hitching a ride on the semantics of the imported ontology, inference can catch errors that you can easily miss.

8.2.4    N-ARY RELATIONS

OWL only allows binary properties (i.e., “2-ary relations”) that connect two related things in a triple. We saw some relationships that genuinely involve more than two things. For example, stearyl alcohol is used as a surfactant in a shampoo, and I was employed at Boeing from 1997–2008.

Figure 7.7 shows how OWL is used to represent these n-ary relations. N is 3 for the material usage example (the material, the purpose, and the product) and n is 4 for the employment example (the employer, the employed, the start date, and the end date).

The general pattern for reifying an n-ary relationship is to create a class with n restrictions. An instance of that class, representing the relationship holding between n particular individuals, requires n+1 triples: one to create the instance, and one for each restriction (see Figure 8.8).

Image

Figure 8.8: Representing a 4-ary relationship with 5 triples.
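The n+1 count is easy to see when the triples are written out. The sketch below builds them for the employment example; the class name Employment, the property names, and the individuals are assumptions made for the illustration and may differ from those used in the figure.

```python
def reify(relation_class, instance_iri, **participants):
    """Triples for one instance of a reified n-ary relationship:
    one rdf:type triple plus one triple per participant."""
    triples = [(instance_iri, "rdf:type", relation_class)]
    triples += [(instance_iri, prop, value) for prop, value in participants.items()]
    return triples


# Hypothetical names throughout: Employment, hasEmployer, _author, and so on.
employment = reify(
    "Employment",
    "_employment1",
    hasEmployer="_Boeing",
    hasEmployee="_author",
    hasStartYear=1997,
    hasEndYear=2008,
)

for triple in employment:
    print(triple)
print(len(employment))  # 5 triples for a 4-ary relationship
```

The instance _employment1 stands for the relationship itself, which is what makes it possible to attach the dates to it.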

8.2.5    BUCKETS, BUCKETS EVERYWHERE

As categorizing machines, we humans like to create metaphorical buckets, and put things in them. Different kinds of buckets are modeled differently in OWL. The most common bucket represents a kind of thing, such as person, or building. Bill Gates goes into the person bucket, and the Edinburgh Castle (Figure 3.2) goes into the building bucket. Such buckets are represented as OWL classes and we use rdf:type to put things into the bucket. For example:

Image

Another kind of bucket is for when you have a collection of things, like people on a jury, that are functionally connected in some way. Those related things go into the bucket. For this case, we create a class called Collection and a subclass called Jury whose instances represent the buckets containing individual jurors, e.g., the jury for the OJ Simpson trial. Use isMemberOf to put the jurors into the bucket. For example:

Image

Convince yourself that a collection does not represent a kind of thing. A jury is a kind of thing; a particular jury is not.

A third kind of bucket corresponds to a tag which represents a topic and is used to categorize individual items for the purpose of indexing a body of content. For example, the tag “_Winter” might be used to index photographs, books and/or YouTube videos. Any content item that depicts or relates to winter in some way should be categorized using this tag.

The representation for this echoes how we represent collections. The differences are (1) the bucket is an instance of a subclass of Category, rather than of Collection and (2) we put things into the bucket using categorizedBy rather than isMemberOf.

The tag for winter corresponds to a bucket containing all the things that have been indexed/categorized using that tag. Some of the classes and properties used here are borrowed from gist.

Table 8.3: Different kinds of buckets

Image

Image

Figure 8.9: Representing different kinds of buckets.
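As a rough illustration of the three kinds of buckets, the sketch below writes one or two triples for each and looks up what is in a given bucket. Triples are plain Python tuples; the individuals _juror1, _juror2, and _photo1, the jury's IRI, and the class name Tag (standing for some subclass of Category) are hypothetical.

```python
# The property used to put something into each kind of bucket.
BUCKET_PROPERTY = {
    "kind of thing": "rdf:type",
    "collection": "isMemberOf",
    "tag": "categorizedBy",
}

triples = [
    # Kind of thing: the bucket is a class.
    ("_BillGates", "rdf:type", "Person"),
    ("_EdinburghCastle", "rdf:type", "Building"),
    # Collection: the bucket is an individual jury, itself an instance of Jury.
    ("_ojSimpsonJury", "rdf:type", "Jury"),
    ("_juror1", "isMemberOf", "_ojSimpsonJury"),
    ("_juror2", "isMemberOf", "_ojSimpsonJury"),
    # Tag: the bucket is an individual tag (Tag is a hypothetical subclass of Category).
    ("_Winter", "rdf:type", "Tag"),
    ("_photo1", "categorizedBy", "_Winter"),
]


def bucket_contents(triples, bucket, kind):
    """Everything placed in `bucket` using the property for that kind of bucket."""
    prop = BUCKET_PROPERTY[kind]
    return sorted(s for (s, p, o) in triples if p == prop and o == bucket)


print(bucket_contents(triples, "Person", "kind of thing"))      # ['_BillGates']
print(bucket_contents(triples, "_ojSimpsonJury", "collection")) # ['_juror1', '_juror2']
print(bucket_contents(triples, "_Winter", "tag"))               # ['_photo1']
```

Note that _ojSimpsonJury appears on both sides: it goes into the Jury bucket with rdf:type, and it is itself a bucket that holds jurors.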

8.2.6    ROLES

Table 8.4: Using properties to represent roles

Image

Roles are fairly pervasive in modeling. They arise most frequently for events, agreements, and when reifying n-ary relations. For example, the event of a patient visit always has one person as the patient and at least one as the care provider. There may be others participating in some capacity, say a physician’s assistant. The event of playing any game will have at least two individuals or teams playing a competitor role. In some cases, those roles are distinguished, e.g., home team vs. away team.

Roles also show up in agreements. There are always at least two parties to any agreement. For example, an employment contract has an employer and an employee. A software licensing agreement has a licensor and the licensee. A loan contract has a borrower and a lender. There is an important event attached to a mortgage contract, which is a property appraisal, which in turn has other roles, e.g., the appraiser (a person) and the property being appraised.

Earlier, in Section 8.1.4, we discussed different ways to model such situations. A simple approach that works most of the time is illustrated at the top of Figure 8.2: represent the semantics of the role in a property, rather than creating a class for each role. The main advantage of this approach is that it is simple and it works (see Table 8.4).

It’s easy to make this overly complicated, and even easier to get confused—I’ve been there. Stick with this simple approach unless you have specific examples that won’t work. I know exactly what might justify a more complex approach, but in my many years of building enterprise ontologies in many industries, it almost never arises.

This completes our discussion of modeling patterns. Next, we consider some common pitfalls.

8.3    COMMON PITFALLS

8.3.1    READING TOO MUCH INTO IRIS AND LABELS

A term that is suggestive of its meaning helps humans understand the intended meaning. However, it is easy to fall into the trap of thinking that the computer will somehow be able to divine that information also. The computer “knows” only as much as you tell it. For further discussion on this, see Section 3.1.8.

8.3.2    UNIQUE NAME ASSUMPTION

If you are used to systems that make the unique name assumption, it may catch you off guard that in OWL two different IRIs can both refer to the same thing.

If an individual is linked by a functional property such as hasBiologicalMother to two different IRIs, then a triple connecting the two IRIs using owl:sameAs can be inferred (see Figure 4.13).
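The inference described above can be sketched in a few lines of Python. This is only an illustration, not how any particular reasoner is implemented: triples are plain tuples, and the individuals _JanSmith, _MaryJones, and _MaryAnnJones are hypothetical names invented for the example.

```python
from itertools import combinations

# Properties assumed to be declared functional in the ontology.
FUNCTIONAL = {"hasBiologicalMother"}

def infer_same_as(triples):
    """Return the owl:sameAs triples implied by functional properties."""
    values = {}
    for s, p, o in triples:
        if p in FUNCTIONAL:
            values.setdefault((s, p), set()).add(o)
    inferred = set()
    for objs in values.values():
        # Two different IRIs for a single-valued property must co-refer.
        for a, b in combinations(sorted(objs), 2):
            inferred.add((a, "owl:sameAs", b))
    return inferred

triples = [
    ("_JanSmith", "hasBiologicalMother", "_MaryJones"),
    ("_JanSmith", "hasBiologicalMother", "_MaryAnnJones"),
]
same = infer_same_as(triples)
```

A system that makes the unique name assumption would instead treat the two mothers as a contradiction.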

8.3.3    NAMESPACE PROLIFERATION

It is a common practice to have a different namespace for every ontology. We regard this as a bad practice. Why? It makes refactoring the ontology more difficult and much more disruptive. If you need to move some classes or properties from one ontology to another, you have to change the IRI. This means you have to change every use of that IRI. For a large ontology, that could affect hundreds of triples across dozens of ontology modules.

That is time-consuming and error-prone if done by hand. Automation is possible, but may be disruptive if there are multiple ontology authors working on different modules.

There is also a risk of disruption to the user community. They will have to change all their data and application code that makes use of the old IRIs. Although this can be mitigated by using the OWL deprecation feature (discussed in Section 8.4.5), having lots of deprecated things around creates clutter. In our experience, it is better to limit the introduction of new namespaces to situations where there is a different governance body that is minting the IRIs.50

8.3.4    DOMAIN AND RANGE

Given two things connected by a given property, domain and range are used to indicate what classes those two things necessarily belong to.

The most common pitfall is to think of domain and range as integrity constraints. They are not. Instead, domain and range sanction certain inferences. If you get surprising inferences that don’t make any sense, double check that you are using domain and range correctly.

Another common pitfall is not realizing that if you specify two or more domain classes, then the actual domain of the property is the intersection of those two or more classes. The same goes for multiple range classes.

Finally, be careful not to define the domain or range too narrowly. This limits the possibility of reuse. Additional details on pitfalls of using domain and range are found in Section 4.4.1.
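To make the first two pitfalls concrete, here is a minimal Python sketch of what domain and range do. The properties hasPrice and teaches, their domains and ranges, and the individuals are all hypothetical, chosen only for illustration. Nothing is rejected; class memberships are inferred instead.

```python
# Hypothetical domain and range declarations.
DOMAIN = {
    "hasPrice": {"Product"},
    "teaches": {"Person", "Employee"},  # two domains: the subject is in both
}
RANGE = {"teaches": {"Course"}}

def infer_types(triples):
    """Infer class memberships from domain and range; no triple is rejected."""
    inferred = {}
    for s, p, o in triples:
        for cls in DOMAIN.get(p, ()):
            inferred.setdefault(s, set()).add(cls)
        for cls in RANGE.get(p, ()):
            inferred.setdefault(o, set()).add(cls)
    return inferred

triples = [
    ("_LoanContract42", "hasPrice", "_Amount1"),  # probably a modeling mistake
    ("_Pat", "teaches", "_Algebra101"),
]
inferred = infer_types(triples)
```

Here _LoanContract42 is quietly inferred to be a Product, which is the kind of surprising inference mentioned above. _Pat lands in both Person and Employee, that is, in their intersection.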

8.3.5    LESS IS MORE

It is easy to be lured into thinking that more is better. It’s not. Smaller and simpler ontologies are easier to understand and use, and more computationally efficient as well. A good principle to use is “when in doubt, leave it out.”

One thing to be wary of is building out a nice big class hierarchy for something when most of the distinctions will never be used. Below we consider two specific aspects of the “less is more” principle. One is to keep the number of primitives small. The other is to avoid proliferation of properties.

Create and Use a Small Number of Primitives

The ontology will be easier to understand and use if the number of primitives is relatively small and other concepts are built up from these primitives. For example, consider the definitions of Patient, PerformedProcedure, and Party. Each is defined entirely from existing lower level concepts. The genus differentia pattern is used for Patient. These examples are illustrated in Figure 8.10. A “primitive” is where you stop. You don’t define it in terms of other things; it is foundational.

Image

Figure 8.10: Class equivalence reusing existing concepts.

Avoid Proliferation of Properties

One way we keep the number of primitives small is to avoid introducing new properties that don’t add new semantics. In the genealogy domain, say we have the property hasSibling and we want to model brothers and sisters. One way is to create two sub-properties, hasBrother and hasSister, whose ranges are Male and Female, respectively. See the top of Figure 8.11. Note that the dotted green arrows show what can be inferred.

Alternatively, we can capture the semantics of brother and sister using a restriction with the inverse of the existing property, hasSibling. This way we can define the brother and sister concepts entirely in terms of existing primitives with the same number of classes and without creating any new properties.

The first approach captures the essential difference in meaning between Brother and Sister by creating two new properties and using Male and Female as their ranges, and then using those properties in restrictions. The second approach expresses the exact same meaning by reusing the hasSibling property as part of the genus differentia pattern, where the genus classes are Male and Female and the hasSibling property is used in the restrictions expressing the differentia (see Figure 8.11).
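The second approach can be sketched in Python as follows. The exact axioms are those of Figure 8.11. This sketch assumes that Brother is defined as a Male who is the object of at least one hasSibling triple, and Sister likewise for Female. The individuals _Ann and _Bob are hypothetical.

```python
def classify_siblings(triples):
    """Infer Brother and Sister using only Male, Female, and hasSibling."""
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    # Anyone who is the object of hasSibling is somebody's sibling
    # (this plays the part of the restriction on the inverse property).
    siblings = {o for s, p, o in triples if p == "hasSibling"}
    inferred = set()
    for person in siblings:
        if "Male" in types.get(person, ()):
            inferred.add((person, "rdf:type", "Brother"))
        if "Female" in types.get(person, ()):
            inferred.add((person, "rdf:type", "Sister"))
    return inferred

triples = [
    ("_Ann", "rdf:type", "Female"),
    ("_Bob", "rdf:type", "Male"),
    ("_Ann", "hasSibling", "_Bob"),
    ("_Bob", "hasSibling", "_Ann"),
]
inferred = classify_siblings(triples)
```

No hasBrother or hasSister property is needed to reach these conclusions.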

Keeping the number of primitives low has practical value. The fewer things you have, the easier it is to find what you need. It helps during ontology development and evolution.

Image

Figure 8.11: Avoiding property proliferation.

Exercise 1: What justifies the inference of Brother into Male and Sister into Female in the examples shown in Figure 8.11?

When To Create New Properties?

We have seen a number of examples analogous to the sibling, brother, sister example. You can use hasPart for many circumstances, rather than numerous variations such as hasWheel and hasChapter. There are many kinds of identifiers including vehicle identification numbers and serial numbers. You don’t need separate properties to connect to each kind, e.g., hasVIN, hasSerialNumber. Rather, you can represent the different kinds of identifiers as classes (VIN, SerialNumber) and use a single property such as isIdentifiedBy to connect things to their identifiers.

Sometimes the opposite occurs. Instead of having a single property that points to several different classes, there are times when you want several different properties pointing to the same kind of thing. For example, the properties hasAnnualSalary, hasCreditLimit, and hasPrice all point to an amount of money. These properties are not pointing to different kinds of money; they denote different relationships that things have with money.

The common theme here is that it makes sense to introduce a new property when it represents a semantically different way of relating two different things, and you need to know about that difference to support your requirements. If the new subproperty you wish to create is only different because it has a narrower domain or range, there is a good chance you don’t need it.

At times, if a group has been using particular terms describing relationships for a long time, it can be better to just have some extra properties, even if they are not adding anything semantically. It can make the ontology easier to understand and use.

8.4    LESS FREQUENTLY USED OWL CONSTRUCTS

So far, we have focused on the most widely used OWL constructs. There are some additional ones that we briefly mention here, in case you may need them.

8.4.1    PAIRWISE DISJOINT AND DISJOINT UNION

Pairwise disjoint: If you have a set of classes that are all disjoint from one another, you can use the owl:AllDisjointClasses construct to avoid the need to specify each pair, one by one.

Disjoint union: If you want to partition a class into subclasses that are pairwise disjoint and their union is equal to the starting class, then you can use the owl:disjointUnionOf construct.
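To see how much typing owl:AllDisjointClasses saves, the following Python sketch expands a list of classes into the pairwise statements you would otherwise write by hand. The five class names are hypothetical. For n classes there are n*(n-1)/2 pairs.

```python
from itertools import combinations

def pairwise_disjoint(classes):
    """Expand a set of mutually disjoint classes into one axiom per pair."""
    return [(a, "owl:disjointWith", b) for a, b in combinations(classes, 2)]

classes = ["Person", "Organization", "Event", "Place", "Agreement"]
axioms = pairwise_disjoint(classes)
```

With five classes this already yields ten separate axioms, and the number grows quadratically.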

8.4.2    DATATYPES

At the end of Section 3.1.5, we went over the most commonly used datatypes; however, there are many others. The complete list may be found in the OWL2 Reference Card51; they are fully documented in the OWL 2 Specification.52 There are various kinds of numbers (real, float, positiveInteger) as well as some miscellaneous datatypes such as Boolean, hexBinary, and base64Binary.

It is also possible to create custom datatypes. For example, you could define a datatype and call it SSN for social security number and require it to be 9 digits where the first digit has to be a 0 or 1. Due to poor inference performance, this facility is rarely used.

8.4.3    DIFFERENT INDIVIDUALS

Because the unique name assumption does not hold for OWL, the inference engine does not assume that two different IRIs refer to two different individuals. If you want inferences that will only fire if two individuals are different, then you have to explicitly declare them to be different. For example:

:_JanSmith owl:differentFrom :_JanAnnSmith.

Pairwise different from: You can use the owl:AllDifferent construct to say a group of specified individuals are all different from each other. This works just like owl:AllDisjointClasses.

8.4.4    SAME INDIVIDUALS

Similarly, two IRIs can refer to the same individual. For example, if you are integrating different databases, you might realize that the two individuals _Barak_H_Obama and _BarakObama are the same individual. You can say this with the following triple:

:_Barak_H_Obama owl:sameAs :_BarakObama.

When you are building your own ontology, you will generally not need to use owl:sameAs. It comes in most handy when you are using other ontologies or data sets. In Section 4.6, we saw how the inference engine can infer owl:sameAs when using functional properties (see Figure 4.13).

8.4.5    DEPRECATION

If an ontology evolves in such a way that may cause disruption to a user community, OWL provides a mechanism to handle deprecation. For example, suppose you want to change the name of the property memberOf to isMemberOf. You would deprecate the former and make it equivalent to the latter. This works by using the Boolean annotation, owl:deprecated. For example:

Image

This gives the downstream users the option to leave the old property in place and update to the new property if and when desired.
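When a downstream user does decide to update, the change amounts to a simple rewrite of the affected triples. The Python sketch below is one hypothetical way to do it, assuming data held as tuples and a hand-maintained table of deprecated properties. It is not part of OWL itself, and the individuals and the hasName property are invented for the example.

```python
# Deprecated property -> its equivalent replacement (from the example above).
REPLACED_BY = {"memberOf": "isMemberOf"}

def migrate(triples):
    """Rewrite triples that use a deprecated property to use its replacement."""
    return [(s, REPLACED_BY.get(p, p), o) for s, p, o in triples]

old_data = [
    ("_Pat", "memberOf", "_ChessClub"),
    ("_Pat", "hasName", "Pat"),
]
new_data = migrate(old_data)
```

Because the two properties are declared equivalent, data that has not yet been migrated remains usable in the meantime.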

8.5    THE OPEN WORLD REVISITED

As explained in Section 3.1.3, OWL uses open-world inference. This means that the inference engine is open to the possibility that there is more to know than what has been directly asserted. In short, there is a difference between “no” and “don’t know.” Fortunately, you don’t have to worry too much about it in the early stages of learning OWL.

Practical Import

What does the open world mean in practice? Most of the time when you use a description logic (DL) reasoner on the TBox to check the consistency of your ontology, the fact of open world reasoning won’t be so noticeable. We highlight two situations when you want to be aware of the open world.

1.  You cannot infer an individual to be a member of a universal (all values) restriction, nor a max or exact cardinality restriction.

2.  You can put lots of restrictions on your classes that you know to be true in the world, even if there is no expectation that any one application will use them all.

Preventing Inference into Restriction Classes

If you are trying to use inference to classify individuals, you need to be aware that you can never infer an individual into a restriction class using max or exact cardinality. Why? Consider the bicycle that must have exactly two wheels. Because of the open world, the inference engine will always allow that there might be other wheels that it does not know about, so it can never be certain that there are only two. It can be certain if there are more than two, in which case it will conclude that the individual is not a member of the class.

You also cannot infer into an all values restriction for a similar reason. There might be another assertion that comes along that breaks the pattern.

Even though you cannot make certain inferences, it can still be a good idea to include these restrictions. Why? Because it helps achieve the goal of faithfully representing subject matter. It communicates to the humans using the ontology what is intended. This reduces the likelihood of errors in applications using the ontology. It can also serve as a formal specification for later implementation of integrity constraints using SPARQL or SHACL.
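The bicycle example can be sketched in Python to show why the answer is never a plain "yes." This is an illustration under stated assumptions, not a description of a real reasoner: the function and the wheel names are hypothetical, and different is a set of pairs that have been explicitly declared owl:differentFrom. Since there is no unique name assumption, three wheel IRIs count as more than two only if they are known to be pairwise different.

```python
from itertools import combinations

def classify_exactly_two_wheels(wheels, different):
    """Open-world check of 'exactly 2 wheels' for one individual.

    Returns 'not a member' or 'unknown'; it can never return 'member',
    because further wheels may exist that have not been asserted.
    """
    for trio in combinations(sorted(wheels), 3):
        pairs = combinations(trio, 2)
        if all(frozenset(pair) in different for pair in pairs):
            return "not a member"  # three provably distinct wheels
    return "unknown"

three = ["_w1", "_w2", "_w3"]
all_different = {frozenset(pair) for pair in combinations(three, 2)}

two_wheels = classify_exactly_two_wheels(["_w1", "_w2"], set())
three_maybe_same = classify_exactly_two_wheels(three, set())
three_distinct = classify_exactly_two_wheels(three, all_different)
```

Only the last case produces a definite answer, and it is a negative one.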

Restrictions that May Go Unused

If you are creating a loan ontology for a financial institution, the central concept will likely be loan contract. There are dozens of possible things you could say about loan contracts by connecting the contract to other individuals or literals. There will be many applications in the enterprise about loans, and every one of those properties will be used in at least one of the applications, but probably no one application will need to use all of them.

During ontology development, when you attach restrictions to the class for loan contract, you are modeling the real world. So you can say whatever is true about loan contracts that you think you will need in your enterprise. So you might have 20 restrictions. In the open world, this is fine. Every loan contract in the world has that many properties.

If you have a closed-world mentality, you may be hesitant to put so many restrictions on a class, for fear that if you don’t have the data, then instances of loan contract might not be allowed. Don’t hesitate. Just put those restrictions on, as long as they are in scope for the anticipated uses of the ontology. The idea of “allowing” something to be in the triple store is where SHACL comes in. That is where you choose just what subset of properties you wish to use, and their cardinality.53

Closing the World

Once you load an ontology into a triple store, populate it with data, and use SPARQL to query it, the world is no longer so open. A SPARQL query processor will only see the triples that are in the store. If you are confident in your data quality, you could use SPARQL or SHACL to specify rules to assert new triples that a DL reasoner will not infer due to the open world. For example, you could infer things to be Bicycles. When you move to creating applications and using SHACL and SPARQL, you are, for practical purposes, closing the world.
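A closed-world rule of this kind might look like the following Python sketch, which stands in for what would normally be written as a SPARQL or SHACL rule. It assumes, hypothetically, that wheels are connected with hasPart and typed as Wheel, and it treats the triples in the store as the whole truth: different IRIs are counted as different wheels, and no unseen wheels are allowed for.

```python
def close_world_bicycles(triples):
    """Assert Bicycle for anything with exactly two wheel parts in the store."""
    wheel_ids = {s for s, p, o in triples if p == "rdf:type" and o == "Wheel"}
    wheels = {}
    for s, p, o in triples:
        if p == "hasPart" and o in wheel_ids:
            wheels.setdefault(s, set()).add(o)
    return {
        (s, "rdf:type", "Bicycle")
        for s, parts in wheels.items()
        if len(parts) == 2
    }

triples = [
    ("_w1", "rdf:type", "Wheel"), ("_w2", "rdf:type", "Wheel"),
    ("_w3", "rdf:type", "Wheel"), ("_w4", "rdf:type", "Wheel"),
    ("_w5", "rdf:type", "Wheel"),
    ("_bike1", "hasPart", "_w1"), ("_bike1", "hasPart", "_w2"),
    ("_trike1", "hasPart", "_w3"), ("_trike1", "hasPart", "_w4"),
    ("_trike1", "hasPart", "_w5"),
]
new_triples = close_world_bicycles(triples)
```

The rule is only as reliable as the data: if a wheel is missing from the store, a tricycle will be asserted to be a Bicycle.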

A way to close the world in a localized way is to use enumerated classes that cannot have any more members than the ones explicitly listed.

8.6    SUMMARY LEARNING

Modeling Principles and Tools

An ontology in OWL can be both conceptual and operational at the same time. There are two main ontology editors in widespread use, Protégé and TopBraid Composer. The former has better support for OWL, the latter has better support for RDF and SPARQL. Pay attention first to the concepts, and then think of good terms. It is wise to follow standard conventions for naming, and to document any departures. It’s a good idea to break up large ontologies into smaller reusable modules, using the OWL import mechanism.

Given a property that has some kind of value, represent it as a data property unless you need to say something about that value, in which case, use an object property.

Usually it is clear whether to model something as a class or a property. A common exception is for roles, where either choice can work.

It is also usually obvious whether something should be represented as a class or as an individual. However, due to OWL limitations about metaclasses, you sometimes have to resort to using an OWL individual to represent what in the world is a kind of thing.

Modeling Patterns

Use the genus differentia pattern to keep the number of orphan classes small, making it easier to specify high-level disjoints. When using the genus differentia pattern, connect to high-level classes in an upper ontology, which has generic concepts that apply across many industries and subjects. An upper ontology helps you build better ontologies faster by reusing already modeled concepts and leveraging existing axioms to detect inconsistencies.

To represent an n-ary relationship, create a class with n restrictions. The standard way to represent a “bucket” in OWL is to use a class whose instances are in that bucket. A collection is another kind of bucket, one for functionally connected things. A third type of bucket corresponds to the set of items categorized with a particular tag.

There is a simple and effective way to model roles that works most of the time. The semantics of the role is captured in a property such as hasLender or hasCEO. These properties point to persons or organizations that play the role.

Common Pitfalls

Beware of falling into the trap of thinking that the computer will know what you mean by your terms. It only knows as much as you tell it. There is no unique name assumption in OWL. Domain and range are used to infer what kinds of individuals are the subject or object of a triple. They cannot be used as integrity constraints. It is common bad practice to over-constrain the domain and range of properties. This limits their use and drives the creation of unnecessary properties.

Less Frequently Used OWL

There are some convenience constructs for specifying disjointness among multiple classes. There are numerous datatypes that are available for special uses and a mechanism for creating your own datatypes. You can declare two or more individuals to be the same or different from each other. There is a deprecation mechanism for evolving ontologies.

Open World

An OWL inference engine does not assume it knows everything. If it cannot prove something is true or false, it is open to the possibility of there being more information out there that it does not know about yet. Once you move to a triple store and use SPARQL, the world is “closed.” A SPARQL query processor cannot distinguish between “no” and “don’t know”, whereas an OWL inference engine can.

8.7    FINAL REMARKS

There are challenges and techniques along the way for learning any skill, whether it is fixing things around the house, kayaking down a raging river, or tracking down cyber criminals. Initially, you will make plenty of mistakes and it will take hours to figure out how to do things for the first time. The process of becoming an expert is about learning from mistakes and broadening the scope of situations that you can comfortably handle. The same is true for building an ontology.

There is a saying: “If there are two ontologists in the room, there will be at least three opinions about how to model a given thing”. There are many skilled ontologists whose work I know and respect, who do things differently than I do.

The process of becoming expert also entails having a sense of when and whether to “break the rules”. Expert photographers know which scenes work well divided in two, and will successfully break the rule of thirds. In this book I have shared a number of guidelines for what I consider to be good practice. As your expertise grows, you will begin to question some of the recommendations in this book. It is good to evaluate and re-evaluate why you do things in a certain way. If you decide to break a rule, do so with care and understand your rationale. Sometimes how you do something matters less than doing it in a consistent way.

GO FORTH AND ONTOLOGIZE!

42  For more information on conceptual vs. operational ontologies, see: https://www.slideshare.net/UscholdM/conceptual-vs-operational-a-false-distinction

43  See https://en.wikipedia.org/wiki/Rose (accessed on January 16, 2018).

44  See http://www.dictionary.com/browse/bank-loan (accessed on January 16, 2018).

45  See “The Importance of Distinguishing between Terms and Concepts,” https://semanticarts.com/blog/terms-concepts-whats-important-pt-1/.

46  https://www.semanticarts.com/gist/.

47  If there are n classes, then there are (n*(n-1))/2 different pairs of classes.

48  There was a recent meeting devoted to the topic: http://ontologforum.org/index.php/SummerInstitute2017.

49  http://ifomis.uni-saarland.de/bfo/.

50  See: “Finding and Avoiding Bugs in Enterprise Ontologies” for an example: http://ceur-ws.org/Vol-1586/codes1.pdf.

51  https://www.w3.org/2007/OWL/refcardA4.

52  https://www.w3.org/TR/owl2-syntax/#Datatype_Maps.

53  See: https://www.slideshare.net/UscholdM/putting-fibo-to-use.
