5 Information Modeling

UML

The Unified Modeling Language (UML [89]), created in the mid-1990s, is an industry standard that defines a set of modeling languages for making various kinds of models and diagrams in support of object-oriented problem analysis and software design. Its core languages are Class Diagrams for information/data modeling, and Sequence Diagrams, Activity Diagrams and State Diagrams (or State Charts) for process/behavior modeling.

UML class diagrams provide a visual syntax for expressing UML class models, which allow defining information and data models. They can be used both at the more abstract level of conceptual modeling for requirements engineering and at the more detailed level of design modeling for designing the model classes of an app. Their main building blocks are class rectangles and association lines.

A class rectangle has one, two or three compartments, containing the name of the class, its properties, and its methods. The purpose of a class is to classify objects and to define their properties and the methods that can be invoked on them.

An association line connects two class rectangles. The purpose of an association is to classify relationships (links) between objects. While UML classes have a direct counterpart in the class concepts of object-oriented programming (OOP) languages, UML associations do not have such a direct OOP counterpart. They are therefore often more difficult for developers to understand. Only in the special case of a unidirectional functional association is there a direct OOP counterpart: a reference property for referencing the objects that are linked to a given object by the association.

RDF and OWL

The Resource Description Framework (RDF [90]) was defined by the W3C in 2004 as a logical formalism that allows (1) formalizing information models in the form of RDF vocabularies, and (2) representing propositional information (e.g., meta-data) on the Web. The core of a UML class model can be expressed as an RDF vocabulary. However, many types of integrity constraints cannot be expressed in RDF. The W3C has therefore defined an extension of RDF, called the Web Ontology Language (OWL [91]), which allows formalizing the core logic (classes, properties and integrity constraints) of a UML class model in the form of an OWL ontology.

From a logical point of view, a class model defines a vocabulary, or language, for expressing various types of fact statements about objects. The knowledge representation languages RDF and OWL allow formalizing both the vocabularies defined by class models and the fact statements made when instantiating their classes by creating objects. In this way, they help to understand the semantics of information models.

5.1 Classes with Properties and Methods

In a UML class diagram, a class has a name (shown in the first compartment of the class rectangle), and it may have properties (shown in the second compartment) and methods (shown in the third compartment). Properties and methods may be described with or without details. The following diagrams illustrate these options using the example of a class books or Book for describing books as information objects.

A class can be expressed in UML by just providing its name, without any further detail, like so

This option is useful for making sketches and overview diagrams. Using an ordinary English plural name like books makes the class diagram more readable for non-tech-savvy people.

A more informative description of a class is obtained by listing its properties, possibly without any further detail, like in the following example:

However, for better understanding the meaning of properties and for being able to code a class in an OO programming language, we need to know the range of each property, which is the type of its values. The range of a property can be either a primitive datatype or another class.

In the following diagram we use general implementation-agnostic datatype names (like “Integer”), for which a specific programming language may have specific names (like “int” in Java). Notice that we now use a common OOP naming convention of giving classes a capitalized singular (mixed-case) name like Book (or LearningUnit). This allows saying that “an instance of a class C is a C (object)”, like “an instance of Book is a book (object)”.

Notice how the standard identifier attribute isbn is marked with the keyword id appended to the property declaration in curly braces. This is the UML syntax for defining several kinds of property constraints discussed in the next chapter.

Finally, we can also define the methods and functions of a class in a third compartment, like so:

In this example, the Book class has a function checkISBN, which returns a string. Given a class diagram in this form, it is straightforward to code it in an OO programming language like JavaScript or Java.

Recall that in JavaScript a class is defined in the form of a constructor function that assigns the values of its parameters to the properties of the newly created object, like so:
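The listing is not reproduced in this text. A minimal sketch of such a constructor function, assuming the three properties isbn, title and year of the Book class discussed in this chapter (the sample values are illustrative only), might look as follows:

```javascript
// Pre-ES6 style: a constructor function plays the role of a class.
// The property names follow the Book class; the sample values used
// further below are assumptions made for this sketch.
function Book(isbn, title, year) {
  this.isbn = isbn;    // String
  this.title = title;  // String
  this.year = year;    // Integer
}

// An object is created by invoking the constructor with the new operator
var book = new Book("006251587X", "Weaving the Web", 2000);
console.log(book.title + " (" + book.year + ")");
```

Since the constructor only copies its arguments, it performs no validity checks on the values it is given.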

In JavaScript, the (instance-level) methods of a class are defined as method slots of the constructor’s built-in prototype object. This is how we code the checkISBN method:
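The listing is not reproduced in this text. As a hedged sketch, the method could be attached to the prototype as shown below. The check itself (nine digits followed by a digit or "X") and the convention of returning an empty string for a valid value are assumptions of this example, not something prescribed by the class diagram:

```javascript
function Book(isbn, title, year) {
  this.isbn = isbn;
  this.title = title;
  this.year = year;
}

// An instance-level method is a function-valued slot of Book.prototype,
// so it is shared by all objects created with new Book(...).
Book.prototype.checkISBN = function () {
  // Assumed rule: a 10-character ISBN consists of 9 digits plus a digit or X
  if (typeof this.isbn === "string" && /^\d{9}(\d|X)$/.test(this.isbn)) {
    return "";  // empty message means: no problem found
  }
  return "The ISBN must be a 10-digit string or a 9-digit string followed by 'X'!";
};

var goodBook = new Book("006251587X", "Weaving the Web", 2000);
var badBook = new Book("12345", "Untitled", 2000);
console.log(goodBook.checkISBN() === "" ? "ISBN is valid" : goodBook.checkISBN());
console.log(badBook.checkISBN());
```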

If we don’t have to care about older web browsers, such as Internet Explorer 9, we can also use the new class definition syntax (introduced in the ES6 version of JavaScript) and combine the definition of properties and methods in one piece of code:
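The listing is not reproduced in this text. A minimal sketch using the class syntax, with the same assumed ISBN check as a placeholder for the method body, could look like this:

```javascript
// ES6 class syntax: constructor and methods are defined in one place.
class Book {
  constructor(isbn, title, year) {
    this.isbn = isbn;
    this.title = title;
    this.year = year;
  }
  // Still stored on Book.prototype behind the scenes
  checkISBN() {
    // Assumed rule for this sketch: 9 digits followed by a digit or X
    return /^\d{9}(\d|X)$/.test(String(this.isbn))
      ? ""
      : "The ISBN must be a 10-digit string or a 9-digit string followed by 'X'!";
  }
}

const book = new Book("006251587X", "Weaving the Web", 2000);
console.log(book.checkISBN() === "" ? "ISBN is valid" : book.checkISBN());
```

The class syntax does not change the underlying object model: Book is still a function, and checkISBN is still a slot of its prototype object.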

As opposed to JavaScript, Java has always had a language element class for defining classes:

We need to be aware of the ambiguity of the term “object”. We have to distinguish between objects in the sense of real-world objects (also called “business objects” or “entities”) and objects in an OO program, such as JS objects or Java objects. When we want to manage information about business objects of some type in an app, we represent them in the form of JS/Java objects instantiating a JS/Java class that represents their (business) object type. We call these classes model classes for two reasons: first because they implement the classes defined in an app’s data model, and second because they represent the ‘model’ part of an app’s Model-View-Controller codebase architecture.

Therefore, in a JS/Java app, a business object is a JS/Java object, but not every JS/Java object represents a business object because we use JS/Java objects for many purposes (e.g., in JavaScript, an array is a JS object, but it’s not a business object). The same applies to classes: (business) object types are represented as model classes, but not every JS/Java class is a model class because we may use JS/Java classes also for other purposes (e.g., in Java, a class can be used as a container for a method library, but such a class is not a model class).

5.2 Connecting Classes with Associations

Whenever an app has to manage the data of more than one object type, it is very likely that there are associations between some of them. For instance, in the following class diagram, there is an association between publishers and books and an association between books and people as authors.

An association between two classes can be read in both directions. The association between publishers and books associates

  1. the books published by a publisher with this publisher, as indicated by the association end name published books,
  2. inversely (from right to left), the publisher of a book with this book.

The association between books and people as authors associates

  1. the authors of a book with this book, as indicated by the association end name authors,
  2. inversely (from right to left), the books authored by a person with this person, as indicated by the association end name authored books.

As will be discussed in Volume 2, associations are characterized by multiplicity constraints, which restrict the possibilities of how many objects of the associated class can be linked to an object of the given class. In our example, we have a one-to-many association between publishers and books and a many-to-many association between books and people.
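As a rough illustration of why only one direction of an association has a direct OOP counterpart, the following sketch represents the publisher of a book as a reference property. The class Publisher, its name property and all sample values are assumptions made for this example; the book itself defers the implementation of associations to Volume 2.

```javascript
class Publisher {
  constructor(name) {
    this.name = name;
  }
}

class Book {
  constructor(isbn, title, publisher) {
    this.isbn = isbn;
    this.title = title;
    // Reference property: implements the association in the
    // direction from a book to its (single) publisher
    this.publisher = publisher;
  }
}

const publisher = new Publisher("Example Press");
const books = [
  new Book("006251587X", "Weaving the Web", publisher),
  new Book("0000000001", "Another Title", publisher)
];

// Following the association from a book to its publisher is a property access
console.log(books[0].publisher.name);

// The inverse direction (published books) has no slot of its own here,
// so it has to be computed by searching all books
const publishedBooks = books.filter(b => b.publisher === publisher);
console.log(publishedBooks.length);
```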

For keeping things simple, we only include one object type and no association in the apps discussed in this volume of the book. In Volume 2, we will discuss how to model associations and how to implement them.

5.3 From a Conceptual Model via a Design Model to Class Models

In a new development project, we start our analysis and modeling effort with making a conceptual information model. This type of model is also called domain model since it describes the entities of a given (real-world) problem domain, and does not model software entities.

Recall the conceptual information model for books obtained as the result of the inception phase:

Taking this conceptual model as a starting point, we have to make a number of design decisions for obtaining an information design model:

  1. What are the ranges of the properties isbn, title and year?
  2. Which property is the standard identifier (ID) attribute?
  3. Which constraints should be defined?
  4. Which methods/functions should be part of the design?

The result of this design phase is a design model like the following:

It is important to understand that such a design model provides an implementation-agnostic (platform-independent) computational design, that is, it does not use any concept or syntax of any specific programming language or technology. Therefore, the same design model can be used for deriving different platform-specific implementation models for different programming languages or technologies, such as for a Java- or PHP-based framework, or for a plain JavaScript approach.

Based on the design model, by replacing the platform-independent datatype names with JavaScript-specific datatype names, and by adding “setter” methods, we obtain the following JavaScript implementation model, which we prefer to call a JavaScript class model:

Notice that in this model, we have used the JavaScript datatypes string and number, and we have added the methods setISBN, setTitle and setYear. These “setter” methods are supposed to be used for setting a property to a new value, instead of directly assigning the value to the property.

Having a setter method for each property is a best-practice approach that allows more control over property value assignments. For instance, we could check the validity of values before they are assigned, or we could notify other modules of the app about the assignment event.
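The kind of control a setter provides can be sketched as follows. The method names setISBN, setTitle and setYear are taken from the class model above, while the concrete validity rules and the choice to signal problems by throwing an exception are assumptions of this example:

```javascript
class Book {
  constructor(isbn, title, year) {
    // The constructor also goes through the setters,
    // so no invalid value can be assigned on creation
    this.setISBN(isbn);
    this.setTitle(title);
    this.setYear(year);
  }
  setISBN(isbn) {
    // Assumed rule: 9 digits followed by a digit or X
    if (typeof isbn !== "string" || !/^\d{9}(\d|X)$/.test(isbn)) {
      throw new RangeError("Invalid ISBN: " + isbn);
    }
    this.isbn = isbn;
  }
  setTitle(title) {
    if (typeof title !== "string" || title.trim() === "") {
      throw new RangeError("The title must be a non-empty string!");
    }
    this.title = title;
  }
  setYear(year) {
    if (!Number.isInteger(year)) {
      throw new RangeError("The year must be an integer!");
    }
    this.year = year;
  }
}

const book = new Book("006251587X", "Weaving the Web", 2000);
book.setYear(2001);       // accepted
try {
  book.setYear("next");   // rejected by the setter
} catch (e) {
  console.log(e.message);
}
```

A direct assignment such as book.year = "next" would still bypass the check, which is why the setters are supposed to be used consistently.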

The implementation phase consists of making an implementation model for a specific technology platform, and then coding this model and testing the resulting program code. In this book, we make both JavaScript class models and Java class models, which are subsequently coded in plain JavaScript and in Java EE, respectively.

The entire transformation chain, from a conceptual model via a design model to a JavaScript class model (as a special type of implementation model), is summarized in the following figure.

Figure 5.1 From a conceptual model via a design model to a JavaScript class model

In summary, the process of model-based development takes a conceptual model as the starting point for making a general (platform-independent) design model, from which one or more implementation models for a (set of) specific target technologies can be derived. Typically, they include a class model for an object-oriented programming language and a database model for an SQL DBMS. This process is illustrated by the following diagram:

Figure 5.2 From a conceptual model via design models to implementation models

5.4 Excursion: Formalizing Information Models with RDF and OWL

The Resource Description Framework (RDF), together with its extension RDF Schema, is a logical formalism that allows

  1. formalizing information models in the form of RDF vocabularies consisting of class definitions and property definitions, where both class names and property names are URIs (representing globally unique identifiers);
  2. representing propositional information (in the form of statements about individuals) on the Web, embedded in web pages or in the form of special web data sources.

RDF is the basis of the Semantic Web. It has several syntaxes, including the textual XML-based syntax of RDF/XML and the visual syntax of RDF Graphs.

5.4.1 RDF vocabularies

Consider the Book class defined in the following class diagram

The corresponding RDF vocabulary, with one class definition and three property definitions, is defined in the following RDF graph:

In an RDF graph, nodes with an elliptic shape represent “resources” (like properties and classes), and arrows represent relationships defined by a property. Each arrow between two nodes represents a statement (also called “triple”). For instance, the rdfs:range arrow between year and xs:int represents the statement that the range of the property year is the XML Schema datatype xs:int, where xs is a namespace prefix for the XML Schema namespace.

Notice that RDF has the predefined meta-classes rdfs:Class and rdf:Property, used to define classes and their properties with the help of the predefined property rdf:type. For instance, the rdf:type arrow between year and rdf:Property represents the statement that year is of type rdf:Property, that is, it is defined to be an RDF property.

RDF graphs are a formalism for theoretical purposes. They can be used for illustrating simple examples. As opposed to UML class diagrams, they are not useful for visually expressing realistic vocabularies, due to their convolution and unnecessary visual complexity.

The domain of a property has to be defined explicitly in an RDF vocabulary (with an rdfs:domain property statement), as opposed to a UML class diagram where it is defined implicitly. While it is natural to define properties in the context of a class, as in UML, RDF allows defining properties independently of any class.

The RDF/XML syntax allows publishing an RDF vocabulary on the Web. For instance, the simple Book vocabulary defined in the RDF graph above can be represented by the following RDF/XML document:

Notice that the values of the rdf:resource attribute must be URIs. If an attribute value is a fragment identifier like #Book, it represents a relative URI and is resolved into a full URI by appending the fragment identifier to the in-scope base URI, which may be defined with the xml:base attribute.

If an attribute value is an absolute URI like “http://www.w3.org/2001/XMLSchema#string”, it contains a full namespace URI (like “http://www.w3.org/2001/XMLSchema”), even if a namespace prefix (like “xsd” or “xs”) is defined for it. This is because namespace prefixes can only be used for XML element and attribute names, but not for attribute values, which unfortunately makes RDF/XML hard to read for human users.
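The resolution of fragment identifiers against a base URI can be tried out with the standard URL constructor of JavaScript. In this sketch, the base URI http://example.org/ex1 is assumed to be the value of xml:base; it matches the property URI http://example.org/ex1#isbn used later in this chapter, but the vocabulary document itself is not shown here.

```javascript
// Assumed in-scope base URI (as it could be set with xml:base)
const baseURI = "http://example.org/ex1";

// A fragment identifier is a relative reference that is resolved
// against the base URI
const classURI = new URL("#Book", baseURI).href;
console.log(classURI);

// An absolute URI is not affected by the base URI
const datatypeURI = new URL("http://www.w3.org/2001/XMLSchema#string", baseURI).href;
console.log(datatypeURI);
```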

Notice that the RDF formalization of our simple UML class model above has several shortcomings:

  1. It does not express the constraints that all three properties are mandatory and single-valued, which they are by default in UML.
  2. It does not express the constraints that the ISBN property, as a standard identifier (or primary key) attribute, is mandatory and unique.

We show how to solve these two issues with the greater expressivity of OWL below.

5.4.2 RDF fact statements

The propositional information items, or fact statements, expressible with RDF are

  1. classification statements like “ex:Book is a rdfs:Class” or “urn:isbn:006251587X is a ex:Book”, and
  2. property statements of the sort “the ex:isbn property value of urn:isbn:006251587X is ’006251587X’”.

Consequently, for a UML object definition like

we obtain several RDF fact statements:
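The object definition and the resulting statements are not reproduced in this text. As a rough sketch of the idea, the following code derives one classification statement and one property statement per slot from a plain object. The title and year values, the ex: prefix for property names and the helper function toFactStatements are assumptions made for this example; only the ISBN and the urn:isbn: subject identifier are taken from the statements quoted above.

```javascript
// Assumed sample object (only the isbn value is taken from the text)
const book = {isbn: "006251587X", title: "Weaving the Web", year: 2000};

// Hypothetical helper: turns an object into a list of
// [subject, property, value] statements
function toFactStatements(obj, className, subjectURI) {
  // Classification statement: the subject is an instance of the class
  const statements = [[subjectURI, "rdf:type", "ex:" + className]];
  // One property statement per slot of the object
  for (const [propName, value] of Object.entries(obj)) {
    statements.push([subjectURI, "ex:" + propName, value]);
  }
  return statements;
}

const statements = toFactStatements(book, "Book", "urn:isbn:" + book.isbn);
statements.forEach(s => console.log(s.join(" ")));
```

Each array corresponds to one arrow in an RDF graph: the first element is the node the arrow starts from, the second is the property labeling the arrow, and the third is the node or literal it points to.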

5.4.3 Expressing structured data in web documents

There are many use cases for machine-readable data (e.g., about people, events, products, etc.) embedded in web documents. For instance, search engines like Google can use such structured data [92] for providing more meaningful search results.

Structured data, or meta-data, can be embedded in a web document by either adding a JSON-LD [93] script element containing it, or by annotating the document’s content, e.g., the HTML elements of a web page, with RDFa [94].

Very limited annotation approaches, called “microformats” (proposed around 2005), are the historic predecessors of the general annotation language RDFa, which is derived from RDF. Some microformats, like hCard and hCalendar (the microformat counterparts of vCard and of calendar events), are still being used today, but they are increasingly replaced with one of the two general formats RDFa and JSON-LD.

The main author of HTML5, Ian Hickson, has proposed an alternative general annotation language, called microdata [95], with the goal of simplifying RDFa and remedying its usability issues (in particular, by dropping its use of XML namespaces). Despite the (rather unfortunate) choice of using different names for the same annotation concepts (like “itemprop” instead of “property”), Hickson’s microdata proposal succeeded in showing

  1. how to get essentially the same annotation functionality at lower usability costs, and
  2. how to integrate annotations with the DOM.

Since Hickson ended his collaboration with the W3C, the microdata proposal did not succeed in obtaining an official W3C status, and web browsers have discontinued their support for it. However, it triggered a W3C proposal to use the RDFa Lite subset of RDFa, which “can be applied to most simple to moderate structured data markup tasks, without burdening the authors with additional complexities”.

We present a simple example for using structured data in a web page. Consider the following HTML fragment:

For this content, we may want to code the information that

  1. the available information is about an entity of type Person, which has been defined as a class by the search engine standard vocabulary schema.org [96];
  2. the name of the person is “Carly Rae Jepsen”;
  3. the telephone number of the person is “1–800–2437715”.

Using the RDFa attributes typeof, vocab and property, we can code this information by adding the following annotations to the HTML content:

Using JSON-LD, as recommended by Google, we need to add a script element of type “application/ld+json” containing the meta-data:
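The script element is not reproduced in this text. As a sketch, its content can be produced from a plain JavaScript object. The keys @context and @type are JSON-LD keywords, and name and telephone are properties of the schema.org Person class; the variable names and the string-based way of assembling the element are assumptions of this example.

```javascript
// The three information items as a JSON-LD description
const personData = {
  "@context": "https://schema.org",   // the vocabulary being used
  "@type": "Person",                  // the class of the described entity
  "name": "Carly Rae Jepsen",
  "telephone": "1–800–2437715"
};

// Serialized form, wrapped in a script element of the required type
const scriptElement =
  '<script type="application/ld+json">' +
  JSON.stringify(personData, null, 2) +
  '</script>';

console.log(scriptElement);
```

Unlike RDFa annotations, this description is kept separate from the visible HTML content, so the name and telephone number appear twice in the page: once for human readers and once for machines.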

The propositional information expressed with RDFa annotations and JSON-LD corresponds to the following RDF/XML code:

5.4.4 OWL vocabularies and constraints

OWL extends RDF by adding many additional language elements for expressing constraints, equalities and derived classes and properties in the context of defining vocabularies. Facts are expressed as in RDF (e.g., with rdf:Description).

OWL provides its own predefined language elements for defining classes and properties:

  1. The predefined class owl:Class is a subclass of rdfs:Class.
  2. The predefined class owl:DatatypeProperty is a subclass of rdf:Property. It classifies attributes. Therefore, the values of an owl:DatatypeProperty are data literals.
  3. The predefined class owl:ObjectProperty is a subclass of rdf:Property. It classifies reference properties corresponding to unidirectional binary associations. Since the values of a reference property are object references, the values of an owl:ObjectProperty are object references in the form of resource URIs.

We only show, with the help of an example, that an OWL vocabulary can represent a class diagram more faithfully than the corresponding RDF vocabulary because it allows expressing certain constraints.

Consider the standard identifier attribute isbn defined in the Book class. In an RDF vocabulary, this attribute is defined in the following way:

There are two issues with this RDF definition of an attribute:

  1. It doesn’t make it explicit that the property defined is an attribute, and not a reference property. This can only be inferred by finding out that the range class is a datatype, and not an object type.
  2. It does not constrain the attribute to have exactly one value, as implied by the defaults of UML class diagram semantics.

Using OWL, we can remedy these shortcomings of RDF. The following OWL property definition makes it explicit that the property http://example.org/ex1#isbn is an attribute, while the added OWL restriction defines an “exactly one” cardinality constraint for it:

Since the ISBN attribute of the Book class has been designated as the standard identifier attribute in the UML class diagram above, we should define a uniqueness constraint for it. We can do this by including an owl:hasKey element within the class definition:

5.4.5 Usability issues of RDF and OWL

Both RDF and OWL have many usability issues. Especially OWL is so difficult to use that most potential users will be discouraged by it.

Because OWL was created by a community that is more concerned with formal logic than with information modeling and is not familiar with the concepts and terminology established in information modeling, they have introduced many new unfamiliar terms for concepts that had already been established and named in information modeling. They have even introduced duplicate names within OWL: an attribute is in most places called “data property”, but in some places it is called “datatype property” (specifically in OWL/RDF).

Usability issues of RDF are:

  1. For historical reasons, RDF comes with a strange jargon. In particular, its “subject”-“predicate”-“object” terminology is unintuitive.
  2. For historical reasons, RDF comes with two different XML namespaces, typically in the form of the two namespace prefixes “rdf” and “rdfs”. The history of a language should not be imposed on its syntax. Users shouldn’t have to bother about which prefix to use.
  3. RDF uses the uncommon term “IRI” (as an abbreviation of “Internationalized Resource Identifier”), following the unfortunate naming history from “URL” via “URI” to “IRI”, while the WHATWG’s URL Living Standard [97] has reverted this naming history.
  4. For practical purposes, RDF is incomplete:

     a. it does not make an explicit syntactic distinction between attributes (having a datatype as range) and reference properties (having an object type as range);

     b. it does not allow expressing simple class definitions, which include mandatory value and single-value constraints, in an RDF vocabulary.

OWL is needed for getting these fundamental features.

Usability issues of OWL are:

  1. it uses an uncommon terminology: e.g., “data property” instead of attribute, “restriction” instead of constraint;
  2. some of its elements have confusing names: e.g., “ObjectIntersectionOf” does not denote an intersection of objects, but of object types, and “DataSomeValuesFrom” actually refers to “some data values from”;
  3. many of its language elements are rather unnatural and hard to grasp (much less to remember): e.g., an exactly-one-value property constraint cannot be expressed in the definition of a class along with the property declaration, but requires a separate Restriction element (as shown above).

5.5 Summary

  1. The inception phase of a development project includes problem analysis and requirements engineering. The main goal in this phase is to make a conceptual information model that lays the foundations for the app’s information architecture, which is defined by an implementation-agnostic information design model made in the design phase. Then, in the implementation phase, a platform-specific data model is derived from the information design model and coded in the platform’s programming language. In the case of an OOP platform, platform-specific data models take the form of class models that are coded as a set of model classes.
  2. UML class diagrams provide a visual language for defining an information architecture. Their main building blocks are class rectangles and association lines (connecting class rectangles).
  3. A class rectangle has one, two or three compartments, containing the name of the class, its properties, and its methods.
  4. An association line connects two classes. The purpose of an association is to classify relationships (links) between objects. In the special case of a unidirectional functional association there is a direct OOP counterpart: a reference property for referencing the objects that are linked to a given object by the association.

5.6 Exercises

Consider the problem of managing information about movies, like the Internet Movie Database [98].

  1. Describe an illustrative information fragment with the help of some sample data about 3 movies in the form of itemized lists.
  2. Turn the itemized lists of sample data into corresponding data tables with table names and suitable column headings.
  3. Make a conceptual information model describing movies, then derive an information design model from it and turn it into a JavaScript class model.
  4. Finally, code the JavaScript class model in the form of JavaScript model classes.