Chapter 9

Automatic Integration and Querying of Semantic Rich Heterogeneous Data

Laying the Foundations for Semantic Web of Things

Muhammad Rizwan Saeed; Charalampos Chelmis; Viktor K. Prasanna

Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA, United States

The suite of technologies developed in the Semantic Web, such as Ontologies, Semantic Annotation, Linked Data and Semantic Web Services, can be used as principal solutions for the purpose of realizing the IoT.

–Barnaghi et al. [1]

Abstract

An enormous amount of data from physical objects, such as the devices comprising the Internet of Things (IoT), is being made available through Web APIs on a daily basis. Manual discovery and integration of relevant data sources can be cumbersome. A unified view of relevant data sources is desirable for creating applications for monitoring and decision making. Considerable research has been conducted in the Semantic Web domain on modeling and integrating data from physical devices, which has the potential of becoming one of the foundations for the future of IoT. In this chapter, we present different techniques for modeling semantically rich data using ontologies. We highlight the benefits of semantic modeling in terms of ease of data integration. We then discuss approaches to querying semantically rich data using various techniques aimed at users with different levels of expertise. We present this discussion in the context of how the suite of technologies developed for the Semantic Web can facilitate effective handling of IoT infrastructure.

Keywords

Semantic Web; Semantic Web of Things (SWoT); Automatic Query Formulation; Ontologies; SPARQL; RDF; Smart Cities; Smart Oilfields

Chapter Points

• Introducing key concepts of Semantic Web.

• Applying the principles of Semantic Web to Internet of Things (IoT) and Web of Things (WoT) to build Semantic Web of Things (SWoT).

• Using SWoT to achieve the vision of Smart Oilfields and Smart Cities.

9.1 Introduction

Recent developments in embedded devices and sensors have led to smart devices increasingly becoming a part of daily activities. These smart devices include sensors, mobile devices, vehicles, appliances and other devices which are uniquely addressable and capable of collecting and exchanging data. A network of such devices is conceptualized as the Internet of Things (IoT) [2]. Research in this area mainly focuses on exploring efficient ways to connect such smart devices to create data networks in a variety of challenging and constrained networking environments [3]. Because many vendors manufacture smart devices, with multiple hardware and software architectures, developing multi-platform applications on top of this network is challenging, as it requires expertise across vendors and systems. A promising next step is to build the Web of Things (WoT) on top of the basic network of devices [4].

To understand this, consider the example of the Internet, which is a scalable network of networks that successfully connects heterogeneous devices. On top of the Internet, at the application layer, the Web provides an information exchange (using open protocols, e.g. HTTP) where uniquely identifiable documents and web resources can be accessed by users worldwide using URLs, irrespective of the underlying networking infrastructure. Similarly, the Web of Things (WoT) aims to integrate real-world smart devices into the realm of the Web so that the devices, their services and data become searchable [4,5]. Multiple web-oriented service standards have been proposed to integrate physical devices and their services into information systems, allowing access through APIs, resulting in a web of heterogeneous data-generating sources [6–8]. However, the Web itself has an inherent drawback. Currently, the Web can more appropriately be called the web of documents [9]. As with physical files, users skim through web documents to extract relevant information, which is cumbersome to do manually given the huge number of documents on the Web. Unless web documents (or resources) are given a meaningful structure, it is difficult for machines to understand their contents automatically. Similarly, in WoT, Web APIs can be used to acquire data, which is often presented as documents (JSON, XML). Selecting relevant APIs and identifying the required data to poll can be difficult and time-consuming if done manually. There is a need for a way to model the information produced or consumed, so that it can be automatically gathered, integrated and shared [10].

An improvement for the Web has been proposed as Semantic Web [11] which aims to build a web of data on top of the existing web of documents. The objective is to give meaning to data and provide the flexibility of linking multiple data sources together. This enables machines to comprehend information and rapidly integrate multiple data sources. Based on the principles of Semantic Web, Semantic Web of Things (SWoT) has been proposed for Web of Things (WoT) [1]. The goal of SWoT is to associate semantically rich and easily accessible information to real-world objects, locations and events by modeling and integrating the data acquired from underlying sensor networks.

The purpose of this chapter is to introduce Semantic Web technologies as an enabler of the Semantic Web of Things and to discuss how different techniques developed as part of the Semantic Web suite can address challenges related to discovering, modeling, integrating and accessing data from networks of devices. In Section 9.2, we present a brief introduction to the Semantic Web and its key concepts and briefly discuss the challenges faced in realizing SWoT. In that context, in Section 9.3, we discuss various Semantic Web based techniques that can address those challenges. Finally, in Section 9.4, we discuss how Semantic Web technologies, through SWoT, can pave the way for smart applications such as Smart Oilfields and Smart Cities.

9.2 Building the Semantic Web of Things (SWoT)

According to an estimate,¹ there will be 21 billion IoT devices by 2020. The goal then becomes to provide the capability of interlinking the massive streams of data and services provided by such devices and to make them discoverable and queryable by users (machines and humans). The term “Semantic Web of Things” (henceforth referenced by its abbreviation SWoT) has recently been coined to formally lay the groundwork for applying Semantic Web technologies to the domain of IoT (and WoT) [12]. Before we discuss the challenges in realizing SWoT, we present a brief introduction to the Semantic Web along with some of its key concepts.

9.2.1 What is Semantic Web?

As discussed in Section 9.1, the Web can be categorized as a web of documents and hence is geared more towards direct human consumption. However, the next version of the Web, labeled “Semantic Web” [11], aims at creating a web of data that is understandable by machines. The main idea behind the Semantic Web is to augment HTML with a more suitable language such that some structure (context) can be added to the content of a web document. This way, in addition to carrying content and formatting information, web documents are able to carry information about their content. Such a representation is easier for machines to process. The information about content is referred to as metadata, that is, data about data. This is where the term semantic in Semantic Web comes from, i.e., capturing the meaning of data [13].

As an example, consider the sentence “Barack Obama was born in Hawaii” appearing in a web document. Although this sentence is readily understandable by human readers, it holds no specialized meaning for a computer. If we add metadata identifying the different entities within this sentence, we can add context to it for a machine. We define Barack Obama as an instance of class Person, Hawaii as an instance of class State and link the two instances using a property, say, born-in. This makes it easier for a machine to understand the context of the information in the web document and to answer more complex queries. For example, by annotating multiple similar sentences (or web documents), a machine can answer a query asking for the list of all people born in Hawaii (i.e., all instances of class Person that are linked through the property born-in with the entity Hawaii). To represent information this way we require (i) identification and modeling of concepts (Person, City, State, Movie, Actor, Vehicle etc.) and properties (acted, directed, born-in, lives-in, works-at, founded etc.) in the document and (ii) a set of formal languages to do so. This understanding of the domain is captured through the use of ontologies.

9.2.1.1 Conceptualizing Domains through Ontologies

In the domain of Semantic Web, ontologies (or vocabularies) define the concepts and relationships used to describe and represent an area of concern. The role of ontologies in Semantic Web is to facilitate data organization and integration [14]. This integrated data (known as Linked Data), which can be used for reasoning or simply for querying, is the main strength of the Semantic Web. Most applications employing Semantic Web technologies are essentially based on the accessibility and integration of Linked Data at various levels of complexity. Ontologies can play a crucial role in enabling automatic knowledge processing, sharing, and reuse among applications. An ontology typically contains a hierarchy of concepts (or classes) along with their attributes (or properties) related to an area of interest. For our example regarding the birthplace of Barack Obama, we need to create (or reuse) an ontology that models such biographical content using classes such as Person, City, State and Country (to name a few) along with the relationships among them. Moreover, literals such as age, salary etc. can also be modeled through ontological constructs called data (or datatype) properties and linked to the respective classes.

9.2.1.2 Resource Description Framework (RDF)

An ontology is a formal conceptualization of a particular domain and is described using RDF² syntax. The Resource Description Framework (RDF) is used for expressing information about resources, their attributes and relationships with other resources. A resource can be anything, including a document, person, object or an abstract concept. RDF allows us to make statements about these resources. These statements always follow a simple structure:

<subject> <predicate> <object>

For example, a statement that shows the entities “Barack Obama” and “Hawaii” linked through the property “birthPlace” is formally represented in RDF as:

<http://example.org/BarrackObama>
<http://schema.org/birthPlace>
<http://example.org/Hawaii>.

Moreover, we can add more context about the entities “Barack Obama” and “Hawaii” such as:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

<http://example.org/BarrackObama> rdf:type <http://schema.org/Person> .
<http://example.org/Hawaii> rdf:type <http://schema.org/State> .
<http://example.org/BarrackObama> <http://schema.org/name> "Barack Obama" .
<http://example.org/Hawaii> <http://schema.org/name> "Hawaii" .

In RDF, resources appearing as subjects and predicates are identified using URIs (Uniform Resource Identifiers); objects are either URIs or literal values, such as the name strings above. URIs identify resources irrespective of their location on the web. Readers interested in learning more about the Semantic Web and its constituent concepts should consult introductory material such as [13] and [15].
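To make the triple structure concrete, the following minimal Python sketch stores the statements of this section as (subject, predicate, object) tuples and answers the "people born in Hawaii" question by pattern matching. It is illustrative only: the `match` and `people_born_in` helpers and the abbreviated prefixes (`ex:`, `schema:`) are assumptions made for this sketch and do not belong to any RDF library.

```python
# A toy in-memory triple store; a real system would use an RDF library and SPARQL.
RDF_TYPE = "rdf:type"

# Identifiers mirror the example URIs used in this section, abbreviated with prefixes.
triples = {
    ("ex:BarrackObama", RDF_TYPE, "schema:Person"),
    ("ex:Hawaii", RDF_TYPE, "schema:State"),
    ("ex:BarrackObama", "schema:birthPlace", "ex:Hawaii"),
}


def match(store, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for (ts, tp, to) in store
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    ]


def people_born_in(store, place):
    """Subjects typed schema:Person that are linked to `place` via schema:birthPlace."""
    persons = {s for (s, _, _) in match(store, p=RDF_TYPE, o="schema:Person")}
    born_there = {s for (s, _, _) in match(store, p="schema:birthPlace", o=place)}
    return sorted(persons & born_there)


print(people_born_in(triples, "ex:Hawaii"))  # ['ex:BarrackObama']
```

The wildcard pattern plays the role that variables play in a SPARQL graph pattern; adding more annotated sentences only means adding more tuples to the store.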

9.2.2 Challenges in Realization of SWoT

Barnaghi et al. [1] present the challenges in fully realizing the semantic-oriented vision towards IoT. We present some of those challenges here. Later, in Section 9.3, we see how they can be addressed through Semantic Web technologies.

9.2.2.1 IoT Data Modeling and Integration

One of the key visions of the Semantic Web is to model information on the Web so that it can be automatically understood, organized and retrieved by machines. IoT devices exchange data with each other and with users on the Web. To facilitate automated interactions among devices and users, context must be built around the data. In the domain of the Semantic Web, context is provided through the use of ontologies. Ontologies encode domain knowledge in terms of concepts (or classes), attributes (or properties) and rules to model a given domain. Using ontologies, we can model how data is being generated, how it interacts with other sources and how metadata and other attributes are managed. Ontologies can facilitate identification of different types of data streams. In essence, ontologies are reference maps that are used to model similar entities across domains, promoting standardization. Moreover, heterogeneous data from multiple sources can be integrated to create a complete view of an environment or an entity (e.g. energy consumption, temperature and occupancy data can be used to characterize a building). In an effort to standardize the publishing of ontologies and data, the W3C (World Wide Web Consortium) has provided some guidelines [16], which are summarized here:

• Publish the domain knowledge (ontology, dataset and rules) online.

• Write labels or comments at least in English.

• Dereferencing URIs: If a URI does not point to a document, then it should point to an ontology (not to a 404 error page).

• Use semantic validation tools to fix errors in the ontology.

To promote standardization, the different stakeholders need to agree on ontological definitions, which in the real world is seldom the case because of the decentralized design of the Web. For example, museums around the world have built databases about the artworks they host using different schemas, without consideration for interoperability with other museums [17]. If, for instance, each of these museums models similar entities such as a Painting or an Artist using classes from independently developed ontologies, it will be difficult to query paintings by the same artist across museums without first performing ontology alignment [18] and record linkage [19]. In Section 9.3.1, we discuss in more detail how ontologies can be used to model and integrate heterogeneous data.

9.2.2.2 IoT Resources Search and Discovery

As more and more IoT data becomes available through web-oriented services, it becomes difficult to manually keep track of and locate relevant services. Semantic annotation of IoT resources and services is key to supporting the search and discovery of such resources. In Section 9.3.2, we discuss different approaches to semantic modeling of web services and linking to LOD (Linked Open Data).

9.2.2.3 IoT Data Querying

Once the data has been modeled through ontologies, users require knowledge of the Semantic Web and its query languages to access it. In the literature, several approaches have been proposed that allow non-expert users to access information without prerequisite knowledge of Semantic Web principles. In Section 9.3.3, we present multiple techniques to facilitate querying over a semantic repository.

9.3 Semantic Web as Enabler of SWoT

The realization of SWoT essentially relies on multiple concepts related to Semantic Web. In this section, we discuss different techniques built as part of the Semantic Web suite and how they can be applied to sensors and data (and IoT in general) to address the challenges presented in Section 9.2.2.

9.3.1 Ontology-Based Data Modeling and Integration

The purpose of creating ontologies and integrating data is to organize heterogeneous data sources in order to simplify on-demand information access and to enable complex analytics over the integrated knowledge bases. In the domain of IoT, data from various devices, along with metadata, can be integrated using the principles of the Semantic Web. In the particular example of sensor networks, Semantic Web technologies have been applied to model oceanographic measurements [20], ecological surveys [21], external corrosion detection [22] and smart grids [23]. For example, in [23] the authors propose an extensible model that caters to the information diversity in Smart Grids, with provision to integrate new information sources and concepts. It is shown that such a model can facilitate dynamic Demand Response (DR) planning for utilities, as it is capable of representing and integrating electric consumption data, weather data, building occupancy data etc., to create the unified view of the environment needed for decision making. Such studies show the benefits of the approach in terms of integration and contextualization of data. These representations can not only explicitly reveal relationships between facts but also be used to drive methods that infer hidden or implicit relationships between entities. For instance, the authors of [24] show that reasoning capabilities over semantic data allow certain queries to be issued that cannot be composed using SQL.

In the domain of Semantic Web everything is uniquely identifiable through URIs and hence can be made accessible via HTTP. Using RDF statements, these URIs can be linked to other URIs using different properties. This linking can provide intra-domain as well as inter-domain integration. In practice, however, researchers modeling data in their areas of interest seldom follow standardized vocabularies and instead create their own custom ontologies. To address this issue, ontology mapping/matching techniques have been proposed in the literature. Such techniques link the same concepts modeled differently in two ontologies on the basis of (i) the names of the concepts and attributes and (ii) a comparison of the instances of the concepts under review [25–27]. For the sake of discussion, in this chapter we limit ourselves to standardized ontologies used in the domain of sensors. One commonly used ontology of this kind is the Semantic Sensor Network (SSN) ontology, which is presented in the next section.

9.3.1.1 Semantic Sensor Network (SSN) Ontology and Sensor Cloud Ontology (SCO)

Semantic Sensor Network (SSN) ontology [28], shown in Fig. 9.1, focuses on domain independent sensing applications by integrating sensor data (measurements) and sensor specific data (sensing principles, quality etc.). The ontology provides a number of concepts and attributes to model sensors and sensor observations and measurements. The ontology can describe sensors, accuracy, ranges, frequency, units and hierarchies of sensors etc. Users can derive their own custom domain ontologies by creating subclasses of classes described by SSN ontology. Another related ontology is the Sensor Cloud Ontology (SCO) [21] which extends SSN ontology by adding provisions for the parameters being sensed as separate entities instead of just metadata and explicitly introduces the concept of time series. The ontology provides a unified view of data collected from multiple sensor networks and organizes sensors hierarchically i.e., Network → Platform → Sensor → Phenomenon → Observation. This means that a network can have multiple platforms. Each platform has multiple sensors attached to it and each sensor measures a particular phenomenon and the measured results are stored as observations.

Figure 9.1 **Overview of the Semantic Sensor Network (SSN) Ontology** **[28]**. Some of the key concepts of SSN ontology are shown, which are split into conceptual modules. These concepts model sensors, measurements, features being measured etc. The concepts not depicted here are related to metadata about devices and observations such as operating ranges, accuracy, precision, resolution etc.

Müller et al. [21] have used the SCO ontology for (offline³) modeling and integration of sensor data acquired through a REST API. The Commonwealth Scientific and Industrial Research Organization (CSIRO⁴) provides access to data from a large number of terrestrial and marine sensors through a Web API implemented using RESTful principles. These sensors measure parameters such as temperature, salinity and rainfall. The data is available as JSON documents. The URL of each document reflects the hierarchical organization for a given sensor (network name, platform name and so on, e.g. network1/platform1/sensor1/data.json). By creating instances of classes such as Network, Platform and Sensor from the SCO ontology, the hierarchical information can be mapped to corresponding concepts in RDF format. The data itself as well as the measured phenomenon (rainfall, in this case) can be modeled using ObservedPhenomenon, ObservationResult and TimeSeriesObservedValue. This makes it possible to create a single linked RDF graph from a set of JSON documents acquired through the REST API. Once the data is linked, users can extract relevant data by issuing SPARQL queries like the one shown in Box 9.1, which retrieves a list of sensors and their observations along with observation quality information.

Box 9.1

SPARQL Query Example

SELECT DISTINCT ?sensingDevice ?observation ?obsMetadata ?qualityInfo ?methodType
WHERE {
  ?sensingDevice a ssn:Sensor .
  ?sensingDevice sco:hasObservedPhenomenon ?observation .
  ?observation sco:hasMetadata ?obsMetadata .
  ?obsMetadata md:dataQualityInfo ?qualityInfo .
  ?qualityInfo dq:report ?report .
  ?report dq:evaluationMethodType ?methodType .
}

–From RESTful to SPARQL: A Case Study on Generating Semantic Sensor Data [21]
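The mapping from hierarchical REST documents to a linked graph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in [21]: the class names come from the text, whereas the `ex:` properties, the function `lift_document` and the JSON layout (keys `phenomenon` and `readings`) are hypothetical.

```python
import json


def lift_document(url_path, json_text):
    """Turn one REST document into a list of (subject, predicate, object) triples."""
    # The URL path encodes the hierarchy: network/platform/sensor/data.json
    network, platform, sensor, _filename = url_path.strip("/").split("/")
    net, plat, sens = f"ex:{network}", f"ex:{platform}", f"ex:{sensor}"
    triples = [
        (net, "rdf:type", "sco:Network"),
        (plat, "rdf:type", "sco:Platform"),
        (sens, "rdf:type", "sco:Sensor"),
        (net, "ex:hasPlatform", plat),    # hypothetical linking property
        (plat, "ex:hasSensor", sens),     # hypothetical linking property
    ]
    doc = json.loads(json_text)           # assumed (hypothetical) document layout
    for index, reading in enumerate(doc["readings"]):
        obs = f"{sens}/obs{index}"
        triples += [
            (obs, "rdf:type", "sco:TimeSeriesObservedValue"),
            (sens, "ex:hasObservation", obs),
            (obs, "ex:phenomenon", f"ex:{doc['phenomenon']}"),
            (obs, "ex:time", reading["time"]),
            (obs, "ex:value", reading["value"]),
        ]
    return triples


sample = '{"phenomenon": "rainfall", "readings": [{"time": "2015-01-01T00:00:00Z", "value": 1.5}]}'
for triple in lift_document("network1/platform1/sensor1/data.json", sample):
    print(triple)
```

Because every document contributes triples that share the network, platform and sensor identifiers, repeating the call for each JSON document yields one connected graph rather than isolated files.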

9.3.1.2 Modeling of Time Series Data

A time series is a temporally ordered sequence of observations. Usually, the observations in the sequence have the same feature of interest and observed property. For example, a time-stamped sequence of kWh values obtained from an electricity meter (sensor) forms a time series representing energy consumption (observed property) for a particular house (feature of interest for the sensor). To ensure that the observations that constitute a time series are populated correctly in an RDF graph, the use of OWL (Web Ontology Language) based rules has been proposed [29]. Such rules ensure that all the observations in a single time series share the same feature of interest and the same observed property. One such example is shown here:

Box 9.2

Using OWL Property Restrictions

Assuming an ontology that models kWh consumption data for residential buildings, the OWL syntax fragment below states that House2717KWhObservation is a subclass of Observation with the restriction that all instances of the former have energyConsumption as the value of the property observedProperty and House2717 as the value of the property featureOfInterest. This ensures that when the RDF graph is created from raw data, only those observations that meet these two criteria become instances of the class House2717KWhObservation.

<owl:Class rdf:about="http://www.energydata.org#House2717KWhObservation">
  <rdfs:subClassOf rdf:resource="#Observation"/>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#observedProperty"/>
      <owl:hasValue rdf:resource="http://www.energydata.org#energyConsumption"/>
    </owl:Restriction>
  </rdfs:subClassOf>
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#featureOfInterest"/>
      <owl:hasValue rdf:resource="http://www.energydata.org#House2717"/>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>
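Read procedurally, the two restrictions in Box 9.2 amount to a membership test applied while the graph is populated. The Python sketch below mimics that test on plain records; it is not an OWL reasoner, and the `Observation` record and the helper names are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    timestamp: str            # ISO 8601 in UTC, so string order equals time order
    value: float              # kWh reading
    observed_property: str
    feature_of_interest: str


def belongs_to_series(obs, observed_property, feature_of_interest):
    """Mirror of the two restrictions: both property values must match."""
    return (
        obs.observed_property == observed_property
        and obs.feature_of_interest == feature_of_interest
    )


def build_series(observations, observed_property, feature_of_interest):
    """Keep only the qualifying observations and order them temporally."""
    members = [
        o for o in observations
        if belongs_to_series(o, observed_property, feature_of_interest)
    ]
    return sorted(members, key=lambda o: o.timestamp)


raw = [
    Observation("2015-01-01T01:00:00Z", 0.9, "energyConsumption", "House2717"),
    Observation("2015-01-01T00:00:00Z", 1.2, "energyConsumption", "House2717"),
    Observation("2015-01-01T00:00:00Z", 0.4, "energyConsumption", "House3000"),
    Observation("2015-01-01T00:00:00Z", 21.0, "temperature", "House2717"),
]
series = build_series(raw, "energyConsumption", "House2717")
print([o.value for o in series])  # [1.2, 0.9]
```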

9.3.1.3 Dealing with Heterogeneous Data

Nowadays, data is generated not only by sensor networks; a large amount of it also comes from human and machine users. A sensor observation has so far been modeled with its numeric nature in mind. However, with new forms of data-generating entities, data may comprise images, videos, text (structured and unstructured), drawings etc. In such scenarios it is useful to abstract away the data-generating entities and focus on the data itself. With the recent focus on data science, complex analytical models are commonly applied to observed data to detect patterns, learn hidden correlations and find new facts. This derived data needs to be integrated with the raw data for querying and future analysis. SOFOS (Smart Oil Field Ontology) [22] is designed for data- and event-centric modeling. Entities are viewed as data sources (including both raw data and metadata). As in SSN and SCO, discussed earlier, there is a Measurement class, but the nature of a measurement is not limited to numeric data. It includes anything from an image to a descriptive observation (e.g. comments). The observed data is linked to analyzed data through inter-class relationships. Events (e.g. low/high alarms, warning/critical alarms etc.) can be generated by comparing observed and derived data against critical thresholds. The events are modeled using classes derived from PoEM (Process-oriented Event Model) [30]. Bridging the two ontological models results in linking of data streams, event detection and event-based goal planning, promoting the re-usability aspect of the Semantic Web.
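Threshold-based event generation over observed and derived data can be illustrated with a short Python sketch. It does not reproduce SOFOS or PoEM; the `Event` record, the stream names and the threshold values are hypothetical and serve only to show how raw and derived values can feed the same event detection step.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Event:
    source: str   # which data stream raised the event
    level: str    # "warning" or "critical"
    value: float


def detect_events(readings, warning, critical):
    """Compare (source, value) pairs against two thresholds, checking the higher one first."""
    events = []
    for source, value in readings:
        if value >= critical:
            events.append(Event(source, "critical", value))
        elif value >= warning:
            events.append(Event(source, "warning", value))
    return events


# Observed data: hypothetical pressure readings from one sensor.
observed = [("pressure_raw", v) for v in (95.0, 104.0, 121.0)]
# Derived data: an analytic result computed from the raw values (here, their mean).
derived = [("pressure_mean", mean(v for _, v in observed))]

for event in detect_events(observed + derived, warning=100.0, critical=120.0):
    print(event)
```

Treating the derived value as just another data source is the point of the data-centric view: the detection step does not need to know whether a value was measured or computed.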

9.3.2 Discovering and Modeling Web Services

So far we have discussed how different techniques have been applied to model raw data and devices to build semantic data. Besides raw data, IoT data is also available through the WoT approach, which builds a layer of web services on top of the IoT devices so that the devices, their services and data become browsable. There are examples in the literature of modeling data acquired from web services [20,31,32]. However, creating a solution or scheme that models only a particular web service is inefficient. Moreover, some approaches extract all available data from such services and convert it into RDF (offline approaches such as [21], discussed in Section 9.3.1.1). There were more than “9000 Web APIs that were available in 2013, up from 105 in 2005”.⁵ Offline processing is not a viable solution for such a large number of web services and can also lead to data freshness issues, as the process of data extraction and modeling may need to be repeated to keep the data up-to-date. This necessitates mechanisms for discovering and modeling services in an automatic and generalized way and making them available for querying. Before we discuss some of the proposed approaches, it is worthwhile to understand the concepts of Lifting and Lowering. A user application that issues SPARQL queries (which match RDF graph patterns) may have to access a web service that provides data as XML or JSON. As shown in Fig. 9.2, the process of Lowering converts the user-provided RDF graph pattern into the input needed by the web service. The process of Lifting converts the XML/JSON result into RDF format [33]. This should happen seamlessly so that, from the user's perspective, they deal only with RDF data.

Figure 9.2 **Lifting and Lowering.** Conversion of information related to the relationship between two entities Alice and Bob from XML to RDF (Lifting) and back (Lowering)
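The round trip of lowering and lifting can be sketched in Python as follows. This is a minimal illustration, not the mechanism of [33]: the service is a local stand-in that returns JSON, and the parameter names (`person`, `relation`), the `foaf:knows` relationship between Alice and Bob, and all function names are assumptions made for the example.

```python
import json
from urllib.parse import parse_qs, urlencode

# Hypothetical data held by the stand-in service.
_SERVICE_DATA = {"Alice": {"knows": ["Bob"]}}


def lower(pattern):
    """Lowering: turn a triple pattern with a bound subject and predicate
    into the query string the (hypothetical) service expects."""
    subject, predicate, _unbound = pattern
    return urlencode({
        "person": subject.split(":", 1)[1],
        "relation": predicate.split(":", 1)[1],
    })


def fake_service(query_string):
    """Stand-in for a remote Web API: takes a query string, returns JSON text."""
    params = parse_qs(query_string)
    person, relation = params["person"][0], params["relation"][0]
    results = _SERVICE_DATA.get(person, {}).get(relation, [])
    return json.dumps({"person": person, "relation": relation, "results": results})


def lift(json_text):
    """Lifting: turn the JSON response back into RDF-style triples."""
    doc = json.loads(json_text)
    return [
        (f"ex:{doc['person']}", f"foaf:{doc['relation']}", f"ex:{name}")
        for name in doc["results"]
    ]


def query(pattern):
    """From the caller's perspective, a triple pattern goes in and triples come out."""
    return lift(fake_service(lower(pattern)))


print(query(("ex:Alice", "foaf:knows", "?friend")))
```

Only `lower` and `lift` know about the service's request and response formats, which is what lets the calling application stay entirely in terms of RDF patterns.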

Several approaches have been developed to integrate Web APIs with the Linked Data cloud. One approach is to semantically annotate attributes provided by the web services using known ontologies and make them available as Linked Data (on a Semantic Service registry). Users can query the registry using SPARQL queries to find out relevant web services that provide the data they need [34–36]. An example of a SPARQL query for service discovery is:

Box 9.3

Service Discovery SPARQL Query Example

This example is for services which have been annotated using SAWSDL^a and MSM (Minimal Service Model) [35] ontologies. The query finds services that take dbpedia:City as input and return dbpedia:State as output.

SELECT ?url WHERE {
  ?s rdf:type msm:Service ; msm:hasOperation ?o .
  ?o rdf:type msm:Operation ; rest:hasAddress ?url ;
     msm:hasInput ?in ; msm:hasOutput ?out .
  ?in rdf:type msm:MessageContent ; msm:hasPart ?par .
  ?par rdf:type msm:MessagePart ; sawsdl:modelReference ?m .
  ?m rdf:type dbpedia:City .
  ?out rdf:type msm:MessageContent ; msm:hasPart ?par2 .
  ?par2 rdf:type msm:MessagePart ; sawsdl:modelReference ?m2 .
  ?m2 rdf:type dbpedia:State .
}

^a http://www.w3.org/TR/sawsdl-guide/.

Another approach [37] proposes to build an RDF template (using known ontologies) on the server side that links together the different attributes provided by the service. This is essentially a wrapper built on the server side. In this way, no lowering or lifting is needed, since the HTTP requests and responses are themselves expressed as RDF patterns. For existing services, automatic modeling and conversion of XML/JSON data into RDF format has been proposed. For instance, in [38] the authors present an approach that first invokes a Web API several times to get a list of attributes along with examples of different invocations of the service. Using ontology matching, it matches the names of the fields in the data received to concepts and attributes in well-known or user-specified ontologies. Once the model is finalized, the software acts as an intermediary between the client and the server and performs automatic lowering and lifting for subsequent queries.
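The name-matching step can be approximated with simple string similarity. The sketch below is not the matcher of [38]; it assumes that field names and ontology terms differ mainly in case and separators, uses Python's standard `difflib` for scoring, and the function names, the candidate terms and the 0.6 threshold are all illustrative choices.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case a name and drop separators, so 'birth_place' and 'birthPlace' agree."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def best_match(field, terms, threshold=0.6):
    """Return the ontology term whose local name is most similar to the field name,
    or None when nothing reaches the threshold."""
    if not terms:
        return None
    scored = [
        (SequenceMatcher(None, normalize(field), normalize(term.split(":")[-1])).ratio(), term)
        for term in terms
    ]
    score, term = max(scored)
    return term if score >= threshold else None


# Hypothetical candidate terms and field names seen in sample API responses.
TERMS = ["schema:birthPlace", "schema:name", "schema:State"]
for field in ("birth_place", "Name", "xyz"):
    print(field, "->", best_match(field, TERMS))
```

A production matcher would also compare instance values, as noted in Section 9.3.1, since names alone can be ambiguous.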

9.3.3 Querying Semantic-Rich Heterogeneous Data

The Resource Description Framework (RDF) and the SPARQL query language have been recognized as two of the key technologies of the Semantic Web. An RDF repository is a collection of triples, denoted as <subject, predicate, object>, and can be represented as a graph. The vertices of this graph denote subjects and objects, and edges denote predicates. SPARQL allows users to write queries against data repositories that follow the RDF specification of the World Wide Web Consortium (W3C) by creating queries that consist of triples, conjunctions, disjunctions, and optional patterns. Although SPARQL is a standard way to access RDF data, it remains tedious and difficult for non-expert users because of the complexity of the SPARQL syntax and the RDF schema [39]. To automatically generate a SPARQL query, a system would have to (i) separate the user input into syntactic markers and tokens, (ii) map tokens to concepts in the ontology, (iii) link identified concepts based on relationships in the ontology, and (iv) issue the query to collect the results.

In this section, we present a discussion on some of the systems that aim to convert user query intention into formal query syntax through different approaches.

9.3.4 State of the Art

Part of the Semantic Web vision is to provide Web-scale access to semantically annotated content [11,40]. This implies understanding users' information needs accurately enough to allow for retrieving a precise answer using semantic technologies. An ideal system would allow end-users to benefit from the expressive power of Semantic Web standards while at the same time hiding their complexity behind an intuitive and easy-to-use interface [41,42]. For this reason, NLP and ontology-based approaches for translating end-user queries into formal queries have been explored extensively [39,43–46]. Systems based on natural language can infer semantic relationships between keywords by using the whole sentence [47]. Additionally, some existing natural language based approaches limit input to a subset of natural language by introducing a pre-specified vocabulary, grammar or sentence structure that must be followed while constructing a query [48].

Approaches that avoid the challenges of natural language processing rely on controlled environments, guiding the user step by step with suggestions of terms that are connected in the ontology [49,50] and formulating queries interactively. Querix [50] translates natural language queries into SPARQL. If the NL query translates into multiple semantic queries, Querix relies on clarification from the user via dialog boxes. Ginseng (Guided Input Natural language Search Engine) [49] allows users to query RDF knowledge bases using a controlled input language. The system provides suggestions through pop-up lists for each word in the user entry. These pop-up menus offer suggestions on how to complete the current user-entered word or present the options for the next word. The possible choices are reduced as the user continues to type. Entries that are not part of these suggested lists are not accepted by the system. Ginseng translates the entry into a SPARQL query, executes it against the ontology model using Jena⁶ and displays the SPARQL query and the answer to the user.
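The guided-input behavior described for Ginseng (suggestions that narrow as the user types, and rejection of entries outside the suggested list) can be mimicked with prefix filtering. The sketch below is a simplification and not Ginseng's implementation, which derives its choices from a grammar built on the loaded ontologies; the vocabulary and function names here are hypothetical.

```python
def suggest(vocabulary, typed):
    """Return the vocabulary entries that could complete the partially typed word."""
    prefix = typed.lower()
    return sorted(term for term in vocabulary if term.lower().startswith(prefix))


def accept(vocabulary, word):
    """Only words offered by the suggestion list are accepted as input."""
    return word.lower() in {term.lower() for term in vocabulary}


# Hypothetical vocabulary; a real system would derive it from the loaded ontology.
VOCABULARY = ["sensor", "sensors", "salinity", "state", "temperature"]

for typed in ("s", "se", "sa"):
    print(typed, "->", suggest(VOCABULARY, typed))
print(accept(VOCABULARY, "pressure"))  # False: not in the controlled vocabulary
```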

Another approach [39] requires predefined SPIN (SPARQL Inferencing Notation) rules to be stored in the semantic repository. SPIN rules are essentially SPARQL statements stored as part of the RDF graph. Approaches that rely on predefined rules limit the ability of users to formulate new queries on demand and depend on IT experts or database administrators to add new queries to the library [51]. In [52], the authors perform offline processing to map natural language phrases to the top-k possible predicates in an RDF dataset, forming a paraphrase dictionary that is later used in the online phase of semantic query formulation. This approach of maintaining pre-processed mappings (called templates) of NL phrases to RDF predicates and later using them as the basis of RDF Q/A systems is also called the template-based approach [53,54].

Since a user query does not need to be syntactically correct but must contain a minimum set of “relevant” concepts, a keyword-based interface eliminates the need for pre-processing natural language phrases into discernible tokens before matching such keywords to concepts and attributes in the ontology. The keyword-based approach is the basis for the Automatic SPARQL Query Formulation (ASQFor) technique [55], which uses <key, value> pairs to solve the problem of querying a semantic data repository. This is similar to the way arguments are passed to functions in high level programming languages such as Java. The ASQFor framework is a reusable and extendable domain independent approach that requires virtually no end-user training to facilitate semantic querying over a knowledge base represented in RDF. ASQFor's simple and intuitive tuple-based interface accepts <key, value> inputs and translates them into a formal language query (currently SPARQL). Generated queries are then executed against the semantic repository and the results are returned to the user. The main objective of this approach is to develop a domain independent framework that provides a simple but powerful way of specifying complex queries and automatically translates them into formal queries on the fly (i.e., it does not rely on predefined rules and can instantaneously adapt to changes in the ontology).

9.3.4.1 ASQFor: Automatic SPARQL Query Formulation

The main goal of ASQFor is to enable end-users to formulate queries over semantic data in terms of classes and properties while being oblivious to the actual structure of the data. To achieve that, ASQFor generates SPARQL queries in three steps. First, the user provided keywords are mapped to concepts and attributes in the ontology O. Each key in the user provided input maps to a node u in O. Second, the algorithm determines the minimum number of nodes and edges from O needed to connect all the nodes u to form the query subgraph Q. A semantic relationship between two nodes (or keys from the input) can be simple (i.e., a single triple) or complex (i.e., represented by a path). To construct the semantic query graph Q, the lowest common ancestor r of all the mapped nodes u is computed. This step is necessary to establish the smallest set of relationships between concepts and attributes in the query that lie on different branches of O. Then the paths that connect all nodes u to r are traced (r is the root of the query subgraph Q). The SPARQL statements are generated while traversing these paths to r by populating statements that correspond to semantic relations and intermediate nodes at each step. Finally, the SPARQL query is executed on the semantic repository and results are returned to the user.
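The steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the published ASQFor algorithm: the ontology is assumed to be a tree stored as a child-to-parent map, every user key is assumed to match a node name exactly, at least two keys are assumed, and the `ex:` vocabulary and all class and property names are hypothetical.

```python
# Hypothetical toy ontology O as a tree: node -> (parent node, property linking parent to node).
PARENT = {
    "Education": ("Person", "ex:hasEducation"),
    "yearsOfSchooling": ("Education", "ex:yearsOfSchooling"),
    "Employment": ("Person", "ex:hasEmployment"),
    "income": ("Employment", "ex:income"),
    "status": ("Employment", "ex:status"),
}


def path_to_root(node):
    """List the nodes from `node` up to the root of the ontology tree."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]][0])
    return path


def lowest_common_ancestor(nodes):
    """Return r, the deepest node that lies on every node-to-root path."""
    paths = [path_to_root(n) for n in nodes]
    shared = set(paths[0]).intersection(*paths[1:])
    # The first shared node met while walking upward is the lowest one.
    return next(n for n in paths[0] if n in shared)


def formulate_query(keys):
    """Build a SPARQL SELECT query whose triple patterns connect all keys to r."""
    root = lowest_common_ancestor(keys)
    patterns = []
    for node in keys:
        # Walk from the mapped node up to r, emitting one triple pattern per edge.
        while node != root:
            parent, prop = PARENT[node]
            triple = f"?{parent} {prop} ?{node} ."
            if triple not in patterns:  # shared edges are emitted only once
                patterns.append(triple)
            node = parent
    select = " ".join(f"?{key}" for key in keys)
    body = "\n  ".join(patterns)
    return f"PREFIX ex: <http://example.org/>\nSELECT {select}\nWHERE {{\n  {body}\n}}"


print(formulate_query(["yearsOfSchooling", "income"]))
```

Because the two keys in the example sit on different branches, the generated query passes through the intermediate nodes `Education` and `Employment` and meets at `Person`; for `income` and `status` the common ancestor is `Employment`, so no pattern above that node is generated.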

9.3.4.2 Semantic Search of 1990 US Census Data

To show how ASQFor can be incorporated into a semantic search system, we developed a simple user interface (Fig. 9.3) that supports a combination of semantic search and exploration queries. From the user's perspective, it is only necessary to know what kind of information is available in the database, irrespective of how it is organized. This has led to our minimalist design, which allows users to pick the concepts that are relevant for the query, optionally specify filtering values and get the desired results. Filtering values can be entered concatenated with comparison operators, e.g., <500. The results of the query are returned in CSV format. The selection and filtering values in Fig. 9.3 correspond to a query asking for people with more than 16 years of education who are employed and making more than $100,000/yr. Unlike other visual query interfaces, our primary focus is to abstract the details of SPARQL and schema ontologies from the end users, providing them only the data attributes to choose from. Furthermore, this interface can be dynamically generated from a schema ontology, resulting in a portable application that only requires access to the semantic repository (which must contain the schema ontology along with the data) to build a functional-to-SPARQL query translator and a GUI on the fly.

Figure 9.3 **ASQFor based Application.** US Census 1990 – Database Search: The user simply selects from the attributes that are available in the database. The formal query is generated by the system on the fly and executed, and the results are returned to the user
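As an illustration of how filtering values such as <500 might be turned into SPARQL FILTER clauses, the following sketch parses a value with an optional leading comparison operator. The function names, the default to equality when no operator is given, and the attribute names in the example are assumptions for illustration; the actual interface may handle these cases differently.

```python
import re

# An optional comparison operator followed by the operand, e.g. "<500" or ">= 16".
_OPERATOR = re.compile(r"^(<=|>=|!=|<|>|=)\s*(.+)$")
_NUMBER = re.compile(r"-?\d+(\.\d+)?")


def parse_filter(raw):
    """Split a filtering value into (operator, operand); assume '=' if no operator is given."""
    text = raw.strip()
    if not text:
        raise ValueError("empty filter value")
    match = _OPERATOR.match(text)
    if match:
        return match.group(1), match.group(2)
    return "=", text


def to_sparql_filter(variable, raw):
    """Render one <key, value> pair as a SPARQL FILTER clause."""
    operator, operand = parse_filter(raw)
    if _NUMBER.fullmatch(operand):
        literal = operand  # numeric literal, used as-is
    else:
        # Quote strings, escaping backslashes and double quotes.
        literal = '"' + operand.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return f"FILTER(?{variable} {operator} {literal})"


# Hypothetical attribute names for the query shown in Fig. 9.3.
selections = {"yearsOfSchooling": ">16", "status": "employed", "income": ">100000"}
for key, value in selections.items():
    print(to_sparql_filter(key, value))
```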

9.4 Case Studies: Smart Applications

9.4.1 Smart Oilfields

One environment characterized by data heterogeneity, where Semantic Web technologies are being used for discovering, modeling, integrating and sharing knowledge, is the Smart (or Digital) Oilfield. Multiple processes on oil and gas facilities generate vast volumes of data. The data can be generated by a variety of sensors and process controllers or gathered manually through inspections. This data is often located in multiple independent repositories and undergoes further processing by different users, who perform analytics and other domain-specific activities that generate derived data. For example, engineers working in the oilfields often create or re-use different simulation models [56] such as geographic models, reservoir simulation models, network models and integrated (coupled) simulation models, all of which produce and consume vast amounts of data. Another area where effective decision making is critical is Asset Integrity Management, where assets are continuously monitored to ensure that they perform their required functions effectively while operating within defined safe operating ranges. Asset integrity is affected by many parameters [22], including normal wear and tear, weather, production, repair and human factors. In a rare operating condition, these seemingly independent parameters may collectively trigger a fault that could lead to a potentially disastrous loss of containment (LOC) event. The oil and gas industry always seeks to prevent LOC events. To prevent such incidents, engineers rely on inputs from various asset databases and software tools to make important safety-related assessments and decisions on a daily basis. Due to the heterogeneity of these data sources, providing on-demand access to information with an integrated view can be challenging. A unified view of current data sources is desirable for decision making as it could lead to identification of telltale signatures of LOC events. However, manually cross-referencing and analyzing such data sources is labor intensive. Another challenge is knowledge management, which refers to a systematic way to capture the results of various engineering analyses and prediction models. To summarize, there are three key challenges that must be addressed to facilitate all such decision making processes [56]:

• Integrated view of the information: For effective decision making, there needs to be a system that presents a comprehensive and continuous view of the assets and processes. Useful information may reside in multiple databases or files. Having scattered information makes it difficult to connect the dots and come up with actionable information. Hence, integration of multiple data sources is a crucial step.

• Knowledge management: As the models (e.g. for modeling production or external corrosion) are constantly used and improved over time (as more data becomes available), the rationale behind the changes and decisions is generally lost. Such knowledge could be extremely useful for auditing and training purposes.

• Efficient access to information: It is critical for the decision maker (production engineer, asset integrity manager etc.) to have access to the relevant pieces of information required to make an informed decision. For example, an asset integrity manager who is looking for a solution to a problem can benefit from a database of surveys conducted on other facilities to look for similar problems and associated recommendations or solutions.

9.4.1.1 Semantic Model of Asset Integrity Data

Semantic Web technologies can be used for an expressive representation of various heterogeneous data sources to deal with the aforementioned challenges. First of all, the data streams need to be modeled using appropriate ontological models that capture the relevant domain knowledge, as is done through the SOFOS (Smart Oilfield Ontology) and ECD (External Corrosion Detection) ontologies in [22]. The ontologies capture physical entities and their inter-relationships as well as the associated observed data, metadata and derived data. Once all the entities in the data streams have been identified, the next step is to perform record linkage across data streams. Different types of data may be recorded for the same assets in different databases. The key idea is to integrate and present a unified view of the environment. Fig. 9.4 shows data organized as an integrated RDF graph where raw data was originally stored as multiple CSV files, databases, images and text files. Data from multiple sources is integrated into a central repository serving as a single endpoint for maintaining and retrieving knowledge. Asset integrity managers can now issue meaningful queries across previously separate databases, e.g., all work orders for assets that have been labeled severely corroded in the most recent survey. This query requires data from the work orders database and the inspection database, which have now been linked together by the integration process. This way data from disjoint repositories can be combined to provide actionable information. This is, essentially, an end goal for any enterprise: not just to store and manage vast amounts of data, but also to get actionable insights faster for a robust decision making process. Other automatic and semi-automatic approaches of using ontologies for organizing data and metadata into semantic repositories are discussed in [56] and [57].

Figure 9.4 **Integrated asset integrity data.** Data from one oil and gas facility is organized as a graph where the center node represents the facility. Each group of nodes represents all information related to a single piece of equipment, organized hierarchically. The hierarchy of data for a single piece of equipment is shown in the expanded view
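To make the benefit of linking concrete, the sketch below answers the example query (work orders for assets labeled severely corroded in the most recent survey) over a small in-memory set of triples. It is only an illustration: the identifiers, the `ex:` property names and the data are invented, they are not taken from SOFOS or ECD, and a real deployment would issue a SPARQL query against the integrated repository rather than scan tuples in Python.

```python
# Hypothetical integrated graph: (subject, predicate, object) triples that
# originate from a work orders database and an inspection database.
TRIPLES = {
    ("wo:1001", "ex:forAsset", "asset:P-101"),
    ("wo:1002", "ex:forAsset", "asset:V-202"),
    ("wo:1003", "ex:forAsset", "asset:P-101"),
    ("insp:1", "ex:inspects", "asset:P-101"),
    ("insp:1", "ex:date", "2015-06-01"),
    ("insp:1", "ex:condition", "moderately corroded"),
    ("insp:2", "ex:inspects", "asset:P-101"),
    ("insp:2", "ex:date", "2016-03-01"),
    ("insp:2", "ex:condition", "severely corroded"),
    ("insp:3", "ex:inspects", "asset:V-202"),
    ("insp:3", "ex:date", "2015-01-10"),
    ("insp:3", "ex:condition", "severely corroded"),
    ("insp:4", "ex:inspects", "asset:V-202"),
    ("insp:4", "ex:date", "2016-02-20"),
    ("insp:4", "ex:condition", "good"),
}


def match(triples, s=None, p=None, o=None):
    """Return all triples that agree with the given subject, predicate and object."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def work_orders_for_corroded_assets(triples):
    """Work orders for assets whose most recent inspection says 'severely corroded'."""
    latest = {}  # asset -> (date, inspection); ISO dates compare correctly as strings
    for inspection, _, asset in match(triples, p="ex:inspects"):
        date = match(triples, s=inspection, p="ex:date")[0][2]
        if asset not in latest or date > latest[asset][0]:
            latest[asset] = (date, inspection)
    corroded = {
        asset for asset, (_, inspection) in latest.items()
        if match(triples, s=inspection, p="ex:condition", o="severely corroded")
    }
    return sorted(
        order for order, _, asset in match(triples, p="ex:forAsset")
        if asset in corroded
    )


print(work_orders_for_corroded_assets(TRIPLES))
```

The query only works because both sources refer to the same asset identifiers after record linkage; asset V-202 is excluded since its most recent inspection no longer reports severe corrosion.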

9.4.1.2 Accessing Integrated Information

Once we have the integrated repository, the data become available for querying. However, non-expert users require IT experts to build applications or create queries for them. Our definition of a non-expert user is someone who is not familiar with the concepts of databases, querying, Semantic Web and ontologies but is an expert in a particular area of specialization, e.g., an asset integrity manager or a production engineer. To do the job, such a person only requires access to data, irrespective of the underlying techniques of data organization and retrieval. For such users, the ASQFor algorithm (discussed in Section 9.3.4.1) aims to abstract Semantic Web concepts (such as ontologies, RDF and SPARQL) and allows them to formulate queries at a higher level, which the system then translates into formal SPARQL queries automatically.

9.4.2 Smart Cities

Currently, a wide variety of devices such as computers, mobile devices, surveillance equipment, sensors, actuators, displays, vehicles, home appliances and so on, have the capability of generating and exchanging data. The IoT framework (along with its application layers: WoT and SWoT) enables the development of applications that harness that information to provide new services to individuals and organizations (both public and private) [58]. In this context, the beneficial role of Semantic Web technologies in linking together a variety of data becomes even clearer. For example, Intelligent Transport Systems can automatically create alerts to be sent out or displayed along the highways in case of a flood, fog or snow forecast. Based on traffic data, such systems can update the arrival times of buses and taxis in real time. Smart Buildings can adjust their air-conditioning systems based on the ambient conditions. Smart Neighborhoods or a group of Smart Buildings can optimize their energy consumption based on occupancy, weather and short and long-term consumption patterns. Healthcare Systems can issue alerts (flu season etc.) based on impending weather conditions. Health Monitoring Systems using remote-sensing equipment at ports of entry or popular areas can be used as predictors of epidemics [59]. The realization of such applications is only possible if IoT data can be harnessed effectively in order to enable intelligent systems (that constitute a Smart City) to make such data-driven decisions.

9.4.2.1 Challenges

IoT devices, which form the basis of a Smart City, offer heterogeneous data. For instance, in an IoT network, the devices can be measuring speed, light, temperature, sound etc. and usually have different sampling frequencies. Different sensors provide different forms of data at varying rates and even with varying levels of quality and availability. Applications need to be built so that they interact with trustworthy devices and discard noisy data. In these types of large-scale, highly dynamic environments, discovering, modeling, integrating and efficiently querying data are complex tasks [60]. Various aspects of Semantic Web technologies (discussed in Section 9.3) can help to some extent, enabling valuable services towards the goal of achieving Smart Cities. One such approach is described next.

9.4.2.2 Automated Complex Event Implementation System (ACEIS)

ACEIS [61] acts as a middleware between IoT data and Smart City applications and performs real time discovery and integration of data streams automatically. The data streams are modeled using the SSN ontology (discussed in Section 9.3.1.1). User queries are translated into “semantically annotated complex event service requests”. These event requests are modeled using the Complex Event Service (CES) ontology, which is an extension of OWL-S.⁷ The event requests are processed by the ACEIS core to determine the data streams that are to be discovered and integrated. It uses an algorithm [62] to create composition plans to detect the complex events specified in event requests. Candidate sensors are determined by querying semantically annotated sensor capabilities (modeled using ObservedProperty and FeatureOfInterest from the SSN ontology). The query transformer then transforms the composition plans into a set of streaming queries to be executed on a streaming query engine. This mechanism can work with actual sensors as well as virtual sensors (social media streams) to detect complex events for users.
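The candidate selection step can be pictured as a match between what an event request needs and what each sensor is annotated with. The sketch below is only an illustration of that idea, not the ACEIS discovery or composition algorithm: the `SensorDescription` class, the `find_candidates` function and all sensor data are hypothetical, and exact string equality stands in for the semantic matching a real system would perform over SSN annotations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorDescription:
    """Hypothetical annotation of a sensor with two SSN-style capabilities."""
    sensor_id: str
    observed_property: str     # what is measured, e.g. "AverageSpeed"
    feature_of_interest: str   # what it is measured on, e.g. a road segment


def find_candidates(sensors, observed_property, feature_of_interest):
    """Return ids of sensors whose annotations match the requested capability."""
    return [
        sensor.sensor_id
        for sensor in sensors
        if sensor.observed_property == observed_property
        and sensor.feature_of_interest == feature_of_interest
    ]


# Invented registry of physical and virtual sensors.
REGISTRY = [
    SensorDescription("s1", "AverageSpeed", "road:segment-12"),
    SensorDescription("s2", "VehicleCount", "road:segment-12"),
    SensorDescription("s3", "AverageSpeed", "road:segment-40"),
    SensorDescription("tweets-1", "AverageSpeed", "road:segment-12"),
]

print(find_candidates(REGISTRY, "AverageSpeed", "road:segment-12"))
```

Because a virtual sensor is described with the same two annotations as a physical one, it can be returned as a candidate by the same lookup.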

9.5 Conclusion

We have discussed some aspects of the Semantic Web and its applications to data discovery, modeling, integration and querying. These principles have been applied to the IoT (and WoT) paradigm to pave the way for the Semantic Web of Things (SWoT). The purpose is to bring data from “Things” to the realm of the Web in a fully integrated and discoverable fashion, for consumption by humans and machines and for building cross-domain applications. In this context, ontologies are needed to model the instances of a “Thing”, its data, metadata, services and all other relevant information. The reasoning capabilities of the description languages, along with other data analytics techniques, can be used to infer implicit relationships and to fill the gaps in data. The role of standardized ontologies such as SSN is to facilitate semantic interoperability among various domains and data sources. Given the scale of the data being generated by IoT devices and the dynamic nature of those devices, we believe that Semantic Web technologies will play a key role in addressing the challenges faced in the realization of smart environments such as Smart Oilfields and Smart Cities, owing to their ability to give meaning to data for machine-readability, describe data using ontologies, re-use and share domain knowledge, and support inferencing. However, as noted in [1], most Semantic Web technologies have been developed with the Web in mind and, hence, do not adequately address the dynamic, resource-constrained and distributed environment of physical devices. To address that, future research should focus on the dynamicity and scalability issues that arise when adapting the principles of Semantic Web technology to the domain of IoT.
