Chapter 1
Applied Semantic Web
Technologies: Overview
and Future Directions
Vijayan Sugumaran
Oakland University, Rochester, MI; Sogang University, Seoul, Korea and
Jon Atle Gulla
Norwegian University of Science and Technology, Trondheim, Norway
Contents
1.1 Introduction
1.2 History
1.3 Semantic Web Layers
1.3.1 Data and Metadata Layer
1.3.2 Semantics Layer
1.3.3 Enabling Technology Layer
1.3.4 Environment Layer
1.4 Future Research Directions
1.5 Organization of Book
1.5.1 Part I: Introduction
1.5.2 Part II: Ontologies
1.5.3 Part III: Ontology Engineering and Evaluation
1.5.4 Part IV: Semantic Applications
Acknowledgment
References
1.1 Introduction
Since Tim Berners-Lee’s original idea in 1989 for a global system of interlinked
hypertext documents, the World Wide Web has grown into the world’s biggest
pool of human knowledge. Over the past few years, the Web has changed the way
people communicate and exchange information. It has created new business
opportunities and obliterated old business practices. As a borderless source of
information, it has been instrumental in globalization and cooperation among
people and nations. Importantly, it has also helped individuals join virtual
communities and take part in social networks that cross physical, cultural, and
organizational barriers. The rapid growth of information on the World Wide Web
has, however, created a new set of challenges and problems.
Information overload: In 1998, the size of the Web was estimated to exceed
300 million pages, with a growth rate of about 20 million pages per month
(Baeza-Yates and Ribeiro-Neto, 1999). The real size of the Web today is difficult
to measure, although Web search indices give a lower bound on the number of
unique and meaningful Web pages. The Google search index was measured at around
500 million pages in 2000, 8 billion in 2004, and more than 27 billion today. This
constitutes an enormous amount of information about almost any conceivable topic.
While the early Web often suffered from a lack of high-quality relevant pages, the
present Web contains far too many relevant pages for any user to review. As an
example, at the time of this writing, Google returns about 18.6 million pages for
the search phrase “World Wide Web.” If you fail to mark it as a phrase, an
astonishing 113 million pages are found to be relevant and presented on the result
page. In addition, the deep Web generates information dynamically based on users’
queries.
Poor retrieval and aggregation: The explosion of Web documents and
services would not be so critical if users could easily retrieve and combine the
information needed. Since Web documents are at best semi-structured in simple
natural language text, they are vulnerable to obstacles that prevent efficient
content retrieval and aggregation. An increasing problem is the number of
languages used on the Web. Studies by Langer (2001) suggested that almost 65% of
Web pages were in English in 1999; data from Internet World Stats* indicate a more
balanced use of languages. The English-using population at the end of 2009
constituted only 27.7% of total online users. The plethora of languages now used
on the Web prevents search applications from applying language-specific
strategies, and they still depend on content-independent statistical models. In a
similar vein, the many misspellings and general syntactic variations in documents
hamper the reliability of statistical scores of document relevance.
* www.internetworldstats.com
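The effect of misspellings on statistical relevance scores can be illustrated with a small sketch. It assumes a plain TF-IDF sum as the scoring model, which is only one of many statistical models and is not taken from the chapter; the function names and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; no language-specific processing."""
    return text.lower().split()


def tf_idf_scores(query, documents):
    """Score each document against the query with a plain TF-IDF sum."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if counts[term]:
                tf = counts[term] / len(tokens)
                idf = math.log(n_docs / doc_freq[term]) + 1.0
                score += tf * idf
        scores.append(score)
    return scores


# Hypothetical sample documents; the second one misspells "semantic".
documents = [
    "semantic web ontologies describe concepts",
    "semantc web ontologies describe concepts",
    "cooking recipes for the weekend",
]
scores = tf_idf_scores("semantic", documents)
```

In this sketch the misspelled document is about the same topic as the first one, yet it receives the same score as the unrelated third document, because the model only compares character strings.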
Stovepipe system: All components of a stovepipe system application are
hardwired to work only together (Daconta et al., 2003). Information flows only
inside an application and cannot be exchanged with other applications or
organizations without access to the stovepipe system. Many enterprises and
business sectors suffer from stovepipe systems that use their own particular
database schemas, terminologies, standards, etc. and prevent people and
organizations from collaborating efficiently because one system cannot understand
the data from another system.
1.2 History
The term Semantic Web was popularized by Tim Berners-Lee and later elaborated in
2001. The first part of his vision for the Semantic Web was to turn the Web into a
truly collaborative medium: to help people share information and services and make
it easier to aggregate data from different sources and different formats. The
second part of his vision was to create a Web that would be understandable and
processable by machines. While humans can read and comprehend current Web pages,
Berners-Lee envisioned new forms of Web pages that could be understood, combined,
and analyzed by computers, with the ultimate goal of enabling humans and computers
to cooperate in the same manner as humans do with one another. Berners-Lee did not
think of the Semantic Web as a replacement of the current Web. It was intended as
an extension for adding semantic descriptions of information and services. Central
to the Semantic Web vision is the shift from applications to data. The key to
machine-processable data is to make the data smarter. As seen from Figure 1.1, data
progress along a continuum of increased intelligence as described below.
Text and databases: In this initial stage, most data is proprietary to an
application. The application is responsible for interpreting the data and contains
the intelligence of the system.
Figure 1.1 Smart data continuum. (Stages in order of increasing intelligence:
text documents and database records; XML documents using single vocabularies;
XML taxonomies and docs with mixed vocabularies; ontology and automated reasoning.)
XML documents for single domainse second stage involves domain-
specic XML schemas that achieve application independence within the domain.
Data can ow between applications in a single domain but cannot be shared out-
side the system.
Taxonomies: In this stage, data can be combined from different domains
using hierarchical taxonomies of the relevant terminologies. Data is now smart
enough to be easily discovered and combined with other data.
Ontologies and automated reasoning: In the final stage, new data can be
inferred from existing data and shared across applications with no human
involvement or interpretation. Data is now smart enough to understand its
definitions and relationships to other data.
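As a rough illustration of the final stage, the sketch below infers new facts from existing ones by following subclass relationships. It is a minimal example under stated assumptions, not a description of any particular reasoner: the class names, the instance name, and the function infer_types are all hypothetical.

```python
def infer_types(subclass_of, instance_of):
    """Return every (instance, class) pair implied by subclass transitivity."""
    inferred = set(instance_of)
    changed = True
    # Repeat until no new fact can be derived (a simple fixpoint loop).
    while changed:
        changed = False
        for instance, cls in list(inferred):
            for sub, sup in subclass_of:
                if sub == cls and (instance, sup) not in inferred:
                    inferred.add((instance, sup))
                    changed = True
    return inferred


# Hypothetical taxonomy: Sedan is a kind of Car, Car is a kind of Vehicle.
subclass_of = {("Sedan", "Car"), ("Car", "Vehicle")}
# The only stated fact about the instance.
instance_of = {("my_car", "Sedan")}

facts = infer_types(subclass_of, instance_of)
```

Only one fact about my_car is stated, but two more (that it is a Car and a Vehicle) are derived without human involvement.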
In the Semantic Web these smart data are assumed to be application-independent,
composable, and classified, and to form parts of a larger terminological structure.
Ontologies play a very important role in the Semantic Web community. According
to Gruber (1995), an ontology is an explicit specification of a conceptualization.
It represents a common understanding of a domain and its relevant terminology.
Technically, ontologies describe concepts and their taxonomic and nontaxonomic
relationships. For the Semantic Web, ontologies enable us to define the
terminology used to represent and share data within a domain. As long as the
applications define their data with reference to the same ontology, they can
interpret and reason about each other’s data and collaborate without manually
defining any mapping between the applications. Figure 1.2 shows the important
language standards of the Semantic Web.
The W3C devised a number of standards for defining and using
ontologies. The Resource Description Framework (RDF) was available already in
1999 as a part of W3C’s Semantic Web effort. It uses triples (subject, property,
object) to describe resources and their simple relationships to other resources. It
is used as a simple ontology language for many existing applications for content
management, digital libraries, and e-commerce.
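A minimal sketch of the triple model is given below. It represents triples as plain Python tuples instead of using an RDF library, and the ex: identifiers are hypothetical names chosen for illustration.

```python
# Each statement is a (subject, property, object) tuple.
# The "ex:" identifiers are hypothetical and stand in for real URIs.
graph = [
    ("ex:chapter1", "ex:title", "Applied Semantic Web Technologies"),
    ("ex:chapter1", "ex:author", "ex:sugumaran"),
    ("ex:chapter1", "ex:author", "ex:gulla"),
    ("ex:gulla", "ex:affiliation", "ex:ntnu"),
]


def match(graph, subject=None, prop=None, obj=None):
    """Return the triples that match the given fields; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in graph
        if subject in (None, s) and prop in (None, p) and obj in (None, o)
    ]


# All authors of the chapter, in the order the triples were stated.
authors = [o for (_, _, o) in match(graph, subject="ex:chapter1", prop="ex:author")]
```

Because every statement has the same three-part shape, data from different sources can be merged by simply concatenating their triple lists.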
DAML (DARPA Agent Markup Language) was proposed by DARPA, a U.S.
government research organization, as part of a research program started in 2000.
The definition of DAML is published at daml.org, which is run as part of the
DAML program. DAML is a semantic language targeting the Semantic Web,
although it can also be used as a general knowledge representation language. The
OIL (Ontology Inference Layer) semantic markup language is a European initiative
involving some of the continent’s best artificial intelligence researchers. OIL is
not very different from DAML, and both languages provide powerful mechanisms for
defining complex ontologies. The DAML+OIL standard from 2001 is a markup
language for Web resources that tries to capture some of the best features of both
DAML and OIL.
Figure 1.2 Important language standards of Semantic Web. (Timeline from 1999 to
2010 showing RDF, DAML+OIL, OWL 1.0, WSDL, SPARQL, SKOS, and OWL 2.0.)
In 2004, the Web Ontology Language (OWL), Version 1.0, was recommended
by the W3C. It replaces DAML+OIL as a semantic language designed
for applications that need to process the content of information instead of simply
presenting information to humans. OWL facilitates greater machine
interpretability of Web content than that supported by XML and RDF by providing
additional primitives along with improved formal semantics from Description Logic.
The three increasingly expressive sublanguages of OWL are OWL Lite, OWL DL, and
OWL Full. OWL DL is the most commonly used language today. Version 2.0 was
introduced in 2009. OWL is now the dominant language for describing formal
ontologies in enterprises and on the Web.
Several supporting standards have emerged recently. The Web Services Description
Language (WSDL), Version 2.0, from 2007, is one of many languages for specifying
Web services. SPARQL (2008) is an RDF-based query language for accessing
information in ontologies. SKOS (Simple Knowledge Organization System;
2009) is a lightweight data model for sharing and linking knowledge organization
systems via the Web. These and similar standards intend to simplify the use of
Semantic Web technologies in practical applications.
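To give a feel for how an RDF query language works, the sketch below evaluates a list of triple patterns with variables against a small set of triples. It only imitates the core idea of pattern matching with shared variables and is not a SPARQL implementation; the function names and the ex: identifiers are hypothetical.

```python
def match_pattern(pattern, triple, bindings):
    """Extend bindings so the pattern equals the triple; return None on conflict."""
    new_bindings = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if new_bindings.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new_bindings


def query(graph, patterns):
    """Evaluate a list of triple patterns as a conjunctive (join) query."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in graph
            if (extended := match_pattern(pattern, triple, bindings)) is not None
        ]
    return solutions


# Hypothetical data using made-up "ex:" identifiers.
graph = [
    ("ex:owl", "ex:recommendedBy", "ex:w3c"),
    ("ex:owl", "ex:year", "2004"),
    ("ex:skos", "ex:recommendedBy", "ex:w3c"),
    ("ex:skos", "ex:year", "2009"),
    ("ex:daml", "ex:proposedBy", "ex:darpa"),
]

# Roughly comparable to a query of the form:
#   SELECT ?std ?year WHERE { ?std ex:recommendedBy ex:w3c . ?std ex:year ?year }
results = query(graph, [
    ("?std", "ex:recommendedBy", "ex:w3c"),
    ("?std", "ex:year", "?year"),
])
```

The variable ?std appears in both patterns, so only standards that satisfy both conditions are returned, each with its year.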
1.3 Semantic Web Layers
As noted above, several standard specifications and technologies are contributing
to the realization of the Semantic Web. It is evolving based on a layered approach,
and each layer provides a set of functionalities (Breitman et al., 2007). The
assortment of tools, technologies, and specifications that lay the foundation for
the Semantic Web can be broadly organized into four major layers: (1) data and
metadata, (2) semantics, (3) enabling technology, and (4) environment. Figure 1.3
illustrates some of the essential specifications and technologies that contribute
to each layer. The four layers and key components are briefly described below.
1.3.1 Data and Metadata Layer
The data and metadata layer is the lowest; it provides standard representations for
data and information and facilitates exchange among various applications
and systems. Unicode provides a standard representation for character sets in