Chapter 13
Semantics and Search
Jon Atle Gulla
Norwegian University of Science and Technology, Trondheim, Norway
Jin Liu
T-Systems, Bonn, Germany
Felix Burkhardt, Jianshen Zhou, and Christian Weiss
Deutsche Telekom Laboratories, Bonn, Germany
Per Myrseth, Veronika Haderlein, and Olga Cerrato
Det Norske Veritas, Oslo, Norway
Contents
13.1 Introduction
13.2 Semantic Representations
13.3 Layered Model of Semantic Search
13.4 Index Matching Approaches
13.4.1 Named Entity Matching
13.4.2 Graph Traversal
13.4.3 Conceptual Matching
13.4.4 Reasoning
13.5 Querying Approaches
13.5.1 Query Disambiguation
13.5.2 Controlled Semantic Querying
13.5.3 Semantic Query Reformulation
13.5.4 Syntactic Query Reformulation
13.5.5 Complex Constraint Queries
13.6 Results
13.6.1 Metadata
13.6.2 Conceptual Summarization
13.7 Navigation
13.7.1 Hierarchical Refinement
13.7.2 Faceted Search
13.7.3 Ontology Rules and Navigators
13.8 Temporal Aspects of Search
13.9 Conclusions
References
348 ◾  Jon Atle Gulla et al.
13.1 Introduction
Web search applications today are crucial for the efficient management and exploitation of information on the Internet. People use search to find the web sites of companies or services and to check on everything from tonight's movies to recommendations on books and products. A study by Kumar and Tomkins (2009) revealed that 9.0% of all page views on the Internet are visits to various search sites, including multimedia searches and searches of databases; 8.9% are direct referrals from searches; and another 3.5% are indirect referrals, resulting in a total of 21.4% of overall page views that are based on search.
Searching now supports a wide range of activities on the Internet. Whereas early use of web search engines concentrated on finding static data on a restricted range of topics, information on the Internet today is constantly changing and is useful for almost any activity or interest. A survey conducted by Careerbuilder.com in 2006 found that one in four employers used web searches to screen potential employees, and more than half the managers interviewed decided against hiring individuals after investigating their online activities. As shown in Figure 13.1, people now use search applications to retrieve information about very specific objects like persons, products, events, and places. This is not surprising, because the Internet is now used actively and dynamically to publish information relevant to what is going on around us. The Internet responds faster to events than traditional news channels, allows a wider range of opinions and perspectives to be published, encourages communication and discussion, and offers almost infinite amounts of information.
At the same time, enterprises are also observing the increasing importance of good search facilities for managing their internal activities and resources. To handle the complexity of their businesses and take full advantage of their own competence, they need appropriate tools for documenting and retrieving business-critical information. This is a particular concern in evolving domains in which organizations change constantly and must respond to new market needs, procedures, technology, and staffing.
The sheer amount of information is one of the most challenging aspects of Internet and enterprise search applications, but other aspects also hamper the effectiveness of current search technology. The reliance of standard search applications on simple keywords is satisfactory for users who know exactly which words appear in the documents they request. However, most users will not know the exact wording of every document, and the flexibility of natural language makes it difficult to guess how words and phrases are used to describe phenomena. If terminologies change over time, a keyword that is suitable today may fail to identify relevant documents from the past. Semantic search applications address this language problem of current search technology. They offer mechanisms for dealing with document content rather than keywords and try to capture the variety and instability of the terms used in documents and queries.
This chapter surveys prominent approaches to semantic search. After explaining the principles of semantic search in Sections 13.2 and 13.3, we discuss semantic indexing techniques in Section 13.4. Most research on semantic search relates to query processing and interpretation, the topic of Section 13.5. Section 13.6 is devoted to the use of semantics on the search result page, and Section 13.7 presents techniques for semantic navigation of result sets. The temporal or evolutionary dimension of search is discussed in Section 13.8, followed by conclusions in Section 13.9.
Category                        Percent of queries
Organization                    33.94
Notable person                  13.08
Specific product                11.35
Media title                     10.10
General product                  8.56
Business category or service     7.69
Places                           4.9
Ordinary person                  4.42
Event                            2.31
Health                           1.35
Games                            1.15
Real estate                      1.15
Figure 13.1 Search categories on the Internet. (Source: Kumar, R. and A. Tomkins, IEEE Data Engineering Bulletin, 32: 3–11, 2009.)
13.2 Semantic Representations
Words in natural language may be analyzed along syntactic and semantic lines. From a syntactic perspective, we note that words must be spelled and inflected according to certain morphological rules and combine into phrases or sentences following grammatical principles. In the syntactic realm we can decide whether a text is well formed with respect to morphological or grammatical rules, but not whether it is meaningful to the reader. Semantics refers to the meanings of words and of the compositions of words into phrases, sentences, and larger pieces of text. Every word is assumed to carry semantic content that combines with that of other words into meaningful messages to readers. We say that a word (or term) in the syntactic realm refers to some concept in the semantic realm that constitutes the meaning of the term.
Figure 13.2 shows how the correspondence between syntactic terms and
semantic concepts is modeled in WordNet, a semantic lexicon from Princeton
University (Miller, 1995). A term may be used to refer to several concepts and
a concept may form the meanings of several terms. e term car has ve dier-
ent meanings in WordNet, of which two are shown in the gure. e term may
refer to a four-wheeled (specialized) motor vehicle and a generalization of more
specialized cars like ambulances and station wagons. However, it may also refer
to railroad cars that may be further specialized into luggage cars, cabin cars, and
other types. A car may also be referred to by terms like rail car and railway car.
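The many-to-many correspondence between terms and concepts amounts to two lookup tables: one from concepts to the terms that express them, and its inverse from terms to the concepts they may denote. The sketch below hand-codes only the two senses of car shown in Figure 13.2. It is not loaded from WordNet, and although the identifiers imitate WordNet's naming style, the data layout and the names `senses` and `synonyms` are illustrative assumptions.

```python
from collections import defaultdict

# Hand-coded toy fragment modeled on Figure 13.2; NOT loaded from WordNet.
# Each concept lists the terms that can express it, a gloss, its more
# general concept (hypernym), and some more specialized ones (hyponyms).
CONCEPTS = {
    "car.n.01": {
        "terms": ["car", "auto", "automobile", "machine", "motorcar"],
        "gloss": "4-wheeled motor vehicle; usually propelled by an "
                 "internal combustion engine",
        "hypernym": "motor_vehicle",
        "hyponyms": ["ambulance", "station_wagon"],
    },
    "car.n.02": {
        "terms": ["car", "railcar", "railway car"],
        "gloss": "wheeled vehicle adapted to the rails of railroad",
        "hypernym": "wheeled_vehicle",
        "hyponyms": ["cabin_car", "luggage_car"],
    },
}


def build_term_index(concepts):
    """Invert the concept table: map each term to every concept it can denote."""
    index = defaultdict(list)
    for concept_id, entry in concepts.items():
        for term in entry["terms"]:
            index[term].append(concept_id)
    return dict(index)


TERM_INDEX = build_term_index(CONCEPTS)


def senses(term):
    """Concepts a term may refer to; more than one means the term is ambiguous."""
    return TERM_INDEX.get(term, [])


def synonyms(concept_id):
    """Terms that share a concept, i.e. the concept's alternative names."""
    return CONCEPTS[concept_id]["terms"]
```

With this fragment, `senses("car")` yields both `car.n.01` and `car.n.02`, while `senses("auto")` yields only `car.n.01`: the term car is ambiguous, and the first concept has several names.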
Concepts in the semantic realm are related to one another by means of relations like has_parts and is_member_of.
Semantic search applications go beyond traditional search engines by allowing users to search for documents on the basis of content rather than keyword matching. With traditional search applications based on the vector space model, query terms are matched as they are against inverted indices of the terms appearing in the documents of the collection. The degree of match between the search terms and the document terms decides whether a particular document is deemed relevant to the query. This is often referred to as syntactic search, because no attempt is made to analyze the structures of the terms or their semantic contents. With morpho-syntactic search, stemmers or lemmatizers are used to normalize query and/or index terms so that any inflection of a search term will match any inflection of the same lexical word in the index. This means that the search term cars will match the index term car, written will match write, and so on. Many current commercial search applications offer some form of morpho-syntactic search, often in combination with spell checking of query terms.
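The contrast between syntactic and morpho-syntactic matching can be sketched with a tiny inverted index. The lemma table below is a hypothetical stand-in for a real stemmer or lemmatizer, and ranking documents by the number of matched query terms is a deliberate simplification of vector-space weighting; all function names are illustrative.

```python
import re
from collections import defaultdict

# Hypothetical lemma table standing in for a real lemmatizer:
# maps inflected forms to their lexical word.
LEMMAS = {"cars": "car", "written": "write", "wrote": "write", "writes": "write"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def normalize(token):
    """Lemmatize via the lookup table; unknown tokens pass through unchanged."""
    return LEMMAS.get(token, token)


def build_index(docs, morph=False):
    """Inverted index: term -> set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[normalize(token) if morph else token].add(doc_id)
    return index


def search(query, index, morph=False):
    """Rank documents by how many distinct query terms they contain.

    Simplification: real vector-space engines weight terms (e.g. tf-idf)
    and compare query and document vectors instead of counting matches.
    """
    terms = {normalize(t) if morph else t for t in tokenize(query)}
    scores = defaultdict(int)
    for term in terms:
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))


DOCS = {
    "d1": "She has written a review of the new car.",
    "d2": "Two cars collided on the highway.",
}
plain_index = build_index(DOCS)              # syntactic search
morph_index = build_index(DOCS, morph=True)  # morpho-syntactic search
```

Here `search("cars", plain_index)` finds only `d2`, whereas `search("cars", morph_index, morph=True)` finds both `d1` and `d2`, because cars and car are reduced to the same lexical word in the query and in the index.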
The next step up from traditional syntactic search is using dictionaries, thesauri, or ontologies to match search terms against synonyms, instances, or other related terms in an index. A search for cars would then also return documents that mention only automobiles, because the dictionary informs the search application that cars and automobiles are synonyms. Since ontologies depict the semantic structures of a domain, we are now in a position to relate search terms to index terms based on semantic content and to reason about document content. Ultimately, this semantic approach has the capacity to help search engines match search terms against information that is not explicitly stated in the documents but can be inferred from them via the ontology. This is shown on the right side of Figure 13.3, in which the system realizes that cars are related to vehicles that run by means of engines. This type of human-like interpretation and reasoning is one of the main objectives of the Semantic Web initiative, although we still face fundamental and unsolved challenges in extracting semantic content from text and in reasoning over it.
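The two steps in this paragraph, thesaurus-based synonym matching and ontology-based inference, can be sketched with a few lookup tables. The thesaurus, the is_a chain (loosely following the vehicle hierarchy in Figure 13.2), the has_part fact, and all function names below are hypothetical illustrations, not the ontology behind Figure 13.3; a real system would use an ontology language and a reasoner.

```python
# Hypothetical thesaurus: surface term -> canonical concept name.
SYNONYMS = {
    "car": "car", "cars": "car",
    "automobile": "car", "automobiles": "car", "auto": "car",
}

# Hypothetical ontology fragment: single-parent is_a links and has_part facts.
IS_A = {
    "car": "motor_vehicle",
    "motor_vehicle": "self_propelled_vehicle",
    "self_propelled_vehicle": "wheeled_vehicle",
    "wheeled_vehicle": "vehicle",
}
HAS_PART = {"motor_vehicle": {"engine"}}


def concept_of(term):
    """Map a term to its concept; unknown terms stand for themselves."""
    return SYNONYMS.get(term, term)


def expand(term):
    """Synonym expansion: every thesaurus term that shares the query's concept."""
    concept = concept_of(term)
    return {t for t, c in SYNONYMS.items() if c == concept} | {term}


def ancestors(concept):
    """The concept followed by every concept above it along is_a links."""
    chain = [concept]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def inferred_parts(concept):
    """Parts a concept inherits from itself and all of its ancestors."""
    parts = set()
    for c in ancestors(concept):
        parts |= HAS_PART.get(c, set())
    return parts


def matches(doc_terms, required_class, required_part=None):
    """True if some document term denotes a kind of required_class that,
    by inheritance, has required_part; neither fact need appear in the text."""
    for term in doc_terms:
        concept = concept_of(term)
        if required_class in ancestors(concept) and (
            required_part is None or required_part in inferred_parts(concept)
        ):
            return True
    return False
```

Because `expand("cars")` contains "automobiles", a document that says only "automobiles" is found for the query cars. Going further, `matches(["automobiles"], "vehicle", "engine")` returns `True` even though neither "vehicle" nor "engine" occurs among the document terms; both facts are inferred through the is_a chain.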
[Figure: terms in the syntactic realm (car, auto, automobile, machine, motorcar, railcar, railway car) linked to two WordNet concepts in the semantic realm, Car1 (4-wheeled motor vehicle; usually propelled by an internal combustion engine) and Car2 (wheeled vehicle adapted to the rails of railroad), together with related concepts and has_parts and is_member_of relations.]
Figure 13.2 Correspondence between terms and WordNet semantics.