Chapter 13: Semantics and Search (4/7)


362 ◾ Jon Atle Gulla et al.

13.5.4 Syntactic Query Reformulation

With syntactic query reformulation, we map a syntactic query onto semantic concepts, interpret the query semantically, and map the result back to a syntactic query of weighted terms (Solskinnsbakk and Gulla, 2008). Unlike semantic query reformulation, this approach uses a standard inverted index based on term–document matrices and requires that ontologies be trained on a representative set of documents in the document collection. For every concept in the ontology, we allocate a set of relevant documents from this set and generate a concept signature for concept i, $C_i = (t_{i1}, \ldots, t_{in})$, as shown in Figure 13.9. The resulting concept signature for SCOPE PLANNING, $C_{\text{SCOPE PLANNING}}$ = (scope planning:0.0097, project scope:0.0047, product:0.0043, project work:0.0008, project:0.0001), tells us that the term scope planning is the most relevant reference to the concept, but terms like project scope and product may also be used to describe aspects of the concept. This signature is a vector of weighted nouns and noun phrases characteristic of the concept as described in the documents.
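As a rough illustration of how such a signature could be built, the sketch below averages tf-idf vectors over the documents allocated to a concept. Averaging is an assumption made here, since the chapter states only that the weights come from tf-idf scores. The toy corpus, the function names `tf_idf_vectors` and `concept_signature`, and the `top_k` cutoff are likewise hypothetical, and the terms are assumed to be noun phrases that have already been extracted and stemmed.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one sparse {term: tf-idf weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

def concept_signature(vectors, concept_doc_ids, top_k=5):
    """Average the tf-idf vectors of a concept's documents (an assumed
    aggregation) and keep the top_k highest-weighted terms."""
    totals = {}
    for doc_id in concept_doc_ids:
        for term, weight in vectors[doc_id].items():
            totals[term] = totals.get(term, 0.0) + weight
    averaged = {t: w / len(concept_doc_ids) for t, w in totals.items()}
    return sorted(averaged.items(), key=lambda tw: tw[1], reverse=True)[:top_k]

# Hypothetical corpus: each document is a list of extracted noun phrases.
docs = [
    ["scope planning", "project scope", "product", "scope planning"],
    ["scope planning", "project", "project work"],
    ["risk", "project", "schedule"],
    ["budget", "project", "cost"],
]
# Documents 0 and 1 are assumed to be allocated to SCOPE PLANNING.
signature = concept_signature(tf_idf_vectors(docs), [0, 1])
for term, weight in signature:
    print(f"{term}: {weight:.4f}")
```

In this toy corpus, project receives the lowest weight because it occurs in most documents, which mirrors the ordering of the SCOPE PLANNING signature quoted above.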

Technically, the weights are calculated using tf-idf scores of stemmed terms in the documents. From the query vector $Q = (t_{q1}, \ldots, t_{qn})$, we compute the cosine similarity with all concepts $C_i$ of the ontology and build a new query vector, $Q' = (c_1, \ldots, c_K)$, that includes all concepts $C_i$ that had similarity scores above threshold α. Q′ is the semantic interpretation of Q, in which possible ambiguities of the query have already been resolved. Because similarity is computed against the whole query vector, the top-ranked concepts are semantically close to the full set of terms of Q and not only to individual terms. A syntactic query Q″ is finally generated by adding the concept signatures of all concepts included in Q′:

$$Q'' = C_1 + \cdots + C_K = (t_1, \ldots, t_m)$$

Figure 13.9 Generating the signature of SCOPE_PLANNING concept.

Figure13.10 demonstrates the approach with a small example. From the concept

signatures, we know that the bank term is related to the FINANCIAL_BANK

concept with a weight of 0.5 and to RIVER_BANK with a weight of 0.2. Since this

query contains no other query terms, the term will be mapped onto FINANCIAL_

BANK due to the higher weight. After the term is disambiguated in this manner

and mapped onto suitable concepts, the concepts are mapped back to query terms

to form a new expanded syntactic query. In the gure, the semantic query Q′ =

(FINANCIAL_BANK) is mapped to the weighted query Q″ = (bank:0.5 “bank-

ing company”:0.9 “credit union”:0.6).
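The three steps (query to concepts, threshold, concepts back to weighted terms) can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' implementation: the names `cosine` and `reformulate` and the threshold value 0.3 are hypothetical, and the RIVER_BANK terms river and shore are invented here so that the cosine measure has something to normalize against. Only the weights 0.5, 0.2, 0.9, and 0.6 come from the example above.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def reformulate(query, signatures, alpha):
    """Map Q to Q' (concepts above threshold alpha), then to Q'' (summed signatures)."""
    # Q -> Q': keep every concept whose signature is similar enough to the query.
    selected = [c for c, sig in signatures.items() if cosine(query, sig) > alpha]
    # Q' -> Q'': add the signatures of the selected concepts term by term.
    expanded = {}
    for concept in selected:
        for term, weight in signatures[concept].items():
            expanded[term] = expanded.get(term, 0.0) + weight
    return selected, expanded

signatures = {
    "FINANCIAL_BANK": {"bank": 0.5, "banking company": 0.9, "credit union": 0.6},
    # "river" and "shore" are hypothetical terms added for this sketch only.
    "RIVER_BANK": {"bank": 0.2, "river": 0.9, "shore": 0.7},
}
query = {"bank": 1.0}
concepts, expanded = reformulate(query, signatures, alpha=0.3)
print(concepts)
print(expanded)
```

With these assumed signatures, only FINANCIAL_BANK passes the threshold, so the expanded query is exactly its signature. A lower threshold would admit RIVER_BANK as well and add its terms to Q″.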

In this way, we both disambiguate the query and add other semantically related terms for better recall. The approach has the advantage that it can be used on top of any standard search engine that supports weighted search terms. Because it adds terms to the query, however, it tends to improve recall more than precision. Formica et al. (2008) have a similar approach, SenSim, that depends on a reference ontology with an ISA hierarchy weighted on the basis of a probability distribution. Burton-Jones et al. (2003) use hypernyms from WordNet and the DAML ontology library to expand queries. Similar strategies are used by Pinheiro et al. (2004), Revuri et al. (2006), and Rocha et al. (2004). The Inquirus2 metasearch engine expands queries with the users’ information need category (Glover et al., 2001).

13.5.5 Complex Constraint Queries

Some semantic systems allow users to formulate precise semantic queries using complex semantic operators and references to ontology concepts and instances. In the SemSearch system, for example, a user may specify to which RDFS or OWL class a result should belong (Lei et al., 2006). Another example is GRQL, which allows users to build graph pattern queries by navigating the ontology graphically (Athanasis et al., 2004).

Formal constraint queries may accurately represent user information needs, though they are complex and time-consuming to formulate. Even when graphical formalisms are used, special training is needed for users to compose such queries. Furthermore, modeling the user’s information needs so accurately only makes sense if the exact content of all documents can be extracted and modeled correspondingly at indexing time.

Figure 13.10 Correspondence between concepts and terms. [Diagram: the term bank is linked to the concepts FINANCIAL_BANK (0.5) and RIVER_BANK (0.2); FINANCIAL_BANK is also linked to the terms banking company (0.9) and credit union (0.6), with relationships to other concepts indicated.]

13.6 Results

In addition to ranked lists of documents, semantic search applications may display metadata of recognized concepts in queries and result sets or provide conceptual summaries of documents or whole result sets.

13.6.1 Metadata

Many semantic search systems divide the result page into two sections: one for the standard list of ranked documents and the other for a structured list of metadata. This requires interpretation of a query as referring to a particular entity or topic that is structurally described in the ontology. The metadata section can also be used to refine the query by changing the attributes of one or more of the metadata items presented.

Squirrel is an experimental search engine that uses metadata with drill-down facilities as an integral part of the result page (Duke et al., 2007). As seen from Figure 13.11, every document is displayed with relevant metadata, and important terms of the documents are annotated with explanatory text. A related strategy in which user-centered facets are used to cluster result documents is implemented by Suominen et al. (2007).

Figure 13.11 Retrieved documents presented with metadata and annotations. (Source: Duke, A., T. Glover, and J. Davies. In Proceedings of Fourth European Semantic Web Conference, Innsbruck, 2007, pp. 341–355.)

13.6.2 Conceptual Summarization

Conceptual summarization techniques are applied to extract ontological concepts, instances, and relationships characteristic of the documents and present them as structured semantic summaries. The summaries may cover a search as a whole or each individual document in the result set. For most systems, this means presenting a list of conceptual keywords or tags. This is a text mining technique that can be done on the fly on the result page. However, a list of three or four keywords does not necessarily give a satisfactory summary to a domain nonexpert.

SenseBot is an example of a commercial search engine that generates tag cloud summaries of the document result set. The top part of Figure 13.12 tells us that Eiffel Tower and Paris are the top two concepts describing the documents retrieved for the query eiffel tower.

Figure 13.12 Result page from SenseBot (www.sensebot.net).

If no semantics are used, the summaries typically consist of terms ranked on the basis of statistical scores like tf-idf. More advanced semantic approaches maintain ontologies in which each concept is given a vector representation, like the scope-planning concept signature in Figure 13.8 or the centroid representation used in the Rocchio text classification technique. Conceptual summarization is the process of categorizing text with respect to the concepts in the ontology using standard categorization techniques like Rocchio, k nearest neighbor, and naive Bayes (Manning et al., 2008). With the Rocchio approach, we build a centroid vector for each concept of the ontology. This centroid, $A_C$, is the vector average or center of mass of documents that were manually determined earlier to describe the concept:

$$A_C = \frac{1}{|D_C|} \sum_{d \in D_C} V(d)$$

where $D_C$ is the set of documents describing concept C, and V(d) is the normalized vector representation of document d in $D_C$. A document or document set is represented by a vector R. The conceptual summary of the document (set) is given by the ranked list of concepts C whose centroids have a cosine similarity with R above a certain threshold α:

$$\text{similarity}(R, A_C) = \cos(\theta) > \alpha$$

In Figure13.12, the document set retrieved for the Eiel Tower query shows the

highest cosine similarity with the Eiel Tower, Paris, city, construction, and restau-

rant ontology concepts.
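The Rocchio-style summarization described in this section can be sketched in a few lines. The code below is a minimal sketch under stated assumptions: the training vectors, the concept names, the result vector, and the threshold 0.3 are all invented for illustration, and the function names `normalize`, `centroid`, and `conceptual_summary` are hypothetical. It builds one centroid per concept from length-normalized document vectors and ranks the concepts whose centroid has a cosine similarity with R above α.

```python
import math

def normalize(vec):
    """Scale a sparse {term: weight} vector to unit length, giving V(d)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)

def centroid(doc_vectors):
    """Centroid of a concept: the average of V(d) over its documents."""
    total = {}
    for vec in doc_vectors:
        for term, weight in normalize(vec).items():
            total[term] = total.get(term, 0.0) + weight
    return {t: w / len(doc_vectors) for t, w in total.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def conceptual_summary(r, centroids, alpha):
    """Ranked (concept, similarity) pairs with similarity above alpha."""
    scored = [(c, cosine(r, a)) for c, a in centroids.items()]
    kept = [(c, s) for c, s in scored if s > alpha]
    return sorted(kept, key=lambda cs: cs[1], reverse=True)

# Hypothetical training documents, manually assigned to concepts.
training = {
    "EIFFEL_TOWER": [{"eiffel": 3, "tower": 2, "paris": 1},
                     {"eiffel": 2, "tower": 2, "iron": 1}],
    "PARIS": [{"paris": 3, "city": 2, "seine": 1},
              {"paris": 2, "france": 2, "eiffel": 1}],
    "RESTAURANT": [{"menu": 2, "restaurant": 3},
                   {"restaurant": 2, "chef": 1}],
}
centroids = {c: centroid(docs) for c, docs in training.items()}

# R: vector of a hypothetical result set for the query "eiffel tower".
result_vector = {"eiffel": 2, "tower": 2, "paris": 1}
summary = conceptual_summary(result_vector, centroids, alpha=0.3)
print([c for c, _ in summary])
```

Raising α shortens the summary to the most strongly matching concepts, while lowering it admits weaker matches, so the threshold controls how many tags a result page would show.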

13.7 Navigation

Several search applications add navigational features to result pages for easy selection of the most interesting parts of a result set.

13.7.1 Hierarchical Refinement

A search application may suggest generalizing or specializing a query by providing concept names that are hierarchically related to the concepts identified in the query. The strategy is basically similar to automatic query specialization except that the user is left to choose. The functionality is supported by several systems (Duke et al.,
