362 ◾  Jon Atle Gulla et al.
13.5.4 Syntactic Query Reformulation
With syntactic query reformulation, we map a syntactic query onto semantic
concepts, interpret the query semantically, and map the result back to a
syntactic query of weighted terms (Solskinnsbakk and Gulla, 2008). Unlike
semantic query reformulation, this approach uses a standard inverted index based
on term–document matrices and requires that ontologies be trained on a
representative set of documents in the document collection. For every concept in
the ontology, we allocate a set of relevant documents from this set and generate
a concept signature for concept $i$, $C_i = (t_{i1}, \ldots, t_{in})$, as shown
in Figure 13.9. The resulting concept signature for SCOPE PLANNING,
$C_{SP}$ = (scope planning:0.0097, project scope:0.0047, product:0.0043,
project work:0.0008, project:0.0001), tells us that the scope planning term is
the most relevant reference to the concept, but terms like project scope and
product may also be used to describe aspects of the concept. This signature is a
vector of weighted nouns and noun phrases characteristic of the concepts as
described in the documents.
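As a minimal sketch of how such a signature might be computed, the Python code below averages the tf-idf vectors of the documents allocated to a concept and keeps the highest-weighted terms. It rests on several assumptions that are not taken from the chapter: documents are already tokenized and non-empty, stemming and noun-phrase extraction are skipped, tf-idf is the plain tf × log(N/df) variant, and averaging is only one plausible way of combining document vectors. The names `tfidf_vectors` and `concept_signature` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenized, non-empty document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (tf[t] / len(doc)) * math.log(n_docs / df[t]) for t in tf}
        )
    return vectors


def concept_signature(docs, concept_doc_ids, top_n=5):
    """Average the tf-idf vectors of the documents allocated to a concept
    (an assumed aggregation) and keep the top_n terms with positive weight."""
    vectors = tfidf_vectors(docs)
    total = Counter()
    for i in concept_doc_ids:
        for term, weight in vectors[i].items():
            total[term] += weight
    k = len(concept_doc_ids)
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(term, weight / k) for term, weight in ranked[:top_n] if weight > 0]


# Hypothetical toy corpus; documents 0 and 1 are allocated to the concept.
docs = [
    ["scope", "planning", "project", "scope"],
    ["scope", "planning", "product"],
    ["river", "bank", "water"],
    ["project", "budget", "cost"],
]
signature = concept_signature(docs, concept_doc_ids=[0, 1])
for term, weight in signature:
    print(f"{term}:{weight:.4f}")
```

Terms that occur in every document receive a tf-idf weight of zero and are dropped from the signature, which is why very general vocabulary does not dominate the result.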
Technically, the weights are calculated using tf-idf scores of stemmed terms in
the documents. From the query vector $Q = (t_{q1}, \ldots, t_{qn})$, we compute
the cosine similarity with all concepts $C_i$ of the ontology and build a new
query vector, $Q_s = (c_1, \ldots, c_K)$, that includes all concepts $C_i$ that
had similarity scores above threshold $\alpha$. $Q_s$ is the semantic
interpretation of $Q$ and has already dealt with possible ambiguities of the
query. Because similarity is computed against the whole query vector, the
top-ranked concepts are semantically close to the whole set of terms of $Q$ and
not only to individual terms. A syntactic query $Q'$ is finally generated by
adding the concept signatures of all concepts included in $Q_s$:

$$Q' = C_1 + \ldots + C_K = (t_{Q1}, \ldots, t_{Qn}).$$

Figure 13.9 Generating the signature of SCOPE_PLANNING concept.
Semantics and Search ◾  363
Figure13.10 demonstrates the approach with a small example. From the concept
signatures, we know that the bank term is related to the FINANCIAL_BANK
concept with a weight of 0.5 and to RIVER_BANK with a weight of 0.2. Since this
query contains no other query terms, the term will be mapped onto FINANCIAL_
BANK due to the higher weight. After the term is disambiguated in this manner
and mapped onto suitable concepts, the concepts are mapped back to query terms
to form a new expanded syntactic query. In the gure, the semantic query Q =
(FINANCIAL_BANK) is mapped to the weighted query Q = (bank:0.5 bank-
ing company”:0.9 “credit union:0.6).
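The reformulation steps can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the implementation of Solskinnsbakk and Gulla: vectors are sparse term-to-weight dictionaries, the FINANCIAL_BANK signature uses the weights quoted above, and the additional RIVER_BANK terms (river, shore), the threshold value, and the function names `cosine` and `reformulate` are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def reformulate(query, signatures, alpha):
    """Map a term query to concepts, then back to a weighted term query.

    Returns the concepts whose similarity with the query exceeds alpha
    (best first) and the sum of their concept signatures.
    """
    scores = {c: cosine(query, sig) for c, sig in signatures.items()}
    selected = sorted(
        (c for c, s in scores.items() if s > alpha),
        key=lambda c: -scores[c],
    )
    expanded = Counter()
    for c in selected:
        expanded.update(signatures[c])  # adds weights term by term
    return selected, dict(expanded)


signatures = {
    "FINANCIAL_BANK": {"bank": 0.5, "banking company": 0.9, "credit union": 0.6},
    # The terms besides "bank" are made up for the example.
    "RIVER_BANK": {"bank": 0.2, "river": 0.8, "shore": 0.7},
}
concepts, expanded_query = reformulate({"bank": 1.0}, signatures, alpha=0.3)
print(concepts)
print(expanded_query)
```

With this toy data only FINANCIAL_BANK passes the threshold, so the expanded query is exactly its concept signature. A lower threshold would also admit RIVER_BANK and add its terms, which shows how the choice of α trades precision against recall.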
In this way, we manage to both disambiguate the query and add other
semantically related terms for better recall. The approach has the advantage
that it can be used on top of a standard search engine that supports weighted
search terms. Because it tends to add more terms, however, it tends to improve
recall more than precision. Formica et al. (2008) have a similar approach,
SenSim, that depends on a reference ontology with an ISA hierarchy weighted on
the basis of a probability distribution. Burton-Jones et al. (2003) use
hypernyms from WordNet and the DAML ontology library to expand queries. Similar
strategies are used by Pinheiro et al. (2004), Revuri et al. (2006) and Rocha
et al. (2004). The Inquirus2 metasearch engine expands queries with the users'
information need category (Glover et al., 2001).
13.5.5 Complex Constraint Queries
Some semantic systems allow users to formulate precise semantic queries using
complex semantic operators and references to ontology concepts and instances. In
the SemSearch system, for example, a user may specify to which RDFS or OWL
class a result should belong (Lei et al., 2006). Another example is GRQL, which
allows users to build graph pattern queries by navigating the ontology graphically
(Athanasis et al., 2004).
Formal constraint queries may accurately represent user information needs,
though they are complex and time-consuming to formulate. Even when graphical
formalisms are used, special training is needed for users to compose such
queries. Furthermore, modeling the user's information needs so accurately only
makes sense if the exact content of all documents can be extracted and modeled
correspondingly at indexing time.

[Figure: the term "bank" is linked to the concepts FINANCIAL_BANK (0.5) and
RIVER_BANK (0.2); FINANCIAL_BANK is also linked to the terms "banking company"
(0.9) and "credit union" (0.6); both concepts have relationships to other
concepts.]
Figure 13.10 Correspondence between concepts and terms.
13.6 Results
In addition to ranked lists of documents, semantic search applications may display
metadata of recognized concepts in queries and result sets or provide conceptual
summaries of documents or whole result sets.
13.6.1 Metadata
Many semantic search systems divide the result page into two sections: one for the
standard list of ranked documents and the other for a structured list of metadata.
This requires interpretation of a query as referring to a particular entity or
topic that is structurally described in the ontology. The metadata section can
also be used to refine the query by changing the attributes of one or more of
the metadata presented.
Squirrel is an experimental search engine that uses metadata with drill-down
facilities as an integral part of the result page (Duke et al., 2007). As seen
from Figure 13.11, every document is displayed with relevant metadata, and
important terms of the documents are annotated with explanatory text. A related
strategy in which user-centered facets are used to cluster result documents is
implemented in Suominen et al. (2007).

Figure 13.11 Retrieved documents presented with metadata and annotations.
(Source: Duke, A., T. Glover, and J. Davies. In Proceedings of Fourth European
Semantic Web Conference, Innsbruck, 2007, pp. 341–355.)
13.6.2 Conceptual Summarization
Conceptual summarization techniques are applied to extract ontological
concepts, instances, and relationships characteristic of the documents and
present them as structured semantic summaries. The summaries may cover a search
as a whole or each individual document in the result set. For most systems, this
means presenting a list of conceptual keywords or tags. This is a text mining
technique that can be done on the fly on the result page. However, a list of
three or four keywords does not necessarily give a satisfactory summary to
someone who is not a domain expert.
SenseBot is an example of a commercial search engine that generates tag cloud
summaries of the document result set. The top part of Figure 13.12 tells us that
Eiffel Tower and Paris are the top two concepts describing the documents
retrieved for the query eiffel tower.

Figure 13.12 Result page from SenseBot (www.sensebot.net).
If no semantics are used, the summaries typically consist of the terms ranked
on the basis of statistical scores like tf-idf. More advanced semantic approaches
maintain ontologies in which each concept is given a vector representation, like
the scope-planning concept signature in Figure 13.8 or the centroid
representation used in the Rocchio text classification technique. Conceptual
summarization is the process of categorizing text with respect to the concepts
in the ontology using standard categorization techniques like Rocchio, k nearest
neighbor, and naive Bayes (Manning et al., 2008). With the Rocchio approach, we
build a centroid vector for each concept of the ontology. This centroid, $A_C$,
is the vector average or center of mass of documents that were manually
determined earlier to describe the concept:
$$A_C = \frac{1}{|D_C|} \sum_{d \in D_C} V(d)$$
where $D_C$ is the set of documents describing concept $C$, and $V(d)$ is the
normalized vector representation of document $d$ in $D_C$. A document or
document set is represented by a vector $R$. The conceptual summary of the
document (set) is given by the ranked list of concepts $C_i$ whose centroids
have a cosine similarity with $R$ above a certain threshold $\alpha$:
$$\mathrm{similarity}(R, A_{C_i}) = \cos(\theta) > \alpha$$
In Figure13.12, the document set retrieved for the Eiel Tower query shows the
highest cosine similarity with the Eiel Tower, Paris, city, construction, and restau-
rant ontology concepts.
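The centroid and threshold formulas above can be illustrated with a short Python sketch. It is not the code behind SenseBot or any other system mentioned here. The three-term vocabulary (tower, paris, river), the training vectors, the concept names, the threshold value, and the function names are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def normalize(v):
    """Scale a dense vector to unit length, giving V(d) in the text."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def centroid(doc_vectors):
    """A_C = (1 / |D_C|) * sum of the normalized document vectors in D_C."""
    normalized = [normalize(v) for v in doc_vectors]
    n = len(normalized)
    return [sum(column) / n for column in zip(*normalized)]


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def conceptual_summary(r, centroids, alpha):
    """Rank the concepts whose centroid has cosine similarity with r above alpha."""
    scored = [(concept, cosine(r, a)) for concept, a in centroids.items()]
    return sorted(
        [(c, s) for c, s in scored if s > alpha],
        key=lambda cs: -cs[1],
    )


# Hypothetical training documents per concept; dimensions: tower, paris, river.
training = {
    "EIFFEL_TOWER": [[3, 1, 0], [2, 1, 0]],
    "PARIS": [[1, 3, 0], [1, 2, 1]],
    "RIVER": [[0, 0, 2], [0, 1, 3]],
}
centroids = {concept: centroid(docs) for concept, docs in training.items()}

result_vector = [2, 2, 0]  # R: the document (set) to be summarized
for concept, score in conceptual_summary(result_vector, centroids, alpha=0.5):
    print(f"{concept}: {score:.3f}")
```

In this toy setting the summary lists EIFFEL_TOWER ahead of PARIS and omits RIVER, whose centroid falls well below the threshold.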
13.7 Navigation
Several search applications add navigational features to result pages for easy selec-
tion of the most interesting parts of a result set.
13.7.1 Hierarchical Refinement
A search application may suggest generalizing or specializing a query by
providing concept names that are hierarchically related to the concepts
identified in the query. The strategy is basically similar to automatic query
specialization except that the user is left to choose. The functionality is
supported by several systems (Duke et al.,