but they do not help assess the correctness or robustness of the codes. Beyond such
metrics, we wish to define functional quality criteria for search ontologies. Gulla et
al. [8] define the following desirable properties in a search ontology:
Concept familiarity: the terminology introduced by the ontology is strongly
connected to user terms in search queries.
Document discrimination: the concept granularity in the ontology is compatible
with the granularity of users' queries. This granularity compatibility
allows good grouping of search results according to the ontology concept
hierarchy.
Query formulation: the depth of the hierarchy in the ontology and the com-
plexity and length of user queries should be compatible.
Domain volatility: the ontology should be robust in the presence of frequent
updates.
is classication of functional quality criteria is conceptually useful, but does not
provide a methodology or concrete tools to evaluate a given ontology. is is the
task addressed in this chapter, specically for exploratory search. e evaluation
method we introduce relies on the fact that given an ontology individual (in our
domain, a movie), we can automatically retrieve large quantities of textual docu-
ments (movie reviews) associated to the individuals. On the basis of this automati-
cally acquired textual corpus, we can perform automatic linguistic analysis that
determines whether the ontology reects the information we mine in the texts.
Note that we focus on evaluating the ontology and its adequacy to the domain
as a search ontology. We do not simulate the search process or measure specifically
how the ontology affects individual steps of the search operation (such as indexing,
query expansion, and result-set clustering). Accordingly, the evaluation we suggest,
although informed by the task (we specifically evaluate a search ontology), is not a
task-based evaluation (we do not evaluate the ontology on a search benchmark).
10.4 Experimental Setting: Ontology for Semantic Search in the Entertainment Domain and Test Corpus
We illustrate our ontology evaluation method in the context of the entertainment
domain. We first describe the experimental setup. Our objective is to support
exploratory search over a set of documents describing movies, actors, and related
information in the domain. The ontology we evaluate is automatically acquired
from semi-structured data sources (IMDb, Wikipedia, and other similar sources).
Table 10.1 shows the size of the ontology we used. As is appropriate for a search
ontology, the ontology is wide and shallow.
e rst step of our ontology evaluation method was compiling from the
domain a corpus of texts distinct from the documents used for acquisition of the
ontology. We then used standard natural language processing (NLP) techniques
to evaluate the ontology by testing various hypotheses on the collected corpus and
report on three experiments:
Measuring coverage and term alignment: we attempted to test the adequacy of
the ontology with respect to concept familiarity (cf. Section 10.3). This coverage
experiment is discussed in Section 10.5.
Measuring classification fitness on movie genres: we attempted to test the potential
of the ontology to properly organize movies into genres. Genre (comedy,
drama, etc.) is a critical metadata attribute of movies. This classification
experiment is described in Section 10.6.
Measuring topic identification fitness: we assessed the capacity of the ontology
to capture the notion of movie topics, which describe what a movie is about;
topics are distinct from genres and are most often described by keywords.
This topic experiment is described in Section 10.7.
Each experiment exploits a different NLP technique. For the coverage and term
alignment experiment, we used fuzzy string matching techniques and named entity
recognition (NER). For the classification experiment, we used text classification,
and for the topic experiment, LDA topic modeling [18]. We used the same test
corpus for all experiments and constructed it such that documents were aligned
to ontology individuals. We constructed the corpus automatically by mining
movie reviews from the Web. We collected professional, edited reviews taken from
Roger Ebert's Web site,* additional professional and user reviews published on
the Metacritic Web site,† and 13 similar Web sources. The key metadata collected
for each document is a unique identifier indicating the movie to which the text is
associated. The corpus we constructed contained 11,706 reviews (of 3,146 movies)
and 8.7 million words (an average of 749 words per review).
* http://rogerebert.suntimes.com
† http://www.metacritic.com
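As an illustration of this document-to-individual alignment, the following minimal Python sketch (ours, not the authors' code) shows one way a review record and the corpus-level figures above could be represented and recomputed; all class and field names are hypothetical.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ReviewDocument:
    movie_id: str   # unique identifier of the ontology individual (the movie)
    source: str     # e.g., "rogerebert" or "metacritic" (illustrative values)
    text: str       # full text of the review

def corpus_statistics(corpus: List[ReviewDocument]) -> Dict[str, float]:
    """Recompute the corpus-level figures reported above (counts and averages)."""
    n_reviews = len(corpus)
    n_movies = len({doc.movie_id for doc in corpus})
    n_words = sum(len(doc.text.split()) for doc in corpus)
    return {
        "reviews": n_reviews,
        "movies": n_movies,
        "words": n_words,
        "avg_words_per_review": n_words / n_reviews if n_reviews else 0.0,
    }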
Table10.1 Size of Movies Ontology
Classes 33
Class individuals 351,066
Relations 27
Movies 8,446
Persons 116,770
10.5 Evaluating Ontology Coverage of Query Terms
10.5.1 Objective
Consider a fact-nding search scenario. A user seeks precise results and knows what
results should be attained. e main services expected from the search ontology to
support this scenario are: (1) production of highly precise results and wide coverage
for terms used in the queries; (2) providing entity recognition functionality to allow
fuzzy string matching and identifying terminological variations; and (3) identifying
anchors, i.e., minimal facts that identify a movie (for example, title, publication year,
main actors, main keywords). For a populated ontology, we want to assure that the
individuals in the ontology match the entities in the actual domain (the ontology
describes the correct set of individuals). We also want to verify that the way the ontol-
ogy individuals are retrieved corresponds to the terminology used in actual search
queries (the ontology uses the correct terms to refer to individuals). Our experiment
assesses these two dimensions by testing hypotheses on the test corpus.
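As a purely illustrative aside, the notion of an anchor mentioned in point (3) could be represented by a record such as the following sketch; the field names and the example values are our own and are not part of the ontology described in this chapter.

from dataclasses import dataclass, field
from typing import List

@dataclass
class MovieAnchor:
    """Minimal facts that identify a movie (cf. point 3 above)."""
    title: str
    year: int                                  # publication year
    main_actors: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)

# Hypothetical example of an anchor for a well-known movie.
anchor = MovieAnchor("Fargo", 1996,
                     main_actors=["Frances McDormand", "William H. Macy"],
                     keywords=["crime", "kidnapping"])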
10.5.2 Hypothesis
We wanted to test whether the terminology used in the ontology to refer to entities
(movies, actors, directors) corresponded to the terminology used in queries. We had
no access to a search query log or to manually annotated data that would match terms
in a query log to ontology individuals. Instead, we used our test corpus as a proxy:
we considered that naturally occurring text about movies is similar (as far as references
to movies and actors are concerned) to the queries users would submit.
The task we addressed was unsupervised; the corpus was not annotated manually
(an expensive operation). We wanted to assess the extent to which a named
entity in text could be mapped to a term in the ontology. The hypothesis we formulated
is that if an ontology has good term coverage, named entities found in text
will be found in the ontology (high coverage) and the mapping from a named entity
found in text to an individual in the ontology will be accurate (no ambiguity).
10.5.3 Assessing Ontology Coverage
To assess the ontology coverage, we measured the overlap between named entities
that appeared in the corpus and the terms that appeared in the ontology. We first
gathered a collection of potential named-entity labels in the corpus. In professional
reviews, named entities are generally marked in the HTML source. User reviews are
neither edited nor formatted; for such reviews, we relied on the OpenCalais* named
entity recognizer (NER) to tag named entities in the corpus. This system recognizes
multiword expressions that refer to proper names of people, organizations, and
products. We then extracted all person names from the textual corpus and searched
for the labels of each entity in the ontology.

* http://www.opencalais.com/
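Schematically, the overlap measurement amounts to the following simplified sketch (our illustration rather than the actual pipeline); the inputs stand for the person names tagged by the NER and the person labels stored in the ontology, and the normalization is deliberately minimal.

from typing import Iterable, Set

def normalize(label: str) -> str:
    """Very light normalization; the real pipeline handles more variation."""
    return " ".join(label.lower().split())

def ontology_coverage(corpus_person_names: Iterable[str],
                      ontology_person_labels: Iterable[str]) -> float:
    """Fraction of distinct corpus names that also appear as ontology terms."""
    ontology: Set[str] = {normalize(l) for l in ontology_person_labels}
    names: Set[str] = {normalize(n) for n in corpus_person_names}
    if not names:
        return 0.0
    return sum(1 for n in names if n in ontology) / len(names)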
Results show that 74% of the named entities that appear in professional reviews
also appear as terms in our ontology. For user reviews (unedited), the figure is 50%.
The main reasons for mismatches lie in orthography variations (accents or transliteration
differences), mentions of people not related to the movie under review, and
aliasing or spelling variations (mostly in user reviews). We conclude that the coverage
of people entities in the ontology is satisfactory. However, whether a search for
these entities will find them, or will find the intended individuals in the ontology, is
not certain. This fuzziness is caused by term variation (observed especially in user
reviews) and term ambiguity.
10.5.4 Assessing Terminological Precision
To investigate terminological variation, we measured the ambiguity levels of named-entity
labels. By ambiguity, we refer to the possibility that a single name refers to
more than one ontology individual. Variation relates to the opposite case: one
ontology individual can be described by various terms in text.
We measured the level of terminological variation for each ontology individual,
i.e., given a single ontology individual (e.g., an actor), how many variations of the
name are found in the corpus? Bilenko and Mooney [19] used a similar method
in a different setting. To identify variations in the text, we used the StringMetrics
similarity matching library.* We experimented with the Levenshtein, Jaro-Winkler,
and q-gram similarity measures. For example, using such similarity measures, we
could match Bill Jackson (a name often used in blogs to refer informally to an
actor) with William Jackson (the name under which the actor is described in the
ontology). Such flexibility in aligning query terms with ontology terms increases
search system recall, but it also introduces a risk of precision loss when two distinct
individuals in the ontology can be named by the same term. For example,
if an ontology contains an actor named Bill Johnson and another named William
Johnson, fuzzy string matching could confuse the two actors.
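To make the role of such similarity measures concrete, here is a small sketch of q-gram similarity, one of the measures named above (the chapter itself used the Java StringMetrics library; this Python version is only an approximation for illustration).

def qgrams(s: str, q: int = 3) -> set:
    """Character q-grams of a padded, lower-cased string."""
    padded = "#" * (q - 1) + s.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}

def qgram_similarity(a: str, b: str, q: int = 3) -> float:
    """Jaccard overlap of the q-gram sets; 1.0 means identical strings."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    return len(ga & gb) / len(ga | gb) if ga | gb else 1.0

# A name variant found in text vs. the label used in the ontology.
print(qgram_similarity("Bill Jackson", "William Jackson"))   # high: helps recall
# The precision risk: two distinct individuals with similar names.
print(qgram_similarity("Bill Johnson", "William Johnson"))   # also high: ambiguity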
To measure the practical impact of the name variability and ambiguity factors,
we extracted information from the corpus of movie reviews we collected. We first
developed an NER specialized to the movies domain. (The OpenCalais NER we
used above properly tags person names, but cannot distinguish actors from directors
or identify movie names.) We manually tagged a corpus of 200 movie reviews from
the Ebert corpus to indicate the occurrences of movie names and actor names.
We then applied the YAMCHA† package to train an automatic NER system on
our corpus. YAMCHA uses a support vector machine (SVM) classifier to recognize
named entities in text based on features describing each word. We used two
different sets of features to train the system: (1) whether words started with capital
letters (a strong indication of a proper name) and (2) whether names occurred in a
gazette (a manually compiled list of proper names). We also used contextual features
(properties of the words around the word to be classified).

* http://www.dcs.shef.ac.uk/sam/stringmetrics.html
† http://chasen.org/~taku/software/yamcha/
The main challenges a statistically trained NER system addresses are (1) identifying
a sequence of several words as a single named expression (for example, Bill
Gates 3rd) and (2) recognizing, through generalization, that words that do not
appear in a predefined gazette of names are similar in their distribution to known
names, so that terms not previously observed as proper names are properly recognized
as such. An NER system must also use contextual information to avoid tagging a
word that does appear in the gazette but is not used as a proper name (for example,
Bill paid the bill). Finally, an NER system must distinguish between proper names
referring to actors and those referring to movies.
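As a rough sketch of the word-level features just described (capitalization, gazette membership, and a small context window), the following code shows how such features could be computed before being handed to a YAMCHA-style SVM tagger; the feature names, window size, and toy gazette are our assumptions rather than the chapter's actual configuration.

from typing import Dict, List, Set

def word_features(tokens: List[str], i: int, gazette: Set[str],
                  window: int = 2) -> Dict[str, str]:
    """Features for the i-th token: surface form, capitalization, gazette, context."""
    feats = {
        "word": tokens[i].lower(),
        "is_capitalized": str(tokens[i][:1].isupper()),
        "in_gazette": str(tokens[i].lower() in gazette),
    }
    # Contextual features: neighboring words within the window.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        feats[f"word[{offset:+d}]"] = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
    return feats

tokens = "Bill paid the bill".split()
gazette = {"bill", "william"}
print(word_features(tokens, 0, gazette))  # "Bill": capitalized and in the gazette
print(word_features(tokens, 3, gazette))  # "bill": in the gazette but not capitalized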
Table10.2 summarizes the results of the trained NER system. e system on
average is capable of properly identifying 92% of the movies named in the reviews
and 94% of the persons.
Let us now consider a given text as a search query. If we apply our NER system
to the text of the query, we will properly tag named movies and actors. The issue we
now address is how successfully we can align a named entity in the query
with the corresponding individual in the ontology. For a version of the ontology
including 117,556 individuals referring to persons, and taking into account surnames
only, we found that 83% of the names may refer to more than one instance in the
ontology. We also found that, on average, over 18 name variations per ontology
instance actually occurred in the corpus.
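The two figures above can be reproduced schematically as follows (our sketch, not the original measurement code); persons and mentions are hypothetical inputs standing for the ontology's person labels and the name variants actually found in the corpus.

from collections import defaultdict
from typing import Dict, Set

def surname_ambiguity(persons: Dict[str, str]) -> float:
    """Fraction of surnames shared by more than one ontology individual.
    `persons` maps an ontology person id to its canonical label."""
    by_surname = defaultdict(set)
    for pid, label in persons.items():
        parts = label.split()
        if parts:
            by_surname[parts[-1].lower()].add(pid)
    if not by_surname:
        return 0.0
    return sum(1 for pids in by_surname.values() if len(pids) > 1) / len(by_surname)

def avg_name_variations(mentions: Dict[str, Set[str]]) -> float:
    """Average number of distinct corpus name variants per ontology individual.
    `mentions` maps a person id to the set of variants observed in the corpus."""
    if not mentions:
        return 0.0
    return sum(len(variants) for variants in mentions.values()) / len(mentions)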
10.5.5 Conclusion: Ontology Terminological Coverage
In conclusion, while the coverage of the ontology originally looked promising,
we found that based on name variability and ambiguity, aligning a query with
Table10.2 Performance of Named Entity Recognizer on Movies
Domain
Precision Recall F
Movie exact match 91.56 92.43 91.87
Movie boundary match 91.80 92.66 92.10
Person exact match 97.68 93.76 95.67
Person boundary match 97.84 93.92 95.83
Average accuracy 99.32
Average boundary accuracy 99.34