but they do not help assess the correctness or robustness of the codes. Beyond such
metrics, we wish to define functional quality criteria for search ontologies. Gulla et
al. [8] define the following desirable properties in a search ontology:
Concept familiarity: the terminology introduced by the ontology is strongly
connected to user terms in search queries.
Document discrimination: the concept granularity in the ontology is compatible
with the granularity of users' queries. This granularity compatibility
allows good grouping of search results according to the ontology concept
hierarchy.
Query formulation: the depth of the hierarchy in the ontology and the com-
plexity and length of user queries should be compatible.
Domain volatility: the ontology should be robust in the presence of frequent
updates.
is classication of functional quality criteria is conceptually useful, but does not
provide a methodology or concrete tools to evaluate a given ontology. is is the
task addressed in this chapter, specically for exploratory search. e evaluation
method we introduce relies on the fact that given an ontology individual (in our
domain, a movie), we can automatically retrieve large quantities of textual docu-
ments (movie reviews) associated to the individuals. On the basis of this automati-
cally acquired textual corpus, we can perform automatic linguistic analysis that
determines whether the ontology reects the information we mine in the texts.
Note that we focus on evaluating the ontology and its adequacy to the domain
as a search ontology. We do not simulate the search process or measure specifically
how the ontology affects individual steps of the search operation (such as indexing,
query expansion, and result-set clustering). Accordingly, the evaluation we suggest,
although informed by the task (we specifically evaluate a search ontology), is not a
task-based evaluation (we do not evaluate the ontology on a search benchmark).
10.4 Experimental Setting: Ontology for Semantic Search in the Entertainment Domain and Test Corpus
We illustrate our ontology evaluation method in the context of the entertainment
domain. We first describe the experimental setup. Our objective is to support
exploratory search over a set of documents describing movies, actors, and related
information in the domain. The ontology we evaluate is automatically acquired
from semi-structured data sources (IMDb, Wikipedia, and other similar sources).
Table 10.1 shows the size of the ontology we used. As is appropriate for a search
ontology, the ontology is wide and shallow.
e rst step of our ontology evaluation method was compiling from the
domain a corpus of texts distinct from the documents used for acquisition of the
ontology. We then used standard natural language processing (NLP) techniques
to evaluate the ontology by testing various hypotheses on the collected corpus and
report on three experiments:
Measuring coverage and term alignment: we attempted to test the adequacy of
the ontology with respect to concept familiarity (cf. Section 10.3). This coverage
experiment is discussed in Section 10.5.
Measuring classification fitness on movie genres: we attempted to test the potential
of the ontology to properly organize movies into genres. Genre (comedy,
drama, etc.) is a critical metadata attribute of movies. This classification
experiment is described in Section 10.6.
Measuring topic identification fitness: we assessed the capacity of the ontology
to capture the notion of movie topics, which describe what a movie is about;
topics are distinct from genres and are most often described by keywords.
This topic experiment is described in Section 10.7.
Each experiment exploits a different NLP technique. For the coverage and term
alignment experiment, we used fuzzy string matching techniques and named entity
recognition (NER). For the classification experiment, we used text classification,
and for the topic experiment, LDA topic modeling [18]. We used the same test
corpus for all experiments and constructed it such that documents were aligned
to ontology individuals. We constructed the corpus automatically by mining
movie reviews from the Web. We collected professional, edited reviews taken from
Roger Ebert's Web site,* additional professional and user reviews published on
the Metacritic Web site,† and 13 similar Web sources. The key metadata collected
for each document is a unique identifier indicating the movie to which the text is
associated. The corpus we constructed contained 11,706 reviews (of 3,146 movies)
and 8.7 million words (an average of 749 words per review).
* http://rogerebert.suntimes.com
† http://www.metacritic.com
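As an illustration of this document-to-individual alignment, the following minimal Python sketch (ours, not the authors' code) shows one way a review record and the corpus-level figures above could be represented and recomputed; all class and field names are hypothetical.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ReviewDocument:
    movie_id: str   # unique identifier of the ontology individual (the movie)
    source: str     # e.g., "rogerebert" or "metacritic" (illustrative values)
    text: str       # full text of the review

def corpus_statistics(corpus: List[ReviewDocument]) -> Dict[str, float]:
    """Recompute the corpus-level figures reported above (counts and averages)."""
    n_reviews = len(corpus)
    n_movies = len({doc.movie_id for doc in corpus})
    n_words = sum(len(doc.text.split()) for doc in corpus)
    return {
        "reviews": n_reviews,
        "movies": n_movies,
        "words": n_words,
        "avg_words_per_review": n_words / n_reviews if n_reviews else 0.0,
    }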
Table10.1 Size of Movies Ontology
Classes 33
Class individuals 351,066
Relations 27
Movies 8,446
Persons 116,770
10.5 Evaluating Ontology Coverage of Query Terms
10.5.1 Objective
Consider a fact-nding search scenario. A user seeks precise results and knows what
results should be attained. e main services expected from the search ontology to
support this scenario are: (1) production of highly precise results and wide coverage
for terms used in the queries; (2) providing entity recognition functionality to allow
fuzzy string matching and identifying terminological variations; and (3) identifying
anchors, i.e., minimal facts that identify a movie (for example, title, publication year,
main actors, main keywords). For a populated ontology, we want to assure that the
individuals in the ontology match the entities in the actual domain (the ontology
describes the correct set of individuals). We also want to verify that the way the ontol-
ogy individuals are retrieved corresponds to the terminology used in actual search
queries (the ontology uses the correct terms to refer to individuals). Our experiment
assesses these two dimensions by testing hypotheses on the test corpus.
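As a purely illustrative aside, the notion of an anchor mentioned in point (3) could be represented by a record such as the following sketch; the field names and the example values are our own and are not part of the ontology described in this chapter.

from dataclasses import dataclass, field
from typing import List

@dataclass
class MovieAnchor:
    """Minimal facts that identify a movie (cf. point 3 above)."""
    title: str
    year: int                                  # publication year
    main_actors: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)

# Hypothetical example of an anchor for a well-known movie.
anchor = MovieAnchor("Fargo", 1996,
                     main_actors=["Frances McDormand", "William H. Macy"],
                     keywords=["crime", "kidnapping"])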
10.5.2 Hypothesis
We wanted to test whether the terminology used in the ontology to refer to entities
(movies, actors, directors) corresponded to the terminology used in queries. We had
no access to a search query log or to manually annotated data that would match terms
in a query log to ontology individuals. Instead, we used our test corpus as a proxy:
we considered that naturally occurring text about movies is similar (as far as references
to movies and actors are concerned) to the queries users would submit.
The task we addressed was unsupervised; the corpus was not annotated manually
(an expensive operation). We wanted to assess the extent to which a named
entity in text could be mapped to a term in the ontology. The hypothesis we formulated
is that if an ontology has good term coverage, named entities found in text
will be found in the ontology (high coverage) and the mapping from a named entity
found in text to an individual in the ontology will be accurate (no ambiguity).
10.5.3 Assessing Ontology Coverage
To assess the ontology coverage, we measured the overlap between named entities
that appeared in the corpus and the terms that appeared in the ontology. We first
gathered a collection of potential named-entity labels in the corpus. In professional
reviews, named entities are generally marked in the HTML source. User reviews are
neither edited nor formatted; for such reviews, we relied on the OpenCalais* named
entity recognizer (NER) to tag named entities in the corpus. This system recognizes
multiword expressions that refer to proper names of people, organizations, and
products. We then extracted all person names from the textual corpus and searched
for the labels of each entity in the ontology.

* http://www.opencalais.com/
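Schematically, the overlap measurement amounts to the following simplified sketch (our illustration rather than the actual pipeline); the inputs stand for the person names tagged by the NER and the person labels stored in the ontology, and the normalization is deliberately minimal.

from typing import Iterable, Set

def normalize(label: str) -> str:
    """Very light normalization; the real pipeline handles more variation."""
    return " ".join(label.lower().split())

def ontology_coverage(corpus_person_names: Iterable[str],
                      ontology_person_labels: Iterable[str]) -> float:
    """Fraction of distinct corpus names that also appear as ontology terms."""
    ontology: Set[str] = {normalize(l) for l in ontology_person_labels}
    names: Set[str] = {normalize(n) for n in corpus_person_names}
    if not names:
        return 0.0
    return sum(1 for n in names if n in ontology) / len(names)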
Results show that 74% of the named entities that appear in professional reviews
also appear as terms in our ontology. For user reviews (unedited), the figure is 50%.
The main reasons for mismatches lie in orthography variations (accents or transliteration
differences), mentions of people not related to the movie under review, and
aliasing or spelling variations (mostly in user reviews). We conclude that the coverage
of people entities in the ontology is satisfactory. However, whether a search for
these entities will find them, or will find the intended individuals in the ontology, is
not certain. This fuzziness is caused by term variation (observed especially in user
reviews) and term ambiguity.
10.5.4 Assessing Terminological Precision
To investigate terminological variation, we measured the ambiguity levels of named-entity
labels. By ambiguity, we refer to the possibility that a single name refers to
more than one ontology individual. Variation relates to the opposite case: one
ontology individual can be described by various terms in text.
We measured the level of terminological variation for each ontology individual,
i.e., given a single ontology individual (e.g., an actor), how many variations of the
name are found in the corpus? Bilenko and Mooney [19] used a similar method
in a different setting. To identify variations in the text, we used the StringMetrics
similarity matching library.* We experimented with the Levenshtein, Jaro-Winkler,
and q-gram similarity measures. For example, using such similarity measures, we
could match Bill Jackson (a name often used in blogs to refer informally to an
actor) with William Jackson (the name under which the actor is described in the
ontology). Such flexibility in aligning query terms with ontology terms increases
search system recall, but it also introduces a risk of precision loss when two distinct
individuals in the ontology can be named by the same term. For example,
if an ontology contains an actor named Bill Johnson and another named William
Johnson, fuzzy string matching could confuse the two actors.
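To make the role of such similarity measures concrete, here is a small sketch of q-gram similarity, one of the measures named above (the chapter itself used the Java StringMetrics library; this Python version is only an approximation for illustration).

def qgrams(s: str, q: int = 3) -> set:
    """Character q-grams of a padded, lower-cased string."""
    padded = "#" * (q - 1) + s.lower() + "#" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}

def qgram_similarity(a: str, b: str, q: int = 3) -> float:
    """Jaccard overlap of the q-gram sets; 1.0 means identical strings."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    return len(ga & gb) / len(ga | gb) if ga | gb else 1.0

# A name variant found in text vs. the label used in the ontology.
print(qgram_similarity("Bill Jackson", "William Jackson"))   # high: helps recall
# The precision risk: two distinct individuals with similar names.
print(qgram_similarity("Bill Johnson", "William Johnson"))   # also high: ambiguity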
To measure the practical impact of the name variability and ambiguity factors,
we extracted information from the corpus of movie reviews we collected. We first
developed an NER specialized to the movies domain. (The OpenCalais NER we
used above properly tags person names, but cannot distinguish actors from directors
or identify movie names.) We manually tagged a corpus of 200 movie reviews from
the Ebert corpus to indicate the occurrences of movie names and actor names.
We then applied the YAMCHA† package to train an automatic NER system on
our corpus. YAMCHA uses a support vector machine (SVM) classifier to recognize
named entities in text based on features describing each word. We used two
different sets of features to train the system: (1) whether words started with capital
letters (a strong indication of a proper name) and (2) whether names occurred in a
gazette (a manually compiled list of proper names). We also used contextual features
(properties of the words around the word to be classified).

* http://www.dcs.shef.ac.uk/sam/stringmetrics.html
† http://chasen.org/~taku/software/yamcha/
The main challenges a statistically trained NER system addresses are (1) identifying
a sequence of several words as a single named expression (for example, Bill
Gates 3rd) and (2) recognizing, through generalization, that words that do not
appear in a predefined gazette of names are similar in their distribution to known
names, so that terms not previously observed as proper names are properly recognized
as such. An NER system must also use contextual information to avoid tagging a
word that does appear in the gazette but is not used as a proper name (for example,
Bill paid the bill). Finally, an NER system must distinguish between proper names
referring to actors and those referring to movies.
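As a rough sketch of the word-level features just described (capitalization, gazette membership, and a small context window), the following code shows how such features could be computed before being handed to a YAMCHA-style SVM tagger; the feature names, window size, and toy gazette are our assumptions rather than the chapter's actual configuration.

from typing import Dict, List, Set

def word_features(tokens: List[str], i: int, gazette: Set[str],
                  window: int = 2) -> Dict[str, str]:
    """Features for the i-th token: surface form, capitalization, gazette, context."""
    feats = {
        "word": tokens[i].lower(),
        "is_capitalized": str(tokens[i][:1].isupper()),
        "in_gazette": str(tokens[i].lower() in gazette),
    }
    # Contextual features: neighboring words within the window.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        feats[f"word[{offset:+d}]"] = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
    return feats

tokens = "Bill paid the bill".split()
gazette = {"bill", "william"}
print(word_features(tokens, 0, gazette))  # "Bill": capitalized and in the gazette
print(word_features(tokens, 3, gazette))  # "bill": in the gazette but not capitalized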
Table10.2 summarizes the results of the trained NER system. e system on
average is capable of properly identifying 92% of the movies named in the reviews
and 94% of the persons.
Let us now consider a given text as a search query. If we apply our NER system
to the text of the query, we will properly tag named movies and actors. The issue we
now address is how successfully we can align a named entity in the query
with the corresponding individual in the ontology. For a version of the ontology
including 117,556 individuals referring to persons, and taking into account surnames
only, we found that 83% of the names may refer to more than one instance in the
ontology. We also found that, on average, over 18 name variations per ontology
instance actually occurred in the corpus.
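The two figures above can be reproduced schematically as follows (our sketch, not the original measurement code); persons and mentions are hypothetical inputs standing for the ontology's person labels and the name variants actually found in the corpus.

from collections import defaultdict
from typing import Dict, Set

def surname_ambiguity(persons: Dict[str, str]) -> float:
    """Fraction of surnames shared by more than one ontology individual.
    `persons` maps an ontology person id to its canonical label."""
    by_surname = defaultdict(set)
    for pid, label in persons.items():
        parts = label.split()
        if parts:
            by_surname[parts[-1].lower()].add(pid)
    if not by_surname:
        return 0.0
    return sum(1 for pids in by_surname.values() if len(pids) > 1) / len(by_surname)

def avg_name_variations(mentions: Dict[str, Set[str]]) -> float:
    """Average number of distinct corpus name variants per ontology individual.
    `mentions` maps a person id to the set of variants observed in the corpus."""
    if not mentions:
        return 0.0
    return sum(len(variants) for variants in mentions.values()) / len(mentions)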
10.5.5 Conclusion: Ontology Terminological Coverage
In conclusion, while the coverage of the ontology originally looked promising,
we found that based on name variability and ambiguity, aligning a query with
Table10.2 Performance of Named Entity Recognizer on Movies
Domain
Precision Recall F
Movie exact match 91.56 92.43 91.87
Movie boundary match 91.80 92.66 92.10
Person exact match 97.68 93.76 95.67
Person boundary match 97.84 93.92 95.83
Average accuracy 99.32
Average boundary accuracy 99.34