features that correspond to the nodes of the AVT (PROP-NBL). The classifiers
generated by AVT-NBL are substantially more compact than those generated by
NBL and PROP-NBL. These results hold across a wide range of missing attribute
values in data sets. Hence, the performance of naive Bayes classifiers generated
by AVT-NBL when supplied with AVTs generated by AVT learner provides a useful
measure of the effectiveness of AVT learner in discovering hierarchical groupings
of attribute values that are useful in constructing compact and accurate classifiers
from data.
9.3.2 WTNBL-MN Algorithm
If you understand the underlying idea of AVT-guided variants of standard learning
algorithms, it is easy to understand the other taxonomy-aware algorithms because
they are similar. The problem of learning classifiers from a word taxonomy and
sequence data is a natural generalization of the problem of learning classifiers from
sequence data. An original data set D is a collection of labeled instances ⟨I_j, C_j⟩,
where I_j ∈ I. A classifier is a hypothesis in the form of a function h: I → C, whose
domain is the instance space I and whose range is the set of classes C. A hypothesis
space H is a set of hypotheses that can be represented in some hypothesis language
or by a parameterized family of functions. The task of learning classifiers from data
set D is to induce a hypothesis h ∈ H that satisfies the given criteria.
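To make this formalism concrete, the following minimal Python sketch encodes
these objects directly; the type names and the trivial majority-class learner are
illustrative assumptions, not notation from this chapter.

from collections import Counter
from typing import Callable, List, Tuple

Instance = Tuple[str, ...]                      # an element of the instance space I
ClassLabel = str                                # an element of the class set C
DataSet = List[Tuple[Instance, ClassLabel]]     # labeled instances <I_j, C_j>
Classifier = Callable[[Instance], ClassLabel]   # a hypothesis h: I -> C

def learn(D: DataSet) -> Classifier:
    """Induce a hypothesis h in H satisfying some criterion.

    Stand-in learner (assumes D is nonempty): always predict the majority
    class. A real learner searches H for a concise, accurate hypothesis.
    """
    majority, _ = Counter(c for _, c in D).most_common(1)[0]
    return lambda instance: majority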
Learning classifiers from a word taxonomy and data can be described by assuming
a word taxonomy T_Σ over words Σ and a data set D. The aim is to induce a
classifier h_γ*: I_γ* → C, where γ* is a cut that maximizes the given criteria. Note
that the resulting hypothesis space Ĥ_γ of a chosen cut γ makes it possible to search
efficiently for both concise and accurate hypotheses. Word taxonomy-guided NBL
has two major components: (1) estimation of the parameters of naive Bayes classifiers
based on a cut, and (2) a criterion for refining a cut.
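As a rough illustration of how these two components interact, consider the
following Python sketch. It makes several simplifying assumptions: instances are
sequences of words, the refinement criterion is plain training-set accuracy rather
than the criterion actually used by WTNBL-MN, and the Node class and all helper
names are hypothetical.

import math
from collections import Counter
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:                        # hypothetical node of a word taxonomy T_Sigma
    word: str
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        if not self.children:
            return [self.word]
        return [w for c in self.children for w in c.leaves()]

def estimate_parameters(cut, D):
    """Component (1): class conditional frequency counts over the cut."""
    to_cut = {leaf: n.word for n in cut for leaf in n.leaves()}
    counts, priors = Counter(), Counter()
    for words, label in D:
        priors[label] += 1
        for w in words:
            counts[(to_cut[w], label)] += 1
    return to_cut, counts, priors, {n.word for n in cut}

def predict(params, words):
    """Multinomial naive Bayes with Laplace smoothing, in log space."""
    to_cut, counts, priors, vocab = params
    n = sum(priors.values())
    def log_posterior(label):
        denom = sum(v for (_, l), v in counts.items() if l == label) + len(vocab)
        lp = math.log(priors[label] / n)
        for w in words:
            lp += math.log((counts[(to_cut[w], label)] + 1) / denom)
        return lp
    return max(priors, key=log_posterior)

def accuracy(params, D):
    return sum(predict(params, ws) == c for ws, c in D) / len(D)

def wtnbl_sketch(root, D):
    """Greedily refine the cut while the criterion (here: accuracy) improves."""
    cut, best = [root], estimate_parameters([root], D)
    improved = True
    while improved:
        improved = False
        for node in list(cut):
            if not node.children:
                continue                      # leaf nodes cannot be refined further
            cand = [n for n in cut if n is not node] + node.children
            params = estimate_parameters(cand, D)
            if accuracy(params, D) > accuracy(best, D):   # component (2)
                cut, best, improved = cand, params, True
                break
    return cut, best

The greedy loop makes the division of labor explicit: estimate_parameters plays
the role of component (1), and the accuracy comparison stands in for the refinement
criterion of component (2).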
9.3.3 Aggregation of Class Conditional Frequency Counts
We can estimate the relevant parameters of a naive Bayes classifier efficiently by
aggregating class conditional frequency counts. For a particular node of a given
cut, the parameters of the node can be estimated by summing the class conditional
frequency counts of its children (Zhang and Honavar, 2004).
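A minimal sketch of this aggregation follows, reusing the hypothetical Node class
from the previous sketch; leaf_counts, which maps each word to its observed
per-class frequencies in D, is likewise an assumed name.

from collections import Counter

def aggregate_counts(node, leaf_counts):
    """Class conditional frequency counts for a node of T_f: a leaf carries
    its observed counts; a nonleaf sums the counts of its children."""
    if not node.children:
        return Counter(leaf_counts.get(node.word, {}))
    total = Counter()
    for child in node.children:
        total += aggregate_counts(child, leaf_counts)
    return total

# Example: the counts for "animal" are the sums over "cat" and "dog".
leaf_counts = {"cat": {"pos": 3}, "dog": {"pos": 1, "neg": 2}}
animal = Node("animal", [Node("cat"), Node("dog")])
print(aggregate_counts(animal, leaf_counts))   # Counter({'pos': 4, 'neg': 2})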
Given word taxonomy T_Σ, we can define a tree of class conditional frequency
counts T_f such that there is a one-to-one correspondence between the nodes of
word taxonomy T_Σ and the nodes of the corresponding T_f. The class conditional
frequency counts associated with a nonleaf node of T_f are aggregations of the
corresponding class conditional frequency counts associated with its children.
Because a cut through the word taxonomy corresponds to a partition of the set of
words, the corresponding cut through T_f specifies a valid class conditional
probability table for words. Therefore, to estimate each node of T_f, we simply
estimate the class conditional frequency counts