264 ◾ Dae-Ki Kang
Comparison with data sets augmented with PAT—Zhang et al. (2006)
presented the attribute value taxonomy-guided naive Bayes learner (AVT-NBL), an
algorithm that exploits AVTs to generate naive Bayes classifiers that are more
compact and often more accurate than classifiers that do not use AVTs. Zhang et al.
also compared their algorithm, run on the original benchmark data sets, with
the standard naive Bayes learner run on propositionalized data sets. To generate the
propositionalized data sets, they first constructed a taxonomy for each attribute in the
original data and then appended the abstract attributes of that taxonomy (all
nonleaf nodes in the taxonomy) to the original data as extra attributes. In contrast,
we first propositionalized the original data and then generated
one large propositionalized attribute taxonomy. One of the main weaknesses of using
propositionalized data is that its attributes have clear dependency relationships
(each abstract attribute is determined by the original value beneath it). As a result, the
performance of a learning algorithm degrades if it assumes, as naive Bayes learners do,
that each attribute is independent of the others. Unlike a standard naive Bayes
classifier on propositionalized data, PAT-NBL can also regularize over a taxonomy,
using model selection criteria to minimize overfitting.
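The augmentation step attributed above to Zhang et al. (2006) can be sketched roughly as follows. This is a minimal Python sketch under our own assumptions: a taxonomy is given as a child-to-parent dictionary, each instance is a dict, and each nonleaf node becomes one boolean column that is true when the instance's value lies beneath that node. The helper names (`ancestors`, `nonleaf_nodes`, `propositionalize`) and the toy color taxonomy are hypothetical and are not taken from AVT-NBL or PAT-NBL.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy for one nominal attribute "color":
# child -> parent, with the root ("any") mapped to None.
PARENT: Dict[str, Optional[str]] = {
    "red": "warm", "orange": "warm",
    "blue": "cool", "green": "cool",
    "warm": "any", "cool": "any",
    "any": None,
}

def ancestors(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """All proper ancestors of a node, nearest first."""
    result = []
    p = parent.get(node)
    while p is not None:
        result.append(p)
        p = parent.get(p)
    return result

def nonleaf_nodes(parent: Dict[str, Optional[str]]) -> List[str]:
    """Nonleaf nodes are exactly those that occur as some node's parent."""
    return sorted({p for p in parent.values() if p is not None})

def propositionalize(rows: List[dict], attr: str,
                     parent: Dict[str, Optional[str]]) -> List[dict]:
    """Return copies of rows with one boolean column per nonleaf node of attr's taxonomy."""
    abstract = nonleaf_nodes(parent)
    out = []
    for row in rows:
        above = set(ancestors(row[attr], parent))
        new_row = dict(row)  # keep the original attribute and add the extras
        for node in abstract:
            new_row[f"{attr}:{node}"] = node in above
        out.append(new_row)
    return out

rows = [{"color": "red", "class": "+"}, {"color": "blue", "class": "-"}]
for r in propositionalize(rows, "color", PARENT):
    print(r)
```

Every added column is a deterministic function of the original value (for example, `color:warm` is true exactly when `color` is `red` or `orange`), so the extra attributes are strongly dependent on the original attribute and on each other. This is the violation of the naive Bayes independence assumption described above. The root column is constant and carries no information.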
Table 9.10 compares PAT-NBL using the CLL criterion with the standard
naive Bayes learning algorithm on propositionalized data sets. Neither learning
algorithm exhibited superior accuracy. However, PAT-NBL with the CLL criterion
generated more compact naive Bayes classifiers (shown in the size columns of Table 9.10)
than the standard NBL algorithm did on propositionalized data sets.
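To make the CLL criterion concrete, the sketch below scores a naive Bayes model by conditional log-likelihood, assuming CLL here means the sum over training instances of log P(c | x) under the model. It assumes nominal attributes, rows given as dicts, and Laplace smoothing through a hypothetical `alpha` parameter; the function names `train_nb`, `log_joint`, and `cll` are ours, and this is not PAT-NBL's implementation.

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, attrs, label, alpha=1.0):
    """Count-based naive Bayes for nominal attributes (Laplace smoothing via alpha)."""
    class_counts = Counter(r[label] for r in rows)
    cond = defaultdict(Counter)          # (attribute, class) -> value counts
    values = {a: set() for a in attrs}   # observed domain of each attribute
    for r in rows:
        for a in attrs:
            cond[(a, r[label])][r[a]] += 1
            values[a].add(r[a])
    return {"classes": class_counts, "cond": cond, "values": values, "alpha": alpha}

def log_joint(model, row, attrs, c):
    """log P(c) + sum over attributes of log P(x_a | c), with smoothed estimates."""
    counts, alpha = model["classes"], model["alpha"]
    n, k = sum(counts.values()), len(counts)
    lp = math.log((counts[c] + alpha) / (n + alpha * k))
    for a in attrs:
        num = model["cond"][(a, c)][row[a]] + alpha
        den = counts[c] + alpha * len(model["values"][a])
        lp += math.log(num / den)
    return lp

def cll(model, rows, attrs, label):
    """Conditional log-likelihood: sum over rows of log P(true class | attributes)."""
    total = 0.0
    for r in rows:
        scores = {c: log_joint(model, r, attrs, c) for c in model["classes"]}
        m = max(scores.values())
        # Normalize over classes with a numerically stable log-sum-exp.
        log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
        total += scores[r[label]] - log_z
    return total

rows = [
    {"x": "a", "y": "+"}, {"x": "a", "y": "+"},
    {"x": "b", "y": "-"}, {"x": "b", "y": "+"},
]
model = train_nb(rows, ["x"], "y")
print(cll(model, rows, ["x"], "y"))  # approximately -1.885
```

A taxonomy-guided learner could compute such a score for each candidate cut through a taxonomy (for example, after replacing leaf values with their parents) and prefer cuts that keep the score high while using fewer parameters. The exact search and any size penalty used by PAT-NBL are not shown here.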
9.5 Summary and Discussion
9.5.1 Summary
We represented ontologies in various data structures (lists, trees or directed acyclic
graphs, and directed/undirected graphs). Graphs are very powerful data structures for
expressing ontologies for machine learning algorithms, but inference over graphs
for evaluation is widely believed to be intractable. Taxonomies are among the most
common forms of ontology and are useful for constructing compact, robust, and
comprehensible classifiers, although human-designed taxonomies are unavailable
in many application domains. We described data-driven evaluation of ontologies
using machine learning algorithms and introduced cutting-edge taxonomy-aware
algorithms for automated inductive construction of taxonomies from both
structured (UCI repository data) and unstructured (text and biological sequence) data.
More precisely, we described taxonomy construction algorithms such as AVT
learner, an algorithm for automated construction of attribute value taxonomies
(AVTs) from data, word taxonomy learner (WTL) for automated construction of
word taxonomy from text and sequence data, and PAT learner, an algorithm for
propositionalization of attributes and construction of taxonomy from propositionalized