Data-Driven Evaluation of Ontologies ◾  261
Sonar 85.58±4.77 164 86.06±4.71 162 83.65±5.03 70 84.13±4.97 72
Splice 95.52±0.72 864 95.64±0.71 861 91.88±0.95 213 51.58±1.73 21
Vehicle 62.65±3.26 296 62.29±3.27 292 59.34±3.31 188 61.35±3.28 200
Vote 90.11±2.80 66 88.51±3.00 64 88.74±2.97 52 88.51±3.00 64
Waveform-5000 80.74±1.09 393 81.24±1.08 390 80.14±1.11 159 80.54±1.10 168
Zoo 93.07±4.95 259 96.04±3.80 252 96.04±3.80 245 96.04±3.80 252
Accuracy estimated using 10-fold cross validation with 95% confidence interval.
262 ◾  Dae-Ki Kang
Figure 9.24 UCI's Balance Scale Weight and Distance data set (discretized).
(a) Original relation of attributes and attribute values: the attributes
left-weight, left-distance, right-weight, and right-distance each take the
values (–∞..2.5] and (2.5..∞); the class takes the values L, B, and R.
(b) Propositionalized relation.
Figure 9.24 (Continued) UCI's Balance Scale Weight and Distance data set.
(c) Propositionalized attribute taxonomy.
Comparison with data sets augmented with PAT
Zhang et al. (2006) presented the attribute value taxonomy guided naive Bayes
learner (AVT-NBL), an algorithm that exploits AVTs to generate naive Bayes
classifiers that are more compact and often more accurate than classifiers that
do not use AVTs. Zhang et al. also compared their algorithm on the original
benchmark data sets with the standard naive Bayes learner on propositionalized
data sets. To generate the propositionalized data sets, they first constructed
a taxonomy for each attribute in the original data and then appended the
abstract attributes of that taxonomy (all nonleaf nodes) to the original data
as extra attributes. In contrast, we propositionalized the original data first
and then generated one large propositionalized attribute taxonomy. One of the
main weaknesses of propositionalized data is that its attributes are clearly
dependent on one another, so the performance of a learning algorithm degrades
if, like the naive Bayes learner, it assumes that each attribute is independent
of the others given the class. Unlike standard naive Bayes classifiers on
propositionalized data, PAT-NBL can also perform regularization over a
taxonomy, using model selection criteria to reduce overfitting.
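The propositionalization step and the dependency it introduces can be
illustrated with a minimal sketch. It assumes the discretized Balance Scale
attributes of Figure 9.24; the function name `propositionalize`, the string
labels for the two intervals, and the sample instance are hypothetical and are
not taken from the chapter's implementation.

```python
# Minimal sketch of propositionalization (hypothetical helper, not the
# chapter's code): every (attribute, value) pair becomes one Boolean attribute.

def propositionalize(instances, attribute_values):
    """Return one dict of Boolean propositions per input instance."""
    rows = []
    for instance in instances:
        row = {}
        for attribute, values in attribute_values.items():
            for value in values:
                # The proposition "attribute=value" holds only for the
                # value the instance actually takes.
                row[f"{attribute}={value}"] = instance[attribute] == value
        rows.append(row)
    return rows


# Assumed labels for the two discretized intervals shown in Figure 9.24.
LOW, HIGH = "(-inf..2.5]", "(2.5..inf)"

attribute_values = {
    "left-weight": [LOW, HIGH],
    "left-distance": [LOW, HIGH],
    "right-weight": [LOW, HIGH],
    "right-distance": [LOW, HIGH],
}

# One made-up instance, used only to exercise the function.
instances = [
    {"left-weight": HIGH, "left-distance": LOW,
     "right-weight": LOW, "right-distance": LOW},
]

rows = propositionalize(instances, attribute_values)
for name, truth in rows[0].items():
    print(name, truth)
```

Each original attribute yields two propositions, exactly one of which is true
for any instance. Knowing one therefore determines the other, which is the kind
of dependency among propositionalized attributes that conflicts with the naive
Bayes independence assumption.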
Table9.10 shows a comparison of PAT-NBL with CLL criteria and the standard
naive Bayes learning algorithm on propositionalized data sets. None of the learning
algorithms exhibited superior accuracy. However, PAT-NBL with CLL criteria gen-
erated more compact naive Bayes classiers (shown in the size columns in Table9.10)
than those from the standard NBL algorithm on propositionalized data sets.
9.5 Summary and Discussion
9.5.1 Summary
We represented ontologies in various data structures (list, tree or directed
acyclic graph, and directed/undirected graph). Graphs are very powerful data
structures for expressing ontologies for machine learning algorithms, but
inferences using graphs for evaluation are widely believed to be intractable.
A taxonomy is one of the most common forms of ontology and is useful for
constructing compact, robust, and comprehensible classifiers, although
human-designed taxonomies are unavailable in many application domains. We
described data-driven evaluation of ontologies using machine learning
algorithms and introduced cutting-edge taxonomy-aware algorithms for automated,
inductive construction of taxonomies from both structured (UCI repository data)
and unstructured (text and biological sequence) data.
More precisely, we described taxonomy construction algorithms such as AVT
learner, an algorithm for automated construction of attribute value taxonomies
(AVTs) from data; word taxonomy learner (WTL), for automated construction of
word taxonomies from text and sequence data; and PAT learner, an algorithm for
propositionalization of attributes and construction of taxonomy from
propositionalized
Table9.10 Accuracy and Parameter Size of PAT-NBL with CLL on
UCI Benchmark Data Sets and NBL on Propositionalized UCI
Benchmark Data Sets
Data
PAT-NBL (CLL)
NBL
(Proposi tion alized)
Accuracy Size Accuracy Size
Anneal 89.87±1.97 54 89.31±2.02 2886
Autos 66.83±6.45 791 78.54±5.62 5187
Balance-scale 75.20±3.39 24 88.48±2.50 195
Breast-cancer 73.08±5.14 102 72.73±5.16 338
Breast-w 97.28±1.21 58 97.14±1.24 642
Dermatology 98.09±1.40 900 98.09±1.40 2790
Heart-statlog 84.07±4.36 44 83.70±4.41 482
Hepatitis 84.52±5.70 72 90.97±4.51 538
Hypothyroid 97.91±0.46 268 93.32±0.80 1276
Ionosphere 89.46±3.21 290 91.74±2.88 2318
Kr-vs-kp 85.01±1.24 148 87.80±1.13 306
Labor 92.98±6.63 70 89.47±7.97 546
Mushroom 94.25±0.51 250 95.55±0.45 682
Segment 91.04±1.16 1197 88.14±1.32 4193
Sonar 86.06±4.71 162 99.04±1.33 4322
Splice 95.64±0.71 861 95.92±0.69 2727
Vehicle 62.29±3.27 292 67.02±3.17 2596
Vote 88.51±3.00 64 90.11±2.81 130
Waveform-5000 81.24±1.08 390 63.62±1.33 4323
Zoo 96.04±3.80 252 94.06±4.61 567
Accuracy estimated using 10-fold cross validation with 95% confi-
dence interval.
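The chapter does not state how the 95% confidence intervals were computed. As a
hedged illustration, the sketch below uses the normal approximation to a
binomial proportion, which is an assumption on our part rather than the
author's documented procedure. The function name `ci_half_width` is
hypothetical, and the instance counts (625 for Balance-scale, 208 for Sonar)
are the commonly cited sizes of those UCI data sets, not figures given in this
chapter.

```python
import math


def ci_half_width(accuracy_percent, n_instances, z=1.96):
    """Half-width, in percentage points, of a normal-approximation
    confidence interval for an accuracy estimated on n_instances.
    z=1.96 corresponds to a 95% interval."""
    p = accuracy_percent / 100.0
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n_instances)


# Assumed data set sizes (not stated in the chapter).
balance_scale = ci_half_width(75.20, 625)
sonar = ci_half_width(86.06, 208)

print(f"Balance-scale: 75.20 ± {balance_scale:.2f}")
print(f"Sonar:         86.06 ± {sonar:.2f}")
```

Under these assumptions the sketch gives ±3.39 for Balance-scale and ±4.71 for
Sonar, matching the PAT-NBL (CLL) entries in Table 9.10. This suggests, but
does not confirm, that the reported intervals follow this approximation.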