Chapter 9: Data-Driven Evaluation of Ontologies Using Machine Learning Algorithms

Data-Driven Evaluation of Ontologies ◾ 261

Sonar           85.58±4.77     164   86.06±4.71     162   83.65±5.03      70   84.13±4.97      72
Splice          95.52±0.72     864   95.64±0.71     861   91.88±0.95     213   51.58±1.73      21
Vehicle         62.65±3.26     296   62.29±3.27     292   59.34±3.31     188   61.35±3.28     200
Vote            90.11±2.80      66   88.51±3.00      64   88.74±2.97      52   88.51±3.00      64
Waveform-5000   80.74±1.09     393   81.24±1.08     390   80.14±1.11     159   80.54±1.10     168
Zoo             93.07±4.95     259   96.04±3.80     252   96.04±3.80     245   96.04±3.80     252

Accuracy estimated using 10-fold cross-validation with 95% confidence intervals.
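The chapter does not say how the ± half-widths were computed. The reported values appear to be consistent with the normal-approximation (Wald) interval for a proportion, z·√(p(1−p)/n), with z = 1.96 and n the number of instances in the data set; treat this as our assumption rather than the author's stated method. For example, Sonar has 208 instances, and 178 correct predictions give 85.58%, with a half-width of about 4.77 percentage points, matching the first Sonar entry above. A minimal Python sketch (the function names are hypothetical):

```python
import math


def wald_halfwidth_pct(correct: int, n: int, z: float = 1.96) -> float:
    """Half-width, in percentage points, of the normal-approximation (Wald)
    confidence interval for an accuracy of correct/n.
    Assumption: the tables' ± values were obtained this way."""
    p = correct / n
    return 100 * z * math.sqrt(p * (1 - p) / n)


def report(correct: int, n: int) -> str:
    """Format an accuracy the way the tables do, e.g. '85.58±4.77'."""
    return f"{100 * correct / n:.2f}±{wald_halfwidth_pct(correct, n):.2f}"


# Instance counts from the UCI repository: Sonar has 208, Zoo has 101.
print(report(178, 208))  # Sonar: 85.58±4.77
print(report(94, 101))   # Zoo:   93.07±4.95
```

The same formula also reproduces, for instance, Zoo's 96.04±3.80 (97 of 101 correct).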

262 ◾ Dae-Ki Kang

[Figure: Balance Scale Weight & Distance Database (discretized). Panel (a), Original relation of attributes and attribute values; panel (b), Propositionalized relation. Nodes include the attributes left-weight, left-distance, right-weight, and right-distance, the class, and the discretized values (-∞..2.5] and (2.5..∞); edges are labeled attribute-of, class-of, and value-of.]

Figure 9.24 UCI's Balance Scale Weight and Distance data set.


Figure9.24 (Continued) UCI’s Balance Scale Weight and Distance data set.


Comparison with data sets augmented with PAT. Zhang et al. (2006) presented the attribute value taxonomy guided naive Bayes learner (AVT-NBL), an algorithm that exploits AVTs to generate naive Bayes classifiers that are more compact and often more accurate than classifiers that do not use AVTs. Zhang et al. compared their algorithm, run on the original benchmark data sets, with the standard naive Bayes learner run on propositionalized data sets. To generate the propositionalized data sets, they first constructed a taxonomy for each attribute in the original data and then appended the abstract attributes of the constructed taxonomy (all nonleaf nodes in the taxonomy) to the original data as extra attributes. In contrast, we propositionalized the original data first and then generated one large propositionalized attribute taxonomy.

One of the main weaknesses of using propositionalized data is that its attributes have clear dependency relationships among them. As a result, the performance of a learning algorithm degrades if, like a naive Bayes learner, it assumes that each attribute is independent of the other attributes given the class. Unlike a standard naive Bayes classifier on propositionalized data, PAT-NBL can also perform regularization over a taxonomy, using model selection criteria to minimize overfitting.
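To see why dependent propositionalized attributes hurt naive Bayes, consider one discretized attribute whose two Boolean indicators are complements of each other. Naive Bayes multiplies their likelihoods as if they were independent evidence, so the same observation is counted twice and the posterior becomes overconfident. The sketch below is a two-class simplification. The class-conditional probabilities (0.8 and 0.3) and the equal priors are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def nb_posterior(prior: Dict[str, float],
                 likelihoods: List[Dict[str, float]]) -> Dict[str, float]:
    """Naive Bayes posterior: prior times the product of per-attribute
    likelihoods P(observed value | class), normalized over classes."""
    scores = dict(prior)
    for table in likelihoods:
        for c in scores:
            scores[c] *= table[c]
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


prior = {"L": 0.5, "R": 0.5}       # hypothetical two-class prior
p_low = {"L": 0.8, "R": 0.3}       # hypothetical P(left-weight in (-inf..2.5] | class)

# Original attribute: the observation is counted once.
single = nb_posterior(prior, [p_low])

# Propositionalized: indicator "=(-inf..2.5]" is 1 and its complement
# "=(2.5..inf)" is 0. P(complement = 0 | class) equals p_low, so the
# same evidence enters the product twice.
doubled = nb_posterior(prior, [p_low, p_low])

print(f"{single['L']:.4f}")   # 0.7273
print(f"{doubled['L']:.4f}")  # 0.8767
```

The observed evidence is the same in both cases, yet the posterior for class L rises from about 0.73 to about 0.88 purely because of the redundant indicator.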

Table9.10 shows a comparison of PAT-NBL with CLL criteria and the standard

naive Bayes learning algorithm on propositionalized data sets. None of the learning

algorithms exhibited superior accuracy. However, PAT-NBL with CLL criteria gen-

erated more compact naive Bayes classiers (shown in the size columns in Table9.10)

than those from the standard NBL algorithm on propositionalized data sets.
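Both claims can be checked against the numbers in Table 9.10. The sketch below transcribes the table's rows and computes four things: how often each learner has the higher accuracy point estimate, how often PAT-NBL produces the smaller classifier, the total parameter sizes, and the number of data sets on which the two 95% confidence intervals are disjoint. Disjointness of intervals is only a rough heuristic chosen here for illustration. It is not a formal significance test. The names in the code are hypothetical.

```python
from typing import Dict, List, Tuple

# (data set, PAT-NBL acc, ±, size, NBL-propositionalized acc, ±, size), from Table 9.10
Row = Tuple[str, float, float, int, float, float, int]
ROWS: List[Row] = [
    ("Anneal",        89.87, 1.97,   54, 89.31, 2.02, 2886),
    ("Autos",         66.83, 6.45,  791, 78.54, 5.62, 5187),
    ("Balance-scale", 75.20, 3.39,   24, 88.48, 2.50,  195),
    ("Breast-cancer", 73.08, 5.14,  102, 72.73, 5.16,  338),
    ("Breast-w",      97.28, 1.21,   58, 97.14, 1.24,  642),
    ("Dermatology",   98.09, 1.40,  900, 98.09, 1.40, 2790),
    ("Heart-statlog", 84.07, 4.36,   44, 83.70, 4.41,  482),
    ("Hepatitis",     84.52, 5.70,   72, 90.97, 4.51,  538),
    ("Hypothyroid",   97.91, 0.46,  268, 93.32, 0.80, 1276),
    ("Ionosphere",    89.46, 3.21,  290, 91.74, 2.88, 2318),
    ("Kr-vs-kp",      85.01, 1.24,  148, 87.80, 1.13,  306),
    ("Labor",         92.98, 6.63,   70, 89.47, 7.97,  546),
    ("Mushroom",      94.25, 0.51,  250, 95.55, 0.45,  682),
    ("Segment",       91.04, 1.16, 1197, 88.14, 1.32, 4193),
    ("Sonar",         86.06, 4.71,  162, 99.04, 1.33, 4322),
    ("Splice",        95.64, 0.71,  861, 95.92, 0.69, 2727),
    ("Vehicle",       62.29, 3.27,  292, 67.02, 3.17, 2596),
    ("Vote",          88.51, 3.00,   64, 90.11, 2.81,  130),
    ("Waveform-5000", 81.24, 1.08,  390, 63.62, 1.33, 4323),
    ("Zoo",           96.04, 3.80,  252, 94.06, 4.61,  567),
]


def summarize(rows: List[Row]) -> Dict[str, int]:
    pat_wins = sum(1 for r in rows if r[1] > r[4])
    nbl_wins = sum(1 for r in rows if r[4] > r[1])
    return {
        "pat_wins": pat_wins,
        "nbl_wins": nbl_wins,
        "ties": len(rows) - pat_wins - nbl_wins,
        # PAT-NBL classifier has fewer parameters than NBL on propositionalized data
        "pat_smaller": sum(1 for r in rows if r[3] < r[6]),
        # intervals are disjoint when the gap exceeds the sum of the half-widths
        "disjoint_ci": sum(1 for r in rows if abs(r[1] - r[4]) > r[2] + r[5]),
        "total_size_pat": sum(r[3] for r in rows),
        "total_size_nbl": sum(r[6] for r in rows),
    }


print(summarize(ROWS))
```

On these numbers, PAT-NBL has the higher point estimate on 9 data sets, NBL on 10, and Dermatology is a tie. The intervals are disjoint on 7 data sets, split 3 to 4 between the learners. PAT-NBL yields the smaller classifier on all 20.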

9.5 Summary and Discussion

9.5.1 Summary

We represented ontologies in various data structures (list, tree or directed acyclic graph, and directed/undirected graph). Graphs are very powerful data structures for expressing ontologies for machine learning algorithms, but inference over graphs for evaluation is widely believed to be intractable. Taxonomy is one of the most common forms of ontology and is useful for constructing compact, robust, and comprehensible classifiers, although human-designed taxonomies are unavailable in many application domains. We described data-driven evaluation of ontologies using machine learning algorithms and introduced cutting-edge taxonomy-aware algorithms for the automated, inductive construction of taxonomies from both structured (UCI repository data) and unstructured (text and biological sequence) data.

More precisely, we described taxonomy construction algorithms such as AVT learner, an algorithm for automated construction of attribute value taxonomies (AVTs) from data; word taxonomy learner (WTL), for automated construction of word taxonomies from text and sequence data; and PAT learner, an algorithm for propositionalization of attributes and construction of a taxonomy from propositionalized


Table9.10 Accuracy and Parameter Size of PAT-NBL with CLL on

UCI Benchmark Data Sets and NBL on Propositionalized UCI

Benchmark Data Sets

Data

PAT-NBL (CLL)

NBL

(Proposi tion alized)

Accuracy Size Accuracy Size

Anneal 89.87±1.97 54 89.31±2.02 2886

Autos 66.83±6.45 791 78.54±5.62 5187

Balance-scale 75.20±3.39 24 88.48±2.50 195

Breast-cancer 73.08±5.14 102 72.73±5.16 338

Breast-w 97.28±1.21 58 97.14±1.24 642

Dermatology 98.09±1.40 900 98.09±1.40 2790

Heart-statlog 84.07±4.36 44 83.70±4.41 482

Hepatitis 84.52±5.70 72 90.97±4.51 538

Hypothyroid 97.91±0.46 268 93.32±0.80 1276

Ionosphere 89.46±3.21 290 91.74±2.88 2318

Kr-vs-kp 85.01±1.24 148 87.80±1.13 306

Labor 92.98±6.63 70 89.47±7.97 546

Mushroom 94.25±0.51 250 95.55±0.45 682

Segment 91.04±1.16 1197 88.14±1.32 4193

Sonar 86.06±4.71 162 99.04±1.33 4322

Splice 95.64±0.71 861 95.92±0.69 2727

Vehicle 62.29±3.27 292 67.02±3.17 2596

Vote 88.51±3.00 64 90.11±2.81 130

Waveform-5000 81.24±1.08 390 63.62±1.33 4323

Zoo 96.04±3.80 252 94.06±4.61 567

Accuracy estimated using 10-fold cross validation with 95% conﬁ-

dence interval.
