Chapter 9: Data-Driven Evaluation of Ontologies Using Machine Learning Algorithms (10/13)


256 ◾ Dae-Ki Kang

Figure9.21 Propositionalized attribute taxonomy of Balance Scale Weight and Distance (balance scale) data. Dotted lines show

original attribute–value relationship.

Data-Driven Evaluation of Ontologies ◾ 257

(a) Cleveland Clinic Foundation’s Heart Disease Data

(b) Cleveland Clinic Foundation’s Heart Disease Data (propositionalized)

Figure 9.22 Attribute–value relationships for subset of attributes in Cleveland Clinic Foundation’s Heart Disease (heart-c)

data.


Figure9.23 Propositionalized attribute taxonomy of Cleveland Clinic Foundation’s Heart Disease (heart-c) data. Jensen–Shannon

divergence measure was used to estimate dissimilarity.


e leaf nodes in Figure9.23 are propositionalized attributes and each internal

node represents merging of all of its descendant leaf nodes. e taxonomy is gen-

erated automatically by PAT learner and may not be easily readable for humans.

However, in many applications, taxonomies specied by human experts are unavail-

able. Manual construction of taxonomies requires a great deal of domain expertise,

and in case of large data sets with many attributes and many values for each attri-

bute, manual generation of PATs is extremely tedious and not feasible in practice.

Considering this drawback, PATs generated automatically by PAT learner are use-

ful in constructing concise and accurate classiers when used with PAT-DTL.

9.4.3.2 Experimental Results from PAT-NBL

Comparison with naive Bayes learner and model selection criteria. To assess how PAT-NBL algorithms evaluate taxonomies, we conducted experiments on data sets from the UCI Machine Learning Repository (Blake and Merz, 1998) with the following learning algorithm settings:

1. Naive Bayes learner

2. PAT-NBL algorithm with conditional log likelihood (CLL; Friedman et al., 1997) criterion

3. PAT-NBL algorithm with conditional minimum description length (CMDL) criterion

4. PAT-NBL algorithm with conditional Akaike information criterion (CAIC, a conditional version of the Akaike information criterion; Akaike, 1973), represented as:

$$\mathrm{CAIC}(B \mid D) = -\mathrm{CLL}(B \mid D) + \mathrm{size}(B)$$
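As a rough illustration of how such a criterion trades goodness of fit against model size, the sketch below computes CLL and CAIC from a classifier's predicted class probabilities. The function names (`conditional_log_likelihood`, `caic`), the use of the natural logarithm, the toy probabilities, and the convention that a lower score is preferred are assumptions made for illustration; `size` stands in for the parameter count of the classifier.

```python
import math

def conditional_log_likelihood(probs, labels):
    """Sum of log P(true class | instance) over the data set.

    probs  : list of dicts mapping class label -> predicted probability
    labels : list of true class labels, aligned with probs
    """
    return sum(math.log(p[y]) for p, y in zip(probs, labels))

def caic(cll, size):
    """CAIC(B|D) = -CLL(B|D) + size(B); assumed here: lower is better."""
    return -cll + size

# Hypothetical predictions of two candidate classifiers on three instances.
labels = ["L", "R", "L"]
probs_large = [{"L": 0.9, "R": 0.1}, {"L": 0.2, "R": 0.8}, {"L": 0.7, "R": 0.3}]
probs_small = [{"L": 0.8, "R": 0.2}, {"L": 0.3, "R": 0.7}, {"L": 0.7, "R": 0.3}]

# The larger model fits slightly better but pays for its extra parameters.
score_large = caic(conditional_log_likelihood(probs_large, labels), size=27)
score_small = caic(conditional_log_likelihood(probs_small, labels), size=24)
```

In this toy case the smaller classifier obtains the lower score, because its small loss in likelihood is outweighed by the reduction in size.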

To compare the performance of the algorithms, we adapted the t-test with 10-fold cross-validation. Table 9.9 shows classifier accuracy and parameter sizes on UCI data sets for NBL on the original attributes, and PAT-NBL with CLL, CMDL, and CAIC. The results described in this section show that no single algorithm achieved the highest accuracy on all data sets. As to the sizes of the generated classifiers (a measure of compactness), PAT-NBL coupled with the PAT learner (Kang and Sohn, 2009) usually generated more concise naive Bayes classifiers. The size of a naive Bayes classifier can be measured by Equation 9.4.
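One common way to compare two learners under 10-fold cross-validation is a paired t-test on the per-fold accuracies. The sketch below shows that form; whether it matches the exact adaptation used in these experiments is an assumption, and the fold accuracies and the name `paired_t_statistic` are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(acc_a, acc_b):
    """Paired t statistic over per-fold accuracy differences."""
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    k = len(diffs)
    sd = stdev(diffs)  # sample standard deviation (k - 1 in the denominator)
    if sd == 0:
        raise ValueError("all fold differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(k))

# Hypothetical per-fold accuracies (%) for two learners on the same 10 folds.
nbl     = [70.0, 72.0, 69.0, 71.0, 73.0, 70.0, 68.0, 72.0, 71.0, 71.0]
pat_nbl = [75.0, 76.0, 73.0, 74.0, 77.0, 75.0, 74.0, 76.0, 75.0, 77.0]

t = paired_t_statistic(pat_nbl, nbl)

# Two-sided 5% critical value of Student's t with 9 degrees of freedom.
T_CRIT = 2.262
significant = abs(t) > T_CRIT
```

Pairing by fold matters because both learners see the same train/test splits, so fold-to-fold variation cancels in the differences.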

Figure9.24 illustrates one of the generated propositionalized attribute taxono-

mies of the UCI Balance Scale Weight and Distance data set using PAT learner (Kang

and Sohn, 2009). After the original data set (the relation is shown in Figure9.24(a))

is propositionalized (Figure9.24(b)), each propositionalized attribute is binary (true

or false). Using the class-conditional distribution of each attribute when it is true,

we can nd a pair of attributes whose similarities are maximum (or whose diver-

gences are minimum). Based on the divergence measure, we repeated hierarchical

agglomerative clustering to generate the PAT shown in Figure9.24(c).
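The procedure described above (propositionalize, estimate a class distribution for each binary attribute, merge the closest pair, repeat) can be sketched as follows. This is a minimal illustration and not the PAT learner of Kang and Sohn (2009) itself: the function names, the add-one smoothing, the choice of Jensen–Shannon divergence as the dissimilarity, the treatment of a merged node as the disjunction of its members, and the toy data are all assumptions.

```python
import math
from collections import Counter
from itertools import combinations

def propositionalize(rows, attributes):
    """Map each row to the set of binary 'attribute=value' propositions it makes true."""
    return [{f"{a}={row[a]}" for a in attributes} for row in rows]

def class_distribution(prop_rows, labels, members, classes):
    """P(class | some proposition in `members` is true), with add-one smoothing."""
    counts = Counter(y for props, y in zip(prop_rows, labels) if props & members)
    total = sum(counts.values()) + len(classes)
    return [(counts[c] + 1) / total for c in classes]

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    kl = lambda x, y: sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def learn_taxonomy(rows, labels, attributes):
    """Greedy bottom-up merging; returns the merged pairs in merge order."""
    prop_rows = propositionalize(rows, attributes)
    classes = sorted(set(labels))
    clusters = [frozenset([p]) for p in sorted(set().union(*prop_rows))]
    merges = []
    while len(clusters) > 1:
        dist = {c: class_distribution(prop_rows, labels, c, classes) for c in clusters}
        # Pick the pair of nodes with the smallest divergence.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: js_divergence(dist[pair[0]], dist[pair[1]]))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((a, b))
    return merges

# Hypothetical toy data: weights 1 and 2 behave alike, weight 5 differs.
rows = [{"weight": "1"}, {"weight": "1"}, {"weight": "2"},
        {"weight": "2"}, {"weight": "5"}, {"weight": "5"}]
labels = ["L", "L", "L", "L", "R", "R"]
merges = learn_taxonomy(rows, labels, ["weight"])
```

On this toy input the first merge joins `weight=1` and `weight=2`, whose class distributions are identical, and the second merge attaches `weight=5` at the root.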


Table9.9 Accuracy and Parameter Size of NBL on Original Data Sets and PAT-NBL with CLL, CMDL, and CAIC on UCI

Data Sets

Data

NBL (Original) PAT-NBL (CLL) PAT-NBL (CMDL) PAT-NBL (CAIC)

Accuracy Size Accuracy Size Accuracy Size Accuracy Size

Anneal 96.66±1.18 768 89.87±1.97 54 89.87±1.97 54 89.87±1.97 54

Autos 71.71±6.17 798 66.83±6.45 791 53.17±6.83 231 55.12±6.81 252

Balance-scale 70.72±3.57 27 75.20±3.39 24 75.20±3.39 24 75.20±3.39 24

Breast-cancer 71.68±5.22 104 73.08±5.14 102 72.73±5.16 66 72.73±5.16 66

Breast-w 97.00±1.27 60 97.28±1.21 58 97.28±1.21 58 97.28±1.21 58

Dermatology 97.81±1.50 906 98.09±1.40 900 98.36±1.30 564 98.09±1.40 582

Heart-statlog 83.33±4.45 46 84.07±4.36 44 84.07±4.36 44 84.07±4.36 44

Hepatitis 85.16±5.60 74 84.52±5.70 72 85.16±5.60 54 83.87±5.79 60

Hypothyroid 98.62±0.37 272 97.91±0.46 268 97.91±0.46 268 97.91±0.46 268

Ionosphere 90.60±3.05 292 89.46±3.21 290 92.31±2.79 110 92.02±2.83 112

Kr-vs-kp 87.89±1.13 150 85.01±1.24 148 77.72±1.44 96 81.88±1.34 100

Labor 91.23±7.34 72 92.98±6.63 70 89.47±7.97 48 89.47±7.97 48

Mushroom 95.83±0.43 252 94.25±0.51 250 96.66±0.39 156 94.76±0.48 182

Segment 91.52±1.14 1204 91.04±1.16 1197 88.83±1.28 651 88.83±1.28 658
