Chapter 9: Data-Driven Evaluation of Ontologies Using Machine Learning Algorithms (9/13)


Data-Driven Evaluation of Ontologies ◾ 251

Figure 9.20(b) Decision tree learned by C4.5 and PAT-DTL (with specialization) for Cleveland Clinic Foundation’s Heart Disease (heart-c) data. Panel (b): PAT-DTL with refinement (accuracy = 82.84 ± 0.06, tree size = 5).

[Figure not reproduced. Node tests: (ca > 0.5), (thal = reversable_defect), (thalach ≤ 147.5), (cp = asympt). Leaves: <50 (103.0/14.0), <50 (38.0/11.0), >50_1 (102.0/16.0). Branch labels: False, True.]

252 ◾ Dae-Ki Kang

Table 9.7 Running Times (Minutes:Seconds) of DTL (C4.5 Decision Tree Learner) for Original and Propositionalized Data and PAT-DTL with Abstraction and Refinement on UCI Data Sets

Data            DTL (Original)  DTL (Propositionalized)  PAT-DTL (Abstraction)  PAT-DTL (Refinement)
Anneal          00:01.69        00:02.91                 22:55.76               03:07.08
Audiology       00:01.60        00:02.24                 18:31.92               00:27.70
Autos           00:01.29        00:01.81                 10:45.95               00:24.68
Balance-scale   00:00.95        00:00.85                 00:06.30               00:04.86
Breast-cancer   00:01.06        00:06.90                 04:27.15               00:40.29
Breast-w        00:01.20        00:01.13                 01:23.71               00:13.82
Car             00:01.31        00:01.75                 01:12.98               00:27.00
Colic           00:01.05        00:01.35                 05:54.09               00:17.09
Credit-a        00:01.09        00:01.83                 04:36.33               00:18.07
Credit-g        00:01.49        00:03.60                 16:53.16               00:30.69
Dermatology     00:01.05        00:01.42                 32:59.23               00:31.77
Diabetes        00:00.97        00:01.13                 00:34.93               00:05.18
Glass           00:00.92        00:00.98                 00:13.50               00:04.14
Heart-c         00:01.24        00:01.16                 00:30.92               00:05.70
Heart-h         00:01.48        00:01.66                 00:26.08               00:07.10
Heart-statlog   00:01.17        00:01.43                 00:24.71               00:08.94
Hepatitis       00:01.05        00:01.04                 00:43.84               00:06.34
Hypothyroid     00:03.26        00:05.63                 33:05.69               08:54.18
Ionosphere      00:01.48        00:02.22                 60:29.00               01:05.04
Iris            00:00.84        00:00.99                 00:03.76               00:03.64
Kr-vs-kp        00:04.09        00:05.52                 60:56.00               02:21.48
Labor           00:00.85        00:00.87                 00:13.60               00:02.96
Letter          00:14.10        04:20.12                 840:57.00              420:00.12
Lymph           00:00.80        00:00.86                 00:49.24               00:03.65
Mushroom        00:02.84        00:22.05                 563:58.00              11:07.54


Examination of the results of experiments shown in Table 9.8 indicates that all three divergence measures (JKL, JS, and AGM) yielded PATs that, when used by PAT-NBL, produced classifiers with similar accuracy. Of the 13 divergence measures we tested, Hellinger discrimination (Topsøe, 2000), symmetric diversion, and triangular discrimination (Topsøe, 2000) showed similar performances. Thus, the PAT learner appears able to use a broad class of measures of similarity of attribute values based on class distributions associated with the respective values to generate PATs that are useful for constructing compact and accurate classifiers from data.
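To make the comparison of attribute values by their class distributions concrete, the following is a minimal sketch of the three divergences named in Table 9.8. The formulas are the standard textbook definitions, which this excerpt does not spell out, so treat them as an assumption about what the chapter uses. The function names, the Laplace smoothing, and the class counts are hypothetical and serve only as illustration.

```python
import math

def class_distribution(counts, alpha=1.0):
    """Turn per-class counts into a Laplace-smoothed probability vector.

    Smoothing is an assumption made here so that no probability is zero,
    which the JKL and AGM formulas below require.
    """
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jkl(p, q):
    """Jeffreys-Kullback-Leibler divergence: KL(p || q) + KL(q || p)."""
    return kl(p, q) + kl(q, p)

def js(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def agm(p, q):
    """Arithmetic and geometric mean divergence (needs strictly positive p and q)."""
    return sum(
        (pi + qi) / 2 * math.log((pi + qi) / (2 * math.sqrt(pi * qi)))
        for pi, qi in zip(p, q)
    )

# Hypothetical two-class counts observed together with two attribute values.
p = class_distribution([30, 10])
q = class_distribution([28, 12])
for name, divergence in [("JKL", jkl), ("JS", js), ("AGM", agm)]:
    print(name, round(divergence(p, q), 6))
```

All three functions are symmetric in their arguments and return zero for identical distributions, so a small value indicates two attribute values that behave alike with respect to the class label.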

Comprehensibility of hypothesis: Figure 9.21 shows an example of PAT for the subset of attributes in the UCI repository’s Balance Scale Weight and Distance Database (balance scale). The leaf nodes (gray boxes) correspond to the original attribute values of the balance scale data set and the dotted lines show the original attribute–value relationships. After propositionalization, each leaf node is treated as an attribute. The solid lines represent ISA relationships; therefore the nodes with solid lines represent a taxonomy. If we remove all the dotted lines and nodes inside the dotted box in the figure, we can see the taxonomy more clearly.
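The structure described above, leaves for original attribute values linked by ISA edges to more abstract nodes, can be sketched as a child-to-parent map. This is only an illustration: the attribute name `left-weight`, the abstract node names, and the grouping itself are hypothetical and are not taken from Figure 9.21.

```python
# Hypothetical ISA edges (child -> parent). Leaves are propositionalized
# attribute values; "LW-low", "LW-high" and "root" are made-up abstract nodes.
ISA = {
    "left-weight=1": "LW-low",
    "left-weight=2": "LW-low",
    "left-weight=3": "LW-high",
    "left-weight=4": "LW-high",
    "left-weight=5": "LW-high",
    "LW-low": "root",
    "LW-high": "root",
}

def ancestors(node):
    """Follow ISA edges upward and return the chain of more abstract nodes."""
    chain = []
    while node in ISA:
        node = ISA[node]
        chain.append(node)
    return chain

def leaves_under(node):
    """Return the leaf (original attribute value) nodes covered by a node."""
    children = [child for child, parent in ISA.items() if parent == node]
    if not children:
        return [node]
    covered = []
    for child in children:
        covered.extend(leaves_under(child))
    return covered

print(ancestors("left-weight=2"))
print(leaves_under("LW-high"))
```

Storing only the solid-line (ISA) edges mirrors the remark that the taxonomy is easier to see once the dotted attribute–value links are removed.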

The balance scale data set has only four attributes, but most data sets have many more attributes and their taxonomies are too big to fit on one page. To aid

Table 9.7 (Continued) Running Times (Minutes:Seconds) of DTL (C4.5 Decision Tree Learner) for Original and Propositionalized Data and PAT-DTL with Abstraction and Refinement on UCI Data Sets

Data            DTL (Original)  DTL (Propositionalized)  PAT-DTL (Abstraction)  PAT-DTL (Refinement)
Nursery         00:05.09        00:12.90                 21:03.24               00:47.45
Primary-tumor   00:01.54        00:01.76                 01:18.51               00:09.23
Segment         00:01.91        00:13.05                 04:38.55               02:55.53
Sick            00:04.58        00:07.67                 43:01.48               02:53.22
Sonar           00:01.06        00:01.09                 03:53.15               00:19.08
Soybean         00:01.70        00:02.48                 10:16.57               04:15.50
Splice          00:01.50        00:05.77                 803:08.00              22:07.00
Vehicle         00:00.77        00:01.12                 09:38.08               01:27.03
Vote            00:00.65        00:00.47                 00:17.29               00:03.47
Vowel           00:00.63        00:01.69                 05:00.22               01:12.13
Waveform-5000   00:02.22        00:17.27                 917:01.00              55:23.00
Zoo             00:00.46        00:00.42                 00:04.47               00:02.63
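Because the running times are given as Minutes:Seconds strings, comparing columns takes a small conversion step. The sketch below assumes the format is minutes, a colon, then seconds with a decimal fraction. The helper names `to_seconds` and `time_ratio` are hypothetical, and the two rows used are copied from the Nursery and Zoo entries of Table 9.7.

```python
def to_seconds(stamp):
    """Convert a 'MM:SS.ss' string from Table 9.7 into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)

def time_ratio(abstraction, refinement):
    """Ratio of PAT-DTL (Abstraction) time to PAT-DTL (Refinement) time."""
    return to_seconds(abstraction) / to_seconds(refinement)

# PAT-DTL (Abstraction) and PAT-DTL (Refinement) entries from Table 9.7.
nursery = time_ratio("21:03.24", "00:47.45")
zoo = time_ratio("00:04.47", "00:02.63")
print(f"Nursery: {nursery:.1f}x, Zoo: {zoo:.1f}x")
```

A ratio above 1 means the abstraction variant took longer than the refinement variant on that data set.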


Table 9.8 Accuracy and Tree Size of PAT-DTL with Refinement Coupled with Divergences* on Selected UCI Data Sets

                PAT-DTL(JKL)        PAT-DTL(JS)         PAT-DTL(AGM)
Data            Accuracy     Size   Accuracy     Size   Accuracy     Size
Anneal          92.43±1.73   5      90.31±1.93   11     76.17±2.79   1
Audiology       47.35±6.51   7      46.46±6.50   3      46.46±6.50   3
Autos           71.22±6.20   13     44.88±6.81   3      45.37±6.82   9
Balance-scale   72.80±3.49   7      73.92±3.44   11     74.40±3.42   7
Breast-cancer   70.28±5.30   1      73.43±5.12   7      69.23±5.35   5
Breast-w        97.14±1.24   5      96.85±1.29   5      97.14±1.24   5
Car             81.13±1.84   7      85.47±1.66   23     94.16±1.11   61
Colic           85.22±3.63   3      86.41±3.50   3      86.41±3.50   3
Credit-a        85.22±2.65   3      85.36±2.64   3      84.93±2.67   5
Credit-g        73.50±2.74   7      73.00±2.75   5      73.80±2.73   15
Dermatology     36.89±4.94   5      30.60±4.72   1      30.60±4.72   1
Diabetes        78.39±2.91   5      73.70±3.11   3      73.70±3.11   3
Glass           71.96±6.02   19     55.14±6.66   9      63.08±6.47   13
Heart-c         83.50±4.18   5      82.84±4.25   5      83.50±4.18   5
Heart-h         84.01±4.19   5      82.65±4.33   3      82.99±4.29   3
Heart-statlog   80.00±4.77   5      82.22±4.56   5      82.96±4.48   5
Hepatitis       83.23±5.88   5      83.23±5.88   7      83.23±5.88   5
Hypothyroid     99.13±0.30   11     97.00±0.54   7      97.22±0.52   7
Ionosphere      96.01±2.05   3      92.88±2.69   3      92.31±2.79   3
Iris            94.00±3.80   5      88.67±5.07   5      96.00±3.14   5
Kr-vs-kp        66.05±1.64   3      72.84±1.54   5      85.83±1.21   5
Labor           87.72±8.52   3      89.47±7.97   3      85.96±9.02   5
Letter          55.91±0.56   527    68.30±0.64   2047   67.22±0.65   3261
Lymph           53.38±8.04   3      77.03±6.78   7      73.65±7.10   3
Mushroom        99.70±0.12   3      98.52±0.26   3      99.41±0.17   3


understanding of the interpretation of the taxonomy and the generated decision tree, we show the Cleveland Clinic Foundation’s Heart Disease (heart-c) data set from the UCI repository. We already showed the decision trees of the heart-c data set generated by C4.5 and PAT-DTL in Figure 9.20. Figure 9.22 shows the attribute–value relationships for the subset of attributes in heart-c. The data set has 13 attributes (age, sex, cp, trestbps, chol, fbs, restecg, thalach, exang, oldpeak, slope, ca, and thal). It is hard to show clearly the attribute–value relationships and the taxonomy for the original data set with 13 attributes on a page, so we choose 4 (cp, thalach, ca, and thal) from the 13 for presentation based on the results in Figure 9.20(b).

Figure 9.22(a) shows the original attribute–value relationships of the selected subset of the heart-c data set. After propositionalization (Figure 9.22(b)), each attribute value is treated as a binary attribute taking the value 0 or 1. If an instance has a given value for the original attribute, the corresponding propositionalized attribute takes the value 1. Figure 9.23 shows the taxonomy generated from the propositionalized attributes of heart-c shown in Figure 9.22(b).
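The propositionalization step just described can be sketched as follows for two nominal heart-c attributes. Only the values `asympt` and `reversable_defect` appear in this excerpt. The remaining value names, the `DOMAINS` table, and the function `propositionalize` are assumptions made for illustration, and the numeric attributes (thalach, ca) are left out because the excerpt does not say how they are discretized.

```python
# Assumed value sets for two nominal attributes. Only "asympt" and
# "reversable_defect" are confirmed by the text; the rest are illustrative.
DOMAINS = {
    "cp": ["typ_angina", "asympt", "non_anginal", "atyp_angina"],
    "thal": ["fixed_defect", "normal", "reversable_defect"],
}

def propositionalize(instance, domains):
    """Map each attribute=value pair to a 0/1 attribute.

    The propositionalized attribute is 1 exactly when the instance has
    that value for the original attribute, and 0 otherwise.
    """
    return {
        f"{attr}={value}": int(instance.get(attr) == value)
        for attr, values in domains.items()
        for value in values
    }

row = {"cp": "asympt", "thal": "normal"}
binary = propositionalize(row, DOMAINS)
for name, bit in binary.items():
    print(name, bit)
```

Each original attribute contributes exactly one 1 among its propositionalized attributes, so the two-attribute instance above yields seven binary attributes of which two are set.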

Table 9.8 (Continued) Accuracy and Tree Size of PAT-DTL with Refinement Coupled with Divergences* on Selected UCI Data Sets

                PAT-DTL(JKL)        PAT-DTL(JS)         PAT-DTL(AGM)
Data            Accuracy     Size   Accuracy     Size   Accuracy     Size
Nursery         66.25±0.81   3      66.25±0.81   3      70.97±0.78   5
Primary-tumor   33.63±5.03   9      38.64±5.18   19     30.68±4.91   1
Segment         84.07±1.49   23     78.10±1.69   21     87.10±1.37   41
Sick            96.85±0.56   3      97.61±0.49   7      97.14±0.53   3
Sonar           75.48±5.85   3      76.44±5.77   3      76.44±5.77   13
Soybean         87.41±2.49   63     70.72±3.41   31     68.08±3.50   53
Splice          80.31±1.38   17     87.93±1.13   55     67.67±1.62   5
Vehicle         68.79±3.12   35     62.41±3.26   33     66.43±3.18   17
Vote            95.63±1.92   3      95.63±1.92   3      95.63±1.92   3
Vowel           55.76±3.09   133    46.46±3.11   135    49.90±3.11   185
Waveform-5000   79.88±1.11   63     68.76±1.28   37     71.38±1.25   39
Zoo             92.08±5.27   15     85.15±6.94   9      85.15±6.94   13
# of wins       19           20     11           24     14           21

* Error rates estimated using 10-fold cross validation with 95% confidence interval. JKL = Jeffreys–Kullback–Leibler divergence. JS = Jensen–Shannon divergence. AGM = Arithmetic and Geometric Mean divergence.
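The footnote does not say how the 95% confidence intervals were computed. The reported half-widths are consistent with the usual normal approximation to a binomial proportion, which the sketch below assumes. The function name `ci_half_width` is hypothetical, and the sample size of 150 for Iris is taken from the standard UCI data set description, not from this excerpt.

```python
import math

def ci_half_width(accuracy, n, z=1.96):
    """Half-width of a normal-approximation confidence interval.

    accuracy is a proportion in [0, 1], n is the number of test instances,
    and z = 1.96 corresponds to a 95% interval.
    """
    return z * math.sqrt(accuracy * (1.0 - accuracy) / n)

# Iris with PAT-DTL(JKL): accuracy 94.00% from Table 9.8, n = 150 assumed.
half_width = ci_half_width(0.94, 150)
print(f"94.00 ± {100 * half_width:.2f}")
```

Under these assumptions the computed half-width agrees with the ±3.80 listed for Iris in the PAT-DTL(JKL) column.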
