Data-Driven Evaluation of Ontologies ◾ 255
understanding of the interpretation of the taxonomy and the generated decision
tree, we show the Cleveland Clinic Foundation’s Heart Disease (heart-c) data set
from the UCI repository. The decision trees generated for the heart-c data set
by C4.5 and PAT-DTL were shown in Figure 9.20. Figure 9.22 shows the
attribute–value relationships for a subset of the attributes in heart-c. The data set has 13
attributes (age, sex, cp, trestbps, chol, fbs, restecg, thalach, exang, oldpeak, slope, ca, and
thal). The attribute–value relationships and the taxonomy for all 13 attributes
cannot be shown clearly on one page, so we present 4 of them (cp, thalach, ca,
and thal), chosen based on the results in Figure 9.20(b).
Figure 9.22(a) shows the original attribute–value relationships for this subset of the
heart-c data set. After propositionalization (Figure 9.22(b)), each attribute value
becomes a separate binary attribute. If an instance has that value for the original
attribute, the propositionalized attribute is 1; otherwise it is 0. Figure 9.23 shows the taxonomy
generated from the propositionalized attributes of heart-c shown in Figure 9.22(b).
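The propositionalization step described above can be illustrated with a minimal sketch. The function name `propositionalize`, the `attribute=value` naming scheme, and the three sample rows are assumptions made for illustration only; they are not taken from the PAT-DTL implementation or from Figure 9.22.

```python
def propositionalize(rows, attributes):
    """Turn every observed attribute value into its own 0/1 attribute.

    rows       -- list of dicts mapping attribute name -> nominal value
    attributes -- attribute names to expand (hypothetical helper signature)
    """
    # Collect the distinct values observed for each attribute.
    values = {a: sorted({row[a] for row in rows}) for a in attributes}
    # One binary attribute per (attribute, value) pair.
    names = [f"{a}={v}" for a in attributes for v in values[a]]
    encoded = [
        {f"{a}={v}": int(row[a] == v) for a in attributes for v in values[a]}
        for row in rows
    ]
    return names, encoded


# Illustrative rows only; the values are placeholders, not real heart-c records.
rows = [
    {"cp": "asympt", "thal": "normal"},
    {"cp": "typ_angina", "thal": "fixed_defect"},
    {"cp": "asympt", "thal": "reversable_defect"},
]
names, encoded = propositionalize(rows, ["cp", "thal"])
for record in encoded:
    print([record[n] for n in names])
```

Each instance ends up with exactly one 1 per original attribute, which is what lets a taxonomy be built over the resulting binary attributes.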
Table 9.8 (Continued) Accuracy and Tree Size of PAT-DTL with Refinement
Coupled with Divergences* on Selected UCI Data Sets

Data             PAT-DTL(JKL)          PAT-DTL(JS)           PAT-DTL(AGM)
                 Accuracy      Size    Accuracy      Size    Accuracy      Size
Nursery          66.25±0.81       3    66.25±0.81       3    70.97±0.78       5
Primary-tumor    33.63±5.03       9    38.64±5.18      19    30.68±4.91       1
Segment          84.07±1.49      23    78.10±1.69      21    87.10±1.37      41
Sick             96.85±0.56       3    97.61±0.49       7    97.14±0.53       3
Sonar            75.48±5.85       3    76.44±5.77       3    76.44±5.77      13
Soybean          87.41±2.49      63    70.72±3.41      31    68.08±3.50      53
Splice           80.31±1.38      17    87.93±1.13      55    67.67±1.62       5
Vehicle          68.79±3.12      35    62.41±3.26      33    66.43±3.18      17
Vote             95.63±1.92       3    95.63±1.92       3    95.63±1.92       3
Vowel            55.76±3.09     133    46.46±3.11     135    49.90±3.11     185
Waveform-5000    79.88±1.11      63    68.76±1.28      37    71.38±1.25      39
Zoo              92.08±5.27      15    85.15±6.94       9    85.15±6.94      13
# of wins        19              20    11              24    14              21
* Error rates estimated using 10-fold cross-validation with 95% confidence
intervals.
JKL = Jeffreys–Kullback–Leibler divergence. JS = Jensen–Shannon divergence.
AGM = Arithmetic and Geometric Mean divergence.
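
For readers who want to experiment with the three divergences named in the table, the following sketch computes them for two discrete distributions. It uses the commonly cited textbook forms (JKL as the symmetrized Kullback–Leibler divergence, JS as the average KL divergence to the midpoint distribution, AGM as the expected log-ratio of the arithmetic to the geometric mean). These forms, the function names, and the sample distributions are assumptions; the chapter's own definitions are not reproduced on this page and may differ in detail. The JKL and AGM functions assume strictly positive probabilities.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; terms with p_i = 0 are skipped."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jkl(p, q):
    """Jeffreys-Kullback-Leibler (J) divergence: KL(p || q) + KL(q || p)."""
    return kl(p, q) + kl(q, p)


def js(p, q):
    """Jensen-Shannon divergence: mean KL divergence to the midpoint distribution."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def agm(p, q):
    """Arithmetic and geometric mean divergence (assumed form):
    sum of A_i * log(A_i / G_i), with A_i = (p_i + q_i) / 2 and G_i = sqrt(p_i * q_i)."""
    return sum(
        ((pi + qi) / 2) * math.log((pi + qi) / (2 * math.sqrt(pi * qi)))
        for pi, qi in zip(p, q)
    )


# Illustrative class distributions, not taken from any data set in the table.
p = [0.7, 0.2, 0.1]
q = [0.4, 0.4, 0.2]
print(jkl(p, q), js(p, q), agm(p, q))
```

All three measures are symmetric in their arguments and equal zero when the two distributions coincide, which makes them usable as dissimilarity scores between class distributions.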