256 ◾  Dae-Ki Kang
Figure9.21 Propositionalized attribute taxonomy of Balance Scale Weight and Distance (balance scale) data. Dotted lines show
original attribute–value relationship.
Data-Driven Evaluation of Ontologies ◾  257
(a) Cleveland Clinic Foundation’s Heart Disease Data
(b) Cleveland Clinic Foundation’s Heart Disease Data (propositionalized)
Figure 9.22 Attribute–value relationships for subset of attributes in Cleveland Clinic Foundations Heart Disease (heart-c)
data.
Figure9.23 Propositionalized attribute taxonomy of Cleveland Clinic Foundations Heart Disease (heart-c) data. JensenShannon
divergence measure was used to estimate dissimilarity.
The leaf nodes in Figure 9.23 are propositionalized attributes, and each internal
node represents the merging of all of its descendant leaf nodes. The taxonomy is
generated automatically by PAT learner and may not be easily readable for humans.
However, in many applications, taxonomies specified by human experts are unavailable.
Manual construction of taxonomies requires a great deal of domain expertise,
and for large data sets with many attributes and many values for each attribute,
manual generation of PATs is extremely tedious and not feasible in practice.
Given this drawback, PATs generated automatically by PAT learner are useful
in constructing concise and accurate classifiers when used with PAT-DTL.
9.4.3.2 Experimental Results from PAT-NBL
Comparison with naive Bayes learner and model selection criteria. To assess
how PAT-NBL algorithms evaluate taxonomies, we conducted experiments on data
sets from the UCI Machine Learning Repository (Blake and Merz, 1998) with the
following learning algorithm settings:
1. Naive Bayes learner
2. PAT-NBL algorithm with conditional log likelihood (CLL; Friedman et al.,
1997) criterion
3. PAT-NBL algorithm with conditional minimum description length (CMDL)
criterion
4. PAT-NBL algorithm with conditional Akaike information criterion (CAIC,
a conditional version of the Akaike information criterion; Akaike, 1973),
represented as:
\[ \mathrm{CAIC}(B \mid D) = \mathrm{CLL}(B \mid D) + \mathrm{size}(B) \]
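The criterion in item 4 can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the chapter: a categorical naive Bayes model with Laplace smoothing, a size term counted as one parameter per class prior and per (class, attribute value) pair as a stand-in for Equation 9.4, and a sign convention in which the log likelihood is negated so that lower scores are better. The names `train_naive_bayes`, `conditional_log_likelihood`, `model_size`, and `caic_score` are hypothetical, and the toy data are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Fit a categorical naive Bayes model; Laplace smoothing (alpha) is an assumption."""
    classes = sorted(set(labels))
    n_attrs = len(rows[0])
    values = [sorted({row[j] for row in rows}) for j in range(n_attrs)]
    class_count = Counter(labels)
    # counts[j][c][v] = number of rows of class c whose attribute j has value v
    counts = [defaultdict(Counter) for _ in range(n_attrs)]
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            counts[j][c][v] += 1
    prior = {c: (class_count[c] + alpha) / (len(labels) + alpha * len(classes))
             for c in classes}
    cond = [{c: {v: (counts[j][c][v] + alpha)
                    / (class_count[c] + alpha * len(values[j]))
                 for v in values[j]}
             for c in classes}
            for j in range(n_attrs)]
    return {"classes": classes, "values": values, "prior": prior, "cond": cond}


def conditional_log_likelihood(model, rows, labels):
    """Sum of log P(class | attributes) over the data (natural log, never positive)."""
    total = 0.0
    for row, c in zip(rows, labels):
        joint = {k: math.log(model["prior"][k])
                    + sum(math.log(model["cond"][j][k][v]) for j, v in enumerate(row))
                 for k in model["classes"]}
        top = max(joint.values())  # log-sum-exp for numerical stability
        log_evidence = top + math.log(sum(math.exp(s - top) for s in joint.values()))
        total += joint[c] - log_evidence
    return total


def model_size(model):
    """Hypothetical size: one parameter per class prior and per (class, value) pair."""
    n_classes = len(model["classes"])
    return n_classes + n_classes * sum(len(vals) for vals in model["values"])


def caic_score(model, rows, labels):
    """Penalized score; negating the log likelihood (lower is better) is an assumption."""
    return -conditional_log_likelihood(model, rows, labels) + model_size(model)


# Toy data, invented for illustration only.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["p", "p", "n", "n"]
model = train_naive_bayes(rows, labels)
cll = conditional_log_likelihood(model, rows, labels)
score = caic_score(model, rows, labels)
```

Under these assumptions, a taxonomy cut that merges attribute values shrinks `model_size` and is preferred whenever the resulting loss in conditional log likelihood is smaller than the reduction in size.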
To compare the performance of the algorithms, we adapted the t-test with 10-fold
cross-validation. Table 9.9 shows classifier accuracy and parameter sizes on UCI data sets
for NBL on the original attributes, and for PAT-NBL with CLL, CMDL, and CAIC.
The results show that no single algorithm achieved the highest accuracy on all
data sets. As to the sizes of the generated classifiers (a measure of compactness),
PAT-NBL coupled with PAT learner (Kang and Sohn, 2009) usually generated more
concise naive Bayes classifiers. The size of a naive Bayes classifier can be
measured by Equation 9.4.
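As a rough illustration of comparing two learners over cross-validation folds, the sketch below computes a standard paired t statistic from per-fold accuracies. The exact variant of the test used in the experiments is not specified, so the plain paired form is an assumption, and the function name `paired_t_statistic` is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(acc_a, acc_b):
    """Return (t, degrees_of_freedom) for paired per-fold accuracies of two learners."""
    if len(acc_a) != len(acc_b) or len(acc_a) < 2:
        raise ValueError("need two equally long lists with at least two folds")
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    k = len(diffs)
    sd = stdev(diffs)  # sample standard deviation of the fold-wise differences
    if sd == 0.0:
        raise ValueError("all fold differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(k))
    return t, k - 1


# Made-up numbers chosen only to exercise the function (four folds).
t_value, dof = paired_t_statistic([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0])
```

With 10 folds there are 9 degrees of freedom, so a two-sided test at the 5% level would treat the difference as significant when |t| exceeds about 2.262.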
Figure9.24 illustrates one of the generated propositionalized attribute taxono-
mies of the UCI Balance Scale Weight and Distance data set using PAT learner (Kang
and Sohn, 2009). After the original data set (the relation is shown in Figure9.24(a))
is propositionalized (Figure9.24(b)), each propositionalized attribute is binary (true
or false). Using the class-conditional distribution of each attribute when it is true,
we can nd a pair of attributes whose similarities are maximum (or whose diver-
gences are minimum). Based on the divergence measure, we repeated hierarchical
agglomerative clustering to generate the PAT shown in Figure9.24(c).
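The procedure just described can be sketched as follows. This is an illustrative approximation, not the PAT learner of Kang and Sohn (2009): the function names are hypothetical, the Jensen-Shannon divergence is used as the dissimilarity, and the class counts of a merged node are simply summed, which is a simplifying assumption.

```python
import math
from collections import Counter
from itertools import combinations


def propositionalize(rows, attr_names):
    """Map each row to the set of 'attribute=value' propositions that are true for it."""
    return [{f"{a}={v}" for a, v in zip(attr_names, row)} for row in rows]


def class_counts(prop_rows, labels):
    """For each proposition, count the class labels of the rows where it is true."""
    counts = {}
    for props, label in zip(prop_rows, labels):
        for p in props:
            counts.setdefault(p, Counter())[label] += 1
    return counts


def js_divergence(count_a, count_b, classes):
    """Jensen-Shannon divergence (base 2) between two class-count vectors."""
    n_a, n_b = sum(count_a.values()), sum(count_b.values())
    p = [count_a[c] / n_a for c in classes]
    q = [count_b[c] / n_b for c in classes]
    m = [(x + y) / 2 for x, y in zip(p, q)]

    def kl(u, w):
        return sum(x * math.log2(x / y) for x, y in zip(u, w) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def learn_pat(rows, labels, attr_names):
    """Agglomeratively merge the two least divergent nodes until one root remains.

    Leaves are proposition strings; an internal node is a tuple (left, right).
    """
    classes = sorted(set(labels))
    clusters = class_counts(propositionalize(rows, attr_names), labels)
    while len(clusters) > 1:
        # Sort keys by repr so ties are broken deterministically.
        a, b = min(
            combinations(sorted(clusters, key=repr), 2),
            key=lambda pair: js_divergence(clusters[pair[0]], clusters[pair[1]], classes),
        )
        # Assumption: the merged node's class counts are the sum of its children's.
        clusters[(a, b)] = clusters.pop(a) + clusters.pop(b)
    (root,) = clusters
    return root


# Toy data, invented for illustration: values a and b behave identically, c differs.
toy_rows = [("a",), ("a",), ("b",), ("b",), ("c",), ("c",)]
toy_labels = ["L", "L", "L", "L", "R", "R"]
pat_root = learn_pat(toy_rows, toy_labels, ["w"])
```

In the toy data, `w=a` and `w=b` have the same class distribution, so they are merged first and `w=c` joins only at the root.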
Table9.9 Accuracy and Parameter Size of NBL on Original Data Sets and PAT-NBL with CLL, CMDL, and CAIC on UCI
Data Sets
Data
NBL (Original) PAT-NBL (CLL) PAT-NBL (CMDL) PAT-NBL (CAIC)
Accuracy Size Accuracy Size Accuracy Size Accuracy Size
Anneal 96.66±1.18 768 89.87±1.97 54 89.87±1.97 54 89.87±1.97 54
Autos 71.71±6.17 798 66.83±6.45 791 53.17±6.83 231 55.12±6.81 252
Balance-scale 70.72±3.57 27 75.20±3.39 24 75.20±3.39 24 75.20±3.39 24
Breast-cancer 71.68±5.22 104 73.08±5.14 102 72.73±5.16 66 72.73±5.16 66
Breast-w 97.00±1.27 60 97.28±1.21 58 97.28±1.21 58 97.28±1.21 58
Dermatology 97.81±1.50 906 98.09±1.40 900 98.36±1.30 564 98.09±1.40 582
Heart-statlog 83.33±4.45 46 84.07±4.36 44 84.07±4.36 44 84.07±4.36 44
Hepatitis 85.16±5.60 74 84.52±5.70 72 85.16±5.60 54 83.87±5.79 60
Hypothyroid 98.62±0.37 272 97.91±0.46 268 97.91±0.46 268 97.91±0.46 268
Ionosphere 90.60±3.05 292 89.46±3.21 290 92.31±2.79 110 92.02±2.83 112
Kr-vs-kp 87.89±1.13 150 85.01±1.24 148 77.72±1.44 96 81.88±1.34 100
Labor 91.23±7.34 72 92.98±6.63 70 89.47±7.97 48 89.47±7.97 48
Mushroom 95.83±0.43 252 94.25±0.51 250 96.66±0.39 156 94.76±0.48 182
Segment 91.52±1.14 1204 91.04±1.16 1197 88.83±1.28 651 88.83±1.28 658