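\[
\text{Correlation coefficient} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FN)(TP + FP)(TN + FP)(TN + FN)}}
\]

\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\]

\[
\text{Sensitivity} = \frac{TP}{TP + FN}
\]

\[
\text{Specificity} = \frac{TP}{TP + FP}
\]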
TP is the number of true positives, FP is the number of false positives, TN is the
number of true negatives, and FN is the number of false negatives. Figure 9.19
shows the amino acid taxonomy constructed for the prokaryotic protein sequences.
Table 9.5 shows the results for the two protein sequence data sets. For both data sets, the
classifiers generated by WTNBL were more concise and more accurate
than the classifier generated by NBL on the measures reported.
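As a minimal sketch, the four measures can be computed directly from the confusion counts TP, TN, FP, and FN. The function name `confusion_metrics` and the choice to return 0.0 for an empty denominator are assumptions made for illustration. Specificity is taken here as TP / (TP + FP), on the assumption that the TP-based form is the one intended; some texts instead define it as TN / (TN + FP).

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Compute the four evaluation measures from confusion-matrix counts.

    Hypothetical helper for illustration; returns 0.0 whenever a
    denominator is zero (an assumed convention, not from the text).
    """
    total = tp + tn + fp + fn
    # Denominator of the correlation coefficient (square root of the
    # product of the four marginal sums).
    cc_den = math.sqrt((tp + fn) * (tp + fp) * (tn + fp) * (tn + fn))
    return {
        "correlation_coefficient": (tp * tn - fp * fn) / cc_den if cc_den else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Assumption: specificity = TP / (TP + FP), not TN / (TN + FP).
        "specificity": tp / (tp + fp) if (tp + fp) else 0.0,
    }


if __name__ == "__main__":
    # Made-up counts, used only to exercise the function.
    for name, value in confusion_metrics(tp=40, tn=45, fp=5, fn=10).items():
        print(f"{name}: {value:.4f}")
```

The counts in the usage lines are invented and do not correspond to any entry in Table 9.5.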
9.4.3 Experiments for PAT
9.4.3.1 Experimental Results from PAT-DTL
In this section, we explore certain performance issues of the proposed algorithms
through various experimental settings: (1) performance of PAT-DTL compared
with that of the C4.5 decision tree learner to see whether taxonomies (as ontologies)
can help the algorithm to improve its performance; (2) dissimilarity measures for
comparing two probability distributions to see whether the algorithm assesses
taxonomies from different disciplines; and (3) comprehensibility of the hypothesis to
see whether humans can comprehend the generated hypothesis.
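To make item (2) concrete, the sketch below computes one widely used dissimilarity measure between two discrete probability distributions, the Jensen-Shannon divergence. Its use here is an assumption for illustration only; this passage does not state which measures were evaluated, and the function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing; q_i must be positive
    wherever p_i is positive.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions of equal length.

    Symmetric, and bounded between 0 and 1 when logarithms are base 2.
    """
    # Mixture distribution; positive wherever p or q is positive,
    # so the KL terms below are always well defined.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    # Two made-up class distributions over three class labels.
    print(js_divergence([0.7, 0.2, 0.1], [0.1, 0.3, 0.6]))
```

A symmetric, bounded measure is convenient when comparing class distributions conditioned on different attribute values, since neither distribution has to be treated as the reference.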
Comparison with C4.5 decision tree learner: We conducted experiments
on 37 data sets from the UCI Machine Learning Repository (Blake and Merz,
1998). We tested four settings: (1) the C4.5 (Quinlan, 1993) decision tree learner on
the original attributes, (2) the C4.5 decision tree learner on propositionalized
attributes, (3) PAT-DTL with abstraction, and (4) PAT-DTL with refinement. Ten-fold
cross-validation was used for evaluation. Taxonomies were generated using the PAT
learner, and a decision tree was constructed using PAT-DTL on the resulting PAT
and data.
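The evaluation protocol described above can be sketched as follows. This is not the authors' experimental code: `propositionalize` assumes that propositionalization turns each attribute-value pair into a Boolean attribute, `majority_learner` is a stand-in for a real learner such as C4.5 or PAT-DTL, and all function names are hypothetical.

```python
import random


def propositionalize(rows):
    """Map nominal rows (dicts) to Boolean 'attribute=value' features.

    Assumption: each attribute-value pair seen in the data becomes
    one Boolean attribute.
    """
    pairs = sorted({(a, v) for row in rows for a, v in row.items()})
    return [{f"{a}={v}": row.get(a) == v for a, v in pairs} for row in rows]


def k_fold_indices(n, k=10, seed=0):
    """Shuffle range(n) and split it into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(rows, labels, train_fn, k=10, seed=0):
    """Mean held-out accuracy over k folds.

    train_fn(train_rows, train_labels) must return a predict(row) callable.
    """
    accuracies = []
    for held_out in k_fold_indices(len(rows), k, seed):
        if not held_out:  # possible when there are fewer rows than folds
            continue
        test_set = set(held_out)
        train_idx = [i for i in range(len(rows)) if i not in test_set]
        predict = train_fn([rows[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        correct = sum(predict(rows[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / len(accuracies)


def majority_learner(train_rows, train_labels):
    """Stand-in learner: always predicts the most frequent training label."""
    majority = max(set(train_labels), key=train_labels.count)
    return lambda row: majority


if __name__ == "__main__":
    # Tiny made-up data set, used only to exercise the functions.
    rows = [{"color": c} for c in ["red", "blue"] * 10]
    labels = ["yes" if r["color"] == "red" else "no" for r in rows]
    print(cross_validate(propositionalize(rows), labels, majority_learner))
```

Running each of the four settings through the same `cross_validate` call with the same seed keeps the folds identical across settings, so accuracy differences reflect the learner and attribute representation instead of the split.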
The results of the experiments indicate that no single algorithm achieved the highest
accuracy on most of the data sets. Table 9.6 shows classifier accuracy and tree size on
UCI data sets for C4.5 decision tree learner on the original attributes, C4.5 decision