Decision tree algorithms are an active area of research in data mining. They model multi-stage decision processes. Because decision trees can be generated quickly and their structure is clear and comprehensible, it is easy to extract commercially valuable information that helps decision makers make sound decisions. Decision trees are therefore widely used to good effect in the military, medicine, securities and investment, exploration, and corporate governance. In medicine, a decision tree model for pancreatic cancer analysis concluded that a patient should first have an endoscopy. In securities, decision trees are applied even more widely, for example in the decision tree model used in the ISCRMS system [50] (ISCRMS is a customer intelligence analysis solution developed by applying data mining and online analytical techniques in the securities field). In the banking industry, data mining has been applied to customer classification. Decision tree models have also been widely used in company management, exploration, and other areas.

Although the application domains of decision tree models vary, the essence is the same: a decision tree algorithm is used to process and classify data within a friendly, easy-to-use data mining application environment. After datasets have been obtained for different problems, the attributes can be processed according to the DFDT processing method described in the preceding sections of this chapter, and the decision tree can be constructed and pruned. The following sections compare our DFDT with the C4.5 decision tree algorithm.

3.5.1 Comparison of algorithm execution

A decision tree algorithm consists of three steps: (1) data preprocessing, that is, data discretization (or dynamic fuzzification), (2) construction of the decision tree in accordance with the attribute selection algorithm, and (3) pruning to prevent overfitting.

The purpose of data preprocessing is to put the attributes into a form that the attribute selection algorithm can use. In the data preprocessing stage, DFDT chooses a processing method according to the situation, applying discretization or dynamic fuzzification to the attributes. The fuzzified attribute values are dynamic fuzzy numbers within the interval [(0,0),(1,1)], as presented in Tab. 3.1. A fuzzified attribute-value pair takes a form such as <Effort = high, 0.7>.
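To make the form of a fuzzified attribute-value pair concrete, the following minimal sketch maps a numerical attribute value to membership degrees for the linguistic values "high" and "low". The linear ramp membership function, the value range 0 to 10, and the function names are assumptions made for illustration only; the sketch does not reproduce the DFDT fuzzification procedure and ignores the dynamic (trend) component of a dynamic fuzzy number.

```python
def ramp(x, start, end):
    """Linearly rising membership: 0 at or below start, 1 at or above end."""
    if x <= start:
        return 0.0
    if x >= end:
        return 1.0
    return (x - start) / (end - start)


def fuzzify(attribute, value, start, end):
    """Return (attribute, linguistic value, degree) triples for 'high' and 'low'.

    The 'low' degree is taken as the complement of 'high' (an assumption).
    """
    high = ramp(value, start, end)
    return [
        (attribute, "high", round(high, 2)),
        (attribute, "low", round(1.0 - high, 2)),
    ]


# Hypothetical raw value 7.0 on an assumed 0..10 scale for the attribute Effort.
pairs = fuzzify("Effort", 7.0, 0.0, 10.0)
print(pairs)  # [('Effort', 'high', 0.7), ('Effort', 'low', 0.3)]
```

The first triple corresponds to the pair <Effort = high, 0.7> quoted above.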

As absolute clarity does not exist in the real world, it is more natural and reasonable to describe the characteristics of the instance after applying fuzzification to the attributes.

3.5.2 Comparison of training accuracy

A dataset from the UCI Machine Learning Repository was tested using DFDT and C4.5. Where the raw data included a separate test instance set, the training instance set was combined with the test instance set, and cross-validation was used to test the constructed decision tree. The number of cross-validation folds was set to 10. Information on the data is presented in Tab. 3.11.
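The 10-fold cross-validation protocol can be sketched as follows. This is a minimal illustration in plain Python, not the experimental code used for the results in this section; the function names are hypothetical, and the instances are assumed to be already shuffled, so the folds are simply contiguous index ranges.

```python
def k_fold_indices(n_instances, k=10):
    """Split indices 0..n_instances-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_instances, k)
    folds, start = [], 0
    for i in range(k):
        # The first `extra` folds receive one additional instance.
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validation_splits(n_instances, k=10):
    """Yield (train_indices, test_indices), using each fold once as the test set."""
    folds = k_fold_indices(n_instances, k)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical dataset of 25 instances.
splits = list(cross_validation_splits(25, k=10))
for train, test in splits:
    # A tree would be built on `train` and its predictions checked on `test`.
    assert not set(train) & set(test)
```

Each instance is used for testing exactly once, so the number of correctly classified instances under cross-validation is the sum of the correct counts over the 10 test folds.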

The C4.5 and DFDT algorithms were applied to the UCI dataset, and both pruned and unpruned decision trees were built. For each tree, we recorded the number of leaf nodes, the size of the tree, the number of correctly classified instances on the training instance set, and the number of correctly classified instances under cross-validation. Tab. 3.12 presents the results obtained using the C4.5 algorithm for the UCI dataset, and Tab. 3.13 presents those from the DFDT algorithm.
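The recorded quantities can be computed from a tree in a few lines. The sketch below assumes a hypothetical tree representation (a nested dictionary for internal nodes and a plain class label for leaves) and a toy tree with made-up attributes and labels; it is not the data structure used by C4.5 or DFDT, and tree size is taken here to mean the total number of nodes.

```python
def count_leaves(tree):
    """Number of leaf nodes; a leaf is any non-dict value (a class label)."""
    if not isinstance(tree, dict):
        return 1
    return sum(count_leaves(child) for child in tree["children"].values())


def tree_size(tree):
    """Total number of nodes, internal and leaf."""
    if not isinstance(tree, dict):
        return 1
    return 1 + sum(tree_size(child) for child in tree["children"].values())


def classify(tree, instance):
    """Follow the branch matching the instance's attribute values down to a leaf."""
    while isinstance(tree, dict):
        tree = tree["children"][instance[tree["attribute"]]]
    return tree


def count_correct(tree, instances, labels):
    """Number of instances whose predicted class equals the true class."""
    return sum(classify(tree, x) == y for x, y in zip(instances, labels))


# Toy tree and data; the attribute "Talent" and the class labels are illustrative.
toy_tree = {
    "attribute": "Effort",
    "children": {
        "high": "pass",
        "low": {"attribute": "Talent", "children": {"high": "pass", "low": "fail"}},
    },
}
toy_instances = [
    {"Effort": "high", "Talent": "low"},
    {"Effort": "low", "Talent": "high"},
    {"Effort": "low", "Talent": "low"},
]
toy_labels = ["pass", "fail", "fail"]

leaves = count_leaves(toy_tree)
size = tree_size(toy_tree)
correct = count_correct(toy_tree, toy_instances, toy_labels)
print(leaves, size, correct)  # 3 5 2
```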

Tab. 3.11: Information of UCI.

Tab. 3.12: Results of C4.5 on UCI.

Tab. 3.13: Results of DFDT on UCI.

Fig. 3.8: Classification accuracy comparison of unpruned tree on training instance set.
Fig. 3.9: Classification accuracy comparison of pruned tree on training instance set.

Figures 3.8–3.11 were obtained by comparing Tab. 3.12 with Tab. 3.13. Figure 3.8 compares the classification accuracy of the unpruned decision trees built by C4.5 and DFDT on the training instance set. Figure 3.9 compares the classification accuracy of the pruned decision trees built by C4.5 and DFDT on the training instance set. Figure 3.10 compares the cross-validation accuracy of the unpruned decision trees, and Fig. 3.11 compares the cross-validation accuracy of the pruned decision trees built by C4.5 and DFDT.

Analysis of experimental results:

(1) DFDT uses the dynamic fuzzy preprocessing method, which fully considers the dynamics and fuzziness of the real problem. As a result, DFDT has better reasoning and learning abilities in a dynamic fuzzy environment. Compared with the C4.5 method, its prediction ability is improved, and this improvement is reflected in the cross-validation classification accuracy on the instance set.

(2) As the attribute selection algorithm considers the decision attribute’s contribution to the whole decision-making instance set, the instance sets are classified better by DFDT. In DFDT, the number of leaf nodes and the size of the tree are smaller than those of the decision tree given by the C4.5 algorithm. By the principle of Occam’s razor, a small decision tree has better generalization ability. Thus, DFDT achieves better classification accuracy on the training instance set and also has high generalization ability.

Fig. 3.10: Classification accuracy comparison of unpruned tree on cross-validation.
Fig. 3.11: Classification accuracy comparison of pruned tree on cross-validation.

3.5.3 Comprehensibility comparisons

Real life is dynamic and fuzzy, and data obtained through dynamic fuzzification reflect this dynamic uncertainty. DFDT handles attributes with such properties, whether semantic or numerical. Handling attributes of different types in this way is easier to understand than the discretization of numerical attributes in C4.5.

As the recognition of certain attributes, especially semantic attributes, is ambiguous in reality, clear-cut boundaries cannot describe the characteristics of the data very well and are not consistent with the way people think. Dynamic fuzzification in DFDT not only retains the original attributes but also assigns a dynamic fuzzy number to describe the degree of each attribute in the tree. Therefore, DFDT contains more information than the C4.5 decision tree does, and this dynamic fuzzification enables the current level of development to be extrapolated to future trends. This is more in line with reality and is easy to understand.


The decision tree method is derived from the concept learning system, which developed into ID3 and then into the C4.5 algorithm, which can deal with continuous-valued attributes. In view of the dynamic fuzziness of reality, further studies are needed on how to represent dynamic fuzziness in decision tree learning and on whether a better attribute selection algorithm exists. On this basis, this chapter has studied the basic problem of decision tree learning based on a dynamic fuzzy lattice.

The innovations of this chapter are as follows:

(1) We analysed the dynamic fuzziness in decision tree learning and proposed a DFDT and DFBDT based on a dynamic fuzzy lattice.

(2) By studying the original decision tree learning algorithm, an attribute selection algorithm based on classification accuracy was proposed.

(3) By introducing the partitioned lattice into the DFDT, the basic concept of a dynamic fuzzy partitioning grid was developed. Based on this, the relationship between the DFDT and the dynamic fuzzy binary decision tree was analysed, and the relevant theorems and proofs were given. Using the dynamic fuzzy partitioning grid as a basis, the discretization of single attributes and the discretization of multiple attributes in the fuzzy decision tree were considered.

(4) Based on the information contribution of attributes to classified information, a method of dealing with missing values in DFDTs was proposed.
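As a rough illustration of what selecting attributes by classification accuracy (item (2) above) can look like, the sketch below scores each candidate attribute by the accuracy of a one-level split that predicts the majority class in each branch, and picks the best-scoring attribute. This is a simplified stand-in under stated assumptions, not the attribute selection algorithm developed in this chapter; the function names, the crisp (non-fuzzy) attribute values, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def split_accuracy(instances, labels, attribute):
    """Accuracy of splitting on `attribute` and predicting each branch's majority class."""
    branches = defaultdict(Counter)
    for x, y in zip(instances, labels):
        branches[x[attribute]][y] += 1
    # In each branch, the majority class accounts for the correctly classified instances.
    correct = sum(max(counts.values()) for counts in branches.values())
    return correct / len(labels)


def select_attribute(instances, labels, attributes):
    """Return the attribute with the highest one-level split accuracy."""
    return max(attributes, key=lambda a: split_accuracy(instances, labels, a))


# Toy data; the attribute "Talent" and the class labels are illustrative.
data = [
    {"Effort": "high", "Talent": "high"},
    {"Effort": "high", "Talent": "low"},
    {"Effort": "low", "Talent": "high"},
    {"Effort": "low", "Talent": "low"},
]
classes = ["pass", "pass", "fail", "fail"]

best = select_attribute(data, classes, ["Effort", "Talent"])
print(best, split_accuracy(data, classes, "Effort"), split_accuracy(data, classes, "Talent"))
# Effort 1.0 0.5
```

In a fuzzy setting, the instance counts would presumably be replaced by sums of membership degrees, but that extension is not shown here.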


