S. V. Gorbachev

5 Model of intellectual analysis of multidimensional semi-structured data based on deep neuro-fuzzy networks

S. V. Gorbachev, National Research Tomsk State University, Tomsk, Russia, email: [email protected]

Abstract: A new structure of deep neuro-fuzzy networks is proposed and studied, combining through a layer of fuzzy clustering a Kohonen fuzzy cellular neural network self-organizing map (FCNN-SOM) and a radial basis function network (RBFN). The proposed model has a high degree of self-organization of neurons, improving the separation properties of the network in the case of overlapping clusters; automatic adjustment of the parameters of radially symmetric functions; a single hidden layer sufficient for modelling pronounced non-linear dependencies; simplicity of the algorithm for optimizing weight coefficients; and a fast learning speed. It can be used to solve a wide range of problems – clusterization, approximation and classification (recognition) of multidimensional semi-structured data.

Keywords: Semi-structured data, Analysis, Kohonen fuzzy cellular neural network, Radial basis function network

Introduction

In recent years, the field of big data has emerged as one of the most promising in computer technology. Given the pace of technical progress, this is easily explained: digital technologies accompany many aspects of modern life and constantly gather information. The amount of data on various aspects of life is increasing, along with the growing possibilities for information storage. According to a study by IDC Digital Universe, by 2020 the amount of data in the world could reach 40 ZB (zettabytes), the equivalent of 5,200 GB for every person on the planet (Figure 5.1).

At this rate of growth, the amount of data in the world, according to forecasts by researchers, will double annually. The big data paradigm identifies three main types of tasks:

  1. Storing and managing hundreds of terabytes or petabytes of data; a relational database is not able to work effectively with this amount of information;
  2. Organizing unstructured information consisting of texts, images, video and other data types;
  3. Analysing big data, which raises the question of how to work with unstructured and semi-structured information, including methods for the clustering, approximation and classification (recognition) of multidimensional semi-structured data, the generation of analytical reports and the implementation of predictive models.
Fig. 5.1: Statistical data volumes are increasing, with a forecast up to 2020

Despite the apparent simplicity of the approaches to big data, their application has several features. One of the main factors that hinders the implementation of big data projects, in addition to the high cost, is the problem of selecting the data to be processed, i.e., determining which data to retrieve, store and analyse and which to leave out. Another problem of data processing is the high computational complexity of algorithms associated with a large number of parameters and attributes.

This chapter examines one of the possible ideas about applying the paradigm of big data – the ability to create high-performance intellectual models for multidimensional semi-structured data based on soft computing algorithms.

This work’s purpose is to develop and research a high-performance and at the same time compact deep neuro-fuzzy classifier that is able to perform a wide range of tasks in the fuzzy clustering, approximation and classification (recognition) of multidimensional semi-structured data.

To achieve this goal it is necessary to solve the following problems:

  1. Analysis of methods and approaches to solving problems of clustering, approximation and classification (recognition), identifying their advantages and disadvantages;
  2. Development of a conceptual model of analysis;
  3. Research of neural network classifier based on a radial basis function network (RBFN);
  4. Development of the architecture and learning algorithms of the Kohonen fuzzy cellular neural network self-organizing map (FCNN-SOM);
  5. Development of a model of a generalized deep fuzzy-neural classifier;
  6. Justification of the effectiveness of the results – a qualitative and quantitative assessment of the effectiveness of the model and approach in general, and a comparative analysis against classical neural network models.

This chapter contains seven sections. The first section provides an overview of the subject area, an analysis of the publications on the research topic, a comparison of the existing approaches to the analysis of data, identifying their advantages and disadvantages. The second section provides a justification of the chosen conceptual model of analysis and the corresponding neural net and fuzzy methods. The third, fourth and fifth sections discuss the implementation details of a new deep neuro-fuzzy model based on the proposed theoretical approach and provide a description and mathematical foundation for the algorithms and methods. The sixth section is devoted to the experimental verification of the obtained results on the test data. The seventh section describes the practical results of applying the developed model to build the trajectory of world economic and technological development and a calculation of forecast parameters of the level and pace of economic and technological development in several countries. The conclusion summarizes the results of the work performed.

5.1Literature review

Recognition is the problem of constructing and applying formal operations to numeric or symbolic representations of objects of the real or ideal world; the results of these operations reflect equivalence relations between the objects. The equivalence relations express the membership of the evaluated objects in classes that are considered as independent semantic units.

The classes of equivalence can be defined by a researcher, while constructing recognition algorithms, who uses his own substantive views or uses external information about the similarities and differences between objects in the context of the problem being solved. In this case, the term “pattern recognition with a teacher” is used [1]. Otherwise, when an automated system solves the classification problem without engaging external teaching information, it is referred to as automatic classification or “pattern recognition without a teacher”.

Various authors (e.g., Y. L. Barabash, V. I. Vasiliev, A. L. Gorelik, V. A. Skripkin, R. Duda, P. Hart, L. T. Kuzin, F. I. Peregudov, F. P. Tarasenko, F. E. Temnikov, J. Tou, R. Gonzalez, P. Winston, K. Fu, Y. Z. Tsypkin) have proposed different classifications of pattern-recognition methods. Some authors distinguish between parametric, nonparametric and heuristic methods, while others isolate groups of methods based on historical schools and streams in this area. For example, in the work of V. A. Duke [2], in which an academic overview of recognition methods is given, the following typology is used:

Methods based on the principle of separation;

Statistical methods;

Methods based on “potential functions”;

Methods of computing ratings (voting);

Methods based on propositional calculus, in particular on the apparatus of the algebra of logic.

This classification is based on the difference in formal methods of pattern recognition and therefore the heuristic approach to recognition, which has received full and adequate development in expert systems, is not considered here. The heuristic approach is based on knowledge that is difficult to formalize and on the researcher’s intuition. The researcher decides which information will be used and how the system should use it to achieve the desired effect of recognition.

Such a typology of recognition methods with varying degrees of detail is found in many works on recognition. At the same time, well-known typologies do not take into account one very significant characteristic that reflects the specificity of the method for representing domain knowledge with the help of any formal algorithm of image recognition.

D. A. Pospelov in [3] distinguishes two main ways of representing knowledge:

Intensional, in the form of diagrams of relationships between attributes (characteristics);

Extensional, with specific facts (objects, examples).

Intensional representation captures patterns and connections that explain the structure of data. In the context of diagnostic tasks, such a representation consists in defining operations on the attributes (characteristics) of objects that lead to the desired diagnostic result. Intensional representations are implemented by operations on attribute values and are not intended to operate directly on particular facts (objects).

Conversely, the extensional representation of knowledge is associated with the description and fixation of particular objects of the subject area and is implemented in operations whose elements are whole objects.

One can draw an analogy between intensional and extensional knowledge and the mechanisms underlying the activities of the left and right hemispheres of the human brain. If the right hemisphere is characterized by the prototype of a holistic representation of the world, the left hemisphere deals with patterns that reflect the relationship between the attributes of this world.

Based on the two foregoing fundamental ways of representing knowledge, Lutsenko has suggested the following classification of methods of pattern recognition [4, 5]:

Intensional methods, based on operations with characteristics;

Extensional methods, based on operations with objects.

It should be stressed that the existence of these two (and only two) groups of methods of recognition (operating with characteristics and operating with objects) is deeply regular. From this point of view, neither of these methods, taken separately from the other, provides an adequate reflection of the subject area. According to the author, between these methods there is a relation of complementarity in the sense of Niels Bohr [6], so a promising recognition system needs to provide an implementation of both methods, but not just one of them.

Thus, it is possible to draw the following conclusions:

  1. The basis of classification of pattern-recognition methods is built on fundamental laws underlying the human method of cognition in general, which puts it in a very special (privileged) position compared to other classification methods, which seem more lightweight and artificial.
  2. An overview of recognition methods reveals the following contradiction. A large number of different methods of pattern recognition and classification have been theoretically developed and described in the literature. However, software implementations of most of these methods are lacking, and this is quite natural, one might even say predetermined by the characteristics of the recognition methods themselves. This is evidenced by the fact that such systems receive little coverage in the literature and other sources of information.
  3. Consequently, the question of the practical applicability of certain theoretical methods of pattern recognition, which are designed to solve practical problems in real (i.e., quite large) dimensions of data and on real, modern computers remains insufficiently explored.

The aforementioned facts can be understood if it is recalled that the complexity of mathematical models exponentially increases the complexity of the software implementation of a system and, to the same extent, reduces the chances that the system will work in practice. This means that only software systems based on rather simple and transparent mathematical models can really be brought to market. Therefore, developers interested in the replication of their software should consider the choice of mathematical model not only from a scientific point of view but from a pragmatic one as well, given the capabilities of the software being implemented. Models should be as simple as possible, so that they can be developed at lower cost and more efficiently and actually work (be practically effective).

In this regard, the task of actuating the mechanism of generalizing descriptions of objects belonging to a certain class in recognition systems is particularly relevant, i.e., the mechanism of forming compact generalized images. Obviously, such a generalization mechanism will make it possible to “compress” a training sample of any size into a basis of generalized images of a previously known dimension. It will also make it possible to pose and solve a number of tasks that cannot even be formulated in such recognition methods as the method of comparison with a prototype, the method of k-nearest neighbours and the algorithm for calculating estimates.

These are the tasks of:

Determining the information contribution of attributes to the information portrait of a generalized image;

Performing a cluster-constructive analysis of generalized images;

Determining the semantic load of attributes;

Performing a cluster semantic-structural analysis of attributes;

Making a meaningful comparison of the generalized images of classes with each other and of attributes with each other (cognitive charts, including Merlin charts).

In practice, the original data are often difficult to formalize and often have an inhomogeneous structure with deliberately overlapping classes of images. Other problems are the incompleteness of training samples and their high noise level. In discriminant analysis, when it is uncertain whether an image belongs to a particular class, the answer can be obtained in the form of probabilities of belonging to each of the image classes. However, the data features described above in many cases do not allow one to construct adequate probabilistic and statistical models, which motivates the creation of empirical methods and approaches, in whose development methods and models from the field of soft computing become convenient.

Methods of fuzzy logic and neural network technology are currently among the promising adaptive technologies for information processing and for solving problems of pattern recognition and prediction, allowing one to create high-quality intelligent systems for the analysis of multidimensional objects with intersecting classes [7–9].

A significant scientific contribution to the theory and practice of creating diagnostic test systems built on fuzzy logic and neural networks was made by the following scientists: L. Zadeh, D. A. Pospelov, Y. A. Bortsov, F. Wasserman, N. Hassoun, A. P. Rotshtein, V. I. Gostev, A. I. Galushkin, V. V. Kruglov, D. Rutkowski, M. Pilinsky, S. Omatu and others [10–15].

An artificial neural network based on radial-symmetric (radial basis) functions (Figure 5.2), or an RBFN network, can be used to solve a wide range of tasks, among which the most commonly used are approximation, classification and data clustering [9, 16, 17].

The known methods for constructing RBFNs [12, 18] can be divided into two groups. The methods of the first group assume that the number of RBFN neurons in the first layer is set by the user, while the number of neurons in the second layer is determined by the dimension of the network output. After that, the RBFN is treated as a special case of a multilayer feedforward neural network (multilayer perceptron), which is trained using gradient methods of multidimensional non-linear unconstrained optimization [18], with the partial derivatives of the objective function determined by the error backpropagation technique [18]. In this case, there is uncertainty in the choice of the number of neurons in the first layer, which can lead to a redundant network or, conversely, to the impossibility of constructing a model with the required accuracy. Another disadvantage of these methods is the uncertainty in the choice of the initial values of the network weights, which can make it impossible to train the RBFN within a limited time.

The methods of the second group assume that the training sample is mapped onto the RBFN structure. This, as a rule, is done by simply memorizing the entire sample or on the basis of a cluster analysis that determines the number and coordinates of the cluster centres, which are stored in the RBFN memory. The number of neurons in the first layer of the RBFN is set equal to the number of clusters, and the coordinates of the cluster centres are entered in the weights of the first layer’s neurons. The RBFN is then trained by correcting the weights of the neurons of the second layer on the basis of methods of multidimensional non-linear unconstrained optimization [12, 18]. If the network is obtained by mapping the sample onto it, then, as a rule, it does not generalize. If a cluster analysis is used, the quality of the obtained neuro-model depends essentially on the quality of the results of the cluster analysis; the network can be redundant and exhibit low generalizing properties due to excessive granularity of the partition of the feature space. To address this problem, we developed a special architecture for the Kohonen FCNN.

The traditional approach to these tasks assumes that each observation can belong to only one cluster. In this case, the most widely used networks are Kohonen neural networks [1], which have a single-layer architecture with lateral connections and are trained on the basis of the so-called winner-takes-all (WTA) or winner-takes-most (WTM) rule. A more natural situation is one where the processed feature vector can belong to multiple classes with different degrees of confidence (probability, possibility). This situation is the subject of fuzzy cluster analysis, within which a self-learning deep neuro-fuzzy model has been developed that generalizes the Kohonen neural network and, thanks to special algorithms for adjusting its synaptic weights, has wider functional capabilities. Thus, a modification of the Kohonen network based on fuzzy rules was introduced in [19, 20]. This network has shown its effectiveness in a number of recognition tasks; however, its numerical complexity hinders its practical use. In [21, 22] a fuzzy Kohonen network for clustering was proposed, which essentially uses the C-means clustering method. In [10], a Kohonen network with fuzzy inference, trained on the basis of a combination of the Kohonen and Grossberg rules, was proposed. The main disadvantage of this design is the dependence of the results on the choice of the free parameters of the training procedure. In [23] it was shown that a Kohonen artificial neural network can be trained by a cellular automaton (CNN-SOM), which can greatly enhance the quality and speed of training; however, the issues related to the separating properties of CNN-SOMs in the case of overlapping clusters have not been resolved.

5.2 A conceptual model of analysis

In this chapter we introduce a new two-layer Kohonen network (FCNN-SOM), an adaptive modification of the FKSN [19] and a further development of the composition in [23], together with algorithms for its learning.

Next, on the basis of combining an FCNN-SOM and an RBFN, we describe a new deep neuro-fuzzy model for clustering and classification (pattern recognition). The concept of this model is based on the classification by Pospelov [3]; the two ways of representing knowledge are extensional (the formation of clusters of generalized patterns) and intensional (identifying integral characteristics in the selected clusters of classes).

Note that most pattern recognition algorithms require the involvement of large computing power, which can only be achieved on high-performance computer equipment. Therefore, an important task is to build a compact network architecture.

5.3 Research on a neural network classifier based on an RBFN

An artificial neural network based on radial-symmetric (radial basis) functions (Figure 5.2), or RBFN, can be used to perform a wide range of tasks, the most frequent of which are approximation, classification and data clustering.

Fig. 5.2: Architecture of neural network based on radial basis functions

The main property of radially symmetric functions is that they are symmetric about some vertical axis and change (decrease or increase) monotonically as the distance from this axis grows. An example of such a function is the Gaussian function

$$f(s) = \exp\left(-\alpha (s - T)^2\right). \tag{5.1}$$

This function is most often used in neural network architectures, but mainly in a multidimensional case:

$$h(\mathbf{x}) = \exp\left(-\alpha \|\mathbf{x} - \mathbf{c}\|^2\right), \tag{5.2}$$

where $\mathbf{c}$ is the vector of the centres (the coordinates of the vertical axes of symmetry) of the set of radially symmetric functions;

$\|\mathbf{x} - \mathbf{c}\|$ is the norm of the deviation of the input vector from the centres of the radially symmetric functions. The parameter $\alpha$ is related to the scattering radius $r$ of the input variables and can be replaced by the corresponding ratio:

$$\alpha = \frac{1}{2r^2}. \tag{5.3}$$

The norm of the vector difference is calculated as the Euclidean distance:

$$\|\mathbf{x} - \mathbf{c}\| = \sqrt{(x_1 - c_1)^2 + \cdots + (x_m - c_m)^2}. \tag{5.4}$$
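To make the computations of eqs. (5.2)–(5.4) concrete, the following NumPy sketch evaluates the hidden-layer activations for a batch of inputs. The function name and the use of a single shared radius r for all radial elements are illustrative assumptions, not part of the original description.

```python
import numpy as np

def rbf_activations(X, centres, r):
    """Hidden-layer outputs h_j(x) = exp(-alpha * ||x - c_j||^2), eqs. (5.2)-(5.4).

    X       : (N, n) matrix of input vectors
    centres : (m, n) matrix of centres c_j of the radial functions
    r       : scattering radius; alpha = 1 / (2 r**2), eq. (5.3)
    """
    alpha = 1.0 / (2.0 * r ** 2)
    # squared Euclidean distances ||x - c_j||^2 for every input/centre pair
    sq_dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-alpha * sq_dist)  # shape (N, m)
```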

The RBFN architecture contains three layers of neurons (Figure 5.2). The first layer is the input layer. Outputs of the second (hidden) layer are activated by a set of radially symmetric functions. In fact, they process the vector of input values, determining the degree of proximity of each of them to the centres of radially symmetric functions. Outputs of neurons of the third layer (i.e., outputs of the entire neural network) are linear combinations of the outputs of the second layer.

The composition and number of inputs and outputs are determined by the class of the problem being solved. When data are approximated, the inputs are the arguments of the approximating relationship, and the outputs are the values returned by it. When clustering or classifying data, inputs are characteristics that distinguish objects classified as clusters or classes, and outputs indicate a cluster or class corresponding to inputs.

The number of hidden elements also depends on the task being solved. If this is an approximation of the data, it can be any number. In the case of clustering or classifying data, it should correspond to the number of clusters or reference images of classes.

The life cycle of artificial neural networks based on radially symmetric functions, as for most other architectures, includes two stages: training and practical use. In turn, at the learning stage, two stages can also be distinguished: tuning the neural network and optimizing the synaptic coefficients of the linear output layer.

At the stage of tuning the neural network under consideration, it is necessary to determine the centres c and radius r of the radial elements (neurons of the hidden layer). The following settings are possible.

1. If only a small number of reference samples is available for training, the vectors corresponding to them are usually chosen as the centres of the radially symmetric functions. If the volume of the training sample is large enough, the following can be used as centres:

The centres of potential clusters, by which you can distribute all examples of the training sample manually or using additional clustering algorithms, including other neural network architectures;

Individual random examples of the training sample.

It should be noted that the second option is better to use with a large number of neurons in the hidden layer.

2. The choice of the radius of the radial elements is determined by the required form of the radially symmetric function. For large values of the parameter α, the graph of the function is too sharp, which means that the network will not correctly interpolate the data between known points at a sufficiently large distance from them, as it loses the ability to generalize the training data. Conversely, if the parameter α is too small, the network becomes insensitive to the fine details of the data.

In view of the foregoing, the radius can be specified in the following ways:

By a user of the neural network in an explicit form based on the heuristic selection;

By automatically calculating it as the average distance to a few (depending on the total training sample size and the number of hidden neurons) of the nearest examples.
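As one possible reading of the second (automatic) option, the radius of a radial element can be estimated as the mean distance from its centre to a few nearest training examples; the helper below is a minimal sketch, with k_nearest as an illustrative free parameter.

```python
import numpy as np

def estimate_radius(centre, X, k_nearest=5):
    """Radius of one radial element: mean distance from its centre to the
    k_nearest closest training examples (k_nearest is chosen heuristically
    from the sample size and the number of hidden neurons)."""
    dists = np.linalg.norm(X - centre, axis=1)
    return np.sort(dists)[:k_nearest].mean()
```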

RBFN networks have the property of fast convergence and can be trained on many examples more quickly than neural networks trained by the error backpropagation algorithm.

At the step of optimizing the weight coefficients of the linear output layer, the following steps are sequentially performed:

1. The characteristic matrix of the values of the radially symmetric elements for all training samples is calculated:

$$\bar{\bar{H}} = \begin{bmatrix} h_1(x^{(1)}) & h_2(x^{(1)}) & \cdots & h_m(x^{(1)}) \\ h_1(x^{(2)}) & h_2(x^{(2)}) & \cdots & h_m(x^{(2)}) \\ \vdots & \vdots & \ddots & \vdots \\ h_1(x^{(N)}) & h_2(x^{(N)}) & \cdots & h_m(x^{(N)}) \end{bmatrix} \tag{5.5}$$

The number of rows of this matrix is equal to the number of examples in the learning sample. The number of columns is equal to the number of radial elements.

2. The methods of linear algebra are used to calculate the matrix of weight coefficients of the output-layer neurons:

$$\bar{\bar{W}} = \left(\bar{\bar{H}}^{T}\bar{\bar{H}}\right)^{-1}\bar{\bar{H}}^{T}\bar{\bar{Y}}, \tag{5.6}$$

where the output matrix of the training samples contains one row for each training sample and one column for each output of the neural network:

$$\bar{\bar{Y}} = \begin{bmatrix} y_{11} & y_{21} & \cdots & y_{k1} \\ y_{12} & y_{22} & \cdots & y_{k2} \\ \vdots & \vdots & \ddots & \vdots \\ y_{1N} & y_{2N} & \cdots & y_{kN} \end{bmatrix}. \tag{5.7}$$
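A minimal sketch of this step is given below; it solves eq. (5.6) with a least-squares routine, which is mathematically equivalent to the normal-equation form when the characteristic matrix has full column rank, and the function name is illustrative.

```python
import numpy as np

def fit_output_weights(H, Y):
    """Weight matrix of the output layer, eq. (5.6): W = (H^T H)^{-1} H^T Y.

    H : (N, m) characteristic matrix of radial-element values, eq. (5.5)
    Y : (N, k) matrix of target outputs for the training samples, eq. (5.7)
    Returns W of shape (m, k)."""
    W, *_ = np.linalg.lstsq(H, Y, rcond=None)
    return W
```

An RBFN prediction for new inputs is then simply the product of the hidden-layer activations with W, e.g. rbf_activations(X_new, centres, r) @ W in the notation of the earlier sketch.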

Among the advantages of the considered architecture of neural networks are the following:

The presence of a single hidden layer, sufficient for modelling pronounced non-linear dependencies;

Simplicity of the algorithm for optimizing weight coefficients;

Guaranteed to find the global optimum of the error function when finding the weight coefficients of the neurons of the output layer;

Fast learning speed.

The limitations or disadvantages of neural networks based on radially symmetric functions include the following:

The need for special adjustment of the parameters of radially symmetric functions and the complexity of tuning with a large number of hidden radial elements;

The impossibility of extrapolating the model beyond the initial interval of changing the input values of the training sample.

If the network is obtained by mapping a sample onto it, then, as a rule, it does not generalize. If a cluster analysis is used, the quality of the obtained neuro-model depends essentially on the quality of the results of the cluster analysis; the network can be redundant and exhibit low generalizing properties due to the excessive granularity of the partition of the feature space. To solve this urgent problem, we developed a special architecture for the Kohonen FCNN. Its output layer of fuzzy clustering simulates the symmetrical bell-shaped functions of the input example belonging to each of the generalized clusters. In the next section, we show the proposed architecture and describe algorithms for calculating and adjusting the parameters of the centres and the widths of these functions. The final result is our proposal to combine the output layer of the FCNN with the second (hidden) layer of the RBFN and thereby solve the most difficult stage of the RBFN setup – partitioning the input attribute space, selecting the form and setting the parameters of the radially symmetric functions.

5.4 Development of architecture and learning algorithms for the Kohonen FCNN

Phase one: Self-organization

Within the extensional approach, we propose to perform a segmentation into clusters of certain classes of patterns by Kohonen SOMs [1] so as to solve the problem of separating properties in the case of overlapping clusters. This will make it possible to improve recognition quality.

A SOM has a simple architecture, including a zero (receptor) layer and only one layer of Kohonen neurons, which are adaptive linear adders arranged in a rectangular grid in a plane (Figure 5.3). Each j-th neuron (j = 1, . . . , m) is characterized by an n-dimensional vector of synaptic weights Wj = (wj1, wj2, . . . , wjn)T.

The input vectors x(k) = (x1(k), . . . , xn(k))T, where k is the number of the example in the training set, are sequentially fed from the receptive (zero) layer to all the neurons Nj of the Kohonen layer; their synaptic weights wji(k) define the centroids of the m overlapping clusters.

Fig. 5.3: Model of Kohonen self-organizing networks

The basis for the self-organization of neural networks is the observed regularity that the global ordering of networks becomes possible as a result of self-organizing operations, conducted independently of each other in various local segments of the networks. In accordance with the submitted signals, neurons are activated. Finally, one neuron becomes active in a network (or in a group). The output neuron that won the competition is called the winner neuron.

Neurons in the course of the competitive process, due to changes in the values of the synaptic weights, are selectively tuned to different input vectors or classes of input vectors. In the learning process there is a tendency to increase the values of the weights, because of which a peculiar positive feedback is created: more powerful exciting impulses – higher values of weights – greater activity of neurons.

In this case, there is a natural stratification of neurons into different groups; individual neurons or their groups cooperate with each other and they are activated in response to excitation created by specific training vectors, suppressing other neurons by their activity. One can speak about cooperation between neurons within a group and about competition between neurons within a group and between different groups.

Among the mechanisms of self-organization, there are two main classes: self-organization based on the associative Hebb’s rule, and the mechanism of neuron competition based on the generalized Kohonen rule. Henceforth, we will consider the mechanism of neuron competition.

The formation of self-organizing networks begins with the initialization of the synaptic weights of the network. Usually, the synaptic weights are assigned small values generated by a random number generator. After such initialization, the network initially has no ordering with respect to the features of the input vectors. Once the network has been initialized, three basic processes are implemented [1]:

  1. Competition: For each input vector, network neurons calculate the relative values of the discriminant function;
  2. Cooperation: The winning neuron determines the topological neighbourhood of a group of neurons, providing a basis for cooperation between them;
  3. Synaptic adaptation: Correction of the synaptic weights of excited neurons increases the values of their discriminant functions with respect to the input vectors. The correction is made in such a way that the output signal of the winner neuron is increased when similar input vectors are subsequently applied.

Thus, the basis of the SOM training procedure is the competition between neurons based on the value of the activation function in response to an incoming signal.

Assume that the inputs and synaptic weights are prenormalized, and the Euclidean measure is used as the distance:

$$d(\mathbf{x}, \mathbf{w}_i) = \|\mathbf{x} - \mathbf{w}_i\| = \sqrt{\sum_{j=1}^{N}\left(x_j - w_{ij}\right)^2}. \tag{5.8}$$

Then, using the Euclidean measure, partitioning the space into zones of neuron dominance is equivalent to splitting it into Voronoi domains. If we use another measure (scalar product, measure relative to the norm of L1 – Manhattan – or a measure relative to the norm of L∞), another division of neuron influence areas is formed.
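The sketch below illustrates, under assumed array shapes and function names, how the choice of measure changes the partition into dominance zones: each input is assigned to the nearest neuron under the Euclidean (L2), Manhattan (L1) or L∞ norm.

```python
import numpy as np

def dominance_zones(X, W, norm="euclidean"):
    """Index of the dominating neuron for every input vector in X.

    X : (P, n) input vectors, W : (m, n) neuron weight vectors."""
    diff = X[:, None, :] - W[None, :, :]
    if norm == "euclidean":              # L2 norm, eq. (5.8)
        d = np.sqrt((diff ** 2).sum(axis=2))
    elif norm == "manhattan":            # L1 norm
        d = np.abs(diff).sum(axis=2)
    else:                                # L-infinity norm
        d = np.abs(diff).max(axis=2)
    return d.argmin(axis=1)              # different norms -> different zones
```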

With normalized input learning vectors, the weight vectors that strive for them are normalized automatically. It should be noted that when the weight vector is normalized, the Euclidean measure and the scalar product are equivalent to each other since

$$\|\mathbf{x} - \mathbf{w}_i\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{w}_i\|^2 - 2\mathbf{x}^{T}\mathbf{w}_i. \tag{5.9}$$

Thus,

$$\min_i \|\mathbf{x} - \mathbf{w}_i\|^2 = \max_i\left(\mathbf{x}^{T}\mathbf{w}_i\right) \quad \text{at } \|\mathbf{w}\| = \mathrm{const}. \tag{5.10}$$

The winner neuron is determined as the one with the minimum distance to the presented input vector:

$$D(\mathbf{x}(k), \mathbf{w}_{j^*}(k)) = \min_j \|\mathbf{x}(k) - \mathbf{w}_j(k)\|^2, \tag{5.11}$$

or, for normalized vectors, equivalently

$$D(\mathbf{x}(k), \mathbf{w}_{j^*}(k)) = \max_j \mathbf{x}^{T}(k)\mathbf{w}_j(k) = \max_j \cos(\mathbf{x}(k), \mathbf{w}_j(k)). \tag{5.12}$$

It is obvious that

$$-1 \le \cos(\mathbf{x}(k), \mathbf{w}_j(k)) \le 1 \quad \text{and} \quad 0 \le \|\mathbf{x}(k) - \mathbf{w}_j(k)\|^2 \le 4. \tag{5.13}$$

A topological neighbourhood Sw(t) is formed around the winner neuron with a certain energy that decreases with time. The winner neuron and all the neurons lying within its neighbourhood are subjected to adaptation, during which their weight vectors change in the direction of the vector x according to the Kohonen rule

$$\mathbf{w}_i(t+1) = \mathbf{w}_i(t) + \eta_i(t)\left(\mathbf{x} - \mathbf{w}_i(t)\right) \tag{5.14}$$

for $i \in S_w(t)$, where $\eta_i(t)$ is the coefficient of training of the i-th neuron in the neighbourhood $S_w(t)$ at time t.

The value ηi(t) decreases with increasing distance between the i-th neuron and the winner.

Weights of neurons that are outside the neighbourhood of Sw(t) do not change. The size of the neighbourhood and the training coefficients of neurons are functions whose values decrease with time. In [11] it is proved that adaptation according to the Kohonen rule is equivalent to the gradient method of training based on the minimization of the objective function,

$$E(\mathbf{w}) = \frac{1}{2}\sum_i S_i(\mathbf{x}(t)) \sum_j \left[x_j(t) - w_{ij}(t)\right]^2, \tag{5.15}$$

where $S_i(\mathbf{x}(t))$ is the function defining the neighbourhood, which varies during the learning process.
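A minimal sketch of one adaptation step follows, combining winner selection by eq. (5.11) with the Kohonen rule (5.14); the Gaussian form of the neighbourhood function and the particular parameter names are illustrative assumptions rather than the exact procedure of the chapter.

```python
import numpy as np

def kohonen_step(W, x, grid, eta, sigma):
    """One Kohonen adaptation step.

    W     : (m, n) synaptic weights, one row per neuron
    x     : (n,) input vector
    grid  : (m, 2) coordinates of the neurons on the map lattice
    eta   : current learning rate (decreases with time)
    sigma : current neighbourhood radius (decreases with time)"""
    # winner: minimal Euclidean distance to x, eq. (5.11)
    winner = np.argmin(((W - x) ** 2).sum(axis=1))
    # Gaussian topological neighbourhood S_w(t) around the winner
    grid_dist2 = ((grid - grid[winner]) ** 2).sum(axis=1)
    h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
    # move the winner and its neighbours towards x, eq. (5.14)
    W = W + eta * h[:, None] * (x - W)
    return W, winner
```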

After the presentation of two different vectors x1 and x2, two network neurons are activated whose weights are closest to the coordinates of the corresponding vectors. These weights, denoted by w1 and w2, can be displayed in space as two points. Approximation of the vectors x1 and x2 causes a corresponding change in the arrangement of the vectors w1 and w2.

In the limit, the equality w1 = w2 is satisfied if and only if x1 and x2 coincide or are practically indistinguishable from each other. The network in which these conditions are met is called a topographic map or a Kohonen map.

Note that the tuning of the synaptic weights of the winner neuron can also occur in other algorithms: according to the WTA rule or with modified learning algorithms, giving better results than the WTA algorithm, for example, conscience WTA (CWTA), WTM or time-adaptive self-organizing map (TASOM) [1].

A common drawback of these SOM algorithms is the presence of heuristic parameters and procedures; solving the problem of “dead” neurons increases the training time, and overlapping clusters cannot be separated. When network weights are initialized randomly, some of the neurons may end up in a region of space in which there are no data, or only a negligible amount. These neurons have little chance of winning and adapting their weights, so they remain dead. As a result, the input data are interpreted by a smaller number of neurons, and the error in the interpretation of the data increases. Therefore, an important problem is the activation of all neurons in the network, which can be done if the training algorithm keeps count of the number of victories of each neuron and the training process is organized so as to give less active neurons a chance to win, as in the sketch below.
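One common form of such a mechanism is the conscience rule (as in CWTA): the distance of each neuron is penalized in proportion to how often it has already won, so that rarely active neurons also get a chance to adapt. The sketch below is an illustrative variant, with beta as an assumed penalty strength, and is not necessarily the exact rule used in the chapter.

```python
import numpy as np

def conscience_winner(W, x, wins, beta=0.1):
    """Winner selection with a 'conscience' penalty against dead neurons.

    W    : (m, n) neuron weights, x : (n,) input vector
    wins : (m,) running count of victories of each neuron (updated in place)"""
    d = ((W - x) ** 2).sum(axis=1)
    freq = wins / max(wins.sum(), 1)          # relative victory frequency
    winner = int(np.argmin(d + beta * freq))  # penalized distance
    wins[winner] += 1
    return winner
```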

Phase two: Additional conditions and restrictions

We require that the desired self-organization of the neuro-fuzzy network be possible under the following conditions:

  1. High degree of self-organization of neurons;
  2. An ability to selectively control individual connections between neurons to solve the problem of “dead” neurons;
  3. High flexibility, simplicity of implementation and temporal efficiency of the algorithm;
  4. Improved separating properties of the network in the case of overlapping clusters, with output of the degree of membership of the signal in each cluster.

Phase three: Architecture of Kohonen FCNNs

To ensure these conditions, the class of cellular neural networks (CNNs) was considered; they were introduced by Chua and Yang [24, 25] and have been used effectively in computational models of image processing. In [23] it was demonstrated that Kohonen neural networks can be trained by a cellular automaton (CA), which can significantly improve the quality and speed of learning; however, this did not resolve the issue of the network’s separating properties in the case of overlapping clusters.

We propose a fuzzy cellular architecture of the Kohonen neural network (FCNN-SOM) that contains three layers (Figure 5.4):

  1. Input (receptor) layer;
  2. Layer of Kohonen neurons with lateral connections, trained by CA to identify centroids of overlapping clusters;
  3. Additional (output) layer of fuzzy clustering calculating the degree of membership of the current vector in each cluster.
Fig. 5.4: Architecture of Kohonen FCNN-SOM

Consider the work of the FCNN-SOM in stages.

Phase four: Learning algorithm of Kohonen FCNNs

Source data: a sample of observations formed from N n-dimensional feature vectors x(k), k = 1, . . . , N.

The goal of learning is the division of the learning sample data into m clusters with some levels of membership uj(k) of the k-th feature vector in the j-th cluster (j = 1, . . . , m).
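For illustration only, the following sketch computes membership levels in the FCM style from the distances to the cluster centroids found by the Kohonen layer: the closer the centroid, the higher the membership, and the degrees sum to one. This is an assumed rule used to fix the notation, not necessarily the exact output-layer function of the FCNN-SOM.

```python
import numpy as np

def membership_degrees(x, centroids, eps=1e-12):
    """Degrees of membership u_j of a feature vector x in each of the m
    clusters, computed from squared distances to the cluster centroids
    (illustrative FCM-style rule; the degrees are non-negative and sum to 1)."""
    d2 = ((centroids - x) ** 2).sum(axis=1) + eps
    inv = 1.0 / d2
    return inv / inv.sum()
```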

Step 1: Data preprocessing

The input data are centred and standardized along all coordinates so that all observations belong to the hypercube [−1, 1]n. The vector components are redefined in accordance with the equation

$$x_{i,\text{new}} = \frac{x_i}{\sqrt{\sum_{i=1}^{N} x_i^2}}. \tag{5.16}$$

Centring can be done in two ways:

1. Relative to the average, calculated using the equation

$$m_i(k) = m_i(k-1) + \frac{1}{k}\left(x_i(k) - m_i(k-1)\right); \tag{5.17}$$
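A minimal sketch of the preprocessing formulas (5.16) and (5.17), with illustrative function names, is given below.

```python
import numpy as np

def normalize_components(x):
    """Redefinition of the vector components, eq. (5.16)."""
    return x / np.sqrt((x ** 2).sum())

def running_mean(m_prev, x_k, k):
    """Recursive estimate of the average used for centring, eq. (5.17)."""
    return m_prev + (x_k - m_prev) / k
```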