One of the interesting tasks in unsupervised learning is the profiling, or clustering, of information; in this chapter, of customers and products. Given a dataset, we want to find groups of records that share similar characteristics, for example, customers that buy the same products, or products that are usually bought together. This task benefits business owners because it shows them which groups of customers and products they have, enabling them to address each group more accurately.
As seen in Chapter 6, Classifying Disease Diagnosis, transactional databases can contain both numerical and categorical data. Whenever we face an unscaled categorical variable, we need to split it into as many variables as the number of values it may take, using the CategoricalDataSet class. For example, let's suppose we have the following transaction list of customer purchases:
| Transaction ID | Customer ID | Products | Discount | Total |
|---|---|---|---|---|
| 1399 | 56 | Milk, Bread, Butter | 0.00 | 4.30 |
| 1400 | 991 | Cheese, Milk | 2.30 | 5.60 |
| 1401 | 406 | Bread, Sausage | 0.00 | 8.80 |
| 1402 | 239 | Chipotle Sauce, Spice | 0.00 | 6.70 |
| 1403 | 33 | Turkey | 0.00 | 4.50 |
| 1404 | 406 | Turkey, Butter, Spice | 1.00 | 9.00 |
It can easily be seen that the products are unscaled categorical data, and that each transaction contains an undefined number of products: the customer may purchase one or several. To transform this dataset into a numerical one, preprocessing is needed: for each product, a variable is added to the dataset, resulting in the following:
| Cust. Id | Milk | Bread | Butter | Cheese | Sausage | Chipotle Sauce | Spice | Turkey |
|---|---|---|---|---|---|---|---|---|
| 56 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 991 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 406 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
| 239 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
In order to save space, we omitted the numerical variables and encoded the presence of a product purchased by a customer as 1 and its absence as 0. An alternative preprocessing step could instead count the number of occurrences of each value, making the variable discrete rather than binary.
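As a minimal sketch of this encoding step (the class and method names here are illustrative, not the book's CategoricalDataSet implementation), the product lists can be turned into binary vectors against a fixed product vocabulary:

```java
import java.util.Arrays;
import java.util.List;

public class ProductEncoder {
    // fixed vocabulary of products observed in the transactions
    static final List<String> PRODUCTS = Arrays.asList(
            "Milk", "Bread", "Butter", "Cheese",
            "Sausage", "Chipotle Sauce", "Spice", "Turkey");

    // convert one transaction's product list into a binary vector:
    // 1 if the product was purchased, 0 otherwise
    static int[] encode(List<String> purchased) {
        int[] row = new int[PRODUCTS.size()];
        for (String p : purchased) {
            int idx = PRODUCTS.indexOf(p);
            if (idx >= 0) row[idx] = 1;
        }
        return row;
    }

    public static void main(String[] args) {
        // transaction 1399: Milk, Bread, Butter
        System.out.println(Arrays.toString(
                encode(Arrays.asList("Milk", "Bread", "Butter"))));
        // prints [1, 1, 1, 0, 0, 0, 0, 0]
    }
}
```

Counting occurrences instead of setting a 1 would give the discrete variant mentioned above.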
In this chapter, we are going to explore the use of a Kohonen neural network for customer clustering, based on customer information collected from Proben1 (Card dataset).
The card dataset is composed of 16 variables in total: 15 inputs and one output. For confidentiality reasons, all variable names have been changed to meaningless symbols. This dataset brings a good mix of variable types: continuous, categorical with a small number of values, and categorical with a larger number of values. The following table shows a summary of the data:
| Variable | Type | Values |
|---|---|---|
| V1 | OUTPUT | 0; 1 |
| V2 | INPUT #1 | b, a |
| V3 | INPUT #2 | continuous |
| V4 | INPUT #3 | continuous |
| V5 | INPUT #4 | u, y, l, t |
| V6 | INPUT #5 | g, p, gg |
| V7 | INPUT #6 | c, d, cc, i, j, k, m, r, q, w, x, e, aa, ff |
| V8 | INPUT #7 | v, h, bb, j, n, z, dd, ff, o |
| V9 | INPUT #8 | continuous |
| V10 | INPUT #9 | t, f |
| V11 | INPUT #10 | t, f |
| V12 | INPUT #11 | continuous |
| V13 | INPUT #12 | t, f |
| V14 | INPUT #13 | g, p, s |
| V15 | INPUT #14 | continuous |
| V16 | INPUT #15 | continuous |
For simplicity, we didn't use inputs V5-V8 and V14, in order not to inflate the number of inputs too much. We applied the following transformations:
| Variable | Type | Values | Conversion |
|---|---|---|---|
| V1 | OUTPUT | 0; 1 | - |
| V2 | INPUT #1 | b, a | b = 1, a = 0 |
| V3 | INPUT #2 | continuous | - |
| V4 | INPUT #3 | continuous | - |
| V9 | INPUT #8 | continuous | - |
| V10 | INPUT #9 | t, f | t = 1, f = 0 |
| V11 | INPUT #10 | t, f | t = 1, f = 0 |
| V12 | INPUT #11 | continuous | - |
| V13 | INPUT #12 | t, f | t = 1, f = 0 |
| V15 | INPUT #14 | continuous | - |
| V16 | INPUT #15 | continuous | - |
The neural net topology proposed is shown in the following figure:
The dataset contains 690 examples, but 37 of them have missing values. These 37 records were discarded, so 653 examples were used to train and test the neural network. The dataset division was made as follows:
The Kohonen training algorithm used to cluster similar behavior depends on some parameters, such as:
It is important to remember that the Kohonen training algorithm is unsupervised, so it is used when the output is not known. In the card example, there are output values in the dataset, but they will be used here only to assess the clustering. In traditional clustering cases, output values are not available.
In this specific case, because the output is known, as in classification, the clustering quality may be assessed by:
In Java projects, these values are calculated through a class named NeuralOutputData, previously developed in Chapter 6, Classifying Disease Diagnosis.
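As a sketch of what such a calculation looks like (this is a standalone illustration, not the book's NeuralOutputData class), the total accuracy is the sum of the confusion matrix's main diagonal divided by the sum of all its entries. Note that this sketch rounds rather than truncates, so experiment #4's matrix yields 82.86% instead of the 82.85% shown later:

```java
public class ClusterQuality {
    // total accuracy from a square confusion matrix:
    // correct assignments lie on the main diagonal
    static double totalAccuracy(double[][] confusion) {
        double diagonal = 0.0, total = 0.0;
        for (int i = 0; i < confusion.length; i++) {
            for (int j = 0; j < confusion[i].length; j++) {
                total += confusion[i][j];
                if (i == j) diagonal += confusion[i][j];
            }
        }
        return diagonal / total;
    }

    public static void main(String[] args) {
        // confusion matrix of experiment #4: [[24, 11] [1, 34]]
        double[][] cm = {{24.0, 11.0}, {1.0, 34.0}};
        System.out.printf("%.2f%%%n", 100.0 * totalAccuracy(cm));
    }
}
```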
It is good practice to run many experiments in order to find the best neural net for clustering customer profiles. Ten different experiments will be run, and each will be analyzed with the quality rates mentioned previously. The following table summarizes the strategy that will be followed:
| Experiment | Learning rate | Normalization type |
|---|---|---|
| #1 | 0.1 | MIN_MAX |
| #2 | 0.1 | Z_SCORE |
| #3 | 0.3 | MIN_MAX |
| #4 | 0.3 | Z_SCORE |
| #5 | 0.5 | MIN_MAX |
| #6 | 0.5 | Z_SCORE |
| #7 | 0.7 | MIN_MAX |
| #8 | 0.7 | Z_SCORE |
| #9 | 0.9 | MIN_MAX |
| #10 | 0.9 | Z_SCORE |
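The two normalization types compared in the experiments can be sketched as follows (a minimal standalone illustration; the book's framework provides its own normalization classes):

```java
public class Normalization {
    // min-max normalization: rescale values linearly into [lo, hi]
    static double[] minMax(double[] x, double lo, double hi) {
        double min = Double.MAX_VALUE, max = -Double.MAX_VALUE;
        for (double v : x) { min = Math.min(min, v); max = Math.max(max, v); }
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++)
            out[i] = lo + (x[i] - min) * (hi - lo) / (max - min);
        return out;
    }

    // z-score normalization: zero mean, unit standard deviation
    static double[] zScore(double[] x) {
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= x.length;
        double var = 0.0;
        for (double v : x) var += (v - mean) * (v - mean);
        double sd = Math.sqrt(var / x.length);
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) out[i] = (x[i] - mean) / sd;
        return out;
    }

    public static void main(String[] args) {
        double[] data = {2.0, 4.0, 6.0, 8.0};
        System.out.println(java.util.Arrays.toString(minMax(data, -1.0, 1.0)));
        System.out.println(java.util.Arrays.toString(zScore(data)));
    }
}
```

Min-max keeps the original distribution shape within a fixed range, while the z-score makes variables with very different scales comparable, which helps explain why the z-score experiments behave as a group in the results below.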
The ClusteringExamples class was created to run each experiment. In addition to data processing, Chapter 4, Self-Organizing Maps, also explained how to create a Kohonen net and how to train it via the Euclidean distance algorithm.
The following piece of code shows a bit of its implementation:
```java
// enter neural net parameters via keyboard (omitted)
// load dataset from external file (omitted)
// data normalization (omitted)

// create the ANN and define its training parameters:
CompetitiveLearning cl = new CompetitiveLearning(kn1, neuralDataSetToTrain,
        LearningAlgorithm.LearningMode.ONLINE);
cl.show2DData = false;
cl.printTraining = false;
cl.setLearningRate( typedLearningRate );
cl.setMaxEpochs( typedEpochs );
cl.setReferenceEpoch( 200 );
cl.setTestingDataSet(neuralDataSetToTest);

// train the ANN
try {
    System.out.println("Training neural net... Please, wait...");
    cl.train();
    System.out.println("Winner neurons (clustering result [TRAIN]):");
    System.out.println( Arrays.toString( cl.getIndexWinnerNeuronTrain() ) );
} catch (NeuralException ne) {
    ne.printStackTrace();
}
```
After running each experiment using the ClusteringExamples class and saving the confusion matrices and total accuracy rates, it is possible to observe that experiments #4, #6, #8, and #10 share the same confusion matrix and accuracy. These experiments used the z-score to normalize the data:
| Experiment | Confusion matrix | Total accuracy |
|---|---|---|
| #1 | [[14.0, 21.0] [18.0, 17.0]] | 44.28% |
| #2 | [[11.0, 24.0] [34.0, 1.0]] | 17.14% |
| #3 | [[21.0, 14.0] [17.0, 18.0]] | 55.71% |
| #4 | [[24.0, 11.0] [1.0, 34.0]] | 82.85% |
| #5 | [[21.0, 14.0] [17.0, 18.0]] | 55.71% |
| #6 | [[24.0, 11.0] [1.0, 34.0]] | 82.85% |
| #7 | [[8.0, 27.0] [7.0, 28.0]] | 51.42% |
| #8 | [[24.0, 11.0] [1.0, 34.0]] | 82.85% |
| #9 | [[27.0, 8.0] [28.0, 7.0]] | 48.57% |
| #10 | [[24.0, 11.0] [1.0, 34.0]] | 82.85% |
So, the neural nets built in experiments #4, #6, #8, or #10 may be used to cluster customers by financial profile with an accuracy above 80%.
Using the transactional database provided with the code, we compiled about 650 purchase transactions into a large transactions × products matrix, where each cell holds the quantity of the corresponding product bought in the corresponding transaction:
| #Trns. | Prd.1 | Prd.2 | Prd.3 | Prd.4 | Prd.5 | Prd.6 | Prd.7 | … | Prd.N |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 56 | 0 | 0 | 3 | 2 | 0 | 0 | … | 0 |
| 2 | 0 | 0 | 40 | 0 | 7 | 0 | 19 | … | 0 |
| … | … | … | … | … | … | … | … | … | … |
| n | 0 | 0 | 0 | 0 | 0 | 0 | 0 | … | 1 |
Let's consider this matrix as a representation of an N-dimensional hyperspace, taking each product as a dimension and each transaction as a point. For simplicity, consider an example in three dimensions: a given transaction is placed at the point whose coordinate in each dimension is the quantity bought of the corresponding product.
The idea is to cluster these transactions in order to find which products are usually bought together. To that end, we are going to use a Kohonen neural network to find the positions where the cluster centers will be located.
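The geometric intuition can be sketched with the winner-takes-all step at the core of this approach (hypothetical names; in the book this logic is encapsulated by the CompetitiveLearning class): each transaction is assigned to the cluster center nearest to it by Euclidean distance.

```java
public class WinnerNeuron {
    // squared Euclidean distance between a transaction vector
    // and a cluster center in the N-dimensional product space
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // index of the cluster center closest to the transaction
    static int winner(double[] transaction, double[][] centers) {
        int best = 0;
        double bestDist = squaredDistance(transaction, centers[0]);
        for (int k = 1; k < centers.length; k++) {
            double d = squaredDistance(transaction, centers[k]);
            if (d < bestDist) {
                bestDist = d;
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // three cluster centers in a toy 3-product space
        double[][] centers = {{0, 0, 0}, {5, 5, 0}, {0, 0, 9}};
        double[] transaction = {4, 6, 1}; // quantities of three products
        System.out.println(winner(transaction, centers)); // prints 1
    }
}
```

During training, the winning center is then pulled toward the transaction by a fraction given by the learning rate, which is what gradually places the centers among groups of similar transactions.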
Our database comes from a clothing store and contains a sample of 27 registered products:
| | | |
|---|---|---|
| 1 Long Dress A | 19 Overall with zipper | 43 Bermuda M |
| 3 Long Dress B | 22 Shoulder overall | 48 Stripped skirt |
| 7 Short Dress A | 23 Long stamped skirt | 67 Camisole shoulder strap |
| 8 Stamped Dress | 24 Stamped short dress | 68 Jeans M |
| 9 Women Camisole | 28 Pants M | 69 XL Short dress |
| 13 Pants S | 31 Sleeveless short dress | 74 Stripped camisole S |
| 16 Overall for children | 32 Short dress shoulder | 75 Stripped camisole M |
| 17 Shorts | | 76 Stripped camisole L |
| 18 Stamped overall | 42 Two blouse overall | 106 Straight skirt |
Sometimes it may be difficult to choose how many clusters a clustering algorithm should find. Some approaches to determining an optimal choice include information criteria such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), as well as the Mahalanobis distance from the cluster center to the data. We suggest the reader check the references for further details on these criteria.
To run tests on the product example, we again use the ClusteringExamples class. For simplicity, we ran tests with three and five clusters. For each experiment, the number of epochs was 1000, the learning rate was 0.5, and the normalization type was MIN_MAX (-1; 1). Some results are shown in the following table:
| Number of clusters | Clusters of the first 15 elements | Sum of products bought |
|---|---|---|
| 3 | 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 2 | 973, 585, 11, 5, 2, 4, 11, 6, 3, 2, 2, 2, 669, 672, 7 |
| 5 | 0, 1, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 0, 0, 4 | 973, 585, 11, 5, 2, 4, 11, 6, 3, 2, 2, 2, 669, 672, 7 |
Observing the preceding table, we note that when the sum of products bought is more than 600, the records are clustered together. When the sum is in the range of 500 to 599, another cluster is formed. Lastly, when the sum is low, a large cluster is created, because the dataset is composed of many cases in which customers don't buy more than 20 items.
As recommended in the previous chapter, we suggest you explore the ClusteringExamples class and create a GUI to easily select the neural net parameters. Try to reuse code through inheritance.
Another tip is to further explore the product profiling example: vary the neural network training parameters and the number of clusters, and/or develop other ways of analyzing the clustering result.