One of the interesting tasks in unsupervised learning is the profiling, or clustering, of information; in this chapter, of customers and products. Given a dataset, we want to find groups of records that share similar characteristics, for example, customers that buy the same products, or products that are usually bought together. This task benefits business owners because it shows them which groups of customers and products they have, enabling them to address each group more accurately.
As seen in Chapter 6, Classifying Disease Diagnosis, transactional databases can contain both numerical and categorical data. Whenever we face an unscaled categorical variable, we need to split it into as many variables as the number of values it may take, using the CategoricalDataSet class. For example, let's suppose we have the following transaction list of customer purchases:
| Transaction ID | Customer ID | Products | Discount | Total |
|---|---|---|---|---|
| 1399 | 56 | Milk, Bread, Butter | 0.00 | 4.30 |
| 1400 | 991 | Cheese, Milk | 2.30 | 5.60 |
| 1401 | 406 | Bread, Sausage | 0.00 | 8.80 |
| 1402 | 239 | Chipotle Sauce, Spice | 0.00 | 6.70 |
| 1403 | 33 | Turkey | 0.00 | 4.50 |
| 1404 | 406 | Turkey, Butter, Spice | 1.00 | 9.00 |
It can easily be seen that the products are unscaled categorical data, and that each transaction contains an undefined number of products: the customer may purchase one or several. To transform this dataset into a numerical one, preprocessing is needed: for each product, a variable is added to the dataset, resulting in the following:
| Cust. Id | Milk | Bread | Butter | Cheese | Sausage | Chipotle Sauce | Spice | Turkey |
|---|---|---|---|---|---|---|---|---|
| 56 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 991 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 406 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
| 239 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
In order to save space, we omitted the numerical variables and encoded the presence of a product purchased by a customer as 1 and its absence as 0. An alternative preprocessing step could instead count the number of occurrences of each value, making the variable discrete rather than binary.
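As a minimal sketch of this encoding step (the class and method names here are illustrative, not the book's CategoricalDataSet implementation), the product lists can be turned into binary vectors against a fixed product vocabulary:

```java
import java.util.Arrays;
import java.util.List;

public class ProductEncoder {
    // fixed vocabulary of products observed in the transactions
    static final List<String> PRODUCTS = Arrays.asList(
            "Milk", "Bread", "Butter", "Cheese",
            "Sausage", "Chipotle Sauce", "Spice", "Turkey");

    // convert one transaction's product list into a binary vector:
    // 1 if the product was purchased, 0 otherwise
    static int[] encode(List<String> purchased) {
        int[] row = new int[PRODUCTS.size()];
        for (String p : purchased) {
            int idx = PRODUCTS.indexOf(p);
            if (idx >= 0) row[idx] = 1;
        }
        return row;
    }

    public static void main(String[] args) {
        // transaction 1399: Milk, Bread, Butter
        System.out.println(Arrays.toString(
                encode(Arrays.asList("Milk", "Bread", "Butter"))));
        // prints [1, 1, 1, 0, 0, 0, 0, 0]
    }
}
```

Counting occurrences instead of setting a 1 would give the discrete variant mentioned above.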
In this chapter, we are going to explore the use of a Kohonen neural network for customer clustering, based on customer information collected from Proben1 (Card dataset).
The card dataset is composed of 16 variables in total: 15 inputs and one output. For confidentiality reasons, all variable names have been changed to meaningless symbols. This dataset brings a good mix of variable types: continuous, categorical with a small number of values, and categorical with a larger number of values. The following table shows a summary of the data:
| Variable | Type | Values |
|---|---|---|
| V1 | OUTPUT | 0; 1 |
| V2 | INPUT #1 | b, a |
| V3 | INPUT #2 | continuous |
| V4 | INPUT #3 | continuous |
| V5 | INPUT #4 | u, y, l, t |
| V6 | INPUT #5 | g, p, gg |
| V7 | INPUT #6 | c, d, cc, i, j, k, m, r, q, w, x, e, aa, ff |
| V8 | INPUT #7 | v, h, bb, j, n, z, dd, ff, o |
| V9 | INPUT #8 | continuous |
| V10 | INPUT #9 | t, f |
| V11 | INPUT #10 | t, f |
| V12 | INPUT #11 | continuous |
| V13 | INPUT #12 | t, f |
| V14 | INPUT #13 | g, p, s |
| V15 | INPUT #14 | continuous |
| V16 | INPUT #15 | continuous |
For simplicity, we didn't use inputs V5-V8 and V14, in order not to inflate the number of inputs too much. We applied the following transformations:
| Variable | Type | Values | Conversion |
|---|---|---|---|
| V1 | OUTPUT | 0; 1 | - |
| V2 | INPUT #1 | b, a | b = 1, a = 0 |
| V3 | INPUT #2 | continuous | - |
| V4 | INPUT #3 | continuous | - |
| V9 | INPUT #8 | continuous | - |
| V10 | INPUT #9 | t, f | t = 1, f = 0 |
| V11 | INPUT #10 | t, f | t = 1, f = 0 |
| V12 | INPUT #11 | continuous | - |
| V13 | INPUT #12 | t, f | t = 1, f = 0 |
| V15 | INPUT #14 | continuous | - |
| V16 | INPUT #15 | continuous | - |
The neural net topology proposed is shown in the following figure:
The dataset contains 690 examples, but 37 of them have missing values. These 37 records were discarded, so 653 examples were used to train and test the neural network. The dataset division was made as follows:
The Kohonen training algorithm used to cluster similar behavior depends on some parameters, such as:
It is important to remember that the Kohonen training algorithm is unsupervised, so it is used when the output is not known. In the card example, there are output values in the dataset, but they will be used here only to assess the clustering. In traditional clustering cases, output values are not available.
In this specific case, because the output is known, as in classification, the clustering quality may be assessed by:
In Java projects, these values are calculated through a class named NeuralOutputData, previously developed in Chapter 6, Classifying Disease Diagnosis.
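As a sketch of what such a calculation looks like (this is a standalone illustration, not the book's NeuralOutputData class), the total accuracy is the sum of the confusion matrix's main diagonal divided by the sum of all its entries. Note that this sketch rounds rather than truncates, so experiment #4's matrix yields 82.86% instead of the 82.85% shown later:

```java
public class ClusterQuality {
    // total accuracy from a square confusion matrix:
    // correct assignments lie on the main diagonal
    static double totalAccuracy(double[][] confusion) {
        double diagonal = 0.0, total = 0.0;
        for (int i = 0; i < confusion.length; i++) {
            for (int j = 0; j < confusion[i].length; j++) {
                total += confusion[i][j];
                if (i == j) diagonal += confusion[i][j];
            }
        }
        return diagonal / total;
    }

    public static void main(String[] args) {
        // confusion matrix of experiment #4: [[24, 11] [1, 34]]
        double[][] cm = {{24.0, 11.0}, {1.0, 34.0}};
        System.out.printf("%.2f%%%n", 100.0 * totalAccuracy(cm));
    }
}
```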
It is good practice to run many experiments in order to find the best neural net for clustering customer profiles. Ten different experiments will be run, and each will be analyzed with the quality rates mentioned previously. The following table summarizes the strategy that will be followed:
| Experiment | Learning rate | Normalization type |
|---|---|---|
| #1 | 0.1 | MIN_MAX |
| #2 | 0.1 | Z_SCORE |
| #3 | 0.3 | MIN_MAX |
| #4 | 0.3 | Z_SCORE |
| #5 | 0.5 | MIN_MAX |
| #6 | 0.5 | Z_SCORE |
| #7 | 0.7 | MIN_MAX |
| #8 | 0.7 | Z_SCORE |
| #9 | 0.9 | MIN_MAX |
| #10 | 0.9 | Z_SCORE |
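The two normalization types compared in the experiments can be sketched as follows (a minimal standalone illustration; the book's framework provides its own normalization classes):

```java
public class Normalization {
    // min-max normalization: rescale values linearly into [lo, hi]
    static double[] minMax(double[] x, double lo, double hi) {
        double min = Double.MAX_VALUE, max = -Double.MAX_VALUE;
        for (double v : x) { min = Math.min(min, v); max = Math.max(max, v); }
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++)
            out[i] = lo + (x[i] - min) * (hi - lo) / (max - min);
        return out;
    }

    // z-score normalization: zero mean, unit standard deviation
    static double[] zScore(double[] x) {
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= x.length;
        double var = 0.0;
        for (double v : x) var += (v - mean) * (v - mean);
        double sd = Math.sqrt(var / x.length);
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) out[i] = (x[i] - mean) / sd;
        return out;
    }

    public static void main(String[] args) {
        double[] data = {2.0, 4.0, 6.0, 8.0};
        System.out.println(java.util.Arrays.toString(minMax(data, -1.0, 1.0)));
        System.out.println(java.util.Arrays.toString(zScore(data)));
    }
}
```

Min-max keeps the original distribution shape within a fixed range, while the z-score makes variables with very different scales comparable, which helps explain why the z-score experiments behave as a group in the results below.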
The ClusteringExamples class was created to run each experiment. In addition to data processing, Chapter 4, Self-Organizing Maps, also explained how to create a Kohonen net and how to train it via the Euclidean distance algorithm.
The following piece of code shows a bit of its implementation:
```java
// enter neural net parameters via keyboard (omitted)
// load dataset from external file (omitted)
// data normalization (omitted)

// create the ANN and define its training parameters:
CompetitiveLearning cl = new CompetitiveLearning(kn1, neuralDataSetToTrain,
        LearningAlgorithm.LearningMode.ONLINE);
cl.show2DData = false;
cl.printTraining = false;
cl.setLearningRate( typedLearningRate );
cl.setMaxEpochs( typedEpochs );
cl.setReferenceEpoch( 200 );
cl.setTestingDataSet(neuralDataSetToTest);

// train the ANN
try {
    System.out.println("Training neural net... Please, wait...");
    cl.train();
    System.out.println("Winner neurons (clustering result [TRAIN]):");
    System.out.println( Arrays.toString( cl.getIndexWinnerNeuronTrain() ) );
} catch (NeuralException ne) {
    ne.printStackTrace();
}
```
After running each experiment using the ClusteringExamples class and saving the confusion matrices and total accuracy rates, it is possible to observe that experiments #4, #6, #8, and #10 share the same confusion matrix and accuracy. These experiments used the z-score to normalize the data:
| Experiment | Confusion matrix | Total accuracy |
|---|---|---|
| #1 | [[14.0, 21.0] [18.0, 17.0]] | 44.28% |
| #2 | [[11.0, 24.0] [34.0, 1.0]] | 17.14% |
| #3 | [[21.0, 14.0] [17.0, 18.0]] | 55.71% |
| #4 | [[24.0, 11.0] [1.0, 34.0]] | 82.85% |
| #5 | [[21.0, 14.0] [17.0, 18.0]] | 55.71% |
| #6 | [[24.0, 11.0] [1.0, 34.0]] | 82.85% |
| #7 | [[8.0, 27.0] [7.0, 28.0]] | 51.42% |
| #8 | [[24.0, 11.0] [1.0, 34.0]] | 82.85% |
| #9 | [[27.0, 8.0] [28.0, 7.0]] | 48.57% |
| #10 | [[24.0, 11.0] [1.0, 34.0]] | 82.85% |
So, the neural nets built in experiments #4, #6, #8, or #10 may be used to cluster customers by financial profile with an accuracy above 80%.
Using the transactional database provided with the code, we compiled about 650 purchase transactions into a large transactions × products matrix, where each cell holds the quantity of the corresponding product bought in the corresponding transaction:
| #Trns. | Prd.1 | Prd.2 | Prd.3 | Prd.4 | Prd.5 | Prd.6 | Prd.7 | … | Prd.N |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 56 | 0 | 0 | 3 | 2 | 0 | 0 | … | 0 |
| 2 | 0 | 0 | 40 | 0 | 7 | 0 | 19 | … | 0 |
| … | … | … | … | … | … | … | … | … | … |
| n | 0 | 0 | 0 | 0 | 0 | 0 | 0 | … | 1 |
Let's consider this matrix as a representation of an N-dimensional hyperspace, taking each product as a dimension and each transaction as a point. For simplicity, consider an example in three dimensions: a given transaction is placed at the point whose coordinate in each dimension is the quantity bought of the corresponding product.
The idea is to cluster these transactions in order to find which products are usually bought together. To that end, we are going to use a Kohonen neural network to find the positions where the cluster centers will be located.
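The geometric intuition can be sketched with the winner-takes-all step at the core of this approach (hypothetical names; in the book this logic is encapsulated by the CompetitiveLearning class): each transaction is assigned to the cluster center nearest to it by Euclidean distance.

```java
public class WinnerNeuron {
    // squared Euclidean distance between a transaction vector
    // and a cluster center in the N-dimensional product space
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // index of the cluster center closest to the transaction
    static int winner(double[] transaction, double[][] centers) {
        int best = 0;
        double bestDist = squaredDistance(transaction, centers[0]);
        for (int k = 1; k < centers.length; k++) {
            double d = squaredDistance(transaction, centers[k]);
            if (d < bestDist) {
                bestDist = d;
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // three cluster centers in a toy 3-product space
        double[][] centers = {{0, 0, 0}, {5, 5, 0}, {0, 0, 9}};
        double[] transaction = {4, 6, 1}; // quantities of three products
        System.out.println(winner(transaction, centers)); // prints 1
    }
}
```

During training, the winning center is then pulled toward the transaction by a fraction given by the learning rate, which is what gradually places the centers among groups of similar transactions.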
Our database comes from a clothing store and contains a sample of 27 registered products:
| | | |
|---|---|---|
| 1 Long Dress A | 19 Overall with zipper | 43 Bermuda M |
| 3 Long Dress B | 22 Shoulder overall | 48 Stripped skirt |
| 7 Short Dress A | 23 Long stamped skirt | 67 Camisole shoulder strap |
| 8 Stamped Dress | 24 Stamped short dress | 68 Jeans M |
| 9 Women Camisole | 28 Pants M | 69 XL Short dress |
| 13 Pants S | 31 Sleeveless short dress | 74 Stripped camisole S |
| 16 Overall for children | 32 Short dress shoulder | 75 Stripped camisole M |
| 17 Shorts | | 76 Stripped camisole L |
| 18 Stamped overall | 42 Two blouse overall | 106 Straight skirt |
Sometimes it may be difficult to choose how many clusters a clustering algorithm should find. Some approaches to determining an optimal choice include information criteria such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), as well as the Mahalanobis distance from the cluster center to the data. We suggest the reader check the references for further details on these criteria.
To run tests on the product example, we again use the ClusteringExamples class. For simplicity, we ran tests with three and five clusters. For each experiment, the number of epochs was 1000, the learning rate was 0.5, and the normalization type was MIN_MAX (-1; 1). Some results are shown in the following table:
| Number of clusters | Clusters of the first 15 elements | Sum of products bought |
|---|---|---|
| 3 | 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 2 | 973, 585, 11, 5, 2, 4, 11, 6, 3, 2, 2, 2, 669, 672, 7 |
| 5 | 0, 1, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 0, 0, 4 | 973, 585, 11, 5, 2, 4, 11, 6, 3, 2, 2, 2, 669, 672, 7 |
Observing the preceding table, we note that when the sum of products bought is more than 600, the records are clustered together. When the sum is in the range of 500 to 599, another cluster is formed. Lastly, when the sum is low, a large cluster is created, because the dataset is composed of many cases in which customers don't buy more than 20 items.
As recommended in the previous chapter, we suggest you explore the ClusteringExamples class and create a GUI to easily select the neural net parameters. Try to reuse code through inheritance.
Another tip is to further explore the product profiling example: vary the neural network training parameters and the number of clusters, and/or develop other ways of analyzing the clustering result.