Profiling

One of the interesting tasks in unsupervised learning is the profiling, or clustering, of information; in this chapter, of customers and products. Given a dataset, the goal is to find groups of records that share similar characteristics, for example, customers who buy the same products or products that are usually bought together. This benefits business owners: knowing which groups of customers and products they have lets them address each group more accurately.

Pre-processing

As seen in Chapter 6, Classifying Disease Diagnosis, transactional databases can contain both numerical and categorical data. Whenever we face an unscaled categorical variable, we need to split it into as many variables as the number of values it may take, using the CategoricalDataSet class. For example, suppose we have the following list of customer purchase transactions:

| Transaction ID | Customer ID | Products | Discount | Total |
|---|---|---|---|---|
| 1399 | 56 | Milk, Bread, Butter | 0.00 | 4.30 |
| 1400 | 991 | Cheese, Milk | 2.30 | 5.60 |
| 1401 | 406 | Bread, Sausage | 0.00 | 8.80 |
| 1402 | 239 | Chipotle Sauce, Spice | 0.00 | 6.70 |
| 1403 | 33 | Turkey | 0.00 | 4.50 |
| 1404 | 406 | Turkey, Butter, Spice | 1.00 | 9.00 |

It is easy to see that the products are unscaled categorical data and that the number of products per transaction is not fixed: a customer may purchase one or several. Preprocessing is needed to transform this dataset into a numerical one. One variable is added to the dataset for each product and the transactions are grouped by customer, resulting in the following:

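| Cust. Id | Milk | Bread | Butter | Cheese | Sausage | Chipotle Sauce | Spice | Turkey |
|---|---|---|---|---|---|---|---|---|
| 56 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 991 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 406 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
| 239 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 33 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |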

To save space, we left out the numerical variables (discount and total) and encoded the presence of a product among a customer's purchases as 1 and its absence as 0. An alternative preprocessing could count the number of occurrences of each value, in which case the variables are no longer binary but discrete.
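The transformation described above can be sketched in a few lines of plain Java. This is only an illustration under our own assumptions: the PurchaseEncoder class and its method names are hypothetical and are not part of the book's code (which uses the CategoricalDataSet class), and the sample data is the transaction list shown earlier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, for illustration only (not the book's CategoricalDataSet).
final class PurchaseEncoder {

    // Customer ID and product list of each transaction in the sample table.
    static final int[] SAMPLE_IDS = {56, 991, 406, 239, 33, 406};
    static final List<String[]> SAMPLE_BASKETS = Arrays.asList(
        new String[] {"Milk", "Bread", "Butter"},
        new String[] {"Cheese", "Milk"},
        new String[] {"Bread", "Sausage"},
        new String[] {"Chipotle Sauce", "Spice"},
        new String[] {"Turkey"},
        new String[] {"Turkey", "Butter", "Spice"});

    /** Collects every distinct product, in order of first appearance. */
    static List<String> productColumns(List<String[]> baskets) {
        Set<String> products = new LinkedHashSet<>();
        for (String[] basket : baskets) {
            products.addAll(Arrays.asList(basket));
        }
        return new ArrayList<>(products);
    }

    /** Builds one binary row per customer: 1 if the product was ever bought, 0 otherwise. */
    static Map<Integer, int[]> encode(int[] customerIds, List<String[]> baskets,
                                      List<String> columns) {
        Map<Integer, int[]> rows = new LinkedHashMap<>();
        for (int t = 0; t < customerIds.length; t++) {
            int[] row = rows.computeIfAbsent(customerIds[t], id -> new int[columns.size()]);
            for (String product : baskets.get(t)) {
                row[columns.indexOf(product)] = 1;
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> columns = productColumns(SAMPLE_BASKETS);
        System.out.println("Cust. Id " + columns);
        encode(SAMPLE_IDS, SAMPLE_BASKETS, columns)
            .forEach((id, row) -> System.out.println(id + " " + Arrays.toString(row)));
    }
}
```

To obtain the discrete (count-based) variant mentioned above, the assignment `row[...] = 1` would become an increment.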

Implementation in Java

In this chapter, we explore the use of a Kohonen neural network for customer clustering, based on customer information from the Proben1 card dataset.

Card – credit analysis for customer profiling

The card dataset is composed of 16 variables in total. 15 are inputs and one is output. For security reasons, all variable names have been changed to meaningless symbols. This dataset brings a good mix of variable types (continuous, categorical with small numbers of values, and categorical with a larger number of values). The following table shows a summary of data:

| Variable | Type | Values |
|---|---|---|
| V1 | OUTPUT | 0; 1 |
| V2 | INPUT #1 | b, a |
| V3 | INPUT #2 | continuous |
| V4 | INPUT #3 | continuous |
| V5 | INPUT #4 | u, y, l, t |
| V6 | INPUT #5 | g, p, gg |
| V7 | INPUT #6 | c, d, cc, i, j, k, m, r, q, w, x, e, aa, ff |
| V8 | INPUT #7 | v, h, bb, j, n, z, dd, ff, o |
| V9 | INPUT #8 | continuous |
| V10 | INPUT #9 | t, f |
| V11 | INPUT #10 | t, f |
| V12 | INPUT #11 | continuous |
| V13 | INPUT #12 | t, f |
| V14 | INPUT #13 | g, p, s |
| V15 | INPUT #14 | continuous |
| V16 | INPUT #15 | continuous |

For simplicity, we did not use inputs V5 to V8 and V14, to avoid inflating the number of inputs. We applied the following transformation to the remaining variables:

| Variable | Type | Values | Conversion |
|---|---|---|---|
| V1 | OUTPUT | 0; 1 | - |
| V2 | INPUT #1 | b, a | b = 1, a = 0 |
| V3 | INPUT #2 | continuous | - |
| V4 | INPUT #3 | continuous | - |
| V9 | INPUT #8 | continuous | - |
| V10 | INPUT #9 | t, f | t = 1, f = 0 |
| V11 | INPUT #10 | t, f | t = 1, f = 0 |
| V12 | INPUT #11 | continuous | - |
| V13 | INPUT #12 | t, f | t = 1, f = 0 |
| V15 | INPUT #14 | continuous | - |
| V16 | INPUT #15 | continuous | - |
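As a rough sketch, the conversion table could be applied to one raw record as follows. The CardRecordConverter class is hypothetical, and we assume (this is our assumption, not something stated by the dataset) that a record arrives as an array of 16 strings ordered V1 to V16. The sample record in main is invented for illustration.

```java
import java.util.Arrays;

// Hypothetical helper; assumes a record is a String[16] ordered V1..V16.
final class CardRecordConverter {

    // Zero-based positions of the kept inputs: V2, V3, V4, V9 to V13, V15, V16.
    private static final int[] KEPT = {1, 2, 3, 8, 9, 10, 11, 12, 14, 15};

    /** Applies the conversion table: b/t become 1, a/f become 0, numbers are parsed. */
    static double toNumber(String value) {
        switch (value) {
            case "b":
            case "t":
                return 1.0;
            case "a":
            case "f":
                return 0.0;
            default:
                return Double.parseDouble(value);
        }
    }

    /** Returns the 10 numerical inputs, dropping V5 to V8 and V14. */
    static double[] inputs(String[] record) {
        double[] result = new double[KEPT.length];
        for (int i = 0; i < KEPT.length; i++) {
            result[i] = toNumber(record[KEPT[i]]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented record, for illustration only.
        String[] record = {"1", "b", "30.83", "0.0", "u", "g", "w", "v",
                           "1.25", "t", "t", "1", "f", "g", "202", "0"};
        System.out.println("output = " + record[0]);
        System.out.println("inputs = " + Arrays.toString(inputs(record)));
    }
}
```

Records with missing values are not handled here; as described below, they were discarded before training.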

The neural net topology proposed is shown in the following figure:

[Figure: proposed neural net topology for the card dataset]

The number of examples stored is 690, but 37 of them have missing values. These 37 records were discarded. Therefore, 653 examples were used to train and test the neural network. The dataset division was made as follows:

  • Training: 583 records
  • Test: 70 records

The Kohonen training algorithm used to cluster similar behavior depends on some parameters, such as:

  • Normalization type
  • Learning rate

It is important to keep in mind that the Kohonen training algorithm is unsupervised, so it is used when the output is not known. In the card example, the dataset does contain output values, but they are used here only to assess the clustering. In traditional clustering cases, output values are not available.

In this specific case, because the output is known, as in a classification problem, the clustering quality can be assessed by:

  • Sensitivity (true positive rate)
  • Specificity (true negative rate)
  • Total accuracy

In the Java project, these values are calculated by a class named NeuralOutputData, previously developed in Chapter 6, Classifying Disease Diagnosis.
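For readers who want to see the arithmetic, here is a minimal standalone sketch of the three rates. It is not the NeuralOutputData class; the ClusteringQuality name is hypothetical, and we assume a 2 x 2 confusion matrix laid out as matrix[actual][predicted] with index 0 as the negative class and index 1 as the positive class. If the layout differs, sensitivity and specificity swap accordingly, while total accuracy is unaffected.

```java
// Hypothetical helper; assumes matrix[actual][predicted], 0 = negative, 1 = positive.
final class ClusteringQuality {

    /** True positive rate: positives correctly identified / all actual positives. */
    static double sensitivity(double[][] m) {
        return m[1][1] / (m[1][0] + m[1][1]);
    }

    /** True negative rate: negatives correctly identified / all actual negatives. */
    static double specificity(double[][] m) {
        return m[0][0] / (m[0][0] + m[0][1]);
    }

    /** Total accuracy: main diagonal / all records. */
    static double accuracy(double[][] m) {
        double total = m[0][0] + m[0][1] + m[1][0] + m[1][1];
        return (m[0][0] + m[1][1]) / total;
    }

    public static void main(String[] args) {
        // One of the confusion matrices reported later in this section.
        double[][] matrix = {{24.0, 11.0}, {1.0, 34.0}};
        System.out.printf("sensitivity = %.4f%n", sensitivity(matrix));
        System.out.printf("specificity = %.4f%n", specificity(matrix));
        System.out.printf("accuracy    = %.4f%n", accuracy(matrix));
    }
}
```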

It is good practice to do many experiments to try to find the best neural net to cluster customers' profiles. Ten different experiments will be generated and each will be analyzed with the quality rates mentioned previously. The following table summarizes the strategy that will be followed:

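| Experiment | Learning rate | Normalization type |
|---|---|---|
| #1 | 0.1 | MIN_MAX |
| #2 | 0.1 | Z_SCORE |
| #3 | 0.3 | MIN_MAX |
| #4 | 0.3 | Z_SCORE |
| #5 | 0.5 | MIN_MAX |
| #6 | 0.5 | Z_SCORE |
| #7 | 0.7 | MIN_MAX |
| #8 | 0.7 | Z_SCORE |
| #9 | 0.9 | MIN_MAX |
| #10 | 0.9 | Z_SCORE |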
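The two normalization types in the table can be sketched for a single column of values as follows. This is a plain-Java illustration and not the book's normalization code: the ColumnNormalizer class is hypothetical, and we assume the population standard deviation for the z-score (a library may use the sample standard deviation instead).

```java
import java.util.Arrays;

// Hypothetical helper showing the formulas behind MIN_MAX and Z_SCORE.
final class ColumnNormalizer {

    /** Rescales the column linearly so that its minimum maps to lo and its maximum to hi. */
    static double[] minMax(double[] column, double lo, double hi) {
        double min = Arrays.stream(column).min().orElse(0.0);
        double max = Arrays.stream(column).max().orElse(0.0);
        double range = max - min;
        double[] out = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            // A constant column has no range; map every value to lo.
            out[i] = range == 0.0 ? lo : lo + (column[i] - min) * (hi - lo) / range;
        }
        return out;
    }

    /** Centers the column on its mean and divides by its (population) standard deviation. */
    static double[] zScore(double[] column) {
        double mean = Arrays.stream(column).average().orElse(0.0);
        double sumSquares = 0.0;
        for (double v : column) {
            sumSquares += (v - mean) * (v - mean);
        }
        double std = Math.sqrt(sumSquares / column.length);
        double[] out = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            out[i] = std == 0.0 ? 0.0 : (column[i] - mean) / std;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] column = {2.0, 4.0, 6.0};
        System.out.println(Arrays.toString(minMax(column, -1.0, 1.0)));
        System.out.println(Arrays.toString(zScore(column)));
    }
}
```

Min-max normalization bounds every variable to the same interval, while z-score normalization gives every variable zero mean and unit variance, so continuous and 0/1 inputs contribute to the distance on a comparable scale.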

The ClusteringExamples class was created to run each experiment. In addition to data processing, Chapter 4, Self-Organizing Maps, also explained how to create a Kohonen net and how to train it using the Euclidean distance.

The following piece of code shows a bit of its implementation:

// enter neural net parameters via keyboard (omitted)

// load dataset from external file (omitted)

// data normalization (omitted)

// create ANN and define parameters to TRAIN:
CompetitiveLearning cl = new CompetitiveLearning(kn1, neuralDataSetToTrain, LearningAlgorithm.LearningMode.ONLINE);
cl.show2DData = false;
cl.printTraining = false;
cl.setLearningRate(typedLearningRate);
cl.setMaxEpochs(typedEpochs);
cl.setReferenceEpoch(200);
cl.setTestingDataSet(neuralDataSetToTest);

// train ANN
try {
  System.out.println("Training neural net... Please, wait...");
  cl.train();
  System.out.println("Winner neurons (clustering result [TRAIN]):");
  System.out.println(Arrays.toString(cl.getIndexWinnerNeuronTrain()));
} catch (NeuralException ne) {
  ne.printStackTrace();
}
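To make the idea behind the training call more concrete, the following standalone sketch shows a simplified form of online competitive learning: for each record, the neuron with the smallest Euclidean distance wins and only its weights move toward the record. It is not the book's CompetitiveLearning class; the CompetitiveSketch name, the toy data, and the fixed initial weights are our own assumptions, and neighborhood updates and learning rate decay are deliberately left out.

```java
import java.util.Arrays;

// Simplified winner-take-all sketch; not the book's CompetitiveLearning class.
final class CompetitiveSketch {

    /** Index of the neuron whose weights are closest to x (squared Euclidean distance). */
    static int winner(double[][] weights, double[] x) {
        int best = 0;
        double bestDistance = Double.MAX_VALUE;
        for (int n = 0; n < weights.length; n++) {
            double distance = 0.0;
            for (int i = 0; i < x.length; i++) {
                double diff = x[i] - weights[n][i];
                distance += diff * diff;
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = n;
            }
        }
        return best;
    }

    /** Online mode: after each record, move only the winning neuron toward that record. */
    static void train(double[][] weights, double[][] data, double learningRate, int epochs) {
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (double[] x : data) {
                double[] w = weights[winner(weights, x)];
                for (int i = 0; i < x.length; i++) {
                    w[i] += learningRate * (x[i] - w[i]);
                }
            }
        }
    }

    public static void main(String[] args) {
        // Toy data: two records near the origin and two near (1, 1).
        double[][] data = {{0.1, 0.2}, {0.0, 0.1}, {0.9, 1.0}, {1.0, 0.8}};
        // Two output neurons (two clusters) with fixed starting weights.
        double[][] weights = {{0.4, 0.4}, {0.6, 0.6}};
        train(weights, data, 0.5, 10);

        int[] winners = new int[data.length];
        for (int r = 0; r < data.length; r++) {
            winners[r] = winner(weights, data[r]);
        }
        System.out.println("Winner neurons: " + Arrays.toString(winners));
    }
}
```

The index of the winning neuron plays the role of the cluster label, which is what the array printed by the listing above represents.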

After running each experiment with the ClusteringExamples class and saving the confusion matrix and total accuracy, we can observe that experiments #4, #6, #8, and #10 produce the same confusion matrix and accuracy. All four used z-score normalization. (Experiment #2 also used z-score; its confusion matrix is the same one with the two columns swapped.)

| Experiment | Confusion matrix | Total accuracy |
|---|---|---|
| #1 | [[14.0, 21.0] [18.0, 17.0]] | 44.28% |
| #2 | [[11.0, 24.0] [34.0, 1.0]] | 17.14% |
| #3 | [[21.0, 14.0] [17.0, 18.0]] | 55.71% |
| #4 | [[24.0, 11.0] [1.0, 34.0]] | 82.85% |
| #5 | [[21.0, 14.0] [17.0, 18.0]] | 55.71% |
| #6 | [[24.0, 11.0] [1.0, 34.0]] | 82.85% |
| #7 | [[8.0, 27.0] [7.0, 28.0]] | 51.42% |
| #8 | [[24.0, 11.0] [1.0, 34.0]] | 82.85% |
| #9 | [[27.0, 8.0] [28.0, 7.0]] | 48.57% |
| #10 | [[24.0, 11.0] [1.0, 34.0]] | 82.85% |

So, the neural nets built in experiments #4, #6, #8, or #10 can be used to cluster customers financially with an accuracy above 80%.

Product profiling

Using a transactional database provided with the code, we compiled about 650 purchase transactions into a large transactions x products matrix, where each cell holds the quantity of the corresponding product bought in the corresponding transaction:

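| #Trns. | Prd.1 | Prd.2 | Prd.3 | Prd.4 | Prd.5 | Prd.6 | Prd.7 | Prd.N |
|---|---|---|---|---|---|---|---|---|
| 1 | 56 | 0 | 0 | 3 | 2 | 0 | 0 | 0 |
| 2 | 0 | 0 | 40 | 0 | 7 | 0 | 19 | 0 |
| n | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |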

Think of this matrix as a set of points in an N-dimensional hyperspace, with each product as a dimension and each transaction as a point. For simplicity, consider an example in three dimensions: a transaction is placed at the point whose coordinate along each dimension is the quantity bought of the corresponding product.

[Figure: a transaction represented as a point in a three-dimensional product space]

The idea is to cluster these transactions to find which products are usually bought together. We use a Kohonen neural network to find where in the product space the cluster centers are located.

Our database comes from a clothing store. A sample of 27 registered products is shown here:

| ID | Product | ID | Product | ID | Product |
|---|---|---|---|---|---|
| 1 | Long Dress A | 19 | Overall with zipper | 43 | Bermuda M |
| 3 | Long Dress B | 22 | Shoulder overall | 48 | Stripped skirt |
| 7 | Short Dress A | 23 | Long stamped skirt | 67 | Camisole shoulder strap |
| 8 | Stamped Dress | 24 | Stamped short dress | 68 | Jeans M |
| 9 | Women Camisole | 28 | Pants M | 69 | XL Short dress |
| 13 | Pants S | 31 | Sleeveless short dress | 74 | Stripped camisole S |
| 16 | Overall for children | 32 | Short dress shoulder | 75 | Stripped camisole M |
| 17 | Shorts | 34 | Short dress B | 76 | Stripped camisole L |
| 18 | Stamped overall | 42 | Two blouse overall | 106 | Straight skirt |

How many clusters?

Choosing how many clusters to look for in a clustering algorithm can be difficult. Approaches for determining an optimal choice include information criteria such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), as well as the Mahalanobis distance from the cluster center to the data. We suggest that interested readers check the references for further details on these criteria.
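For orientation, these are the standard definitions of the two information criteria for a fitted statistical model. The symbols are our own notation: k is the number of free parameters, n is the number of observations, and \hat{L} is the maximized value of the likelihood.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
\qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}
```

In both cases, lower values are preferred. Adding clusters increases k, so a larger number of clusters is only favored when it improves the fit enough to offset the penalty term.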

To test the product example, we again use the ClusteringExamples class. For simplicity, we ran tests with three and five clusters. In each experiment, the number of epochs was 1000, the learning rate was 0.5, and the normalization type was MIN_MAX (-1; 1). Some results are shown in the following table:

| Number of clusters | Clusters of the first 15 elements | Sum of products bought |
|---|---|---|
| 3 | 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 2 | 973, 585, 11, 5, 2, 4, 11, 6, 3, 2, 2, 2, 669, 672, 7 |
| 5 | 0, 1, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 0, 0, 4 | 973, 585, 11, 5, 2, 4, 11, 6, 3, 2, 2, 2, 669, 672, 7 |

Observing the preceding table, we note that transactions whose sum of products bought exceeds 600 are clustered together. A sum in the range of 500 to 599 forms another cluster. Lastly, transactions with a low sum form one large cluster, because the dataset is composed mostly of cases in which customers do not buy more than 20 items.
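One simple way to check this observation is to group the sums by their cluster label. The sketch below does that for the three-cluster row of the table; the ClusterSummary class is a hypothetical helper of ours and is not part of the book's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper that groups the table's sums by cluster label.
final class ClusterSummary {

    // First 15 elements of the three-cluster experiment, copied from the table.
    static final int[] CLUSTERS = {0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 2};
    static final int[] SUMS = {973, 585, 11, 5, 2, 4, 11, 6, 3, 2, 2, 2, 669, 672, 7};

    /** Maps each cluster label to the sums of the elements assigned to it. */
    static Map<Integer, List<Integer>> groupByCluster(int[] clusters, int[] sums) {
        Map<Integer, List<Integer>> groups = new TreeMap<>();
        for (int i = 0; i < clusters.length; i++) {
            groups.computeIfAbsent(clusters[i], label -> new ArrayList<>()).add(sums[i]);
        }
        return groups;
    }

    public static void main(String[] args) {
        groupByCluster(CLUSTERS, SUMS)
            .forEach((label, sums) -> System.out.println("cluster " + label + ": " + sums));
    }
}
```

For this sample, cluster 0 holds the sums 973, 669, and 672, cluster 1 holds 585, and cluster 2 holds the eleven remaining low sums.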

Tip

As recommended in the previous chapter, we suggest you explore the ClusteringExamples class and create a GUI to easily select the neural net parameters. You should try to reuse code through inheritance.

Another tip is to explore the product profiling example further: vary the neural network training parameters and the number of clusters, and/or develop other ways of analyzing the clustering result.
