Using voting

Voting can be used to combine different clusterings of the same dataset. As with voting in supervised learning, each model (base learner) contributes a vote to the final result. This raises the problem of linking clusters that originate from different clusterings: each model produces different clusters with different centers, so we have to match similar clusters across models. This is accomplished by linking together the clusters that share the greatest number of instances. For example, assume that the three clusterings shown in the following figure and table have been produced for a particular dataset:

Three distinct clustering results

The following table depicts each instance's cluster assignments for the three different clusterings.

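| Instance     | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|--------------|---|---|---|---|---|---|---|---|---|----|
| Clustering 1 | 0 | 0 | 2 | 2 | 2 | 0 | 0 | 1 | 0 | 2  |
| Clustering 2 | 1 | 1 | 2 | 2 | 2 | 1 | 0 | 1 | 1 | 2  |
| Clustering 3 | 0 | 0 | 2 | 2 | 2 | 1 | 0 | 1 | 1 | 2  |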

Cluster membership of each instance
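The linking rule described earlier (match each cluster to the cluster it shares the most instances with) can be sketched in plain Python using the first two rows of the table. The function name `link_clusters` and the choice of Clustering 1 as the reference are assumptions made for illustration, not part of the original text.

```python
from collections import Counter

# Labels for instances 1..10, copied from the membership table.
reference = [0, 0, 2, 2, 2, 0, 0, 1, 0, 2]  # Clustering 1
other = [1, 1, 2, 2, 2, 1, 0, 1, 1, 2]      # Clustering 2


def link_clusters(reference, other):
    """Map each label in `other` to the reference label it overlaps most.

    Hypothetical helper: ties are broken in favour of the smallest
    reference label, and the mapping is not forced to be one-to-one.
    """
    # overlap[(o, r)] = number of instances labelled o in `other`
    # and r in `reference`.
    overlap = Counter(zip(other, reference))
    ref_labels = sorted(set(reference))
    mapping = {}
    for label in sorted(set(other)):
        mapping[label] = max(ref_labels, key=lambda r: overlap[(label, r)])
    return mapping


mapping = link_clusters(reference, other)
relabeled = [mapping[label] for label in other]
print(mapping)
print(relabeled)
```

With this simple rule, two clusters of one clustering may be linked to the same reference cluster. If a strict one-to-one matching is required, a different assignment strategy would be needed.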

Using the cluster memberships in the preceding table, we can calculate the co-association matrix. For each pair of instances, this matrix indicates how many times the two instances have been assigned to the same cluster:

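| Instances | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|-----------|---|---|---|---|---|---|---|---|---|----|
| 1         | 3 | 3 | 0 | 0 | 0 | 2 | 2 | 1 | 2 | 0  |
| 2         | 3 | 3 | 0 | 0 | 0 | 2 | 2 | 1 | 2 | 0  |
| 3         | 0 | 0 | 3 | 3 | 3 | 0 | 0 | 0 | 0 | 3  |
| 4         | 0 | 0 | 3 | 3 | 3 | 0 | 0 | 0 | 0 | 3  |
| 5         | 0 | 0 | 3 | 3 | 3 | 0 | 0 | 0 | 0 | 3  |
| 6         | 2 | 2 | 0 | 0 | 0 | 3 | 1 | 2 | 3 | 0  |
| 7         | 2 | 2 | 0 | 0 | 0 | 1 | 3 | 0 | 1 | 0  |
| 8         | 1 | 1 | 0 | 0 | 0 | 2 | 0 | 3 | 2 | 0  |
| 9         | 2 | 2 | 0 | 0 | 0 | 3 | 1 | 2 | 3 | 0  |
| 10        | 0 | 0 | 3 | 3 | 3 | 0 | 0 | 0 | 0 | 3  |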

Co-association matrix for the previous example
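As a minimal sketch of how such a matrix can be computed, the following plain-Python snippet counts, for every pair of instances, the number of clusterings that place them in the same cluster. The function name `co_association` is an assumption for illustration; the label lists are copied from the membership table.

```python
# Cluster labels for instances 1..10, copied from the membership table.
clusterings = [
    [0, 0, 2, 2, 2, 0, 0, 1, 0, 2],  # Clustering 1
    [1, 1, 2, 2, 2, 1, 0, 1, 1, 2],  # Clustering 2
    [0, 0, 2, 2, 2, 1, 0, 1, 1, 2],  # Clustering 3
]


def co_association(clusterings):
    """Count how often each pair of instances shares a cluster."""
    n = len(clusterings[0])
    matrix = [[0] * n for _ in range(n)]
    for labels in clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1
    return matrix


co_assoc = co_association(clusterings)
print(co_assoc[0])  # row for instance 1: [3, 3, 0, 0, 0, 2, 2, 1, 2, 0]
```

The diagonal always equals the number of base learners, since every instance shares a cluster with itself, and the matrix is symmetric by construction.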

By dividing each element by the number of base learners in the ensemble, and clustering together samples that have a value greater than 0.5, we get the following cluster assignments:

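| Instance          | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|-------------------|---|---|---|---|---|---|---|---|---|----|
| Voting clustering | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1  |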

The voting cluster memberships
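The normalization and thresholding step can be sketched as follows. This sketch assumes that "clustering together" means taking the connected components of the graph in which two instances are linked whenever their normalized co-association exceeds 0.5. That interpretation, the union-find helper `find`, and the variable names are assumptions, although the result reproduces the assignments in the table.

```python
# Cluster labels for instances 1..10, copied from the membership table.
clusterings = [
    [0, 0, 2, 2, 2, 0, 0, 1, 0, 2],  # Clustering 1
    [1, 1, 2, 2, 2, 1, 0, 1, 1, 2],  # Clustering 2
    [0, 0, 2, 2, 2, 1, 0, 1, 1, 2],  # Clustering 3
]
n_learners = len(clusterings)
n = len(clusterings[0])

# Fraction of base learners that put instances i and j in the same cluster.
agreement = [
    [sum(c[i] == c[j] for c in clusterings) / n_learners for j in range(n)]
    for i in range(n)
]

# Union-find over instances (assumed grouping strategy).
parent = list(range(n))


def find(i):
    """Return the representative of the group containing instance i."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # path compression
        i = parent[i]
    return i


# Merge every pair whose agreement exceeds the 0.5 threshold.
for i in range(n):
    for j in range(i + 1, n):
        if agreement[i][j] > 0.5:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[max(ri, rj)] = min(ri, rj)

# Relabel the groups 0, 1, ... in order of first appearance.
label_of_root = {}
voting_labels = []
for i in range(n):
    root = find(i)
    label_of_root.setdefault(root, len(label_of_root))
    voting_labels.append(label_of_root[root])

print(voting_labels)  # [0, 0, 1, 1, 1, 0, 0, 0, 0, 1]
```

Under this assumption, instance 8 joins the first group through its link with instance 9, even though it agrees with instance 1 in only one of the three clusterings.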

As is evident, the clustering is more stable. Furthermore, two clusters appear to be sufficient for this dataset. Plotting the data together with their cluster memberships shows two distinct groups, which is exactly what the voting ensemble was able to model, even though each base learner generated three distinct cluster centers:

Final cluster memberships for the voting ensemble