Co-occurrence matrix linkage

Co-occurrence matrix linkage treats the co-occurrence matrix as a distance matrix between instances and uses these distances to perform hierarchical clustering. The clustering stops when no element of the matrix has a value greater than the threshold. Once again, we repeat the previous example. We call the finish_co_occ_linkage method with threshold=0.5 and use the 'co_occ_linkage' key to access the results:

# --- SECTION 1 ---
# Libraries and data loading
import openensembles as oe
import pandas as pd
import sklearn.metrics

from sklearn.datasets import load_breast_cancer

bc = load_breast_cancer()

# --- SECTION 2 ---
# Create the data object
cluster_data = oe.data(pd.DataFrame(bc.data), bc.feature_names)

# --- SECTION 3 ---
# Create the ensembles and calculate the homogeneity score
for K in [2, 3, 4, 5, 6, 7]:
    for ensemble_size in [3, 4, 5]:
        # Build a fresh ensemble of `ensemble_size` K-means runs, each with K clusters
        ensemble = oe.cluster(cluster_data)
        for i in range(ensemble_size):
            name = f'kmeans_{ensemble_size}_{i}'
            ensemble.cluster('parent', 'kmeans', name, K)
        # Combine the base clusterings with co-occurrence matrix linkage
        preds = ensemble.finish_co_occ_linkage(threshold=0.5)
        print(f'K: {K}, size {ensemble_size}:', end=' ')
        print('%.2f' % sklearn.metrics.homogeneity_score(
            bc.target, preds.labels['co_occ_linkage']))
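To make the mechanism described at the start of this section more concrete, the following is a minimal from-scratch sketch, not the OpenEnsembles implementation. It assumes that each co-occurrence entry is the fraction of base clusterings that place two instances in the same cluster, converts it to a distance as 1 - C, runs SciPy average-linkage hierarchical clustering, and cuts the tree at distance 1 - threshold, which roughly mirrors the stopping rule above. The helper names co_occurrence_matrix and co_occ_linkage, the 1 - C conversion, and the choice of average linkage are all illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def co_occurrence_matrix(label_sets):
    """Fraction of base clusterings that put each pair of instances in the same cluster."""
    label_sets = [np.asarray(labels) for labels in label_sets]
    n = len(label_sets[0])
    C = np.zeros((n, n))
    for labels in label_sets:
        # True where instances i and j share a cluster in this base clustering
        C += labels[:, None] == labels[None, :]
    return C / len(label_sets)


def co_occ_linkage(label_sets, threshold=0.5, method='average'):
    """Hypothetical helper: hierarchical clustering on 1 - co-occurrence, cut at 1 - threshold."""
    C = co_occurrence_matrix(label_sets)
    D = 1.0 - C  # frequent co-occurrence -> small distance
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method=method)
    # Groups whose average co-occurrence is at least `threshold` stay merged
    return fcluster(Z, t=1.0 - threshold, criterion='distance')


# Three toy base clusterings of six instances
base = [[0, 0, 0, 1, 1, 1],
        [0, 0, 1, 1, 1, 1],
        [1, 1, 1, 0, 0, 0]]
for thr in (0.5, 0.7):
    labels = co_occ_linkage(base, threshold=thr)
    print(thr, labels, len(set(labels)))
```

With threshold=0.5, instance 2 (which co-occurs with instances 0 and 1 in two of the three runs) joins them, giving two clusters, {0, 1, 2} and {3, 4, 5}. Raising the threshold to 0.7 leaves instance 2 in a cluster of its own, so a higher threshold produces more, smaller clusters.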

The following table summarizes the results. Co-occurrence matrix linkage outperforms the other two methods. Its results are also more stable, with homogeneity changing by at most 0.03 across ensemble sizes for a given K, and it takes less time to execute than either of the other two methods:

K    Ensemble size    Homogeneity
2    3                0.42
2    4                0.42
2    5                0.42
3    3                0.47
3    4                0.47
3    5                0.45
4    3                0.58
4    4                0.58
4    5                0.58
5    3                0.60
5    4                0.60
5    5                0.60
6    3                0.59
6    4                0.62
6    5                0.62
7    3                0.62
7    4                0.63
7    5                0.63

Homogeneity results for co-occurrence cluster linkage on the raw breast cancer dataset
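The stability claim can be checked against the table itself. The sketch below copies the table's values into a pandas DataFrame and computes, for each K, the spread (maximum minus minimum) of homogeneity across the three ensemble sizes. The names results and spread are illustrative only, and no new measurements are involved.

```python
import pandas as pd

# Values copied from the homogeneity table (K, ensemble size, homogeneity)
results = pd.DataFrame({
    'K': [k for k in range(2, 8) for _ in range(3)],
    'Size': [3, 4, 5] * 6,
    'Homogeneity': [0.42, 0.42, 0.42, 0.47, 0.47, 0.45,
                    0.58, 0.58, 0.58, 0.60, 0.60, 0.60,
                    0.59, 0.62, 0.62, 0.62, 0.63, 0.63],
})

# Spread of homogeneity across ensemble sizes for each K
spread = results.groupby('K')['Homogeneity'].agg(lambda s: s.max() - s.min())
print(spread.round(2))
```

A spread close to zero for every K means that adding more base clusterings barely changes the combined result.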