Co-occurrence matrix linkage

Co-occurrence matrix linkage treats the co-occurrence matrix as a distance matrix between instances and uses those distances to perform hierarchical clustering. Clustering stops when no element of the matrix has a value greater than the threshold. We repeat the earlier example once more, this time calling the finish_co_occ_linkage function with threshold=0.5 and reading the results from the 'co_occ_linkage' key:

# --- SECTION 1 ---
# Libraries and data loading
import openensembles as oe
import pandas as pd
import sklearn.metrics

from sklearn.datasets import load_breast_cancer

bc = load_breast_cancer()

# --- SECTION 2 ---
# Create the data object
cluster_data = oe.data(pd.DataFrame(bc.data), bc.feature_names)

# --- SECTION 3 ---
# Create the ensembles and calculate the homogeneity score
for K in [2, 3, 4, 5, 6, 7]:
    for ensemble_size in [3, 4, 5]:
        ensemble = oe.cluster(cluster_data)
        # Add ensemble_size K-means base clusterings to the ensemble
        for i in range(ensemble_size):
            name = f'kmeans_{ensemble_size}_{i}'
            ensemble.cluster('parent', 'kmeans', name, K)
        # Combine the base clusterings with co-occurrence matrix linkage
        preds = ensemble.finish_co_occ_linkage(threshold=0.5)
        print(f'K: {K}, size {ensemble_size}:', end=' ')
        print('%.2f' % sklearn.metrics.homogeneity_score(
            bc.target, preds.labels['co_occ_linkage']))
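To make the idea behind the method more concrete, the following is a minimal sketch of linkage over a co-occurrence matrix, written with NumPy and SciPy only. It is not the OpenEnsembles implementation. The function names co_occurrence_matrix and co_occurrence_linkage are hypothetical, and so are the design choices: co-occurrence is converted to a distance as 1 - co-occurrence, average linkage is used, and the tree is cut at a distance of 1 - threshold. The base clusterings are made-up toy labels.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def co_occurrence_matrix(label_sets):
    """Fraction of base clusterings in which each pair of instances shares a cluster."""
    label_sets = np.asarray(label_sets)  # shape: (n_clusterings, n_instances)
    same_cluster = label_sets[:, :, None] == label_sets[:, None, :]
    return same_cluster.mean(axis=0)


def co_occurrence_linkage(label_sets, threshold=0.5, method='average'):
    """Hierarchical clustering on distances derived from the co-occurrence matrix."""
    co_occ = co_occurrence_matrix(label_sets)
    # Assumption: frequent co-occurrence means a small distance
    distances = 1.0 - co_occ
    np.fill_diagonal(distances, 0.0)
    condensed = squareform(distances, checks=False)
    tree = linkage(condensed, method=method)
    # Assumption: cut the tree at distance 1 - threshold
    return fcluster(tree, t=1.0 - threshold, criterion='distance')


# Three hypothetical base clusterings of six instances
base_labels = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 2, 2],
]
consensus = co_occurrence_linkage(base_labels, threshold=0.5)
print(consensus)
```

In this toy case, the first three instances share a cluster in at least two of the three base clusterings, as do the last three, so the consensus groups them into two clusters even though the third base clustering disagrees. Note that the label values themselves do not matter, only which instances are grouped together.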

The following table summarizes the results. Notice that co-occurrence matrix linkage outperforms the other two methods. Furthermore, its results are more stable, and it requires less execution time than either of them:

K	Size	Homogeneity
2	3	0.42
2	4	0.42
2	5	0.42
3	3	0.47
3	4	0.47
3	5	0.45
4	3	0.58
4	4	0.58
4	5	0.58
5	3	0.60
5	4	0.60
5	5	0.60
6	3	0.59
6	4	0.62
6	5	0.62
7	3	0.62
7	4	0.63
7	5	0.63

Homogeneity results for co-occurrence matrix linkage on the raw breast cancer dataset
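When reading the table, it helps to recall what the homogeneity score rewards: a clustering is fully homogeneous when every cluster contains members of a single class, regardless of how many clusters there are. The short sketch below illustrates this with scikit-learn's homogeneity_score on made-up toy labels. The variable names are illustrative only and are not part of the example above.

```python
from sklearn.metrics import homogeneity_score

true_classes = [0, 0, 1, 1]

# Clusters match the classes exactly
perfect = homogeneity_score(true_classes, [0, 0, 1, 1])

# Every instance in its own cluster: each cluster is still pure
over_split = homogeneity_score(true_classes, [0, 1, 2, 3])

# A single cluster that mixes both classes
merged = homogeneity_score(true_classes, [0, 0, 0, 0])

print(f'perfect: {perfect:.2f}, over-split: {over_split:.2f}, merged: {merged:.2f}')
```

Because splitting the data into more clusters never makes individual clusters less pure, part of the increase in homogeneity with larger K may reflect the metric itself, which is worth keeping in mind when comparing rows with different K.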
