Gaining insights

To gain further insights into our dataset's structure and relationships, we will use the t-SNE approach, with ensembles of size 20 and base K-means clusterers with a K value of 10. First, we create and train the ensemble. Then, we add the cluster assignments to the DataFrame as an additional pandas column. Finally, we calculate the means for each cluster and create a bar plot for each feature:

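# DATA LOADING SECTION START #

import matplotlib.pyplot as plt
import openensembles as oe
import pandas as pd
# Import the public TSNE class; the sklearn.manifold.t_sne module
# is private and is not exposed by newer scikit-learn releases
from sklearn.manifold import TSNE

# `data` is assumed to be the DataFrame loaded earlier
# Use the 2017 data and fill any NaNs
recents = data[data.Year == 2017]
recents = recents.dropna(axis=1, how="all")
# numeric_only skips text columns, which have no median
recents = recents.fillna(recents.median(numeric_only=True))

# Use only these specific features
columns = ['Log GDP per capita',
           'Social support', 'Healthy life expectancy at birth',
           'Freedom to make life choices', 'Generosity',
           'Perceptions of corruption', 'Positive affect', 'Negative affect',
           'Confidence in national government', 'Democratic Quality',
           'Delivery Quality']

# Transform the data with t-SNE
tsne = TSNE()
transformed = pd.DataFrame(tsne.fit_transform(recents[columns]))
# Create the data object
cluster_data = oe.data(transformed, [0, 1])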

# DATA LOADING SECTION END #

# Create the ensemble
ensemble = oe.cluster(cluster_data)
for i in range(20):
    name = f'kmeans_{i}-tsne'
    ensemble.cluster('parent', 'kmeans', name, 10)

# Create the cluster labels
preds = ensemble.finish_co_occ_linkage(threshold=0.5)

# Add Life Ladder to columns
columns = ['Life Ladder', 'Log GDP per capita',
'Social support', 'Healthy life expectancy at birth',
'Freedom to make life choices', 'Generosity',
'Perceptions of corruption','Positive affect', 'Negative affect',
'Confidence in national government', 'Democratic Quality',
'Delivery Quality']
# Add the cluster to the dataframe and group by the cluster
recents['Cluster'] = preds.labels['co_occ_linkage']
grouped = recents.groupby('Cluster')
# Get the means of the selected (numeric) columns only
means = grouped[columns].mean()
# Create barplots
def create_bar(col, nrows, ncols, index):
    plt.subplot(nrows, ncols, index)
    values = means.sort_values('Life Ladder')[col]
    # Pad by 2% of the value range, so that negative means are not clipped
    pad = 0.02 * (values.max() - values.min())
    values.plot(kind='bar', ylim=[values.min() - pad, values.max() + pad])
    plt.title(col[:18])

# Plot for each feature on a grid of 4 rows and 3 columns
plt.figure(1)
for i, col in enumerate(columns, start=1):
    create_bar(col, 4, 3, i)
plt.show()
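To make the co-occurrence linkage step more concrete, the following is a minimal sketch of the idea, not OpenEnsembles' internal implementation. It assumes that the consensus is obtained by average-linkage hierarchical clustering on co-occurrence distances, cut at the given threshold. The function names co_occurrence and consensus_labels are hypothetical, and the three base partitions are hard-coded stand-ins for individual K-means runs:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def co_occurrence(partitions):
    """Fraction of base partitions in which each pair of points shares a cluster."""
    partitions = np.asarray(partitions)
    # same[r, i, j] is True when run r puts points i and j in the same cluster
    same = partitions[:, :, None] == partitions[:, None, :]
    return same.mean(axis=0)


def consensus_labels(partitions, threshold=0.5):
    """Cut an average-linkage tree built on (1 - co-occurrence) distances."""
    distance = 1.0 - co_occurrence(partitions)
    np.fill_diagonal(distance, 0.0)
    tree = linkage(squareform(distance, checks=False), method='average')
    return fcluster(tree, t=threshold, criterion='distance')


# Hard-coded stand-ins for three base clusterings of six points.
# Label ids differ between runs; only the groupings matter.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
]
co_matrix = co_occurrence(partitions)
labels = consensus_labels(partitions, threshold=0.5)
print(labels)
```

Here, the third point is grouped with the first two in two out of three runs, so the consensus keeps the first three points together, even though one base clusterer disagreed.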

The bar plots are depicted in the following figure. The clusters are sorted according to their average Life Ladder value, so that the individual features are easy to compare. As we can see, clusters 3, 2, and 4 have comparable average happiness (Life Ladder). The same can be said for clusters 6, 8, 9, 7, and 5. We could argue that the ensemble only needs 5 clusters but, by closely examining the other features, we see that this is not the case:

Bar plots of cluster means for each feature
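The ordering used in the plots can also be inspected numerically. The following sketch sorts a table of cluster means by Life Ladder and computes the gap between consecutive clusters; small gaps correspond to clusters with comparable happiness. The toy_means values are made up purely for illustration and are not taken from the dataset:

```python
import pandas as pd

# Hypothetical cluster means; the values are made up for illustration only
toy_means = pd.DataFrame(
    {'Life Ladder': [4.1, 4.0, 5.6, 4.2],
     'Healthy life expectancy at birth': [52.0, 58.0, 66.0, 59.0]},
    index=pd.Index([2, 3, 6, 4], name='Cluster'),
)

# Sort the clusters exactly as the bar plots do
ordered = toy_means.sort_values('Life Ladder')
# Gap in mean Life Ladder between each cluster and the one ranked just below it
gaps = ordered['Life Ladder'].diff()
print(ordered.assign(gap=gaps))
```

In this toy table, three clusters lie close together in Life Ladder while differing noticeably in life expectancy, which is the kind of pattern that justifies keeping them as separate clusters.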

By looking at Healthy life expectancy and Freedom to make life choices, we see that clusters 3 and 4 are considerably better than cluster 2. In fact, if we examine every other feature, we see that clusters 3 and 4 are, on average, more fortunate than cluster 2.

It is also interesting to see how the individual countries are distributed among the clusters. The following table depicts the cluster assignments. Indeed, we see that clusters 2, 3, and 4 involve countries that have recently had to overcome difficulties that were not captured in our features. In fact, these are some of the most war-torn areas of the world. From a sociological point of view, it is extremely interesting that these war-torn and troubled regions seem to have the most confidence in their governments, despite exhibiting extremely negative democratic and delivery qualities:

| Cluster | Countries |
|---------|-----------|
| 1 | Cambodia, Egypt, Indonesia, Libya, Mongolia, Nepal, Philippines, and Turkmenistan |
| 2 | Afghanistan, Burkina Faso, Cameroon, Central African Republic, Chad, Congo (Kinshasa), Guinea, Ivory Coast, Lesotho, Mali, Mozambique, Niger, Nigeria, Sierra Leone, and South Sudan |
| 3 | Benin, Gambia, Ghana, Haiti, Liberia, Malawi, Mauritania, Namibia, South Africa, Tanzania, Togo, Uganda, Yemen, Zambia, and Zimbabwe |
| 4 | Botswana, Congo (Brazzaville), Ethiopia, Gabon, India, Iraq, Kenya, Laos, Madagascar, Myanmar, Pakistan, Rwanda, and Senegal |
| 5 | Albania, Argentina, Bahrain, Chile, China, Croatia, Czech Republic, Estonia, Montenegro, Panama, Poland, Slovakia, United States, and Uruguay |
| 6 | Algeria, Azerbaijan, Belarus, Brazil, Dominican Republic, El Salvador, Iran, Lebanon, Morocco, Palestinian Territories, Paraguay, Saudi Arabia, Turkey, and Venezuela |
| 7 | Bulgaria, Hungary, Kuwait, Latvia, Lithuania, Mauritius, Romania, and Taiwan Province of China |
| 8 | Armenia, Bosnia and Herzegovina, Colombia, Ecuador, Honduras, Jamaica, Jordan, Macedonia, Mexico, Nicaragua, Peru, Serbia, Sri Lanka, Thailand, Tunisia, United Arab Emirates, and Vietnam |
| 9 | Bangladesh, Bolivia, Georgia, Guatemala, Kazakhstan, Kosovo, Kyrgyzstan, Moldova, Russia, Tajikistan, Trinidad and Tobago, Ukraine, and Uzbekistan |
| 10 | Australia, Austria, Belgium, Canada, Costa Rica, Cyprus, Denmark, Finland, France, Germany, Greece, Hong Kong S.A.R. of China, Iceland, Ireland, Israel, Italy, Japan, Luxembourg, Malta, Netherlands, New Zealand, Norway, Portugal, Singapore, Slovenia, South Korea, Spain, Sweden, Switzerland, and United Kingdom |

Cluster assignments
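A listing like the one in the table can be produced by grouping the country names by cluster. The following is a minimal sketch on a toy DataFrame; the 'Country name' column name is an assumption about the dataset's layout, and the toy rows only reuse a few assignments from the table:

```python
import pandas as pd

# Toy stand-in for `recents`; 'Country name' is an assumed column name
toy = pd.DataFrame({
    'Country name': ['Mali', 'Chad', 'Togo', 'Ghana', 'Norway'],
    'Cluster': [2, 2, 3, 3, 10],
})

# Join the alphabetically sorted country names of each cluster into one string
members = (toy.sort_values('Country name')
              .groupby('Cluster')['Country name']
              .agg(', '.join))

for cluster_id, countries in members.items():
    print(f'{cluster_id}: {countries}')
```

With the real data, the same grouping would be applied to the DataFrame that holds the Cluster column added earlier.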

Starting with cluster 1, we see that the happiness of people in these countries is considerably better than in the previous clusters. This can be attributed to a better life expectancy (fewer wars), better GDP per capita, social support, generosity, and freedom to make life choices. Still, these countries are not as happy as they could be, mainly due to problems with democratic quality and delivery quality. Nonetheless, their confidence in their governments is second only to the previous group of clusters we discussed.

Clusters 6, 8, and 9 are more or less on the same level of happiness. Their differences are in GDP per capita, life expectancy, freedom, generosity, and confidence. We can see that cluster 6 has, on average, stronger economies and life expectancy, although people's freedom, generosity, and the government's efficiency seem to be lacking. Clusters 8 and 9 are less economically sound, but seem to have a lot more freedom and better functioning governments. Moreover, their generosity is, on average, greater than that of cluster 6.

Moving on to clusters 7 and 5, we see that they, too, are close in terms of happiness. These are countries where we see a positive democratic and delivery quality, with sufficient freedom, economic strength, social support, and a healthy life expectancy. These are developed countries, where people, on average, live a prosperous life without fear of dying from economic, political, or military causes. The problems in these countries are mainly the perception of corruption, people's confidence in their governments, and the efficiency of the governments.

Finally, cluster 10 contains countries that are better in almost every aspect, compared to the rest of the world. These countries have, on average, the highest GDP per capita, life expectancy, generosity, and freedom, while having sufficiently high confidence in their national governments and low perceptions of corruption. These could be considered the ideal countries to live in, given a compatible cultural background.
