When data varies by orders of magnitude, transforming it with logarithms is an obvious strategy. In my experience, the opposite transformation, using an exponential function, is less common. When exploring data, we usually visualize pairs of variables in a log-log or semi-log scatter plot.
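As a minimal illustration with made-up values (not the World Bank data), the base-10 logarithm maps each order of magnitude to one unit on a linear scale. The exponential function with base 10 is its inverse and restores the original values:

```python
import numpy as np

# Hypothetical values spanning four orders of magnitude
values = np.array([1.0, 10.0, 100.0, 1000.0, 10000.0])

# log10 compresses each factor of 10 into a step of 1: 0, 1, 2, 3, 4
logged = np.log10(values)

# Raising 10 to the logged values is the opposite (exponential) transformation
restored = 10 ** logged

print(logged)
print(restored)
```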
To demonstrate this transformation, we will use the World Bank data for infant mortality rate (per 1,000 live births) and Gross Domestic Product (GDP) per capita for the available countries. If we apply the base-10 logarithm to both variables, the slope of the line fitted to the transformed data has a useful property: a one percent increase in GDP per capita corresponds to a percentage change in infant mortality approximately equal to the slope. Economists call this slope an elasticity.
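To see why the slope reads as a percentage change, note that a fitted line log10(y) = a + b log10(x) is equivalent to y = 10^a x^b. Multiplying x by 1.01 therefore multiplies y by 1.01^b, which is close to 1 + b/100 for moderate slopes. The sketch below computes the exact percentage change; the function name `percent_change_in_y` and the slope of -0.5 are hypothetical, chosen only for illustration, and are not the fitted values from the data:

```python
def percent_change_in_y(slope, percent_change_in_x=1.0):
    """Exact percentage change in y for y = C * x**slope when x grows by percent_change_in_x percent."""
    growth_factor = 1 + percent_change_in_x / 100
    return (growth_factor ** slope - 1) * 100


# Hypothetical slope, for illustration only
change = percent_change_in_y(-0.5)
print(f"{change:.3f}")  # -0.496, close to the slope itself
```

The same slope results whether we use base 10 or natural logarithms, as long as both variables use the same base, because changing the base rescales both axes by the same factor.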
Transform the data using logarithms with the following procedure:
```python
import dautil as dl
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import HTML
```
```python
# Download the 2010 indicators for all available countries and drop missing values
wb = dl.data.Worldbank()
countries = wb.get_countries()[['name', 'iso2c']]
inf_mort = wb.get_name('inf_mort')
gdp_pcap = wb.get_name('gdp_pcap')
df = wb.download(country=countries['iso2c'],
                 indicator=[inf_mort, gdp_pcap],
                 start=2010, end=2010).dropna()
```
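```python
# Apply the base-10 logarithm to both columns; NumPy ufuncs work elementwise
# on a DataFrame (DataFrame.applymap is deprecated in recent pandas versions)
loglog = np.log10(df)
x = loglog[gdp_pcap]
y = loglog[inf_mort]
```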
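```python
# Assumption: the notebook context object is created elsewhere in the notebook;
# its definition is not shown in this excerpt
context = dl.nb.Context('transforming_down')

# Top panel: the raw data on linear axes
sp = dl.plotting.Subplotter(2, 1, context)
xvar = 'GDP per capita'
sp.label(xlabel_params=xvar)
sp.ax.set_ylim([0, 200])
sp.ax.scatter(df[gdp_pcap], df[inf_mort])

# Bottom panel: the log-transformed data with a fitted line
sp.next_ax()
sp.ax.scatter(x, y, label='Transformed')
dl.plotting.plot_polyfit(sp.ax, x, y)
sp.label(xlabel_params=xvar)
plt.tight_layout()
HTML(dl.report.HTMLBuilder().watermark())
```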
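The excerpt does not show how `dl.plotting.plot_polyfit` computes its line; a reasonable assumption is an ordinary least-squares fit of a degree-1 polynomial. The following sketch performs such a fit directly with `np.polyfit` on synthetic power-law data (no download and no dautil), so the recovered slope can be checked against a known exponent. The function name `loglog_fit` and the synthetic parameters are hypothetical:

```python
import numpy as np


def loglog_fit(x, y):
    """Least-squares fit of log10(y) = intercept + slope * log10(x)."""
    log_x = np.log10(x)
    log_y = np.log10(y)
    # np.polyfit returns coefficients from the highest degree down: [slope, intercept]
    slope, intercept = np.polyfit(log_x, log_y, 1)
    return slope, intercept


# Synthetic data (not the World Bank data): y = 500 * x**-0.6 with multiplicative noise
rng = np.random.default_rng(0)
x = 10 ** rng.uniform(2.5, 5.0, size=200)
y = 500 * x ** -0.6 * 10 ** rng.normal(0.0, 0.05, size=x.size)

slope, intercept = loglog_fit(x, y)

# Points on the fitted line in log-log space, as a fitted-line overlay would draw them
fitted_line = np.polyval([slope, intercept], np.log10(x))
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

Because the noise is multiplicative, it becomes additive after the logarithm, which is the setting in which a straight-line least-squares fit in log-log space is appropriate.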
Refer to the following screenshot for the end result (see the transforming_down.ipynb file in this book's code bundle):