Some best practices for visualization

The first step toward a great visualization is to know the goal behind the effort: what purpose does the visualization serve? It is equally important to know who the audience is and how the visualization will help them.

Once these questions are answered and the purpose of the visualization is well understood, the next challenge is to choose the right method of presentation. The most commonly used types of visualization can be categorized as follows:

  • Comparison and ranking
  • Correlation
  • Distribution
  • Location-specific or geodata
  • Part-to-whole relationships
  • Trends over time

Comparison and ranking

Comparison and ranking can be done in more than one way, but the traditional way is to use a bar chart. A bar chart encodes quantitative values as lengths measured from a common baseline. However, it is not always the best way to display comparisons and rankings. For instance, to display the top 12 countries in Africa by GDP, the following presentation is a creative alternative (courtesy: Stats Legend, Andrew Gelman and Antony Unwin):

[Figure: Comparison and ranking]
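The traditional bar-chart approach to ranking can be sketched as follows. This is a minimal illustration using Matplotlib, not a reproduction of the figure above; the category names and values are placeholders invented for the example, not real GDP figures, and the function name ranked_bar_chart is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line for interactive use
import matplotlib.pyplot as plt

# Placeholder values for illustration only; these are NOT real GDP figures.
values = {"Country A": 42.0, "Country B": 17.5, "Country C": 88.3, "Country D": 55.1}


def ranked_bar_chart(data):
    """Draw a horizontal bar chart with the largest value at the top."""
    # barh draws the first item at the bottom, so sort in ascending order.
    ranked = sorted(data.items(), key=lambda item: item[1])
    labels = [name for name, _ in ranked]
    lengths = [value for _, value in ranked]

    fig, ax = plt.subplots()
    ax.barh(labels, lengths)
    ax.set_xlim(left=0)  # a shared zero baseline keeps bar lengths comparable
    ax.set_xlabel("Value (placeholder units)")
    ax.set_title("Ranking by value")
    return fig, ax


fig, ax = ranked_bar_chart(values)
```

Sorting before plotting is what turns a plain comparison into a ranking: the reader can read the order directly from the bar positions instead of comparing lengths pair by pair.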

Correlation

A simple correlation analysis is a good place to start when identifying relationships between measures, although an observed correlation does not guarantee that a real relationship exists. Confirming that the relationship truly exists often requires a formal statistical methodology. The following example is a simple scatter plot used to detect correlations between two factors, say gpa and tv, or gpa and exercise, among the students of a university:

[Figure: Correlation]
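A scatter plot of this kind might be built as in the sketch below. The student data set is not available here, so the tv and gpa arrays are synthetic values generated purely for illustration (the slope and noise level are assumptions), and the Pearson coefficient is computed with NumPy's corrcoef.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line for interactive use
import matplotlib.pyplot as plt

# Synthetic data for illustration only; the relationship below is assumed.
rng = np.random.default_rng(seed=0)
n = 200
tv = rng.uniform(0, 30, n)  # hours of TV per week
gpa = np.clip(3.6 - 0.04 * tv + rng.normal(0, 0.3, n), 0.0, 4.0)

# Pearson correlation coefficient between the two variables.
r = np.corrcoef(tv, gpa)[0, 1]

fig, ax = plt.subplots()
ax.scatter(tv, gpa, alpha=0.6)
ax.set_xlabel("tv (hours per week)")
ax.set_ylabel("gpa")
ax.set_title(f"gpa vs. tv (Pearson r = {r:.2f})")
```

The coefficient summarizes only the linear association in this sample; it does not by itself confirm that watching TV affects grades.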

However, there are other ways to display correlations. For instance, one can use scatter plots, heat maps, or a more specific display such as the influence network among stocks in the S&P 100. (The following two plots are taken from Statistical Tools for High Throughput Analysis at http://www.sthda.com.) A correlation matrix holds the data in matrix form, and the correlation values are encoded with a scaled color map, as shown in the following examples. For more details, refer to http://www.sthda.com.

[Figure: Correlation]

A correlation matrix is used to investigate the dependence between multiple variables at the same time. The result is a table containing the correlation coefficient between each variable and every other variable. Heat maps originated as 2D displays of the values in a data matrix. Many different color schemes can be used to draw a heat map, each with its own perceptual advantages and disadvantages.

[Figure: Correlation]
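As a rough sketch of the idea, a correlation matrix can be computed with NumPy and drawn as a heat map with Matplotlib. The three variables below are synthetic stand-ins for gpa, tv, and exercise (the coefficients used to generate them are assumptions), and the diverging "coolwarm" color map is just one of the many possible color schemes.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line for interactive use
import matplotlib.pyplot as plt

# Synthetic data for illustration only.
rng = np.random.default_rng(seed=0)
n = 200
tv = rng.uniform(0, 30, n)
exercise = rng.uniform(0, 10, n)
gpa = 3.2 - 0.03 * tv + 0.05 * exercise + rng.normal(0, 0.3, n)

names = ["gpa", "tv", "exercise"]
# rowvar=False: each column is a variable, each row an observation.
corr = np.corrcoef(np.column_stack([gpa, tv, exercise]), rowvar=False)

fig, ax = plt.subplots()
# Fix the color scale to [-1, 1] so that zero maps to the neutral color.
image = ax.imshow(corr, cmap="coolwarm", vmin=-1.0, vmax=1.0)
ax.set_xticks(range(len(names)))
ax.set_xticklabels(names)
ax.set_yticks(range(len(names)))
ax.set_yticklabels(names)
# Annotate every cell with its coefficient.
for i in range(len(names)):
    for j in range(len(names)):
        ax.text(j, i, f"{corr[i, j]:.2f}", ha="center", va="center")
fig.colorbar(image, ax=ax, label="Pearson correlation coefficient")
```

Pinning vmin and vmax to -1 and 1 keeps the colors comparable between different matrices, which is one of the perceptual considerations when choosing a scheme.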

Distribution

A distribution analysis shows how quantitative values are distributed across their range and is therefore extremely useful in data analysis. For example, compare the grade distributions of the homework, the midterm, the final exam, and the total course grade for a class of students. In this example, we will discuss two of the most commonly used chart types for this purpose. One is a histogram (as shown in the following image), and the other is a box plot, also called a box-and-whisker plot.

[Figure: Distribution]

The shape of a histogram depends strongly on the specified bin size and bin locations. Box-and-whisker plots are excellent for displaying multiple distributions. They summarize all the data points (in this case, grades per student) in a compact box-and-whisker display. You can easily identify the minimum values, the 25th percentiles, the medians, the 75th percentiles, and the maximum values across all categories at the same time.

[Figure: Distribution]
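The two chart types can be sketched with NumPy and Matplotlib as shown below. The grades are randomly generated placeholders, not the class data from the figures, and five_number_summary is a hypothetical helper that returns the five values a box-and-whisker plot displays.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line for interactive use
import matplotlib.pyplot as plt

# Randomly generated placeholder grades for illustration only.
rng = np.random.default_rng(seed=1)
grades = {
    "homework": np.clip(rng.normal(85, 8, 60), 0, 100),
    "midterm": np.clip(rng.normal(72, 12, 60), 0, 100),
    "final": np.clip(rng.normal(68, 15, 60), 0, 100),
}


def five_number_summary(x):
    """Return (minimum, 25th percentile, median, 75th percentile, maximum)."""
    return tuple(float(v) for v in np.percentile(x, [0, 25, 50, 75, 100]))


fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: the bin count and range control the apparent shape.
ax_hist.hist(grades["midterm"], bins=10, range=(0, 100))
ax_hist.set_xlabel("midterm grade")
ax_hist.set_ylabel("number of students")

# Box plot: whis=(0, 100) extends the whiskers to the minimum and maximum.
ax_box.boxplot(list(grades.values()), whis=(0, 100))
ax_box.set_xticklabels(list(grades.keys()))
ax_box.set_ylabel("grade")

for name, x in grades.items():
    print(name, [round(v, 1) for v in five_number_summary(x)])
```

Changing bins from 10 to, say, 5 or 25 is a quick way to see how much the histogram's shape depends on that choice, while the box plot stays the same.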

One convenient way to plot these in Python is Plotly, an online analytics and visualization tool. Plotly provides online graphing, analytics, and statistics tools as well as scientific plotting libraries for Python, R, Julia, and JavaScript. For examples of histograms and box-and-whisker plots, refer to https://plot.ly/python/histograms-and-box-plots-tutorial.

Location-specific or geodata

Maps are the best way to display location-specific data. They work best when paired with another chart that details what the map is displaying (such as a bar chart sorted from greatest to least, or a line chart showing trends). For example, the following map shows earthquake intensity compared across continents:

[Figure: Location-specific or geodata]

Part-to-whole relationships

Pie charts are commonly used to display part-to-whole relationships, but there are other ways to do it. Grouped bar charts are good for comparing each element within a category with the others, and for comparing elements across categories. However, grouping makes it harder to compare the totals of the groups. This is where stacked column charts come in.

[Figure: Part-to-whole relationships]

Stacked column charts are great for showing totals because they visually aggregate all the categories in a group. The downside is that it becomes harder to compare the sizes of the individual categories. Stacking also signals a part-to-whole relationship.
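The trade-off between grouped and stacked charts can be seen by drawing both from the same data, as in this minimal Matplotlib sketch. The quarter and product names and their values are placeholders invented for the example.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line for interactive use
import matplotlib.pyplot as plt

# Placeholder data for illustration only.
groups = ["Q1", "Q2", "Q3"]
series = {
    "Product A": [3, 5, 4],
    "Product B": [2, 2, 6],
    "Product C": [4, 1, 3],
}

x = np.arange(len(groups))
width = 0.8 / len(series)  # width of one bar inside a group

fig, (ax_grouped, ax_stacked) = plt.subplots(1, 2, figsize=(10, 4))
totals = np.zeros(len(groups))  # running height of each stack

for i, (name, values) in enumerate(series.items()):
    # Grouped: bars sit side by side, so individual values are easy to compare.
    ax_grouped.bar(x + i * width, values, width, label=name)
    # Stacked: each bar starts where the previous one ended, so totals stand out.
    ax_stacked.bar(x, values, 0.8, bottom=totals, label=name)
    totals = totals + np.array(values)

ax_grouped.set_xticks(x + width * (len(series) - 1) / 2)
ax_grouped.set_xticklabels(groups)
ax_grouped.set_title("Grouped")
ax_stacked.set_xticks(x)
ax_stacked.set_xticklabels(groups)
ax_stacked.set_title("Stacked")
ax_stacked.legend()
```

In the grouped panel it is easy to see that Product B peaks in Q3, while the stacked panel makes it obvious that Q3 has the largest total.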

Trends over time

One of the most frequently used ways to analyze data is to display a trend over a period of time. In the following example, investment in wearables startups from 2009 to 2015 has been plotted. It shows that investment in wearables had been rising for a few years; activity shot through the roof in 2014, with 61 completed deals totaling $427 million, compared with 43 deals worth only $166 million in 2013 (just a year earlier).

[Figure: Trends over time]

With this observation, it will be interesting to see how the marketplace evolves over the coming years.
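To put numbers on the 2013 to 2014 jump, the short calculation below derives the year-over-year growth and the average deal size. It uses only the two data points quoted in the text; the other years in the plotted range are not given here, so they are left out rather than guessed. The helper name pct_change is hypothetical.

```python
# Figures quoted in the text: year -> (completed deals, investment in millions of USD).
investment = {2013: (43, 166.0), 2014: (61, 427.0)}


def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


deals_2013, usd_2013 = investment[2013]
deals_2014, usd_2014 = investment[2014]

deal_growth = pct_change(deals_2013, deals_2014)
usd_growth = pct_change(usd_2013, usd_2014)
avg_deal_2013 = usd_2013 / deals_2013
avg_deal_2014 = usd_2014 / deals_2014

print(f"Deals grew by {deal_growth:.1f}%")            # Deals grew by 41.9%
print(f"Investment grew by {usd_growth:.1f}%")        # Investment grew by 157.2%
print(f"Average deal 2013: ${avg_deal_2013:.2f}M")    # Average deal 2013: $3.86M
print(f"Average deal 2014: ${avg_deal_2014:.2f}M")    # Average deal 2014: $7.00M
```

Investment grew much faster than the number of deals, so the average deal size nearly doubled, a detail that a trend line of totals alone does not show.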
