Applying limitations to charts

While outliers can reveal all kinds of useful intelligence such as issues in data capture or associated process patterns, they can cause problems when you are building data visualizations. The most common issue is to do with scale, as you can see in the following example.

Getting ready

For this recipe, we will make use on inline data load which gives us the information on the number of call made for each month. Add the following code into the data load editor and reload the Qlik Sense application:

Data:
LOAD * INLINE [
    Month, Date, Calls
    Jan, 27/01/15, 25
    Jan, 28/01/15, 27
    Jan, 29/01/15, 25
    Jan, 30/01/15, 600
    Jan, 31/01/15, 22
    Feb, 01/02/15, 20
    Feb, 02/02/15, 19
    Feb, 03/02/15, 21
    Feb, 04/02/15, 1
    Feb, 05/02/15, 600
];
.

How to do it…

  1. Add a Bar chart object onto the content page.
  2. Add Month as a dimension.
  3. Add Avg(Calls) as a measure. Label it Average Calls.
  4. Click on Done. Notice that the values in both months are just below 150:
    How to do it…
  5. Next, go back into the Edit mode and replace the measure with the following code:
    Avg(If (Calls > Aggr(NODISTINCT Fractile(Calls, 0.1), Month) and Calls < Aggr(NODISTINCT Fractile(Calls, 0.9), Month),Calls))
  6. Click on Done.
  7. The chart should now resemble the following image. It not only has both bars significantly reduced down to below 30, but there is also a much bigger gap between January and February's average call volumes:
    How to do it…

How it works…

If we look at the source data we loaded at the beginning of the recipe it is clear that there are some outliers present. To exclude these and get a real picture of the normal, average amount of calls, we remove the top and bottom 10 percent of the value. This is done using the fractile function. The fractile function calculates the cut-off point for 10 and 90 percent based on our data. The Aggr function is needed because fractile is an aggregation function being nested inside another aggregation.

There's more…

Another method of handling outliers is not to exclude them from the expression, but hide them from what is visualized. For example, if a data point far exceeds the norm, you can set the axis limit to the second largest value; this focuses the visualization on the points that are closely related. You can do this by going to the object properties:

  • Under Appearance, click on Y-axis.
  • Switch off Auto Range and set the Max value by using an expression such as the following:
    =Max(aggr(avg(Calls),Date),2)

Here, we simply work out what the second largest number is and set that as the axis limit. This way we can produce an all inclusive line graph by date, albeit one data point will be off the screen.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset