Chapter 4. Introducing calculations in DAX

Now that you have seen the basics of the SQL Server Analysis Services (SSAS) tabular model, it is time to learn the fundamentals of Data Analysis Expressions (DAX). DAX has its own syntax for defining calculation expressions. It is somewhat similar to a Microsoft Excel expression, but it has specific functions that enable you to create more advanced calculations on data that is stored in multiple tables.

The goal of this chapter is to provide an overview of the main concepts of DAX without pretending to explain in detail all the implications of every feature and function in this language. If you want to learn DAX, we suggest reading our book, The Definitive Guide to DAX, published by Microsoft Press.


Image What’s new in SSAS 2016

DAX has a new syntax for variables, and table expressions can be used in calculated tables. You will also find an updated reference of tools to edit and format DAX expressions.


Introduction to the DAX language

DAX is a functional language, specifically designed to compute business formulas over a data model. The tabular model contains tables, columns, and relationships. A DAX expression can leverage an existing relationship without having to define it explicitly, as you have to do in a SQL query. Moreover, a DAX expression is evaluated in a particular context, where there are filters automatically applied to the tables in the data model. For this reason, an aggregation does not necessarily include all the rows of a table, but only those that appear active in the current evaluation context.

You can write a DAX expression to define the following entities:

Image Measure The DAX expression is evaluated in a filter context, where rows of one or more tables in the data model are aggregated and filtered according to implicit filters defined in a filter context. For example, a cell in a PivotTable in Excel displays the value of a measure, computed in an implicit filter context that is defined by the selection of slices, filters, rows, and columns of the PivotTable itself. The following expression is valid in a measure:

SUM ( Sales[Line Amount] )

Image Calculated column The DAX expression is evaluated for each row in a table and the DAX syntax can implicitly reference the value of each column of the table. The result of a calculated column is persisted in the data model and is automatically refreshed every time there is a refresh operation on any table of the data model. The following expression is valid in a calculated column:

Sales[Quantity] * Sales[Net Price]

Image Calculated table The DAX expression returns a table that is persisted in the data model and it is automatically refreshed every time there is a refresh operation on any table in the data model. The following example is an expression for a calculated table:

ADDCOLUMNS (
    ALL ( Product[Manufacturer] ),
    "Quantity", CALCULATE ( SUM ( Sales[Line Amount] ) )
)

Image Query The DAX expression returns a table that is materialized in the result of the query itself. The following example is a DAX query:

EVALUATE
SUMMARIZECOLUMNS (
    Product[Manufacturer],
    'Order Date' [Order Year Number],
    "Sales", SUM ( Sales[Line Amount] )
)

A DAX expression for a measure or calculated column must return a scalar value such as a number or a string. In contrast, a DAX expression for a calculated table or query must return a table (an entity with one or more columns and zero or more rows). You will see more examples of these entities later in this chapter.

To write DAX expressions, you need to learn the following basic concepts of DAX:

Image The syntax

Image The different data types that DAX can handle

Image The basic operators

Image How to refer to columns and tables

These and other core DAX concepts are discussed in the next few sections.

DAX syntax

You use DAX to compute values using columns of tables. You can aggregate, calculate, and search for numbers, but in the end, all the calculations involve tables and columns. Thus, the first syntax to learn is how to reference a column in a table.

The general format of a column reference is to write the table name enclosed in single quotes, followed by the column name enclosed in square brackets, such as the following example:

'Sales'[Quantity]

You can omit the single quotes if the table name does not start with a number, does not contain spaces, and is not a reserved word (like Date or Sum).


Image Note

It is common practice to not use spaces in table names. This way, you avoid the quotes in the formulas, which tend to make the code harder to read. Keep in mind, however, that the name of the table is the same name that you will see when browsing the model with PivotTables or any other client tool, such as Power View. Thus, if you like to have spaces in the table names in your report, you need to use single quotes in your code.


You can also avoid writing the table name at all, in case you are referencing a column or a measure in the same table where you are defining the formula. Thus, [SalesQuantity] is a valid column reference if written in a calculated column or in a measure of the FactSalesSmall table. Even if this technique is syntactically correct (and the user interface might suggest its use when you select a column instead of writing it), we strongly discourage you from using it. Such a syntax makes the code rather difficult to read, so it is better to always use the table name when you reference a column in a DAX expression.

DAX data types

DAX can perform computations with different numeric types, of which there are seven. In the list that follows, we show both the DAX name that is visible in the user interface of Visual Studio and the name of the same data type that is used internally in the Tabular Model Scripting Language (TMSL). The latter is indicated between parentheses. Certain data types are named differently than the standard terminology used in database jargon. Boolean values, for example, are called TRUE/FALSE in DAX terminology. We prefer to adhere to the internal definition that is close to the de-facto naming standard, and we refer to them as Boolean values. Wherever possible, we will also provide the corresponding data type used by SQL Server (Transact-SQL, or T-SQL), which could be helpful to understand the similarities and the differences between the two worlds.

The following list explains the different DAX data types:

Image Whole Number (int64) An integer number.

Image Decimal Number (double) A floating point number.

Image Currency (decimal) A fixed decimal number, with four decimal digits of fixed precision. It is internally stored as an integer. This is also known as a Fixed Decimal Number in the user interface of Power BI.

Image Date (dateTime) A date and time value.

Image TRUE/FALSE (boolean) A logical value.

Image Text (string) A string.

Image Binary (binary) A binary large object, also known as BLOB; it is usually used to store pictures and documents.

DAX has a powerful type-handling system so that you do not have to worry about data types. When you write a DAX expression, the resulting type is based on the type of terms used in the expression. You need to be aware of this in case the type returned from a DAX expression is not the expected one. Then you must investigate the data type of the terms used in the expression itself.

For example, if one of the terms of a sum is a date, then the result is a date too. In contrast, if the same operator is used with integers, the result is an integer. This is known as operator overloading. You can see an example of this in the following expression, which returns a date, adding seven days to the original value in the Order Date column. The result is, as we mentioned, a date.

'Dates'[Date] + 7

In addition to operator overloading, DAX automatically converts strings into numbers and numbers into strings whenever required by the operator. For example, if you use the & operator, which concatenates strings, DAX converts its arguments into strings. Look at the following formula:

5 & 4

It returns “54” as a string. On the other hand, observe the following formula:

"5" + "4"

It returns an integer result with the value of 9.

The resulting value depends on the operator and not on the source columns, which are converted following the requirements of the operator. Even if this behavior looks convenient, errors might happen during these automatic conversions. We suggest avoiding automatic conversions. If some kind of conversion needs to happen, then it is much better if you take control over it and make the conversion explicit. To be more explicit, the previous example should instead be written as follows:

VALUE ( "5" ) + VALUE ( "4" )

DAX data types might be familiar to people who are used to working with Excel or other languages. You can find specifications of DAX data types at https://msdn.microsoft.com/en-us/library/gg492146.aspx. However, it is useful to share a few considerations about each of these data types.

Whole Number (int64)

DAX has only one integer data type that can store a 64-bit value. All the internal calculations between the integer values in DAX also use a 64-bit value. This data type stores the corresponding T-SQL data types bigint, int, smallint, and tinyint.

Decimal Number (double)

A decimal number is always stored as a double-precision, floating point value. Do not confuse this DAX data type with the decimal and numeric data type of T-SQL. The corresponding data types of a DAX decimal number in T-SQL are float and real. However, consider that any decimal and numeric data type in SQL Server is converted to a decimal number data type in DAX when you import a table from SQL Server. You might want to consider the conversion to a currency data type in DAX of these columns whenever the precision required by T-SQL is available in such a data type in DAX.

Currency (decimal)

The currency data type in DAX stores a fixed decimal number. It can represent four decimal points, and it is internally stored as a 64-bit integer value, divided by 10,000. All calculations performed between currency data types always ignore decimals beyond the fourth decimal point. If you need more accuracy, you must do a conversion to the decimal data type in DAX.

The default format of the currency data type includes the currency symbol. You can also apply the currency formatting to whole and decimal numbers, and you can use a format without the currency symbol for a currency data type.

The currency data type in DAX stores the corresponding T-SQL data types: money and smallmoney.

Date (dateTime)

DAX stores dates in a date data type. This format uses a floating-point number internally, where the integer corresponds to the number of days since December 30, 1899, and the decimal part identifies the fraction of the day. Hours, minutes, and seconds are converted to the decimal fractions of a day. Thus, the following expression returns the current date plus one day (exactly 24 hours):

NOW () + 1

Its result is the date of tomorrow, at the same time of the evaluation. If you need only the date and not the time, use TRUNC to get rid of the decimal part. In the user interface of Power BI, you can see three different data types: Date/Time, Date, and Time. All these data types correspond to the date data type in DAX. To avoid confusion, we prefer to reference to this data type as dateTime, which is the name of the data type in TMSL. However, date and dateTime are the same concept when referring to the data type in a tabular model.

The date data type in DAX stores the corresponding T-SQL data types: date, datetime, datetime2, smalldatetime, and time. However, the range of values stored by the DAX data type does not correspond to the range of dates supported in T-SQL because DAX supports only dates between 1900 and 9999, and precision of time is of 3.33 ms.

TRUE/FALSE (boolean)

The boolean data type is used to express logical conditions. For example, a calculated column, defined by the following expression, is of the boolean type:

Sales[Unit Price] > Sales[Unit Cost]

You can also see boolean data types as numbers, where TRUE equals 1 and FALSE equals 0. This might be useful for sorting purposes, because TRUE > FALSE. The boolean data type in DAX stores the corresponding bit data type in T-SQL.

Text (string)

Every string in DAX is stored as a Unicode string, where each character is stored in 16 bits. The comparison between strings follows the collation setting of the database, which by default is case-insensitive. (For example, the two strings “Power Pivot” and “POWER PIVOT” are considered equal.) You can modify the collation in a database property Collation, which must be set before deploying the database to the server.

The text data type in DAX stores the corresponding T-SQL data types: char, varchar, text, nchar, nvarchar, and ntext.

Binary (binary)

The binary data type is used in the data model to store images, and it is not accessible in DAX. It is mainly used by Power BI or other client tools to show pictures stored directly in the data model.

The binary data type in DAX stores the corresponding T-SQL data types: binary, varbinary, and image.

DAX operators

Having seen the importance of operators in determining the type of an expression, you can now see a list of the operators that are available in DAX, as shown in Table 4-1.

Image

Table 4-1 Operators

Moreover, the logical operators are also available as DAX functions, with syntax that is very similar to Excel. For example, you can write the following lines:

AND ( [CountryRegion] = "USA", [Quantity] > 0 )
OR ( [CountryRegion] = "USA", [Quantity] > 0 )

Those expressions are equivalent, respectively, to the following lines:

[CountryRegion] = "USA" && [Quantity] > 0
[CountryRegion] = "USA" || [Quantity] > 0

The use of functions instead of operators for Boolean logic becomes very beneficial when you have to write complex conditions. In fact, when it comes to formatting large sections of code, functions are much easier to format and read than operators. However, a major drawback of functions is that you can only pass in two parameters at a time. This requires you to nest functions if you have more than two conditions to evaluate.

Column reference and measures reference

A table in the data model includes columns and measures, which are all accessible with the following syntax:

'TableName'[ColumnOrMeasureName]

The table name can be omitted if the expression is written in the same context as the table that includes the referenced column or measure. However, it is very important to follow these simple guidelines:

Image Always include the table name for a column reference For example:

'TableName'[ColumnName]

Image Always omit the table name for a measure reference For example:

[MeasureName]

There are many reasons for these guidelines, mainly related to readability and maintainability. A column name is always unique in a table, but you can have the same column name in different tables. A measure name is unique for the entire data model, and it cannot be the same as any other column or measure that is defined in any table of the data model. For this reason, the guideline produces an unambiguous definition in any context. Last, but not least, a measure reference implies a context transition (explained later in this chapter in the “Context transition” section), which has an important impact in the execution of the calculation. It is important to not confuse column references and measure references because they have a different calculation semantic.

Aggregate functions

Almost every data model needs to operate on aggregated data. DAX offers a set of functions that aggregate the values of a column or an expression in a table and then return a single value (also known as a scalar value). We call this group of functions aggregate functions. For example, the following measure calculates the sum of all the numbers in the SalesAmount column of the Sales table:

SUM ( Sales[SalesAmount] )

However, SUM is just a shortcut for the more generic expression called SUMX, which has two arguments: the table to scan and a DAX expression to evaluate for every row of the table, summing up the results obtained for all the rows considered in the evaluation context. Write the following corresponding syntax when using SUMX:

SUMX ( Sales, Sales[SalesAmount] )

Usually, the version with the X suffix is useful when you compute longer expressions row by row. For example, the following expression multiplies quantity and unit price row by row, summing up the results obtained:

SUMX ( Sales, Sales[Quantity] * Sales[Unit Price] )

Aggregation functions are SUMX, AVERAGEX, MINX, MAXX, PRODUCTX, GEOMEANX, COUNTX, COUNTAX, STDEVX, VARX, MEDIANX, PERCENTILEX.EXC, and PERCENTILEX.INC. You can also use the corresponding shorter version without the X suffix whenever the expression in the second argument is made by only one column reference.

The first argument of SUMX (and other aggregate functions) is a table expression. The simplest table expression is the name of a table, but you can replace it with a table function, as described in the next section.

Table functions

Many DAX functions require a table expression as an argument. You can also use table expressions in calculated tables and in DAX queries, as you will see later in this chapter. The simplest table expression is a table reference, as shown in the following example:

Sales

A table expression may include a table function. For example, FILTER reads rows from a table expression and returns a table that has only the rows that satisfy the logical condition described in the second argument. The following DAX expression returns the rows in Sales that have a value in the Unit Cost column that is greater than or equal to 10:

FILTER ( Sales, Sales[Unit Cost] >= 10 )

You can combine table expressions in the scalar expression, which is a common practice when writing measures and calculated columns. For example, the following expression sums up the product of quantity and unit price for all the columns in the Sales table with a unit cost greater than or equal to 10:

SUMX (
    FILTER ( Sales, Sales[Unit Cost] >= 10 ),
    Sales[Quantity] * Sales[Unit Price]
)

There are complex table functions in DAX that you can use to manipulate the rows and columns of the table you want as a result. For example, you can use ADDCOLUMNS and SELECTCOLUMNS to manipulate the projection, whereas SUMMARIZE and GROUPBY can join the tables and group rows by using the relationships in the data model and column specified in the function.

DAX also includes functions to manipulate sets (UNION, INTERSECT, and EXCEPT), to manipulate tables (CROSSJOIN and GENERATE), and to perform other specialized actions (such as TOPN).

An important consideration is that the most efficient way to apply filters in a calculation is usually by leveraging the CALCULATE and CALCULATETABLE functions. These transform the filter context before evaluating a measure. This reduces the volume of materialization (the intermediate temporary tables) that are required for completing the calculation.

Evaluation context

Any DAX expression is evaluated inside a context. The context is the environment under which the formula is evaluated. The evaluation context of a DAX expression has the following two distinct components:

Image Filter context This is a set of filters that identifies the rows that are active in the table of the data model.

Image Row context This is a single row that is active in a table for evaluating column references.

These concepts are discussed in the next two sections.

Filter context

Consider the following simple formula for a measure called Total Sales:

SUMX ( Sales, Sales[Quantity] * Sales[UnitPrice] )

This formula computes the sum of a quantity multiplied by the price for every row of the Sales table. If you display this measure in a PivotTable in Excel, you will see a different number for every cell, as shown in Figure 4-1.

Image

Figure 4-1 Displaying the Total Sales measure in a PivotTable.

Because the product color is on the rows, each row in the PivotTable can see, out of the whole database, only the subset of products of that specific color. The same thing happens for the columns of the PivotTable, slicing the data by product class. This is the surrounding area of the formula—that is, a set of filters applied to the database prior to the formula evaluation. Each cell of the PivotTable evaluates the DAX expression independently from the other cells. When the formula iterates the Sales table, it does not compute it over the entire database because it does not have the option to look at all the rows. When DAX computes the formula in a cell that intersects the White color and Economy class, only the products that are White and Economy are visible. Because of that, the formula only considers sales pertinent to the white products in the economy class.

Any DAX formula specifies a calculation, but DAX evaluates this calculation in a context that defines the final computed value. The formula is always the same, but the value is different because DAX evaluates it against different subsets of data. The only case where the formula behaves in the way it has been defined is in the grand total. At that level, because no filtering happens, the entire database is visible.

Any filter applied to a table automatically propagates to other tables in the data model by following the filter propagation directions, which were specified in the relationships existing between tables.

We call this context the filter context. As its name suggests, it is a context that filters tables. Any formula you ever write will have a different value depending on the filter context that DAX uses to perform its evaluation. However, the filter context is only one part of the evaluation context, which is made by the interaction of the filter context and row context.

Row context

In a DAX expression, the syntax of a column reference is valid only when there is a notion of “current row” in the table from which you get the value of a column. Observe the following expression:

Sales[Quantity] * Sales[UnitPrice]

In practice, this expression is valid only when it is possible to identify something similar to the generic concept of “current row” in the Sales table. This concept is formally defined as row context. A column reference in a DAX expression is valid only when there is an active row context for the table that is referenced. You have a row context active for the DAX expressions written in the following:

Image A calculated column

Image The argument executed in an iterator function in DAX (all the functions with an X suffix and any other function that iterates a table, such as FILTER, ADDCOLUMNS, SELECTCOLUMNS, and many others)

Image The filter expression for a security role

If you try to evaluate a column reference when there is no row context active for the referenced table, you get a syntax error.

A row context does not propagate to other tables automatically. You can use a relationship to propagate a row context to another table, but this requires the use of a specific DAX function called RELATED.

CALCULATE and CALCULATETABLE

DAX has two functions that can modify a filter context before executing an expression: CALCULATE and CALCULATETABLE. The only difference between the two functions is that the former returns a single value (string or numeric), whereas the latter executes a table expression and returns a table.

The filter context is a set of filters that are applied to columns and/or tables of the data model. Each filter corresponds to a list of the values allowed for one or more columns, or for the entire table. By invoking CALCULATE or CALCULATETABLE, you modify the existing filter context, overriding any existing filters and/or adding new filters. To simplify the explanation, we consider the syntax of CALCULATE, but all the considerations apply to CALCULATETABLE, too.

The syntax for CALCULATE is as follows:

CALCULATE ( expression, filter1, filter2, , filterN )

CALCULATE accepts any number of parameters. The only mandatory one is the first parameter in the expression. We call the conditions following the first parameter the filter arguments.

CALCULATE does the following:

Image It places a copy of the current filter context into a new filter context.

Image It evaluates each filter argument and produces for each condition the list of valid values for that specific column.

Image If two or more filter arguments affect the same column filters, they are merged together using an AND operator (or, in mathematical terms, using the set intersection).

Image It uses the new condition to replace the existing filters on the columns in the model. If a column already has a filter, then the new filter replaces the existing one. If, on the other hand, the column does not have a filter, then DAX simply applies the new column filter.

Image After the new filter context is evaluated, CALCULATE computes the first argument (the expression) in the new filter context. At the end, it will restore the original filter context, returning the computed result.

The filters accepted by CALCULATE can be of the following two types:

Image List of values This appears in the form of a table expression. In this case, you provide the exact list of values that you want to see in the new filter context. The filter can be a table with a single column or with many columns, as is the case of a filter on a whole table.

Image Boolean conditions An example of this might be Product[Color] = “White”. These filters need to work on a single column because the result must be a list of values from a single column.

If you use the syntax with a Boolean condition, DAX will transform it into a list of values. For example, you might write the following expression:

CALCULATE (
    SUM ( Sales[SalesAmount] ),
    Product[Color] = "Red"
)

Internally, DAX transforms the expression into the following one:

CALCULATE (
    SUM ( Sales[SalesAmount] ),
    FILTER (
        ALL ( Product[Color] ),
        Product[Color] = "Red"
    )
)


Image Note

The ALL function ignores any existing filter context, returning a table with all the unique values of the column specified.


For this reason, you can reference only one column in a filter argument with a Boolean condition. DAX must detect the column to iterate in the FILTER expression, which is generated in the background automatically. If the Boolean expression references more columns, then you must write the FILTER iteration in an explicit way.

Context transition

CALCULATE performs another very important task: It transforms any existing row context into an equivalent filter context. This is important when you have an aggregation within an iterator or when, in general, you have a row context. For example, the following expression (defined in the No CT measure shown later in Figure 4-2) computes the quantity of all the sales (of any product previously selected in the filter context) and multiplies it by the number of products:

SUMX (
    Product,
    SUM ( Sales[Quantity] )
)

The SUM aggregation function ignores the row context on the Product table produced by the iteration made by SUMX. However, by embedding the SUM in a CALCULATE function, you transform the row context on Product into an equivalent filter context. This automatically propagates to the Sales table thanks to the existing relationship in the data model between Product and Sales. The following expression is defined in the Explicit CT measure:

SUMX (
    Product,
    CALCULATE ( SUM ( Sales[Quantity] ) )
)

When you use a measure reference in a row context, there is always an implicit CALCULATE function surrounding the expression executed in the measure, so the previous expression corresponds to the following one, defined in the Implicit CT measure:

SUMX (
    Product,
    [Total Quantity]
)

The measure from Total Quantity in the previous expression corresponds to the following expression:

SUM ( Sales[Quantity] )

As you see in the results shown in Figure 4-2, replacing a measure with the underlying DAX expression is not correct. You must wrap such an expression within a CALCULATE function, which performs the same context transition made by invoking a measure reference.

Image

Figure 4-2 The different results of a similar expression, with and without context transition.

Variables

When writing a DAX expression, you can avoid repeating the same expression by using variables. For example, look at the following expression:

VAR
    TotalSales = SUM ( Sales[SalesAmount] )
RETURN
    ( TotalSales - SUM ( Sales[TotalProductCost] ) ) / TotalSales

You can define many variables, and they are local to the expression in which you define them. Variables are very useful both to simplify the code and because they enable you to avoid repeating the same subexpression. Variables are computed using lazy evaluation. This means that if you define a variable that, for any reason, is not used in your code, then the variable will never be evaluated. If it needs to be computed, then this happens only once. Later usages of the variable will read the previously computed value. Thus, they are also useful as an optimization technique when you use a complex expression multiple times.

Measures

You define a measure whenever you want to aggregate values from many rows in a table. The following convention is used in this book to define a measure:

Table[MeasureName] := <expression>

This syntax does not correspond to what you write in the formula editor in Visual Studio because you do not specify the table name there. We use this writing convention in the book to optimize the space required for a measure definition. For example, the definition of the Total Sales measure in the Sales table (which you can see in Figure 4-3) is written in this book as the following expression:

Sales[Total Sales] := SUMX ( Sales, Sales[Quantity] * Sales[Unit Price] )

Image

Figure 4-3 Defining a measure in Visual Studio that includes only the measure name; the table is implicit.

A measure needs to be defined in a table. This is one of the requirements of the DAX language. However, the measure does not really belong to the table. In fact, you can move a measure from one table to another one without losing its functionality.

The expression is executed in a filter context and does not have a row context. For this reason, you must use aggregation functions, and you cannot use a direct column reference in the expression of a measure. However, a measure can reference other measures. You can write the formula to calculate the margin of sales as a percentage by using an explicit DAX syntax, or by referencing measures that perform part of the calculation. The following example defines four measures, where the Margin and Margin % measures reference other measures:

Sales[Total Sales] := SUMX ( Sales, Sales[Quantity] * Sales[Unit Price] )
Sales[Total Cost]  := SUMX ( Sales, Sales[Quantity] * Sales[Unit Cost] )
Sales[Margin]      := [Total Sales] - [Total Cost]
Sales[Margin %]    := DIVIDE ( [Margin], [Total Sales] )

The following Margin % Expanded measure corresponds to Margin %. All the referenced measures are expanded in a single DAX expression without the measure references. The column references are always executed in a row context generated by a DAX function (always SUMX in the following example):

Sales[Margin % Expanded] :=
DIVIDE (
    SUMX ( Sales, Sales[Quantity] * Sales[Unit Price] )
        - SUMX ( Sales, Sales[Quantity] * Sales[Unit Cost] ),
    SUMX ( Sales, Sales[Quantity] * Sales[Unit Price] )
)

You can also write the same expanded measure using variables, making the code more readable and avoiding the duplication of the same DAX subexpression (as it is the case for TotalSales, in this case):

Sales[Margin % Variables]:=
VAR TotalSales =
    SUMX ( Sales, Sales[Quantity] * Sales[Unit Price] )
VAR TotalCost =
    SUMX ( Sales, Sales[Quantity] * Sales[Unit Cost] )
VAR Margin = TotalSales – TotalCost
RETURN
    DIVIDE ( Margin, TotalSales )

Calculated columns

A calculated column is just like any other column in a table. You can use it in the rows, columns, filters, or values of a PivotTable or in any other report. You can also use a calculated column to define a relationship. The DAX expression defined for a calculated column operates in the context of the current row of the table to which it belongs (a row context). Any column reference returns the value of that column for the current row. You cannot directly access the values of the other rows. If you write an aggregation, the initial filter context is always empty (there are no filters active in a row context).

The following convention is used in this book to define a calculated column:

Table[ColumnName] = <expression>

This syntax does not correspond to what you write in the formula editor in Visual Studio because you do not specify the table and column names there. We use this writing convention in the book to optimize the space required for a calculated column definition. For example, the definition of the Price Class calculated column in the Sales table (see Figure 4-4) is written in this book as follows:

Sales[Price Class] =
SWITCH (
    TRUE,
    Sales[Unit Price] > 1000, "A",
    Sales[Unit Price] > 100, "B",
    "C"
)

Image

Figure 4-4 How the definition of a calculated column in Visual Studio does not include the table and column names.

Calculated columns are computed during the database processing and then stored in the model. This might seem strange if you are accustomed to SQL-computed columns (not persisted), which are computed at query time and do not use memory. In Tabular, however, all the calculated columns occupy space in memory and are computed during table processing.

This behavior is helpful whenever you create very complex calculated columns. The time required to compute them is always at process time and not query time, resulting in a better user experience. Nevertheless, you must remember that a calculated column uses precious RAM. If, for example, you have a complex formula for a calculated column, you might be tempted to separate the steps of computation in different intermediate columns. Although this technique is useful during project development, it is a bad habit in production because each intermediate calculation is stored in RAM and wastes precious space.

For example, if you have a calculated column called LineAmount, it is defined as follows:

Sales[LineAmount] = Sales[Quantity] * Sales[UnitPrice]

You might create the following Total Amount measure:

Sales[Total Amount] := SUM ( Sales[LineAmount] )

However, you should be aware that in reality, the last expression corresponds to the following:

Sales[Total Amount] := SUMX ( Sales, Sales[LineAmount] )

You can create the same Total Amount measure by writing a single formula in the following way:

Sales[Total Amount] :=
SUMX (
    Sales,
    Sales[Quantity] * Sales[UnitPrice]
)

Replacing a column reference pointing to a calculated column is usually a good idea if the column has a relatively large cardinality, as is the case in LineAmount. However, you might prefer to implement a calculated column instead of writing a single dynamic measure whenever you use a column reference to the calculated column in a filter argument of CALCULATE.

For example, consider the following expression of the Discounted Quantity measure, returning the sum of Quantity for all the sales having some discount:

Sales[Discounted Quantity] :=
CALCULATE (
    SUM ( Sales[Quantity] ),
    Sales[Unit Discount] <> 0
)

You can create a calculated column HasDiscount in the Sales table using the following expression:

Sales[HasDiscount] = Sales[Unit Discount] <> 0

Then, you can use the calculated column in the filter of CALCULATE, reducing the number of values pushed in the filter context (one value instead of all the unique values of Unit Discount except zero):

sales[Discounted Quantity Optimized] :=
CALCULATE (
    SUM ( Sales[Quantity] ),
    Sales[HasDiscount] = TRUE
)

Using a calculated column that produces a string or a Boolean to filter data is considered a good practice. It improves query performances with a minimal cost in terms of memory, thanks to the high compression of a column with a low number of unique values.

Calculated tables

A calculated table is the result of a DAX table expression that is materialized in the data model when you refresh any part of it. A calculated table can be useful to create a lookup table from existing data to create meaningful relationships between entities.

The following convention is used in this book to define a calculated table:

Table = <expression>

This syntax does not correspond to what you write in the formula editor in Visual Studio because you do not specify the table name there. We use this writing convention in the book to optimize the space required for a calculated table definition. For example, the definition of the calculated table Colors, as shown in Figure 4-5, is written in this book as follows:

Colors =
UNION (
    ALL ( 'Product'[Color] ),
    DATATABLE (
        "Color", STRING,
        {
            { "*custom*" },
            { "Cyan" },
            { "Magenta" },
            { "Lime" },
            { "Maroon" }
        }
    )
)

Image

Figure 4-5 The definition of a calculated column in Visual Studio, which does not include the table and column names.

Calculated tables are computed at the end of the database processing and then stored in the model. In this way, the engine guarantees that the table is always synchronized with the data that exists in the data model.

Writing queries in DAX

You can use DAX as both a programming language and a query language.

A DAX query is a DAX expression that returns a table, used with the EVALUATE statement. The complete DAX query syntax is as follows:

[DEFINE { MEASURE <tableName>[<name>] = <expression> }]
EVALUATE <table>
[ORDER BY {<expression> [{ASC | DESC}]} [, …]
    [START AT {<value>|<parameter>} [, …]]
]

The initial DEFINE MEASURE part can be useful to define measures that are local to the query (that is, they exist for the lifetime of the query). It becomes very useful when you are debugging formulas, because you can define a local measure, test it, and then put it in the model once it behaves as expected. This is very useful when using DAX Studio to test your measures, as shown in Figure 4-6.

Image

Figure 4-6 The execution of a DAX query using DAX Studio.

For example, the following query evaluates the Total Sales and Net Sales measures for each product category. The Total Sales measure is defined in the data model, whereas Net Sales is defined within the query and would override any measure with the same name defined in the data model.

DEFINE
    MEASURE Sales[Net Sales] =
        SUMX ( Sales, Sales[Quantity] * Sales[Net Price] )
EVALUATE
ADDCOLUMNS (
    ALL ( 'Product Category'[Category] ),
    "Sales", [Total Sales],
    "Net Sales", [Net Sales]
)

Writing DAX queries is useful to test and debug your measures. It is also required to create efficient reports in Microsoft SQL Server Reporting Services. This is because DAX is more efficient than MDX for producing the type of tabular results required in datasets for Reporting Services.

Formatting DAX code

A very important aspect of DAX is formatting the code. DAX is a functional language, meaning that no matter how complex it is, a DAX expression is always a single function call with some parameters. The complexity of the code translates into the complexity of the expressions that you use as parameters to the outermost function.

For this reason, it is normal to see expressions that span 10 lines or more. Seeing a 20-line DAX expression is not unusual, as you will see. As formulas start to grow in length and complexity, it is extremely important that you learn how to format them so they are easier to read and identify.

To understand why formatting is very important, we show a formula that computes the sum of account balances for the last available date of each account. It is a somewhat complex formula, but definitely not the most complex one you can author. The following example shows you what it can look like if you do not format the expression:

IF (COUNTX (BalanceDate, CALCULATE (COUNT( Balances[Balance] ), ALLEXCEPT ( Balances,
BalanceDate[Date] ))) > 0, SUMX (ALL ( Balances[Account] ), CALCULATE (SUM(
Balances[Balance] ), LASTNONBLANK (DATESBETWEEN (BalanceDate[Date], BLANK(),LASTDATE(
BalanceDate[Date] )), CALCULATE ( COUNT( Balances[Balance] ))))), BLANK ())

It is nearly impossible to understand what this formula computes. It is not clear what the outermost function is, nor how DAX evaluates the different parameters to create the complete flow of execution. It is very hard to read the formula and try to correct it (in case there is some error) or to modify it for whatever reason.

The following example is the same expression, properly formatted:

IF (
    COUNTX (
        'Date',
        CALCULATE (
            COUNT ( Balances[Balance] ),
            ALLEXCEPT ( Balances, 'Date'[Date] )
        )
    ) > 0,
    SUMX (
        VALUES ( Balances[Account] ),
        CALCULATE (
            SUM ( Balances[Balance] ),
            LASTNONBLANK (
                DATESBETWEEN ( 'Date'[Date], BLANK (), LASTDATE ( 'Date'[Date] ) ),
                CALCULATE ( COUNT ( Balances[Balance] ) )
            )
        )
    ),
    BLANK ()
)

The code is the same. This time, however, it is much easier to look at the three parameters of IF. More importantly, it is easier to follow the blocks that arise naturally by indenting lines, and you can more easily see how they compose the complete flow of execution. Yes, the code is still hard to read, and it is longer. But now the problem lies in using DAX, not the formatting.

We use a consistent set of rules to format DAX code, which we employed in this book. The complete list is available at http://sql.bi/daxrules.

DAX Formatter, DAX Studio, and DAX Editor

We created a website that is dedicated to formatting DAX code. We did it for ourselves, because formatting the code is a time-consuming operation, and we do not want to spend our time formatting every formula we write. Once the tool was working, we decided to donate it to the public domain so that users can format their own DAX code. (By the way, we have been able to promote our formatting rules this way.) You can find it at http://www.daxformatter.com. The user interface is simple: Just copy your DAX code into the formatter at the top and click Format. The page refreshes, showing a nicely formatted version of your code, which you can then copy and paste back into the original window.

The formatting provided by DAX Formatter is also available as a service that can be used by other tools. For example, DAX Studio uses the DAX Formatter service. You can use DAX Studio (http://daxstudio.codeplex.com/) as an editor for your DAX expressions, and use the Format Query button in the ribbon to format the code as required. You can also use DAX Editor (http://www.sqlbi.com/tools/dax-editor) to edit multiple measures of a tabular project in a single text file within Visual Studio. Also, DAX Editor uses the DAX Formatter service to properly format DAX expressions.

Summary

In this chapter, you explored the syntax of DAX, its data types, and the available operators and functions. The most important concepts you have learned are the difference between a calculated column and a measure, and the components of an evaluation context, which are the filter context and the row context. You also learned the following:

Image CALCULATE and CALCULATETABLE are efficient functions that compute an expression in a modified filter context.

Image There is a syntax to define variables in DAX.

Image You should always include the table name in a column reference, and always omit the table name in a measure reference.

Image Table expressions can be used in calculated tables and DAX queries.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset