Chapter 1. Introducing the tabular model

This chapter introduces SQL Server Analysis Services (SSAS) 2016, provides a brief overview of what the tabular model is, and explores its relationship to the multidimensional model, to SSAS 2016 as a whole, and to the wider Microsoft business intelligence (BI) stack. This chapter will help you make what is probably the most important decision in your project’s life cycle: whether you should use a tabular model or a multidimensional model. Finally, it includes a short description of the main differences in tabular models between SSAS 2016 and previous versions.


What’s new in SSAS 2016

New compatibility levels and a new implementation of DirectQuery have improved performance and removed limitations present in previous versions.


Semantic models in Analysis Services

In the Microsoft ecosystem, BI is not a single product, but a set of features distributed across several products. The following sections explain the role of SSAS in this ecosystem. The tabular model is one of the two types of semantic models you can create in SSAS. (The other is the multidimensional model.)

What is Analysis Services and why should I use it?

SSAS is an online analytical data engine—a type of service that is highly optimized for the kinds of queries and calculations that are common in a business intelligence environment. It does many of the same things that a relational database can do, but it differs in many respects. In most cases, it will be easier to develop your BI solution by using SSAS in combination with a relational database, such as Microsoft SQL Server, than by using SQL Server alone. SSAS does not replace the need for a relational database or a properly designed data warehouse, however.

One way of thinking about SSAS is as an extra layer of metadata, or a semantic model, that sits on top of a data warehouse in a relational database. This extra layer contains information about how fact tables and dimension tables should be joined, how measures should aggregate up, how users should be able to explore the data through hierarchies, the definitions of common calculations, and so on. This layer also includes one or more models containing the business logic of your data warehouse. End users query these models rather than the underlying relational database. With all this information stored in a central place and shared by all users, the queries that users need to write become much simpler. In most cases, all a query needs to do is describe which columns and rows are required, and the model applies the appropriate business logic to ensure that the numbers that are returned make sense. Most importantly, it becomes impossible for end users to write a query that returns “incorrect” results because of a mistake on their part. This, in turn, means that end-user reporting and analysis tools must do much less work and can provide a clearer visual interface for end users to build queries. It also means that different tools can connect to the same model and return consistent results.

Another way of thinking about SSAS is as a kind of cache that you can use to speed up reporting. In most scenarios in which SSAS is used, it is loaded with a copy of the data in the data warehouse. Subsequently, all reporting and analytic queries are run against SSAS instead of the relational database. Even though modern relational databases are highly optimized and contain many features specifically aimed at BI reporting, SSAS is designed exclusively for this type of workload and can, in most cases, achieve much better query performance. For end users, optimized query performance is extremely important because it allows them to browse through data without waiting a long time for reports to run and without any breaks in their chain of thought.

For the IT department, the biggest benefit of all this is that it becomes possible to transfer the burden of authoring reports to end users. A common problem with BI projects that do not use Online Analytical Processing (OLAP) is that the IT department must build not only a data warehouse but also a set of reports to go with it. This increases the amount of time and effort involved, and can be a cause of frustration for the business when it finds that IT is unable to understand its reporting requirements or to respond to them as quickly as is desirable. When an OLAP database such as SSAS is used, the IT department can expose the models it contains to end users and enable them to build reports themselves, using whatever tool with which they feel comfortable. By far the most popular client tool is Microsoft Excel. Ever since Office 2000, Excel PivotTables have been able to connect directly to SSAS multidimensional models (also known as cubes), and Excel 2016 has some extremely powerful capabilities as a client for SSAS.

All in all, Analysis Services not only reduces the IT department’s workload but also increases end-user satisfaction. Users now find they can build the reports they want and explore the data at their own pace, without having to go through an intermediary.

A short history of Analysis Services

SQL Server Analysis Services—or OLAP Services, as it was originally called when it was released in 1998 with SQL Server 7.0—was the first foray by Microsoft into the BI market. When it was released, many people saw it as an indicator that BI software was ready to break out of its niche and reach a mass market. In the years since, the success of Analysis Services and the rest of the Microsoft BI stack has proved them correct.

SQL Server Analysis Services 2000 was the first version of Analysis Services to gain significant traction in the marketplace. Analysis Services 2005 became the biggest-selling OLAP tool not long after its release. As Analysis Services 2008 and 2008 R2 improved scalability and performance still further, more and more companies started to adopt it as a cornerstone of their BI strategy. By 2010, terabyte-sized cubes were not uncommon. The famous example of the 24-terabyte (TB) cube Yahoo! built shows just what can be achieved.

Microsoft Analysis Services 2012 leveraged an existing infrastructure to introduce a new engine and a new type of data model, essentially becoming two products in one. It still contains the Analysis Services engine from the SQL Server 2008 R2 release and before, which has become known as the multidimensional model. Although the multidimensional model has seen a few improvements since the 2008 R2 release, related to performance, scalability, and manageability, no major new functionality has been introduced since then. Alongside it, Analysis Services 2012 introduced a new data-modeling experience and a new engine that closely resemble the Power Pivot and Power BI data-modeling experience. This is called the tabular model.

The following version, SQL Server 2014, did not introduce new BI features, so there are no differences between Analysis Services 2012 and 2014, provided you run the latest service packs and cumulative updates. However, Analysis Services 2016 introduces many new features and improvements to the tabular model, to the point that we considered it necessary to write a new book about it.

The tabular model in Analysis Services 2016 is the subject of this book. We will cover migration issues from tabular models created in earlier versions of Analysis Services, but if you are not planning to upgrade to 2016, we recommend you read our previous book, Microsoft SQL Server 2012 Analysis Services: The BISM Tabular Model.

Understanding Tabular and Multidimensional

This section explains a little about the architecture of Analysis Services, which since SQL Server 2012 has been split into two modes: Tabular and Multidimensional.

When installing Analysis Services, you must choose between installing an instance that runs in Tabular mode and one that runs in Multidimensional mode. (For more details on the installation process, see Chapter 2, “Getting started with the tabular model.”) A Tabular mode instance can support only databases containing tabular models, and a Multidimensional mode instance can support only databases containing multidimensional models. Although these two parts of Analysis Services share much of the same code underneath, in most respects they can be treated as separate products. The concepts involved in designing the two types of models are very different. You cannot convert a tabular database into a multidimensional database, or vice versa, without rebuilding everything from scratch. That said, it is important to emphasize the fact that, from an end user’s point of view, the two models do almost exactly the same things and appear almost identical when used through client tools such as Excel and Power BI.

The following sections compare the functionalities available in the tabular and multidimensional models and define some important terms that are used throughout the rest of this book.

The tabular model

A database is the highest-level object in the tabular model. It is very similar to the concept of a database in the SQL Server relational database. An instance of Analysis Services can contain many databases, and each database can be thought of as a self-contained collection of objects and data relating to a single business solution. If you are writing reports or analyzing data and find that you need to run queries on multiple databases, you have probably made a design mistake somewhere because everything you need should be contained in a single database. If you face a situation where data is scattered across different tabular databases, you should consider refactoring the analytical models into one. Discussing the correct design of analytical databases and of the underlying data warehouse and/or data marts is beyond the scope of this book. You can find more information about this in the “SQLBI Methodology” whitepaper (http://www.sqlbi.com/articles/sqlbi-methodology/).

You design tabular models using SQL Server Data Tools (SSDT) for Visual Studio (VS). A project in SSDT maps onto a database in Analysis Services. After you have finished designing a project in SSDT, it must be deployed to an instance of Analysis Services. This means SSDT executes numerous commands either to create a new database in Analysis Services or to alter the structure of an existing one. You can also use SQL Server Management Studio (SSMS) to manage databases that have already been deployed and to write queries against them.

Databases are made up of one or more tables of data. Again, a table in the tabular model is very similar to a table in the relational database world. A table in the tabular model is usually loaded from a single table in a relational database or from the results of a SQL SELECT statement. A table has a fixed number of columns that are defined at design time and can have a variable number of rows, depending on the amount of data that is loaded. Each column has a fixed type. For example, a single column could contain only integers, only text, or only decimal values. Loading data into a table is referred to as processing that table.

It is also possible to define relationships between tables at design time. Unlike in SQL, it is not possible to define relationships at query time; all queries must use these preexisting relationships. However, relationships between tables can be marked as active or inactive, and at query time it is possible to choose which relationships between tables are actually used. It is also possible, inside queries and calculations, to simulate the effect of relationships that do not exist in the model. All relationships are one-to-one or one-to-many and must involve just one column from each of two tables. A relationship can propagate a filter in one or both directions. It is not possible to design relationships based on more than one column from a table, or recursive relationships that join a table to itself.
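Choosing an inactive relationship at query time is done in DAX with the USERELATIONSHIP function. The following measure is a minimal sketch; the Sales and Date tables, their columns, and the two relationships between them are hypothetical names used only for illustration:

```dax
-- Assumed model: Sales[OrderDateKey] -> 'Date'[DateKey] is the active
-- relationship; Sales[DeliveryDateKey] -> 'Date'[DateKey] is inactive.
Delivered Amount :=
CALCULATE (
    SUM ( Sales[SalesAmount] ),
    -- Activate the inactive relationship for this calculation only
    USERELATIONSHIP ( Sales[DeliveryDateKey], 'Date'[DateKey] )
)
```

The inactive relationship is activated only for the duration of this calculation; every other measure in the model continues to follow the active order-date relationship.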

The tabular model uses a purely memory-based engine and stores only a copy of its data on disk. That way, no data is lost if the service is restarted. Whereas the multidimensional model, like most relational database engines, stores its data in a row-based format, the tabular model uses a column-oriented database called the in-memory analytics engine. In most cases, this offers significant improvements in query performance. (For more details on the column-based type of database, see http://en.wikipedia.org/wiki/Column-oriented_DBMS.)


Note

The in-memory analytics engine was known as the VertiPaq engine before the public release of Analysis Services 2012. Many references to the VertiPaq name remain in documentation, blog posts, and other material online. It even persists inside the product itself in property names and Profiler events. For these reasons and for brevity, we will use the term VertiPaq in this book when referring to the in-memory analytics engine.


Queries and calculations in the tabular model are defined in Data Analysis Expressions (DAX), the native language of the tabular model, Power Pivot, and Power BI. The multidimensional model has internal calculations defined in the Multidimensional Expressions (MDX) language. Client tools can generate DAX or MDX queries to retrieve data from a semantic model, regardless of whether it is a tabular or a multidimensional one. This means the tabular model is backward-compatible with the large number of existing Analysis Services client tools designed for the multidimensional model that are available from Microsoft, such as Excel and SQL Server Reporting Services, as well as tools from third-party software vendors that use MDX to query a semantic model. At the same time, the multidimensional model is compatible with new client tools such as Power BI, which generates queries in DAX.

You can add derived columns, called calculated columns, to a table in a tabular model. They use DAX expressions to return values based on the data already loaded in other columns or other tables in the same Analysis Services database. You can add derived tables, called calculated tables, to a tabular model as if they were new tables. They use DAX table expressions to return values based on data already loaded in other tables in the same Analysis Services database. Calculated columns and calculated tables are populated at processing time. After processing has taken place, they behave in exactly the same way as regular columns and tables.
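As a sketch of the syntax, assuming a hypothetical Sales table with Quantity and Net Price columns, a calculated column and a calculated table might be defined as follows:

```dax
-- Calculated column: the expression is evaluated row by row
-- and the results are stored when the table is processed.
Sales[Line Amount] = Sales[Quantity] * Sales[Net Price]

-- Calculated table: a DAX table expression materialized at
-- processing time, behaving afterward like a regular table.
Large Sales = FILTER ( Sales, Sales[Quantity] * Sales[Net Price] > 1000 )
```

After processing, both objects can be queried, related, and aggregated exactly like columns and tables loaded from the data source.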

You can also define measures on tables by using DAX expressions. You can think of a measure as a DAX expression that returns some form of aggregated value based on data from one or more columns. A simple example of a measure is one that returns the sum of all values from a column of data that contains sales volumes. Key performance indicators (KPIs) are very similar to measures, but are collections of calculations that enable you to determine how well a measure is doing relative to a target value and whether it is getting closer to reaching that target over time.
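For example, a simple additive measure and a ratio derived from it might look like the following; the table, column, and the [Sales Target] measure are hypothetical names for illustration:

```dax
-- Aggregates the SalesAmount column over whatever filter
-- context the query applies.
Sales Amount := SUM ( Sales[SalesAmount] )

-- A measure built on other measures; [Sales Target] is assumed
-- to be defined elsewhere in the model. DIVIDE handles the
-- division-by-zero case safely.
Sales vs Target % := DIVIDE ( [Sales Amount], [Sales Target] )
```

Unlike calculated columns, measures are evaluated at query time, so their results always reflect the rows, columns, and filters the user has selected.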

Most front-end tools such as Excel use a pivot table–like experience for querying tabular models. For example, you can drag columns from different tables onto the rows axis and columns axis of a pivot table so that the distinct values from these columns become the individual rows and columns of the pivot table, and measures display aggregated numeric values inside the table. The overall effect is something like a Group By query in SQL, which aggregates rows by selected fields. However, the definition of how the data aggregates up is predefined inside the measures and is not necessarily specified inside the query itself.
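In DAX, a query of this shape can be sketched with SUMMARIZECOLUMNS, which groups by the requested columns and evaluates a predefined measure, much like a SQL GROUP BY; the Product table and [Sales Amount] measure are hypothetical:

```dax
-- Returns one row per product category with the measure
-- evaluated in each group's filter context.
EVALUATE
SUMMARIZECOLUMNS (
    'Product'[Category],
    "Total Sales", [Sales Amount]
)
ORDER BY 'Product'[Category]
```

Note that the query names only the grouping column and the measure; how the numbers aggregate is defined once, inside the measure, rather than restated in every query.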

To improve the user experience, it is also possible to define hierarchies on tables inside the tabular model. These create multilevel, predefined drill paths. Perspectives can hide certain parts of a complex model, which can aid usability, and security roles can be used to deny access to specific rows of data from tables to specific users. Perspectives should not be confused with security, however. Even if an object is hidden in a perspective, it can still be queried, and perspectives themselves cannot be secured.

The multidimensional model

At the highest level, the multidimensional model is very similar to the tabular model: Data is organized in databases, and databases are designed in SSDT (formerly BI Development Studio, or BIDS) and managed using SQL Server Management Studio.

The differences become apparent below the database level, where multidimensional rather than relational concepts are prevalent. In the multidimensional model, data is modeled as a series of cubes and dimensions, not tables. Each cube is made up of one or more measure groups, and each measure group in a cube is usually mapped onto a single fact table in the data warehouse. A measure group contains one or more measures, which are very similar to measures in the tabular model. A cube also has two or more dimensions. One special dimension, the measures dimension, contains all the measures from each of the measure groups. Various other dimensions such as time, product, geography, customer, and so on, map onto the logical dimensions present in a dimensional model. Each of these non-measures dimensions consists of one or more attributes (for example, on a date dimension, there might be attributes such as date, month, and year), and these attributes can themselves be used as single-level hierarchies or to construct multilevel user hierarchies. Hierarchies can then be used to build queries. Users start by analyzing data at a highly aggregated level, such as a year level on a time dimension, and can then navigate to lower levels such as quarter, month, and date to look for trends and interesting anomalies.

As you would expect, because the multidimensional model is the direct successor to versions of Analysis Services before 2016, it has a very rich and mature set of features representing the fruit of more than a decade of development (even if some of them are not used very often). Most of the features available in the tabular model are present in the multidimensional model, but the multidimensional model also has many features that have not yet been implemented in the tabular one. A detailed feature comparison between the two models appears later in this chapter.

In terms of data storage, the multidimensional model can store its data in three ways:

- Multidimensional OLAP (MOLAP) In this mode, all data is stored in Analysis Services’ own disk-based storage format.

- Relational OLAP (ROLAP) In this mode, Analysis Services acts purely as a metadata layer. No data is stored in Analysis Services itself. Instead, SQL queries are run against the relational source database when a cube is queried.

- Hybrid OLAP (HOLAP) This is the same as ROLAP, except that some pre-aggregated values are stored in MOLAP format.

MOLAP storage is used in the vast majority of implementations, although ROLAP is sometimes used when there is a requirement for so-called real-time BI. HOLAP is almost never used.

One area in which the multidimensional and tabular models differ is in the calculation languages they support. The native language of the multidimensional model is MDX. Only MDX can be used for defining calculations, whereas queries can be in either MDX or DAX. The MDX language has been successful and is supported by numerous third-party client tools for Analysis Services. It was also promoted as a semi-open standard by a cross-vendor industry body called the XMLA Council (now effectively defunct). As a result, it has also been adopted by many other OLAP tools that are direct competitors to Analysis Services. The problem with MDX, however, is the same problem many people have with the multidimensional model in general: Although it is extremely powerful, many BI professionals have struggled to learn it because the concepts it uses, such as dimensions and hierarchies, are very different from the ones they are accustomed to using in SQL.

Why have two models?

Why has this split happened? There are several reasons:

- The Analysis Services multidimensional model was designed in an age of 32-bit servers with one or two processors and less than a gigabyte of RAM, when disk-based storage was the only option for databases. Times have changed, however, and modern hardware is radically different. Now, a new generation of memory-based, columnar databases has set the standard for query performance with analytic workloads, and Analysis Services must adopt this new technology to keep up. Retrofitting the new in-memory analytics engine into the existing multidimensional model was not, however, a straightforward job, so it was necessary to introduce the new tabular model to take full advantage of VertiPaq.

- Despite the success of the Analysis Services multidimensional model, there has always been a perception that it is difficult to learn. Some database professionals, accustomed to relational data modeling, struggle to grasp multidimensional concepts—and those who manage it find the learning curve is steep. Therefore, if Microsoft wants to bring BI to an ever-wider audience, it must simplify the development process—hence the move from the complex world of the multidimensional model to the relatively simple and familiar concepts of the tabular model.

- Microsoft sees self-service BI as a huge potential source of growth, and Power Pivot and Power BI are its entry into this market. (Self-service BI enables less-technical business users to build BI solutions.) It is also important to have consistency between the Microsoft self-service and corporate BI tools. Therefore, if Analysis Services must be overhauled, it makes sense to make it compatible with Power Pivot and Power BI, with a similar design experience, so self-service models can easily be upgraded to full-fledged corporate solutions.

- Some types of data are more appropriate for, or more easily modeled using, the tabular approach, and others are more appropriate for a multidimensional approach. Having different models gives developers the choice to use whichever approach suits their circumstances.

The future of Analysis Services

Having two models inside Analysis Services, plus two query and calculation languages, means you must choose which model to use at the start of your project. The only problem is that, at such an early stage, you might not know enough about your requirements to judge which one is appropriate. We will address this dilemma in the next section. Having two models also means that anyone who decides to specialize in Analysis Services must learn two technologies or else specialize in just one of the modeling types.

Microsoft has been very clear in saying that the multidimensional model is not deprecated and that the tabular model is not its replacement. It is likely that new features for the multidimensional model will be released in future versions of Analysis Services. The fact that the tabular and multidimensional models share some of the same code suggests that some new features could easily be developed for both models simultaneously. In the very long term, the two models might converge and offer much the same functionality, so the decision about which model to use will be based on whether the developer prefers to use a multidimensional or relational way of modeling data. For example, support for DAX queries in the multidimensional model was introduced with an update of Analysis Services 2012 (Cumulative Update 3 for Service Pack 1), and it represented one important step in this direction. In fact, thanks to this feature, the Power BI client can establish a live connection to a multidimensional model.

Azure Analysis Services

Analysis Services is also available as a service on Azure, named Azure Analysis Services (Azure AS). You can create an instance of Azure AS that is deployed in a very short amount of time, and you pay only while the instance is active. When you pause the service, there is no charge, even if data is still loaded in the instance and ready to be queried as soon as you restart the service. As of February 2017, only the tabular model is available on Azure. For the purposes of this book, you can consider an instance of Azure AS to be an instance of an on-premises version of Analysis Services. You can simply skip all the details about the installation of the services, because provisioning is managed automatically by the Azure infrastructure.

Choosing the right model for your project

It might seem strange to address the question of whether the tabular model is appropriate for your project at this point in the book, before you have even learned anything about it. But you must answer this question at an equally early stage of your BI project. At a rough guess, either model will work equally well for about 60 to 70 percent of projects, but for the remaining 30 to 40 percent, the correct choice of model will be vital.

As stated, after you have started developing with one model in Analysis Services, there is no way to switch over to use the other. You would have to start all over again from the beginning, possibly wasting much precious development time. Therefore, it is very important to make the correct decision as soon as possible. When making this decision, you must take many factors into account. This section discusses all of them in a reasonable amount of detail. You can then bear these factors in mind as you read the rest of this book. When you finish it, you will be in a position to determine whether to use a tabular or multidimensional model.

Licensing

Analysis Services 2016 is available in the SQL Server Standard and SQL Server Enterprise editions. In the SQL Server Standard edition, both multidimensional and tabular models are available, albeit with certain limitations on cores, memory, and available features. This means several important features needed for scaling up a model, such as partitioning, are not available in the SQL Server Standard edition. What follows is a short recap of the limitations of the Standard edition. (Please refer to the official Microsoft licensing documentation for a more detailed and up-to-date description, and remember that the Enterprise edition does not have these limitations.)

- Memory An instance in the multidimensional model can allocate up to 128 gigabytes (GB), whereas an instance in the tabular model can allocate up to 16 GB. This limitation mainly affects tabular models. Because all the data must be allocated in memory, the compressed database must consume no more than 16 GB. Considering the compression ratio and the need for memory during queries, this limit corresponds to an uncompressed relational database of 100 to 150 GB. (The exact compression ratio depends on many factors. You can increase compression using best practices described in Chapter 12, “Inside VertiPaq,” and Chapter 15, “Optimizing tabular models.”)

- Cores You cannot use more than 24 cores. Given the limit on database size, this limitation should not be more restrictive than the memory constraint.

- Partitions You cannot split a table into multiple partitions, regardless of whether you use a multidimensional or tabular model. This affects both processing and query performance in the multidimensional model, whereas it affects only processing performance in the tabular model. Usually, you use partitions to process only part of a large table—for example, the current and previous months of a transactions table.

- DirectQuery You cannot use DirectQuery—a feature that transforms a query sent to the semantic model into one or more queries against the underlying relational database—in the tabular model. The corresponding feature in the multidimensional model is ROLAP, which is supported in the Standard edition. This affects semantic models that must expose data changing in real time.

- Perspectives You cannot use perspectives, regardless of whether you use a multidimensional or tabular model.


Note

In Analysis Services 2012 and 2014, the features that enabled the sending of DAX queries to a multidimensional model were available only in the Enterprise and Business Intelligence editions of the product. In Analysis Services 2016, this feature is also available in the Standard edition. Azure Analysis Services supports all the features of the Enterprise edition.


Upgrading from previous versions of Analysis Services

As mentioned, there is no easy way to turn a multidimensional model into a tabular model. If any tool existed that claimed to make this transition with a few mouse clicks, it could only ever work for very simple multidimensional models and would not save much development time. Therefore, if you already have a mature Multidimensional mode implementation and the skills in-house to develop and maintain it, it probably makes no sense to abandon it and move over to the tabular model unless you have specific problems with the multidimensional model that the tabular model is likely to solve, such as poor performance with several distinct count measures.

Ease of use

If you are starting an Analysis Services 2016 project with no previous multidimensional or OLAP experience, it is very likely that you will find the tabular model much easier to learn than the multidimensional model. Not only are the concepts much easier to understand, especially if you are used to working with relational databases, but the development process is also much more straightforward, and there are far fewer features to learn. Building your first tabular model is much quicker and easier than building your first multidimensional model. It can also be argued that DAX is easier to learn than MDX, at least when it comes to writing basic calculations, but the truth is that both MDX and DAX can be equally confusing for anyone used to SQL.

Compatibility with Power Pivot

The tabular model and Power Pivot are almost identical in the way their models are designed. The user interfaces for doing so are practically the same, and both use DAX. Power Pivot models can also be imported into SQL Server Data Tools to generate a tabular model, although the process does not work the other way around. That is, a tabular model cannot be converted to a Power Pivot model. Therefore, if you have a strong commitment to self-service BI using Power Pivot, it makes sense to use Tabular for your corporate BI projects because development skills and code are transferable between the two. However, in Tabular you can import only those Power Pivot models that load data straight from a data source without using Power Query. This feature might be added in a future update of Analysis Services 2016.

Compatibility with Power BI

Although models in Tabular and Power BI are identical and use the same core engine, Power BI uses a feature equivalent to Power Query to import data into the data model, and this feature is not supported by Analysis Services 2016 in its first release. As soon as this feature is added in a future update of Analysis Services 2016, it will be possible to import a Power BI model into Tabular, too.

Query performance characteristics

Although it would be dangerous to make sweeping generalizations about query performance, it’s fair to say that the tabular model will perform at least as well as the multidimensional model in most cases and will outperform it in some specific scenarios. For example, distinct count measures—a particular weakness of the multidimensional model—perform extremely well in the tabular model. Anecdotal evidence also suggests that queries for detail-level reports (for example, queries that return numerous rows and return data at a granularity close to that of the fact table) will perform much better in the tabular model as long as they are written in DAX and not MDX. Unfortunately, when more complex calculations are involved, it is much more difficult to say whether the multidimensional or tabular model will perform better. A proper proof of concept will be the only way to tell whether the performance of either model will meet the requirements.

Processing performance characteristics

Comparing the processing performance of the multidimensional and tabular models is difficult. The number of rows of raw data that can be processed per second, for a single partition, is likely to be similar between the two models—if you disregard the different, incomparable operations that each model performs when processing data, such as building aggregations and indexes in a multidimensional model.

However, the tabular model has the following significant advantages over the multidimensional model when it comes to processing:

Image There are no aggregations in the tabular model. This means there is one less time-consuming task to be performed at processing time.

Image Processing one table in a tabular model has no direct impact on any of the other tables in the model, whereas in the multidimensional model, processing a dimension has consequential effects. Doing a full process on a dimension in the multidimensional model means you must do a full process on any cubes in which that dimension is used. Even doing a process update on a dimension requires a process index on a cube to rebuild aggregations.

Both of these factors can cause major headaches in large multidimensional-model deployments, especially when the window available for processing is small. One similar bottleneck in the tabular model is calculated columns, which are always computed for the entire table, even when you refresh a single partition. Thus, you should avoid calculated columns in the large fact tables of a data model.
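The usual way to sidestep this bottleneck is to express the row-by-row logic as a measure instead of a calculated column. As a minimal sketch, assuming a hypothetical Sales table with Quantity and UnitPrice columns:

```dax
-- A calculated column like this would be recomputed for the entire Sales
-- table at every refresh, even when only one partition is processed:
--     Sales[LineAmount] = Sales[Quantity] * Sales[UnitPrice]

-- A measure computes the same result only at query time,
-- so processing a partition pays no extra cost:
Sales Amount := SUMX ( Sales, Sales[Quantity] * Sales[UnitPrice] )
```

The trade-off is that the measure is evaluated at query time, but for simple arithmetic over compressed columns this is usually fast in VertiPaq.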

Hardware considerations

The multidimensional and tabular models have very different hardware-specification requirements. The multidimensional model’s disk-based storage means it’s important to have high-performance disks with plenty of space on those disks. It also caches data, so having sufficient RAM is very useful, although not essential. For the tabular model, the performance of disk storage is much less of a priority because it is an in-memory database. For this very reason, though, it is much more important to have enough RAM to hold the database and to accommodate any spikes in memory usage that occur when queries are running or when processing is taking place.

The multidimensional model’s disk requirements will probably be easier to accommodate than the tabular model’s memory requirements. Buying a large amount of disk storage for a server is relatively cheap and straightforward for an IT department. Many organizations have storage area networks (SANs) that, although they might not perform as well as they should, make providing enough storage space (or increasing that provision) very simple. However, buying large amounts of RAM for a server can be more difficult. You might find that asking for half a terabyte of RAM on a server raises some eyebrows. If you find you need more RAM than you originally thought, increasing the amount that is available can also be awkward. Based on experience, it is easy to start with what seems like a reasonable amount of RAM. But as fact tables grow, new data is added to the model, and queries become more complex, you might start to encounter out-of-memory errors. Furthermore, for some extremely large Analysis Services implementations with several terabytes of data, it might not be possible to buy a server with sufficient RAM to store the model. In that case, the multidimensional model might be the only feasible option.

Real-time BI

Although not quite the industry buzzword that it was a few years ago, the requirement for real-time or near–real-time data in BI projects is becoming more common. Real-time BI usually refers to the need for end users to be able to query and analyze data as soon as it has been loaded into the data warehouse, with no lengthy waits for the data to be loaded into Analysis Services.

The multidimensional model can handle this in one of the following two ways:

Image Using MOLAP storage and partitioning your data so that all the new data in your data warehouse goes to one relatively small partition that can be processed quickly

Image Using ROLAP storage and turning off all caching so that the model issues SQL queries every time it is queried

The first of these options is usually preferred, although it can be difficult to implement, especially if dimension tables and fact tables change. Updating the data in a dimension can be slow and can also require aggregations to be rebuilt. ROLAP storage in the multidimensional model can often result in very poor query performance if data volumes are large, so the time taken to run a query in ROLAP mode might be greater than the time taken to reprocess the MOLAP partition in the first option.

The tabular model offers what are essentially the same two options but with fewer shortcomings than their multidimensional equivalents. If data is being stored in the in-memory engine, updating data in one table has no impact on the data in any other table, so processing times are likely to be faster and implementation much easier. If data is to remain in the relational engine, then the major difference is the equivalent of ROLAP mode, called DirectQuery. A full description of how to configure DirectQuery mode is given in Chapter 9, “Using DirectQuery.”

Client tools

In many cases, the success or failure of a BI project depends on the quality of the tools that end users employ to analyze the data being provided. Therefore, it is important to understand which client tools are supported by which model.

Both the tabular model and the multidimensional model support both MDX and DAX queries. In theory, then, most Analysis Services client tools should support both models. Unfortunately, this is not true in practice. Although some client tools such as Excel and Power BI do work equally well on both, some third-party client tools might need to be updated to their latest versions to work, and some older tools that are still in use but are no longer supported might not work properly or at all. In general, tools designed to generate MDX queries (such as Excel) work better with the multidimensional model, and tools designed to generate DAX queries (such as Power BI) work better with the tabular model, even though support for both query languages means that, in principle, any combination should work.

Feature comparison

One more thing to consider when choosing a model is the functionality present in the multidimensional model that either has no equivalent or is only partially implemented in the tabular model. Not all this functionality is important for all projects, however, and it must be said that in many scenarios it is possible to approximate some of this multidimensional functionality in the tabular model by using some clever DAX in calculated columns and measures. In any case, if you do not have any previous experience using the multidimensional model, you will not miss functionality you have never had.

The following list notes the most important functionality missing in the tabular model:

Image Writeback This is the ability of an end user to write values back to a multidimensional database. This can be very important for financial applications in which users enter budget figures, for example.

Image Dimension security on measures This enables access to a single measure to be granted or denied. There is no way to implement this in the tabular model.

Image Cell security This enables access to individual cells to be granted or denied. Again, there is no way of implementing this in the tabular model, but it is only very rarely used in the multidimensional model.

Image Ragged hierarchies This is a commonly used technique for avoiding the use of a parent/child hierarchy. In a multidimensional model, a user hierarchy can be made to look something like a parent/child hierarchy by hiding members if certain conditions are met—for example, if a member has the same name as its parent. This is known as creating a ragged hierarchy. Nothing equivalent is available in the tabular model.

Image Role-playing dimensions These are designed and processed once, then appear many times in the same model with different names and different relationships to measure groups. In the multidimensional model, this is known as using role-playing dimensions. Something similar is possible in the tabular model, in which multiple relationships can be created between two tables. (See Chapter 3, “Loading data inside Tabular,” for more details on this.) Although this is extremely useful functionality, it does not do exactly the same thing as a role-playing dimension. In the tabular model, if you want to see the same table in two places in the model simultaneously, you must load it twice. This can increase processing times and make maintenance more difficult. However, it is also true that using role-playing dimensions is not a best practice in terms of usability, because attribute and hierarchy names cannot be renamed for different roles. This creates confusion in the way data is displayed when multiple roles of the same dimension are used in a report.
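The multiple-relationships approach mentioned above relies on inactive relationships activated per measure. As a hedged sketch, assume a hypothetical Sales table with both an order date key and a delivery date key pointing to the same Date table (only one relationship can be active):

```dax
-- Assumes the active relationship is on Sales[OrderDateKey] and an
-- inactive relationship exists on Sales[DeliveryDateKey]
Delivered Amount :=
CALCULATE (
    SUM ( Sales[Amount] ),
    USERELATIONSHIP ( Sales[DeliveryDateKey], 'Date'[DateKey] )
)
```

Here a single Date table plays both roles without being loaded twice, at the cost of writing a dedicated measure for each role.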

Image Scoped assignments and unary operators This advanced calculation functionality is present in MDX in the multidimensional model, but it is not possible—or at least not easy—to re-create it in DAX in the tabular model. These types of calculations are often used in financial applications, so this and the lack of writeback and true parent/child hierarchy support mean that the tabular model is not suited for this class of application. Workarounds are possible in the tabular model, but at the cost of an increased development effort for each data model requiring these features.

The following functionality is only partially supported in the tabular model:

Image Parent/child hierarchy support In the multidimensional model, this is a special type of hierarchy built from a dimension table with a self-join on it, by which each row in the table represents one member in the hierarchy and has a link to another row that represents the member’s parent in the hierarchy. Parent/child hierarchies have many limitations in the multidimensional model and can cause query performance problems. Nevertheless, they are very useful for modeling hierarchies, such as company organization structures, because the developer does not need to know the maximum depth of the hierarchy at design time. The tabular model implements similar functionality by using DAX functions, such as PATH (see Chapter 5, “Building hierarchies,” for details). Crucially, the developer must decide what the maximum depth of the hierarchy will be at design time.
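The PATH-based technique can be sketched as follows, assuming a hypothetical Employee table with EmployeeKey, ParentEmployeeKey, and Name columns (the names are illustrative):

```dax
-- Calculated column: builds the chain of keys from each employee
-- up to the root, as a pipe-delimited string
Employee[HierarchyPath] =
    PATH ( Employee[EmployeeKey], Employee[ParentEmployeeKey] )

-- One calculated column per hierarchy level; the maximum depth
-- is therefore fixed at design time
Employee[Level1] =
    LOOKUPVALUE (
        Employee[Name],
        Employee[EmployeeKey],
        PATHITEM ( Employee[HierarchyPath], 1, INTEGER )
    )
```

Additional Level2, Level3, and so on columns would follow the same pattern, which is exactly why the maximum depth must be decided up front.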

Image Drillthrough This enables the user to click a cell to see all the detail-level data that is aggregated to return that value. Drillthrough is supported in both models, but in the multidimensional model, it is possible to specify which columns from dimensions and measure groups are returned from a drillthrough. In the tabular model, no interface exists in SQL Server Data Tools for doing this. By default, a drillthrough returns every column from the underlying table.

Understanding DAX and MDX

A tabular model defines its calculations using the DAX language. However, you can query a tabular model by using both DAX and MDX. In general, it is more efficient to use DAX as a query language, but the support for MDX is important to enable compatibility with many existing clients designed for the Analysis Services multidimensional model. (Keep in mind that any version of Analysis Services prior to 2012 only supported multidimensional models.) This section quickly describes the basic concepts of these two languages, guiding you in the choice of the query language (and client tools) to consume data from a tabular model.

The DAX language

DAX is a functional language that manipulates table and scalar expressions. A query in DAX always returns a table made of a variable number of rows (depending on the data) and a fixed number of typed columns (depending on the query statement). From this point of view, the result of a DAX query is very similar to that of a SQL query, and we can say that DAX is more similar to SQL than to MDX in its principles. The biggest difference is that SQL queries always express any relationship between tables in an explicit way, whereas DAX implicitly uses relationships existing in the data model, making it necessary to read the data model definition to understand the meaning of a query.
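A minimal DAX query illustrates both points—the tabular result with fixed, typed columns and the implicit use of model relationships. The table and column names below are hypothetical:

```dax
-- Returns all columns of Customer for the matching rows;
-- the column set and types are fixed by the query, the row count by the data
EVALUATE
FILTER ( Customer, Customer[Country] = "Italy" )
ORDER BY Customer[Name]
```

Nothing in the query states how Customer relates to other tables; any such relationship is resolved from the model definition, not from the query text.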

Similar to SQL, DAX does not have any semantics for operating on hierarchies. DAX manipulates only tables, rows, columns, and relationships. Even if a tabular model can include metadata defining hierarchies, the DAX language cannot use that information. (It can be used by client tools to display data, and it has a semantic meaning only in MDX expressions.)

As a pure functional language, DAX does not have imperative statements, but it leverages special functions called iterators that execute a given expression for each row of a table expression. These arguments are close to lambda expressions in functional languages. However, there are limitations in the way you can combine them, so they do not correspond to a generic lambda expression definition. Despite its functional nature, DAX does not allow you to define new functions and does not provide recursion.

The MDX language

The MDX language evaluates expressions in a multidimensional space. A query in MDX returns a multidimensional result, where the number of members on each axis might depend on the data and is not necessarily constrained to a fixed number of members. Even if in most cases the result of an MDX expression is produced using two axes (rows and columns), you can easily build a query returning a variable number of columns, depending on the context and filters used. The result of an MDX query is designed for client tools, such as a PivotTable. It could require extra effort to make the result of an MDX query fit in a predefined structure with a fixed number (and type) of columns.
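For comparison with the DAX example, a typical MDX query against a tabular model might look like the following. The measure, attribute, and cube names are hypothetical; a tabular model is exposed to MDX clients as a cube named after the model:

```mdx
SELECT
    { [Measures].[Sales Amount] } ON COLUMNS,
    NON EMPTY [Product].[Category].[Category].MEMBERS ON ROWS
FROM [Model]
WHERE ( [Date].[Calendar Year].&[2016] )
```

The number of rows returned depends on how many categories have data under the filter, which is exactly the data-dependent axis behavior described above.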

Every column in a table of a tabular model corresponds to an attribute of a dimension of a single cube, and the cube corresponds to the tabular model itself. Every hierarchy in a tabular model corresponds to a user hierarchy for MDX, and each measure in a tabular model corresponds to a measure for MDX that belongs to a measure group having the same name as the table that includes the measure’s definition. Relationships in the tabular model are implicitly considered relationships between measure groups and dimensions. If you are used to star schemas, which are the best relational model for both multidimensional and tabular databases, you will see a very similar result obtained by importing the same data in Multidimensional and Tabular mode. However, the presence of user hierarchies within a table enables the use of MDX functions to navigate hierarchies, which have no corresponding feature in DAX.

An MDX query can use scoped assignments and local measures defined in MDX, as well as local measures defined in DAX. The opposite is not true: You cannot evaluate an MDX expression from a DAX query. MDX enables recursive calculations, even if this technique is limited when applied to a tabular model, because defining MDX measures embedded in the data model is not supported.

Choosing the query language for Tabular

In general, the default choice for querying a tabular model should be DAX. It provides the best performance, regardless of the granularity of the calculation (aggregated or leaf-level). However, most existing client tools for Analysis Services generate queries in MDX, so it is important to evaluate what the limitations are in using MDX for a tabular model.

Any leaf-level calculation applied to the data model, such as a multiplication made row by row in a table, has a better definition (and query plan) in DAX. The navigation of a user hierarchy has a shorter syntax in MDX, whereas it requires numerous constructs in DAX. From a performance point of view, MDX usually does not provide a real advantage, even if it offers a second-level cache (in the formula engine) that is not available in DAX.

If you must choose a language to query data for a table-based report (such as Reporting Services), you should use DAX. MDX could be the language of choice if you generate results in a dynamic matrix with a variable number of rows and columns and nested levels of aggregation and subtotals (such as in a pivot table). If you choose a client that generates MDX queries, be aware of the possible performance issues when you place a high number of attributes in the rows and columns of the result set. This is a typical problem when using an Excel PivotTable to query a tabular or multidimensional model while trying to build a table of detailed information with many descriptive and grouping attributes. DAX is far more efficient for such kinds of queries.
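A table-based report of this kind is naturally served by a single DAX query. The following sketch uses hypothetical table and column names and SUMMARIZECOLUMNS, a function introduced with Analysis Services 2016:

```dax
-- Fixed-shape result suited to a table-based report:
-- one row per existing year/category combination
EVALUATE
SUMMARIZECOLUMNS (
    'Date'[Calendar Year],
    'Product'[Category],
    "Sales Amount", SUM ( Sales[Amount] )
)
ORDER BY 'Date'[Calendar Year], 'Product'[Category]
```

Because the column set is fixed by the query, the result maps directly onto a report dataset, which is exactly what an MDX result with variable axes makes awkward.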

Introduction to Tabular calculation engines

Every query sent to the tabular model in Analysis Services 2016 is executed by two layers of calculation engines. Analysis Services parses both DAX and MDX queries, transforming them into query plans executed by the formula engine, which can execute any function or operation of the two languages. To retrieve the raw data and perform calculations, the formula engine makes one or more calls to the storage engine, which can be either the in-memory analytics engine (VertiPaq) or the external relational database (DirectQuery). Each data model (corresponding to a database in Analysis Services) defines which storage engine to use. As shown in Figure 1-1, the VertiPaq engine contains a copy of the data read from the data source when you refresh the data model, whereas DirectQuery forwards requests to the external data source when necessary, reducing the latency between data updates in the data source and the availability of updated data in Analysis Services. The VertiPaq engine receives requests in an internal binary structure (externally described using a human-readable format called xmSQL), whereas requests sent through DirectQuery are in the SQL dialect supported by the data source.

Image

Figure 1-1 Formula and storage engines in the Analysis Services tabular model.

In this section, you will find an introduction to the two storage engines so you can make an initial choice for your data model. Later in this book, you will find dedicated chapters for each of the two storage engines, with a practical guide to their use and optimization: Chapter 9 and Chapter 12.

Introduction to VertiPaq

The in-memory analytics engine used by the tabular model, also known as the VertiPaq engine, is an in-memory columnar database. Being in-memory means that all the data handled by a model resides in RAM. Being columnar means that data is organized column by column, optimizing vertical scans and requiring a greater effort if an entire row must be materialized with all its columns. VertiPaq does not have additional structures to optimize queries, such as the indexes in a relational database. Since a complete logical scan of a column is required for any query, data is also compressed in memory (using algorithms that allow a quick scan operation) to reduce the scan time and the memory required.

The VertiPaq engine is only one part of the execution engine that provides results to DAX and MDX queries and expressions. In fact, VertiPaq is only the storage engine, which has physical access to the compressed data and performs basic aggregations, filters, and joins between tables. The more complex calculations expressed in DAX or MDX are the responsibility of the formula engine, which receives intermediate results from the storage engine (VertiPaq or DirectQuery) and executes the remaining steps to complete the calculation. The formula engine is often the bottleneck of a slow query using the VertiPaq storage engine (with DirectQuery this might be different) because the formula engine usually executes a query in a single thread (although it handles requests from different users in parallel, if necessary). In contrast, VertiPaq can use multiple cores if the database is large enough to justify the use of multiple threads (usually requiring a minimum of 16 million rows in a table, although this number depends on the segment size used at processing time).

VertiPaq storage processing is based on a few algorithms: hash encoding, value encoding, and run-length encoding (RLE). Each value in a column is always mapped into a 32-bit integer value. The mapping can be done in one of two ways: value encoding or hash encoding. Value encoding uses a dynamic arithmetic calculation to convert the real value into an integer and vice versa. Hash encoding inserts new values into a hash table. The 32-bit integer value is then compressed before it is stored in the column. Using RLE, the engine sorts data so that contiguous rows having the same value in a column compress better, storing the number of rows containing the duplicated value instead of repeating the same value multiple times.


Image Note

Whether you select value encoding or hash encoding depends on various factors, which are explained in more depth in Chapter 12. In that chapter, you will also learn how to improve the compression, reduce the memory usage, and improve the speed of data refresh.


Introduction to DirectQuery

The in-memory analytics engine (VertiPaq) is the default choice for any tabular model you create. However, you also have the option to avoid storing a copy of the data in memory. To do so, you use an alternative approach that converts a query to a tabular model into one or more SQL queries to the data source, without using the VertiPaq engine. This option is called DirectQuery. In this section, you learn its purpose. You will learn how to use it in your tabular model in Chapter 9.

The main benefit of using DirectQuery is to guarantee that data returned by a query is always up to date. Moreover, because Analysis Services does not store a copy of the database in memory, the size of the database can be larger than the memory capacity of the server. The performance provided by DirectQuery strongly depends on the performance and optimization applied to the relational database used as a data source. For example, if you use Microsoft SQL Server, you can take advantage of the columnstore index to obtain faster response times. However, it would be wrong to assume that a generic existing relational database could provide better performance than a properly tuned Analysis Services server for a tabular model. Usually you should consider using DirectQuery for small databases updated frequently, or for very large databases that cannot be stored in memory. However, in the latter case, you should set expectations of query refresh in the order of magnitude of seconds (or more). This reduces the user-friendliness of an interactive navigation of data.

DirectQuery is supported for a limited number of relational databases: Microsoft SQL Server (version 2008 or later), Microsoft Azure SQL Database, Microsoft Azure SQL Data Warehouse, Microsoft Analytics Platform System (APS), Oracle (version 9i and later), and Teradata (V2R6 and later). More databases and versions might be supported in the future. To verify the latest news about supported databases and versions, refer to the Microsoft documentation (https://msdn.microsoft.com/en-us/library/gg492165.aspx).

DirectQuery does not support all the features of a tabular data model or of MDX or DAX. From a data-model point of view, the main limitation of DirectQuery is that it does not support calculated tables. You can, however, use calculated columns, although with some limitations (which we will describe later).

From a DAX point of view, there are numerous functions that have different semantics because they are converted into corresponding SQL expressions, so you might observe inconsistent behavior across platforms when using time intelligence and statistical functions. From an MDX point of view, there are numerous limitations that affect only the MDX coding style. For example, you cannot use relative names, session-scope MDX statements, or tuples with members from different levels in MDX subselect clauses. However, there is one limitation that affects the design of a data model: You cannot reference user-defined hierarchies in an MDX query sent to a model using DirectQuery. This affects the usability of DirectQuery from Excel because this feature works without any issue when you use VertiPaq as the storage engine.


Image Note

DirectQuery had many more limitations in Analysis Services 2012/2014. For example, it worked only with Microsoft SQL Server, MDX was not supported, and every DAX query was converted into a complete SQL query, without using the formula engine of the Analysis Services tabular model. Features such as calculated columns and time-intelligence functions were not supported at all. For these reasons, the list of restrictions for using DirectQuery was much longer. However, DirectQuery received a complete overhaul in Analysis Services 2016. In this book, you will find only information about DirectQuery used in Analysis Services with the new compatibility level, which is described in the next section.


Tabular model compatibility level (1200 vs. 110x)

Analysis Services 2016 introduced a new compatibility level, which refers to a set of release-specific behaviors in the Analysis Services engine. Databases created with previous versions of Analysis Services have a different compatibility level, and Analysis Services 2016 supports all of these as well. This means you can easily migrate existing databases created in previous versions to Analysis Services 2016, but you must migrate to the new compatibility level if you want to use specific features available only in this new release. You can also create new databases in Analysis Services 2016 using an older compatibility level in case you need certain features that were available in a previous compatibility level (even if they were not officially supported), as you will see later in this section.

The following compatibility levels are available for tabular models:

Image 1100 This is available in SQL Server 2012, SQL Server 2014, and SQL Server 2016.

Image 1103 This is available in SQL Server 2012 SP1, SQL Server 2014, and SQL Server 2016.

Image 1200 This is available in SQL Server 2016 and Azure Analysis Services.

You can create and deploy any compatibility level using SQL Server Data Tools (SSDT) for VS 2015. You can use only compatibility levels 1100 and 1103 if you use SQL Server Data Tools for Business Intelligence (SSDT-BI) for VS 2012 or 2013.

The new compatibility level 1200 stores the data model using a JSON format, whereas the previous compatibility levels 110x use an XML-based format inherited from the Multidimensional version of Analysis Services. That model was adapted for the tabular model in its first version (Analysis Services 2012), but it imposed unnecessary overhead and redundancy in the code. It also had negative effects on performance (a longer time to deploy data models) and manageability (small changes in a data model produced numerous differences in files, complicating the comparison of different versions and the use of source-control systems).

Many new features in Analysis Services 2016 are available only using the new compatibility level. Therefore, it is generally a good idea to use the compatibility level 1200 when you start a new project or migrate an existing one. Such a compatibility level is also required to deploy a database on Azure Analysis Services. However, certain tools and extensions that were available in SSDT-BI for VS 2012 and VS 2013 are no longer available in SSDT for VS 2015. If you used any of these, you should consider carefully when to upgrade an existing model to the new compatibility level and to the new version of VS. The following two open-source tools are the most affected by this change:

Image BIDS Helper You can no longer use BIDS Helper (http://bidshelper.codeplex.com/) to edit Actions or the HideMemberIf property because these features are no longer supported in the compatibility level 1200. If you use BIDS Helper to edit display folders or translations, you should rely on the new equivalent features for the tabular model now supported in Analysis Services 2016.

Image DAX Editor If you use DAX Editor (http://www.sqlbi.com/tools/dax-editor/) to edit the list of measures of a tabular model in a single text file, make sure you download the latest version (2.0 or newer), which is compatible with both XML and JSON formats.

You can still use BIDS Helper when deploying a model to Analysis Services 2016, taking advantage of the new engine and DAX functions available in this version. However, you must use SSDT-BI for VS 2012 or VS 2013, without accessing the features available in the new data model. If you make this choice, consider it a temporary situation and define a plan for a future migration to the new model compatibility level.


Image Note

This book is based on the features available in the new compatibility level 1200. If you want to create a model using a 110x compatibility level, you might find differences in features, performance, and the user interface of the development tools. In that case, we suggest you rely on the documentation available for Analysis Services 2012/2014 and refer to our previous book, Microsoft SQL Server 2012 Analysis Services: The BISM Tabular Model, as a reference.


Analysis Services and Power BI

Power BI is an excellent client tool for Analysis Services. As of now, Power BI is the only widely available client tool that generates queries in DAX, so it can take full advantage of a tabular model. In fact, the engine used internally by Power BI is the same engine as Analysis Services 2016 for tabular models. Power BI is available online, consuming reports through a web browser or a mobile device. It is also available as a free desktop application called Power BI Desktop.

You can connect Power BI Desktop to Analysis Services using the Connect Live mode in Get Data. If you want to use the same database online, you must install the on-premises data gateway, which creates a bridge between the online client tool and your Analysis Services database installed on-premises, as shown in Figure 1-2.

Image

Figure 1-2 SQL Server Analysis Services connected to Power BI through the On-Premises Data Gateway.

Although you can connect any version of the Analysis Services tabular model to Power BI, you should be aware that Power BI creates optimized DAX queries for Analysis Services 2016, using many new DAX features designed expressly to improve the performance of Power BI. When Power BI connects to Analysis Services 2012/2014, it must use an older version of the DAX language, producing less-efficient queries and slower user interaction in complex reports. This performance difference is not related to the compatibility level; databases deployed with an older compatibility level still get the full benefits of the new DAX syntax available in Analysis Services 2016. Thus, consider upgrading the server hosting an existing data model to Analysis Services 2016 if you want to consume it with Power BI—even if you do not want to, or cannot, upgrade the model compatibility level.

Summary

This chapter discussed the tabular and multidimensional models in Analysis Services, their strengths and weaknesses, and when they should be used. The key point to remember is that the two models are very different—they’re practically separate products. You should not make the decision to use the tabular model on a project without considering whether it is a good fit for your requirements, or whether you should use a multidimensional model instead. In the next chapter, you will take a first look at how to build tabular models.
