Chapter 4. Configure and maintain SQL Server Analysis Services (SSAS)

In addition to testing your ability to develop and query multidimensional and tabular models, the 70-768 exam tests your knowledge of a variety of administrative tasks for SSAS. You must know how to perform configuration tasks related to memory and disk management for optimal performance. Additionally, you must know which tools you can use to monitor performance, how to identify performance bottlenecks, and what steps to take to resolve performance issues. You must also understand the management of model processing. And finally, you must know how to enhance multidimensional and tabular models with key performance indicators and translations.

Skills in this chapter:

Image Plan and deploy SSAS

Image Monitor and optimize performance

Image Configure and manage processing

Image Create Key Performance Indicators (KPIs) and translations

Skill 4.1: Plan and deploy SSAS

One of the primary reasons to implement an SSAS solution, whether multidimensional or tabular, is to enable users to query data faster than they can access data in an online transactional processing (OLTP) system. In addition to modeling data correctly, you must also configure the server properly to deliver optimal performance.

Configure memory limits

Both multidimensional and tabular models rely on server memory to manage processing and to store data for faster responses to queries. However, the way that an SSAS server uses memory and the configuration of the server’s memory limits depends on whether it is running in multidimensional or tabular mode.

Multidimensional-mode memory limits

SSAS uses a combination of memory- and disk-based operations for processing tasks and query resolution. As the size or complexity of cubes or the number of users querying SSAS increases, the amount of memory that SSAS requires also increases.

The server’s operating system allocates physical and virtual memory to SSAS, which relies on its own memory manager to distribute memory across its components. If SSAS requires more memory than the amount available on the server, it pages to disk until it reaches specified limits. Server performance slows when this happens.

Figure 4-1 illustrates the memory model of an SSAS server. A fixed amount of physical memory on the server is allocated to the operating system and other services running on the server. When the SSAS Windows service starts, the operating system allocates memory to the Msmdsrv.exe process. A portion of this memory is non-shrinkable memory that SSAS uses for the Msmdsrv.exe process and the storage of objects required by the server, such as metadata about server objects, active user sessions, and memory required for active queries. Non-shrinkable memory is never deallocated, although it can be paged to disk.

Image

FIGURE 4-1 Memory model of SSAS server in multidimensional mode

The remaining SSAS memory is shrinkable memory, which is used to store cached data to optimize query performance. SSAS not only caches dimension data, but it also caches certain calculation results and frequently accessed cube cell values. SSAS increases shrinkable memory as requests for memory resources increase until reaching the configured Hard Memory Limit value, at which point it rejects any new requests. Meanwhile, SSAS begins removing infrequently used objects from memory when the combined non-shrinkable and shrinkable memory exceeds the configured Low Memory Limit and aggressively removes objects when memory consumption exceeds the configured Total Memory Limit value. Shrinkable memory below the Low Memory Limit is never released until the server is restarted.

The operating system also allocates memory to the file system cache. This cache can continue to request memory, and eventually share memory with SSAS. However, it does not overlap with the SSAS memory below the Low Memory Limit.
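To observe how an instance distributes memory between shrinkable and non-shrinkable allocations, you can query one of the SSAS dynamic management views (DMVs) from an MDX query window in SSMS. The following query is a minimal sketch based on the DISCOVER_OBJECT_MEMORY_USAGE DMV; treat the exact column list as an assumption to confirm against your own instance:

-- Reports per-object shrinkable and non-shrinkable memory, in bytes
SELECT
    OBJECT_PARENT_PATH,
    OBJECT_ID,
    OBJECT_MEMORY_SHRINKABLE,
    OBJECT_MEMORY_NONSHRINKABLE
FROM $SYSTEM.DISCOVER_OBJECT_MEMORY_USAGE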

Most of the time, the default memory limits in the SSAS configuration are adequate. You can change these limits when the server encounters low memory conditions frequently or when you find performance is not optimal. To change memory limits, you can access and update server properties in SQL Server Management Studio (SSMS) after connecting to your multidimensional instance. In Object Explorer, right-click the server, click Properties, click General in the Analysis Services Properties dialog box, and then select the Show Advanced (All) Properties check box to view the following memory-related properties, some of which are shown in Figure 4-2:

Image Memory HardMemoryLimit This property sets the threshold at which SSAS experiences memory pressure and rejects new requests for SSAS resources. When the value is the default of zero, SSAS sets the limit to a value midway between TotalMemoryLimit and the total physical memory or virtual address space, whichever is less.

Image Memory LowMemoryLimit This property sets the first threshold at which SSAS begins clearing infrequently used objects from memory. The default of 65 is the percentage of physical memory or virtual address space, whichever is less. This value is usually suitable, but consider setting it to a lower value if other processes are running on the same server.

Image Memory TotalMemoryLimit This property is the maximum percentage of memory that SSAS uses as a threshold for aggressively releasing memory. The default value of 80 is the percentage of physical memory or virtual address space, whichever is less. Be sure to keep this value lower than the HardMemoryLimit.
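Rather than editing these values in the dialog box, you can also script a change as an XMLA Alter command in an XMLA query window. The following is a minimal sketch that lowers LowMemoryLimit to 60 percent; the server name is a placeholder, and the safest way to obtain the exact structure for your environment is to script the change from the SSMS properties dialog box:

<Alter AllowCreate="true" ObjectExpansion="ObjectProperties"
       xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
    <Object />
    <ObjectDefinition>
        <Server>
            <ID>YourServerName</ID>
            <Name>YourServerName</Name>
            <ServerProperties>
                <ServerProperty>
                    <Name>Memory\LowMemoryLimit</Name>
                    <!-- A value below 100 is interpreted as a percentage -->
                    <Value>60</Value>
                </ServerProperty>
            </ServerProperties>
        </Server>
    </ObjectDefinition>
</Alter>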

Image

FIGURE 4-2 Advanced memory properties for a multidimensional instance of SSAS


Important Memory property values

With the exception of HardMemoryLimit, which is set to 0, the default values for memory limits are expressed as percentages of memory. If you change these values, use a value below 100 to represent percentages or a value above 100 to represent bytes. For example, if you want to specify 8 gigabytes (GB) as a property value, you must specify 8 * 1024 * 1024 * 1024 bytes as 8589934592 (without commas). This expression of property values is true for both multidimensional and tabular models.



Note Guidance for configuration of multidimensional-mode memory properties

You can find recommendations for configuring the memory properties for a multidimensional-mode SSAS server, in addition to a few others that are generally changed only under the direction of Microsoft support, in the “Analysis Services Operations Guide,” which you can download from https://msdn.microsoft.com/en-us/library/hh226085.aspx. Although written for SSAS in SQL Server 2008 R2, its guidance remains largely applicable because the underlying architecture has not changed much in subsequent releases. One exception is the heap allocator, which was redesigned in SQL Server 2016.


Tabular-mode memory limits

The performance of an SSAS server in tabular mode is dependent on the availability of memory on the server. Like a multidimensional-mode server, a tabular-mode server requires memory for processing operations and, to a lesser extent, query resolution. However, a tabular-mode server also requires enough memory to store each database.

In the memory model shown in Figure 4-3, you can see how a fixed amount of memory, approximately 150 megabytes (MB), is allocated to the Msmdsrv.exe process and formula engine, and the remaining memory requirements increase for each database added to the server. As SSAS performs operations, it requests memory from the operating system until reaching the configured Total Memory Limit value. Meanwhile, upon exceeding the configured Low Memory Limit value, it attempts to remove objects from memory where possible. When memory consumption surpasses the Total Memory Limit, the deallocation process becomes more aggressive by clearing memory in use by expired sessions and unused calculations. Upon crossing the Hard Memory Limit threshold, SSAS terminates sessions to forcibly return memory usage below this threshold.

Image

FIGURE 4-3 Memory model of SSAS server in tabular mode

The memory required to store tabular data is not easy to calculate because you cannot base an estimate on the number of rows in a table like you would when estimating the size of a relational database. Recall from Chapter 2, “Design a tabular BI semantic model,” that a tabular model stores data by columns, and each column contains only distinct values in a compressed format. As part of this compression process, dictionaries are created to encode data and reduce the data storage requirements. Columns with a large number of distinct values require more memory than columns with a low number of distinct values. The different options for dictionary construction make it difficult to create an estimate. Instead, you should evaluate your own data in raw form and then measure the size of the database after processing the table. Double the number of rows in the relational source and remeasure the tabular model size. That way, you can develop an estimate of the memory requirements for a single table for incremental additions of data.
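One way to take these measurements is to query the tabular DMVs after processing. The following query is a minimal sketch that returns the compressed size of each column segment, which you can total by table or column to track how storage grows as you add rows; column availability can vary by version, so verify against your own instance:

-- USED_SIZE is the compressed size of each column segment, in bytes
SELECT
    TABLE_ID,
    COLUMN_ID,
    RECORDS_COUNT,
    USED_SIZE
FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS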

With regard to processing, the memory required on the SSAS server depends on the type of processing performed and whether the database is new or if it already exists on the server. When you perform Process Full, SSAS reads each table in the database, loads it into memory, and then creates a dictionary and an index for each column. If the database already exists, both the existing copy and the new copy of data exist in memory simultaneously (unless you perform Process Clear first as described in Chapter 2) to ensure that the server can respond to queries during processing. When the processing transaction commits, the existing copy is removed and the new copy of the database is kept. If physical memory is too low during processing, paging occurs and the entire server’s performance degrades for both query and processing operations. Therefore, you should attempt to ensure that enough physical memory is on the server to support a Process Full operation, which is typically two to three times the size of the database.
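For a tabular model at the 1200 compatibility level, you can issue a Process Full as a TMSL command in an XMLA query window in SSMS. The following minimal sketch processes this chapter’s sample database; if memory is constrained, consider a Process Clear first, as described in Chapter 2, so that two copies of the data are not held in memory simultaneously:

{
  "refresh": {
    "type": "full",
    "objects": [
      {
        "database": "70-768-Ch2"
      }
    ]
  }
}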

Just like multidimensional-mode memory properties described in the previous section, the tabular-mode memory properties are accessible on the General page of the Analysis Services Properties dialog box that you open from the server node in the SSMS Object Explorer. You can then review or change the following memory properties:

Image HardMemoryLimit This property has a default value of 0, which sets the threshold at a point midway between the TotalMemoryLimit value and the server’s total physical memory, or the total virtual address space if physical memory is greater than the virtual address space of the process. At this threshold, SSAS becomes more aggressive about clearing objects from memory.

Image LowMemoryLimit SSAS uses this value as the threshold at which it starts clearing objects from memory. The default of 65 is the percentage of physical memory or virtual address space, whichever is less.

Image TotalMemoryLimit The default value of 80 is the percentage of physical memory or the virtual address space, whichever is less. Above this value, SSAS is most aggressive about removing objects from memory. If you change this value, you must set it lower than HardMemoryLimit.

Image VertiPaqMemoryLimit The effect of this property depends on the value of the VertiPaqPagingPolicy property. If VertiPaqPagingPolicy is 0, this property sets the percentage of total memory that the SSAS server uses for storing all in-memory databases without paging. Otherwise, this property sets the percentage of total physical memory that the SSAS server uses before paging to disk. The default is 60.


Note VertiPaq is xVelocity

When tabular models were first introduced in SQL Server 2012, the code base referred to the storage engine as VertiPaq, but the official term for the storage engine is xVelocity as explained in Chapter 2. Because most properties, performance counters, and trace events still refer to VertiPaq, references to xVelocity in this chapter will use the term VertiPaq instead.


Image VertiPaqPagingPolicy This property specifies whether the SSAS server can page to disk when the SSAS server has insufficient memory. The default value is 1, which enables SSAS to page to disk as needed, although performance can suffer as a result of paging. To disable paging, change this value to 0. More information about this property and VertiPaqMemoryLimit is available in Chapter 2.

If you set this value to 0, you can ensure the SSAS server’s overall performance does not suffer from paging. However, if a query then requires more than the available memory, SSAS kills the connection. If the majority of queries are lightweight, this option is fine. On the other hand, if you leave paging enabled, the server can continue to service complex queries even when their memory requirements exceed the server’s physical memory. In that case, there is a risk that server performance degrades for both complex and lightweight queries.
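If you decide to disable paging, you can change the property in the Analysis Services Properties dialog box, or script it by using the same XMLA Alter pattern sketched earlier in this skill for the multidimensional memory limits, substituting the ServerProperty element shown here. The Memory\VertiPaqPagingPolicy name follows the convention used in msmdsrv.ini and is worth confirming on your own instance:

<ServerProperty>
    <Name>Memory\VertiPaqPagingPolicy</Name>
    <!-- 0 disables paging; requires extra Windows privileges, see note below -->
    <Value>0</Value>
</ServerProperty>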


Note Configuration of Windows Privileges on service account

If you choose to change the VertiPaqPagingPolicy value to 0, you must add Windows privileges to the service account for the SSAS tabular instance, as described in “Configure Services Accounts (Analysis Services)” at https://msdn.microsoft.com/en-us/library/ms175371.aspx#bkmk_winpriv.



Note Guidance for configuration of tabular-mode memory properties

You can review recommendations for configuring the tabular-mode memory properties in “Hardware Sizing a Tabular Solution (SQL Server Analysis Services),” which you can download from https://msdn.microsoft.com/en-us/library/jj874401.aspx.


Configure Non-Uniform Memory Access (NUMA)

Non-uniform memory access (NUMA) is the term used to describe a memory architecture in which each processor on a server has its own memory to overcome problems occurring when multiple processors attempt to access the same memory. However, for situations in which separate processors require the same data, NUMA supports moving data between memory banks, although there is a performance impact resulting from this transfer.

Before reviewing configuration options for a NUMA server, you should understand how SSAS uses processors to perform operations. The operating system creates one or more processor groups as a container for a set of up to 64 logical processors. Typically, the processors in the same processor group share physical proximity. A NUMA node is typically assigned to a single processor group, depending on the node capacity.

Many SSAS operations are multi-threaded, which means that SSAS divides an operation into multiple jobs that run in parallel to improve the overall performance of the operation. Furthermore, SSAS manages thread pools, a set of threads that is instantiated and ready for assignment to a new job request. When SSAS starts, the Msmdsrv.exe process is assigned to a specific processor group, but subsequent SSAS operations can be assigned on any logical processor in any processor group. Typically, SSAS attains its best performance when it can access all available logical processors.

SSAS is NUMA-aware by default, whether running in multidimensional or tabular mode (although, in the latter case, only when you have applied SQL Server 2016 Service Pack 1). When testing reveals that performance suffers because SSAS query and processing loads are spread over too many processors, which can happen when a cube has a large number of partitions and there are many queries requesting data from multiple partitions, you can set affinity between SSAS operations and specific logical processors. That is, you can configure the SSAS GroupAffinity properties that restrict thread pools to specific processors. However, the effect of this approach is typically noticeable and beneficial only when the server has more than 64 logical processors. For this reason, the GroupAffinity property is undefined by default.

When you decide to set affinity for SSAS, bear in mind the maximum cores that each edition can use—24 cores in SQL Server Standard edition or up to 640 cores in Enterprise edition. When SSAS starts, it computes affinity masks for each of the following types of thread pools that you configure in the SSAS server properties:

Image Thread Pool Command This thread pool is used to manage XML for Analysis (XMLA) commands.

Image Thread Pool IOProcess This thread pool is used only by a multidimensional-mode server. SSAS uses it to perform input-output (IO) operations requested by the storage engine, described in detail in Skill 4.2, “Monitor and optimize performance.” Because these threads scan, filter, and aggregate partition data, NUMA can affect their performance.

In addition to configuring an affinity mask for the IOProcess thread pool for a multidimensional-mode server only, you can use the PerNumaNode property to tune performance. To configure PerNumaNode for a multidimensional server, use one of the values shown in Table 4-1.

Image Thread Pool Parsing Long This thread pool provides threads for requests that cannot fit within a single network message.

Image Thread Pool Parsing Short This thread pool provides threads for requests that can fit within a single network message. Sometimes queries that can execute quickly, such as Discover or Cancel, are handled by this thread pool rather than the Query thread pool.

Image Thread Pool Process This thread pool manages the longer-running jobs such as the creation of aggregations and indexes and commit operations. If a multidimensional partition is configured to use ROLAP mode, this thread pool manages ROLAP queries.

Image Thread Pool Query This thread pool handles MDX and DAX requests, among others, that are not handled by either of the parsing thread pools.

Image VertiPaq Thread Pool This thread pool performs table scans for tabular model queries.

Image

TABLE 4-1 Valid values for PerNumaNode property on a multidimensional-mode server


Note Thread pool properties

For more information about these thread pools, see “Thread Pool Properties” at https://msdn.microsoft.com/en-us/library/ms175657.aspx.


You can configure the GroupAffinity property for any or all of these thread pools by defining a custom affinity. A custom affinity is an array of hexadecimal values corresponding to processor groups. SSAS tries to use the logical processors in the specified processor group when allocating threads within the thread pool. For each logical processor in a processor group, up to 64, you set a bit to 1 if the processor is used or 0 if it is not. You then use the bitmask to calculate the hexadecimal value that you assign to the GroupAffinity property by following these steps:

1. Note the number of logical processors and processor groups on the server. You can use the Coreinfo utility available for download at https://technet.microsoft.com/en-us/sysinternals/cc835722.aspx to list this information in its Logical Processor To Group Map section.

2. Within each processor group, list the processors in descending order, from left to right. Let’s assume that your server has 8 processors. In that case, your list looks like this:

7654 3210

3. For each processor group, compute a bitmask by replacing the processor number with a 1 to include it or a 0 to exclude it. For example, if you want to keep processors 1 through 5 and ignore the others, your bitmask looks like this:

0011 1110

4. Use a binary-to-hexadecimal converter tool to convert the bitmask to a hexadecimal value. In the current example, 0011 1110 in binary converts to 3E, which you must prefix with 0x to produce the hexadecimal value of 0x3E for the GroupAffinity property.

5. Update the GroupAffinity property for the thread pool with the calculated hexadecimal value. If the entire mask is invalid, an error displays. However, if part of the mask is valid and part is invalid, SSAS ignores the GroupAffinity property and uses all logical processors on the server.

When the SSAS server has multiple processor groups, you can assign a comma-delimited list of hexadecimal values to the GroupAffinity property. Use 0x0 to ignore specific processor groups. As an example, if your server has four processor groups, but you want to exclude the first and last groups (processor groups 0 and 3), set the GroupAffinity property like this:

0x0, 0x3E, 0x3E, 0x0
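These settings are persisted in the msmdsrv.ini configuration file. As a sketch only, a GroupAffinity assignment for the Process thread pool might appear as the following fragment; confirm the element names against your own msmdsrv.ini, and back up the file before editing it:

<ThreadPool>
    <Process>
        <!-- One hexadecimal mask per processor group, comma-delimited -->
        <GroupAffinity>0x0,0x3E,0x3E,0x0</GroupAffinity>
    </Process>
</ThreadPool>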


Note Current thread pool settings in SSAS log

Current thread pool settings and thread pool affinity masks are output to the msmdsrv.log file (in the Program Files\Microsoft SQL Server\MSAS13.<instance>\OLAP\Log folder) each time the SSAS service starts.


Configure disk layout

Disk layout configuration applies primarily to a multidimensional-mode server because a tabular-mode server is optimized for memory storage and access. In the Analysis Services Properties dialog box, accessed in SSMS as described in the “Configure memory limits” section, you can configure the following properties for file locations:

Image DataDir This path determines where to place files storing database objects and metadata. The default location is Program Files\Microsoft SQL Server\MSAS13.<instance name>\OLAP\Data. You can override this location when you deploy a database by specifying alternate locations for one or more cube partitions as described in Skill 4.3, “Configure and manage processing.” If you do this, you must add each path to AllowedBrowsingFolders, a server property for SSAS, using a pipe delimiter between paths, as shown in the example following this list.

Image LogDir SSAS uses this path for storing log files, including the Flight Recorder and query logs created by usage-based optimization.

Flight Recorder is a short-term log that captures snapshots of dynamic management views (DMVs) to help you troubleshoot query and processing issues that occurred recently. Because this logging mechanism adds input/output (IO) overhead to the server, consider disabling it in the Analysis Services Properties dialog box by changing the Log Flight Recorder Enabled property to False. Instead, you can use other techniques described in this chapter to monitor query and processing performance on an as-needed basis.

Usage-based optimization is another logging tool that you can use to guide the development of aggregations to improve the performance of common query patterns. Refer to the “Optimize and manage model design” section for more information about using usage-based optimization.

Image TempDir SSAS uses this path to store temporary files created during processing. Specifically, dimension processing, aggregation processing, and ROLAP dimensions might require more memory than the server has available, in which case SSAS spills to disk. Consider changing this path to a separate volume from the data path when you are processing high volumes of data and the server is experiencing performance issues during processing.
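To illustrate the pipe-delimited format that AllowedBrowsingFolders expects, a value that permits the default data folder plus a hypothetical second volume might look like the following line (both paths are examples only):

C:\Program Files\Microsoft SQL Server\MSAS13.MSSQLSERVER\OLAP\Data\|E:\SSASData\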

Determine SSAS instance placement

For an optimal configuration of a single server, you should place a production SSAS instance on a dedicated server without other server products, such as the SQL Server database engine, SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), or another SSAS instance. You can use several SSAS servers in a scale-out architecture when you need to support a large number of concurrent users or process or store a high volume of data, whether using multidimensional- or tabular-mode servers.

If you decide to run both SQL Server and SSAS on the same server for licensing reasons, be aware that resource contention over memory and CPU is more likely to occur. That said, if SQL Server hosts a small to medium data warehouse as the data source for SSAS, it is possible to run both services on the same server to achieve faster processing and reduce overall hardware costs, especially if the resources of each service are in use at different times. This approach is suitable only for a multidimensional-mode server, however. A tabular-mode server has high memory requirements and is better suited to running in a standalone environment. You must evaluate the workload of both servers to determine whether a single server can provide adequate resources to each.

One option for scaling SSAS is to establish one server for query loads and a second server for processing loads, as shown in Figure 4-4. Use this approach when the SSAS server must be available to queries around the clock and this lack of downtime prevents you from performing processing operations without adversely impacting query performance. In this case, you can process on a separate server and then copy the processed database from the processing server to the query server.

Image

FIGURE 4-4 Separate processing and query servers


Note Options for copying an SSAS database

To copy an SSAS database from one server to another, use one of the following methods and view the associated reference for more information:

Image Synchronization, see “Synchronize Analysis Services Databases” at https://msdn.microsoft.com/en-us/library/ms174928.aspx

Image High-speed copying, see “SQLCAT’s Guide to BI and Analytics” at http://download.microsoft.com/download/0/F/B/0FBFAA46-2BFD-478F-8E56-7BF3C672DF9D/SQLCAT’s%20Guide%20to%20BI%20and%20Analytics.pdf

Image Backup and restore, see “Backup and Restore of Analysis Services Databases” at https://msdn.microsoft.com/en-us/library/ms174874.aspx

Image Attach and detach, see “Attach and Detach Analysis Services Databases” at https://msdn.microsoft.com/en-us/library/cc280583.aspx
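To illustrate the first of these methods, the following XMLA Synchronize command is a minimal sketch that you execute on the query server to copy this chapter’s sample database from a processing server; the source server name is a placeholder:

<Synchronize xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
    <Source>
        <ConnectionString>Provider=MSOLAP;Data Source=ProcessingServer</ConnectionString>
        <Object>
            <DatabaseID>70-768-Ch1</DatabaseID>
        </Object>
    </Source>
    <SynchronizeSecurity>SkipMembership</SynchronizeSecurity>
    <ApplyCompression>true</ApplyCompression>
</Synchronize>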


When you need to process multiple large partitions, you can speed up the processing operations by using separate remote partitions for the database on the processing server, as shown in Figure 4-5, to parallelize the processing operation. Furthermore, by distributing the processing load to multiple servers, you can take advantage of the memory and processor resources of separate servers, rather than overwhelming a single server.

Image

FIGURE 4-5 Remote partitions for processing


Note Remote partitions

More information on remote partitions is available in the article “Create and Manage a Remote Partition (Analysis Services)” at https://msdn.microsoft.com/en-us/library/ms174751.aspx.


If the number of concurrent users is adversely impacting query performance, you can create a Network Load Balancing (NLB) cluster of SSAS servers, as shown in Figure 4-6. In this case, client applications connect to a virtual IP address and the NLB mechanism forwards each query to one of the servers in the cluster in round-robin fashion. One way to keep the query servers up-to-date is to maintain a master database and then use synchronization to ensure each of the query servers has an exact copy of the master database.

Image

FIGURE 4-6 Network load balanced query servers


Note Analysis Services clustering

See “How to Cluster SQL Server Analysis Services” at https://msdn.microsoft.com/en-us/library/dn736073.aspx for more information.


Skill 4.2: Monitor and optimize performance

As source data volumes increase or query patterns change, performance issues can become more prevalent as a result of the current model design, application configuration, or server hardware limitations. The 70-768 exam tests your ability to use monitoring tools, evaluate results, and take appropriate action to resolve the identified performance problems.

Monitor performance and analyze query plans by using Extended Events and Profiler

Of all the performance issues that can occur on the SSAS server, slow queries are important to identify and resolve so that you can provide an optimal data access experience for your users. After all, one of the main reasons to move data into a multidimensional or tabular model is to enable fast queries. The methods you use to derive insight into the potential causes of slow queries depend on whether SSAS is running in multidimensional or tabular mode.


Image Exam Tip

The exam is likely to test your understanding of analyzing trace events. Be sure you understand the key events that provide details necessary to troubleshoot query performance and how to draw conclusions from this information.


Multidimensional query monitoring

Although you cannot access query plans for an SSAS multidimensional-mode server, you can use SQL Server Profiler to capture events that provide information about the division of labor between the engines that SSAS uses to process queries. This technique is useful when you need to resolve performance issues for specific queries. If you want to monitor general query performance on the server, you can use the Windows Performance Monitor as described in the “Monitor processing and query performance” section later in this chapter.

Before we examine the Profiler events that are important to monitor, let’s review the following high-level stages in the multidimensional query processing architecture, as shown in Figure 4-7:

1. When a client application, such as SSMS or Excel, sends an MDX query to the SSAS server, the Query Parser checks the query to validate the syntax. If the query is invalid, an error message is returned to the client application. Otherwise, the query processor sends the query to the SSAS formula engine.

2. The formula engine’s first step is to evaluate and populate the query’s axes. As described in Chapter 3, “Develop queries using Multidimensional Expressions (MDX) and Data Analysis Expressions (DAX),” the set on rows and the set on columns are resolved and the applicable members are retrieved from the dimension cache or storage. If the dimension data is not in the cache, the formula engine requests the data from the storage engine and then saves the results in the cache.

3. After the rows and columns are evaluated, the cells resulting from row and column intersections are combined with the WHERE clause to identify tuples to retrieve from the cube to compute cell data. Put another way, the formula engine evaluates the multidimensional space requested by the query.

4. SSAS builds a logical execution plan that it uses to determine the best way to retrieve cell values in the multidimensional space and then creates a physical execution plan. Empty tuples are removed from the physical execution plan if possible. The physical execution plan is then sent to the cache subsystem as a series of subcube operations. A subcube is a subset of the multidimensional space that represents a crossjoin of one or more members from a level of each attribute hierarchy. (Subcubes in this context are not the same as the subcubes that you define in the MDX script, as explained in Chapter 3.)

5. The subcube operations request the storage engine to return fact data from one of the following locations:

Image The first location from which the storage engine can retrieve data is the storage engine cache. This cache is empty when the SSAS service starts, and is loaded as the storage engine retrieves data from disk to speed up subsequent requests for the same data.

Image If the requested data is not in the cache, the next location from which the storage engine can retrieve data is the aggregations store on disk. Aggregations are explained in the “Optimize and manage model design” section later in this chapter. Aggregation data is copied into cache and then sent to the formula engine.

Image When the storage engine is unable to retrieve data from the previous two locations, it then retrieves values from the fact data store on disk. Just as with aggregations, the retrieved data is copied into cache and forwarded on to the formula engine.


Note File system cache

When the storage engine makes a request to retrieve data from disk, the IO subsystem first checks the Windows file system cache. Only when the data is not in the file system cache is it actually retrieved from disk. You can read more about how SSAS interacts with the file system cache in the “Analysis Services Operations Guide,” which you can download from https://msdn.microsoft.com/en-us/library/hh226085.aspx.


6. After the formula engine receives the data for all subcube requests, it computes the calculations in calculated measures. If the calculation definition comes from the model rather than the query, it has global scope and the result is stored in the formula engine cache to speed up subsequent queries from any user requesting the same calculation. The formula engine then combines the data from multiple subcube requests with the calculation results and sends the final query results to the requesting client application.

Image

FIGURE 4-7 Multidimensional query processing architecture


Note Query processing architecture in-depth

The Analysis Services MOLAP Performance Guide provides more in-depth explanation of the query processing architecture at a level of detail that the exam does not test, but is helpful for increasing your knowledge of this topic. Although written for SQL Server 2012 and 2014, it remains applicable to SQL Server 2016. You can download the guide from https://msdn.microsoft.com/en-us/library/dn749781.aspx.


You can observe these stages by capturing events in a trace by performing the following steps in SQL Server 2016 Profiler:


Note Sample multidimensional database for this chapter

This chapter uses the database created in Chapter 1, “Design a multidimensional business intelligence (BI) model,” to illustrate monitoring and configuration concepts. If you have not created this database, you can restore the 70-768-Ch1.ABF file included with this chapter’s code sample files as described in Chapter 3.



Note User permissions

You must have ALTER TRACE permission to use SQL Server 2016 Profiler if you are not a member of the sysadmin role. To save a trace to a table, you must also have a minimum of CREATE TABLE and ALTER SCHEMA permissions.


1. Click New Trace on the File menu, and then connect to the Analysis Services server on which you want to run the trace.

2. In the Trace Properties dialog box, decide whether you want to save the results for later analysis. You can choose either or both of the following check boxes:

Image Save To File Save the captured events to a TRC file that you can later open and review in Profiler. This method is preferable when you can launch profiling and save the file on a remote server because it reduces the performance impact on the SSAS server.

Image Save To Table Save the captured events to a specific SQL Server database and table that you can later review by using any query tool. An advantage of saving the trace data to a table is the ability to create reports to compare query performance before and after making changes to tune behavior.

3. Click the Events Selection tab, select the Show All Events check box, and then select or clear check boxes in the events list such that only the following check boxes are selected:

Image Progress Report Begin The storage engine begins reading partition data if this data is not available in the cache.

Image Progress Report Current The storage engine reports information about the partition data it is reading.

Image Progress Report End The storage engine has finished reading partition data.

Image Query Begin The query is received by SSAS. You can view the MDX query in the TextData column.

Image Query End SSAS has finished query processing. You can view the overall time to resolve and return the query results in the Duration column.

Image Calculate Non Empty Begin SSAS eliminates empty coordinates from the query results when the Non Empty keyword or NonEmpty function appear in the query. If the IntegerData value is 11, the evaluation of non-empty cells is performed in cell-by-cell mode, which is slower than bulk mode; any other IntegerData value represents bulk mode.

Image Calculate Non Empty End The elimination of empty coordinates is complete and the duration is reported.

Image Get Data From Aggregation The storage engine retrieves data from an aggregation.

Image Get Data From Cache The storage engine retrieves data from the data cache. If you do not see many of these events, the server might not have enough memory for caching, or calculations might prevent caching.

Image Query Cube Begin The formula engine begins executing the query.

Image Query Cube End The formula engine ends query execution.

Image Query Dimension The storage engine retrieves dimension members. The event subclass indicates whether these members are retrieved from cache or disk.

Image Serialize Results Begin The process to send results back to the client begins, usually after calculations are complete.

Image Serialize Results Current The members to place on axes are serialized and streamed back to the client.

Image Serialize Results End All results have been sent to the client.

Image Query Subcube The storage engine retrieves a subcube from cache or disk. The subcube is represented as a binary vector.

Image Query Subcube Verbose The storage engine retrieves a subcube from cache or disk. It repeats the information from the Query Subcube event, but provides more detail.

4. Switch to SSMS, click the Analysis Services XMLA Query button, connect to the multidimensional SSAS server, and then execute the statement shown in Listing 4-1 to clear the SSAS cache. This step allows you to evaluate the query with a cold cache to reveal its behavior at its slowest.

LISTING 4-1 Clear SSAS cache


<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
    <ClearCache>
        <Object>
            <DatabaseID>70-768-Ch1</DatabaseID>
        </Object>
    </ClearCache>
</Batch>



Important Clearing the SSAS cache

Clearing the SSAS cache adversely affects performance of an active SSAS server in production. For this reason, you should perform this step only when you can ensure no queries are executing or when you can use a test server that duplicates your production environment as closely as possible.



Note Clearing the file system cache

When you need to perform more rigorous query performance testing, you should also clear the file system cache and its standby cache. To do this, you can use the RAMMap utility available for download from https://technet.microsoft.com/en-us/sysinternals/rammap.aspx or execute the ClearFileSystemCache function in the Analysis Services Stored Procedure Project available for download from http://asstoredprocedures.codeplex.com/.


5. Switch back to SQL Server Profiler, and click Run to start the trace.

6. Switch back to SSMS, click the Analysis Services MDX Query button, connect to the server, and then execute the query shown in Listing 4-2.

LISTING 4-2 MDX query to trace


SELECT
[Measures].[Sales Amount Without Tax] ON COLUMNS,
[Stock Item].[Color].[Color].Members ON ROWS
FROM
[Wide World Importers DW]
WHERE
([Invoice Date].[Calendar Year].[CY2016]);


7. After executing the query, switch to SQL Server Profiler, and click the Stop Selected Trace button in the SQL Server Profiler toolbar to view the trace events that fired during query execution, as shown in Figure 4-8. You can use drag and drop to rearrange the columns in the following sequence: EventClass, EventSubclass, TextData, ObjectPath, Duration, and IntegerData.

Image

FIGURE 4-8 Profiler trace of simple MDX query

The trace events describe the following behavior during query execution:

Image Query Begin The server receives the query. You can see the structure of the MDX query in the TextData column.

Image Query Subcube The storage engine retrieves requested data from disk, as indicated by the event subclass, 2 – Non-cache Data. The TextData column contains values that are difficult to interpret visually, but reflect the dimensions and measure groups defined in the requested subcube. Not visible in Figure 4-8 is the Duration column, which contains a value you should note, as described later in this section. In this case, the duration is 28 milliseconds.

Image Query Subcube Verbose This trace event is another version of the Query Subcube event that provides more details about the subcube, as shown here:

Dimension 0 [City] (0 0 0 0 2 0 0 0)
[City]:0  [State Province]:0  [Country]:0  [Continent]:0
[Sales Territory]:[Far West]  [Region]:0  [Subregion]:0
[Latest Recorded Population]:0

Dimension 1 [Invoice] (0 0)  [Sale Key]:0  [Invoice]:0

Dimension 2 [Customer] (0 0 0)  [Customer]:0  [Category]:0  [Buying Group]:0

Dimension 3 [Bill To Customer] (0 0 0)  [Customer]:0  [Category]:0
[Buying Group]:0

Dimension 4 [Delivery Date] (0 0 0 0)  [Date]:0  [Calendar Month]:0
[Calendar Year]:0  [ISO Week Number]:0

Dimension 5 [Invoice Date] (0 0 0 0)  [Date]:0  [Calendar Month]:0
[Calendar Year]:0  [ISO Week Number]:0

Dimension 6 [Sales Reason] (0)  [Sales Reason]:0

Dimension 7 [Stock Item] (0 * 0 0)  [Stock Item History]:0  [Stock Item]:*
[Color]:0  [Size]:0

Each dimension is listed separately and includes each attribute. Following the dimension name is a vector, a series of values enclosed in parentheses. The position of each value in the vector corresponds to an attribute in the dimension and defines the subcube request according to the following rules:

Image 0 requests the default member of the attribute

Image * requests all members of the attribute

Image Non-zero requests a single member by DataID, the internal identifier of an attribute member

Image + requests more than one attribute member

Image – (minus sign) requests a limited number of members typically due to a slice on a different attribute in the dimension

To put this explanation into context, let’s review the subcube definition with respect to the City dimension, which is listed first in the Query Subcube Verbose event. The City vector (0 0 0 0 2 0 0 0) describes the request of the default member of each attribute except for the Sales Territory attribute (in position 5) in which case the member with DataID 2, Far West, is requested. You can also see the attributes listed by position, with City in position 1, State Province in position 2, and so on. When the dimension’s subcube definition includes a non-zero value to request a specific DataID, you can see the corresponding member name displayed next to the attribute name. In this case, the display of [Sales Territory]:[Far West] corresponds to the 2 in the fifth position in the dimension vector.

Next, notice in Dimension 7, the second position of the Stock Item vector is an asterisk (*). It corresponds to the second attribute, Stock Item, which also displays an asterisk. In this case, the subcube requests all members of the Stock Item attribute.

The combination of these two vectors defines the multidimensional space retrieved from the cube. More specifically, the formula engine requests all cells for any stock item with a non-empty measure in the Sale measure group in the Far West sales territory.

Image Query Cube Begin The formula engine begins executing the query.

Image Query Dimension You can see by the event subclass, 2 – Non-cache Data, that the formula engine requests dimension members from disk. The ObjectPath column indicates that the request refers to the Date dimension.

Image Calculate Non Empty Begin The IntegerData value of 3 indicates that the query execution occurs in bulk evaluation mode.

Image Query Dimension You can see by the event subclass, 1 – Cache Data, that this time the formula engine requests Date dimension members from cache. Something in the execution plan triggered a second request for the Date dimension, but the trace provides no insight into the execution plan.

Image Calculate Non Empty End The duration of the process to eliminate empty cells is 1 millisecond.

Image Serialize Results Begin and Serialize Results Current There are two Serialize Results Current events for each axis that SSAS serializes: Axis 0 (columns), Axis 1 (rows), and Slicer Axis (the query’s WHERE clause). The ProgressTotal column lists the number of members returned for each axis. In this case, Axis 0 has 1 member, Axis 1 has 9 members, and the Slicer Axis has 1 member.

Image Progress Report Begin and End The TextData column for the Progress Report Begin event explains how the storage engine retrieves data. In this case, the storage engine is reading data from the Sale partition on disk.

Image Query Subcube and Query Subcube Verbose The storage engine again retrieves requested data from disk, as you can see by the event subclass, 2 – Non-cache Data. The Duration column for this event is 23 milliseconds. The TextData column in the Query Subcube Verbose event is nearly identical to the previous occurrence of this event, except for the vector for Dimension 5 [Invoice Date] as shown here:

Dimension 5 [Invoice Date] (0 0 4 0)  [Date]:0  [Calendar Month]:0
[Calendar Year]:[CY2016]  [ISO Week Number]:0

Image Serialize Results Current and Serialize Results End This time the cell intersections of the axes are returned to the client now that the data has been read from two subcubes.

Image Query Cube End The formula engine ends executing the query.

Image Query End SSAS ends all work related to the query. The important value to note is the value in the Duration column, currently not visible, which in this case is 98 milliseconds.


Note Evaluation of cached queries

If you were to start a new trace and execute the statement in Listing 4-2 again, the Query Subcube events are replaced by a single new event, Get Data From Cache, because the data retrieval performed by the first execution of the query loaded the cache. Consequently, the duration is reduced significantly. On the author’s computer, the duration of the cached query is 11 milliseconds. This type of behavior is not a candidate for query tuning, but illustrates the difference between queries that retrieve non-cached data versus cached data. To properly evaluate the impact of changes that you make to the design of a query, you should clear the cache between each query execution. As mentioned previously in this section, you should consider clearing the file system cache for more rigorous testing.


In-memory tabular query monitoring

Before considering how to monitor query performance for in-memory tabular models, it is important to first understand the query architecture for in-memory tabular models, shown in Figure 4-9. SSAS processes queries for this model type by following these steps:

1. The SSAS query parser first evaluates whether the incoming request is a valid DAX or MDX query. (Tools like Excel or SQL Server Reporting Services (SSRS) can send MDX requests to a tabular model.)

2. If the query is an MDX query, SSAS invokes the MDX formula engine, which then sends a DAX request for measure calculations to the DAX formula engine. The MDX formula engine can request a measure embedded in the tabular model or request a calculation defined in the WITH clause of the MDX query. It can also request dimension data from the VertiPaq storage engine. The MDX formula engine caches measures unless the MDX query contains a WITH clause.

3. The DAX formula engine receives either a DAX query request from the parser or a DAX request for measure calculations from the MDX formula engine. Either way, the DAX formula engine generates a query plan that it sends to the VertiPaq storage engine.

4. The VertiPaq storage engine processes the query plan received from the DAX formula engine. The storage engine is multi-threaded and scales well on multiple cores. It can scan large tables very efficiently and quickly. It can also evaluate simple mathematical operations, but pushes more complex operations back to the formula engine. If a calculation is too complex, it sends a callback to the formula engine.

5. The storage engine returns its results to the formula engine, which compiles the data and returns the query results to the client application. The storage engine maintains a short-term VertiPaq cache to benefit multiple requests for the same data in the same query. An ancillary benefit is the availability of this data for subsequent queries for a period of time.

Image

FIGURE 4-9 Query processing architecture for in-memory tabular models

The DAX formula engine produces the following two types of query plans for tabular models:

Image Logical query plan Contains an execution tree used by the physical plan and provides insight into the physical plan when a query is complex. The engine produces this query plan quickly prior to creating the physical plan.

Image Physical query plan Describes how the engine actually executes a query. It lists the operators and parameters used during execution, but can be difficult to interpret.


Note Tabular model query plan interpretation

The exam does not test you on an in-depth interpretation of a logical or physical query plan, but does require you to know the steps necessary to acquire the query plan. If you want additional information about logical and physical query plans, see “DAX Query Plan, Part 1, Introduction” at http://mdxdax.blogspot.com/2011/12/dax-query-plan-part-1-introduction.html, “DAX Query Plan, Part 2, Operator Properties” at http://mdxdax.blogspot.com/2012/01/dax-query-plan-part-2-operator.html, “DAX Query Plan, Part 3, Vertipaq Operators” at http://mdxdax.blogspot.com/2012/03/dax-query-plan-part-3-vertipaq.html, and “Whitepaper: Understanding DAX Query Plans” at https://www.sqlbi.com/articles/understanding-dax-query-plans/.



Note Sample tabular database for this chapter

This chapter uses the database created in Chapter 2 to illustrate monitoring and configuration concepts. If you have not created this database, you can restore the 70-768-Ch2.ABF file included with this chapter’s code sample files as described in Chapter 3.


Profiler trace

You can use SQL Server Profiler (or Extended Events as explained later in this section) to create a trace for a tabular-mode SSAS server, much like you can for a multidimensional-mode server. To do this, perform the following steps:

1. Click New Trace on the File menu and connect to your tabular instance. You can save the trace to a file or to a SQL Server table, if you like.

2. Click Events Selection, select the Show All Events check box, and then select or clear check boxes in the events list such that only the following check boxes are selected:

Image Query Begin The query is received by SSAS. You can view the DAX or MDX query in the TextData column.

Image Query End The query processing is complete. The Duration column displays the time required to resolve and return the query results.

Image DAX Query Plan The TextData column contains the logical or physical query plan. If the originating query is MDX, multiple DAX query plans can be generated.

Image VertiPaq SE Query Begin The storage engine retrieves data from memory. The TextData column shows the tables, columns, and measures requested.

Image VertiPaq SE Query Cache Match The storage engine uses its short-term cache rather than performing a scan to get data required for a calculation.

Image VertiPaq SE Query End The storage engine completes data retrieval. The Duration column specifies the time required to complete the operation.

3. Click Run to start the trace.

4. Switch to SSMS, click the Analysis Services MDX Query button, connect to the tabular instance, execute the query in Listing 4-1 to clear the VertiPaq and MDX cache (replacing the database ID with 70-768-Ch2), and then execute the query shown in Listing 4-3.

LISTING 4-3 DAX query to trace


EVALUATE
SUMMARIZE(
    FILTER(
        'Sale',
        RELATED('Date'[Calendar Year]) = 2016
    ),
    'Stock Item'[Color],
    "Total Sales",
        'Sale'[Total Sales]
)


5. After executing the query, switch to SQL Server Profiler, and click the Stop Selected Trace button in the SQL Server Profiler toolbar to view the following trace events:

Image Query Begin The server receives the query, the text of which you can see in the TextData column.

Image DAX Query Plan The event subclass, 1 – DAX VertiPaq Logical Plan, indicates the first query plan generated by the formula engine, shown here:

AddColumns: RelLogOp DependOnCols()() 0-1 RequiredCols(0, 1)('Stock Item'[Color],
''[Total Sales])
    GroupBy_Vertipaq: RelLogOp DependOnCols()() 0-95
    RequiredCols(0, 17, 18)('Stock Item'[Color], 'Sale'[Invoice Date Key],
    'Sale'[Stock Item Key])
        Filter_Vertipaq: RelLogOp DependOnCols()() 0-95
        RequiredCols(57, 78, 79)('Stock Item'[Color], 'Sale'[Stock Item Key],
        'Sale'[Invoice Date Key])
            Scan_Vertipaq: RelLogOp DependOnCols()() 0-95
            RequiredCols(36, 57, 78, 79)('Date'[Calendar Year],
            'Stock Item'[Color], 'Sale'[Stock Item Key],
            'Sale'[Invoice Date Key])
            'Date'[Calendar Year] = 2016: ScaLogOp DependOnCols(36)
            ('Date'[Calendar Year]) Boolean DominantValue=FALSE
    Sum_Vertipaq: ScaLogOp DependOnCols(0)('Stock Item'[Color]) Double
    DominantValue=BLANK
        Scan_Vertipaq: RelLogOp DependOnCols(0)('Stock Item'[Color]) 96-191
        RequiredCols(0, 184)('Stock Item'[Color], 'Sale'[Total Excluding Tax])
        'Sale'[Total Excluding Tax]: ScaLogOp DependOnCols(184)
        ('Sale'[Total Excluding Tax]) Double DominantValue=NONE

A logical query plan consists of multiple lines of text in which each line is an operator, indented according to its level within the execution hierarchy. The first line defines the outermost operator, such as AddColumns in this example. If a line is indented more than the preceding line, the line is subordinate to the preceding line, its parent, and delivers its output to that parent. Therefore, to understand the query plan, you must start at the bottom level of each branch and work backwards through the query plan. To interpret the query plan for the query in Listing 4-3, consider the following points:

Image The two Scan_Vertipaq operators in the query plan initiate the data retrieval process and provide output to be processed by their respective parents in separate branches of the query plan. The goal of this operator is to join a root table to its related tables as defined by the requested columns.

Image In Branch 1, which begins with the first Scan_Vertipaq operator, the RelLogOp operator type produces a table identified in the RequiredCols list: ‘Date’[Calendar Year], ‘Stock Item’[Color], ‘Sale’[Stock Item Key], and ‘Sale’[Invoice Date Key]. This list is represented by both column number and name following the RequiredCols parameter.

Image The output of the Scan_Vertipaq operator and its sibling predicate ‘Date’[Calendar Year] = 2016 are passed to the Filter_VertiPaq operator in Branch 1 to restrict the rows in the output.

Image Branch 1 continues by sending the results of the Filter_Vertipaq operator to the GroupBy_Vertipaq operator where a grouping of rows by the following columns is performed: ‘Stock Item’[Color], ‘Sale’[Invoice Date Key], and ‘Sale’[Stock Item Key].

Image Meanwhile, Branch 2 begins with the second Scan_Vertipaq operator. It returns a table containing two columns, specified in both DependOnCols and RequiredCols to return a table containing ‘Stock Item’[Color] and ‘Sale’[Total Excluding Tax].

Image The Scan_Vertipaq results in Branch 2 are sent to the Sum_Vertipaq operator to which the SUM aggregation is applied. This operation returns summed sales by the ‘Stock Item’[Color] column as a two-column table.

Image The last step in the query plan is the outermost AddColumns operator, which combines the results of the two branches to produce a two-column table consisting of ‘Stock Item’[Color] and [Total Sales].

Image VertiPaq SE Query Begin and VertiPaq SE Query End These event types have one of two event subclasses, VertiPaq Scan and Internal VertiPaq Scan. Internal VertiPaq Scan is the optimized version of VertiPaq Scan. Sometimes you can see VertiPaq SE queries occur before the formula engine creates the logical query plan. This situation occurs when the formula engine needs information from the database to determine how to optimize the query. Other times queries are used to spool intermediate results into memory to optimize the physical query plan or query execution.

Queries are written by using a pseudo-SQL statement for easier comprehension. Each VertiPaq SE Query Begin/End pair corresponds to a Scan_VertiPaq operator in the logical query plan. The operators in Branch 2 and Branch 1 correspond to the first and second VertiPaq scans respectively, shown here:

-- First VertiPaq Scan
SET DC_KIND="AUTO";
SELECT
[Stock Item (22)].[Color (86)] AS [Stock Item (22)$Color (86)],
SUM([Sale (25)].[Total Excluding Tax (116)]) AS [$Measure0]
FROM [Sale (25)]
  LEFT OUTER JOIN [Date (16)] ON
    [Sale (25)].[Invoice Date Key (107)]=[Date (16)].[Date (61)]
  LEFT OUTER JOIN [Stock Item (22)] ON
    [Sale (25)].[Stock Item Key (106)]=[Stock Item (22)].[Stock Item Key (83)]
WHERE
  [Date (16)].[Calendar Year (68)] = 2016;

-- Second VertiPaq Scan
SET DC_KIND="AUTO";
SELECT
[Stock Item (22)].[Color (86)] AS [Stock Item (22)$Color (86)],
[Sale (25)].[Stock Item Key (106)] AS [Sale (25)$Stock Item Key (106)],
[Sale (25)].[Invoice Date Key (107)] AS [Sale (25)$Invoice Date Key (107)]
FROM [Sale (25)]
  LEFT OUTER JOIN [Date (16)] ON
    [Sale (25)].[Invoice Date Key (107)]=[Date (16)].[Date (61)]
  LEFT OUTER JOIN [Stock Item (22)] ON
    [Sale (25)].[Stock Item Key (106)]=[Stock Item (22)].[Stock Item Key (83)]
WHERE
  [Date (16)].[Calendar Year (68)] = 2016;

Image DAX Query Plan The event subclass, 2 – DAX VertiPaq Physical Plan, indicates the second DAX Query Plan in the trace is the physical query plan, which looks like this:

AddColumns: IterPhyOp LogOp=AddColumns IterCols(0, 1)('Stock Item'[Color],
''[Total Sales])
    Spool_Iterator<SpoolIterator>: IterPhyOp LogOp=GroupBy_Vertipaq
    IterCols(0)('Stock Item'[Color]) #Records=9 #KeyCols=1 #ValueCols=0
        AggregationSpool<GroupBy>: SpoolPhyOp #Records=9
            Spool_Iterator<SpoolIterator>: IterPhyOp LogOp=
            GroupBy_Vertipaq IterCols(0, 17, 18)
            ('Stock Item'[Color], 'Sale'[Invoice Date Key],
            'Sale'[Stock Item Key]) #Records=17962 #KeyCols=96 #ValueCols=0
                ProjectionSpool<ProjectFusion<>>: SpoolPhyOp #Records=17962
                    VertipaqResult: IterPhyOp #FieldCols=3 #ValueCols=0
    SpoolLookup: LookupPhyOp LogOp=Sum_Vertipaq LookupCols(0)('Stock Item'[Color])
    Double #Records=9 #KeyCols=96 #ValueCols=1 DominantValue=BLANK
        ProjectionSpool<ProjectFusion<Sum>>: SpoolPhyOp #Records=9
            VertipaqResult: IterPhyOp #FieldCols=1 #ValueCols=1

Just like the logical query plan, the physical query plan starts with its innermost lines in the hierarchical list and includes two branches that use the VertipaqResult operator. This operator maps to the VertiPaq SE Query event. When evaluating performance, take note of the #Records value, which impacts memory when it appears with the SpoolPhyOp operator. This operator indicates that the records are materialized and stored in memory.

Moving up the hierarchy in the first branch, the VertipaqResult operator materializes 17,962 rows, which are then reduced to 9 records by the GroupBy_Vertipaq operator and sent to the Spool_Iterator operator. This latter operator’s sibling is the SpoolLookup operator, which gets the VertiPaq result from Branch 2 and makes those rows available for row-by-row lookups to the iterator. The AddColumns operator receives these results and produces a two-column table containing Color and Total Sales.

Image Query End The server finishes the query and returns the overall duration.

Extended Events trace

Another way to capture trace information for tabular queries is to create an Extended Events session in SSMS, which you can do by performing the following steps:

1. Connect to the tabular instance, expand the Management node in Object Explorer, expand the Extended Events node, right-click the Sessions node, and click New Session.

2. In the New Session dialog box, type a name for the session, such as DAX query plan.

3. Click Events, and then double-click the following events to add them to the Selected Events list: DAX Query Plan, Query Begin, Query End, VertiPaq SE Query Begin, VertiPaq SE Query Cache Match, and VertiPaq SE Query End.

4. Click the Data Storage tab, click Add, and then select one of the following target types in the Type drop-down list.

Image Event_file Store the session data in an XEL file.

Image Event_stream Enable the Watch Live Data feature in SSMS. Use this option to follow the example in this section.

Image Ring_buffer Store the session data in memory until the service restarts.

5. Click OK, right-click the DAX Query Plan node in the Sessions folder in Object Explorer, click Watch Live Data, expand the DAX Query Plan node, and then click Event_stream to open the Extended Events viewer as a new window in the query editor.

6. Switch to an MDX query window and execute the query in Listing 4-3.

7. In the Extended Events window, you can see the list of events that fired during the DAX query execution. Click the first DAX Query Plan in the list, and then scroll to locate the TextData row in the Details pane below the event list. Right-click this row and click Show Column In Table to include the TextData column next to each event at the top of the window, as shown in Figure 4-10.

Image

FIGURE 4-10 An Extended Events session for a tabular instance


Note XMLA script for Extended Events

Instead of using the SSMS graphical interface to manage Extended Events sessions, you can execute XMLA scripts. For details, see “Monitor Analysis Services with SQL Server Extended Events” at https://msdn.microsoft.com/en-us/library/gg492139.aspx.
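For orientation, the following trimmed sketch shows the XMLA shape of such a session, adapted from the pattern in the referenced documentation. The trace ID, session name, event list, and file name are placeholders rather than prescribed values:

<Create xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <ObjectDefinition>
    <Trace>
      <ID>DAXQueryPlanSession</ID>
      <Name>DAXQueryPlanSession</Name>
      <ddl300_300:XEvent
          xmlns:ddl300_300="http://schemas.microsoft.com/analysisservices/2011/engine/300/300">
        <event_session name="xeas" dispatchLatency="1"
            eventRetentionMode="allowSingleEventLoss" trackCausality="true">
          <!-- Events to capture -->
          <event package="AS" name="QueryBegin" />
          <event package="AS" name="QueryEnd" />
          <!-- Write session output to an XEL file target -->
          <target package="Package0" name="event_file">
            <parameter name="filename" value="DAXQueryPlanSession.xel" />
          </target>
        </event_session>
      </ddl300_300:XEvent>
    </Trace>
  </ObjectDefinition>
</Create>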


DirectQuery query plans

The SSAS engine translates a DAX query against a DirectQuery model into an equivalent SQL query, which it sends to the source database. To capture the resulting query plan, use Profiler or Extended Events against the database engine if SQL Server is the source, or use an applicable monitoring tool for another data source. If you use Profiler on the tabular instance to monitor query execution, you can include the DirectQuery Begin and DirectQuery End events to review the generated SQL query and its duration, but you cannot tune the generated query directly; optimization must happen in the source database.

Identify bottlenecks in SSAS queries

Once you know how to generate a query trace for an MDX query and how to obtain a query plan for a DAX query, you can explore these items to reveal more information about query execution. Whether you are analyzing performance issues for MDX or DAX queries, the process of identifying the source of the bottleneck is similar. By comparing the ratios of formula engine and storage engine time to overall query time, you can determine whether your query tuning efforts should focus on improving formula engine time or storage engine time. Refer to the “Resolve performance issues” section later in this chapter for guidance on specific actions you can take for each of these bottlenecks.

To find the relative time required by each engine for an MDX query executed on a multidimensional-mode server, start by computing the sum of the durations of the Query Subcube events (or the Query Subcube Verbose events, which are another view of the same events). This value is the total time consumed by the storage engine. No event in the trace explicitly reports the time consumed by the formula engine, but you can calculate it by subtracting the total storage engine time from the query duration. Applying this calculation to the trace for Listing 4-2, the formula engine time is 98 – (28 + 23), or 47 milliseconds. The next step is to compute the ratios of formula engine time and storage engine time relative to query execution time. Here, the formula engine ratio is 47/98 or 48%, while the storage engine ratio is 51/98 or 52%. That means you should tune the storage engine for this query because it consumes the majority of the query execution time. However, a query that takes 98 milliseconds does not require tuning, so this example serves only to illustrate the process to follow when analyzing long-running queries in a production environment.

Determining the source of tabular model bottlenecks begins by computing storage engine time. To do this, sum the durations of the VertiPaq SE Query End events that have the VertiPaq Scan event subclass. Subtract this value from the duration listed for the Query End event to compute the formula engine time. As an example, when the two VertiPaq SE Query End events for the trace in Listing 4-3 total 36 milliseconds and the Query End duration value is 83 milliseconds, the ratio of storage engine time to total query time is 43%. That leaves 47 milliseconds of formula engine time, for a ratio of 57% to total query time. In theory, the formula engine is the bottleneck for this query, but a query that runs in 83 milliseconds is not a candidate for optimization.

Monitor processing and query performance

To properly manage processing and query operations on an SSAS server, you must be familiar with other monitoring tools to capture the information necessary to diagnose performance issues. These monitoring tools include Windows Performance Monitor, dynamic management views (DMVs), and Windows Task Manager. You can also use SQL Server Profiler to capture trace events for processing operations.


Image Exam Tip

You should be able to select the appropriate monitoring tool for a diagnostic task, describe the steps necessary to conduct a troubleshooting session, and recognize an indicator of a problem.


Performance Monitor

By using Performance Monitor, you can gather information about server resources and measure the impact of processing and queries on those resources. One strategy is to establish a benchmark and periodically measure processing time to determine whether the time required is increasing and potentially exceeding the preferred processing window. Another strategy is to monitor performance counters while you troubleshoot a specific operation. You can take measurements before you start making changes and then take new measurements after you make incremental changes to assess the impact. Either way, set the collection interval for performance counters to 15 seconds so that you can gather information without adding noticeable overhead to the server.

Specific performance counters to monitor are described in the “Resolve performance issues” section later in this chapter. Regardless of which counters you need to monitor, use the following steps to begin the collection process:

1. Open Windows Performance Monitor, click Performance Monitor, and click the Add button in the toolbar.

2. In the Add Counters dialog box, select the server to monitor in the Select Counters From Computer drop-down list (or use the Browse button to locate a server on your network), and then scroll through the Available Counters list to locate the set of counters to monitor. The counter sets for multidimensional- and tabular-mode servers begin with MSAS13 (for a default instance) or MSOLAP$<instance name> (for a named instance). In this chapter, we use MSAS13 to refer to multidimensional counters and MSOLAP$TABULAR to refer to tabular counters.

3. Expand a counter set, select a specific counter, such as Memory Usage KB, and click the Add button. Continue adding counters as needed. When finished, click OK. You can then view real-time metrics for the selected counters.


Note Configuration of counter properties

If you are monitoring multiple counters at the same time, right-click a counter and select Properties to change the Color, Width, and Style values to more easily distinguish between counters. You might also need to reset the scale for a counter.


When you need to monitor server resource consumption during the processing or querying of a tabular model, test a single database in isolation so that the measurements you capture reflect only that database. To isolate your database, perform the following steps:

1. Open SSMS, connect to the tabular instance, and expand the Databases folder in preparation for processing. If your SSAS tabular-mode server hosts databases other than the example database, detach them to limit the focus of monitoring to a single database. To do this, right-click the database you want to temporarily remove, click Detach, and click OK.

2. When you complete your performance analysis, you can reactivate the databases by right-clicking the Databases folder and clicking Attach. In the Attach Database dialog box, click the ellipsis button and then navigate to the Data folder containing the database to attach. Select the database and click OK twice to make the database available for queries again.

Dynamic management views

You can execute DMVs in the MDX query window after connecting to the SSAS server. The following DMVs are more helpful for assessing the impact of processing operations than for troubleshooting queries (a sample query follows this list):

Image $System.discover_object_activity Use this DMV to see which objects are consuming the most CPU time during a processing operation by sorting the object_cpu_time_ms column in descending order.

Image $System.discover_object_memory_usage Use this DMV to find the objects consuming the most memory after processing by sorting the object_memory_shrinkable column in descending order.
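Both DMVs are queried with a simple SELECT statement in an MDX query window. The following minimal sketch illustrates the pattern; note that SSAS DMV queries support only a restricted SQL dialect (JOIN and GROUP BY, for example, are unavailable), and if your client rejects the ORDER BY clause, sort the results client-side instead:

-- Objects consuming the most CPU time during processing
SELECT *
FROM $System.discover_object_activity
ORDER BY object_cpu_time_ms DESC;

-- Objects holding the most shrinkable memory after processing
SELECT *
FROM $System.discover_object_memory_usage
ORDER BY object_memory_shrinkable DESC;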


Note SSAS DMVs

See “Use Dynamic Management Views (DMVs) to Monitor Analysis Services” at https://msdn.microsoft.com/en-us/library/hh230820.aspx for more information about these and other DMVs.


Task Manager

The Performance tab of Task Manager can provide clues that help you quickly ascertain whether the formula engine or the storage engine is being used for a query or processing operation. This technique applies to both multidimensional- and tabular-mode servers. To use this approach, isolate your environment to a single user and the single query or processing operation that you are measuring.

Open the CPU graph and switch it to the view of logical processors if necessary. Then launch the operation to test and observe the activity of the processors. If all processors are active during the entire operation, the storage engine is in use. If only a single processor is active, your operation is using the formula engine.

Trace events for processing

The processing commands described in Skill 4.3 for a multidimensional model and in Chapter 2 for a tabular model allow you to manage the data refresh process at varying levels of granularity. In general, the processing architecture consists of a nested series of parent and child jobs. These jobs are established by the object hierarchy within the database, dependencies between objects, and the type of processing job.

Just as you can use SQL Server Profiler to capture trace events for queries, as described earlier in this chapter, you can also use it to capture the trace events for processing operations.

When tracing processing, include the following events in your trace:

Image CommandBegin SSAS receives the XMLA command to process an object. This event also applies to commands for data definition, backup, and restore operations.

Image CommandEnd SSAS completes the XMLA command.

Image Progress Report Begin SSAS begins one of the processing operations identified in the event subclass. Although there is some overlap, the set of event subclasses for multidimensional processing differs from the set for tabular processing.

Image Progress Report End A processing operation ends.

Image Progress Report Error An error occurs during an operation.


Note More information about tracing processing events

Bob Duffy has created tools for baselining processing events for both multidimensional and tabular models and describes how to analyze these events in his posts “Tool for baselining MOLAP Processing” at http://blogs.prodata.ie/post/Tool-for-baselining-MOLAP-Processing.aspx and “Baselining Tabular Model Processing with a trace file” at http://blogs.prodata.ie/post/Baselining-Tabular-Model-Processing-with-a-trace-file.aspx. These two posts also provide diagrams of the nested objects in the processing architecture.


Resolve performance issues

Analysis of trace events, query plans when available, and performance counters can collectively provide the clues you need to identify bottlenecks and proceed with taking action to resolve processing or performance issues. This section reviews some of the more common problems you can encounter, explains how to identify the root of the problem, and provides recommendations for next steps.


Image Exam Tip

Although the exam does not test your knowledge of every possible approach to problem resolution, be prepared to identify an appropriate action for more commonly encountered problems.


Multidimensional performance issues

Server tuning efforts should start with optimizing query performance for a multidimensional-mode server, and then shift to processing optimization once queries are executing satisfactorily. As explained in the “Identify bottlenecks in SSAS queries” section, query bottlenecks can arise in either the formula engine or the storage engine. Tuning server memory and CPU utilization can be important for both query and processing tuning as well. You will most likely also need to tune the relational layer to optimize processing.


Note Storage engine tuning requires model optimization

When your performance testing identifies the storage engine as a bottleneck for queries, there are specific design changes that you can make to the multidimensional model to reduce or eliminate bottlenecks. Refer to the “Optimize and manage model design” section for changes that you can make to improve storage engine performance.


MDX query tuning

If you find the source of a query bottleneck is the formula engine, you should use a process of elimination to isolate the problem to a specific calculation in your query and then try to restructure the MDX. If the problem calculation is not query-scoped, but defined in the cube, you can comment out all calculations in the MDX script (except the CALCULATE command) and add back each calculation independently until you find the problem.

The most common reason for a slow query is the use of cell-by-cell evaluation instead of bulk-mode evaluation. Try rewriting your calculations if you find any of the following MDX constructs in a slow query:

Image Set alias You can define a set in the WITH clause of a query and then reference it in an expression, such as using it as the first argument of a SUM function. Replace the set alias in the expression with the set expression itself, as shown in the sketch after this list.

Image Late binding in functions When you create an expression that applies a function to the CURRENTMEMBER function, the function cannot be evaluated efficiently and must be evaluated only in context of the current row or column. Functions for which this late binding behavior impacts query performance include LINKMEMBER, STRTOSET, STRTOMEMBER, and STRTOVALUE.

Image Analysis Services Stored Procedure A user-defined function in an Analysis Services Stored Procedure (ASSP) always evaluates in cell-by-cell mode. The only way to avoid this behavior is to remove the user-defined function from the query.

Image Cell-level security The application of rules defined in cell-level security (described in Chapter 3) forces SSAS to use cell-by-cell mode. If you must restrict users from viewing certain measures and cannot resolve query performance in another way, consider using separate cubes.
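To illustrate the set-alias rewrite, the following sketch targets the Wide World Importers DW cube; the hierarchy and member names are assumptions for illustration, so adapt them to your model. The first query references the aliased set inside SUM; the second inlines the set expression so the formula engine can evaluate it in bulk mode. Run each query separately:

-- Slower: SUM references a set alias
WITH
  SET [Far West Cities] AS
    EXISTS (
      [City].[City].[City].MEMBERS,
      [City].[Sales Territory].&[Far West]
    )
  MEMBER [Measures].[Far West Sales] AS
    SUM ( [Far West Cities], [Measures].[Total Excluding Tax] )
SELECT [Measures].[Far West Sales] ON COLUMNS
FROM [Wide World Importers DW];

-- Faster: the set expression is inlined in the SUM argument
WITH
  MEMBER [Measures].[Far West Sales] AS
    SUM (
      EXISTS (
        [City].[City].[City].MEMBERS,
        [City].[Sales Territory].&[Far West]
      ),
      [Measures].[Total Excluding Tax]
    )
SELECT [Measures].[Far West Sales] ON COLUMNS
FROM [Wide World Importers DW];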


Image Exam Tip

You can find additional examples of MDX constructs using cell-by-cell evaluation and a discussion of calculation best practices in the Analysis Services MOLAP Performance Guide, available at https://msdn.microsoft.com/en-us/library/dn749781.aspx. Experiment by creating similar queries in the 70-768-Ch1 database using the slower construct and the recommended adjustments to monitor the effect of tuning a query.


One way to validate that a problem calculation is using cell-by-cell evaluation is to compute the ratio between two performance counters in the MSAS13:MDX set: Number Of Cell-By-Cell Evaluation Nodes and Number Of Bulk-Mode Evaluation Nodes. If the ratio is high, focus on finding and rewriting the calculation that is failing to use bulk mode.

Cache warming

Sometimes the bottleneck on a multidimensional-mode server can be a combination of the formula engine and storage engine. When this occurs, the trace includes many Query Subcube events that individually do not consume much time. However, collectively the total of these events can be significant. This behavior is known as IO thrashing. When storage optimization techniques fail to significantly reduce the number of these Query Subcube events, you can warm the cache by populating it with the data that these subcube requests retrieve after a service restart and in advance of user queries. That way, the data can be retrieved in bulk instead of piecemeal and subsequent user queries can execute faster. Because the cache is periodically cleaned, you might need to schedule these queries to warm the cache from time to time.
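As a minimal sketch of a cache-warming statement following the CREATE CACHE pattern recommended in the note below, the measure and hierarchy names here are assumptions, and the set you specify should come from the subcube requests observed in your trace:

CREATE CACHE FOR [Wide World Importers DW] AS
'( { [Measures].[Total Excluding Tax] } *
   [Date].[Calendar Year].[Calendar Year].MEMBERS *
   [Stock Item].[Color].[Color].MEMBERS )'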


Note Cache warming techniques

There are multiple ways that you can warm the cache, but the preferred method is to use the CREATE CACHE statement as Nicholas Dritsas recommends in “How to warm up the Analysis Services data cache using Create Cache statement?” at https://blogs.msdn.microsoft.com/sqlcat/2007/04/26/how-to-warm-up-the-analysis-services-data-cache-using-create-cache-statement/. Chris Webb explores the issues to consider when warming the cache and also recommends the CREATE CACHE statement above other methods as he explains in “Building a Better Cache-Warmer, Part 1: The Storage Engine Cache” at https://blog.crossjoin.co.uk/2009/09/30/building-a-better-cache-warmer-part-1-the-storage-engine-cache/.


Memory management

When memory is scarce on the server, Analysis Services spills data to disk during the following types of operations:

Image Dimension processing Generally, dimension processing does not require much memory and is able to complete successfully. However, if you change the Processing Group property on a dimension from its default of ByAttribute to ByTable, as described in Skill 4.3, the memory requirements of dimension processing increase and can overwhelm available resources. Therefore, you should use the ByTable setting only if the dimension data fits in memory. Even if you keep the ByAttribute setting, the processing of a large dimension creates a collection of hash tables that collectively can exceed available memory.

Image Aggregation processing There are two server properties (which you can reconfigure only by changing values in the server’s msmdsrv.ini file), AggregationMemoryLimitMin and AggregationMemoryLimitMax, that allocate memory explicitly for processing the aggregations of a single partition.

The AggregationMemoryLimitMax property sets the maximum amount of physical memory to allocate for building a partition’s aggregations and indexes. The default is 80. Consider lowering this number to free up memory for processing other partitions.

The AggregationMemoryLimitMin property sets the minimum physical memory to allocate for aggregations and indexes during partition processing. The default is 10 and applies to each partition that is processed in parallel. For example, if there are four concurrent jobs to process partitions, SSAS allocates forty percent of reserved memory to processing. If this allocation is insufficient, SSAS blocks some of the partitions from processing until memory is released. You can increase parallelism by decreasing this value.

If the AggregationMemoryLimitMax value is too low, SSAS spills to disk to continue its processing operations. Therefore, your first option is to consider changing the configured limits; a sketch of the relevant msmdsrv.ini fragment appears after this list. (Decreasing AggregationMemoryLimitMin also potentially allows you to process more partitions in parallel.) Another option to prevent aggregation processing from spilling to disk is to design aggregations that are small enough to fit in memory. Yet another option is to reduce the number of partitions that process in parallel and to separate processing operations by using ProcessData and ProcessIndex operations as described in Skill 4.3. Last, you can increase memory limits for SSAS or add physical memory to the server.

Image ROLAP dimension attribute stores The choice to implement a ROLAP dimension typically results from the need to support drillthrough actions to return data from a degenerate dimension stored in a fact table configured as a ROLAP partition. In this case, the volume of data is often too large to hold in memory and there is little you can do to resolve this issue other than configure the TempDir property for SSAS to use a folder on the fastest possible disk.
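As a minimal sketch, the two aggregation memory properties appear as XML elements in msmdsrv.ini; the values shown here are hypothetical illustrations, not recommendations, and are interpreted as percentages:

<!-- Fragment of msmdsrv.ini (hypothetical values) -->
<AggregationMemoryLimitMin>5</AggregationMemoryLimitMin>
<AggregationMemoryLimitMax>60</AggregationMemoryLimitMax>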

The goal of managing memory on a multidimensional-mode server is to prevent its memory usage from reaching TotalMemoryLimit. As long as SSAS is not currently paging, you can extend its memory limits as the first step towards resolving memory-related performance issues. If the server is paging, add more RAM to the server if possible. Otherwise, consider decreasing memory limits.

In addition to reviewing the memory limits in the Analysis Services Server Properties dialog box, as described in Skill 4.1, “Plan and deploy SSAS,” you can review the BufferMemoryLimit setting in the server’s msmdsrv.ini file. This property has a default value of 60, which is usually adequate. It sets the percentage of physical memory allocated to fact data during partition processing. Consider raising this value if the measure group granularity is more summarized than the source data, such as when you design the measure group to store data by month, but the source fact table stores data by day. SSAS uses a single processing buffer to group data by month and then flushes this buffer to disk when the buffer is full. Therefore, a larger buffer reduces the frequency of flushing. If this value is too high, parallelism decreases.

Inadequate memory resources cause evictions from the data and calculation caches, which can require that subsequent queries be resolved from disk rather than from cache, and can also result in memory paging. Normally, you should see many Get Data From Cache events on an active server in production. The lack of these events can indicate that the server does not have enough memory available for caching.


Note Formula for best memory configuration

The “Analysis Services Operations Guide” at https://msdn.microsoft.com/en-us/library/hh226085.aspx includes a formula that allows you to combine data from performance counters and DMVs to compute appropriate LowMemoryLimit, TotalMemoryLimit, and HardMemoryLimit values for your environment.


CPU utilization

Most operations in SSAS multidimensional models are performed in parallel. To manage these operations, SSAS maintains a Query Thread Pool and a Process Thread Pool instead of creating and destroying threads for each operation. When SSAS receives a query or processing request, it requests threads from the respective pool and returns them back to the pool upon completion of the request.

Although the formula engine is single-threaded, SSAS can execute multiple queries in parallel by using separate threads from the Query Thread Pool. Each query thread executes its calculations serially, waits for the storage engine to return data, and is released once results are returned to the requesting client application. To determine whether you should increase query thread parallelism, check the Query Pool Job Queue Length and Query Pool Idle Threads counters in the MSAS13:Threads set. If the Query Pool Job Queue Length value is greater than zero and Query Pool Idle Threads is 0, you can increase the ThreadPool Query MaxThreads server property as long as the % Processor Time counter in the Processor set is not already approaching 100%. Changing this property does not help any individual query, but it can benefit multiple simultaneous requests.

The storage engine is multi-threaded and can manage multiple operations with its thread pool. The key to tuning its threading behavior involves managing the Process Thread Pool to increase the number of threads available for the following types of operations:

Image Querying When the formula engine requests data from the storage engine, SSAS requests a thread from the Process Thread Pool which retrieves data from either the cache or disk, stores any data retrieved from disk in cache, and returns the data to the query thread.

Image Processing During a ProcessData operation for a partition, SSAS uses three threads in parallel to retrieve data from the data source, to look up dimension keys in the dimension stores and load the processing buffer with partition and dimension key data, and to write this buffer to disk when it is full.

To properly perform thread performance testing for processing operations and evaluate performance counter results, you should start by isolating processing operations to ProcessData. Consider thread tuning when you find the Processing Pool Job Queue Length is greater than 0 during ProcessData operations. If this condition is true, then use one of the following rules to determine the appropriate tuning action to take:

Image Processing Pool Idle Threads = 0 If this counter’s value remains at 0 for long periods of time during processing, there are not enough threads available to SSAS. In this case, increase the SSAS ThreadPool Process MaxThreads property. The amount by which you increase this value depends on how much the CPU is under load during processing when this counter is 0. Increase the value and continue retesting until % Processor Time approaches 100%.

As you increase the number of available threads for processing, be aware that fewer threads are available for query execution. Therefore, you should use this option when you plan to restrict the processing schedule to times during which few or no queries typically execute.

Image Processing Pool Idle Threads > 0 If this counter’s value remains above 0 consistently during processing, there are too many idle threads. Start by decreasing the SSAS CoordinatorExecutionMode property from -4 to -8 and retesting.


Note CoordinatorExecutionMode property

For an in-depth explanation of the CoordinatorExecutionMode property, refer to the “Analysis Services Operations Guide” at https://msdn.microsoft.com/en-us/library/hh226085.aspx.


To tune threads for indexing operations, monitor performance counters while isolating activities on the SSAS server to ProcessIndex. The SSAS CoordinatorBuildMaxThreads property (editable only in the server’s Msmdsrv.ini file), which has a default value of 4, restricts the number of threads available to an individual aggregation processing job. If you increase this value, you might need to increase ThreadPool Process MaxThreads also. Increasing these values enables a higher degree of parallelism, but be sure to monitor the load on CPU and memory to ensure greater parallelism does not adversely impact the server in other ways.

To optimize partition processing, you should drive CPU utilization as close to 100% as possible. Try creating more partitions and processing them in parallel by using the Process Data operation. You can keep adding partitions until the Rows Read/Sec counter (in the MSAS13:Processing set) no longer increases. You can monitor the number of concurrently processing partitions by using the Current Partitions counter in the MSAS13:Proc Aggregations set.

Similarly, you can add more partitions to the parallel execution of Process Index operations. Monitor the % Processor Time (_Total) value while running the operation, and stop adding partitions when this value stops increasing. However, watch memory consumption to avoid paging.

Multidimensional processing tuning

The goal of tuning processing operations is two-fold. First, you need to minimize the overhead required to process in order to reduce the processing time required and the consumption of server resources. Second, you need to find ways to optimize the retrieval of data from the relational source.

With regard to dimension processing, the time required depends on the number of members for each attribute and the number of attribute relationships. To find the attributes requiring the most processing time, look for long duration values for the BuildIndex event subclass in a trace. Not only do these attributes impact processing time, they also add to the sizes of the dimension, the cube, and the aggregation store, and thereby increase storage requirements. Therefore, if processing is taking too long, consider removing attributes that are not essential.

Optimizations for processing include removing bitmap indexes for high-cardinality attributes to reduce the storage space and processing time these attributes require. An example of a high-cardinality attribute is a telephone number for a customer. When a high-cardinality attribute such as this is not the primary subject of analysis and is always queried together with another, lower-cardinality attribute that has been filtered, consider setting the AttributeHierarchyOptimizedState property for the attribute to NotOptimized. This setting eliminates the bitmap indexing for the attribute.

During processing operations, SSAS generates SQL queries to retrieve data from a specified data source. Although you cannot directly modify the structure of these queries, unless you use query binding for partitions as described in Skill 4.3, you can influence the efficiency of these queries.

Dimensions do not allow you to use query binding in the object definition. However, if you use a view in the relational source for a dimension, you can add query hints and apply other optimizations to tune the underlying SQL. In addition, if a snowflake dimension uses tables from separate databases, consolidate the data into a single table to eliminate the use of the slower OPENROWSET function in the processing query.

Consider partitioning the relational source for a fact table. You can then create multiple partitions for a measure group in the multidimensional model and map each measure group partition to a separate relational partition. An alternative is to create multiple measure group partitions per relational partition. For example, if the relational source is partitioned by year, you can create measure group partitions by year or by month. Either way, the processing operation to load the measure group partition is restricted to a subset of the source table and thereby performs faster than if executing against an entire relational table that has not been partitioned.

Next, if you are using a view in the relational source or a named query in the data source view, eliminate the use of joins wherever possible for both dimensions and fact tables and consider indexing the view. The query to the source should retrieve data from a single table. Use extract-transform-load (ETL) processes to denormalize data structures where possible.

Last, add indexes to the relational tables. (Indexing a dimension table is not generally necessary unless it contains millions of rows.) The type of index you add depends on the type of table to process, as shown in Table 4-2.

Image

TABLE 4-2 Relational source indexing strategies


Note Relational source optimization

You can find additional recommendations for further optimizing the relational source in the “Analysis Services Operations Guide,” which you can download from https://msdn.microsoft.com/en-us/library/hh226085.aspx.


Tabular performance issues

Resolution of tabular performance issues depends on the information you gather using the monitoring tools described earlier in this chapter. To improve query performance, analysis of DAX query plans helps you identify query tuning opportunities. For tuning processing operations, trace events are helpful when you want to reduce processing time, while performance counters help you determine whether you need to tune the server’s usage of memory or CPU resources.

DAX query tuning

In general, the goal of optimizing DAX queries is to reduce the time required by the formula engine, which is the opposite of the goal for MDX queries. By analyzing trace events as described in the “Identify bottlenecks in SSAS queries” section, your first step is to determine whether the storage engine or the formula engine contributes most to slow query performance.

The first place to check when searching for the cause of a storage engine bottleneck is the pseudo-SQL for the VertiPaq SE Query Begin or End events. The pseudo-SQL is not the actual code that executes, but a representation of the request the formula engine sends to the storage engine to retrieve data from memory. Review this code for any of the following common problems in DAX queries:

Image Call back to formula engine This condition is identifiable when you see the CallBackDataID function in the pseudo-SQL of a VertiPaq SE Query Begin event like this:

WITH
  $Expr0 := [CallbackDataID(IF ( 'Sale'[Total Sales] > 100,
  'Sale'[Total Sales]))] . . .

This call appears at the beginning of one of several Vertipaq scans generated when executing the query shown in Listing 4-4. The use of CallBackDataID is not as efficient as filter predicates that use the WHERE clause or scans without filters, because the work is pushed back to the single-threaded formula engine. Furthermore, the presence of the callback prevents caching of intermediate results in the VertiPaq cache. A successful rewrite of this query to remove the IF function and add a FILTER function produces a pseudo-SQL statement that no longer includes the CallBackDataID function and instead uses a WHERE clause to define a simple predicate; a sketch of such a rewrite follows Listing 4-4.

LISTING 4-4 Inefficient DAX query generating CallBackDataID


EVALUATE
SUMMARIZE (
    'Sale',
    'Date'[Calendar Year],
    "Total Sales", CALCULATE (
        SUMX (
            'Sale',
            IF ( 'Sale'[Total Sales] > 100, 'Sale'[Total Sales] )
        )
    )
)
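
The following rewrite is a minimal sketch of that idea, and one of several possible formulations; verify against your own trace that the CallBackDataID disappears. The condition moves out of the IF function and into a FILTER over the table being summed, which the storage engine can apply as a WHERE predicate:

EVALUATE
SUMMARIZE (
    'Sale',
    'Date'[Calendar Year],
    "Total Sales", CALCULATE (
        SUMX (
            FILTER ( 'Sale', 'Sale'[Total Sales] > 100 ),
            'Sale'[Total Sales]
        )
    )
)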


Image Materialization When a query requires intermediate results that are spooled, or stored in memory, for further calculation by the formula engine, the spooled data is not compressed and can consume an amount of memory significantly greater than the size of the model. You can see this occur when the physical plan contains a Spool_Iterator operator with a high record count. Materialization is also recognizable when you see an Apply operator with multiple subordinate VertipaqResult operators. You can find an in-depth explanation of materialization and recommendations for resolution in “The VertiPaq Engine” at https://www.microsoftpressstore.com/articles/article.aspx?p=2449192&seqNum=6.

You can also see a spike in memory consumption when materialization occurs. This is the only time that a query puts pressure on server memory.

Image Too many VertiPaq queries Individually, a VertiPaq query can have a short duration measured in milliseconds, but hundreds of VertiPaq queries can require many seconds to complete. In this situation, try rewriting the query and reviewing the new query plan. Changes to try include moving filters to an outer group of a nested calculation or replacing the FILTER function with CALCULATE or CALCULATETABLE.

When evaluating formula engine bottlenecks, study the DAX Query Plan for the following clues:

Image VertipaqResult sending high record count to next operator When you see VertipaqResult below an operator with a high value for its #Records property, try rearranging filters in the query to reduce the number of records returned by VertipaqResult. If the query has no filter, consider adding one.

Image Slow or complex calculations Some DAX functions such as EARLIEST or SUMX have a greater performance impact than other functions, which can be unavoidable. However, the ideal is to rewrite a query to take advantage of the storage engine’s ability to group, join tables, filter, and apply aggregations. On the other hand, operations that sort or find the top N cannot take advantage of the storage engine. In these cases, try to minimize the size of the row set on which operations must be performed by filtering at the earliest opportunity.

As another step in your query tuning efforts, you should review the code and apply the following best practices where applicable:

Image Avoid the use of ISERROR and IFERROR functions because they force cell-by-cell mode.

Image If you need to test for one row before returning a value, do not use COUNTROWS because it requires multiple scans. Instead use HASONEVALUE or HASONEFILTER.

Image For complex calculations, use ADDCOLUMNS instead of SUMMARIZE in most cases, unless the SUMMARIZE includes the ROLLUP syntax or the use of ADDCOLUMNS would require multiple filters to achieve the same result.

Image Use ISEMPTY instead of ISBLANK because ISEMPTY does not require evaluation of the measure and thereby executes faster. (A sketch of this and the HASONEVALUE pattern appears after this list.)
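As a minimal sketch of two of these practices, the following measure definitions use the chapter’s example model; the measure names are hypothetical:

-- Returns the year label only when a single year is in context,
-- without counting rows
Selected Year :=
IF (
    HASONEVALUE ( 'Date'[Calendar Year] ),
    VALUES ( 'Date'[Calendar Year] )
)

-- Tests for the presence of Sale rows without evaluating a measure
Has Sales :=
NOT ISEMPTY ( 'Sale' )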


Image Exam Tip

You can find additional examples of problematic DAX constructs and a review of calculation best practices in the “Performance Tuning of Tabular Models in SQL Server 2012 Analysis Services,” which you can download from https://msdn.microsoft.com/en-us/library/dn393915.aspx. A good exercise to test your DAX query-writing and query plan evaluation skills is to recreate queries in the whitepaper by using the 70-768-Ch2 database and observe the effects of rewriting queries based on the recommendations.


Tuning to reduce tabular processing time

Before you start experimenting with changing your processing methods, you should first take steps to optimize the tabular model as described in the “Optimize and manage model design” section. Then retest processing operations on the optimized model.

With any of the considerations for tuning tabular processing, you should first break out processing operations by Process Data and Process Recalc, because they perform different tasks and consume resources differently. To monitor the amount of time required by each of these operations, set up a Profiler trace, look at the duration of Progress Report End events, and focus on the events with the longest duration. These events also identify the object or calculated column.

When you determine that the Process Data operation is the source of the processing bottleneck, you can tune processing by dividing a slow-processing table into multiple partitions and then using one of the following techniques:

Image Process multiple partitions as a group. SQL Server 2016 processes them in parallel by default.

Image Process only the partitions with new or changed data rather than all partitions, using the Process Data operation (in serial fashion if necessary), and then perform one Process Recalc.

Image For even faster processing, use Process Add instead of Process Data.

When the Process Recalc operation is taking too long, determine if the high-duration processing is related to specific calculated columns. If so, can you perform the same calculation in the relational source by adding it as a derived column in a view or populating a column during ETL with the required values? Or can you optimize the DAX calculation?

Another way to resolve Process Recalc issues for long-running operations is to process a smaller group of objects and reduce parallelism by adding the MaxParallelism property to the operation as described at https://msdn.microsoft.com/en-us/library/mt697587.aspx.
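For example, the following TMSL sketch loads data for one table and then performs a single recalculation while capping parallelism; the database and table names follow the chapter’s examples, and the maxParallelism value is illustrative. Execute it in an XMLA query window in SSMS:

{
  "sequence": {
    "maxParallelism": 2,
    "operations": [
      {
        "refresh": {
          "type": "dataOnly",
          "objects": [ { "database": "70-768-Ch2", "table": "Sale" } ]
        }
      },
      {
        "refresh": {
          "type": "calculate",
          "objects": [ { "database": "70-768-Ch2" } ]
        }
      }
    ]
  }
}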

Tuning to reduce tabular processing memory consumption

Your first step is to determine whether Process Data or Process Recalc is creating memory pressure by using the Memory Usage KB counter in the MSOLAP$TABULAR:Memory set. Normally, you should see memory increase incrementally until it reaches the TotalMemoryLimit value set for the tabular instance, and then see it gradually released as processing ends.

If the Process Data operation is using too much memory, try the following options:

Image Break up object processing into multiple transactions.

Image Reduce parallelism by using the MaxParallelism property.

Image Process each object in separate transactions.

Monitor the Page Writes/Sec counter in the Memory set during processing to determine if the Process Data operation is paging for an extended period of time. If considerable paging is occurring, perform any of the following steps to resolve or reduce this problem:

Image Add memory to the server.

Image Remove other applications or stop unnecessary services to free more memory for SSAS.

Image Place the pagefile.sys file on the fastest available disk.

Image Adjust the SSAS memory limits.

Image Reduce parallelism by using more transactions or by setting a low value for the MaxParallelism property.

Image Optimize the tabular model by removing or optimizing high-cardinality columns.

If the Process Recalc operation is using too much memory, try either of the following options:

Image Use a low value for MaxParallelism to reduce parallelism.

Image Remove or optimize calculated columns in the model.

Optimize CPU utilization during tabular processing

You can use thread-related performance counters to assess whether you can increase the server property ThreadPool Process MaxThreads to provide more threads to processing operations. Check the Processing Pool Job Queue Length and Processing Pool Idle Non-I/O Threads counters in the MSOLAP$TABULAR:Threads set. If Processing Pool Job Queue Length is greater than zero and Processing Pool Idle Non-I/O Threads is zero, there are not enough threads available to service the requests. If that is the case, keep increasing MaxThreads while monitoring these counters until Processing Pool Idle Non-I/O Threads is no longer zero.


Note Optimization of tabular processing operations

Refer to the “Performance Tuning of Tabular Models in SQL Server 2012 Analysis Services” at https://msdn.microsoft.com/en-us/library/dn393915.aspx for more advanced tuning options. These additional tuning techniques are not on the exam, but can help you solve more complex performance issues in your environment.


Configure usability limits

Besides tuning queries or configuring server properties to manage query performance, you can configure usability limits to control the user experience. In a multidimensional model, you should consider the behavior of the client application and the impact on the server when users browse a dimension with thousands of members. For both multidimensional and tabular models, consider implementing usability limits when users invoke drillthrough to view details for a specific cell value.

Attribute Member Groups

A good way to help users browse a multidimensional dimension that contains thousands of members is to create a hierarchy that groups these members into several levels such that the number of children on any given level is manageable within the user interface of the client application generating the query. If the data does not lend itself to meaningful groupings, consider creating alphabetical groups. You can add a column to the dimension source (either by using a view on the dimension table or by adding a named query or named calculation to the model’s Data Source View) that uses an expression such as LEFT("table"."name", 1) to create a group, and then create an attribute in the model’s dimension that uses this new column as its KeyColumn property, as shown in the sketch below. Then add this attribute as a parent level in a hierarchy above the attribute level containing all members.
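As a minimal T-SQL sketch of such a view, the following example assumes a Dimension.Customer table with Customer Key and Customer columns; the names are hypothetical, patterned after the Wide World Importers DW sample:

-- The CustomerLetterGroup column becomes the KeyColumn of a new
-- attribute that serves as the parent level of the hierarchy
CREATE VIEW Dimension.CustomerGrouped
AS
SELECT
    [Customer Key],
    [Customer],
    LEFT([Customer], 1) AS CustomerLetterGroup
FROM Dimension.Customer;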


Note Discretization properties

Another option is to use the DiscretizationMethod and DiscretizationBucketCount properties to create artificial groupings, although you have less control over the groupings when you use this approach. See “Group Attribute Members (Discretization)” at https://msdn.microsoft.com/en-us/library/ms174810.aspx for implementation details and limitations.


Drillthrough limits

With regard to managing drillthrough limits, you can configure the OLAP Query DefaultDrillthroughMaxRows property for the SSAS server to restrict the number of rows returned when a user performs a drillthrough operation on a cell. If the client application is Excel, the connection properties set a default limit of 1,000 rows on drillthrough operations. For other client applications, you can usually specify a limit by including the MAXROWS argument in the DRILLTHROUGH statement, as described in the documentation at https://msdn.microsoft.com/en-us/library/ms145964.aspx.
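As a minimal sketch of the statement form, the following drillthrough returns at most 100 detail rows for one cell; the member key is an assumption, and the inner SELECT must resolve to a single cell:

DRILLTHROUGH MAXROWS 100
SELECT { [Measures].[Total Excluding Tax] } ON COLUMNS
FROM [Wide World Importers DW]
WHERE ( [Date].[Calendar Year].&[CY2016] );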

Optimize and manage model design

As described earlier in this chapter, there are many factors that can contribute to slow queries or processing. To the extent possible, you should consider ways that you can optimize a model prior to releasing it into production. However, over time, new query patterns emerge and data volumes increase as a natural evolution of the model in your environment. You should then use the results of performance monitoring tests to decide which aspects of your multidimensional or tabular model require optimization.


Image Exam Tip

You should be familiar with best practices in model design that optimize performance and understand how implementation of a design influences performance.


Multidimensional model optimization

In general, the goal of query optimization for multidimensional models is to reduce the amount of time spent by the storage engine. When performance analysis reveals the storage engine as a bottleneck, you should review the structure of dimensions, aggregations, and partitions for optimization opportunities.

Dimension optimization

The two aspects of a dimension design to optimize for better performing queries are attribute relationships and user-defined hierarchies. The steps required to create these objects are described in Chapter 1.

If you do nothing to adjust the attribute relationships after creating a dimension (unless it is a snowflake dimension with multiple tables), all attributes are related to the key attribute. In the attribute relationship diagram, you see the key attribute on the left side of the diagram and separate arrows connecting it to each of the other attributes. When the data model supports a many-to-one relationship between separate attributes, such as Sales Territory to Country or Subregion to Region in the City dimension, you should explicitly define the attribute relationship so that Country and Region no longer directly relate to the key attribute, City.

SSAS uses these relationships to build indexes so that the cross product between Sales Territory and Country members (such as Far West and United States, to name one pair) can be retrieved from the indexes without reading the key attributes. Without the relationship, a query on a large dimension is slower. Furthermore, when you design aggregations for the model, any aggregations created for Sales Territory can be used for queries requesting Country.

Queries are also optimized when you create a user-defined hierarchy for a natural hierarchy (defined in Chapter 1). SSAS materializes the hierarchy on disk to facilitate navigation and retrieve related members faster. For maximum benefit, be sure to create cascading attribute relationships for the attributes in the hierarchy. Note that unnatural hierarchies are not materialized on disk. Even without any performance benefit, unnatural hierarchies are helpful for users who want to view specific combinations of members.

Establishing a user-defined hierarchy is also beneficial as a preparatory step for aggregation design. Without aggregations, queries that include a non-key attribute must compute measure values by retrieving the detail rows in the measure group (from the cube storage, not the relational fact table if you are using MOLAP storage) and applying the aggregate function defined for each measure to compute a tuple value at query time. On the other hand, with aggregations, these computed tuple values are calculated during processing and stored on disk so that SSAS can resolve queries faster. As an example, if aggregations exist for the Calendar Year level of a hierarchy and a query requests sales by year, the SSAS storage engine can retrieve four rows from the aggregation store rather than computing four rows from thousands of detail rows in the Wide World Importers DW cube.


Note Dimension optimization

You can find additional recommendations for optimizing dimensions in the Analysis Services MOLAP Performance Guide, available at https://msdn.microsoft.com/en-us/library/dn749781.aspx.


Aggregations

As explained in the “Multidimensional query monitoring” section, the first place the storage engine looks for data requested by the formula engine is the cache, and then it searches the aggregations stored on disk when the requested data is not already in cache. Although the structure is different, the storage of aggregations in SSAS is analogous to the use of summary tables in a relational database. Aggregations are precalculated when you perform a Process Full or Process Index operation as described in Skill 4.3.

The addition of aggregations to your multidimensional model is a balancing act. You need to have enough aggregations to speed up queries while completing the processing operations within the required window and providing the disk space required for aggregation storage. Too many aggregations can have an adverse effect on both processing and query performance. A best practice is to limit aggregation storage to 30 percent of the space required to store a measure group.

SSAS does not create aggregations by default. You must design aggregations and then perform a processing operation to load the aggregation store based on the design. However, it is difficult to anticipate which aggregations are most useful until you monitor query patterns. One strategy is to use the Aggregation Design Wizard in the cube designer as part of your initial deployment to production. The Aggregation Design Wizard helps you build aggregations by using an algorithm that evaluates members on each level of a hierarchy relative to the size of the measure group table (or partition, if you create multiple partitions with separate aggregation designs). If a query happens to hit an aggregation, query performance is better than it would be without the aggregation.

You can add an aggregation design to the Sale measure group in the 70-768-Ch1 project by following these steps:

1. In SSDT, open the cube designer for the Wide World Importers DW cube and click the Aggregations tab.

2. In the cube designer toolbar, click Design Aggregations to launch the Aggregation Design Wizard, and then click Next.

3. On the Review Aggregation Usage page, keep the defaults for now. After you have a better understanding of how aggregation design works, you can configure the Aggregation Usage property for attributes on the Cube Structure page of the cube designer.

In general, these settings are used by the Aggregation Design Wizard to determine which attributes it should consider for aggregation by following these rules:

Image Full Requires SSAS to include the attribute, or an attribute from a lower level of the same hierarchy, in every aggregation. As an example, if you set Calendar Year to Full, SSAS can create aggregations that include Calendar Year or Calendar Month, but every aggregation must contain one of them. Set this value for a limited number of attributes, if any.

Image None Requires SSAS to exclude this attribute from all aggregations. Set this value for infrequently used attributes.

Image Unrestricted Allows SSAS to decide whether to include or exclude the attribute. Set this value manually for 5 to 10 of the most commonly used attributes.

Image Default Applies a default rule based on the characteristics of the attribute, as shown in Table 4-3.

Image

TABLE 4-3 Default rules for aggregation usage

You can override the defaults or even the Aggregation Usage properties any time that you use the wizard. The purpose of defining the Aggregation Usage properties in advance is to store the preferred property value to save you time each time you use the wizard.

4. Click Next, and then click Count on the Specify Object Counts page of the wizard. The wizard counts rows in the measure group (or partition) and members in attributes so that it can assess mathematically which attributes benefit most from aggregations. Attributes with a low count relative to the measure group count fall into this category. You can manually override the counts if you currently have only a subset of data in the model.

5. Click Next. Although there are many choices for determining how the wizard should optimize aggregations on the Set Aggregation Options page, best practice is to select the Performance Gain Reaches option and keep the default of 30. Click Start to compute the aggregation design.

6. When the aggregation design is complete, click Next.

7. On the Completing The Wizard page, type a name for the design, SaleAggregations, select the Deploy And Process Now option, and click Finish.

You can use a Profiler trace to determine which aggregations are useful for queries so that you can be sure to retain those aggregations as you fine-tune the design over time. These aggregation hits are identifiable by the Get Data From Aggregation event. You can also use a Profiler trace to identify queries that can benefit from the addition of an aggregation. In that case, look for the granularity of requests in the Query Subcube Verbose event and use this information to manually create an aggregation as described in “Improve SQL Server Analysis Services MDX Query Performance with Custom Aggregations” at https://www.mssqltips.com/sqlservertip/4432/improve-sql-server-analysis-services-mdx-query-performance-with-custom-aggregations/.

Another way to design aggregations that are tuned to actual query patterns is to run the Usage-Based Optimization Wizard. Before you can use this wizard, you must first enable the query log, which is not enabled by default; at a minimum, you set the Log \ QueryLog \ QueryLogConnectionString server property so that SSAS has a location in which to record queries. The wizard analyzes the queries captured in the log and then designs aggregations to improve query performance for the targeted queries. You can add the new aggregation design to the existing design or replace the existing design.

Partitions

Rather than keep the default structure of one partition per measure group, you can use multiple partitions in a measure group to benefit both processing and query performance. See Skill 4.3 for more information about creating and processing measure group partitions.

One reason for faster query performance is the partition elimination SSAS performs when multiple partitions exist. That is, if a partition does not contain the data requested by the query, SSAS does not read the partition. Another reason is the ability for the storage engine to use separate threads in parallel to scan each partition. Thus, the combination of a smaller amount of overall space to scan and the ability to scan that reduced space in parallel results in faster queries on partitioned data.

Furthermore, you can design different aggregation levels for each partition, which can increase the amount of extra storage required on disk, but the improvement in query speed can be a reasonable trade-off. As an example, you might design a higher level of aggregation for frequently-queried current data, but design fewer aggregations for less-frequently queried historical data.

You can also optimize storage for each partition. As described in Skill 4.3, you can specify by partition whether it is stored in MOLAP, HOLAP, or ROLAP mode, which determines whether data is restructured and moved from its relational source or kept in the source. In addition to specifying storage mode by partition, you can also place each partition on separate drives or even on separate servers to distribute disk IO operations.

Processing operations can also run faster. First, you can design partitions to separate historical and current data. Historical data is typically unchanging, which means you can process it once and leave it alone. Afterwards, you can then restrict processing to a smaller partition of current data, which is much faster than processing both types of data in a single operation. Second, even if you must process multiple partitions, you can process them in parallel, which typically is faster than processing partitions serially or processing a single large partition, assuming the server has adequate resources as discussed in previous sections of this chapter.

There is no requirement to design partitions symmetrically. As an example, you can create twelve partitions, one for each month of the current year, one partition for all twelve months of the previous year, and another partition for the five years preceding the previous year. Another benefit of this approach is the ability to maintain a moving window of data. You can easily drop off older data and add in new data without disrupting the data that you want to keep in the multidimensional model.
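
For example, you can retire the oldest partition with an XMLA Delete command. The following is a minimal sketch; the database, cube, measure group, and partition IDs are hypothetical and you would replace them with the IDs from your own model:

<!-- Drop the partition holding the oldest slice of the moving window -->
<Delete xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>WideWorldImportersDW</DatabaseID>
    <CubeID>Wide World Importers DW</CubeID>
    <MeasureGroupID>Sale</MeasureGroupID>
    <PartitionID>Sale 2010</PartitionID>
  </Object>
</Delete>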


Note Partition strategy

The Analysis Services MOLAP Performance Guide, available at https://msdn.microsoft.com/en-us/library/dn749781.aspx, describes common design patterns for partitioning.


Tabular model optimization

The key to tabular model optimization for both queries and processing is the reduction of its memory footprint. The amount of memory required to process and store the tabular model in memory is directly related to the size of the tables and dictionaries for each column. The higher the cardinality of a column and, to a lesser extent, the larger the data type, the more memory is required. To reduce the memory footprint of your model, work through the following recommendations:

Image Remove columns that are not required for analysis or calculations.

Image Remove rows that are not required for analysis, such as rows for time periods in the past that no longer contribute helpful insight for analysis of more recent time periods.

Image Reduce cardinality by making any of the following changes:

Image Eliminate key columns from fact tables.

Image Use ETL to consolidate snowflaked dimensions, lookup tables, and junk dimensions.

Image Split a date/time column into separate date and time columns (or even eliminate the time column if it is not required for analysis).

Image Eliminate calculated columns in large tables because no compression is applied to them.

Image If using a numeric column with decimal places, change the data type for the column in the data model to Currency unless you need to store more than four decimal places. You can still use the Decimal Number format so that values are not displayed as currency.


Note Focus on high-memory model objects

Remember that you can use the $System.discover_object_memory_usage DMV to analyze memory usage by object. Typically, the largest objects have high-cardinality columns.
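
If you prefer raw XMLA to the DMV query syntax, you can request the same rowset with a Discover command. This is a minimal sketch of the request:

<!-- Returns memory usage per object; examine the shrinkable and
     nonshrinkable memory columns to find the largest objects -->
<Discover xmlns="urn:schemas-microsoft-com:xml-analysis">
  <RequestType>DISCOVER_OBJECT_MEMORY_USAGE</RequestType>
  <Restrictions>
    <RestrictionList />
  </Restrictions>
  <Properties>
    <PropertyList />
  </Properties>
</Discover>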


Processing operations also benefit from these changes to the model. Monitor the Rows Read/Sec counter in the MSOLAP$TABULAR:Processing counter set to determine whether processing can benefit from additional changes. If this value is below 80,000, try making the following changes to the tabular model design:

Image Denormalize tables by eliminating joins to create a standard star schema structure.

Image When the relational source is partitioned, create matching partitions in the tabular model and ensure a one-to-one mapping between the relational and model partitions.

Image Tune the relational query. In tabular model partition definitions, you can use ORDER BY, CTEs, and stored procedures. The query is not modified by SSAS during processing.

Skill 4.3: Configure and manage processing

The configuration and management of processing in a multidimensional model is much different from processing in a tabular model. This section explains configuration options and decision points for processing objects in a multidimensional model only. The configuration of options and management of processing for a tabular model are described in Chapter 2.

Configure partition processing

As explained in Skill 4.2, the implementation of partitions is a useful strategy for improving query performance, processing performance, or both for a multidimensional model. When you add a measure group to a multidimensional model, a single partition is created. You can add a new partition to a measure group either by using the cube designer during the development cycle or by executing an XMLA script to define and add a partition to a model in production. After defining one or more partitions, you can then configure processing options for each partition separately. Partition processing includes several steps: loading data into a partition structure, building indexes, and building the aggregations as defined by the aggregation design assigned to the partition.
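
For example, the following XMLA sketch adds a query-bound partition to the Sale measure group on a production server. The object IDs, data source ID, and source query are assumptions to adapt to your own model; in a real script, the query must return the same columns as every other partition in the measure group:

<Create xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <ParentObject>
    <DatabaseID>WideWorldImportersDW</DatabaseID>
    <CubeID>Wide World Importers DW</CubeID>
    <MeasureGroupID>Sale</MeasureGroupID>
  </ParentObject>
  <ObjectDefinition>
    <Partition xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <ID>Sale 2016</ID>
      <Name>Sale 2016</Name>
      <Source xsi:type="QueryBinding">
        <DataSourceID>Wide World Importers DW</DataSourceID>
        <QueryDefinition>
          SELECT * -- list explicit columns in a production script
          FROM Fact.Sale
          WHERE [Invoice Date Key] >= '2016-01-01'
        </QueryDefinition>
      </Source>
      <StorageMode>Molap</StorageMode>
    </Partition>
  </ObjectDefinition>
</Create>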

Partition Wizard

An easy way to add a partition to a cube before you deploy it into production is to use the Partition Wizard. You can find details about using this wizard at https://msdn.microsoft.com/en-us/library/ms179981.aspx. If you plan to define multiple partitions in a cube, you must first remove the definition of the initial partition and then add new partitions by following these steps:

1. In SQL Server Data Tools for Visual Studio 2015 (SSDT), open the 70-768-Ch1 project from the File menu, and then double-click Wide World Importers DW.cube to open its cube designer.

2. On the Partitions page of the cube designer, right-click the Sale partition, and click Delete to remove the partition from the cube.

When a cube contains multiple measure groups, the cube designer initially shows only the partitions for the first measure group. You can expand each of the other measure groups by double-clicking its name to view its partitions.

3. Click the New Partition link.

4. In the Partition Wizard, click Next, select the Sale check box in the Available Tables list, and then click Next.

In this example, there is only one fact table to use as a source for partitions in the Sale measure group. If you maintain separate fact tables for different segments of data in the relational source, such as one fact table per year, you can choose a specific table that SSAS binds to the partition.

There is no way to filter out data from a fact table when using the table-binding method. The partition receives all the data in the table. Bear in mind that if your fact table structure changes, you must manually update each partition definition to correspond. One way to insulate yourself from this problem is to bind partitions to views instead of tables.

5. If you want to keep the current table-binding setting for the partition, click Next on the Restrict Rows page of the Partition Wizard to continue to the next configuration step.

In lieu of table-binding a partition, you can define query binding. To do this, select the Specify A Query To Restrict Rows check box and then modify the default query by appending a WHERE clause with the filter condition [Invoice Date Key] <= '2015-12-31'. Click Next.


Note Query binding

Although the usual practice is to append a filter condition to the SELECT statement when you implement query binding, you can rewrite the query in other ways to improve relational performance if you like. Remember, the query for each partition must return the same columns, and no two partitions can overlap. Each WHERE clause should ensure the query returns a unique set of rows. SSAS does not check for duplicate rows across partitions. Therefore, if you inadvertently create overlapping partitions, the data in the cube will be wrong.

Also, query binding breaks the measure group’s relationship to the data source view. Consequently, if you make changes to the data source view later, those changes do not update the partition query. For this reason, configuring partitions with query binding should be one of the last development steps you perform.


6. On the Processing and Storage Locations page of the wizard, you choose where SSAS performs processing of the partition and where the processed data is stored. The default processing location is the default SSAS server and the default storage location is the Data folder for the server instance.

For information about configuring and using a remote partition, see https://msdn.microsoft.com/en-us/library/ms174751.aspx.

7. Click Next to proceed to the Completing The Wizard page. On this page, you have the following aggregation options, which take effect after you click Finish:

Image Design Aggregation Options For The Partition Now

Image Design Aggregations Later

Image Copy The Aggregation Design From An Existing Partition

If you choose either of the latter two options, the Deploy And Process Now check box is active.

For this example, select the Design Aggregations Later option and click Finish.

8. Repeat the preceding steps to create another partition by using the same settings with the exception of using the following condition in the WHERE clause:

[Invoice Date Key] >= '2016-01-01'

Partition storage

Configuring the storage for a partition is not simply a designation of MOLAP, HOLAP, or ROLAP, the storage modes described in Chapter 1, but also a specification of how the partition responds to changes in the relational source, if at all. When you build a star schema to use as a source for the multidimensional model, a common approach is to schedule processing immediately after the ETL process successfully completes. You can instead implement proactive caching, a feature in SSAS that triggers processing based on rules that you define, or use ROLAP storage, which requires processing only when the partition is initially deployed.

To define partition storage, open the Partitions page of the cube designer, select the partition row in the list below the Sale measure group, and click the Storage Settings link to display the following storage options:

Image MOLAP Stores detail data and aggregations in a multidimensional format. Data is refreshed only when you perform processing manually or on a scheduled basis. It is suitable when queries must be fast and access to real-time data is not required.

Image Scheduled MOLAP Stores detail data and aggregations in multidimensional format. Data is refreshed every 24 hours, although you have no control over the schedule.

Image Automatic MOLAP Stores detail data and aggregations in multidimensional format. As one form of proactive caching, data is refreshed automatically when the server detects changes to the source data. SSAS waits 10 seconds for activity in the source to end before beginning processing, but proceeds after 10 minutes if activity does not end. During processing, queries retrieve data from the existing database. This setting is useful when you do not want to maintain a separate processing infrastructure by using SSIS or a scheduled SQL Server Agent job.

Image Medium-latency MOLAP Stores detail data and aggregations in multidimensional format. It functions like Automatic MOLAP in that it refreshes data automatically after receiving notification of a data change in the relational source. It differs with regard to data latency. If the data in the existing MOLAP partition from which queries are answered is older than 4 hours, the partition switches to ROLAP mode and answers queries directly from the relational source until processing completes. Queries are slower until the new MOLAP partition is available.

Image Low-latency MOLAP Stores detail data and aggregations in multidimensional format. It is similar to medium-latency MOLAP except that the data latency is 30 minutes.

Image Real-time HOLAP Stores only aggregations in multidimensional format and keeps detail data in the relational source. Aggregations are updated when SSAS receives notification of a data change. During aggregation processing, queries that would normally be resolved by using aggregations switch to ROLAP queries until processing is complete.

Image Real-time ROLAP Keeps detail data and aggregations in the relational source. Use this storage mode when the source data changes frequently and queries require current data.

Image Custom setting You can modify the storage mode definition to establish a specific combination of silence interval, silence interval override, and data latency interval.


Image Exam Tip

Be prepared to understand which storage modes are appropriate for a specific scenario as the exam is likely to test your knowledge of this topic. Review the guidelines in “Set Partition Storage (Analysis Services-Multidimensional)” at https://msdn.microsoft.com/en-us/library/ms175646.aspx. Also, see “Proactive Caching (Partitions)” at https://technet.microsoft.com/en-us/library/ms174769.aspx.


Partition aggregations

If the cube already includes aggregation designs, you can apply an existing aggregation design to a new partition by opening the Aggregations page of the cube designer, clicking a measure group in the Standard View, and then clicking Assign Aggregation Design in the cube designer toolbar. In the Assign Aggregation Design dialog box, select a design in the Aggregation Designs drop-down list and then select the check box for each partition to which you want to assign the aggregation design.

Additional partition properties

When you select a partition on the Partitions page of the cube designer, you can view several properties associated with that partition. In particular, the following properties affect processing:

Image ErrorConfiguration Ideally, you find and resolve errors in the data during the extract-transform-load (ETL) process when you use a star schema as a data source as described in Chapter 1. At the partition level, this setting is set to Default. If you keep this setting, any error during processing causes the operation to fail. You can change this behavior to ignore the error and continue, ignore a specified number of errors before failing, or log the error and continue. Furthermore, if you ignore the error, you can exclude the row causing the error from the partition or you can include it and assign the row to the unknown member for each dimension to which the row cannot be matched. (Unknown members are explained in Chapter 1.)

Although you can configure error configuration when you start a processing operation, you can avoid the need to do so each time if you define the ErrorConfiguration property on the partition object.


Image Exam Tip

Be sure to understand your options for configuring error handling. For more information on this topic, review “Error Configuration for Cube, Partition, and Dimension Processing (SSAS – Multidimensional)” at https://msdn.microsoft.com/en-us/library/ms180058.aspx.


Image ProcessingMode The default value for this property is Regular, which means that the partition cannot be read by a storage engine thread in response to a query when you perform a Process Full operation on it until the operation is complete. As explained later in the “Use Process Default, Process Full, Process Clear, Process Data, Process Add, Process Update, Process Index, Process Structure, and Process Clear Structure processing methods” section of this chapter, the Process Full operation is a combination of two operations, Process Data and Process Index.

If you change the ProcessingMode to Lazy Aggregations, the partition is available for querying as soon as the Process Data operation is complete. The Process Index operation runs in the background if enough resources are available. During this time, queries to the partition run more slowly than they would if aggregations existed, but frequently-run queries result in cached data which has no dependency on aggregations. Therefore, the performance impact of lazy aggregations takes a greater toll on less-frequently queried data. The negative aspects of using this option include the lack of control over when the aggregations are built and the inability to synchronize a database until this operation is complete.

Image ProcessingPriority This property defines the priority of building the current partition relative to other partitions when ProcessingMode is set to Lazy Aggregations.

Image Slice When SSAS processes a MOLAP partition, it identifies the minimum and maximum range of values for each attribute associated with the fact data and stores this information as metadata that it uses for partition elimination during query operations. However, you should manually define the slice for ROLAP and proactive caching partitions. Manual definition of the slice is also recommended for MOLAP partitions when a measure group contains asymmetrical partitions, such as one partition with one year of data and another partition with multiple years of data. For examples, see “Set the Partition Slice Property (Analysis Services)” at https://msdn.microsoft.com/en-us/library/ms174764.aspx.

Writeback partition

In most cases, a multidimensional model is read-only. However, there are scenarios for which storing user input in the multidimensional model is beneficial, particularly budgeting and forecasting in financial models. This feature is known as writeback and requires the use of a client application that supports writeback, such as Excel. In the model, you can write-enable a partition only when all measures in its measure group use the Sum aggregate function. You can secure the model to control which users have read-write access and thereby restrict writeback capability to a limited group.

When you enable a writeback partition, which you can do by right-clicking the partition on the Partitions page of the cube designer and clicking Writeback Settings, you specify the name of the table in which to store user input and select either MOLAP or ROLAP as its storage mode. When a client application captures user input, SSAS stores this input in the specified table and does not add it to the relational source on which the partition is based. As queries retrieve data from the partition, the results from both the partition and the writeback table are merged together.


Note Writeback partitions

If you configure writeback or ROLAP processing, the account that SSAS uses to connect to the relational data source must also have write permission.

You can find more information about implementing writeback partitions in “Set Partition Writeback” at https://msdn.microsoft.com/en-us/library/ms174823.aspx and “Write-Enabled Partitions” at https://msdn.microsoft.com/en-us/library/ms174750.aspx.


Partition processing configuration

You can launch partition processing in SSDT or SSMS by using the Process Partition dialog box. In the dialog box, you can select one of the following processing methods: Process Default, Process Full, Process Data, or Process Clear. You can also override error configuration and define batch settings. The processing methods and other configuration settings are described in the last two sections of Skill 4.3.

In SSMS, you can generate an XMLA script based on the selections you make in the dialog box. A benefit of using an XMLA script is the ability to group a specific set of partitions to process into a single parallel processing operation or control sequencing of serial processing.
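
For example, the following minimal sketch (with hypothetical partition IDs) processes two partitions of the Sale measure group in a single parallel transaction. To force serial execution instead, place the Process commands directly inside the Batch element and omit the Parallel element:

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Parallel>
    <Process>
      <Object>
        <DatabaseID>WideWorldImportersDW</DatabaseID>
        <CubeID>Wide World Importers DW</CubeID>
        <MeasureGroupID>Sale</MeasureGroupID>
        <PartitionID>Sale 2015</PartitionID>
      </Object>
      <Type>ProcessFull</Type>
    </Process>
    <Process>
      <Object>
        <DatabaseID>WideWorldImportersDW</DatabaseID>
        <CubeID>Wide World Importers DW</CubeID>
        <MeasureGroupID>Sale</MeasureGroupID>
        <PartitionID>Sale 2016</PartitionID>
      </Object>
      <Type>ProcessFull</Type>
    </Process>
  </Parallel>
</Batch>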


Note XMLA for partition processing

See “Processing Objects (XMLA)” at https://msdn.microsoft.com/en-us/library/ms187199.aspx and “Performing Batch Operations (XMLA)” at https://msdn.microsoft.com/en-us/library/ms186680.aspx for more details on using XMLA for processing operations.


Configure dimension processing

The goal of dimension processing is to retrieve data from a relational source, load the data into a dimension object, and then build indexes on it. Dimension processing is affected by the configuration of specific properties in the dimension definition described in this section.

ProcessingGroup property

By default, the ProcessingGroup property for a dimension is set to ByAttribute. As a result of this setting, processing operations for a dimension generate a series of SELECT DISTINCT statements to the relational source. When a dimension table is small, these statements execute relatively quickly.

When you want to minimize the impact of multiple statements on the relational source, you can change the ProcessingGroup property to ByTable. This setting causes SSAS to generate a single SELECT statement to retrieve data for all attributes at one time. However, because the values in an attribute that is not the key attribute are no longer unique, you must set the KeyDuplicate property in the error configuration to IgnoreError when you use this option. Consequently, you should take extra care to ensure the dimension data is clean as part of the ETL process.

Proactive caching

The ProactiveCaching property is another dimension-level property that impacts processing. You configure this property in the same way described for partitions in the previous section.

Attribute relationship types

Chapter 1 explained how to define attribute relationships and introduced the RelationshipType property, which is set to Flexible by default. Your other option is to set the property to Rigid. This property determines what happens during processing when the parent member for an attribute changes.

To better understand the implications of changing parent members, let’s consider a hypothetical hierarchy consisting of categories and stock items. In the stock item level, a stock item named Mug exists and is associated with Category A in the category level. After the stock item dimension is processed, the attribute relationship between Mug and Category A is established. Later, a reorganization of stock items occurs and the relational source now associates Mug with Category B. The impact of this change on the multidimensional model depends on the type of processing operation performed and the RelationshipType property for the attribute relationship between the stock item and category levels.

Let’s start by considering the effect on processing when the relationship type is flexible and the Process Update operation is performed on the dimension. (The rationale for using this operation is explained in the next section.) When the change to Category B is detected, the indexes are invalidated and the aggregations are lost in every partition whose measure group is related to the dimension in the cube’s dimension usage. This invalidation slows down the processing operation. Furthermore, the indexes in the affected partitions must be rebuilt by launching a separate Process Index operation on each of them. Until these indexes are rebuilt, queries are slower, although frequent queries benefit from caching.

On the other hand, if you attempt to perform a Process Update operation when the relationship type is rigid, the operation fails. You must instead perform a Process Full operation on the dimension which renders all related partitions invalid, not just indexes and aggregations. That means if you perform a Process Full operation on a dimension, you are committed to performing a Process Full operation on all related partitions before queries can retrieve data.
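
Whichever relationship type you choose, the processing command itself is the same. As a point of reference, a Process Update on a dimension is expressed in XMLA with a sketch like this (the object IDs are hypothetical):

<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>WideWorldImportersDW</DatabaseID>
    <DimensionID>Stock Item</DimensionID>
  </Object>
  <!-- Fails if changed data would violate a rigid attribute relationship -->
  <Type>ProcessUpdate</Type>
</Process>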

When deciding how to set the RelationshipType value on each attribute relationship, there is no universally good or bad answer. Instead, you must consider the business requirements that dictate the size of the processing window, the likelihood of an attribute member being assigned to a new parent, and the impact of waiting to rebuild indexes and aggregations versus fully rebuilding a dimension and the cube.

Dimension processing configuration

The configuration of dimension processing is much like that of partition processing. In SSDT or SSMS, you use the Process Dimension dialog box to select one of the following processing methods: Process Default, Process Full, Process Clear, Process Data, Process Index, or Process Update. As with partitions, you can also override error configuration and define batch settings or generate an XMLA script. The processing methods and other configuration settings are described in the next two sections.

Use Process Default, Process Full, Process Clear, Process Data, Process Add, Process Update, Process Index, Process Structure, and Process Clear Structure processing methods

Processing multidimensional objects to bring the data to a current state should be performed as quickly and efficiently as possible. You can choose to process an entire database, a cube, a measure group, a partition, or a dimension. After you choose the object to process, you select one of the following processing methods:

Image Process Default SSAS determines the processing method required to bring the object to a fully processed state. If an object is unprocessed, SSAS performs a Process Full operation. If it is a processed partition but indexes or aggregations are not processed, SSAS performs only a Process Index operation.

Image Process Full The object is deleted and rebuilt. If the object is a dimension, you must also perform a Process Full on all measure groups related to the dimension before the cube can be queried.

Image Process Clear The data in the object and any of its dependencies are removed from the multidimensional model.

Image Process Data This option loads data into either a dimension or a partition object. It is similar to Process Full in that it deletes and rebuilds the target object. However, it does not rebuild indexes for dimensions or partitions, or aggregations for partitions. When you use this option to process a dimension, you must also perform a Process Data on all related partitions.

Image Process Add You can use this option only within an XMLA script. The effect of this option depends on the target object.

When you use this option for a dimension object, it compares the data in the relational table with the members in the dimension and adds any members that are not found in the dimension. It does not handle changes to existing members. Aggregations and indexes in partitions are preserved which makes this operation quick with no impact on other objects.

When you perform Process Add for a measure group or partition, you specify a source table or a query that contains only new rows. SSAS loads this data into a new partition that it creates, fully processes it, and then merges the partition into the target partition, as shown in the sketch after this list.

Image Process Update This option is applicable only to dimension objects. In this operation, the relational table data is compared to the dimension members. Members are deleted in the dimension object if they no longer exist in the source table, new members are added, and members are updated to reflect changes found in the relational data. If flexible relationship types exist on any level of a hierarchy, aggregations and indexes are dropped in related partitions.

Image Process Index You normally use this option after performing a Process Data operation on an object. It builds bitmap indexes for dimension objects and indexes and aggregations for partition objects. The relational data source is not accessed during this operation.

Image Process Structure This option is applicable only to cubes or mining structures. If a cube is not processed, Process Structure processes its dimensions and the cube. If the cube contains mining models, the mining models are not processed which can shorten processing time.

Image Process Clear Structure This option is applicable only to mining structures. You use it to clear training data from a mining structure.
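
As referenced in the Process Add description earlier in this list, the following XMLA sketch adds only new rows to an existing partition by supplying an out-of-line query binding within a Batch. The object IDs and the source query are assumptions; the query must return only rows that the partition does not already contain:

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Parallel>
    <Process>
      <Object>
        <DatabaseID>WideWorldImportersDW</DatabaseID>
        <CubeID>Wide World Importers DW</CubeID>
        <MeasureGroupID>Sale</MeasureGroupID>
        <PartitionID>Sale 2016</PartitionID>
      </Object>
      <Type>ProcessAdd</Type>
    </Process>
  </Parallel>
  <Bindings>
    <Binding xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <DatabaseID>WideWorldImportersDW</DatabaseID>
      <CubeID>Wide World Importers DW</CubeID>
      <MeasureGroupID>Sale</MeasureGroupID>
      <PartitionID>Sale 2016</PartitionID>
      <Source xsi:type="QueryBinding">
        <DataSourceID>Wide World Importers DW</DataSourceID>
        <QueryDefinition>
          SELECT * -- new rows only
          FROM Fact.Sale
          WHERE [Invoice Date Key] > '2016-06-15'
        </QueryDefinition>
      </Source>
    </Binding>
  </Bindings>
</Batch>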

The simplest possible approach to processing is to perform a Process Full on the entire database, but this approach is feasible only when there is adequate time now (and in the future) to process every dimension and partition and rebuild indexes and aggregations. If Process Full on the database is not an option, which is typical in most SSAS implementations, you need to perform processing on dimensions and measure groups or partitions individually.

With respect to dimensions, you can expedite processing by performing a Process Update only for dimensions that you know have changed, and then run Process Default on the cube to rebuild the indexes and aggregations dropped for flexible relationships. On the other hand, if a dimension source table never allows changes to existing rows and only adds new rows over time, you should use Process Add for faster processing. If you encounter processing issues for a dimension because it contains millions of members or many attributes, you may experience better performance by performing Process Data and Process Index operations separately instead of performing Process Full.

You have several options for partition processing as well. Usually processing a partition is slower than dimension processing due to larger data volumes. Furthermore, data does not commonly change in partitions, but new rows are constantly arriving in fact tables. With this in mind, you must use Process Full for a new partition or when you need to reload an existing partition. If the partition size is small, you can minimize the time required for processing. When a partition is large, consider performing Process Data followed by Process Index to reduce the overall impact on the server, although a powerful server might perform Process Full faster because it can allocate threads to building indexes and aggregations while waiting for IO operations.

Another factor to consider for partition processing is the materialization of a referenced relationship, as described in Chapter 1. To update the dimension during processing, SSAS creates a join between the fact table and the intermediate dimension table defined in the dimension usage. Whereas dimension members are normally loaded during dimension processing, the materialization of the referenced members occurs during partition processing and can degrade the performance of processing the partition. If the structure or data of this intermediate dimension changes at any time, you must run a Process Full on all partitions in the measure group; otherwise, the current state of the changed intermediate dimension is not reflected in cube queries. A Process Full operation is not required if you do not materialize the referenced relationship.

Regardless of which processing method you use, you can launch a processing operation in SSMS by right-clicking the target object in Object Explorer and clicking Process. In the Process dialog box for the selected object, select the operation to perform in the Process Options drop-down list. Rather than click OK to initiate the processing operation, you can click the Script button and select one of the options to generate an XMLA script that defines the operation. You can save this script and use it in a SQL Server Agent job or a SQL Server Integration Services (SSIS) task to automate processing operations.
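
For example, scripting a Process Full on an entire database yields a command similar to this minimal sketch (the database ID is hypothetical), which you can paste into a SQL Server Agent job step of type SQL Server Analysis Services Command:

<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>WideWorldImportersDW</DatabaseID>
  </Object>
  <Type>ProcessFull</Type>
</Process>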


Image Exam Tip

You should be able to successfully match a processing strategy to a described scenario.


Configure Parallel, Sequential, and Writeback processing settings

The Process Partition or Process Dimension dialog box also allows you to configure several settings related to processing operations. In the dialog box, click Change Settings to open the Change Settings dialog box. Then, on the Processing Options page of the Change Settings dialog box, you can configure the following settings, which map to the XMLA Batch command shown in the sketch after this list:

Image Processing Order The default processing order is Parallel. This option is preferable when you are processing multiple objects in a single batch and permits SSAS to speed up processing by performing the processing operations in parallel. When this option is selected, you can also change the Maximum Parallel Tasks setting from its default, Let The Server Decide, to one of the following values: 1, 2, 4, 8, 16, 32, 64, or 128. When you have limited server resources, you should set this value to 1.5 to 2 times the number of CPUs available on your server, rather than allow the server to decide. Regardless of the number of parallel tasks that you specify, the tasks are treated as a single transaction. Therefore, if any processing task fails, they all fail and the transaction rolls back.

If you prefer, you can select the Sequential option to force SSAS to process each object one after the other.

Image Transaction Mode You can change this setting only when you choose Sequential as the Processing Order option. The default is One Transaction, which commits the changes only when all processing jobs succeed. You can change it to Separate Transactions to roll back failed processing jobs on an individual basis.

Image Writeback Table Options In the drop-down list, you can select one of the following options:

Image Create SSAS creates the writeback table for the partition. This operation fails if the writeback table already exists. Use this option when the processes that move writeback data to another location or remove it also delete the writeback table.

Image Create Always SSAS creates the writeback table for the partition. It deletes an existing table and replaces it with a new one. Use this option for scenarios in which you have processes in place to migrate writeback data to the source table.

Image Use Existing SSAS keeps the existing table.

Image Process Affected Objects Select this check box when you want SSAS to automatically process objects that have a dependency on the object you have selected for processing in the current operation.
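
These settings correspond to attributes and elements of the XMLA Batch command. The following minimal sketch (with hypothetical object IDs) runs up to four parallel tasks in a single transaction, processes affected objects, and keeps an existing writeback table:

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"
       Transaction="true" ProcessAffectedObjects="true">
  <Parallel MaxParallel="4">
    <Process>
      <Object>
        <DatabaseID>WideWorldImportersDW</DatabaseID>
        <CubeID>Wide World Importers DW</CubeID>
        <MeasureGroupID>Sale</MeasureGroupID>
        <PartitionID>Sale 2016</PartitionID>
      </Object>
      <Type>ProcessFull</Type>
      <WriteBackTableCreation>UseExisting</WriteBackTableCreation>
    </Process>
  </Parallel>
</Batch>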

You can also configure error handling in the Change Settings dialog box. To do this, click the Dimension Key Errors tab to choose the error configuration to apply. If you keep the default option, Use Default Error Configuration, the processing operation relies on the error configuration defined in the object to be processed. To override the default configuration, select Use Custom Error Configuration and then configure the following settings, which correspond to the XMLA ErrorConfiguration element shown after this list:

Image Key Error Action If a key value does not exist, such as when a fact table references a dimension key that does not yet exist in the dimension, you can select Convert To Unknown in the drop-down list to assign the Unknown Member in its place or select Discard Record to exclude the entire row from processing.

Image Processing Error Limit You can select the Ignore Errors Count option to prevent errors from ending the processing job, or select Stop On Error when you want to establish limits. In the latter case, you set the Number Of Errors value and then select either Stop Processing or Stop Logging in the On Error Action drop-down list.

Image Key Not Found When a key value appears in the partition source but is not found in the dimension, you can select one of the following actions: Report And Continue (default), Ignore Error, or Report And Stop.

Image Duplicate Key When the same attribute key appears more than once in the dimension source, you can select one of the following actions: Ignore Error (default), Report And Continue, or Report And Stop.

Image Null Key Converted To Unknown If you set Key Error Action to Convert To Unknown, this setting determines the effect on processing. You can choose one of the following actions: Ignore Error (default), Report And Continue, or Report And Stop.

Image Null Key Not Allowed If you set the Key Error action to Discard Record, this setting determines the effect on processing. You can choose one of the following actions: Report And Continue (default), Ignore Error, or Report And Stop.

Image Error Log Path Specify the path and file name for the error log, which SSAS updates when you select Report And Continue or Report And Stop for any error action.
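
These settings correspond to the ErrorConfiguration element, which you can include in a Batch or Process command instead of clicking through the dialog box each time. A minimal sketch of a custom configuration (the log path is hypothetical):

<ErrorConfiguration xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <KeyErrorAction>ConvertToUnknown</KeyErrorAction>
  <KeyErrorLimit>100</KeyErrorLimit>
  <KeyErrorLimitAction>StopProcessing</KeyErrorLimitAction>
  <KeyNotFound>ReportAndContinue</KeyNotFound>
  <KeyDuplicate>IgnoreError</KeyDuplicate>
  <NullKeyConvertedToUnknown>IgnoreError</NullKeyConvertedToUnknown>
  <NullKeyNotAllowed>ReportAndContinue</NullKeyNotAllowed>
  <KeyErrorLogFile>C:\Logs\SaleProcessing.log</KeyErrorLogFile>
</ErrorConfiguration>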

Skill 4.4: Create Key Performance Indicators (KPIs) and translations

There are two last modeling tasks for which Exam 70-768 tests your knowledge. First, you must know how to create a key performance indicator (KPI). A KPI is the result of comparing one value to another and typically represents the progress of a measure’s value over time toward a specified goal. Second, you must know how to define translations that display a model’s metadata by using the language associated with the locale of the client application.


Image Exam Tip

Be prepared to answer questions about the steps necessary to create KPIs or translations in either type of model and how to diagnose problems in MDX or DAX expressions used to define KPIs.


Create KPIs in multidimensional models and tabular models

Although the process to create KPIs differs between multidimensional and tabular models, conceptually KPIs in both models serve the same purpose. A common scenario is to define one measure such as a sales goal and then compare another measure, total sales, to that goal. In this section, you review how to create a KPI for this scenario in a tabular model. The next section describes how to create the same KPI in a multidimensional model.

Before you can create a KPI in a tabular model, you must create the measures to compare. To create the measure and the KPI, perform the following steps:

1. The tabular model already has total sales, so create one more measure in the Sale table for the sales goal, which is calculated by finding the prior year’s sales and adding 10% to that value, like this:

Sales Goal := CALCULATE([Total Sales], SAMEPERIODLASTYEAR('Date'[Date])) * 1.1

2. Set the new measure’s Format property to Decimal Number and the Show Thousand Separator property to True.

3. Next, right-click the Total Sales measure, and select Create KPI. In the Key Performance Indicator (KPI) dialog box, select Sales Goal in the Measure drop-down list, as shown in Figure 4-11, to designate it as the KPI target. As an alternative, you can specify an absolute value if the target is a constant value. SSAS computes the ratio of the base measure, Total Sales, to the target, Sales Goal, and then uses the result to determine the color to assign to the KPI. By default, the threshold for the color red is 40% and the threshold for the color yellow is 80%; any ratio above 80% is assigned the color green.

Image

FIGURE 4-11 Key performance indicator configuration in a tabular model

4. You can adjust the threshold boundaries by moving the sliders up or down the color bar or by typing a new ratio in the box above each slider. You can also change the threshold type by selecting one of the multicolor rectangles in the middle-right side of the dialog box. Currently, the selection is the rectangle containing the red-yellow-green color sequence, which indicates that low ratios are worse values displayed as a red icon and high ratios are better values displayed as a green icon. You can choose other color sequences when low ratios are better and high ratios are worse, or when mid-range ratios are to be considered best (or worst) and the ratios furthest from the mid-range are to be considered worst (or best).

5. At the bottom of the dialog box, you can select an icon style to represent the KPI in a client application. Some client applications might not support the icon you select, so be sure to test the KPI in the client application before finalizing your tabular model.

6. You can expand the Descriptions section of the dialog box to describe the KPI. This feature is useful when your client application supports the display of descriptions. Even if your client application does not support descriptions, consider adding a description to help other developers responsible for maintaining the model to understand its logic.

7. When finished, click OK to save the KPI. Test the results by clicking Analyze In Excel on the Model menu, and clicking OK in the dialog box that opens. In the PivotTable Fields list, select the Total Sales check box first and then select the Sales Goal check box.

8. Next, expand the KPIs folder, expand Total Sales, and select the Status check box. Because the Sales Goal measure is dependent on Date, drag the Calendar hierarchy from the PivotTable Fields list to the Rows pane below the list.

As you can see in Figure 4-12, the KPI icon changes color according to the ratio between Total Sales and Sales Goal for each year. For example, the goal, 59,390,639.50 for CY2016 is 110% of the CY2015 sales, 53,991,490.45, but total sales in CY2016 is 22,658,175.55, which is 38% of that year’s goal. Because this ratio is less than the 40% threshold defined for the KPI, the icon is red.

Image

FIGURE 4-12 Key performance indicator icons in an Excel pivot table

Configure KPI options, including Associated measure group, Value Expression, Goal Expression, Status, Status expression, Trend, Trend expression, and Weight

A multidimensional KPI uses expressions to define its value, its target, the status icon, and a trend icon. You can also create a parent-child KPI hierarchy when you want to use a weighted allocation for scorecard-style applications.

To create the same KPI in a multidimensional model, perform the following steps:

1. Open the cube designer in SSDT, click the KPI tab, and then click New KPI on the cube designer toolbar.

2. In the Name box, type a name for the KPI: Sales Goal.

3. In the Associated Measure Group drop-down list, select Sale to relate the KPI only to this measure group’s related dimensions.

4. In the Value Expression box, a common practice is to use an existing measure or to create a calculated measure and then reference the expression as the KPI value. However, you can create a more complex expression here without creating a calculated measure. Because you can display the value in a client application or request it in a query, there is no benefit to gain by creating a calculated measure first. For this example, type the following expression:

[Measures].[Sales Amount Without Tax]

5. In the Goal Expression box, you can again use a calculated measure or define the expression directly here. The goal is the target value to which the Value is compared for determining a status. The expression you use must resolve as a tuple. For this example, type the following expression:

([Measures].[Sales Amount Without Tax],
 PARALLELPERIOD([Invoice Date].[Calendar].[Calendar Year], 1) ) * 1.1

6. In the Status Indicator drop-down list, select Shapes. As you can see in the drop-down list, you have several options. However, you should test your selection in each client application that your users might use to determine whether it supports your selection.

7. The status expression determines which image the client application displays based on the image you use. In the case of Shapes, a red icon displays when the status expression returns a value of -1, a yellow icon displays when the value is 0, and a green icon displays when the value is 1. In general, your status expression must return a value somewhere between -1 and 1 and the status icon changes according to this value. In the Status Expression box, type the following expression to compare the KPI value to KPI goal thresholds of 40% and 80%:

CASE
WHEN KPIVALUE( "Sales Goal" ) <= KPIGOAL( "Sales Goal" ) * .4
   THEN -1
WHEN KPIVALUE( "Sales Goal" ) <= KPIGOAL( "Sales Goal" ) * .8
   THEN 0
ELSE 1
END


Note KPI functions

Notice the use of the KPIVALUE and KPIGOAL functions in this expression. These functions return the tuple for the underlying expressions for the Value and Goal expressions of a KPI. You can also use KPISTATUS, KPITREND, and KPIWEIGHT in expressions.


8. In the Trend Indicator drop-down list, you can select an icon to represent the direction of the KPI value’s trend over time. For this example, select Standard Arrow.

9. The trend expression determines which image the client application displays, if it supports the trend icon, for the KPI value at a point in time as compared to a prior period. Just as you do with the status expression, you must define an expression that returns a range from -1 to 1, where -1 is a downward trend and 1 is an upward trend. Due to the need to compare values over time, the expression is more complex than the one you use for the status. In the Trend Expression box, type the following expression:

CASE
WHEN IsEmpty(ParallelPeriod([Invoice Date].[Calendar].[Calendar Year],
      1, [Invoice Date].[Calendar]))
    THEN 0
WHEN [Measures].[Sales Amount Without Tax] >
    (ParallelPeriod([Invoice Date].[Calendar].[Calendar Year],
      1, [Invoice Date].[Calendar]), [Measures].[Sales Amount Without Tax])
    THEN 1
WHEN [Measures].[Sales Amount Without Tax] <
    (ParallelPeriod([Invoice Date].[Calendar].[Calendar Year],
      1, [Invoice Date].[Calendar]), [Measures].[Sales Amount Without Tax])
    THEN -1
  ELSE 0
END

10. Deploy the project to the server to add the KPI to the model, and then click the Browser View button in the cube designer toolbar. The KPI is visible, but no values display because the PARALLELPERIOD function in the status and trend expressions requires you to define a date context. In the Dimension drop-down list, select Invoice Date and select Calendar in the Hierarchy drop-down list. In the Filter drop-down list, expand All, and select CY2015. You can see the evaluation of the KPI values and associated icons, as shown in Figure 4-13.

Image

FIGURE 4-13 KPI browser in SSDT

Scorecard applications often use hierarchies of KPIs in which a parent KPI reflects an accumulation of metrics from its child KPIs. As an example, you might have a Sales category in a scorecard that groups the Sales Goal KPI with Profitability and Customer Growth KPIs. These KPIs are not additive; there is no meaningful way to add together their values and goals. However, you can assign a weight to each child KPI and then assign status and trend expressions to the parent KPI that compute a weighted value such that when the status and trend of the child KPIs are predominantly positive, the parent KPI status and trend results are also positive.

To set up a parent KPI, define the value, goal, status, and trend expressions as you see fit. The Value Expression is required, but you can use NULL to avoid the need to compute a value. However, if you do this, the Browser View in SSDT does not display the Status or Trend icons despite the validity of the expression. You can instead test the parent KPI in another application, such as Excel. At minimum, you should define a status expression similar to this:

(KPIStatus("Sales Goal") * KPIWeight("Sales Goal")) +
(KPIStatus("Profitability") * KPIWeight("Profitability")) +
(KPIStatus("Customer Growth") * KPIWeight("Customer Growth"))

Your last step is to expand the Additional Properties section of each child KPI. You can then associate it with its parent KPI and set a hard-coded value or an expression for the Weight. The sum of the weights for all child KPIs should be 1. Optionally, you can add a Current Time Member if you want to associate the child KPI with a specific date context, such as you might if you have a calculated member in the cube that sets the current date.

Create and develop translations

Translations in SSAS models do not provide automatic conversions from one language to another. Instead, they give you a way to supply alternate metadata for object names and descriptions.

Multidimensional translations

The development of translations in a multidimensional model is largely a manual process that you perform for each dimension and cube in the model.

To configure translations for a dimension, open the dimension designer, and click the Translations tab. Click the New Translation button and select a language from the list of available languages. In the dimension designer, which now includes a column for the selected language, type a caption for the dimension, each attribute name, each hierarchy, and each level.

If your relational source includes a column for a translated string, such as Spanish Day Of Week, as a companion to an English Day Of Week column, you can associate that column with the attribute. When a client application accesses the translated metadata for the cube, it can display the attribute members in the applicable translation column. To set this up on the Translations page, click the ellipsis button in the row containing an attribute object and then select the column containing the translated value in the Translation Columns list. Optionally, you can also configure collation and sort order properties.

You can also configure translations for cube metadata in the cube designer. Click the Translations tab on the cube designer, click New Translation, and select a language. Then, for each object, type the translated caption. You can do this for the cube, measure group, measures, calculated measures, cube dimensions, perspectives, KPIs, actions, named sets, and calculated members.


Note Multidimensional translations

You can see examples of defining translations in the dimension designer and cube designer and find instructions for testing translations in the cube browser in “Defining and Browsing Translations” at https://msdn.microsoft.com/en-us/library/ms166708.aspx.


Tabular translations

There is no interface in SSDT to define translations for a tabular model. Instead, you export a JavaScript Object Notation (JSON) file containing placeholders for translations, manually add in translated strings for each language that you want to support, and import the JSON back into the model.

To do this, point to Translations on the Model menu and click Manage Translations. Double-click the language for which you want to add translations to the model, select it in the Translations In The Model list, click Export Selected Languages, and save the JSON file. You can add multiple languages to a single file, but you might find it easier to manage translations in individual files.

Your next step is to edit the translations section of the JSON file. For each object in the model, you will find a pair of properties like this set of translation properties for the City table:

"name": "City",
          "translatedCaption": "",
          "translatedDescription": ""

At a minimum, provide a translated string inside the double quotes to the right of the translatedCaption property. The addition of a translation for translatedDescription is optional. When you have completed this task, return to SSDT, point to Translations on the Model menu, and click Import Translations. You can set options to indicate whether you want to overwrite existing translations in the model, ignore invalid objects, write results to a log, or back up existing translations to a JSON file. You can test the results by selecting a language in the Culture drop-down list when you use the Analyze In Excel feature to browse the model.


Note Tabular translations

For more information about defining translations for a tabular model and an example of JSON file containing translated strings, review “Translations in Tabular models (Analysis Services)” at https://msdn.microsoft.com/en-us/library/mt654273.aspx.


Chapter summary

Image You can configure memory limits for multidimensional and tabular models in the Server Properties dialog box that you open in SSMS or in the msmdsrv.ini file for the SSAS instance. Three memory limits—LowMemoryLimit, TotalMemoryLimit, and HardMemoryLimit—establish thresholds at which SSAS evaluates which objects to remove from memory to make room for new requests. Other limits are useful for managing memory allocations for building aggregations and processing partitions.

Image Both multidimensional and tabular models are NUMA-aware. You can configure server properties to define affinity masks by thread pool.

Image You can also use server properties to configure the disk layout for each server instance. Specifically, you can configure locations for database and metadata objects, log files, and temporary files.

Image Both multidimensional- and tabular-mode servers support a scale-out architecture. When a single server does not have adequate resources to support both querying and processing operations, you can create a distributed architecture and then use one of many methods to copy a database from a processing server to a query server.

Image You can use SQL Server Profiler to create traces that monitor events generated during MDX or DAX queries. A tabular server also supports the use of Extended Events to capture the same information available in the Profiler trace. These events provide clues that you can use to troubleshoot slow queries. You can obtain logical and physical query plans for DAX queries, but query plans are not available for MDX queries.

Image You can use trace events to determine whether the formula engine or the storage engine is a query bottleneck.

Image Other tools that you can use to monitor both query and processing operations include Performance Monitor, DMVs, and Task Manager. You can also use SQL Server Profiler to trace processing events.

Image You can use the information gathered from traces, performance counters, and query plans to identify problems in your environment or model design that impact query or processing performance. Based on this information, you might need to tune slow queries, optimize the model, or tune the CPU or memory configuration of your server.

Image In addition to managing performance, you can manage usability of your model by controlling how users interact with dimensions that have a large number of members. In a multidimensional model, you can create an artificial grouping as a level in a hierarchy to make it easier for users to browse an attribute with a large number of values. Another option is to use the discretization properties, which give control over the groupings to SSAS. You can also set the OLAP\Query\DefaultDrillthroughMaxRows server property to restrict the number of rows returned for a drillthrough operation.

Image When you find that query or processing operations are too slow, there are steps that you can take to optimize the model. In a multidimensional model, you can typically achieve performance gains by fine-tuning the structure of dimensions, designing more or better aggregations for the model, or breaking apart a fact table into multiple measure group partitions. In a tabular model, you can optimize the model by adjusting the structure of the relational source and removing or redefining objects in the model to reduce the overall memory footprint. You can also improve processing performance by creating partitions for large tables.

Image Before you process a partition, you must make many design decisions that affect how processing behaves, including the binding of the partition to a table or query, the storage mode, and the aggregation design, among other properties. When you launch processing for a partition, you configure the processing method, error configuration, and batch settings for transaction management.

Image Several properties in a dimension definition affect dimension processing. The ProcessingGroup property determines the type of SQL statement used to retrieve data for the dimension, the ProactiveCaching property controls whether SSAS manages data refreshes automatically, and the RelationshipType property on attributes affects how the Process Update operation responds to changed data. When you launch processing for a dimension, you configure the same options as for partitions: processing method, error configuration, and transaction batch settings.

Image There are different processing methods that you can apply to objects at different levels of the database hierarchy. You should be familiar with the effect of each of the following process methods: Process Default, Process Full, Process Clear, Process Data, Process Add, Process Update, Process Index, Process Structure, and Process Clear Structure.

Image When you configure processing settings, you can define parallel or sequential processing, establish one transaction for all processing jobs or use separate transactions for each job, specify whether processing should create a writeback table for a write-enabled partition, or instruct SSAS to process dependent objects. You can also configure error handling.
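
These options correspond to elements of the XMLA Batch command. The following minimal sketch fully processes one partition in parallel within a single transaction and tolerates up to 100 key errors; the object IDs are hypothetical:

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine" Transaction="true">
  <Parallel>
    <Process>
      <Object>
        <!-- Hypothetical object IDs -->
        <DatabaseID>WideWorldImportersDW</DatabaseID>
        <CubeID>Wide World Importers DW</CubeID>
        <MeasureGroupID>Sale</MeasureGroupID>
        <PartitionID>Sale 2016</PartitionID>
      </Object>
      <Type>ProcessFull</Type>
      <ErrorConfiguration>
        <KeyErrorLimit>100</KeyErrorLimit>
        <KeyNotFound>ReportAndContinue</KeyNotFound>
      </ErrorConfiguration>
    </Process>
  </Parallel>
</Batch>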

Image Both multidimensional and tabular models support the definition of KPIs to compare a specified measure value to a target. Tabular KPIs are defined by using a graphical interface to select thresholds and an icon style for the status, which measures progress toward the target. Multidimensional KPIs use MDX expressions to define the value, target, and status. Additionally, you can define a trend expression to compare the current value to a prior point in time. Last, you can establish a KPI hierarchy with weighted allocations to roll up child KPI status and trend values to a parent KPI.
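
For example, a multidimensional status expression typically compares the KPI value to its goal and returns 1, 0, or -1 to drive the status graphic. The following sketch assumes a hypothetical KPI named Revenue and an illustrative 80 percent threshold:

// Hypothetical KPI name and illustrative thresholds
Case
    When KpiValue("Revenue") / KpiGoal("Revenue") >= 1 Then 1
    When KpiValue("Revenue") / KpiGoal("Revenue") >= 0.8 Then 0
    Else -1
End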

Image Both multidimensional and tabular models support the addition of translation metadata to allow users in different locales to view metadata in their local language. In a multidimensional model, you type in the translations for captions of various objects, such as dimensions, attributes, measures, and so on. In a tabular model, you export a JSON file containing a set of model objects with placeholders for translations. After you update this file, you import it back into the model to make the translations available during browsing.
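
The exported file uses the TMSL cultures format. The fragment below is only a sketch of its general shape, with a hypothetical culture and table name; the exact set of objects depends on your model and compatibility level.

{
  "cultures": [
    {
      "name": "fr-FR",
      "translations": {
        "model": {
          "name": "Model",
          "tables": [
            {
              "name": "Customer",
              "translatedCaption": "Client"
            }
          ]
        }
      }
    }
  ]
}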

Thought experiment

In this thought experiment, demonstrate your skills and knowledge of the topics covered in this chapter. You can find the answers to this thought experiment in the next section.

You are the BI developer at Wide World Importers and have inherited a multidimensional model and a tabular model that are partially developed. You have been tasked with fixing some issues identified during user acceptance testing and preparing the models for deployment to production. The following issues and requirements have been identified:

Image The users have complained that the list of customers is too long when they add the Customer attribute from the multidimensional model to an Excel PivotTable. They would like to see groupings of the customers first by state, and then by customer name.

Image Sales staff need to know the current quantity on hand for stock items with as little data latency as possible, but they can wait 24 hours to see sales data in the multidimensional model. The processing time should be as fast as possible.

Image When you process the cube every 24 hours at 2 AM Greenwich Mean Time (GMT), users in that time zone complain they cannot access the cube.

Image One of the reports that queries the tabular model is taking several minutes to execute.

Image The Process Full operation on the tabular model is running out of memory before it can complete.

Based on this background information, answer the following questions:

1. What steps can you take to enhance the Customer dimension to meet the users’ requirements?

2. Recommend the most efficient partitioning and processing strategy for the multidimensional model.

3. How can you configure processing to enable users in the 2 AM GMT time zone to access the data as soon as possible after the data is refreshed while using a single server to support both processing and querying?

4. Which monitoring tool(s) do you use to troubleshoot the report performance?

5. Which monitoring tool(s) do you use to troubleshoot the tabular processing memory issue?

Thought experiment answer

This section contains the solution to the thought experiment.

1. You need to create an artificial grouping for customers. If you use the DiscretizationMethod and DiscretizationBucketCount properties for the Customer attribute, you cannot group the customers by state. Instead, you must create a view in the source database that adds a derived column that uses an expression to extract the state from the Customer column, update the data source view in the SSDT project, and then add the new column as an attribute to the Customer dimension. You can then create a hierarchy that includes the grouping attribute at the top level and the Customer attribute at the bottom level.

2. You should keep the sales data in multiple MOLAP partitions and process only the most current partition every 24 hours to minimize the processing time required. The stock holding measure group should be configured as a ROLAP partition to provide near real-time data. Configure proactive caching to calculate aggregations automatically in the background.

3. Set each partition’s ProcessingMode property to LazyAggregations to allow users to query the cube as soon as a partition has completed processing. Data will be current, although queries will run more slowly until the aggregations have been built.

4. The first tool to use when troubleshooting query performance for a tabular model is a SQL Server Profiler or Extended Events trace so that you can analyze the logical and physical query plans for clues that can help you identify the root cause. Performance counters and DMVs are not helpful for query troubleshooting in general, although a spike in memory usage typically indicates a materialization issue.

5. Use the Performance Monitor to capture performance counters related to memory usage. Separate the processing steps into Process Data and Process Recalc to identify which operation is causing the out-of-memory problem. You can then use DMVs to identify objects consuming a large amount of memory.
