Chapter 8. Optimization and performance tuning 243
Number of users increases
The size of the data and aggregate caches is related to the dimensional space queried and to
user behavior. If the number of users increases, it might be necessary to account for users
who access data that is unique to their session. More of the dimensional space might be
queried, or the nature of the reports might require processing different amounts of data.
Recompute the data cache size based on the updated number of named users in the
user factor.
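As a rough illustration of such a recomputation, the sketch below scales a base cache size by a per-user allowance for session-unique data. The function name, base size, and per-user factor are assumptions made for the example, not documented Cognos sizing values:

```python
# Hypothetical sketch of recomputing a data cache size when the
# named-user count grows. The base size and per-user allowance are
# illustrative assumptions, not Cognos-documented values.

def recompute_data_cache_mb(base_cache_mb, named_users, per_user_mb):
    """Scale the data cache by a per-user allowance for session-unique data."""
    return base_cache_mb + named_users * per_user_mb

# Example: the user population grows from 100 to 150 named users,
# with an assumed 4 MB allowance per user.
old_size = recompute_data_cache_mb(1024, 100, 4)  # 1424 MB
new_size = recompute_data_cache_mb(1024, 150, 4)  # 1624 MB
```

The point is only that the user-dependent portion of the cache should be revisited whenever the named-user count changes.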
Nature of reports and usage changes
In most cases, the size of the data cache and the aggregate cache can be determined
relative to the size of the member cache. The data cache and aggregate cache depend on
the number of rows in the fact table when, relative to the dimensional space of the cube,
there are many rows: many rows mean a dense fact table, which requires more space to
store non-null values.
Because the data that users are interested in, or most likely to query, can change over time,
the data cache retains data until it is nearly full and then discards the data that is least
accessed. If the nature of reports changes in such a way that data that would have been
cached is discarded to make room for new data cache values, consider increasing the data
cache so that it can accommodate and retain more values.
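The eviction behavior described above is essentially a least-recently-used policy. A minimal sketch of that general policy (an illustration only, not the product's internal implementation) might look like:

```python
from collections import OrderedDict

# Minimal LRU cache illustrating "discard the least-accessed data when
# nearly full" -- an illustration of the policy, not Cognos internals.
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)         # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("q1", 10)
cache.put("q2", 20)
cache.get("q1")          # q1 is now the most recently used entry
cache.put("q3", 30)      # evicts q2, the least recently used
```

Under this policy, a shift in what users query causes older values to be pushed out, which is why a larger cache can help when report patterns change.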
Presence of the aggregate cache
The data cache size can be reduced by the presence of an aggregate cache, because the
aggregate cache may retain many values that would otherwise be stored in the data cache.
If the aggregate cache contains aggregates that are recommended from analyzing query
workload information, it should hold more data that is relevant to the users. Consider
recomputing the data cache size for memory efficiency, accounting for the presence of the
aggregate cache.
Virtual cubes are directly accessed but base cubes are not
If a cube exists solely for use within a virtual cube, and it is not accessible from any reporting
packages, assigning a data cache to the cube might not be necessary, because data is
cached in the virtual cube that is being accessed. It is important to cache data only for the
cubes that are accessible by users. Any cube, base or virtual, that is not accessible through
a reporting package might have little need for a data cache.
If a virtual cube is built to combine historical and recent data in a single cube, assigning a
data cache to the base cubes might make sense, because doing so ensures fast query
performance while the recent-data cube is being updated.
Virtual cubes do not have aggregate caches. Consider retaining the aggregate cache for the
base cubes so that it can satisfy queries that are ultimately issued to the underlying base
cube.
8.8.2 Cases for aggregates
This section describes how aggregates can enhance data warehouse and query
performance.
OLAP-style reports that show data aggregated at levels higher than the
grain of the fact
The expectation is that a vast majority of OLAP-style reports and analyses that run against
Cognos Dynamic Cubes show data at high levels of hierarchies, for example, a report on
Gross Profit by Retailers Region and Products Product Line. Use of aggregates can
significantly improve query performance by providing data aggregated at levels higher than
the grain of the fact.
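To illustrate the idea with invented data (this is not Cognos code), pre-aggregating a fact table to a higher grain means that a report query at that grain becomes a single lookup instead of a scan of every fact row:

```python
from collections import defaultdict

# Invented fact rows at the transaction grain:
# (region, product_line, gross_profit)
fact = [
    ("North", "Camping", 120.0),
    ("North", "Camping", 80.0),
    ("North", "Golf", 200.0),
    ("South", "Camping", 50.0),
    ("South", "Golf", 75.0),
]

# Build an aggregate once at the (region, product_line) level...
aggregate = defaultdict(float)
for region, line, profit in fact:
    aggregate[(region, line)] += profit

# ...so a query at that grain no longer touches the fact rows at all.
print(aggregate[("North", "Camping")])  # 200.0
```

A real fact table has millions of rows per aggregate cell, which is why serving such queries from an aggregate, whether in memory or in the database, cuts query times so sharply.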
By using both in-memory aggregates and database aggregates, data warehouses of all sizes,
but especially medium, large, and extra large warehouses, can benefit from the greatly
decreased query times.
The expectation is that most data warehouses already have database aggregates in some
form: either pre-existing aggregates for use by other applications, or aggregates added by the
DBA to accelerate certain queries. The process to identify relevant and useful database
aggregates can be difficult and time-consuming. For warehouses that do not yet have
aggregates, or where you want to supplement existing database aggregates with in-memory
and other in-database aggregates, run the Aggregate Advisor to get recommendations. The
DBA has the flexibility to choose which of the recommended database aggregates to
implement, and can rely on the Aggregate Advisor output to confirm that these aggregates
are relevant. Also, after
the modeler incorporates the aggregate cube information into the model, Cognos Dynamic
Cubes can guarantee routing to these database aggregates.
The aggregate cache and the in-memory aggregates that it holds should be enabled, when
possible. In-memory aggregates are meant to be a turn-key aggregate management solution.
Recommendations that are provided by the Aggregate Advisor can be easily saved and
loaded the next time the cube starts.
Small data warehouses might achieve improved query performance with only the in-memory
aggregate cache, if queries to the underlying database perform sufficiently well.
Improving overall query performance
The intent is to improve the overall query performance across queries in a normal workload
against Cognos Dynamic Cubes. Use of aggregates can improve performance for a large
majority of user queries. However, not all queries need to be covered by aggregates. There
might be some outlier queries that do not hit the aggregate cache or route to a database
aggregate, either because no aggregate can satisfy the query or because it is not run often
enough to justify having an aggregate to cover it.
In addition to the aggregate cache and database aggregate routing, the result set cache,
expression cache, data cache, query pushdown, and other features of Cognos Dynamic
Cubes also help to provide the fast query response times users expect.
For new cube deployments, where a workload of queries is not yet known, run the
Aggregate Advisor using only its cube model-based analysis logic to get an initial set of
recommendations to apply to the cube data source. Then, after a known or sample workload
is determined and captured, run the Aggregate Advisor again, this time set to consider
workload information, to get a new set of recommendations that are more relevant to actual
user queries.
Over time, if overall query performance becomes less than satisfactory, it might be because
the workload characteristics have since changed. Recapture the workload information and
rerun the Aggregate Advisor. See 8.3.6, “Rerunning the Aggregate Advisor” on page 220 for
more scenarios about when to rerun the Aggregate Advisor.