Chapter 8. Optimization and performance tuning 243
Number of users increases
The size of the data and aggregate caches is related to the dimensional space queried and to
user behavior. If the number of users increases, it might be necessary to account for users
who access data that is unique to their session. More of the dimensional space might be
queried, or the nature of the reports might require processing different amounts of data.
Recompute the data cache size based on the updated number of named users in the
user factor.
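As a rough illustration of such a recomputation, the sketch below scales a base cache size by a per-user allowance for session-unique data. The function name, base size, and per-user factor are assumptions made for the example, not documented Cognos sizing values:

```python
# Hypothetical sketch of recomputing a data cache size when the
# named-user count grows. The base size and per-user allowance are
# illustrative assumptions, not Cognos-documented values.

def recompute_data_cache_mb(base_cache_mb, named_users, per_user_mb):
    """Scale the data cache by a per-user allowance for session-unique data."""
    return base_cache_mb + named_users * per_user_mb

# Example: the user population grows from 100 to 150 named users,
# with an assumed 4 MB allowance per user.
old_size = recompute_data_cache_mb(1024, 100, 4)  # 1424 MB
new_size = recompute_data_cache_mb(1024, 150, 4)  # 1624 MB
```

The point is only that the user-dependent portion of the cache should be revisited whenever the named-user count changes.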
Nature of reports and usage changes
In most cases, the size of the data cache and the aggregate cache can be determined
relative to the size of the member cache. The data cache and aggregate cache depend on
the number of rows in the fact table when, relative to the dimensional space of the cube,
there are many rows: many rows mean a dense fact table, which requires more space to
store non-null values.
Because the data that users are interested in, or most likely to query, can change over time,
the data cache retains data until it is nearly full and then discards the data that is least
accessed. If the nature of reports changes in such a way that data that would have been
cached is discarded to make room for new data cache values, consider increasing the data
cache so that it can accommodate and retain more values.
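The eviction behavior described above is essentially a least-recently-used policy. A minimal sketch of that general policy (an illustration only, not the product's internal implementation) might look like:

```python
from collections import OrderedDict

# Minimal LRU cache illustrating "discard the least-accessed data when
# nearly full" -- an illustration of the policy, not Cognos internals.
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)         # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("q1", 10)
cache.put("q2", 20)
cache.get("q1")          # q1 is now the most recently used entry
cache.put("q3", 30)      # evicts q2, the least recently used
```

Under this policy, a shift in what users query causes older values to be pushed out, which is why a larger cache can help when report patterns change.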
Presence of the aggregate cache
The data cache size can be reduced by the presence of an aggregate cache, because the
aggregate cache may retain many values that would otherwise be stored in the data cache.
If the aggregate cache contains aggregates that are recommended from analyzing query
workload information, it should hold more data that is relevant to the users. Consider
recomputing the data cache size for memory efficiency, accounting for the presence of the
aggregate cache.
Virtual cubes are directly accessed but base cubes are not
If a cube exists solely for use within a virtual cube, and it is not accessible from any reporting
packages, assigning a data cache to the cube might not be necessary, because data is
cached in the virtual cube that is being accessed. It is important to cache data only for the
cubes that are accessible by users. Any cube, base or virtual, that is not accessible through
a reporting package might have little need for a data cache.
If a virtual cube is built to combine historical and recent data in a single cube, assigning a
data cache to the base cubes might make sense, because doing so ensures fast query
performance while the recent-data cube is being updated.
Virtual cubes do not have aggregate caches. Consider retaining the aggregate cache for the
base cubes so that it can satisfy queries that are ultimately issued to the underlying base
cube.
8.8.2 Cases for aggregates
This section describes how aggregates can enhance data warehouse and query
performance.
OLAP-style reports that show data aggregated at levels higher than the
grain of the fact
The expectation is that a vast majority of OLAP-style reports and analyses that run against
Cognos Dynamic Cubes show data at high levels of hierarchies, for example, a report on
Gross Profit by Retailers Region and Products Product Line. Use of aggregates can
significantly improve query performance by providing data aggregated at levels higher than
the grain of the fact.
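To illustrate the idea with invented data (this is not Cognos code), pre-aggregating a fact table to a higher grain means that a report query at that grain becomes a single lookup instead of a scan of every fact row:

```python
from collections import defaultdict

# Invented fact rows at the transaction grain:
# (region, product_line, gross_profit)
fact = [
    ("North", "Camping", 120.0),
    ("North", "Camping", 80.0),
    ("North", "Golf", 200.0),
    ("South", "Camping", 50.0),
    ("South", "Golf", 75.0),
]

# Build an aggregate once at the (region, product_line) level...
aggregate = defaultdict(float)
for region, line, profit in fact:
    aggregate[(region, line)] += profit

# ...so a query at that grain no longer touches the fact rows at all.
print(aggregate[("North", "Camping")])  # 200.0
```

A real fact table has millions of rows per aggregate cell, which is why serving such queries from an aggregate, whether in memory or in the database, cuts query times so sharply.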
By using both in-memory aggregates and database aggregates, data warehouses of all sizes,
but especially medium, large, and extra large warehouses, can benefit from the greatly
decreased query times.
The expectation is that most data warehouses already have database aggregates in some
form: either pre-existing aggregates for use by other applications, or aggregates added by the
DBA to accelerate certain queries. The process to identify relevant and useful database
aggregates can be difficult and time-consuming. For warehouses that do not yet have
aggregates, or where you want to supplement existing database aggregates with in-memory
and other in-database aggregates, run the Aggregate Advisor to get recommendations. The
DBA has the flexibility to choose which of the recommended database aggregates to
implement, and can rely on the Aggregate Advisor output to confirm that these aggregates
are relevant. Also, after
the modeler incorporates the aggregate cube information into the model, Cognos Dynamic
Cubes can guarantee routing to these database aggregates.
The aggregate cache and the in-memory aggregates that it holds should be enabled, when
possible. In-memory aggregates are meant to be a turn-key aggregate management solution.
Recommendations that are provided by the Aggregate Advisor can be easily saved and
loaded the next time the cube starts.
Small data warehouses might achieve improved query performance with only the in-memory
aggregate cache, if queries to the underlying database perform sufficiently well.
Improving overall query performance
The intent is to improve the overall query performance across queries in a normal workload
against Cognos Dynamic Cubes. Use of aggregates can improve performance for a large
majority of user queries. However, not all queries need to be covered by aggregates. There
might be some outlier queries that do not hit the aggregate cache or route to a database
aggregate, either because no aggregate can satisfy the query or because it is not run often
enough to justify having an aggregate to cover it.
In addition to the aggregate cache and database aggregate routing, the result set cache,
expression cache, data cache, query pushdown, and other features of Cognos Dynamic
Cubes also help to provide the fast query response times users expect.
For new cube deployments, where a workload of queries is not yet known, run the
Aggregate Advisor using only its cube model-based analysis logic to get an initial set of
recommendations to apply to the cube data source. Then, after a known or sample workload
is determined and captured, run the Aggregate Advisor again, this time set to consider
workload information, to get a new set of recommendations that are more relevant to actual
user queries.
Over time, if overall query performance becomes less than satisfactory, it might be because
the workload characteristics have since changed. Recapture the workload information and
rerun the Aggregate Advisor. See 8.3.6, “Rerunning the Aggregate Advisor” on page 220 for
more scenarios about when to rerun the Aggregate Advisor.