Monitor and Report on Table Sizes

Tables in an OnLine database seem to take on a life of their own, especially in a heavily used system with many developers. It is a twofold problem. First, there's the problem of proliferation of tables. This will be addressed later in a table warehouse program.

The second part of the space problem concerns the growth of existing tables. In other sections, we have seen various programs for performing data archiving and table cleaning, but to do any of these, the DBA needs to know where the problem lies. We need a way to track the size and space utilization of tables.

At first glance, this seems to be an easy task. Just write a report that looks at the system catalogs. There are a couple of problems to this, the main one being that the data in systables is not always accurate. The system catalogue systables has a field called npdata that nominally contains the number of pages used by the table. However, this table is updated only when the SQL statement UPDATE STATISTICS is run. On top of that, even when update statistics is run, the number in npdata is not always accurate. It seems to be an approximation.

There are only two reliable ways to get the accurate data that actually represents the real utilization of the disk. This involves using the OnLine utilities tbstat or tbcheck.

The tbstat -t output contains a column called npages. This is an accurate number and represents the actual number of pages on the disk that are currently allocated to the table and to all of its indexes. Some of these pages may be empty, but it doesn't matter. The table still has these pages allocated. The data about a table only shows up in tbstat -t if the table is in use when the utility is run. One way to do this is to do an isql retrieve from the menu on one terminal while running a tbstat on another. This is fine for one time checks but is not practical for compiling reports.

Another source of accurate data is tbcheck -pt. The output from this utility has a row called Number of pages allocated that gives an accurate page count. However, this utility has two drawbacks. The first is that it places a lock on the target table and prevents users from updating the table. This is even more disruptive because the report for a table of significant size can take a while to complete, keeping it locked for a longer period of time. This can be overcome, but not without manual intervention. The information that you want appears quickly after the utility is started. As soon as you see the data you want, you can break out of the utility by pressing CONTROL-C. You will not keep the table locked for long. The main problem, however, is the same as with the tbstat command. It doesn't lend itself well to creating reports.

The solution for running reports lies in the tbcheck -pe utility. This utility does not lock tables, so it can be run at any time without disrupting access to the database. It does have a few drawbacks, but they can be handled. The first drawback is that the output from tbcheck -pe uses a significant amount of space. The second drawback is that the data is arranged in the form of an extent-by-extent report and does not provide summary data for tables. If you've read this far, you can probably expect that the next thing we see will be a script that massages the data.

Actually, we'll be doing something a little more complicated here. Since the extent data can be useful, we'll create our own system table and keep the extent data there. First, create a database called dba (or whatever you want to call it, or even keep it in rootdbs if you insist). Place the following table extent_sizes, in the database:

create table extent_sizes
{database char(8),
owner char(8),
tabname char(20) not null,
extents decimal(7,1) ,
dbspace char (12) );
revoke all on extent_sizes from "public";
create index ext_tabname on extent_sizes (tabname);

The program that runs the tbcheck, massages the data, and then loads it back into the extent_sizes table is called last_table_sizes.

INFORMIX 92> cat last_table_sizes
$INFORMIXDIR/bin/tbcheck -pe  >! $INFORMIXDIR/last_tbcheckpe
cat $INFORMIXDIR/last_tbcheckpe | /bin/grep : | /bin/grep -v Chunk:  | /bin/grep -v WARNING |
/bin/grep -v SAM | /bin/awk '
BEGIN     {FS = " "} 
/DBSpace/ {dbspace = $4;next} 
{printf("%s %s
",$0,dbspace)} ' 
| awk '{print $1,$3,$4}' | /usr/bin/tr ":" "|" | /usr/bin/tr "." "|" | tr -s " " " " | tr " "
"|" > $INFORMIXDIR/tempsizes
$INFORMIXDIR/bin/isql dba << EOF
DELETE FROM dba:extent_sizes;
LOAD FROM "tempsizes" INSERT INTO dba:extent_sizes;
SELECT database, owner[1,3],tabname,dbspace, count(*) as extents, sum(extents) as pages FROM
extent_sizes GROUP BY database,owner[1,3],tabname,dbspace ORDER BY pages desc;
EOF
/bin/rm $INFORMIXDIR/tempsizes

A few notes about the last_table_sizes script are in order. The second section beginning with cat $INFORMIXDIR/last_tbcheckpe is one long pipeline. Where it hits an end-of-line, use the backslash () to continue the line. None of the punctuation here is a backtick (`). This script uses just single and double quotes.

The final section is an isq1 command that first cleans out all of the rows of extent_sizes and then repopulates the table using a load from the destination of the long pipeline above. The report goes to standard output. If you wish, you can put the script in your cron file to run every night. If so, use a command like this.

$INFORMIXLOCALBIN/last_table_sizes > $INFORMIXDIR/last_table_sizes &

The resulting report looks like this.

database owner tabname         dbspace          extents            pages
admin    dba   loader_data     slowdbspace           17         210795.0
admin    dba   vmx_config      dbspace1              12         135349.0
admin    dba   person_addr     dbspace1              28         127767.0
admin    dba   mem_grp_code    dbspace1              14         82542.0
admin    dba   ticket_desc     dbspace1              41         71839.0
admin    dba   access_pre      dbspace1               6         67556.0

If you wish, you can then delete the last_tbcheckpe file, but if you can afford the space, leave it around. This file, if updated daily, can provide much valuable information about the general layout of your system. You can get valuable space utilization data as well as table fragmentation information by looking at the file.

Another good thing to do in your cron file is to move the last_table_sizes file into an archive directory before creating the new one. You can then do a UNIX zip or a compress on the old file if you wish. Keeping the last_table_sizes data around for a while can let you track and report on table and extent growth over a period of time. This sort of data is most helpful when you are trying to forecast future database needs. I keep the data around for a year and have written INFORMIX-WingZ programs to extract the data and draw impressive charts and graphs of table size growth. This is great when you're going to management requesting more hardware!

While the last_table_sizes report gives you useful information in terms of absolute number of extents and number of pages, it is often good to get this information in percentages. This lets you answer questions such as "What percentage of my database does table XXX occupy?" Since we have now gone to the trouble of building and populating a table with extent sizes in it, and updating it nightly, why not get a little more mileage out of the data?

The last_percentage script is very similar to last_table_sizes, with the exception that this scripts that the extent_sizes table is already populated.

INFORMIX 99> cat last_percentage
$INFORMIXDIR/bin/isq1 dba << EOF
SELECT database, owner, tabname,dbspace[1,8], count(*)as extents ,sum(extents)/26165 as
percent
FROM dba: extent_sizes
GROUP BY database, owner, tabname,dbspace[1,8]
ORDER BY percent desc
EOF

This script gives you the following report:

database owner    tabname          dbspace      extents          percent

admin    dba      loader_data      slowdbsp          17             8.06
admin    dba      vmx_config       dbspace1          12             5.17
admin    dba      person_addr      dbspace1          28             4.88
admin    dba      mem_grp_code     dbspace1          14             3.15
admin    dba      ticket_desc      dbspace1          41             2.75

You will need to make one change in the script to customize it to your environment, or change the script to handle this automatically. The SELECT statement divides the sum (extents) by 26,165 in my sample. Your number will be different. Do a tbstat -d and look at the chunk data. Add up all of the size columns for chunks that you want to consider as the total available. Divide this number by 100 and use the result instead of 26,165. You may want to exclude your rootdbs and logfile chunks as not being available for use. You may also want to play a little with the figures to give you some "fluff factor." Users are going to scarf up any amount of space you have. You might want to keep a little bit hidden.

Sometimes, you want information similar to that from the table script, only with information about the actual space utilization. The following tabuse script simply runs the UNIX grep command against the last_tbcheckpe file:

INFORMIX 43> cat tabuse
#
#tabuse
echo " "
echo "ACTUAL DISK USAGE REPORT FOR TABLES IN ALL DATABASES"
echo This data is as of: 'ls -la $INFORMIXDIR/last_table_sizes | cut -c 32-44' based on the
$INFORMIXDIR/last_table_size file
echo " 
database owner tabname             dbspace           extents             2K pages"
echo "name                                              used                used"
echo  "=========================================================================="
echo " "
grep $1 $INFORMIXDIR/last_table_sizes

The spacing of the echo statements may need to be adjusted if you are typing the program into your system. Just put in or take out some spaces to make it line up correctly on your terminal. Depending on your UNIX version, you may also need to alter the cut statement. Adjust the columns until the cut returns only the date.

This script does not provide real-time data. If you have your last_table_sizes script running in cron, tell your users to look at the top line of the tabuse output to see when the last_tbcheckpe file was generated.

Since we're using a grep statement, you can invoke the script by typing the command tabuse any_fragment where any_fragment is any text that can be found on a line in the last_tbcheckpe file. Thus, you could query by table name, table name fragment, database, owner, or dbspace.

This can be a useful utility. It's an example of a general method of doing things that would otherwise be very difficult or take a lot of time. In cases where real-time data is not absolutely critical, generate a work file for the data during a slow period and query the work file during heavy use periods.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset