CHAPTER 8

Investigating Undo

In the previous couple of chapters we discussed several undo segment topics. We’ve seen how they are used during recovery, how they interact with the redo logs, and how they are used for consistent, nonblocking reads of data. In this chapter, we’ll look at the most frequently raised issues with undo segments.

In this chapter the bulk of our time will be spent on the infamous ORA-01555: snapshot too old error, as this single issue causes more confusion than almost any other database topic. Before we do this, however, we’ll investigate one other undo-related issue: the question of which type of DML operation generates the most and least undo (you might already be able to answer that yourself, given the examples in the preceding chapter with temporary tables).

What Generates the Most and Least Undo?

This is a frequently asked but easily answered question. The presence of indexes (or the fact that a table is an index-organized table) may affect the amount of undo generated dramatically, as indexes are complex data structures and may generate copious amounts of undo information.

That said, an INSERT will, in general, generate the least amount of undo, since all Oracle needs to record is a rowid to “delete.” An UPDATE typically comes second: all that needs to be recorded are the changed bytes. You most commonly UPDATE some small fraction of the entire row’s data, so only a small fraction of the row must be remembered in the undo. Many of the previous examples run counter to this rule of thumb, but that’s because they update large, fixed-sized rows and they update the entire row. It is much more common to UPDATE a row and change a small percentage of it. A DELETE will, in general, generate the most undo: for a DELETE, Oracle must record the entire row’s before image in the undo segment. The previous temporary table example (in Chapter 7), with regard to redo generation, demonstrated that fact: the DELETE generated the most redo, and since the only logged element of a DML operation on a temporary table is the undo, we in fact observed that the DELETE generated the most undo. The INSERT generated very little undo that needed to be logged. The UPDATE generated an amount equal to the before image of the data that was changed, and the DELETE wrote the entire set of data to the undo segment.
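
The relative costs just described are easy to verify for yourself. The following is a sketch (not a captured session; it assumes a scratch copy of ALL_OBJECTS and the privilege to query V$TRANSACTION) that compares undo consumption for each DML type, rolling back between tests:

```sql
-- A sketch: compare undo used by INSERT, UPDATE, and DELETE.
-- Assumes a scratch table T and access to V$TRANSACTION.
create table t as select * from all_objects;

insert into t select * from t where rownum <= 1000;
select used_ublk, used_urec from v$transaction;  -- generally the smallest
rollback;

update t set object_name = lower(object_name) where rownum <= 1000;
select used_ublk, used_urec from v$transaction;  -- generally in between
rollback;

delete from t where rownum <= 1000;
select used_ublk, used_urec from v$transaction;  -- generally the largest
rollback;
```

This simple V$TRANSACTION query assumes yours is the only active transaction; on a busy system you would restrict it to your own session, as the query later in this chapter does.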

As previously mentioned, you must also take into consideration the work performed on an index. You’ll find that an update of an unindexed column not only executes much faster, it also tends to generate significantly less undo than an update of an indexed column. For example, we’ll create a table with two columns, both containing the same information, and index one of them:

EODA@ORA12CR1> create table t
  2  as
  3  select object_name unindexed,
  4         object_name indexed
  5    from all_objects
  6  /
Table created.

EODA@ORA12CR1> create index t_idx on t(indexed);
Index created.

EODA@ORA12CR1> exec dbms_stats.gather_table_stats(user,'T');
PL/SQL procedure successfully completed.

Now we’ll update the table, first updating the unindexed column and then the indexed column. We’ll need a new V$ query to measure the amount of undo we’ve generated in each case. The following query accomplishes this for us. It works by getting our session ID (SID) from V$MYSTAT, using that to find our record in the V$SESSION view, and retrieving the transaction address (TADDR). It uses the TADDR to pull up our V$TRANSACTION record (if any) and selects the USED_UBLK column—the number of used undo blocks. Since we currently are not in a transaction, we expect it to return zero rows right now:

EODA@ORA12CR1> select used_ublk
  2     from v$transaction
  3     where addr = (select taddr
  4                   from v$session
  5                   where sid = (select sid
  6                                from v$mystat
  7                                where rownum = 1
  8                           )
  9                  )
 10  /

no rows selected

But the query will return a row after the UPDATE starts a transaction:

EODA@ORA12CR1> update t set unindexed = lower(unindexed);
72077 rows updated.

EODA@ORA12CR1> select used_ublk
  2    from v$transaction
  3   where addr = (select taddr
  4                   from v$session
  5                  where sid = (select sid
  6                                 from v$mystat
  7                                where rownum = 1
  8                              )
  9                )
 10  /

 USED_UBLK
----------
      151

EODA@ORA12CR1> commit;
Commit complete.

That UPDATE used 151 blocks to store its undo. The commit would free that up, or release it, so if we rerun the query against V$TRANSACTION, it would once again show us no rows selected. When we update the same data—only the indexed column this time—we’ll observe the following:

EODA@ORA12CR1> update t set indexed = lower(indexed);
72077 rows updated.

EODA@ORA12CR1> select used_ublk
  2    from v$transaction
  3   where addr = (select taddr
  4                   from v$session
  5                  where sid = (select sid
  6                                 from v$mystat
  7                                where rownum = 1
  8                              )
  9                )
 10  /

 USED_UBLK
----------
      854

As you can see, updating the indexed column in this example generated several times as much undo. This is due to the inherent complexity of the index structure itself and the fact that we updated every single row in the table—moving every single index key value in this structure.

ORA-01555: Snapshot Too Old Error

In Chapter 5, we briefly investigated the ORA-01555 error and looked at one cause of it: committing too frequently. Here we’ll take a much more detailed look at the causes and solutions for the ORA-01555 error. ORA-01555 is one of those errors that confound people. It is the foundation for many myths, inaccuracies, and suppositions.

Note  ORA-01555 is not related to data corruption or data loss at all. It is a “safe” error in that regard; the only outcome is that the query that received this error is unable to continue processing.

The error is actually straightforward and has only two real causes, but since there’s a special case of one of them that happens so frequently, I’ll say that there are three:

  • The undo segments are too small for the work you perform on your system.
  • Your programs fetch across COMMITs (actually a variation on the preceding point). We covered this in Chapter 5.
  • Block cleanout.

The first two points are directly related to Oracle’s read-consistency model. As you recall from Chapter 4, the results of your query are preordained, meaning they are well-defined before Oracle goes to retrieve even the first row. Oracle provides this consistent point in time “snapshot” of the database by using the undo segments to roll back blocks that have changed since your query began. Every statement you execute, such as the following:

update t set x = 5 where x = 2;
insert into t select * from t where x = 2;
delete from t where x = 2;
select * from t where x = 2;

will see a read-consistent view of T and the set of rows where X=2, regardless of any other concurrent activity in the database.

Note  The four statements presented here are just examples of the types of statements that would see a read-consistent view of T. They are not meant to be run as a single transaction in the database, as the first update would cause the following three statements to see no records. They are purely illustrative.

All statements that “read” the table take advantage of this read consistency. In the example just shown, the UPDATE reads the table to find rows where X=2 (and then UPDATEs them). The INSERT reads the table to find rows where X=2, and then INSERTs them, and so on. It is this dual use of the undo segments, both to roll back failed transactions and to provide for read consistency, that results in the ORA-01555 error.

The third item in the previous list is a more insidious cause of ORA-01555 in that it can happen in a database where there is a single session, and this session is not modifying the tables that are being queried when the ORA-01555 error is raised! This doesn’t seem possible—why would we need undo data for a table we can guarantee is not being modified? We’ll find out shortly.

Before we take a look at all three cases with illustrations, I’d like to share with you the solutions to the ORA-01555 error, in general:

  • Set the parameter UNDO_RETENTION properly (larger than the amount of time it takes to execute your longest-running transaction). V$UNDOSTAT can be used to determine the duration of your long-running queries. Also, ensure sufficient space on disk has been set aside so the undo segments are allowed to grow to the size they need to be based on the requested UNDO_RETENTION.
  • Increase the size of or add more undo segments when using manual undo management. This decreases the likelihood of undo data being overwritten during the course of your long-running query. This method goes toward solving all three of the previous points. Note that this is definitely not the preferred method; automatic undo management is highly recommended.
  • Reduce the runtime of your query (tune it). This is always a good thing if possible, so it might be the first thing you try. It reduces the need for larger undo segments. This method goes toward solving all three of the previous points.
  • Gather statistics on related objects. This helps avoid the third point listed earlier. Since the block cleanout is the result of a very large mass UPDATE or INSERT, statistics-gathering needs to be done anyway after a mass UPDATE or large load.

We’ll come back to these solutions, as they are important to know. It seemed appropriate to display them prominently before we begin.

Undo Segments Are in Fact Too Small

The scenario is this: you have a system where the transactions are small. As a result, you need very little undo segment space allocated. Say, for example, the following is true:

  • Each transaction generates 8KB of undo on average.
  • You do five of these transactions per second on average (40KB of undo per second, 2,400KB per minute).
  • You have a transaction that generates 1MB of undo that occurs once per minute on average. In total, you generate about 3.5MB of undo per minute.
  • You have 15MB of undo configured for the system.

That is more than sufficient undo for this database when processing transactions. The undo segments will wrap around and reuse space about every four minutes or so on average (15MB / 3.5MB of undo per minute ≈ 4.3 minutes). If you sized undo segments based on your transactions that do modifications, you did all right.

In this same environment, however, you have some reporting needs. Some of these queries take a really long time to run—five minutes, perhaps. Here is where the problem comes in. If these queries take five minutes to execute and they need a view of the data as it existed when the query began, you have a very good probability of the ORA-01555 error occurring. Since your undo segments will wrap during this query execution, you know that some undo information generated since your query began is gone—it has been overwritten. If you hit a block that was modified near the time you started your query, the undo information for this block will be missing, and you will receive the ORA-01555 error.

Here’s a small example. Let’s say we have a table with blocks 1, 2, 3, . . . 1,000,000 in it. Table 8-1 shows a sequence of events that could occur.

Table 8-1. Long-Running Query Timeline

Time (Min:Sec)  Action
0:00            Our query begins.
0:01            Another session UPDATEs block 1,000,000. Undo information for this is recorded into some undo segment.
0:01            This UPDATE session COMMITs. The undo data it generated is still there, but is now subject to being overwritten if we need the space.
1:00            Our query is still chugging along. It is at block 200,000.
1:01            Lots of activity going on. We have generated a little over 14MB of undo by now.
3:00            Our query is still going strong. We are at block 600,000 or so by now.
4:00            Our undo segments start to wrap around and reuse the space that was active when our query began at time 0:00. Specifically, we have just reused the undo segment space that the UPDATE to block 1,000,000 used back at time 0:01.
5:00            Our query finally gets to block 1,000,000. It finds it has been modified since the query began. It goes to the undo segment and attempts to find the undo for that block to get a consistent read on it. At this point, it discovers the information it needs no longer exists. ORA-01555 is raised and the query fails.

This is all it takes. If your undo segments are sized such that they have a good chance of being reused during the execution of your queries, and your queries access data that will probably be modified, you stand a very good chance of hitting the ORA-01555 error on a recurring basis. If this is the case, you must set your UNDO_RETENTION parameter higher and let Oracle take care of figuring out how much undo to retain (this is the suggested approach; it’s much easier than trying to figure out the perfect undo size yourself) or resize your undo segments and make them larger (or have more of them). You need enough undo configured to last as long as your long-running queries. The system was sized for the transactions that modify data—you forgot to size for the other components of the system.

With Oracle9i and above, there are two methods to manage undo in the system:

  • Automatic undo management: Here, Oracle is told how long to retain undo for, via the UNDO_RETENTION parameter. Oracle will determine how many undo segments to create based on concurrent workload and how big each should be. The database can even reallocate extents between individual undo segments at runtime to meet the UNDO_RETENTION goal set by the DBA. This is the recommended approach for undo management.
  • Manual undo management: Here, the DBA does the work. The DBA determines how many undo segments to manually create, based on the estimated or observed workload. The DBA determines how big the segments should be based on transaction volume (how much undo is generated) and the length of the long-running queries.
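
A quick way to see which of the two modes a database is using, along with its current retention target, is a sketch like the following (these are standard initialization parameters):

```sql
-- UNDO_MANAGEMENT is AUTO (the recommended setting) or MANUAL;
-- UNDO_RETENTION is the retention target in seconds;
-- UNDO_TABLESPACE is the active undo tablespace under automatic management.
select name, value
  from v$parameter
 where name in ('undo_management', 'undo_retention', 'undo_tablespace');
```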

Manual undo management, where a DBA figures out how many undo segments to have and how big each should be, is where one of the points of confusion comes into play. People say, “Well, we have XMB of undo configured, but this can grow. We have MAXEXTENTS set at 500 and each extent is 1MB, so the undo can get quite large.” The problem is that the manually managed undo segments will never grow due to a query; they will grow only due to INSERTs, UPDATEs, and DELETEs. The fact that a long-running query is executing does not cause Oracle to grow a manual undo segment to retain the data in case it might need it. Only a long-running UPDATE transaction would do this. In the preceding example, even if the manual undo segments had the potential to grow, they will not. What you need to do for this system is have manual undo segments that are already big. You need to permanently allocate space to the undo segments, not give them the opportunity to grow on their own.

The only solutions to this problem are to either make it so that the manual undo segments are sized so that they wrap only every six to ten minutes, or make it so your queries never take more than two to three minutes to execute. The first suggestion is based on the fact that you have queries that take five minutes to execute. In this case, the DBA needs to make the amount of permanently allocated undo two to three times larger. The second (perfectly valid) suggestion is equally appropriate. Any time you can make the queries go faster, you should. If the undo generated since the time your query began is never overwritten, you’ll avoid ORA-01555.

Under automatic undo management, things are much easier from the ORA-01555 perspective. Rather than having to figure out how big the undo space needs to be and then preallocating it, the DBA tells the database how long the longest-running query is and sets that value in the UNDO_RETENTION parameter. Oracle will attempt to preserve undo for at least that duration of time. If sufficient space to grow has been allocated, Oracle will extend an undo segment and not wrap around—in trying to obey the UNDO_RETENTION period. This is in direct contrast to manually managed undo, which will wrap around and reuse undo space as soon as it can. It is primarily for this reason, the support of the UNDO_RETENTION parameter, that I highly recommend automatic undo management whenever possible. That single parameter reduces the possibility of an ORA-01555 error greatly (when it is set appropriately).

When using manual undo management, it is also important to remember that the probability of an ORA-01555 error is dictated by the smallest undo segment in your system, not the largest and not the average. Adding one “big” undo segment will not make this problem go away. It only takes the smallest undo segment to wrap around while a query is processing, and that query stands a chance of an ORA-01555 error. This is why I was a big fan of equi-sized rollback segments when using legacy rollback segment management: that way, each undo segment is both the smallest and the largest. This is also why I avoid using “optimally” sized undo segments. If you shrink an undo segment that was forced to grow, you are throwing away a lot of undo that may be needed right after that. The shrink discards the oldest undo data, minimizing the risk, but the risk is still there. If I shrink undo segments at all, I prefer to do it manually during off-peak times.
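
Since the smallest segment is the one that matters under manual undo management, it is worth knowing how to find it. Here is a sketch against DBA_SEGMENTS (the two segment types shown cover legacy rollback segments and automatically managed undo segments, respectively):

```sql
-- List undo/rollback segments smallest first; the smallest segment
-- governs how quickly undo wraps, and thus the ORA-01555 exposure.
select segment_name, tablespace_name, bytes/1024/1024 mb
  from dba_segments
 where segment_type in ('ROLLBACK', 'TYPE2 UNDO')
 order by bytes;
```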

I am getting a little too deep into the DBA role at this point, so we’ll move on to the next case. It’s just important that you understand that the ORA-01555 error in this case is due to the system not being sized correctly for your workload. The only solution is to size correctly for your workload. It is not your fault, but it is your problem since you hit it. It’s the same as if you run out of temporary space during a query. You either configure sufficient temporary space for the system, or you rewrite the queries so they use a plan that does not require temporary space.

To demonstrate this effect, we can set up a small but somewhat artificial test. We’ll create a very small undo tablespace, with one session that will generate many small transactions, virtually assuring us that it will wrap around and reuse its allocated space many times—regardless of the UNDO_RETENTION setting, since we are not permitting the undo tablespace to grow. The session that uses this undo tablespace will be modifying a table, T. It will use a full scan of T and read it from “top” to “bottom.” In another session, we will execute a query that will read the table T via an index. In this fashion, it will read the table somewhat randomly: it will read row 1, then row 1,000, then row 500, then row 20,001, and so on. In this way, we will tend to visit blocks very randomly and perhaps many times during the processing of our query. The odds of getting an ORA-01555 error in this case are virtually 100 percent. So, in one session we start with the following:

EODA@ORA12CR1> create undo tablespace undo_small
  2  datafile '/tmp/undo.dbf' size 2m
  3  autoextend off
  4  /
Tablespace created.

EODA@ORA12CR1> alter system set undo_tablespace = undo_small;
System altered.

Now, we’ll set up the table T to query and modify. Note that we are ordering the data randomly in this table. The CREATE TABLE AS SELECT tends to put the rows in the blocks in the order it fetches them from the query. We’ll just scramble the rows up so they are not artificially sorted in any order, randomizing their distribution:

EODA@ORA12CR1> drop table t purge;
Table dropped.

EODA@ORA12CR1> create table t
  2  as
  3  select *
  4    from all_objects
  5   order by dbms_random.random;
Table created.

EODA@ORA12CR1> alter table t add constraint t_pk primary key(object_id);
Table altered.

EODA@ORA12CR1> exec dbms_stats.gather_table_stats(user, 'T', cascade=> true );
PL/SQL procedure successfully completed.

And now we are ready to do our modifications:

EODA@ORA12CR1> begin
  2      for x in ( select rowid rid from t )
  3      loop
  4          update t set object_name = lower(object_name) where rowid = x.rid;
  5          commit;
  6      end loop;
  7  end;
  8  /

Now, while that PL/SQL block of code is running, we will run a query in another session. That other query will read table T and process each record. It will spend about 1/100 of a second processing each record before fetching the next (simulated using DBMS_LOCK.SLEEP(0.01)). We will use the FIRST_ROWS hint in the query to have it use the index we created to read the rows out of the table via the index sorted by OBJECT_ID. Since the data was randomly inserted into the table, we would tend to query blocks in the table rather randomly. This block will only run for a couple of seconds before failing:

EODA@ORA12CR1> declare
  2      cursor c is
  3      select /*+ first_rows */ object_name
  4        from t
  5       order by object_id;
  6
  7      l_object_name t.object_name%type;
  8      l_rowcnt      number := 0;
  9  begin
 10      open c;
 11      loop
 12          fetch c into l_object_name;
 13          exit when c%notfound;
 14          dbms_lock.sleep(0.01 );
 15          l_rowcnt := l_rowcnt+1;
 16      end loop;
 17      close c;
 18  exception
 19      when others then
 20          dbms_output.put_line('rows fetched = ' || l_rowcnt );
 21          raise;
 22  end;
 23  /
rows fetched = 159
declare
*
ERROR at line 1:
ORA-01555: snapshot too old: rollback segment number 16 with name
"_SYSSMU16_587457654$" too small
ORA-06512: at line 21

As you can see, it got to process only 159 records before failing with the ORA-01555: snapshot too old error. To correct this, we want to make sure two things are done:

  • UNDO_RETENTION is set in the database to be at least long enough for this read process to complete. That will allow the database to grow the undo tablespace to hold sufficient undo for us to complete.
  • The undo tablespace is allowed to grow or you manually allocate more disk space to it.

For this example, I have determined my long-running process takes about 720 seconds to complete (I have about 72,000 records in the table, so at 0.01 seconds per row we have 720 seconds). My UNDO_RETENTION is set to 900 (this is in seconds, so the undo retention is about 15 minutes). I altered the undo tablespace’s data file to permit it to grow by 1MB at a time, up to 2GB in size:

EODA@ORA12CR1> alter database
  2  datafile '/tmp/undo.dbf'
  3  autoextend on
  4  next 1m
  5  maxsize 2048m;
Database altered.
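
For completeness, the retention target itself is set with a single ALTER SYSTEM call; this sketch uses the 900-second value from this example (the change requires the ALTER SYSTEM privilege):

```sql
-- Ask Oracle to keep committed undo for at least 15 minutes; with an
-- autoextending undo tablespace, the undo can grow to honor this target.
alter system set undo_retention = 900;
```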

When I ran the processes concurrently again, both ran to completion. The undo tablespace’s data file grew this time, because it was allowed to, and the undo retention I configured required it.

EODA@ORA12CR1> select bytes/1024/1024
  2    from dba_data_files
  3  where tablespace_name = 'UNDO_SMALL';

BYTES/1024/1024
---------------
             21

So, instead of receiving an error, we completed successfully, and the undo grew to be large enough to accommodate our needs. It is true that in this example, getting the error was purely due to the fact that we read the table T via the index and performed random reads all over the table. If we had rapidly full-scanned the table instead, there is a good chance we would not have received the ORA-01555 error in this particular case. This is because both the SELECT and UPDATE would have been full-scanning T, and the SELECT could most likely race ahead of the UPDATE during its scan (the SELECT just has to read, but the UPDATE must read and update and therefore could go slower). By doing the random reads, we increase the probability that the SELECT will need to read a block, which the UPDATE modified and committed many rows ago. This just demonstrates the somewhat insidious nature of the ORA-01555 error. Its occurrence depends on how concurrent sessions access and manipulate the underlying tables.

Delayed Block Cleanout

This cause of the ORA-01555 error is hard to eliminate entirely, but it is rare, as the circumstances under which it occurs seldom arise (at least in Oracle8i and above). We have already discussed the block cleanout mechanism (in Chapter 7), but to summarize, it is the process whereby the next session to access a block after it has been modified may have to check whether the transaction that last modified the block is still active. Once the process determines that the transaction is not active, it cleans out the block so that the next session to access it does not have to go through the same process again. To clean out the block, Oracle determines the undo segment used for the previous transaction (from the block’s header) and then determines whether the undo header indicates that the transaction has been committed and, if so, when it committed. This confirmation is accomplished in one of two ways. One way is that Oracle can determine that the transaction committed a long time ago, even though its transaction slot has been overwritten in the undo segment transaction table. The other way is that the COMMIT SCN is still in the transaction table of the undo segment, meaning the transaction committed a short time ago and its transaction slot hasn’t been overwritten.

To receive the ORA-01555 error from a delayed block cleanout, all of the following conditions must be met:

  • A modification is made and COMMITed, and the blocks are not cleaned out automatically (e.g., the transaction modified more blocks than can fit in 10 percent of the SGA block buffer cache).
  • These blocks are not touched by another session and will not be touched until our unfortunate query (displayed shortly) hits them.
  • A long-running query begins. This query will ultimately read some of those blocks from earlier. This query starts at SCN t1, the read-consistent SCN it must roll data back to in order to achieve read consistency. The transaction entry for the modification transaction is still in the undo segment transaction table when we begin.
  • During the query, many commits are made in the system. These transactions don’t touch the blocks in question (if they did, we wouldn’t have the impending problem as they would clean out the old transaction—solving the clean-out issue).
  • The transaction tables in the undo segments roll around and reuse slots due to the high degree of COMMITs. Most important, the transaction entry for the original modification transaction is cycled over and reused. In addition, the system has reused undo segment extents, preventing a consistent read on the undo segment header block itself.
  • Additionally, the lowest SCN recorded in the undo segment now exceeds t1 (it is higher than the read-consistent SCN of the query), due to the large number of commits.

When our query gets to the block that was modified and committed before it began, it is in trouble. Normally, it would go to the undo segment pointed to by the block and find the status of the transaction that modified it (in other words, it would find the COMMIT SCN of that transaction). If the COMMIT SCN is less than t1, our query can use this block. If the COMMIT SCN is greater than t1, our query must roll back that block. The problem is, however, that our query is unable to determine in this particular case if the COMMIT SCN of the block is greater than or less than t1. It is unsure as to whether it can use that block image or not. The ORA-01555 error then results.

To see this, we will create many blocks in a table that need to be cleaned out. We will then open a cursor on that table and allow many small transactions to take place against some other table—not the table we just updated and opened the cursor on. Finally, we will attempt to fetch the data for the cursor. Now, we know that the data required by the cursor will be “OK”—we should be able to see all of it since the modifications to the table would have taken place and been committed before we open the cursor. When we get an ORA-01555 error this time, it will be because of the previously described issue with delayed block cleanout. To set up for this example, we’ll use

  • The 4MB UNDO_SMALL undo tablespace.
  • A 16MB buffer cache, which is enough to hold about 2,000 blocks. This is so we can get some dirty blocks flushed to disk to observe this phenomenon.

Before we start, we’ll create the undo tablespace and the “big” table we’ll be querying:

EODA@ORA12CR1> create undo tablespace undo_small
  2  datafile '/tmp/undo.dbf' size 4m
  3  autoextend off
  4  /
Tablespace created.

EODA@ORA12CR1> create table big
  2  as
  3  select a.*, rpad('*',1000,'*') data
  4    from all_objects a;
Table created.

EODA@ORA12CR1> alter table big add constraint big_pk
  2  primary key(object_id);
Table altered.

EODA@ORA12CR1> exec dbms_stats.gather_table_stats(user, 'BIG' );
PL/SQL procedure successfully completed.

Note  You might wonder why I didn’t use CASCADE=>TRUE on the gather-statistics call to gather statistics on the index created by default by the primary key constraint. That is because since Oracle 10g, a CREATE INDEX or ALTER INDEX REBUILD has implicit compute statistics added to it already whenever the table it is indexing is not empty. So, the very act of creating the index has the side effect of gathering statistics on itself. There’s no need to regather the statistics we already have.

The previous table will have lots of blocks as we get about six or seven rows per block using that big data field, and my ALL_OBJECTS table has over 70,000 rows. Next, we’ll create the small table the many little transactions will modify:

EODA@ORA12CR1> create table small ( x int, y char(500) );
Table created.

EODA@ORA12CR1> insert into small select rownum, 'x' from all_users;
25 rows created.

EODA@ORA12CR1> commit;
Commit complete.

EODA@ORA12CR1> exec dbms_stats.gather_table_stats(user, 'SMALL' );
PL/SQL procedure successfully completed.

Now, we’ll dirty up that big table. We have a very small undo tablespace, so we’ll want to update as many blocks of this big table as possible, all while generating the least amount of undo possible. We’ll use a fancy UPDATE statement to do that. Basically, the following subquery is finding the “first” rowid of a row on every block. That subquery will return a rowid for every database block identifying a single row on it. We’ll update that row, setting a VARCHAR2(1) field. This will let us update all of the blocks in the table (some 8,000 plus in the example), flooding the buffer cache with dirty blocks that will have to be written out (we have room for only 500 right now). We’ll make sure we are using that small undo tablespace as well. To accomplish this and not exceed the capacity of our undo tablespace, we’ll craft an UPDATE statement that will update just the “first row” on each block. The ROW_NUMBER() built-in analytic function is instrumental in this operation; it assigns the number 1 to the “first row” by database block in the table, which would be the single row on the block we would update:

EODA@ORA12CR1> alter system set undo_tablespace = undo_small;
System altered.

EODA@ORA12CR1> update big
  2     set temporary = temporary
  3   where rowid in
  4  (
  5  select r
  6    from (
  7  select rowid r, row_number() over
  8         (partition by dbms_rowid.rowid_block_number(rowid) order by rowid) rn
  9    from big
 10         )
 11   where rn = 1
 12  )
 13  /
3064 rows updated.

EODA@ORA12CR1> commit;
Commit complete.

OK, so now we know that we have lots of dirty blocks on disk. We definitely wrote some of them out, because we just didn’t have the room to hold them all. Next, we will open a cursor, but it won’t yet fetch a single row. Remember, when we open the cursor, the resultset is preordained, so even though Oracle did not actually process a row of data, the act of opening that resultset fixed the point in time the results must be “as of.” Now since we’ll be fetching the data we just updated and committed, and we know no one else is modifying the data, we should be able to retrieve the rows without needing any undo at all. But that’s where the delayed block cleanout rears its head. The transaction that modified these blocks is so new that Oracle will be obliged to verify that it committed before we begin, and if we overwrite that information (also stored in the undo tablespace), the query will fail. So, here is the opening of the cursor:

EODA@ORA12CR1> variable x refcursor
EODA@ORA12CR1> exec open :x for select * from big where object_id < 100;
PL/SQL procedure successfully completed.

EODA@ORA12CR1>
EODA@ORA12CR1> !./run.sh

run.sh is a shell script; it simply fired off nine SQL*Plus sessions using commands of the form:

$ORACLE_HOME/bin/sqlplus eoda/foo @test2 1  &
$ORACLE_HOME/bin/sqlplus eoda/foo @test2 2  &
... (3-8 would go here) ...
$ORACLE_HOME/bin/sqlplus eoda/foo @test2 9  &

where each SQL*Plus session was passed a different number (that was session 1; there was a 2, a 3, and so on). In this script, be sure to replace eoda/foo with a username and password valid for your environment. The script test2.sql that each session ran is as follows:

begin
    for i in 1 .. 5000
    loop
        update small set y = i where x = &1;
        commit;
    end loop;
end;
/
exit

So, we had nine sessions in a tight loop, each initiating many transactions (5,000 commits apiece). The run.sh script waited for the nine SQL*Plus sessions to complete their work, and then we returned to our session, the one with the open cursor. Upon attempting to print it, we observe the following:

EODA@ORA12CR1> print x
ERROR:
ORA-01555: snapshot too old: rollback segment number 17 with name
"_SYSSMU17_452567810$" too small
no rows selected

As I said, the preceding is a rare case. It took a lot of conditions, all of which had to exist simultaneously, for it to occur. We needed blocks in need of a cleanout to exist, and such blocks are rare in Oracle8i and above. A DBMS_STATS call to collect statistics gets rid of them, so the most common causes of such blocks (large mass updates and bulk loads) should not be a concern, since tables need to be analyzed after such operations anyway. Most transactions tend to touch fewer than 10 percent of the blocks in the buffer cache; hence, they do not generate blocks that need to be cleaned out. If you believe you’ve encountered this issue, in which a SELECT against a table that has no other DML applied to it raises the ORA-01555 error, try the following solutions:

  • Ensure you are using “right-sized” transactions in the first place. Make sure you are not committing more frequently than you should.
  • Use DBMS_STATS to scan the related objects, cleaning them out after the load. Since the block cleanout is the result of a very large mass UPDATE or INSERT, this needs to be done anyway.
  • Allow the undo tablespace to grow by giving it the room to extend and increasing the undo retention. This decreases the likelihood of an undo segment transaction table slot being overwritten during the course of your long-running query. This is the same as the solution for the other cause of an ORA-01555 error (the two are very much related; you experience undo segment reuse during the processing of your query). In fact, I reran the preceding example with the undo tablespace set to autoextend 1MB at a time, with an undo retention of 900 seconds. The query against the table BIG completed successfully.
  • Reduce the runtime of your query—tune it. This is always good if possible, so it might be the first thing you try.
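
The second and third remedies can be sketched as follows (a hedged example: the datafile name and sizes are assumptions; adjust them for your environment):

```sql
-- Clean out blocks after a mass load by gathering statistics:
exec dbms_stats.gather_table_stats( user, 'BIG' );

-- Let the small undo tablespace grow, and raise the retention target
-- to 900 seconds, as in the rerun described above:
alter database datafile '/u01/dbfile/ORA12CR1/undo_small01.dbf'
      autoextend on next 1m maxsize unlimited;
alter system set undo_retention = 900;
```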

Summary

In this chapter, we investigated which statements generate the least and most undo. In general, an INSERT generates the least, an UPDATE generates more than an INSERT, and a DELETE generates the most undo.

The bulk of this chapter explored the causes of the infamous ORA-01555 (snapshot too old) error. This error can occur because the undo tablespace is sized too small; the DBA must ensure that the undo tablespace is sized large enough to mostly eliminate it as a cause of the error. We also looked at how delayed block cleanout can cause the same error. If you’ve correctly sized your transactions and your undo tablespace, you will rarely run into this error. Tuning the query that throws the ORA-01555 error should always be one of the first remedies employed.
