Chapter 4 – Deep Dive Inside a Netezza Extent and Row

“Write a wise saying and your name will live forever.”

Anonymous

 

Netezza Performance – Three Things to Keep in Mind

image

Follow these three rules, and you will have a well oiled machine. This chapter will explain in detail exactly how to do it.

 

How Netezza Allocates Data Storage

image

The first thing to understand is that when data is first being loaded to a table, each SPU automatically allocates 3 MB of disk space. When the data assigned to that SPU fills up the 3 MB extent, then Netezza allocates another 3 MBs. Think of this a one suitcase, then two suitcases, then three suitcases and so on.

 

Extents and Zone Maps

image

The first thing to understand is that Netezza dynamically allocates space as a table grows. When a table is created and the data begins to be loaded, Netezza allocates 3 MBs for the data to be stored. This is called an extent. Only one table will be utilized for this extent. When the extent is filled, then Netezza allocates another 3 MB extent. Each extent has their own zone map so Netezza can check to see if the extent needs to be read when satisfying a query.

 

How Data Might Look in an Extent

image

I want you to imagine that we created a table that had only one column and that column was Order_Date. On January 1st, data was loaded. Notice in the examples that as each month is loaded, it continues to fill up the 128 KB pages and the Order_Date is ordered (because as each day was loaded it took up the next slot). Then, notice how the Zone Map has the min and max Order_Date. The Zone Map is designed to inform Netezza whether this block (Extent) should be read when this table is queried. If the SQL WHERE clause is only looking for data within a certain date range, then the Zone Map will let Netezza know if this Extent should be read or ignored. If the query is looking for data in April, then there is no reason to read extent 1.

 

Why Dates Are Generally Not Good Distribution Keys

“Be the change that you want to see in the world.”

Mahatma Gandhi

image

If you were to distribute by Month of Order_Date, then all of the January orders would go to the same SPU, and all of the February orders to a certain SPU, and this would continue for all months. Try and keep the concept that parallel processing is built for speed. When users want to find out about orders in January, then all of the processing is done on one SPU. We want all SPUs processing simultaneously and equal amount of the data.

 

When a Table is Created, a Table Header is Created

image

All SPUs have the exact same tables because when a table is created, a table header is immediately created on every SPU. When the data is loaded, each SPU will open a 3 MB extent on its disk and fill it with data.

 

Every SPU has the Exact Same Tables

image

The first thing that happens when a table is created is that a Table Header is created on every SPU. Each SPU now has the table, and they know the table’s columns and distribution key. So, each SPU has the exact same tables and the exact same number of tables. It is just like looking into a mirror. Each SPU holds a portion of the rows of every table that is hashed, and every SPU holds the exact same rows for tables that are Distributed by RANDOM. In this parallel processing system, every SPU does an equal amount of the work. Since each SPU has a portion of the data on a table, its portion is called a “slice”.

 

All Netezza Tables are spread across All SPUs

image

When data is loaded, each SPU holds a portion of the rows of every table that is hashed or randomly distributed.

 

The Table Header and the Data Rows are Stored Separately

image

The Table Header is created on each SPU when the table is created. The rows are stored in data blocks (3 MB Extents) when the data is loaded. Both are stored separately on each SPU. There is one zone map for each Extent.

 

A SPU Stores Rows of a Table inside a Data Block (Extent)

image

Each SPU stores the rows they own inside a data block. This is called an Extent in Netezza. The Extent consists of a 3 MB disk allocation. Above, you can see that this SPU is responsible for four rows, and those four rows are held in a single Extent.

 

To Read Rows, a SPU Moves the Data Block into Memory

image

To read data, a SPU must transfer the table header and the data block from inside its disk to its dedicated memory. Nothing happens on the disk except storage. The FPGA card is designed to eliminate certain blocks that the system is sure does NOT contain the data.

 

A Full Table Scan Means All SPUs must Read All Rows

image

A Full Table Scan means that all SPUs must transfer their data block from their disk into their dedicated memory, and then each SPU must read each row from the table starting from the first row they own to the last row.

 

The “Achilles Heel”, or Slowest Process, is Block Transfer

image

To read or write data, a SPU must move the data block from disk into its Memory. This is the Achilles heel of the system, and it is painfully slow!

 

Each Table has a Distribution Key

image

Each table has a column as its Distribution Key. When users query a table and use the Distribution Key column in their SQL, only a “Single SPU” is used.

 

A Query Using the Distribution Key uses a Single SPU.

image

Choosing a good Distribution Key results in one SPU being used in the query.

 

As Rows are Added, Another Extent is Added

image

A Netezza table can have trillions of rows, so an individual SPU might have millions, or even billions, of rows for a single table. As rows of a table are inserted inside a 3 MB extent the extent eventually fills. Once a extent fills, another extent is allocated.

 

A Full Table Scan Means All SPUs Read All Blocks

image

A Full Table Scan means that all SPUs must transfer their data block(s) (extent(s)) from their disk into their memory, and then read each block to evaluate the first row they own to the last row. Each SPU above process two blocks, so there is twice the transfer.

 

Distribution Key Query uses One SPU

image

SPU 2 was contacted and told to transfer its Employee_Table blocks into memory. Now, you see the importance of utilizing only a single SPU in tactical queries.

 

Using a CTAS to Improve Zone Map Selectivity

image

Now we’re cooking! We just sorted the data on our CTAS table. The Zone Maps are perfect for those who are querying the order_ctas_date table.

 

How A CTAS with ORDER BY Improves Queries

image

The zone maps will eliminate the need to read the majority of blocks (3 MB extents). Since the data is sorted on the column Order_Date, the zone maps min and max values have tight date ranges. Our query only wants January data, so the majority of data blocks won’t need to be read. The FPGA card reads the zone map, and only reads blocks with January data.

 

Each SPU Can Have Many Blocks for a Single Table

image

Each SPU has the same table header but contains different data rows for each table. Some tables are huge like the Order_Table. As more and more data was loaded, it continued to add more extents. The Customer_Table is smaller and uses just one extent.

 

A Full Table Scan Means All SPUs Read All Blocks

image

A Full Table Scan means that all SPUs must transfer their data blocks (extents) from their disk into their memory, and then read each block to evaluate the first row they own to the last row. Each SPU above process all blocks one at a time.

 

Quiz – How Many Blocks Move into SPU Memory?

image

If a Full Table Scan was needed to satisfy a query using the Order_Table, how many data blocks would need to be transferred from disk to memory per SPU? ________

 

Answer – How Many Blocks Move into SPU Memory?

image

If a Full Table Scan was needed to satisfy a query using the Order_Table, how many data blocks would need to be transferred from disk to memory per SPU? 24

 

Netezza Rowid, CreateXid, DeleteXid

image

 

An Update of Multiple Rows

image

 

How to Undo an Update of Multiple Rows

image

 

An Update Undo in Action

image

 

A Delete Example

image

 

A Delete Example Query

image

 

How to Undo a Delete

image

 

How to Undo a Delete In Action

image

 

An Insert Example

image

 

An Insert Example Query

image

 

How to Undo an Insert

image

 

How to Undo an Insert in Action

image

 

What is the Purpose of the GROOM Command?

image

The GroomCommand gets rid of the deletedrows so space is reclaimed!

 

The Groom Command Refreshes Zone Maps

image

 

Groom Command Syntax

There are only two different ways to run the GROOM command:

image

You can only run the GROOM command if you are the owner of the table or you have been given GROOM privilege. GROOM runs as a background process, so full table access is available while the records are being groomed.

 

Checking Groom Progress

image

Because GROOM TABLE operations are done as background processes, you may want or need to check the progress of currently running commands. There are two ways that you can do this and they are listed above.

 

Automatic Database Statistics

image

Netezza handles things automatically to take the stress off of the DBA and IT personnel.

 

Drop Table, Truncate and Delete Compared

image

If you want to delete all the rows in the table but maintain the table structure, it is recommended to Truncate the table. Otherwise, you will need to delete the rows and then run the GROOM command.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset