Chapter 4 – Deep Dive Inside a Netezza Extent and Row

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

“Write a wise saying and your name will live forever.”

Anonymous

Netezza Performance – Three Things to Keep in Mind

Follow these three rules, and you will have a well oiled machine. This chapter will explain in detail exactly how to do it.

How Netezza Allocates Data Storage

The first thing to understand is that when data is first being loaded to a table, each SPU automatically allocates 3 MB of disk space. When the data assigned to that SPU fills up the 3 MB extent, then Netezza allocates another 3 MBs. Think of this a one suitcase, then two suitcases, then three suitcases and so on.

Extents and Zone Maps

The first thing to understand is that Netezza dynamically allocates space as a table grows. When a table is created and the data begins to be loaded, Netezza allocates 3 MBs for the data to be stored. This is called an extent. Only one table will be utilized for this extent. When the extent is filled, then Netezza allocates another 3 MB extent. Each extent has their own zone map so Netezza can check to see if the extent needs to be read when satisfying a query.

How Data Might Look in an Extent

I want you to imagine that we created a table that had only one column and that column was Order_Date. On January 1^st, data was loaded. Notice in the examples that as each month is loaded, it continues to fill up the 128 KB pages and the Order_Date is ordered (because as each day was loaded it took up the next slot). Then, notice how the Zone Map has the min and max Order_Date. The Zone Map is designed to inform Netezza whether this block (Extent) should be read when this table is queried. If the SQL WHERE clause is only looking for data within a certain date range, then the Zone Map will let Netezza know if this Extent should be read or ignored. If the query is looking for data in April, then there is no reason to read extent 1.

Why Dates Are Generally Not Good Distribution Keys

“Be the change that you want to see in the world.”

Mahatma Gandhi

If you were to distribute by Month of Order_Date, then all of the January orders would go to the same SPU, and all of the February orders to a certain SPU, and this would continue for all months. Try and keep the concept that parallel processing is built for speed. When users want to find out about orders in January, then all of the processing is done on one SPU. We want all SPUs processing simultaneously and equal amount of the data.

When a Table is Created, a Table Header is Created

All SPUs have the exact same tables because when a table is created, a table header is immediately created on every SPU. When the data is loaded, each SPU will open a 3 MB extent on its disk and fill it with data.

Every SPU has the Exact Same Tables

The first thing that happens when a table is created is that a Table Header is created on every SPU. Each SPU now has the table, and they know the table’s columns and distribution key. So, each SPU has the exact same tables and the exact same number of tables. It is just like looking into a mirror. Each SPU holds a portion of the rows of every table that is hashed, and every SPU holds the exact same rows for tables that are Distributed by RANDOM. In this parallel processing system, every SPU does an equal amount of the work. Since each SPU has a portion of the data on a table, its portion is called a “slice”.

All Netezza Tables are spread across All SPUs

When data is loaded, each SPU holds a portion of the rows of every table that is hashed or randomly distributed.

The Table Header and the Data Rows are Stored Separately

The Table Header is created on each SPU when the table is created. The rows are stored in data blocks (3 MB Extents) when the data is loaded. Both are stored separately on each SPU. There is one zone map for each Extent.

A SPU Stores Rows of a Table inside a Data Block (Extent)

Each SPU stores the rows they own inside a data block. This is called an Extent in Netezza. The Extent consists of a 3 MB disk allocation. Above, you can see that this SPU is responsible for four rows, and those four rows are held in a single Extent.

To Read Rows, a SPU Moves the Data Block into Memory

To read data, a SPU must transfer the table header and the data block from inside its disk to its dedicated memory. Nothing happens on the disk except storage. The FPGA card is designed to eliminate certain blocks that the system is sure does NOT contain the data.

A Full Table Scan Means All SPUs must Read All Rows

A Full Table Scan means that all SPUs must transfer their data block from their disk into their dedicated memory, and then each SPU must read each row from the table starting from the first row they own to the last row.

The “Achilles Heel”, or Slowest Process, is Block Transfer

To read or write data, a SPU must move the data block from disk into its Memory. This is the Achilles heel of the system, and it is painfully slow!

Each Table has a Distribution Key

Each table has a column as its Distribution Key. When users query a table and use the Distribution Key column in their SQL, only a “Single SPU” is used.

A Query Using the Distribution Key uses a Single SPU.

Choosing a good Distribution Key results in one SPU being used in the query.

As Rows are Added, Another Extent is Added

A Netezza table can have trillions of rows, so an individual SPU might have millions, or even billions, of rows for a single table. As rows of a table are inserted inside a 3 MB extent the extent eventually fills. Once a extent fills, another extent is allocated.

A Full Table Scan Means All SPUs Read All Blocks

A Full Table Scan means that all SPUs must transfer their data block(s) (extent(s)) from their disk into their memory, and then read each block to evaluate the first row they own to the last row. Each SPU above process two blocks, so there is twice the transfer.

Distribution Key Query uses One SPU

SPU 2 was contacted and told to transfer its Employee_Table blocks into memory. Now, you see the importance of utilizing only a single SPU in tactical queries.

Using a CTAS to Improve Zone Map Selectivity

Now we’re cooking! We just sorted the data on our CTAS table. The Zone Maps are perfect for those who are querying the order_ctas_date table.

How A CTAS with ORDER BY Improves Queries

The zone maps will eliminate the need to read the majority of blocks (3 MB extents). Since the data is sorted on the column Order_Date, the zone maps min and max values have tight date ranges. Our query only wants January data, so the majority of data blocks won’t need to be read. The FPGA card reads the zone map, and only reads blocks with January data.

Each SPU Can Have Many Blocks for a Single Table

Each SPU has the same table header but contains different data rows for each table. Some tables are huge like the Order_Table. As more and more data was loaded, it continued to add more extents. The Customer_Table is smaller and uses just one extent.

A Full Table Scan Means All SPUs Read All Blocks

A Full Table Scan means that all SPUs must transfer their data blocks (extents) from their disk into their memory, and then read each block to evaluate the first row they own to the last row. Each SPU above process all blocks one at a time.

Quiz – How Many Blocks Move into SPU Memory?

If a Full Table Scan was needed to satisfy a query using the Order_Table, how many data blocks would need to be transferred from disk to memory per SPU? ________

Answer – How Many Blocks Move into SPU Memory?

If a Full Table Scan was needed to satisfy a query using the Order_Table, how many data blocks would need to be transferred from disk to memory per SPU? 24

Netezza Rowid, CreateXid, DeleteXid

An Update of Multiple Rows

How to Undo an Update of Multiple Rows

An Update Undo in Action

A Delete Example

A Delete Example Query

How to Undo a Delete

How to Undo a Delete In Action

An Insert Example

An Insert Example Query

How to Undo an Insert

How to Undo an Insert in Action

What is the Purpose of the GROOM Command?

The GroomCommand gets rid of the deletedrows so space is reclaimed!

The Groom Command Refreshes Zone Maps

Groom Command Syntax

There are only two different ways to run the GROOM command:

You can only run the GROOM command if you are the owner of the table or you have been given GROOM privilege. GROOM runs as a background process, so full table access is available while the records are being groomed.

Checking Groom Progress

Because GROOM TABLE operations are done as background processes, you may want or need to check the progress of currently running commands. There are two ways that you can do this and they are listed above.

Automatic Database Statistics

Netezza handles things automatically to take the stress off of the DBA and IT personnel.

Drop Table, Truncate and Delete Compared

If you want to delete all the rows in the table but maintain the table structure, it is recommended to Truncate the table. Otherwise, you will need to delete the rows and then run the GROOM command.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 4 – Deep Dive Inside a Netezza Extent and Row

Create new playlist

Sign In

Sign Up

Table of Contents for
Chapter 4 – Deep Dive Inside a Netezza Extent and Row