Chapter 2 - Three Options for Teradata Table Design

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 2 - Three Options for Teradata Table Design

“Design is not just what it looks like and feels like. Design is how it works.”

- Steve Jobs

There are Three Options to Teradata Table Design

1. Traditional Teradata using a Primary Index:

80-90% of your tables will use the traditional Teradata design of simply having a Primary Index. These tables will distribute the rows of the table among the AMPs with a consistent and straightforward Hash Formula. The AMPs will sort their rows by the Primary Index value (kind of), and when a user queries the table using the Primary Index value to limit the rows returning, it will always be a “Single AMP retrieve”.

2. Partition the table which is called a PPI Table (Partition Primary Index):

Teradata allows a table to be created with a horizontal partition. A PPI table still has a Primary Index that distributes the rows among the AMPs just like a traditional table, but each AMP sorts the data on the partition column. All AMPs will be involved in retrieving an answer set, but each AMP only reads one or more of their partitions, thus no longer performing a Full Table Scan.

3. A Columnar Design:

In Teradata V14, the system allows for tables to be created in a Columnar design. The rows are distributed randomly but evenly among the AMPs because the table has no Primary Index. Each AMP still owns their own rows, but the rows break each column up into their own separate column container. In other words, if a query only requests a few columns in the query, then the AMP only moves a few small containers into FSG cache. This is considered vertical partitioning.

Teradata V14.10 Intelligent Memory

images

An AMP now has two types of memory. The first is FSG cache, and it is used for processing queries by bringing blocks from disk into FSG cache where processing occurs, as it always has.

The second is Intelligent memory, and it is used for VeryHot temperature data. This data stays in memory for days until it cools. It provides the fastest query capabilities on the hottest of data.

Teradata now brings a gift to the tables queried most often and that is Intelligent Memory. This feature keeps the most important tables in memory so there is no data transfer from disk (after the first time). This takes Teradata's “Achilles Heel” of disk to memory transfer and eliminates it thus making queries lightning fast. Not all tables are able to fit inside Intelligent Memory, but your design should consider this brilliant technique.

How Teradata Creates Traditional Tables

CREATE TABLE Employee_Table

( EmpNo INTEGER

,Dept_No INTEGER

,First_Name VARCHAR (12)

,Last_Name CHAR(20)

,Salary DECIMAL(10,2)

) UNIQUEPRIMARY INDEX (EmpNo) ;

When a table is created, the Primary Index is defined. 90% of your tables will use this design. Choosing the best column for the Primary Index is your number one strategy.

Notice, the last line of the CREATE Table example above, and you will see that EmpNo is defined as the Primary Index. This means that the rows that are loaded into the Employee_Table will be hashed and distributed to the AMP based solely on the value in the rows EmpNo. The column EmpNo will be responsible for the distribution, and if the column EmpNo is used by the user in the SQL to find a specific employee number (EmpNo), then only one AMP will be contacted to find the row.

Each Table has a Primary Index

images

CREATE TABLE Employee_Table

( EmpNo INTEGER

,Dept_No INTEGER

,First_Name VARCHAR (12)

,Last_Name CHAR(20)

,Salary DECIMAL(10,2)

) UNIQUEPRIMARY INDEX (EmpNo) ;

Each traditional Teradata table chooses a column to be the Primary Index. The Primary Index column is used to distribute the rows among the AMPs and it's how each AMP sorts the rows.

When users query a table and use the Primary Index column in their SQL to find a specific EmpNo, only a “Single AMP” is used. The Primary Index is your best friend.

A Query Using the Primary Index is a Single AMP Retrieve

images

Parsing Engine

Here is the plan AMP 2.

I know you have EmpNo 8 because EmpNo is the Primary Index, and I distributed it there.

Move your Employee_Table header and data block into FSG Cache, and retrieve the row.

Choosing a good Primary Index results in only a “Single AMP” being used in the query.

A Primary Index Query uses a Single AMP and Single Block

images

Parsing Engine

Here is the plan AMP 2.

I know you have EmpNo 12 because EmpNo is the Primary Index.

You should have two blocks. Only Transfer the block holding EmpNo 12 to your FSG Cache, and send me the results.

AMP 2 was contacted and told to only transfer the block that has EmpNo 12. Now, you see the importance of each AMP sorting their rows to limit transferring each block.

How Teradata Creates a PPI Table

CREATE TABLE Order_Table

( Order_Number INTEGER

,Customer_Number INTEGER

,Order_Date DATE

,Order_Total Decimal (10,2)

) PRIMARY INDEX(Order_Number)

PARTITION BY RANGE_N

(Order_Date BETWEEN

date ‘2013-01-01’ AND date ‘2013-12-31’

EACH INTERVAL ‘1'Month) ;

A Partitioned Primary Index (PPI) table has a Primary Index that distributes the rows among the AMPs, but they are not sorted by the Primary Index. Instead an AMP is instructed to sort the rows they own by the Partition.

In the above example, the first part of the CREATE Table statement looks just like the previous example. But, it is the latter part of the statement that you see the words “Partition By”. This table's rows will still be distributed among the AMPs via the Primary Index of Order_Number, but the AMPs won't sort by Order_Number. Each AMP will sort their rows by the partition which is Month of the Order_Date. Look at the next page to see a visual of the AMPs and their sorting of millions of rows.

PPI Table Sorting the Rows by Month of Order_Date

images

Each AMP above sorts their rows by Month (of Order_Date), so if a user queries and only wants to see the orders placed in March then each AMP just transfers the blocks with March orders. This is an all AMP retrieve, but each AMP only has to retrieve from a single partition which is the March Partition.

An All AMPs Retrieve By Way of a Single Partition

images

Parsing Engine

Calling all AMPs. Do NOT do a Full Table Scan!

You should each have 12 blocks (one per month). Move your March Partition block into your FSG Cache.

Give me all March Orders.

All AMPs are used to satisfy the query, but each AMP only reads a portion of their rows.

Creating a PPI Table with CASE_N

images

The above syntax represents the CASE_ N partitioning. Rows will be distributed to their AMPs via the Primary Index column, which is Order_Number, but they will be sorted on each AMP by Order_Total. If an Order_Total is < 1000, it will go into Partition 1. Any Order_Total between 1000 and 4999.99 will go in Partition 2, etc. The NO Case partition is for anything falling through (> 20,000), and the UNKNOWN partition is for NULL values.

A Visual of Case_N Partitioning

PARTITION BY CASE_N

(Order_Total < 1000,

Order_Total < 5000,

Order_Total < 10000,

Order_Total < 20000,

NO Case, UNKNOWN);

images

Three are six partitions for this table. Orders go into partitions based on Order_Total.

An All AMPs Retrieve By Way of a Single Partition

images

Parsing Engine

Calling all AMPs. Do NOT do a Full Table Scan!

You should each have 6 blocks (one per Case). Move your < 1000 Partition block into your FSG Cache.

Give me all Orders < $1000.

All AMPs are used to satisfy the query, but each AMP only reads a portion of their rows.

What does a Columnar Table look like?

images

The two tables above contain the same Employee data, but the bottom example is a Columnar table. Employee_Normal has 3 rows on each AMP with 5 columns. Employee_Columnar is split into 5 containers, and each container has one column.

A Comparison of Data for Normal Vs. Columnar

images

The normal table on top is one block containing three rows and five columns. The columnar table below has five blocks, each containing one column of three rows. Columnar tables are better when users query just a few columns and not all columns. The name of this game is speed because less volume is moved from disk to FSG Cache.

A Columnar Table is Best for Queries with Few Columns

images

Parsing Engine

Calling all AMPs.

You should each have 5 container blocks in your table named Employee_Columnar. Move only your Last_Name and Salary container blocks into your FSG Cache.

Give me all Last Names and Salaries.

All rows come back, but only two columns. We moved less than half the block volume.

Quiz – How Many Containers are in FSG Cache?

images

SELECT Salary FROM Employee_Columnar

WHERE EmpNo = 101 ;

images

There are 1,000 AMPs in the system!

There are no indexes on the table as it is a NoPI table.

How many containers will be placed into FSG Cache to satisfy the query?

Your mission is to think! Remember that the Parsing Engine always does the least work.

Answer – How Many Containers are in FSG Cache?

images

Parsing Engine

There are 1,000 AMPs in the system!

There are no indexes on the table as it is a NoPI table.

How many containers will be placed into FSG Cache to satisfy the query?

1001

All 1,000 AMPs place only their EmpNo container in FSG Cache, and then the AMP that finds Employee 101 will move their Salary container in. Did you think 2000? Why work that hard when you can do half of that?

Factors When Choosing Table Design

How will users use the Primary Index?
How many rows are in the table?
How many columns are in the table?
Will the users do more Full Table Scans or use a WHERE clause to bring back a few rows?
Will the users require most of the columns on the report or just a subset?
Will the users perform range queries on a particular column like date?
How can I best avoid a Full Table Scan?

Teradata Limits for V12, V13 and V14

images

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 2 - Three Options for Teradata Table Design

Create new playlist

Sign In

Sign Up

Chapter 2 - Three Options for Teradata Table Design

“Design is not just what it looks like and feels like. Design is how it works.”

Table of Contents for
Chapter 2 - Three Options for Teradata Table Design