0%

Book Description

The design of the data warehouse is the most important aspect, so this book is a fantastic guide on exactly how to do just that. This book brilliantly touches on the important things a designer needs to do to build a world-class data warehouse. They will clearly understand the environment and all options to enhance the performance based on the design. This will teach you great detail in a high-level manor that really focuses clearly so you won't get bogged down with too much minutia. It will allow you to understand the important issues so you can make clear decisions.

Table of Contents

  1. Cover Page
  2. Title Page
  3. Copyright Page
  4. About Tom Coffing
  5. About Leah Nolander
  6. Contents
  7. Chapter 1 - Physical Database Design in Layman's Terms
    1. What is Parallel Processing?
    2. The Basics of a Single Computer
    3. Teradata Parallel Processes Data
    4. Parallel Architecture
    5. The Teradata Architecture
    6. All Teradata Tables are Spread across All AMPS
    7. Understand that Teradata can Scale to Incredible Size
    8. Teradata Parallel Processing
    9. How Teradata Creates Tables
    10. Every AMP has the Exact Same Tables
    11. All Teradata Tables are Spread across All AMPs
    12. The Table Header and the Data Rows are Stored Separately
    13. An AMP Stores the Rows of a Table inside a Data Block
    14. To Read a Data Block an AMP Moves the Block into Memory
    15. A Full Table Scan Means all AMPs must Read all Rows
    16. The “Achilles Heel” or Slowest Process is Block Transfer
    17. Each Table has a Primary Index
    18. A Query Using the Primary Index is a Single AMP Retrieve.
    19. Using EXPLAIN
    20. As Rows are Added a Data Block will Eventually Split
    21. A Full Table Scan Means all AMPs must Read all Blocks
    22. An EXPLAIN that shows a Full Table Scan
    23. A Primary Index Query Uses a Single AMP and Single Block
    24. Using EXPLAIN in a NUPI Query
    25. Each AMP can have Many Blocks for a Single Table
    26. A Full Table Scan Means all AMPs must Read all Blocks
    27. Quiz – How Many Blocks Move into FSG Cache?
    28. Answer – How Many Blocks Move into FSG Cache?
    29. Quiz – How Many Blocks Move Using the Primary Index?
    30. Answer – How Many Blocks Move Using the Primary Index?
    31. Intelligent Memory (Teradata V14.10)
    32. Teradata V14.10 Intelligent Memory Gives Data a Temperature
    33. Data Deemed VeryHot Stays in each AMP's Intelligent Memory
    34. Intelligent Memory Stays in Memory
    35. What is the Goal of a Teradata Physical Database Design?
  8. Chapter 2 - Three Options for Teradata Table Design
    1. There are Three Options to Teradata Table Design
    2. Teradata V14.10 Intelligent Memory
    3. How Teradata Creates Traditional Tables
    4. Each Table has a Primary Index
    5. A Query Using the Primary Index is a Single AMP Retrieve
    6. A Primary Index Query Uses a Single AMP and Single Block
    7. How Teradata Creates a PPI Table
    8. PPI Table Sorting the Rows by Month of Order_Date
    9. An all AMPs Retrieve by Way of a Single Partition
    10. Creating a PPI Table with CASE_N
    11. A Visual of Case_N Partitioning
    12. An all AMPs Retrieve by way of a Single Partition
    13. What does a Columnar Table Look Like?
    14. A Comparison of Data for Normal Vs. Columnar
    15. A Columnar Table is Best for Queries with Few Columns
    16. Quiz – How Many Containers are in FSG Cache?
    17. Answer – How Many Containers are in FSG Cache?
    18. Factors when Choosing Table Design
    19. Teradata Limits for V12, V13 and V14
  9. Chapter 3 - Teradata - The Cold Hard Facts
    1. Teradata Parallel Processing
    2. Each Table has a Primary Index that is Unique or Non-Unique
    3. The Hash Map Determines which AMP will Own the Row
    4. A Unique Primary Index Spreads the Data Evenly
    5. The AMP Adds a Uniqueness Value to Create the Row-ID
    6. Each AMP Sorts their Tables by the Row-ID
    7. A Non-Unique Primary Index Skews the Data
    8. Comparing the Same Table with Different Primary Indexes
    9. Unique Primary Index Queries are a Single AMP Retrieve
    10. A Non-Unique Primary Index is also a Single AMP Retrieve
    11. Teradata has a no Primary Index Table Called a NoPI Table
    12. There are Normal Tables and then there are Partitioned Tables
    13. A Visual of One Year of Data with Range_N per Month
    14. Partitioning is Designed to Eliminate the Full Table Scan
    15. A Partition # and Row-ID = Row Key
    16. An AMP Stores its Rows Sorted in Only Two Different Ways
    17. AMPs Moves their Data Blocks into Memory to Read/Write
    18. Most Taxing Thing for an AMP is Moving Blocks into Memory
    19. Rows are Stored in Data Blocks which are Stored in Cylinders
    20. Rows for an AMP Stored inside a Data Block in a Cylinder
    21. An AMP's Master Index is Used to Find the Right Cylinder
    22. The Row Reference Array (RRA) Does the Binary Search?
    23. A Block Splits into Two Blocks at Maximum Block Size
    24. Data Blocks Maximum Block Size has Changed (V14.10)
    25. The New Block Split with Teradata V14.10
    26. The Block Split with Even More Detail in Teradata V14.10
    27. Teradata V14.10 Block Split Defaults
    28. There is One Master Index and Thousands of Cylinder Indexes
    29. Each Table has a 48-bit TableID
  10. Chapter 4 - How Joins Work Under the Covers
    1. Teradata Join Quiz
    2. Teradata Join Quiz Answer
    3. The Joining of Two Tables
    4. Teradata Moves Joining Rows to the Same AMP
    5. Imagine Joining Two NoPI Tables that have no Primary Index
    6. Both Tables are Redistributed to Join Rows on the Same AMP
    7. How do you Join if One Table is Big and One Table is Small?
    8. Duplicate the Small Table on Every AMP (Like a Mirror)
    9. What could you do if Two Tables Joined 1000 Times a Day?
    10. Joining Two Tables with the Same PK/FK Primary Index
    11. A Join with No Redistribution or Duplication
    12. A Performance Tuning Technique for Large Joins
    13. The Joining of Two Tables with an Additional WHERE Clause
    14. An Example of the Fastest Join Possible
    15. Using a Simple Volatile Table
    16. A Volatile Table with a Primary Index
    17. Using a Simple Global Temporary Table
    18. Two Brilliant Techniques for Global Temporary Tables
    19. The Joining of Two Tables Using a Global Temporary Table
    20. Quiz – How Much Data Moves across the BYNET?
    21. Answer – How Much Data Moves across the BYNET?
    22. Teradata V14.10 Join Feature PRPD
  11. Chapter 5 - Secondary Indexes – An Alternate Path to the Data
    1. Creating a Unique Secondary Index (USI)
    2. What is in a Unique Secondary Index (USI) Subtable?
    3. A Unique Secondary Index (USI) Subtable is Hashed
    4. How the Parsing Engine Uses the USI Subtable
    5. A USI is a Two-AMP Operation
    6. Creating a Non-Unique Secondary Index (NUSI)
    7. What is in a Unique Secondary Index (USI) Subtable?
    8. Non-Unique Secondary Index (NUSI) Subtable is AMP Local
    9. How the Parsing Engine Uses the NUSI Subtable
    10. Creating a Value-Ordered NUSI
    11. The Hash Map Determines which AMP will own the Row
    12. A Unique Primary Index Spreads the Data Evenly
    13. Quiz – Answer the Tough USI Questions
    14. Answer to Quiz – Answer the Tough USI Questions
    15. A Picture with a Base Table, USI, and NUSI Subtable
    16. Quiz – Tough Questions on the USI and NUSI Subtables
    17. Answer – Tough Questions on the USI and NUSI Subtables
    18. A Query Using an USI Only Moves Two Blocks
    19. A Query Using A NUSI Always Uses all AMPs
    20. Two Non-Unique Secondary Indexes (NUSI) on a Table
    21. A NUSI BITMAP Query (1 of 3)
    22. A NUSI BITMAP Theory (2 of 3)
    23. A NUSI Bitmap in Action (3 of 3)
    24. A Brilliant Technique for a Unique Secondary Index
    25. The USI for Partitioned Tables Points to the Row Key
    26. A Brilliant Technique for a Non-Unique Secondary Index
    27. The NUSI for Partitioned Tables Points to the Row Key
    28. The Bigger Quiz
    29. The Bigger Quiz Answers
    30. Multiple Choice DBA
    31. Multiple Choice DBA
    32. What are the Big Four Tactical Queries?
    33. What are the Big Four Tactical Queries?
  12. Chapter 6 – Physical Design
    1. The Four Stages of Modeling for Teradata
    2. The Logical Model
    3. The Logical Model can be Loaded inside Nexus
    4. First, Second and Third Normal Form
    5. Quiz – Choose that Normalization Technique
    6. Answer to Quiz – Choose that Normalization Technique
    7. Quiz – What Normalization is it Now?
    8. Answer to Quiz – What Normalization is it Now?
    9. The Employee_Table and Department_Table can be Joined
    10. The Employee_Table and Department_Table Join SQL
    11. The Extended Logical Model Template
    12. User Access is of Great Importance
    13. User Access in Layman's Terms
    14. User Access for Joins in Layman's Terms
    15. The Nexus shows Users the Table's Primary Index
    16. Data Demographics Tell us if the Column is Worthy
    17. Data Demographics – Distinct Rows
    18. Data Demographics – Distinct Rows Query
    19. Data Demographics – Max Rows Null
    20. Data Demographics – Max Rows Null Query
    21. Data Demographics – Max Rows Per Value
    22. Data Demographics – Max Rows Per Value
    23. Data Demographics – Typical Rows Per Value
    24. Typical Rows Per Value – Query 1 (Median)
    25. Typical Rows Per Value – Query 2 (Median)
    26. Row_Number with Qualify to Get the Typical Rows Per Value
    27. SQL to Get the Average Rows Per Value for a Column (Mean)
    28. Getting the Average Values Per Column
    29. Data Demographics – Change Rating
    30. Factors when Choosing Teradata Indexes
    31. Finding Table Sizes
    32. Finding Skew in the Tables in a Database
    33. Finding Skew in a Table
    34. Display the Distribution of a Column Per AMP
    35. Primary Index Data Demographics Candidate Guidelines
    36. Primary Index Access Considerations
    37. Answer -Three Important Primary Index Considerations
    38. The First Step is to Pick all Potential Primary Index Columns
    39. The First Step is to Pick all Potential Primary Index Columns
    40. The 2nd Step is to Pick all Potential Secondary Indexes
    41. Answer to 2nd Step to Picking Potential Secondary Indexes
    42. Now it is Time to Choose the Primary and Secondary Indexes
    43. 3rd Step is to Picking your Indexes
    44. Our Index Picks
  13. Chapter 7 - Denormalization
    1. Denormalization
    2. Derived Data
    3. Repeating Groups
    4. Pre-Joining Tables
    5. Storing Summary Data with a Trigger
    6. Summary Tables or Data Marts the Old Way
    7. Aggregate Join Index the New Way
    8. New Aggregate Join Index (Teradata V14.10)
    9. Horizontal Partitioning the Old Way
    10. Horizontal Partitioning the New Way
    11. Vertical Partitioning the Old Way
    12. A Vertical Partitioning Trick that is Old School
    13. Vertical Partitioning the New Way
    14. Temporary Tables - A Volatile Table with a Primary Index
    15. The Joining of Two Tables Using a Volatile Table
    16. Temporary Tables - Global Temporary Tables
    17. The Joining of Two Tables Using a Global Temporary Table
    18. Creating a Multi-Table Join Index
    19. Visual of a Join Index
    20. Creating a Single-Table Join Index
    21. Conceptual of a Single Table Join Index on an AMP
    22. Single Table Join Index Great for LIKE Clause
    23. Single Table Join Index with Value Ordered NUSI
    24. Sparse Join Index
    25. A Global Multi-Table Join Index
    26. Creating a Hash Index
  14. Chapter 8 – Things to Consider
    1. Teradata Mode Vs. ANSI Mode
    2. ANSI Mode Transactions
    3. Teradata Mode Transactions
    4. Multi-Statement Transaction in BTEQ
    5. Teradata Mode Default is a Set Table
    6. An Example of a Duplicate Row Error
    7. Creating a Set Table with a Unique Primary Index
    8. Creating a Set Table with a Unique Secondary Index
    9. ANSI mode Defaults to a Multiset Table
    10. Using a Volatile Table to Get Rid of Duplicate Rows
    11. Creating a Volatile Table
    12. You Populate a Volatile Table with an INSERT/SELECT
    13. The Three Steps to use a Volatile Table
    14. Why would you use the ON COMMIT DELETE ROWS?
    15. The HELP Volatile Table Command Shows your Volatiles
    16. A Volatile Table with a Primary Index
    17. The Joining of Two Tables Using a Volatile Table
    18. Creating a Secondary Index on a Volatile Table
    19. You can Collect Statistics on Volatile Tables
    20. The New Teradata V14 Way to Collect Statistics
    21. Four Examples of Creating a Volatile Table Quickly
    22. Four Advanced Examples of Creating a Volatile Table Quickly
    23. Creating Partitioned Primary Index (PPI) Volatile Tables
    24. CREATING a Global Temporary Table
    25. Using a Simple Global Temporary Table
    26. Two Brilliant Techniques for Global Temporary Tables
    27. The Joining of Two Tables Using a Global Temporary Table
    28. Making an Exact Copy a Table
    29. Making a NOT-So-Exact Copy a Table
    30. Copying a Table
    31. Troubleshooting Copying and Changing the Primary Index
    32. Copying Only Specific Columns of a Table
    33. Copying a Table and Keeping the Statistics
    34. Copying a Table with Statistics
    35. Copying a Table Structure with NO Data but Statistics
    36. What to COLLECT STATISTICS On?
    37. A Huge Hint that No Statistics have Been Collected
    38. The Basic Syntax for COLLECT STATISTICS
    39. COLLECT STATISTICS Examples for a Better Understanding
    40. The New Teradata V14 Way to Collect Statistics
    41. COLLECT STATISTICS Directly from Another Table
    42. How to Recollect STATISTICS on a Table
    43. Examples of COLLECT STATISTICS for V14
  15. Chapter 9 - Deep Dive Overhead for each Row
    1. Why Go Deep inside the Overhead of a Row?
    2. A Row Layout in Teradata
    3. Row Length
    4. Row ID
    5. How the Row Hash is Created for Each Row
    6. Unique Primary Indexes have Even Distribution
    7. The AMP Adds a Uniqueness Value to its Rows
    8. The Row-Hash is 32-bits and so is the Uniqueness Value
    9. Non-Unique Primary Indexes have Skewed Data
    10. Flag Byte
    11. Presence Byte
    12. Presence Byte is Used to show Null Values in each Row
    13. A Close-up Look at the Presence Byte for Nulls
    14. An Extreme example to Look at the Presence Byte for Nulls
    15. Quiz – How Many Presence Bits Used for these Columns?
    16. Answer – How Many Presence Bits Used for these Columns?
    17. Quiz – How Many Presence Bits Used with NOT NULL?
    18. Answer – How Many Presence Bits Used with NOT NULL?
    19. Quiz – Which bit will be Set to a One?
    20. Answer – Which bit will be Set to a One?
    21. Quiz – How Many Presence Bits Needed Now?
    22. Answer – How Many Presence Bits Needed Now?
    23. What Happens when we Need More than One Presence Byte?
    24. An Example that must use a 2nd Presence Byte
    25. An Example that must use a 2nd Presence Byte
    26. Quiz – Answer the Presence Bit/Byte Questions?
    27. Answers to the – Answer the Presence Bit/Byte Questions?
    28. Quiz - How Many Nullable Columns are Possible?
    29. Answer- How Many Nullable Columns are Possible?
  16. Chapter 10 - Compression
    1. Important Information about Compression
    2. Presence Bytes are also Used for Compression
    3. Why One Byte (8 bits) can Represent up to 255 Values
    4. Answers to One Byte (8 bits) can Represent up to 255 Values
    5. Now that you Understand that 8 Bits can Represent 0 – 255
    6. A Compression Example that Compresses Two Values
    7. A Compression Example that Compresses Three Values
    8. Quiz – Name that Compression Value
    9. The Next Important Concept in Compression
    10. Quiz – Can you Fill in the Compression Values?
    11. Answer – Can you Fill in the Compression Values?
    12. The Last Major Concept in Compression
    13. Quiz – Using One Presence Byte for Multiple Columns
    14. Answer – Using One Presence Byte for Multiple Columns
    15. Quiz – How Many Presence Bytes are Needed?
    16. Answer – How Many Presence Bytes are Needed?
    17. Advanced Quiz – Fill in the Presence Bits?
    18. Answer to Advanced Quiz – Fill in the Presence Bits?
    19. The Cost Vs. the Savings
    20. The Cost List of Compression
    21. A Deeper Dive into NULL Values
    22. Quiz - How Much Space Did we Just Save?
    23. Answer - How Much Space Did we Just Save?
    24. Advanced Quiz - How Much Space Did we Just Save?
    25. Advanced Quiz - How Much Space Did we Just Save?
    26. Using the DBC Tables in a Compression Experiment
    27. A Compression Test
    28. A Compression Test
    29. We then Moved all Eight Tables to Another Database
    30. Compression Reports with Nexus and SmartCompress
    31. We then Created Two Global Temporary Tables
    32. We then Created and Executed our Macro
    33. Report Comparing Compressed and NonCompressed Tables
  17. Chapter 11 – Top SQL Commands Cheat Sheet
    1. SELECT all Columns from a Table and Sort
    2. Select Specific Columns and Limiting the Rows
    3. Changing your Default Database
    4. Keywords that Describe you
    5. Select TOP Rows in a Rank Order
    6. A Sample Number of Rows
    7. Getting a Sample Percentage of Rows
    8. Find Information about a Database
    9. Find Information about a Table
    10. Using Aggregates
    11. Performing a Join
    12. Performing a Join Using ANSI Syntax
    13. Using Date, Time and Timestamp
    14. Using Date Functions
    15. Using the System Calendar
    16. Using the System Calendar in a Query
    17. Formatting Data
    18. Using Rank
    19. Using a Derived Table
    20. Using a Subquery
    21. Correlated Subquery
    22. Using Substring
    23. Basic CASE Statement
    24. Advanced CASE Statement
    25. Using an Access Lock in your SQL
    26. Collect Statistics
    27. CREATING a Volatile Table with a Primary Index
    28. CREATING a Volatile Table that is Partitioned (PPI)
    29. CREATING a Volatile Table that is Deleted after the Query
    30. Finding the Typical Rows per Value for Specific Column
    31. Finding out how Much Space you have
    32. How Much Space you have Per AMP
    33. Finding your Space
    34. Finding Space Skew in Tables in a Database
    35. Finding the Number of Rows per AMP for a Column
    36. Finding Account Information
    37. Ordered Analytics
  18. Back Cover