Chapter 17 – Table Create and Data Types

“A man who views the world at 50 the same as he did at 20 has wasted 30 years of his life.”

- Muhammad Ali

Kognitio Has Three Table Distribution Options

1)Round Robin Distribution (Default)

2)Hash Distributed

3)Replicated

Most Kognitio tables will use a Round robin distribution. This is the default when no distribution key is defined. The data is spread evenly across the nodes to maximize parallel processing.

Hash distribution. A column(s) are chosen as the distribution key. This column(s) is run through the Kognitio hashing algorithm to divide data among all of the nodes. Like values will hash to the same node. This is often done when two tables are joined together. When the join key is also the distribution key for both tables, the join works fast and efficient.

Replicated:  A table can be replicated in its entirety on all nodes. This is often done when a smaller table, such as a dimension table(s) will be joined to a larger table.

Kognitio makes creating tables and distribution easy.   Check out the fundamentals above.

A Table that is distributed via a Round Robin Technique

image

Most Kognitio tables will use a Round robin distribution. This is defined by the keyword RANDOM. The data is spread evenly across the nodes to maximize parallel processing.

Round Robin Technique is the Default

image

Most Kognitio tables will use a Round robin distribution. This is the default when no distribution key is defined. The data is spread evenly across the nodes to maximize parallel processing.

A Table that is distributed by Hash

image

A table that is HASHED has a distribution key that is distributed across the nodes using a Hash algorithm. Like values will go to the same node. This is important if two tables are joined together because the matching rows will be in the same memory pool, thus making the join faster.

Tables that join are excellent candidates for Hashed Tables

image

A table that is HASHED has a distribution key that is distributed across the nodes using a Hash algorithm. Like values will go to the same node. This is important if two tables are joined together because the matching rows will be in the same memory pool, thus making the join faster.

A Table that is distributed by Hash by Multiple Columns

image

A table that is HASHED has a distribution key that is distributed across the nodes using a Hash algorithm. Like values will go to the same node. A table can have multiple columns as the HASHED key. Above, both Subscriber_No and Member_No are concatenated and then hashed to form a single distribution key.

The Reasons for a Multi-Column HASHED Distribution Key

image

A table that is HASHED has a distribution key that is distributed across the nodes using a Hash algorithm. Like values will go to the same node. This is important if two tables are joined together because the matching rows will be in the same memory pool, thus making the join faster. The other reasons to do this is for better distribution or because your application uses multiple columns within each query.

Creating a Table that is replicated across all Nodes

image

A replicated table is copied to every node. If a State_Table has 50 rows, then all 50 rows would be copied to each and every node. This is done to make joining smaller tables that are replicated to join efficiently to larger tables that are either Hashed or Random distributed. A replicated table always has its matching rows on the same node because they are on every node.

The Concept is all about the Joins

image

If tables are replicated, then they are always on the same node as the rows they join. That is why a large Fact table will often be distributed by random or by hash and the smaller tables it joins to will be replicated. The setup of tables that join are all about ensuring that the matching rows are in the same memory pool. This is how the pros design their tables.

Creating a Table with Primary Key

image

A Primary Key can be established for a table and this will make the column or combination of columns unique. In the example above, we have created a Primary Key on the column Dept_No. No two Dept_No values can be the same. A secondary index subtable is established that will point to the base row for each Dept_No. This also provides fast Update and Delete capability. It also provides fast retrieval for a particular row.

Creating a Table with a UNIQUE constraint

image

A UNIQUE constraint can be established for a table and this will make the column or combination of columns unique. In the example above, we have created a UNIQUE constraint on the columns Product_ID and Sale_Date. No duplicate Product_IDs on the same Sale_Date can reside in the table. The system will kick out duplicates. A secondary index subtable is established that will point to the base row for each Product_ID and Sale_Date combination. This also provides fast Update and Delete capability. It also provides fast retrieval for a particular row.

How to create tables with Referential Integrity

image

The Department_Table has a column Dept_No that is the PRIMARY KEY for the table. We can use referential integrity to ensure that no row can be inserted into the Employee_Table, unless there is already a record giving the Dept_No details in the Department_Table. .If you are using referential integrity to maintain integrity on inserts, updates and deletes, all columns from all tables involved must be in RAM for this to be successful.

Not Null Constraints

image

The Not null constraint will not allow a Null value for that column. A not-null constraint is always written as a column constraint as in the example above. A bit is set in front of each row for all columns that have a NULL value possibility. If the bit is a 0 then there is a value for the column in the row, but if the bit is set to a 1 then the column value is NULL.

Creating a Table with Default Values

image

This definition of the Job_Table has a default value (‘No Description’) in Job_Desc column.

Creating a Table with a CHECK Constraint

image

This definition of the Course_Table has a check constraint on the Credits column, which checks that any value you enter is over 0.

Creating a Global Temporary Table

image

The default value is ON COMMIT DELETE ROWS for a Global Temporary Table. After the session ends the table and data will be deleted.

Important Temporary Table Information

image

The default value for ON COMMIT is different if you use the word TEMPORARY vs TEMP. CREATE TEMPORARY TABLE deletes the contents when the current transaction completes, whereas CREATE TEMP TABLE preserves them.

Creating a Table Image to place a Table in Memory

image

When you create the table by default a RAM image is created at the same time. This image needs to be dropped before you create another one. The CREATE TABLE IMAGE statement is how you place a table into memory, which is what Kognitio has been built to do. You can set up a RAM image of an entire table or choose only some of the columns and/or rows from a table. Any changes to the table through inserts, updates or deletes are reflected in RAM as well as on disk. Because the image is in RAM, queries run significantly faster on a table image.

CREATE OR REPLACE TABLE IMAGE

image

You can use the CREATE OR REPLACE TABLE IMAGE statement to set up or replace an existing RAM image of a table. You can also set it up to only choose selected columns or even rows.

DEFRAG TABLE IMAGE

image

The DEFRAG command allows historic rows held in RAM for a table to be discarded, freeing up memory. The way this used to be done was to drop and recreate the image, but the DEFRAG command will do it if more than 10% of the rows will actually be discarded. The keyword FORCE performs the DEFRAG no matter what percentage will be discarded.

Not Null Constraints

CREATE TABLE Employee_table

(

Employee_No integer NOT NULL,

Dept_No smallint,

Last_name char(20),

First_name varchar(12),

Salarydecimal(8,2) )

DISTRIBUTED BY(Employee_No);

The Not null constraint will not allow a Null value for that column. A not-null constraint is always written as a column constraint as in the example above.

Unique Constraints

CREATE TABLE Employee_table

(

Employee_No integer NOT NULL UNIQUE,

Dept_No smallint,

Last_name char(20),

First_name varchar(12),

Salarydecimal(8,2)

)

DISTRIBUTED BY(Employee_No);

A Unique constraint will ensure that the data contained in a column is unique within the entire table. Any duplicate values will be rejected. Remember, the table must be hash-distributed (not DISTRIBUTED RANDOMLY), and the constraint columns must be the same as (or a superset of) the table’s distribution key columns, just like the example above.

Primary Key Constraints

CREATE TABLE Employee_table

(

Employee_No integer PRIMARY KEY,

Dept_No smallint,

Last_name char(20),

First_name varchar(12),

Salarydecimal(8,2)

)

DISTRIBUTED BY(Employee_No);

Foreign keys are not supported. You
can declare them, but referential
integrity is never enforced.

When you create a primary key constraint, you are really creating a combination of a UNIQUE constraint and a NOT NULL constraint simultaneously. The table must be hash-distributed (not DISTRIBUTED RANDOMLY), and the primary key columns must be the same as (or a superset of) the table’s distribution key columns. If a table has a primary key, this column (or group of columns) is chosen as the distribution key for the table by default.

Check Constraints

CREATE TABLE Employee_table

(

Employee_No integer ,

Dept_No smallint CHECK (Dept_No < 500),

Last_name char(20),

First_name varchar(12),

Salarydecimal(8,2)

)

DISTRIBUTED BY(Employee_No);

A Check constraint will allow you to
specify that the value falls into a
certain range.

A Check constraint will allow you to specify that the value falls into a certain range. The column must satisfy a Boolean (truth-value) expression.

Create Table AS WITH DATA or WITH NO DATA

image

You can create a new table and specify WITH DATA or WITH NO DATA.

Another Version of Create Table AS

image

Above are some great examples to quickly CREATE a Table from another table.

CREATE Table FOR and FROM

image

The FOR keyword will create a new table, but the table will have no data. The FROM keyword will create a new table, but it will have the data also populated.

Create Table LIKE

image

The example above creates at table using the LIKE statement. It then loads the data with an INSERT/SELECT statement. You are now ready to query the new table.

String Data Types

CHAR(n) – Fixed length character, defined as CHAR(n) or CHARACTER(n) where n is an integer value defining the number of characters in the string. A CHAR with no length argument is by default a CHAR(1).

VARCHAR(n) – Variable length character, defined as VARCHAR(n), CHARACTER VARYING(n), CHAR VARYING(n) or VARCHAR2(n) where n is an integer value defining the maximum number of characters in the string. A VARCHAR with no length argument is by default a VARCHAR(255).

NCHAR– NCHAR and NVARCHAR are part of the SQL standard designed to implement a national character set that represents multi-byte characters.

BINARY and VARBINARY – The BINARY and VARBINARY types behave just like CHAR and VARCHAR except they should not have any type of conversion applied to its contents.

The maximum number of bytes in CHAR, BINARY, VARCHAR and VARBINARY columns is 32000. This does not apply to all character sets.

Numeric Data Types

INT1 or TINYINT – The size is 1 byte and the range of values goes from a negative -128 to a positive 127.

INT2 or SMALLINT– The size is 2 bytes and the range of values goes from a negative -32768 to a positive 32767.

INT4 or INT or INTEGER– The size is 4 bytes and the range of values goes from a negative -2147483648 to a positive 2147483647.

INT8– The size is 8 bytes and the range of values goes from a negative -9223372036854775808 to a positive 9223372036854775807.

REAL – The size is 4 bytes and it represents REAL Numbers.

FLOAT/DOUBLE PRECISION – The size is 8 bytes and they are stored in double precision IEEE floating-point format.

DECIMAL/NUMERIC (precision, scale) – The terms DECIMAL and NUMBER are interchangeable. The precision ranges from 1 to 18. If the precision is between 1 and 9 the storage is 4 bytes, but if the precision ranges from 10 to 18 bytes, the storage takes 8 bytes.

A Decimal (6,2) would represent a number such as 1234.56. The precision to the right of the decimal is 2.

Date, Time and Timestamp Data Types

DATE – The size is 4 bytes and the year ranges from 1-9999. The month ranges from 1-12. the Day ranges from 1-n.

TIME – The size is 4 bytes and the hour ranges from 0-23. The minute ranges from 059. The second ranges from 0-59.99.

TIMESTAMP – The size is 8 bytes and the year ranges from 1-9999. The month ranges from 1-12. the Day ranges from 1-n. The hour ranges from 0-23. The minute ranges from 059. The second ranges from 0-59.99.

INTERVAL – The syntax for an INTERVAL data type definition is INTERVAL start [TO end]. Interval data types examples include:

INTERVAL YEAR

INTERVAL YEAR TO MONTH

INTERVAL MONTH

INTERVAL DAY(3)

INTERVAL HOUR(4) TO MINUTE

INTERVAL DAY(5) TO SECOND

INTERVAL SECOND(5)

Interval values can be positive or negative.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset