Chapter 3 – The WHERE Clause

“I saw the angel in the marble and carved until I set him free.”

- Michelangelo

The WHERE Clause Limits Returning Rows

image

The WHERE Clause here filters how many ROWS are coming back. In this example, I am asking for the report to only rows WHERE the first name is Henry.

Case Sensitivity is Important

image

No rows were returned above because we used a lowercase 'henry' and the table contains 'Henry' with a capital letter.

Character Data needs Single Quotes, but Numbers Don’t

image

Character data (letters) need single quotes, but you need NO Single Quotes for Integers (numbers).

You Cannot Use the Alias in the Where Clause

image

When you ALIAS a column, you give it a new name for the report header, but a good rule of thumb is to refer to the column by the alias throughout the query.

NULL means NO DATA so Equals Null Returns No Rows

image

First thing you need to know about a NULL is it is unknown data. It is NOT a zero. It is missing data. Since we don’t know what is in a NULL, you can’t use an = sign. You must use IS NULL or IS NOT NULL.

Use IS NULL or IS NOT NULL for Null Values

image

First thing you need to know about a NULL is it is unknown data. It is NOT a zero. It is missing data. So, the only keywords that are effective when evaluating a Null value are IS NULL or IS NOT NULL. The Hadoop system does not put a question mark in the data. It is the tool that assigns the question mark.

NULL is UNKNOWN DATA so NOT Equal Won’t Work

image

We can’t compare NOT NULL with any equal sign. We deal with NULL values with IS NULL or IS NOT NULL keywords.

Use IS NULL or IS NOT NULL when dealing with NULLs

SELECT     *

FROM    Student_Table

WHEREClass_Code IS NOT NULL ;

image

Much like before, when you want to bring back the rows that do not have NULLs in them, you put an ‘IS NOT NULL’ in the WHERE Clause.

Using Greater Than

image

The WHERE Clause doesn’t just deal with ‘Equals’. You can look for things that are GREATER or LESSER THAN equal.

Using Greater Than or Equal To (>=)

image

The WHERE Clause doesn’t just deal with ‘Equals’. You can look for things that are GREATER or LESSER THAN along with asking for things that are GREATER/LESSER THAN or EQUAL to.

AND in the WHERE Clause

image

Notice the WHERE statement and the word AND. In this example, qualifying rows must have a Class_Code = ‘FR’ and also must have a First_Name of ‘Henry’. Notice how the WHERE and the AND clause are on their own line. Good practice!

Troubleshooting AND

image

What is going wrong here? You are using an AND to check the same column. What you are basically asking with this syntax is to see the rows that have BOTH a Grade_Pt of 3.0 and a 4.0. That is impossible, so no rows will be returned.

OR in the WHERE Clause

image

Notice above in the WHERE Clause we use OR. OR allows for either of the parameters to be TRUE in order for the data to qualify and return.

Troubleshooting Or

image

Notice above in the WHERE Clause we use OR. OR allows for either of the parameters to be TRUE in order for the data to qualify and return. The first example errors and is a common mistake. The second example is perfect.

Troubleshooting Character Data

image

This query errors! What is WRONG with this syntax? No Single quotes around SR.

Using Different Columns in an AND Statement

image

Notice that AND separates two different columns, and the data will come back if both are TRUE.

Quiz – How many rows will return?

image

SELECT *   FROM Student_Table

WHERE  Grade_Pt = 4.0 OR Grade_Pt = 3.0

AND             Class_Code = 'SR' ;

Which Seniors have a 3.0 or a 4.0 Grade_Pt average. How many rows will return?

A) 2

C) Error

B) 1

D) 3

We are looking for any senior who has a grade_pt of either 4.0 or 3.0. How many rows will return?

Answer to Quiz – How many rows will return?

image

SELECT *   FROM Student_Table

WHERE  Grade_Pt = 4.0 OR Grade_Pt = 3.0

AND              Class_Code = 'SR' ;

image

We are looking for any senior who has a grade_pt of either 4.0 or 3.0. How come we had two rows return?

What is the Order of Precedence?

image  

( )

image  

NOT

image  

AND

image  

OR

SELECT     *

FROMStudent_Table

WHERE  Grade_Pt = 4.0 OR Grade_Pt = 3.0

AND         Class_Code = 'SR' ;

Syntax has an ORDER OF PRECEDENCE. It will read anything with parentheses around it first. Then, it will read all the NOT statements. Next, the AND statements. FINALLY, the OR Statements. This is why the last query came out odd. Let’s fix it and bring back the right answer set.

Using Parentheses to change the Order of Precedence

image

This is the proper way of looking for rows that have both a Grade_Pt of 3.0 or 4.0 AND also having a Class_Code of ‘SR’. Only ONE row comes back. Parentheses are evaluated first, so this allows you to direct exactly what you want to work first.

An IN List is Another Technique

image

This is another technique when looking for rows that have both a Grade_Pt of 3.0 or 4.0 AND also having a Class_Code of ‘SR’. Only ONE row comes back. Parentheses are evaluated first and the IN list has parenthesis.

Using an IN List in place of OR

image

Using an IN List is a great way of looking for multiple values that you would normally have to separate with an OR.

The IN List Can Use Character Data

image

The IN Statement avoids retyping the same column name separated by an OR. The IN allows you to search the same column for a list of values. This works with character data as long as you use single quotes.

Using a NOT IN List

image

You can also ask to see the results that ARE NOT IN your parameter list.

BETWEEN is Inclusive

image

This is a BETWEEN. This allows you to see if a column falls in a range. It is inclusive, meaning that in our example, we will be getting the rows that also have a 2.0 and 4.0 in their column!

NOT BETWEEN is Also Inclusive

image

This is a NOT BETWEEN example. This allows you to see if a column does not fall in a range. It is inclusive, meaning that in our example, we will be getting no rows where the grade_pt is between a 2.0 and 4.0 in their column! The 2.0 and the 4.0 will also not return.

LIKE uses Wildcards Percent ‘%’ and Underscore ‘_’

image

The wildcard percentage sign (%) is a wildcard for any number of characters. We are looking for anyone whose name starts with SM! In this example, the only row that would come back is ‘Smith’. The next page will show an example of underscore.

LIKE command Underscore is Wildcard for one Character

image

The _ underscore sign is a wildcard for any a single character. We are looking for anyone who has an 'a' as the second letter of their last name.

Quiz –Who Has the Letter 'n' in the 3rd Position

Bring back all columns from the
Student_Table if the student has an 'n' in
the 3rd character of their first name.

The _ underscore sign is a wildcard for any a single character. We are looking for anyone who has an 'n' in the third letter of their first name.

Answer - Who Has the Letter 'n' in the 3rd Position

image

The _ underscore sign is a wildcard for any a single character. We are looking for anyone who has an 'n' as the third letter of their last name.

LIKE is Case Sensitive

image

No rows returned because nobody's name started with capital SM.

LIKE Command to Find the Last Character of a Last_Name

image

Hadoop returns the answer set even though there are spaces at the end of each Last_Name . This happens because it is character data.

LIKE Command to Find Multiple Characters

image

We found anyone who had the letters 'm' and 's' in their Last_Name column. We needed the LOWER function, otherwise 'Smith' and 'McRoberts' would not have returned because their first letters were capitalized.

LIKE Command to Find Either Character

image

We found anyone who had the letters 'w' or 'b' in their Last_Name column. We needed the LOWER function, otherwise 'Wilson' and 'Bond' would not have returned, because their first letters were capitalized.

Answer – What Data is Left Justified and What is Right?

SELECTEmployee_No, First_Name

FROMEmployee_Table

WHEREEmployee_No = 2000000;

image

All Integers will start from the right and move left. All Character data will start from the left and move to the right.

An Example of Data with Left and Right Justification

SELECT Student_ID, Last_Name

FROMStudent_Table ;

image

This is how a standard result set will look. Notice that the integer type in Student_ID starts from the right and goes left. Character data type in Last_Name moves left to right like we are use to seeing while reading English.

A Visual of CHARACTER Data vs. VARCHAR Data

image

Character data pads spaces to the right and Varchar uses a 2-byte VLI instead.

Use the TRIM command to remove spaces on CHAR Data Escape Characters

image

Some systems using the LIKE command won't return answers when you are looking for the last character of a name because spaces are at the end. On those systems you need to include the TRIM command. Hadoop will return the rows without the TRIM, but you are now aware of a new option.

RTRIM command Removes Trailing spaces on CHAR Data

image

Some systems using the LIKE command won't return answers when you are looking for the last character of a name because spaces are at the end. On those systems you need to include the RTRIM command. RTRIM only removes spaces on the RIGHT of the data. Hadoop will return the rows without the RTRIM, but you are now aware of a new option.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset