“I saw the angel in the marble and carved until I set him free.”
- Michelangelo
The WHERE Clause here filters how many ROWS are coming back. In this example, I am asking for the report to only rows WHERE the first name is Henry.
No rows were returned above because we used a lowercase 'henry' and the table contains 'Henry' with a capital letter.
Character data (letters) need single quotes, but you need NO Single Quotes for Integers (numbers).
When you ALIAS a column, you give it a new name for the report header, but a good rule of thumb is to refer to the column by the alias throughout the query.
First thing you need to know about a NULL is it is unknown data. It is NOT a zero. It is missing data. Since we don’t know what is in a NULL, you can’t use an = sign. You must use IS NULL or IS NOT NULL.
First thing you need to know about a NULL is it is unknown data. It is NOT a zero. It is missing data. So, the only keywords that are effective when evaluating a Null value are IS NULL or IS NOT NULL. The Hadoop system does not put a question mark in the data. It is the tool that assigns the question mark.
We can’t compare NOT NULL with any equal sign. We deal with NULL values with IS NULL or IS NOT NULL keywords.
SELECT *
FROM Student_Table
WHEREClass_Code IS NOT NULL ;
Much like before, when you want to bring back the rows that do not have NULLs in them, you put an ‘IS NOT NULL’ in the WHERE Clause.
The WHERE Clause doesn’t just deal with ‘Equals’. You can look for things that are GREATER or LESSER THAN equal.
The WHERE Clause doesn’t just deal with ‘Equals’. You can look for things that are GREATER or LESSER THAN along with asking for things that are GREATER/LESSER THAN or EQUAL to.
Notice the WHERE statement and the word AND. In this example, qualifying rows must have a Class_Code = ‘FR’ and also must have a First_Name of ‘Henry’. Notice how the WHERE and the AND clause are on their own line. Good practice!
What is going wrong here? You are using an AND to check the same column. What you are basically asking with this syntax is to see the rows that have BOTH a Grade_Pt of 3.0 and a 4.0. That is impossible, so no rows will be returned.
Notice above in the WHERE Clause we use OR. OR allows for either of the parameters to be TRUE in order for the data to qualify and return.
Notice above in the WHERE Clause we use OR. OR allows for either of the parameters to be TRUE in order for the data to qualify and return. The first example errors and is a common mistake. The second example is perfect.
This query errors! What is WRONG with this syntax? No Single quotes around SR.
Notice that AND separates two different columns, and the data will come back if both are TRUE.
SELECT * FROM Student_Table
WHERE Grade_Pt = 4.0 OR Grade_Pt = 3.0
AND Class_Code = 'SR' ;
Which Seniors have a 3.0 or a 4.0 Grade_Pt average. How many rows will return?
A) 2 |
C) Error |
B) 1 |
D) 3 |
We are looking for any senior who has a grade_pt of either 4.0 or 3.0. How many rows will return?
SELECT * FROM Student_Table
WHERE Grade_Pt = 4.0 OR Grade_Pt = 3.0
AND Class_Code = 'SR' ;
We are looking for any senior who has a grade_pt of either 4.0 or 3.0. How come we had two rows return?
|
( ) |
|
NOT |
|
AND |
|
OR |
SELECT *
FROMStudent_Table
WHERE Grade_Pt = 4.0 OR Grade_Pt = 3.0
AND Class_Code = 'SR' ;
Syntax has an ORDER OF PRECEDENCE. It will read anything with parentheses around it first. Then, it will read all the NOT statements. Next, the AND statements. FINALLY, the OR Statements. This is why the last query came out odd. Let’s fix it and bring back the right answer set.
This is the proper way of looking for rows that have both a Grade_Pt of 3.0 or 4.0 AND also having a Class_Code of ‘SR’. Only ONE row comes back. Parentheses are evaluated first, so this allows you to direct exactly what you want to work first.
This is another technique when looking for rows that have both a Grade_Pt of 3.0 or 4.0 AND also having a Class_Code of ‘SR’. Only ONE row comes back. Parentheses are evaluated first and the IN list has parenthesis.
Using an IN List is a great way of looking for multiple values that you would normally have to separate with an OR.
The IN Statement avoids retyping the same column name separated by an OR. The IN allows you to search the same column for a list of values. This works with character data as long as you use single quotes.
You can also ask to see the results that ARE NOT IN your parameter list.
This is a BETWEEN. This allows you to see if a column falls in a range. It is inclusive, meaning that in our example, we will be getting the rows that also have a 2.0 and 4.0 in their column!
This is a NOT BETWEEN example. This allows you to see if a column does not fall in a range. It is inclusive, meaning that in our example, we will be getting no rows where the grade_pt is between a 2.0 and 4.0 in their column! The 2.0 and the 4.0 will also not return.
The wildcard percentage sign (%) is a wildcard for any number of characters. We are looking for anyone whose name starts with SM! In this example, the only row that would come back is ‘Smith’. The next page will show an example of underscore.
The _ underscore sign is a wildcard for any a single character. We are looking for anyone who has an 'a' as the second letter of their last name.
Bring back all columns from the
Student_Table if the student has an 'n' in
the 3rd character of their first name.
The _ underscore sign is a wildcard for any a single character. We are looking for anyone who has an 'n' in the third letter of their first name.
The _ underscore sign is a wildcard for any a single character. We are looking for anyone who has an 'n' as the third letter of their last name.
No rows returned because nobody's name started with capital SM.
Hadoop returns the answer set even though there are spaces at the end of each Last_Name . This happens because it is character data.
We found anyone who had the letters 'm' and 's' in their Last_Name column. We needed the LOWER function, otherwise 'Smith' and 'McRoberts' would not have returned because their first letters were capitalized.
We found anyone who had the letters 'w' or 'b' in their Last_Name column. We needed the LOWER function, otherwise 'Wilson' and 'Bond' would not have returned, because their first letters were capitalized.
SELECTEmployee_No, First_Name
FROMEmployee_Table
WHEREEmployee_No = 2000000;
All Integers will start from the right and move left. All Character data will start from the left and move to the right.
SELECT Student_ID, Last_Name
FROMStudent_Table ;
This is how a standard result set will look. Notice that the integer type in Student_ID starts from the right and goes left. Character data type in Last_Name moves left to right like we are use to seeing while reading English.
Character data pads spaces to the right and Varchar uses a 2-byte VLI instead.
Some systems using the LIKE command won't return answers when you are looking for the last character of a name because spaces are at the end. On those systems you need to include the TRIM command. Hadoop will return the rows without the TRIM, but you are now aware of a new option.
Some systems using the LIKE command won't return answers when you are looking for the last character of a name because spaces are at the end. On those systems you need to include the RTRIM command. RTRIM only removes spaces on the RIGHT of the data. Hadoop will return the rows without the RTRIM, but you are now aware of a new option.