Chapter 17 – Hadoop EXPLAIN

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 16 – Statistical Aggregate Functions

Chapter 17 – Hadoop EXPLAIN

“Lucy, you got some ‘splainin’ to do!”

- Ricky Ricardo

There are Many Options to See an EXPLAIN Plan

Syntax: EXPLAIN [EXTENDED|DEPENDENCY|AUTHORIZATION] query

To get an EXPLAIN, you can either type the word EXPLAIN in front of any Hadoop SQL or click on the magnifying glass of Nexus. Hive provides an EXPLAIN command designed to show the user the execution plan for a query being explained. The EXPLAIN plan is designed to show the user exactly how Hive translates queries into MapReduce jobs.

Explain Output has Three Parts

An Abstract Syntax Tree is really the internal
source code that is represented as a tree of nodes.

The Explain has three parts, including the Abstract Syntax Tree, the plan dependencies and a description. The description of the stages shows the sequence of operations (operators) and their metadata, such as filter expressions, select expressions or output file names. Hive takes SQL and converts it into map reduce jobs so understanding the Explain can be valuable. Each Hive query gets converted into a sequence called a "Directed Acyclic Graph" of stages. These stages will either be a map/reduce stage or metastore/file system operations.

EXPLAIN EXTENDED and the Abstract Syntax Tree

The abstract syntax tree shows how Hive parsed the query into tokens and
literals, as part of the first step in building the query into the result set.

The Explain has three parts, including the Abstract Syntax Tree, the plan dependencies and a description. An Abstract Syntax Tree is really the internal source code that is represented as a tree of nodes. Each node represents constants or variables that are presented as a leaf with operators or statements that are the inner node portions. This is also called a "parse tree". Above, we have the nodes representing the Query, Insert and Select portion of the plan.

EXPLAIN EXTENDED Stage Plans and Stage Dependencies

Use the EXPLAIN EXTENDED in front of your SQL to get the Stage Dependencies and Stage Plans. The stage dependencies show the # of stages and the dependency relationship among stages. A stage could be a MapReduce job, a sampling stage, a merge stage, or a limit stage that Hive needs to do. Hive generally executes these stages one at a time, unless it can use parallel execution for some stages. For most queries, there is just one stage. The stage plan provides important information on how Hive is running the job, but it is verbose and complex. Understanding the how Hive parses and plans every query is not always useful, however it can help when analyzing complex or poorly performing queries.

EXPLAIN DEPENDENCY Keywords in an Explain

EXPLAIN DEPENDENCY

SELECT * FROM

Employee_Table

{"input_partitions":[],"input_tables":[{"tablename":"sql_class@employ

ee_table","tabletype":"MANAGED_TABLE"}]}

The dependencies above show that the table will be the Employee_Table and it is in the database SQL_Class (sometimes referred to as the schema) and that it is a managed table (not an External Table).

The use of DEPENDENCY in the EXPLAIN statement produces extra information concerning each of the inputs in the plan. It shows all the varying attributes for the inputs. The inputs contain all of the tables and any partitions that are being utilized. The dependencies will even show the parents in case a table is accessed via a view.

EXPLAIN AUTHORIZATION Keywords in an Explain

EXPLAIN AUTHORIZATION

SELECT * FROM

Employee_Table

The authorization phase shows the user, the operation, the input tables and the outputs that concern the authorization mechanisms that have been put in place by Hive.

Using a WHERE Clause Explains a Predicate

CREATE TABLE Employee_table

(Employee_No int

,Dept_No smallint

,Last_name string

,First_name string

,Salary decimal(8,2) ) ;

EXPLAIN SELECT * FROM Employee_Table

WHERE Employee_No = 2000000 ;

Stage-0

Fetch Operator

limit:-1

Select Operator [SEL_4528]

outputColumnNames:["_col0","_col1","_col2","_col3","_col4"]

Filter Operator [FIL_4530]

predicate:(employee_no = 2000000 ) (type: boolean)

TableScan [TS_4526]

alias:employee_table

The predicate means the value in the WHERE clause.

EXPLAIN with an ORDER BY Statement

EXPLAIN SELECT * FROM Department_Table

ORDER BY Department_Name ;

Vertex dependency in root stage

Reducer 2 <- Map 1 (SIMPLE_EDGE)

Stage-0Fetch Operator

limit:-1

Stage-1 Reducer 2

File Output Operator [FS_4558] compressed:false

Statistics:Num rows: 1 Data size: 185 Basic stats: COMPLETE Column stats: NONE

table:{"serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe","input

format:":"org.apache.hadoop.mapred.TextInputFormat","output

format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"}

Select Operator [SEL_4557] outputColumnNames:["_col0","_col1","_col2","_col3"]

| Statistics:Num rows: 1 Data size: 185 Basic stats: COMPLETE Column stats: NONE

|<-Map 1 [SIMPLE_EDGE]

Reduce Output Operator [RS_4556] key expressions:_col1 (type: string) sort order:+

Statistics:Num rows: 1 Data size: 185 Basic stats: COMPLETE Column stats: NONE

value expressions:_col0 (type: smallint), _col2 (type: int), _col3 (type: string)

Select Operator [SEL_4555] outputColumnNames:["_col0","_col1","_col2","_col3"]

Statistics:Num rows: 1 Data size: 185 Basic stats: COMPLETE Column stats: NONE

TableScan [TS_4554] alias:department_table

Statistics:Num rows: 1 Data size: 185 Basic stats: COMPLETE Column stats: NONE

Notice now we are using 2 reducers. The table is a serde and the sort order is asc.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 17 – Hadoop EXPLAIN

Create new playlist

Sign In

Sign Up

Chapter 17 – Hadoop EXPLAIN

There are Many Options to See an EXPLAIN Plan

Explain Output has Three Parts

EXPLAIN EXTENDED and the Abstract Syntax Tree

EXPLAIN EXTENDED Stage Plans and Stage Dependencies

EXPLAIN DEPENDENCY Keywords in an Explain

EXPLAIN AUTHORIZATION Keywords in an Explain

Using a WHERE Clause Explains a Predicate

EXPLAIN with an ORDER BY Statement

Table of Contents for
Chapter 17 – Hadoop EXPLAIN