The PostgreSQL control structure is an essential part of the PL/pgSQL language; it enables developers to code very complex business logic inside PostgreSQL.
The general syntax of a variable declaration is as follows:
name [ CONSTANT ] type [ COLLATE collation_name ] [ NOT NULL ] [ { DEFAULT | := | = } expression ];
Let's see what the keywords mean:
name
: The name should follow the naming rules discussed in Chapter 3, PostgreSQL Basic Building Blocks. For example, the name should not start with an integer.CONSTANT
: The variable cannot be assigned another value after the initialization. This is useful in defining constant variables such as Pi.type
: The type of variable can be simple such as integer, user defined data type, pseudo type, record, and so on. Since a type is created implicitly on creating a table, one can use this type to declare a variable.NOT NULL
: Not null causes a runtime error to be raised if the variable is assigned to null. Not null variables must be initialized.DEFAULT
: Causes the initialization of the variable to be delayed until the block is entered. This is useful in defining a timestamp variable to indicate when the function is called but not the function pre-compilation time.The PostgreSQL PL/pgSQL function body is composed of nested blocks with an optional declaration section and a label. Variables are declared in the declare section of the block, shown as follows:
[<<label>>] [ DECLARE declarations] BEGIN statements END [label];
The BEGIN
and END
keywords are not used in this context to control transactional behavior, but only for grouping. The declaration section is used for declaring variables, and the label is used to give a name for the block as well as to give fully qualified names to the variables. Finally, in each PL/pgSQL function, there is a hidden block labeled with the function name that contains predefined variables such as FOUND
. To understand the function block, let us take a look at the following code, which defines the factorial function in a recursive manner:
CREATE OR REPLACE FUNCTION factorial(INTEGER ) RETURNS INTEGER AS $$ BEGIN IF $1 IS NULL OR $1 < 0 THEN RAISE NOTICE 'Invalid Number'; RETURN NULL; ELSIF $1 = 1 THEN RETURN 1; ELSE RETURN factorial($1 - 1) * $1; END IF; END; $$ LANGUAGE 'plpgsql'
The block defines the variable scope; in our example, the scope of the argument variable $1
is the whole function. Also, as shown in the example, there is no declaration section.
To understand the scope between different code blocks, let us write the factorial function in a slightly different manner, which is as follows:
CREATE OR REPLACE FUNCTION factorial(INTEGER ) RETURNS INTEGER AS $$ DECLARE fact ALIAS FOR $1; BEGIN IF fact IS NULL OR fact < 0 THEN RAISE NOTICE 'Invalid Number'; RETURN NULL; ELSIF fact = 1 THEN RETURN 1; END IF; DECLARE result INT; BEGIN result = factorial(fact - 1) * fact; RETURN result; END; END; $$ LANGUAGE 'plpgsql'
The preceding function is composed of two blocks: the variable fact
is an alias for the first argument. In the sub-block, the result
variable is declared with a type integer. Since the fact
variable is defined in the upper block, it can also be used in the sub-block. The result
variable can be used only in the sub-block.
The assignment operators :=
and =
are used to assign an expression to a variable, as follows:
variable { := | = } expression;
For the variable names, one should choose names that do not conflict with the column names. This is important when writing parameterized SQL statements.
The =
operator is supported not only for PostgreSQL version 9.4 but also for the previous versions. Unfortunately, the documentations for the previous versions does not mention it. Moreover, since the =
operator is used in SQL for equality comparison, it is preferable to use the :=
operator to reduce confusion.
In certain contexts, it is important to pick up the right assignment operator:
=
operator, as indicated in the documentation at http://www.postgresql.org/docs/current/interactive/sql-createfunction.html.:=
operator, as shown in the documentation at http://www.postgresql.org/docs/current/interactive/sql-syntax-calling-funcs.html#SQL-SYNTAX-CALLING-FUNCS-NAMED.The following example shows a case when one cannot use =
and =:
interchangeably:
CREATE OR REPLACE FUNCTION cast_numeric_to_int (numeric_value numeric, round boolean = TRUE /*correct use of "=". Using ":=" will raise a syntax error */) RETURNS INT AS $$ BEGIN RETURN (CASE WHEN round = TRUE THEN CAST (numeric_value AS INTEGER) WHEN numeric_value>= 0 THEN CAST (numeric_value -.5 AS INTEGER) WHEN numeric_value< 0 THEN CAST (numeric_value +.5 AS INTEGER) ELSE NULL END); END; $$ LANGUAGE plpgsql; car_portal=# SELECT cast_numeric_to_int(2.3, round:= true /*correct use of :=*/); cast_numeric_to_int --------------------- 2 (1 row) car_portal=# SELECT cast_numeric_to_int(2.3, round= true ); ERROR: column "round" does not exist LINE 1: SELECT cast_numeric_to_int(2.3, round= true );
The assignment expression can be a single atomic value such as pi = 3.14
, or it can be a row:
DO $$ DECLARE test record; BEGIN test =: ROW (1,'hello', 3.14); RAISE notice '%', test; END; $$
Finally, it can be a result of the evaluation of the SELECT
statement, as shown in the following example. Note that the SELECT
statement should not return a set:
car_portal=# DO $$ DECLARE number_of_accounts INT:=0; BEGIN number_of_accounts:= (SELECT COUNT(*) FROM car_portal_app.account)::INT; RAISE NOTICE 'number_of accounts: %', number_of_accounts; END;$$ LANGUAGE plpgSQL; NOTICE: number_of accounts: 1 DO
There are other techniques for assigning values to the variables from a query that returns a single row:
SELECT select_expressions INTO [STRICT] targets FROM ...; INSERT ... RETURNING expressions INTO [STRICT] targets; UPDATE ... RETURNING expressions INTO [STRICT] targets; DELETE ... RETURNING expressions INTO [STRICT] targets;
Often, the expressions are column names, while the targets are variable names. In the case of select into, the target can be of the type record.
The query INSERT … RETURNING
is often used to return the default value of a certain column; this can be used to define the ID of a primary key using the SERIAL
and BIGSERIAL
data types, which is shown as follows:
CREATE TABLE test ( id SERIAL PRIMARY KEY, name TEXT NOT NULL ); DO $$ DECLARE auto_generated_id INT; BEGIN INSERT INTO test(name) VALUES ('Hello World') RETURNING id INTO auto_generated_id; RAISE NOTICE 'The primary key is: %', auto_generated_id; END $$; --- the result of executing the above --NOTICE: The primary key is: 1 --DO
Finally, one could use qualified names to perform assignment; in trigger functions, one could use NEW
and OLD
to manipulate the values of these records.
PostgreSQL supports the IF
and CASE
statements, which allow a conditional execution based on a certain condition. PostgreSQL supports the IF
statement construct, as follows:
IF boolean_expression THEN statement [statement]... [ELSIF boolean_expression THEN statement [statement]...] [ELSIF boolean_expression THEN statement [statement]...]... [ELSE statement [statement]...] END IF;
The case statement comes in two forms, as follows:
CASE search-expression WHEN expression [, expression [ ... ]] THEN statements [ WHEN expression [, expression [ ... ]] THEN statements ... ] [ ELSE statements ] END CASE; CASE WHEN boolean-expression THEN statements [ WHENboolean-expression THEN statements ... ] [ ELSE statements ] END CASE;
To understand the IF
statement, let us assume that we would like to convert the advertisement rank to text, as follows:
CREATE OR REPLACE FUNCTION cast_rank_to_text (rank int) RETURNS TEXT AS $$ DECLARE rank ALIAS FOR $1; rank_result TEXT; BEGIN IF rank = 5 THEN rank_result = 'Excellent'; ELSIF rank = 4 THEN rank_result = 'Very Good'; ELSIF rank = 3 THEN rank_result = 'Good'; ELSIF rank = 2 THEN rank_result ='Fair'; ELSIF rank = 1 THEN rank_result ='Poor'; ELSE rank_result ='No such rank'; END IF; RETURN rank_result; END; $$ Language plpgsql; --- to test the function SELECT n,cast_rank_to_text(n) FROM generate_series(1,5) as foo(n);
When any branch of the IF
statement is executed due to the IF
condition being met, then the execution control returns to the first statement after END IF,
assuming the RETURN
statement is not executed inside this branch. If none of the conditions are met for IF
or ELSIF,
then the ELSE
branch will be executed. Also note that one could nest all the control structures; so, one can have an IF statement inside another one.
The following code snippet implements the preceding function using the CASE
statement:
CREATE OR REPLACE FUNCTION cast_rank_to_text (rank int) RETURNS TEXT AS $$ DECLARE rank ALIAS FOR $1; rank_result TEXT; BEGIN CASE rank WHEN 5 THEN rank_result = 'Excellent'; WHEN 4 THEN rank_result = 'Very Good'; WHEN 3 THEN rank_result = 'Good'; WHEN 2 THEN rank_result ='Fair'; WHEN 1 THEN rank_result ='Poor'; ELSE rank_result ='No such rank'; END CASE; RETURN rank_result; END; $$ Language plpgsql;
In the CASE
statement, if any branch of the case matches the selector, the execution of the case is terminated and the execution control goes to the first statement after CASE
. Moreover, in the previous form of CASE
, one cannot use it to match NULL
values, since NULL
equality is NULL
. To overcome this limitation, one should specify the matching condition explicitly using the second form of CASE statement, as follows:
CREATE OR REPLACE FUNCTION cast_rank_to_text (rank int) RETURNS TEXT AS $$ DECLARE rank ALIAS FOR $1; rank_result TEXT; BEGIN CASE WHEN rank=5 THEN rank_result = 'Excellent'; WHEN rank=4 THEN rank_result = 'Very Good'; WHEN rank=3 THEN rank_result = 'Good'; WHEN rank=2 THEN rank_result ='Fair'; WHEN rank=1 THEN rank_result ='Poor'; WHEN rank IS NULL THEN RAISE EXCEPTION 'Rank should be not NULL'; ELSE rank_result ='No such rank'; END CASE; RETURN rank_result; END; $$ Language plpgsql; --- to test SELECT cast_rank_to_text(null);
Finally, the
CASE
statement raises an exception if no branch is matched and the ELSE
branch is not specified, as follows:
DO $$ DECLARE i int := 0; BEGIN case WHEN i=1 then RAISE NOTICE 'i is one'; END CASE; END; $$ LANGUAGE plpgsql; ERROR: case not found HINT: CASE statement is missing ELSE part. CONTEXT: PL/pgSQL function inline_code_block line 5 at CASE
Iteration is used to repeat a block of statements to achieve a certain goal. With iteration, one often needs to specify the starting point and the ending condition. In PostgreSQL, there are several statements for iterating through the results and for performing looping, including LOOP
, CONTINUE
, EXIT
, FOR
, WHILE,
and FOR EACH
.
The basic LOOP
statement has the following structure:
LOOP Statements END LOOP;
To understand the
LOOP
statement, let us rewrite the factorial function, as follows:
CREATE OR REPLACE FUNCTION factorial (fact int) RETURNS BIGINT AS $$ DECLARE result bigint = 1; BEGIN IF fact = 1 THEN RETURN 1; ELSIF fact IS NULL or fact < 1 THEN RAISE EXCEPTION 'Provide a positive integer'; ELSE LOOP result = result*fact; fact = fact-1; EXIT WHEN fact = 1; END Loop; END IF; RETURN result; END; $$ LANGUAGE plpgsql;
In the preceding code, the conditional EXIT
statement is used to prevent infinite looping by exiting the LOOP
statement. When an EXIT
statement is encountered, the execution control goes to the first statement after the LOOP
. To control the execution of the statements inside the LOOP
statement, PL/pgSQL also provides the CONTINUE
statement, which works somewhat like the EXIT
statement. Thus, instead of forcing termination, the CONTINUE
statement forces the next iteration of the loop to take place, skipping any code in between.
The WHILE
statement keeps executing a block of statements while a particular condition is met. Its syntax is as follows:
WHILE boolean-expression LOOP statements END LOOP ;
The following example uses the while loop to print the days of the current month:
DO $$ DECLARE first_day_in_month date := date_trunc('month', current_date)::date; last_day_in_month date := (date_trunc('month', current_date)+ INTERVAL '1 MONTH - 1 day')::date; counter date = first_day_in_month; BEGIN WHILE (counter <= last_day_in_month) LOOP RAISE notice '%', counter; counter := counter + interval '1 day'; END LOOP; END; $$ LANGUAGE plpgsql;
PL/pgSQL provides two forms of the FOR statement, and they are used to:
The syntax of the for
loop statement is:
FOR index_name IN [ REVERSE ] expression1 .. expression2 [ BY expression ] LOOP statements END LOOP;
index_name
is the name of a local variable of the type integer. This local variable scope is the FOR
loop. Statements inside the loop can read this variable, but cannot change its value. Finally, one can change this behavior by defining the variable in the declaration section of the outer block. expression1
and expression2
must be evaluated to integer values; if expression1
equals expression2
then the FOR
loop is run only once.
The REVERSE
key word is optional, and it is used to define the order in which the range will be generated (ascending or descending). If REVERSE
is omitted, then expression1
should be smaller than expression2
, otherwise the loop will not be executed. Finally, BY
defines the steps between two successive numbers in the range. Note that the BY
value should always be a positive integer. The following example shows a FOR
loop iterating over a negative range of numbers in the reverse order:
DO $$ BEGIN FOR j IN REVERSE -1 .. -10 BY 2 LOOP Raise notice '%', j; END LOOP; END; $$ LANGUAGE plpgsql; --- output should be NOTICE: -1 NOTICE: -3 NOTICE: -5 NOTICE: -7 NOTICE: -9
To iterate through the result of a set query, the syntax is different, as follows:
[ FOR index_name IN { cursor | ( select_statement )} LOOP statement... END LOOP ;
index_name
is a local variable in the outer block. Its type is not restricted to simple types such as integer and text. However, its type might be a composite or a RECORD
data type. In PL/pgSQL, one could iterate over a cursor result or over a SELECT
statement result.
The following example shows all the database names:
DO $$ DECLARE database RECORD; BEGIN FOR database IN SELECT * FROM pg_database LOOP RAISE notice '%', database.datname; END LOOP; END; $$; --- output should be like NOTICE: template1 NOTICE: template0 NOTICE: postgres ...
The PostgreSQL return statement is used for terminating the function, and for returning the control to the caller. The return statement has different forms such as RETURN
, RETURN NEXT
, RETURN QUERY
, RETURN
expression, RETURN QUERY EXECUTE,
and so on. The return statement can return a single value or a set to the caller, as will be shown in this chapter. In this context, let us consider the following anonymous function:
postgres=# DO $$ BEGIN RETURN; RAISE NOTICE 'This statement will not be executed'; END $$ LANGUAGE plpgsql; DO postgres=#
As shown in the preceding example, the function is terminated before the execution of the RAISE
statement due to the RETURN
statement.
A void type is used in a function for performing some side effects such as logging; the built-in function pg_sleep
is used to delay the execution of a server process in seconds:
df pg_sleep List of functions Schema | Name | Result data type | Argument data types | Type ------------+----------+------------------+---------------------+-------- pg_catalog | pg_sleep | void | double precision | normal
PL/pgSQL can be used to return a single row from a function; an example of this type is the factorial function.
The return type can be base, composite, domain, pseudo type, or domain data type. The following function returns a JSON representation of a certain account:
CREATE OR REPLACE FUNCTION car_portal_app.get_account_in_json (account_id INT) RETURNs JSON AS $$ SELECT row_to_json(account) FROM car_portal_app.account WHERE account_id = $1; $$ LANGUAGE SQL; -- For test WITH inserted_account AS ( INSERT into car_portal_app.account VALUES (DEFAULT, 'first_name', 'last_name', '[email protected]','some_pass') RETURNING account_id ) SELECT car_portal_app.get_account_in_json (account_id) FROM inserted_account; --Result "{"account_id":15,"first_name":"first_name","last_name":"last_name","email":"[email protected]","password":"some_pass"}"
Set Returning Functions (SRFs) can be used to return a set of rows. The row type can either be base type such as integer, composite, table type, pseudo type, or domain type.
To return a set from a function, the keyword SETOF
is used to mark the function as an SRF, as follows:
CREATE OR REPLACE FUNCTION advertisement_in_last_7_days() RETURNS SETOF car_portal_app.advertisement AS $$ SELECT * FROM car_portal_app.advertisement WHERE advertisement_date >= now() -INTERVAL '7 days'; $$ LANGUAGE SQL; --To test SELECT * FROM advertisement_in_last_7_days()
If the return type is not defined, one could:
Let us assume that we would like to return only the car_id
and the account_seller_id
, as in this case, we do not have a data type. Thus, the preceding function can be written as:
CREATE OR REPLACE FUNCTION advertisement_in_last_7_days() RETURNS TABLE (car_id INT, seller_account_id INT) AS $$ BEGIN RETURN QUERY SELECT a.car_id, a.seller_account_id FROM car_portal_app.advertisement a WHERE advertisement_date >= now() -INTERVAL '7 days'; END; $$ LANGUAGE plpgSQL; --To TEST SELECT * FROM advertisement_in_last_7_days()
Note that in the preceding function, we use an alias; else, PL/pgSQL will raise an error, as follows:
ERROR: column reference "car_id" is ambiguous ... DETAIL: It could refer to either a PL/pgSQL variable or a table column. QUERY: SELECT car_id, seller_account_id FROM
Moreover, in this function, RETURN QUERY
is used only to show a different option in returning a query result.
The preceding function can also be written using the OUT
variables; actually, the return table is implemented internally as indicated by the error using the OUT
variables, as follows:
CREATE OR REPLACE FUNCTION advertisement_in_last_7_days(OUT car_id INT, OUT seller_account_id INT) RETURNS SETOF RECORD AS $$ BEGIN RETURN QUERY SELECT a.car_id, a.seller_account_id FROM car_portal_app.advertisement a WHERE advertisement_date >= now() -INTERVAL '7 days'; END; $$ LANGUAGE plpgsql;
The PL/pgSQL functions have several special variables that are created automatically in the top-level block. For example, if the function returns a trigger, then several variables such as NEW
, OLD,
and TG_OP
are created.
In addition to the trigger special values, there is a Boolean variable called FOUND
. This is often used in combination with DML
and PERFORM
statements to conduct sanity checks. The value of the FOUND
variable is affected by the SELECT
, INSERT
, UPDATE
, DELETE
, and PERFORM
statements. These statements set FOUND
to true
if at least one row is selected, inserted, updated, or deleted.
The PERFORM
statement is similar to the SELECT
statement, but it discards the result of the query. Finally, the EXECUTE
statement does not change the value of the FOUND
variable. The following examples show how the FOUND
variable is affected by the INSERT
and PERFORM
statements:
DO $$ BEGIN CREATE TABLE t1(f1 int); --- Set FOUND to true INSERT INTO t1 VALUES (1); RAISE NOTICE '%', FOUND; --- Set FOUND to false PERFORM* FROM t1 WHERE f1 = 0; RAISE NOTICE '%', FOUND; DROP TABLE t1; END; $$LANGUAGE plpgsql; --- output NOTICE: t NOTICE: f
In addition to the preceding query, one could get the last OID—object identifier—for an inserted row as well as the affected number of rows by using the INSERT
, UPDATE,
and DELETE
statements via the following commands:
GET DIAGNOSTICS variable = item;
Assuming that there is variable called i
of type integer, one can get the affected number of rows, as follows:
GET DIAGNOSTICS i = ROW_COUNT;