A book on a database wouldn’t be complete without an element of programming. However, I am not going to teach you how to write SQL; I assume you already know how. I am also not going to walk you through each and every feature Snowflake offers in this space. Rather, I am going to give you examples of the more useful elements of Snowflake, especially where it differs from the traditional relational database management systems (RDBMSs) you might be more familiar with.
Creating New Tables
There are a few ways to create new tables in Snowflake. It’s worth being aware of them as each one offers a solution to a different use case.
Create Table Like
The CREATE TABLE LIKE command creates a new, empty table with the same metadata as the source table, including:
- Column names
- Default values
- Constraints
Importantly, it doesn’t copy any data so it’s very fast. It creates an exact shell of the original table. This can be handy in a wide variety of use cases, but especially when you need to create a table in another schema quickly and easily.
You may have done this in the past using SELECT ... INTO along with WHERE 1=2 (or similar). This would have carried out a similar operation in, say, SQL Server, although the constraints would not have been included.
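As a minimal sketch, assuming a CUSTOMERS table exists in a PROD schema (both names are illustrative):

```sql
-- Create an empty copy of PROD.CUSTOMERS in the DEV schema.
-- Column names, data types, defaults, and constraints are copied; data is not.
CREATE TABLE DEV.CUSTOMERS LIKE PROD.CUSTOMERS;
```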
Create Table as Select
The CREATE TABLE AS SELECT command (often abbreviated to CTAS) creates a new table populated with the results of a query. You can use it to:
- Create a table from an existing table with all columns and rows
- Create a table from an existing table with a subset of columns and/or rows
- Create a table by changing the names of columns and the data types of an existing table
- Create a table by joining other tables together
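The variations above can be sketched as follows. The table and column names (CUSTOMERS, ORDERS, and so on) are illustrative:

```sql
-- 1. Full copy of all columns and rows
CREATE TABLE CUSTOMERS_COPY AS
SELECT * FROM CUSTOMERS;

-- 2. Subset of columns and rows
CREATE TABLE CUSTOMERS_UK AS
SELECT CUSTOMER_ID, CUSTOMER_NAME
FROM CUSTOMERS
WHERE COUNTRY = 'UK';

-- 3. Renaming columns and changing data types
CREATE TABLE CUSTOMERS_RENAMED (ID, NAME) AS
SELECT CUSTOMER_ID::VARCHAR, CUSTOMER_NAME
FROM CUSTOMERS;

-- 4. Joining other tables together
CREATE TABLE CUSTOMER_ORDERS AS
SELECT c.CUSTOMER_ID, c.CUSTOMER_NAME, o.ORDER_ID, o.ORDER_DATE
FROM CUSTOMERS c
JOIN ORDERS o ON o.CUSTOMER_ID = c.CUSTOMER_ID;
```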
Create Table Clone
This command is used when you want to create a new table with the same column definitions and the same data as the source table. The important point here is that, although your new table will contain all the data, no data is actually copied because it leverages Snowflake's zero-copy cloning behind the scenes.
This command also allows you to create a table at a particular point in time. This can be really useful for testing purposes. For example, you may need to recreate an error found in production, which requires you to have a set of tables as they were prior to the error occurring. Using the CREATE TABLE CLONE command is one way of doing this.
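Both variations can be sketched as follows, again with an illustrative CUSTOMERS table. The second statement uses Time Travel to clone the table as it was 30 minutes ago:

```sql
-- Zero-copy clone of the current state of the table
CREATE TABLE CUSTOMERS_CLONE CLONE CUSTOMERS;

-- Clone the table as it was 30 minutes ago, using Time Travel.
-- OFFSET is specified in seconds relative to the current time.
CREATE TABLE CUSTOMERS_BEFORE_ERROR CLONE CUSTOMERS
  AT (OFFSET => -60 * 30);
```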
Copy Grants
As part of the CREATE TABLE commands above, you can optionally use the COPY GRANTS clause. This inherits any existing permissions on the table you are copying from, although it will not inherit any future grants. This can save a lot of time when recreating permissions on a cloned object, especially if the permissions need to be exactly the same.
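For example, to clone a table while carrying its existing grants across (table names illustrative):

```sql
-- The new table inherits the access privileges currently
-- granted on CUSTOMERS, but not any future grants.
CREATE TABLE CUSTOMERS_CLONE CLONE CUSTOMERS COPY GRANTS;
```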
Stored Procedures
The use of stored procedures in traditional databases has been a cornerstone of development for years. Most database processes I worked on in the years before cloud databases were underpinned by stored procedures, and with good reason.
They encapsulate business logic in code and store it in one place, promoting a modular approach to development and repeatability. This eases maintenance if logic changes in the future: you change it in one place instead of hunting through your database for code. They are also more secure than simply embedding SQL code in applications, and stored procedures can take input and output parameters, reuse cached query plans, and more.
So, when Snowflake arrives as the new kid on the block, the first question database developers ask is, “Does it support stored procedures?” The sales guy says, “Yes, of course.” A collective breath of relief from the developers as they realize their database migration will be a doddle. “They are written in JavaScript,” the sales guy adds. “What? Sorry, I thought I misheard you. I thought you said JavaScript??!” the lead developer asks. “Yes, I did,” says the sales guy. The look of disbelief etched on the faces of every developer who thought JavaScript was the mainstay of web developers is worth remembering!
This was how I recall my first meeting with Snowflake regarding this point. Several people have told me a similar story when they first learned about stored procedures in Snowflake.
While the vast majority of the functionality Snowflake offers is easy to work with and use when compared to other database technologies, it’s really disappointing that stored procedures feel like a complete afterthought. Not only are they written in JavaScript, but they also have very limited debugging functionality, leaving you as a developer spending hours trying to ensure you have the correct number of quotes. I can only hope this is a candidate for improvement on the product development roadmap.
The need for stored procedure functionality comes up regularly when migrating from a more traditional RDBMS such as SQL Server. Organizations have typically spent a lot of time and effort investing in their data warehouse and the associated stored procedures, so naturally they want to port their code over to Snowflake.
When it comes to JavaScript, you don’t really need to be concerned. I am by no means a JavaScript expert and you don’t need to be one either. You just need the recipe to work with. The JavaScript that Snowflake uses is bare bones. For example, you cannot reference third-party libraries within the stored procedures. Primarily, you just need the JavaScript elements to act as a wrapper around your SQL. The SQL is executed within this wrapper by calling functions in the JavaScript API. I hope this section will go some way to demystifying this topic for you!
Interestingly, stored procedures with the same name but with different input parameters are treated as different objects. Each input parameter you specify must have a data type associated with it.
Stored procedures allow for procedural logic, such as branching and looping. You can also create dynamic SQL statements and run them within the JavaScript API.
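A minimal sketch of this wrapper pattern follows. The procedure name, table, and column are illustrative; the JavaScript body simply builds a SQL statement, binds the input parameter, and executes it through the JavaScript API:

```sql
CREATE OR REPLACE PROCEDURE UPDATE_SALES(STATE_CODE VARCHAR)
  RETURNS VARCHAR
  LANGUAGE JAVASCRIPT
AS
$$
  // Build the SQL statement, binding the input parameter.
  // Note: parameter names are exposed to JavaScript in uppercase.
  var sql = "UPDATE SALES SET PROCESSED = TRUE WHERE STATE = :1";
  var stmt = snowflake.createStatement({sqlText: sql, binds: [STATE_CODE]});
  try {
    stmt.execute();
    return "Succeeded.";
  } catch (err) {
    return "Failed: " + err;
  }
$$;

-- Execute the procedure:
CALL UPDATE_SALES('CA');
```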
That is everything you need to create a basic stored procedure. It's easiest to begin by treating the JavaScript simply as a wrapper around your SQL.
User-Defined Functions
Snowflake supports two types of user-defined functions (UDFs):
- Scalar functions return one output value for each input value.
- Tabular (table) functions return a table of zero, one, or many rows for each input row.
Functions can be written in SQL or JavaScript. Java was recently introduced as a third option. At the time of writing, it was in preview mode, so I won’t go into it in detail.
You'll typically be working with SQL unless you wish to do any branching or looping in your function. This is when you can use JavaScript to provide greater flexibility.
Scalar Functions
A typical example of a scalar function in the real world might be to add sales tax to a net sale value. This is a common way of ensuring you're not hard coding your sales tax rates into your code across your database.
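A sketch of such a function might look like this, assuming a 20% sales tax rate (the function name and rate are illustrative):

```sql
-- Apply a 20% sales tax to a net sale value.
-- Keeping the rate in one function avoids hard coding it across the database.
CREATE OR REPLACE FUNCTION ADD_SALES_TAX(NET_SALE NUMBER(10,2))
  RETURNS NUMBER(10,2)
AS
$$
  NET_SALE * 1.2
$$;

-- Usage: returns the gross value including tax
SELECT ADD_SALES_TAX(100.00);
```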
Table Functions
Table functions can be very powerful. As the name suggests, rather than just returning a single value like a scalar function, they return a result set in a tabular format.
As part of an application I am developing on top of Snowflake, I need to be able to pass a customer ID into my function and return the customer’s address.
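Under that scenario, a sketch of the table function might look like the following. The table and column names are illustrative:

```sql
CREATE OR REPLACE FUNCTION GET_CUSTOMER_ADDRESS(CUST_ID NUMBER)
  RETURNS TABLE (ADDRESS_LINE VARCHAR, CITY VARCHAR, STATE VARCHAR)
AS
$$
  SELECT ADDRESS_LINE, CITY, STATE
  FROM CUSTOMER_ADDRESSES
  WHERE CUSTOMER_ID = CUST_ID
$$;

-- Table functions are queried via the TABLE() keyword:
SELECT * FROM TABLE(GET_CUSTOMER_ADDRESS(1001));
```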
SQL Variables
Snowflake provides the following commands for working with variables:
- SET: Used to initialize variables
- UNSET: Used to drop or remove variables
- SHOW VARIABLES: Used to view variables within the current session
It is worth noting that if you’re connecting to Snowflake from an application, you can initialize variables on the connection string by passing them in as arguments.
To view variables defined in the current session, you can use the SHOW VARIABLES command.
It is worth noting that variables are scoped to a session, so when a user ends their session, all variables are dropped, and nothing outside of the current session can access them. To explicitly drop a variable, you can run the UNSET command.
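Putting these commands together in one short sketch (the variable name and table are illustrative; note the $ prefix when referencing a variable):

```sql
-- Initialize a variable for the current session
SET min_order = 100;

-- Reference it in a query using the $ prefix
SELECT * FROM ORDERS WHERE ORDER_VALUE >= $min_order;

-- List all variables in the current session
SHOW VARIABLES;

-- Explicitly drop the variable
UNSET min_order;
```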
Transactions
Let’s conclude this topic with a short section on transactions. A transaction is a statement or, more typically, a collection of SQL statements executed against the database as a single unit. This allows you to structure your code within transactions to ensure the integrity of your database.
1. Check if the customer has enough funds to make the bank transfer.
   a. If not, abort the transaction and return an error to the customer.
   b. If so, subtract the funds from the customer's account.
2. Credit the receiving account with the funds from the customer's account.
If the credit to the receiving account returned an error, you wouldn’t want to subtract the funds from the customer’s account. To prevent this, you include the statements for points 1 and 2 within a transaction. This provides the opportunity to roll back and undo the transaction if anything within it fails.
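A sketch of this in Snowflake SQL might look like the following (the ACCOUNTS table and account IDs are illustrative):

```sql
BEGIN TRANSACTION;

-- Subtract the funds from the customer's account
UPDATE ACCOUNTS SET BALANCE = BALANCE - 100 WHERE ACCOUNT_ID = 1;

-- Credit the receiving account
UPDATE ACCOUNTS SET BALANCE = BALANCE + 100 WHERE ACCOUNT_ID = 2;

-- If both statements succeeded, make the changes permanent:
COMMIT;

-- ...or, if anything went wrong, undo both statements instead:
-- ROLLBACK;
```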
By default, Snowflake has the AUTOCOMMIT setting set to On. This means each individual DDL or DML statement is treated as its own transaction: it commits automatically on success and rolls back if it fails. This is known as an implicit transaction.
Transactions Within Stored Procedures
A transaction can be inside a stored procedure, or a stored procedure can be inside a transaction. However, you can’t (and probably wouldn’t want to) start a transaction in one stored procedure and finish it with another.
It helps to keep in mind that a transaction is specific to the connection it is running within. Executing a stored procedure will create its own transaction, so you cannot begin a transaction and then call a stored procedure that attempts to commit and roll back the same transaction within it.
If the commit is run following the execution of the stored procedure, then all statements within the stored procedure are also committed. If you roll back the transaction, this will also roll back all statements.
Locking and Deadlocks
It is also worth being aware that the use of transactions, either implicit or explicit, can acquire locks on a table. This can occur with UPDATE, DELETE, and MERGE statements. Thankfully, most INSERT and COPY statements write into new partitions, meaning existing data is not modified. This is really handy as it allows parallel table loading without needing to be concerned with transactions blocking each other.
If a query is blocked by another transaction holding a lock on a table, the waiting transaction will pause for a certain period of time. This is governed by a parameter called LOCK_TIMEOUT, which can be amended at the session level if required.
If the LOCK_TIMEOUT duration is exceeded, the most recent transaction is selected as the “deadlock victim” and is rolled back.
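For example, to inspect and then shorten the timeout for the current session:

```sql
-- Show the current value (the account default is 43200 seconds, i.e. 12 hours)
SHOW PARAMETERS LIKE 'LOCK_TIMEOUT';

-- Reduce the wait to 5 minutes for this session only
ALTER SESSION SET LOCK_TIMEOUT = 300;
```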
Transaction Tips
In practice, it is best to encapsulate related statements that need to complete (or fail) as a unit of work within explicit transactions. This makes it easier for other developers to identify them.
Breaking a process down into discrete transactions not only makes it easier to read, maintain, and manage, but it also reduces the likelihood of locking a table for longer than required. This helps to prevent any unwanted deadlocks on resources. It also means that, should a transaction fail, it gives you the flexibility to only roll back those elements of a process that absolutely need to be rolled back.
Another key point is to avoid very large transactions. Again, one reason for this is that it could lock the table out for long periods unnecessarily. If you can, break the process down into batches of transactions. This might be on numbers of rows, days, categories—essentially whatever works for you and your process.
Bringing It All Together
Let’s explore a practical example. You’ll make use of a stored procedure along with a task to show how to use stored procedures to solve your own business problems.
The Example Scenario
In this example, you want to load staging tables with some customer data from a source system. For your purpose, the Snowflake sample database will fulfil this role. You want to populate different staging tables with data relating to different US states. You then want to add this information to a task so you can run it on a schedule. You create a stored procedure to handle this for you. It's worth noting that a task can execute only one SQL statement but, if you want to run the insert command along with logging to a control table, for example, then calling a stored procedure from a task is a common pattern.
Steps
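The steps can be sketched as follows. All object names (the staging tables, SOURCE.CUSTOMERS, ETL_CONTROL, and the ETL_WH warehouse) are illustrative assumptions, not names from the sample database:

```sql
-- Step 1: a stored procedure that loads the staging table for a given
-- state and writes a row to a control table for logging.
CREATE OR REPLACE PROCEDURE LOAD_STAGING(STATE_CODE VARCHAR)
  RETURNS VARCHAR
  LANGUAGE JAVASCRIPT
AS
$$
  // Insert the matching rows into the state-specific staging table
  var insert_sql = "INSERT INTO STG_CUSTOMERS_" + STATE_CODE +
                   " SELECT * FROM SOURCE.CUSTOMERS WHERE STATE = :1";
  snowflake.createStatement({sqlText: insert_sql, binds: [STATE_CODE]}).execute();

  // Log the run to a control table
  var log_sql = "INSERT INTO ETL_CONTROL (PROC_NAME, RUN_TS) " +
                "SELECT 'LOAD_STAGING', CURRENT_TIMESTAMP()";
  snowflake.createStatement({sqlText: log_sql}).execute();

  return "Loaded state " + STATE_CODE;
$$;

-- Step 2: a task that calls the procedure on a schedule
CREATE OR REPLACE TASK LOAD_STAGING_CA
  WAREHOUSE = ETL_WH
  SCHEDULE = '60 MINUTE'
AS
  CALL LOAD_STAGING('CA');

-- Step 3: tasks are created in a suspended state; resume to start the schedule
ALTER TASK LOAD_STAGING_CA RESUME;
```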
You could build on this example by adding a stream to the source table, as discussed in Chapter 2. The task could then query the stream to check for any records before deciding to run the task and execute the stored procedure. This is the same technique you used in Chapter 2.
Summary
In this chapter, you explored different ways to create tables and when you should consider each one. You looked at user-defined functions, which, in SQL form, are pretty close to traditional RDBMSs.
Stored procedures are the biggest departure from what you might be accustomed to if you’re coming from a T-SQL or PL/SQL background from SQL Server or Oracle, respectively. One reason for this is the use of JavaScript to support branching and looping. I hope this chapter helped demystify some of this for you. If you “just” need to execute SQL within your stored procedure, you only need to consider using a basic JavaScript wrapper around your SQL code.
You also touched upon SQL variables and how to set and unset them. Using variables can become important when considering how to make your code portable between different environments. For example, you can use it to contain the values to source databases and schemas depending on the environment you are working within (e.g., development, test, or production).
Finally, you explored transactions. If you’re coming from a relational database world, these concepts should be familiar.
In the next chapter, you will examine what you need to do if you start to suffer performance issues in Snowflake. Sure, you can increase the sizes of your virtual warehouses, but adopting that approach isn’t always sustainable or the best way to cater to poorly written code or badly designed tables.