A fundamental pillar of good data management is having a robust approach to data access and security controls. In today’s world, securing, managing, and auditing access to sensitive data is paramount to keeping data safe.
As regulatory and IT security teams struggle to keep up with the rate of change, technology providers continue to offer new and innovative ways of supporting these efforts. Snowflake is no different. It offers a range of interesting capabilities in this space.
In this chapter, you will explore security and access management in Snowflake. You’ll start by understanding the role hierarchy along with how privileges are granted and inherited. You’ll also cover how to extend the out-of-the-box functionality by creating your own custom roles.
You’ll build upon these fundamental principles and investigate how Snowflake supports authentication and single sign-on (SSO) use cases. You’ll consider this from an end user perspective. You’ll also consider programmatic access for applications using service accounts.
You’ll move on from authentication to touch upon the network controls Snowflake provides and how to uplift this functionality if your organization dictates it.
I’ll introduce you to a neat approach to handling personally identifiable information (PII). Finally, I’ll wrap things up by discussing some advanced Snowflake security features, providing you with the comprehensive knowledge you need to ensure your Snowflake implementation is secure.
Roles
Predefined System Roles
Role | Description |
---|---|
ACCOUNTADMIN | The most powerful role available, it administers the Snowflake account. This role can view all credit and billing information. The other admin roles are children of this role: SYSADMIN and SECURITYADMIN. Snowflake recommends limiting the use of this account and restricting access to a minimum set of users. You should avoid setting this role as a default for any user. Additionally, I strongly recommend configuring multi-factor authentication (MFA) on this account, which I’ll cover later in this chapter. |
SYSADMIN | This role can create warehouses and databases, as well as all objects within a database (schemas, tables, views, etc.). |
SECURITYADMIN | This role is designed for the administration of security. This includes the management of granting or revoking privileges to roles. |
USERADMIN | This role is used for creating roles and users and managing the privileges assigned to them. The USERADMIN role is a child of the SECURITYADMIN role. |
PUBLIC | The PUBLIC role is a default role that all users end up in automatically. This provides privileges to log into Snowflake and some basic object access. |
These system-defined roles provide a starting point for you to build upon to fit the needs of your own organization. Custom roles can be added to the hierarchy (using the SECURITYADMIN role) and rolled up to the SYSADMIN role. Customizing and extending the hierarchy is something I’ll cover in more detail later in this chapter.
As a user, you can be a member of more than one role, which is typical of many applications you may have used previously. Some systems evaluate all the roles your user is assigned to in order to work out your combined permissions before performing your requested action. Snowflake is different: it forces you to select a single active role before you run any queries.
To create roles or manage grants, for example, you need to switch to either the SECURITYADMIN or USERADMIN role first. This behavior forces users to follow role-based access control tightly.
Role Hierarchy
Figure 4-1 illustrates the role hierarchy in Snowflake. It follows a role-based access control (RBAC) framework, which allows privileges to be granted between objects and roles. Roles contain users and therefore the objective of the RBAC approach is to simplify the maintenance of access control.
Roles can also be assigned to other roles, resulting in a role hierarchy. This creates a level of flexibility, but also introduces the risk of additional complexity if your approach isn’t well thought out.
RBAC is a widely used concept and, if you have had experience using it, you know it can get complicated very quickly. A poor design also leads to a lot of unnecessary effort when maintaining a hierarchy.
Inheritance
Not only can privileges on objects be assigned directly to roles, but they can also be inherited from other roles. To illustrate this, let’s use an example business requirement that needs to be in place to support a new project.
A single GRANT command is enough to give the Marketing Insight team read-only privileges on the HR schema and the data contained within it.
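To make the idea concrete, a grant of this kind might look like the following sketch. All of the role, database, and schema names here are my own illustrative assumptions, not taken from the source:

```sql
-- Illustrative sketch; all names are assumptions.
-- A read-only access role that owns the privileges on the HR schema.
CREATE ROLE hr_read_only;
GRANT USAGE ON DATABASE hr_db TO ROLE hr_read_only;
GRANT USAGE ON SCHEMA hr_db.hr TO ROLE hr_read_only;
GRANT SELECT ON ALL TABLES IN SCHEMA hr_db.hr TO ROLE hr_read_only;

-- A single grant then lets the Marketing Insight role inherit that access.
GRANT ROLE hr_read_only TO ROLE marketing_insight;
```

The final statement is the key point: because roles can be granted to roles, the Marketing Insight team inherits everything the access role holds, with no need to repeat the object-level grants.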
I am sure you’re starting to see that this provides not only a large amount of flexibility, but that it’s easy to create layers upon layers of roles resulting in something resembling a spider web! Over time, this can grow to be a highly complex beast. Unpicking it would require a lot of time and effort and is often avoided by IT teams due to the inherent risks involved.
If you adopt an approach like the one I present in this chapter, however, inheritance provides an opportunity. Instead of creating each custom role from scratch, you can create a layer of abstraction where one layer of roles inherits the privileges of another layer.
Objects
When a user creates an object in Snowflake, they effectively own that object and can grant access to it. This is known as discretionary access control (DAC). RBAC then comes into play as the owner grants privileges on the objects to specific roles.
Extending the Role Hierarchy
One important principle to follow here is to separate the roles used to grant access to objects from the roles used to contain users. Adhering to this principle provides greater flexibility, leverages inheritance, and streamlines the overall implementation effort.
Logically Separating Out Roles
Role Level | Role Type | Purpose |
---|---|---|
Level 0 | System Roles | The predefined system roles that all custom roles should roll up to. |
Level 1 | Domain Roles | Used when you require groups of Level 2 and 3 roles separated for different environments, such as Dev, Test, and Production. |
Level 2 | Functional Roles | Used to contain users. This may be mapped to roles within an identity provider (IdP) which contains groups of users. |
Level 3 | Access Roles | Used to assign privileges on the actual objects within Snowflake. |
I prefer this simple, clean approach to adopting the RBAC framework Snowflake provides. It’s important to note that although inheritance has been leveraged, it has been carried out in a controlled manner. Object access is controlled in one layer and user access is controlled in another. This promotes reuse of roles while easing and simplifying ongoing operational management.
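Wiring these layers together might look like the following sketch. All role, database, and schema names are my own assumptions, used purely to illustrate the layering:

```sql
-- Illustrative sketch; all names are assumptions.
-- Level 3: an access role owns the object privileges.
CREATE ROLE finance_read;
GRANT USAGE ON DATABASE finance_db TO ROLE finance_read;
GRANT USAGE ON SCHEMA finance_db.reporting TO ROLE finance_read;
GRANT SELECT ON ALL TABLES IN SCHEMA finance_db.reporting TO ROLE finance_read;

-- Level 2: a functional role contains users and inherits the access role.
CREATE ROLE finance_analyst;
GRANT ROLE finance_read TO ROLE finance_analyst;

-- Level 0: roll the functional role up to the system roles.
GRANT ROLE finance_analyst TO ROLE SYSADMIN;
```

Note how object access lives only in the Level 3 role; if the finance team later needs access to another schema, you grant it once to `finance_read` and every user in `finance_analyst` inherits it automatically.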
User and Application Authentication
Snowflake provides a range of capabilities to support authentication for both users and applications.
- Interactive authentication for individual users
- Authentication for applications using service accounts
Before I get into the specifics for these two methods, I need to provide you with a primer in multi-factor authentication, Security Assertion Markup Language (SAML), and OAuth.
Multi-Factor Authentication
Multi-factor authentication provides an additional layer of security by requiring a user to provide a secondary form of authentication over and above the standard login to Snowflake. This secondary form of authentication is typically provided through a mobile device.
MFA is being rapidly rolled out across a variety of applications that store sensitive data, such as banking apps. It aims to secure access for any user and device, regardless of location.
MFA is a built-in feature for all Snowflake editions. It’s supported by an external company called Duo (owned by Cisco) although the actual MFA process is managed by Snowflake. As a user, you are required to download and install the Duo Mobile application to use this feature.
Due to the level of access it has over your account, Snowflake recommends always setting MFA for the ACCOUNTADMIN role.
Although you cannot enroll users in MFA on their behalf (users enroll themselves), as an account administrator you can provide support if they don’t have access to their mobile device, by either disabling MFA for them or granting a period during which they can bypass MFA.
MFA Caching
To reduce the number of times users receive the MFA prompt, you can enable MFA caching. This works by storing a version of the token on the client’s machine, which is valid for up to four hours. This parameter value is available at the overall account level only and therefore enabled globally across all users enrolled in MFA.
Before enabling MFA caching, consult with your own IT Security and Compliance teams to check policies and standards.
Security Assertion Markup Language
In this section on SAML, I’ll give you enough of a primer so you can make sense of it and relate it to the authentication methods I’ll discuss later in the chapter. Even though I won’t be taking a deep dive into everything on SAML, be prepared for a lot of acronyms; unfortunately, they’re unavoidable!
SAML offers an open standard which identity providers can use to provide a single sign-on experience. In practice, this means a user needs to provide one set of credentials up front to gain access to several applications. Fewer usernames and passwords to remember must be a good thing, right? Popular examples of IdPs are Okta and Microsoft Active Directory Federation Services (ADFS). In fact, these two particular IdPs provide native support for federated authentication using SSO. Other vendors may require a level of customization; this is outside the scope of our discussion in this book, however.
SAML 2.0 was introduced in 2005 and is the de facto standard for maintaining security within a federated authentication environment.
In the background, SAML uses XML (Extensible Markup Language) to handle the sharing of credentials between the IdP and a service provider (SP) which, in your case, is Snowflake. (I did warn you about the number of acronyms!)
OAuth
OAuth is an open standard that supports delegated authorization. I mentioned that SAML 2.0 was introduced in 2005; the world has obviously moved on somewhat since then. SAML was (and still is) primarily focused on enterprise-scale security, whereas OAuth is better suited to the mobile experience. OAuth uses JSON behind the scenes and not XML; this is another clue to the point in time when they were both introduced!
OAuth 2.0 is the latest version and what Snowflake supports. Whenever I refer to OAuth, I am referring to version 2.0 in all cases.
OAuth bridges some shortcomings of SAML in this space. You’re probably more familiar with using OAuth than you think. You can tell Facebook that it’s OK for Twitter to post to your feed without having to provide Twitter with your Facebook password, for example. This limits the number of companies that need to store your password, reducing risk for you as the user. This is OAuth.
There’s a risk of getting into lots of detail with all the different ways this process can work across a variety of scenarios. As we’re focusing on Snowflake, I am not going to subject you to all the information, just the most relevant areas.
The OAuth flow involves four parties:

- Resource owner: The person requesting access to the data
- Resource server: The API that stores the requested data
- Client application: The application requesting access to the data
- Authorization server: Runs the OAuth engine, verifies the client, and generates the token
If you don’t provide your password, how can applications verify you? This is done through a token, generated at the authorization stage by a server which verifies the user and returns a token permitting access to the data.
You need to use OAuth when attempting to make a connection using another client application, such as Tableau, and want an SSO-style experience. Snowflake provides OAuth support for several partner applications. In these cases, Snowflake acts as both the resource server and the authorization server.
Key Pair Authentication
Another alternative to providing a password to authenticate is to use key pair authentication. This reduces the risk of someone discovering your password and gaining access to your account.
It works on the basis of private and public keys. A pair of these keys are generated. Anyone can know the public key, as on its own it is useless. The private key should be secured, however, as this key generates signatures. These signatures can only be generated by the person who has the private key, while the public key can verify the signature as genuine.
Once you’ve generated a key pair, the public key is managed by Snowflake, while the private key should be managed in a secrets management platform such as AWS Secrets Manager or Azure Key Vault.
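Registering the public key against a Snowflake user might look like the following sketch. The user name and the truncated key value are purely illustrative; the key pair itself is typically generated outside Snowflake (for example with openssl) and the private key stored in your secrets manager:

```sql
-- Illustrative sketch; the user name and key value are assumptions.
-- Attach the public key to a service account user.
ALTER USER svc_loader SET RSA_PUBLIC_KEY = 'MIIBIjANBgkqh...';

-- Inspect the user to confirm the key fingerprint was registered.
DESCRIBE USER svc_loader;
```

From that point on, clients connecting as `svc_loader` authenticate by signing with the matching private key rather than presenting a password.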
- 1.
Username and password through the web UI
- 2.
Username and password with the addition of MFA
- 3.
SSO using SAML
- 4.
OAuth process allowing access to Snowflake data
- 5.
Key pair authentication
Options 1 and 2: Snowflake advises against relying on a username and password alone (option 1), because it is the least secure mechanism available. You should instead introduce MFA and follow option 2 as the preferred method of authenticating.
Option 3: This is used for human access in a federated authentication environment, in situations when your organization uses an IdP to manage access control centrally and provides an SSO experience for employees. In this scenario, when a user attempts to log into Snowflake, it requests authorization from the IdP for the user trying to log in. The IdP checks the privileges for that user and sends a response back to Snowflake, either allowing or denying access.
Option 4: OAuth is used when a user wants to access data through a client such as Tableau.
Option 5: Designed for programmatic or service account access. Using a key pair authentication approach combined with a secrets manager is the way to go here. The client signs with its private key, while Snowflake uses the registered public key to verify the signature and authenticate the client.
Storage Integration
In Chapter 2, you enabled public access to your S3 bucket so you could see how data ingestion worked from an external stage. In the real world, you need to lock down access to any external stage without exposing the security credentials to users in the process.
The way to achieve this is by creating something called a storage integration. Creating this object generates an entity in Snowflake that links to an IAM entity in your cloud provider, which is used to access your external cloud storage location.
There are a few major benefits to this. Firstly, you don’t need to share your credentials with the users who need to access your external stage to load or unload data. Secondly, you can create a single storage integration object that points to more than one external stage. Finally, you can grant access just to this single object within your cloud provider’s account, simplifying the administration.
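A minimal sketch of such an integration is shown below. The role ARN, bucket names, and stage name are my own assumptions; your cloud administrator supplies the real IAM role:

```sql
-- Illustrative sketch; the ARN, bucket paths, and names are assumptions.
CREATE STORAGE INTEGRATION s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-access'
  STORAGE_ALLOWED_LOCATIONS = ('s3://my-bucket/load/', 's3://my-bucket/unload/');

-- A stage can then reference the integration instead of raw credentials.
CREATE STAGE my_ext_stage
  URL = 's3://my-bucket/load/'
  STORAGE_INTEGRATION = s3_int;
```

Because the stage references the integration, users who load or unload data never see the underlying cloud credentials.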
For more details on how to create a storage integration object, you can visit the official Snowflake documentation page here: https://docs.snowflake.com/en/sql-reference/sql/create-storage-integration.html.
Network Policies
The migration of data workloads to the cloud brings new considerations when deciding how to secure access to the data from a networking perspective. Previously, traditional data platforms were protected through the virtue of being buried deep within layers of private network perimeters.
- 1.
Native out-of-the-box network security
- 2.
Network policies
- 3.
Cloud service provider capabilities
It is worth pointing out that option 1 is mandatory and always in play. You cannot turn the native features off. Options 2 and 3 can be used independently or together depending on your own security requirements.
Option 1: Native Network Security
For the vast majority of customers, the out-of-the-box network security Snowflake provides should meet the requirements. Therefore, this is the most common configuration you should expect to see.
Snowflake uses Transport Layer Security (TLS 1.2) encryption for all customer communication. It also uses something called OCSP (Online Certificate Status Protocol), which checks the validity and integrity of the certificates used for establishing the TLS tunnels.
Option 2: Network Policies
By default, Snowflake allows users or applications from any IP address to access the service. To provide an additional layer of security and allow or deny specific IP addresses access to Snowflake, you can use network policies.
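A minimal sketch of a network policy, assuming illustrative IP ranges, might look like this:

```sql
-- Illustrative sketch; the policy name and IP ranges are assumptions.
CREATE NETWORK POLICY corp_policy
  ALLOWED_IP_LIST = ('192.168.1.0/24')
  BLOCKED_IP_LIST = ('192.168.1.99');

-- Apply the policy at the account level (requires appropriate privileges).
ALTER ACCOUNT SET NETWORK_POLICY = corp_policy;
```

Note that a blocked entry takes precedence over the allowed range it sits inside, which is useful for carving a single bad actor out of an otherwise trusted corporate range.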
Option 3: Cloud Service Provider Capabilities
This approach involves an additional layer of security by leveraging the cloud service provider’s (CSP) private networking features. Here you can use AWS PrivateLink or Azure PrivateLink, which establish a private, point-to-point connection for all client-side initiated connections.
Additionally, you can layer the network policies and the CSP capabilities options on top of each other. This offers a comprehensive approach to network security management when using Snowflake.
Handling PII Data
The European Union’s General Data Protection Regulation (GDPR) was published in 2016 and implemented in 2018. During this time, I was working as a Head of Data for a FTSE 100 Insurer.
This quickly became a hot topic within the organization as we stored a lot of data relating to our customers. Suddenly we were in a position when we had to interpret what this legislation meant, along with the impact it had on our business process and technical architecture.
Of course, we were not alone in this process. It impacted a huge range of sectors. We’re now in the position where a lot of the legislation is clearer and better understood. This understanding has led to approaches and architectural designs that are GDPR compliant. It’s less a function of the data warehouse and more a function of the design you adopt. Therefore, spending some thinking time up front can save a lot of headaches in the future.
One of the major issues we found when working to bring our data practices in line with GDPR was the right to erasure (more commonly known as the right to be forgotten) in GDPR Article 17. Once an individual requests an organization to delete their PII from its database, the organization has a short period of time (between 30 and 90 days) to remove this personal data from databases, copies, and backups.
- 1.
How do you ensure all PII data is removed from the database?
- 2.
How do you cleanly remove this PII data without breaking the integrity of your data model?
- 3.
How do you make this process as efficient as possible for users to administer?
Separately Storing PII Data
The first recommendation is to physically separate PII data from non-PII data. For example, you might split the data across two different schemas, with the PII held in one table or group of tables. This allows you to pinpoint all the PII data you hold in your data warehouse and wrap appropriate controls around it.
Imagine the issues you could run into if you needed to remove values from certain columns of PII data within an existing table. Deleting the entire row of data should be avoided, as this may remove non-PII data that is valuable for reporting or analytical needs. By cleanly separating the PII data away, you reduce the risk of breaking your data model or removing data unnecessarily in the future.
You will need to consider how your users will access this data, especially if a group of users requires access to both PII and non-PII data. Therefore, you must have a way of joining both sensitive and non-sensitive data together again. One way to surface this data to your users is to use secure views, which check what role the user is in before deciding what data to return.
This solution works elegantly with the RBAC hierarchy, as you can create and grant (or deny) privileges on roles to be able to access the data in the PII schema and tables.
Removing Data in Bulk
As part of the separate PII schema described in the section above, I like to add a couple of columns that allow records to be flagged for removal, along with the date your organization received that request.
This allows you to batch up the removal of records and execute the process periodically within the time frames set out by the GDPR legislation. It’s far easier to manage and cater for an operational process that runs periodically rather than each and every time you get a request from an individual to remove their data.
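The flag columns and the periodic batch job might be sketched as follows. The table and column names are my own assumptions:

```sql
-- Illustrative sketch; table and column names are assumptions.
-- Columns supporting batched right-to-erasure processing.
ALTER TABLE pii.customer_details ADD COLUMN flagged_for_removal BOOLEAN DEFAULT FALSE;
ALTER TABLE pii.customer_details ADD COLUMN removal_requested_on DATE;

-- Periodic batch job: physically remove everything flagged for deletion,
-- run on a schedule that stays inside the regulatory window.
DELETE FROM pii.customer_details
WHERE flagged_for_removal = TRUE;
```

Running this on, say, a weekly schedule keeps the operational burden low while comfortably meeting a 30-day deadline.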
You should also consider your Time Travel settings in your PII data in Snowflake. Depending on your settings, Snowflake may have a version of this record stored beyond 30 days, which could allow this data to be mistakenly restored after it’s been removed.
Auditing
I also recommend maintaining a small metadata table that records, for each removal request:

- A unique identifier for the individual (obviously something that doesn’t identify the customer!)
- The date the delete request was received
- The date it was flagged for removal in the database
- The date it was deleted
The metadata this table stores allows you to periodically audit your PII data to ensure no data has been inadvertently restored or the wrong data has been removed. In the latter case, Time Travel should be able to help rather than hinder!
Controlling Access to PII Data
At this stage in the process you’ve separated your PII data from your non-PII data by placing the data in separate schemas. Now you need to consider how to manage user access to this data.
For this, you’ll need to create a minimum of two access roles: one that can access sensitive data and one that cannot.
You can then combine data from the two schemas using a view and grant access to query the view to both roles.
Within the view definition you can make use of data obfuscation and the CURRENT_ROLE function to determine how data should be returned to the user executing the query.
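A sketch of such a view is shown below. All schema, table, column, and role names (including `PII_READER`) are my own assumptions:

```sql
-- Illustrative sketch; all names are assumptions.
CREATE SECURE VIEW reporting.customer_v AS
SELECT
    c.customer_id,
    c.signup_date,
    -- Only the PII-enabled role sees the real email address.
    CASE
        WHEN CURRENT_ROLE() = 'PII_READER' THEN p.email
        ELSE '***MASKED***'
    END AS email
FROM non_pii.customer AS c
LEFT JOIN pii.customer_details AS p
    ON c.customer_id = p.customer_id;
```

Both roles can be granted SELECT on the view; the CASE expression decides at query time whether the sensitive column is revealed or obfuscated.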
It’s possible to improve on this approach further using Dynamic Data Masking. Available in Enterprise Edition and above, this feature allows you to create a dynamic masking policy as an object in the database. This useful feature helps you control who can create (and modify) masking policies while centralizing the rules in one place. The masking policy is applied dynamically to any queries that reference the columns with a policy attached, so it’s very powerful. This approach promotes reuse and makes it easier if you need to adjust the logic for a PII field in the future.
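A dynamic masking policy centralizes that same logic as a reusable database object. The policy, role, and column names below are my own assumptions:

```sql
-- Illustrative sketch; policy, role, table, and column names are assumptions.
CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE
      WHEN CURRENT_ROLE() = 'PII_READER' THEN val
      ELSE '***MASKED***'
  END;

-- Attach the policy to the sensitive column; it now applies to every
-- query that references the column, in views or directly on the table.
ALTER TABLE pii.customer_details
  MODIFY COLUMN email SET MASKING POLICY email_mask;
```

If the masking rules for email addresses ever change, you alter the policy in one place rather than hunting through every view definition.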
Row Access Policies
Row access policies allow you to secure data at the individual row level based on the role executing the query. They are an excellent addition to Snowflake, centralizing business logic and simplifying the management of fine-grained access control.
- 1.
Define a policy and optionally define a mapping table.
- 2.
Apply the policy to one or more tables.
- 3.
Query the data.
Example Scenario
Imagine you have a sales table containing sales for all territories. You are working on a requirement to restrict data access so that sales managers responsible for a territory can see data only for their own territory. In this instance, you have a SALES_MANAGER role that needs to see all data, a SALES_EMEA role that can only see EMEA data, and a SALES_APAC role that can only see APAC data. This is an ideal use case for a row access policy.
Steps
Step 3: Load the data from the sales_data_sample.csv file (provided with the book) using the web UI into the Sales table you created in step 1. Ensure you use the FF_CSV file format you created in the previous step.
After running through this example, you can hopefully see how powerful this could be within your own environment. I really like the simplicity and the ease of use of this feature. You can extend this approach by adding a simple mapping table between the role and allowed values. In this case, you would recreate the row access policy and look up the allowed values in the mapping table.
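Under the assumptions of this scenario, the policy itself might be sketched as follows (the `sales` table and its `territory` column are my own assumptions):

```sql
-- Illustrative sketch; table and column names are assumptions.
CREATE ROW ACCESS POLICY sales_territory_policy
AS (territory STRING) RETURNS BOOLEAN ->
     CURRENT_ROLE() = 'SALES_MANAGER'
  OR (CURRENT_ROLE() = 'SALES_EMEA' AND territory = 'EMEA')
  OR (CURRENT_ROLE() = 'SALES_APAC' AND territory = 'APAC');

-- Attach the policy; every query against the table is now filtered.
ALTER TABLE sales ADD ROW ACCESS POLICY sales_territory_policy ON (territory);
```

Swapping the hard-coded role checks for a lookup against a role-to-territory mapping table is the natural next step when the number of territories grows.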
Advanced Snowflake Security Features
Future Grants
Future grants allow you to put in place rules that tell Snowflake how to manage privileges for certain types of objects before they are created. The aim of this option is to promote consistency across who can access what, as well as to simplify this process so you don’t have to apply privileges against each new object as you create them.
Applying a future grant doesn’t prevent you from adding additional finer-grain access controls around objects. Think of it as providing a general, standardized layer of controls across common objects.
Furthermore, a user could add a future grant for a role on all objects of a given type in a schema. This means you can be safe in the knowledge that users attached to that role will always have access to that schema in the future. You can then create or replace objects within the schema without needing to use the COPY GRANTS command (which I cover in Chapter 8), since all of the grants will already be in place.
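A future grant of this kind might be sketched as follows; the schema and role names are my own assumptions:

```sql
-- Illustrative sketch; schema and role names are assumptions.
-- Any table created in this schema from now on is automatically
-- queryable by the reporting_reader role.
GRANT SELECT ON FUTURE TABLES IN SCHEMA analytics.reporting
  TO ROLE reporting_reader;
```

This is particularly handy for schemas populated by automated pipelines, where new tables appear without any human around to grant privileges.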
Managed Access Schemas
By default, when users create objects in a schema, they become the object owner for that object. This means they can grant access on this object to other roles or grant ownership privileges to other roles.
In certain cases, you may want to centralize the management of privileges to only certain roles, such as the schema owner or anyone with the MANAGE GRANTS privilege, while preventing other users from making decisions on who to grant access to. Managed access schemas allow you to control the behavior in this way. To do this, you use the WITH MANAGED ACCESS keywords when executing the CREATE SCHEMA statement.
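The statement itself is a one-liner; the database and schema names below are my own assumptions:

```sql
-- Illustrative sketch; the database and schema names are assumptions.
CREATE SCHEMA finance_db.payroll WITH MANAGED ACCESS;
```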
The key takeaway here is that by applying managed access schemas, the ownership of objects is moved away from the object owner to the schema owner (or anyone with the MANAGE GRANTS privilege).
Summary
Stricter regulations and increasing scrutiny have forced organizations into taking full responsibility for the data they hold. As a result, database architecture and data management practices need to be well thought out, comprehensive, robust, and transparent to the consumer and the regulator.
Some may argue that this new focus on treating data as an asset has been a long time coming. Evidence of this was the scramble to patch up holes in numerous data solutions in time for legislation such as GDPR. Ultimately this makes companies better at safeguarding data. The companies that do this best win their customers’ trust and avoid reputational damage.
You covered a lot of ground in this chapter: from the standard roles Snowflake provides and how you can elegantly extend them, to authentication for both users and applications, to securing and managing PII data in your data warehouse.
The aim of this chapter was to provide you with in-depth knowledge of the security features Snowflake offers and to arm you with best practice approaches to allow you to design the best possible foundation from the outset.
In this chapter, you looked at how to protect and control access to your data in Snowflake. The following chapter builds nicely upon this one. In it you’ll look at how you can protect the data itself.