

 This whitepaper is for historical reference only. Some content might be outdated and some links might not be available.

# Implementing pool isolation
<a name="implementing-pool-isolation"></a>

 The pool model is often the most appealing to SaaS providers. The efficiency, agility, and cost profile of pool is frequently what motivates providers to deliver in this model. Of course, as we move resources into a shared model, we have a much more challenging isolation story to tell. There is often a fundamental mismatch between the tools and mechanisms that provide isolation and the nature of tenants consuming a shared resource. This is further complicated by the fact that each resource we need to isolate in the pool model may require a different approach to enforcing isolation. While these challenges are real, they should *not* represent an opportunity to somehow relax your isolation requirements. This just means you’ll have to work a bit harder to find the right combination of tools and constructs to isolate some resources in a pooled model. 

 Before we dig into some specific pool isolation techniques, let’s get a clear picture of how the pool model changes our approach to isolation. Generally, when we talk about isolating AWS resources, we focus on how AWS Identity and Access Management (IAM) can be used to control the interactions between resources. For a silo model, in fact, IAM represents a perfectly good model for expressing your tenant isolation policies. With the pool model, though, using these IAM constructs can be a bit more involved. The diagram in Figure 12 provides an illustration of how silo and pool require separate isolation mindsets. 

![\[Diagram showing IAM and scoping access.\]](http://docs.aws.amazon.com/whitepapers/latest/saas-tenant-isolation-strategies/images/iam-and-scoping-access.jpg)


Here you’ll see two different ways of applying IAM policies to scope the access of compute constructs. On the left we have two siloed deployments where tenants are running in their own infrastructure. These tenants are both accessing some other resource (in this case storage). When these instances were deployed, they were configured with separate IAM instance profiles for each tenant (tenant 1 and tenant 2). Since this binding was created at deployment time, we can be sure that these instances will be prevented from accessing the resources of another tenant. 

 On the right you’ll see an example where we’ve deployed compute nodes in a pooled model. The compute that is running here will be running on behalf of all tenants. This reality directly impacts how we can scope the IAM profile for the compute that is deployed here. Instead of constraining the compute to a specific tenant, we must deploy these compute nodes with a profile that is open enough to support all tenants. This wider scope is where we run into the real challenges of the pool model. Now, we’ll need to come up with new ways to implement the scoping of access that is enforced by your SaaS solution. 

 Given this unique aspect of pool isolation, you’ll find that the options for implementing pool isolation will vary significantly. While it’s beyond the scope of this paper to explore all the permutations of pool isolation, we can examine some common patterns to get a better feel for the different strategies that are often applied. The sections that follow will provide an overview of these strategies. 

**Topics**
+ [Run-time, policy-based isolation with IAM](run-time-policy-based-isolation-with-iam.md)
+ [Scaling and managing pool isolation policies](scaling-and-managing-pool-isolation-policies.md)
+ [Pooled storage isolation strategies](pooled-storage-isolation-strategies.md)
+ [Application-enforced pool isolation](application-enforced-pool-isolation.md)
+ [Pool for any resource](pool-for-any-resource.md)
+ [Hiding the details of pooled isolation](hiding-the-details-of-pooled-isolation.md)

# Run-time, policy-based isolation with IAM
<a name="run-time-policy-based-isolation-with-iam"></a>

 In the pooled environment, SaaS providers will typically turn to IAM to find a strategy to isolate their resources. However, as noted above, you’ll need to be creative with how you apply IAM to achieve isolation in a pooled model. Instead of inheriting the IAM scoping of your compute node, you’ll need to introduce your own code that will provide run-time enforcement of your pooled isolation model. The diagram in Figure 13 provides a conceptual view of this model. 

![\[Diagram showing runtime acquired scoping.\]](http://docs.aws.amazon.com/whitepapers/latest/saas-tenant-isolation-strategies/images/runtime-acquired-scoping.png)


In this diagram, you’ll see that we have a microservice that needs to access some downstream resources (databases, S3 buckets, etc.). This microservice was deployed in a pooled model, which means that it will be processing requests from multiple tenants. The job of this microservice is to ensure that, as it processes these requests, it will apply constraints that will prevent tenants from crossing a boundary to another tenant’s resources. In the diagram, you’ll see that our microservice reaches out to the isolation manager to acquire a scoping context. This context controls which resources the code running in this microservice can access and how it interacts with them. 

 This conceptual model provides some view of the moving parts. However, to see this in action, we need to look at a more concrete strategy that explains how this context is expressed and applied. The diagram in Figure 14 provides a more in-depth look at how IAM can be used as part of this run-time scoping of access to tenant resources. 

 Here you’ll see the full lifecycle of configuring and applying policies in a run-time model. In the first step of this process, the tenant onboards to your system. During this process, the user for the tenant is set up along with the IAM policies for that tenant (steps 2 and 3). Once the tenant has onboarded, requests begin to hit the microservice of our application (step 4). Because this microservice is running in a pooled model, it has been deployed with a broad IAM scope that enables it to access resources for all tenants. Our job, then, is to look at each request that is sent to this service and narrow the scope of that request to a single tenant. We do that by asking the isolation manager for a set of credentials that are specific to the current tenant (step 5). The isolation manager looks up the IAM policies for the tenant (step 6) and generates a tenant-scoped set of credentials that are returned to the calling microservice. Finally, the microservice uses these credentials to access a database (step 7). With these new tenant-scoped credentials, the code of the microservice will be prevented from accessing the resources of another tenant. 

![\[Diagram showing scoping with IAM policies.\]](http://docs.aws.amazon.com/whitepapers/latest/saas-tenant-isolation-strategies/images/scoping-with-iam-policies.png)


 In this model, we’re essentially saying that our microservice will have this tenant context applied each time it attempts to access another resource. This scoping is applied as a matter of an agreed upon convention where the microservice is expected to always acquire new credentials before accessing a tenant resource. 
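
The run-time flow above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes one IAM role per tenant was created at onboarding, the role naming convention and the `IsolationManager` class shape are hypothetical, and `sts_client` is expected to behave like `boto3.client("sts")`, whose `assume_role` call returns temporary credentials.

```python
import re
from dataclasses import dataclass
from typing import Any, Dict

# Assumed tenant ID format; it also keeps the role session name valid.
_TENANT_ID = re.compile(r"^[A-Za-z0-9-]{1,36}$")


@dataclass(frozen=True)
class TenantCredentials:
    access_key_id: str
    secret_access_key: str
    session_token: str


class IsolationManager:
    """Exchanges the service's broad identity for tenant-scoped credentials."""

    def __init__(self, sts_client: Any, account_id: str) -> None:
        # sts_client is expected to behave like boto3.client("sts").
        self._sts = sts_client
        self._account_id = account_id

    def role_arn_for(self, tenant_id: str) -> str:
        # Hypothetical convention: one role per tenant, created at onboarding.
        if not _TENANT_ID.match(tenant_id):
            raise ValueError("invalid tenant id")
        return f"arn:aws:iam::{self._account_id}:role/tenant-{tenant_id}-access"

    def credentials_for(self, tenant_id: str) -> TenantCredentials:
        response: Dict[str, Any] = self._sts.assume_role(
            RoleArn=self.role_arn_for(tenant_id),
            RoleSessionName=f"tenant-{tenant_id}",
            DurationSeconds=900,  # shortest lifetime STS accepts
        )
        creds = response["Credentials"]
        return TenantCredentials(
            access_key_id=creds["AccessKeyId"],
            secret_access_key=creds["SecretAccessKey"],
            session_token=creds["SessionToken"],
        )
```

A microservice would call `credentials_for` on each request and build its database or storage client from the returned values, rather than from the broad credentials it was deployed with.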

# Scaling and managing pool isolation policies
<a name="scaling-and-managing-pool-isolation-policies"></a>

 While IAM policies provide powerful isolation constructs, they can also present SaaS providers with scaling challenges. If your system has a large number of tenants with a large population of policies, you may find that you will exceed the limits of the IAM service. You may also find it difficult to manage these policies as the number of tenants and the complexity of these policies grow. In these situations, some SaaS companies will adopt an alternate approach to how they generate and manage their IAM policies at run-time. 

 One approach to this challenge is to shift to a model where your IAM policies are generated at run-time. The idea here is to have your system implement a mechanism that will examine the current context of a call and generate the required IAM policy on-the-fly. This moves the policies out of IAM (since they are transient) and enables you to address potential limits on the number of policies that are needed to support all of your tenants. The diagram in Figure 15 provides an overview of this dynamic policy generation mechanism. 

![\[Diagram showing dynamic policy generation using a token vending machine and token generator.\]](http://docs.aws.amazon.com/whitepapers/latest/saas-tenant-isolation-strategies/images/dynamic-policy-generation.jpg)


 In this flow, you’ll see that we start with the same isolation manager that we used in our prior example. However, instead of going directly to IAM to retrieve the policies needed to scope access, we have a series of steps that are used to generate a policy. The isolation manager first makes a request to the token vending machine to get a tenant-scoped token (step 1). It’s the job of the vending machine to go to the templates that you have pre-defined for your tenant isolation model (step 2). Think of these as template files that have all of the moving parts of a traditional IAM policy. However, key elements of the file are not filled in (those that represent our tenant context). You might, for example, fill in a table name or the leading key condition of an Amazon DynamoDB table with a tenant identifier. 

 Once you have the template that’s needed, you now call out to the token generator to request a token (step 3). In this step, we also provide the current tenant context. The token generator then fills the tenant details into the template, leaving us with a fully hydrated IAM policy (steps 4 and 5). Finally, the token generator uses this policy to generate a token that is scoped according to the provided policy. This token is returned back to the isolation manager (steps 6 and 7). Now, this token can be used to access resources with the tenant context applied. 
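
A minimal sketch of the template hydration and token generation steps might look like the following. The template, the bucket layout, and the function names are assumptions made for illustration; the example scopes an S3 prefix, but the same substitution would apply to a table name or a DynamoDB leading key. `sts_client` is expected to behave like `boto3.client("sts")`, where the `Policy` argument of `assume_role` attaches a session policy that narrows what the assumed role already allows.

```python
import json
import re
from string import Template
from typing import Any

# Hypothetical template: an ordinary IAM policy with the tenant-specific
# parts left as $placeholders. Bucket layout and names are assumptions.
S3_PREFIX_TEMPLATE = Template("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::$bucket/$tenant_id/*"
    }
  ]
}
""")

_SAFE_VALUE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9.-]{0,62}$")


def hydrate_policy(template: Template, **context: str) -> str:
    """Fill the tenant context into a template and return compact policy JSON."""
    for name, value in context.items():
        # Reject values that could change the policy structure or add wildcards.
        if not _SAFE_VALUE.match(value):
            raise ValueError(f"unsafe value for {name!r}")
    # substitute() raises KeyError if any placeholder is left unfilled.
    hydrated = template.substitute(**context)
    return json.dumps(json.loads(hydrated), separators=(",", ":"))


def generate_token(sts_client: Any, role_arn: str, tenant_id: str, policy: str) -> dict:
    """Exchange the hydrated policy for temporary, tenant-scoped credentials."""
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=f"tenant-{tenant_id}",
        Policy=policy,  # session policy: narrows what the role already allows
        DurationSeconds=900,
    )
    return response["Credentials"]
```

Validating the substituted values matters here: because the policy is assembled from text, an unchecked tenant identifier containing a quote or a wildcard could widen the scope the template was meant to enforce.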

 By moving these policies into templates, you take on the added responsibility of assuring that these policies enforce your tenant isolation requirements. Ideally, the details of this mechanism will be mostly outside the view of developers so the potential for something to go wrong is reduced. 

 One upside here is the management profile of this model. Should you choose to change something about your isolation policies, the path to applying these changes will be much more straightforward since there won’t be a separate policy for each tenant. That, and you’ll own the content lifecycle of these policy templates (versioning and deploying them through your own pipeline). 

# Pooled storage isolation strategies
<a name="pooled-storage-isolation-strategies"></a>

 Isolating data in a pooled model is an area that gets lots of attention from SaaS providers. As data is co-mingled, SaaS developers become hyper-focused on identifying ways to ensure that each tenant’s data is protected. In fact, while many SaaS providers are intrigued by the cost, management, and agility profiles of the pool model, they will often default to a silo model purely to address expected pushback from customers that may be hesitant to accept pooling of their data. 

 The general notion of pool storage isolation (for any storage service) is that the data for all tenants is represented in a shared storage construct. The diagram in Figure 16 provides an illustration of pooled storage. 

![\[Diagram showing pooled storage.\]](http://docs.aws.amazon.com/whitepapers/latest/saas-tenant-isolation-strategies/images/pooled-storage.jpg)


Here you’ll see that we have a product microservice that is storing its data in a pooled model. The table has an index in the first column that represents the key for each tenant. All of the tenant product data resides in this one table. 

 With this model, the challenge of isolating the data becomes much more complex. How do you create some virtual view of this table that is constrained to just those rows that belong to a given tenant? Also, how will this isolation be realized across each of the AWS storage services? The reality is, each service may require its own unique approach to implement isolation in the pooled model. 

 To get a better sense of this variation, let’s start by looking at one example of how you might use IAM to implement pooled isolation with DynamoDB. As a fully managed storage service, DynamoDB offers you a rich collection of IAM mechanisms to control access to resources. This includes the ability to define a *leading key* condition in your IAM policy that can restrict access to the items in a DynamoDB table. The IAM policy shown in Figure 17 provides an example policy that demonstrates this approach to isolation. 

 The key area to focus on in this policy is the condition. This condition indicates that, when this policy is applied, all attempts to access the DynamoDB table will be limited to items that have a key that matches the value of this leading key. So, in this case, the tenant identifier would be in the leading key, constraining access to data for a given tenant. 

![\[Screen capture showing an example IAM policy for DynamoDB isolation with leading keys.\]](http://docs.aws.amazon.com/whitepapers/latest/saas-tenant-isolation-strategies/images/dynamodb-isolation-with-leading-keys.jpg)
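
For readers without the figure, a policy of this kind can be sketched in text form as below. This is an illustrative reconstruction, not a copy of Figure 17: the action list and function names are assumptions, while `dynamodb:LeadingKeys` with the `ForAllValues:StringEquals` operator is the IAM condition used for partition-key scoping. The second function is only a toy model of the condition's meaning, not IAM's evaluator.

```python
from typing import Any, Dict, Iterable


def leading_key_policy(table_arn: str, tenant_id: str) -> Dict[str, Any]:
    """Build a policy that only allows items whose partition key is tenant_id."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "dynamodb:GetItem",
                    "dynamodb:Query",
                    "dynamodb:PutItem",
                    "dynamodb:UpdateItem",
                    "dynamodb:DeleteItem",
                ],
                "Resource": table_arn,
                "Condition": {
                    "ForAllValues:StringEquals": {
                        "dynamodb:LeadingKeys": [tenant_id]
                    }
                },
            }
        ],
    }


def keys_satisfy_condition(policy: Dict[str, Any], requested_keys: Iterable[str]) -> bool:
    """Toy model of the condition, NOT IAM's evaluator: every partition key
    touched by a request must appear in the policy's allowed list."""
    condition = policy["Statement"][0]["Condition"]["ForAllValues:StringEquals"]
    allowed = condition["dynamodb:LeadingKeys"]
    return all(key in allowed for key in requested_keys)
```

With a policy built for `tenant1`, a request that touches only `tenant1` items satisfies the condition, while a request that touches any `tenant2` item does not.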


 Now, if we look at applying this same isolation model to Amazon Aurora PostgreSQL, you’ll see that the mechanism is quite different. With Aurora PostgreSQL, you cannot use IAM to scope access to data at the row level. Instead, you’ll need to use the row level security (RLS) feature of PostgreSQL to isolate your tenant data. The diagram in Figure 18 provides a simple example of how you’d set up RLS for a product table in your system. 

![\[Screen capture showing creating pooled isolation with PostgreSQL RLS.\]](http://docs.aws.amazon.com/whitepapers/latest/saas-tenant-isolation-strategies/images/pooled-isolation-with-postgresql-rls.jpg)


 The first step in configuring RLS is to alter your table to enable row level security for that table. Then, you’ll create an isolation policy for that table that requires the tenant\_id column to match the value of the current user (which is supplied contextually). Now, with these changes in place, all interactions with this table will be restricted to the rows that are valid for the current tenant. 
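
The two steps can be sketched as a small helper that emits the PostgreSQL statements and runs them through a database cursor. The helper names and the policy name are hypothetical, and the cursor is assumed to be any Python DB-API cursor connected to the database. The `::TEXT` cast is an assumption about the column type, included so the comparison with `current_user` works for non-text tenant identifiers.

```python
import re
from typing import Any, List

# Only plain lowercase identifiers are accepted, since they are
# interpolated directly into the SQL text.
_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")


def rls_setup_statements(table: str, tenant_column: str = "tenant_id") -> List[str]:
    """Return the statements that enable RLS and add a tenant isolation policy."""
    for name in (table, tenant_column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"invalid identifier: {name!r}")
    return [
        # Step 1: turn on row level security for the table.
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY",
        # Step 2: only rows whose tenant column matches the connected user are visible.
        f"CREATE POLICY {table}_isolation_policy ON {table} "
        f"USING ({tenant_column}::TEXT = current_user)",
    ]


def apply_rls(cursor: Any, table: str) -> None:
    """Run the setup statements through a DB-API cursor."""
    for statement in rls_setup_statements(table):
        cursor.execute(statement)
```

One detail worth checking in your own setup: in PostgreSQL, the table owner and superusers bypass row level security by default, so the application should connect as a per-tenant, non-owner role for the policy to take effect.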

 In contrasting the DynamoDB and Aurora PostgreSQL approaches, you can see that you’ll need to do some exploration with each storage service that you are using to find a model that will let you achieve isolation. There are also cases where services may not offer a more granular isolation model. In these cases, you’ll have to introduce your own mechanisms to enforce your pool isolation policies. 

# Application-enforced pool isolation
<a name="application-enforced-pool-isolation"></a>

 Most of our attention so far has been on strategies for using IAM as the foundation of our pooled isolation model. And, while IAM often represents a great fit for isolating resources, there can also be scenarios where IAM may not support the flavor of isolation that your application requires. This is where you may have to fall back and look at introducing other frameworks or tools to control access to your application’s pooled resources. 

 Application-enforced isolation typically includes some model where you express policies (much like you do with IAM). These same frameworks often include policy enforcement mechanisms that will sit between you and your resources, authorizing your access to the resources. The diagram in Figure 19 provides a high-level conceptual view of the moving parts that might be part of an application-enforced policy model. 

![\[Diagram showing application-enforced pool isolation.\]](http://docs.aws.amazon.com/whitepapers/latest/saas-tenant-isolation-strategies/images/application-enforced-pool-isolation.jpg)


 In this example, your tenant would authenticate against an identity provider and introduce some construct that will identify the policies that were defined for this specific user (this could also happen in a downstream process). The key here is that the policies would then be connected to your user’s identity, enabling downstream operations to apply these policies in the context of a given user. Once you’ve authenticated, your identity would flow through the services of your system. Here there would need to be a library or process that would sit between your code and the resource you’re attempting to access, applying the policies that were bound to you as a user. 

**Note**  
This approach is only meant to represent a conceptual model. The strategies that are employed by each framework might take a different approach to expressing and applying their policies. 
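
As one possible shape for such an enforcement layer, the sketch below places a small guard class between service code and a pooled, in-memory store. Everything here is hypothetical (the `Identity` type, the `TenantScopedStore` class, the row layout); a real framework would express and evaluate policies in its own way.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Identity:
    """Resolved after authentication; tenant_id is bound to the user's identity."""
    user_id: str
    tenant_id: str


class TenantIsolationError(PermissionError):
    """Raised when a request would cross a tenant boundary."""


class TenantScopedStore:
    """Sits between service code and a pooled store, applying the tenant policy."""

    def __init__(self) -> None:
        self._rows: List[Dict[str, str]] = []  # rows for all tenants, co-mingled

    def put(self, identity: Identity, row: Dict[str, str]) -> None:
        # Writes may only target the caller's own tenant.
        if row.get("tenant_id") != identity.tenant_id:
            raise TenantIsolationError("row belongs to a different tenant")
        self._rows.append(dict(row))

    def list(self, identity: Identity) -> List[Dict[str, str]]:
        # Reads only ever see the caller's own rows.
        return [dict(r) for r in self._rows if r["tenant_id"] == identity.tenant_id]

    def get(self, identity: Identity, product_id: str) -> Dict[str, str]:
        # Another tenant's row is reported as missing rather than forbidden,
        # so the caller learns nothing about other tenants' data.
        for row in self.list(identity):
            if row["product_id"] == product_id:
                return row
        raise KeyError(product_id)
```

Service code never touches `_rows` directly in this sketch; every read and write passes through the guard with the caller's identity attached.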

 It’s worth noting that the boundaries of policy-based isolation and role-based access control (RBAC) often get blurred as part of this discussion. The tooling here, in fact, often contributes to this confusion. As a generality, though, we wouldn’t want to equate RBAC to tenant isolation. In many cases, RBAC has a functional mapping where user roles (defined by an application) are used to control access to a system’s functionality. That scope is different than drawing boundaries of isolation between the tenants of your system, which is less about a functional goal and more about preventing one tenant from accessing another tenant’s data. 

# Pool for any resource
<a name="pool-for-any-resource"></a>

 Our coverage of pool here highlights the fundamental moving parts of implementing a pooled strategy. However, it does not touch on how pool might land in every AWS service. That is beyond the scope of this paper. That being said, the concepts and tradeoffs of pool isolation tend to be similar for most resources. As you look at the range of additional AWS services, you’ll find yourself balancing the available isolation mechanisms with the efficiency of having a resource that is shared by tenants. In an ideal scenario, you could use a pooled model for every resource and still achieve all of your isolation goals. The reality is, though, you’ll find scenarios where the isolation model for some resources will be challenging. In these cases, this may push you toward a silo model. That, or you’ll absorb the effort to use some flavor of application-enforced isolation to realize your isolation goals. 

# Hiding the details of pooled isolation
<a name="hiding-the-details-of-pooled-isolation"></a>

 As we mentioned previously, one key aspect of the pool model is that it relies on developers to conform to the overall model. Developers must, as a matter of convention, acquire the scoped context before accessing resources. Given the importance of compliance with this model, you’ll often see companies creating mechanisms that simplify a developer’s ability to align with the isolation policies adopted as part of a SaaS offering. 

 The general approach here fits very much with common design best practices. This usually translates into the creation of libraries, modules, or lightweight frameworks that are shared by teams. The goal here is to move the mechanics of acquiring a scoped context into shared constructs that can be leveraged across your team. The diagram in Figure 20 provides a conceptual view of this notion of hiding away the details of isolation. 

![\[Diagram showing using libraries for isolation standardization.\]](http://docs.aws.amazon.com/whitepapers/latest/saas-tenant-isolation-strategies/images/using-libraries-for-isolation-standardization.jpg)


 Here you’ll see that we have two microservices (product and order) that need to acquire credentials to comply with the pooled isolation model of our system. What we’ve done here is moved all of the code and details of this process to shared libraries (these are not separate microservices). When our microservice needs scoped credentials, it will call into the isolation manager, passing in the JWT that was supplied to the microservice. This isolation manager will then get the tenantId from the token manager, which owns all the logic associated with cracking open the JWT and extracting tenant information. It will then get the policy for this tenant from the policy manager and use that policy to get a set of tenant-scoped credentials. These credentials would then be returned to the calling service. 
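
The division of responsibilities described above might be sketched as three small library classes. The class shapes, the `tenantId` claim name, and the injected `vend` callable are assumptions for illustration. The token manager here only handles HS256-signed tokens and does not check expiry; a maintained JWT library would normally take its place.

```python
import base64
import hashlib
import hmac
import json
from typing import Callable, Dict

Credentials = Dict[str, str]


def _b64url_decode(part: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


class TokenManager:
    """Owns all JWT handling. HS256 only in this sketch; "exp" is NOT checked."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret

    def tenant_id(self, token: str) -> str:
        header_b64, payload_b64, signature_b64 = token.split(".")
        if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
            raise ValueError("unsupported token algorithm")
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
        expected = hmac.new(self._secret, signing_input, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            raise ValueError("invalid token signature")
        # "tenantId" is an assumed custom claim name.
        return json.loads(_b64url_decode(payload_b64))["tenantId"]


class PolicyManager:
    """Resolves the isolation policy for a tenant (stored or generated)."""

    def __init__(self, render_policy: Callable[[str], str]) -> None:
        self._render_policy = render_policy

    def policy_for(self, tenant_id: str) -> str:
        return self._render_policy(tenant_id)


class IsolationManager:
    """Single entry point that microservices call to get scoped credentials."""

    def __init__(
        self,
        tokens: TokenManager,
        policies: PolicyManager,
        vend: Callable[[str, str], Credentials],
    ) -> None:
        self._tokens = tokens
        self._policies = policies
        self._vend = vend  # e.g. a wrapper around an STS call

    def credentials_for(self, token: str) -> Credentials:
        tenant_id = self._tokens.tenant_id(token)
        policy = self._policies.policy_for(tenant_id)
        return self._vend(tenant_id, policy)
```

From the point of view of the product or order service, the whole process collapses to a single `credentials_for(token)` call, which is the point of moving these details into a shared library.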

 There’s nothing especially unique about this approach. This is simply applying the basic strategy of ensuring that reusable constructs are extracted so they can be versioned and shared more universally by your team. The key concept here is that you should attempt to push as much of the details of tenant isolation away from the view of your developers, making it as simple as possible for them to apply your isolation scheme. 

 How you choose to implement this could also be influenced by the stack or compute construct that your application uses. With Lambda, for example, it may make sense to move these libraries to Lambda Layers where these horizontal concepts are versioned separately and universally referenced by your Lambda functions. 

 You may also look to introduce mechanisms that will take this completely outside of the view of your developers, intercepting and acquiring these scoped credentials before you get into the implementation of your microservices. With some languages, for example, you could use aspects to intercept incoming requests, acquire the scoped credentials, and inject them into the microservice. With Lambda functions, there are various open source *wrapper* libraries that could be used to inject scoped credentials into a Lambda function. For some, these strategies may provide a stricter model for enforcing isolation. 
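
To make the interception idea concrete, here is a minimal sketch of a wrapper written as a Python decorator. It does not represent any particular open source library: the decorator name, the three-argument handler signature, and the assumption that the token arrives as a bearer token in the event's `headers` are all hypothetical.

```python
import functools
from typing import Any, Callable, Dict

Credentials = Dict[str, str]
Handler = Callable[[Dict[str, Any], Any, Credentials], Any]


def _bearer_token(event: Dict[str, Any]) -> str:
    """Pull the bearer token out of the request headers (case-insensitive)."""
    headers = event.get("headers") or {}
    for name, value in headers.items():
        if name.lower() == "authorization" and value.startswith("Bearer "):
            return value[len("Bearer "):]
    raise PermissionError("missing bearer token")


def with_tenant_credentials(
    get_credentials: Callable[[str], Credentials],
) -> Callable[[Handler], Callable[[Dict[str, Any], Any], Any]]:
    """Wrap a handler so scoped credentials are acquired before it runs."""

    def decorator(handler: Handler) -> Callable[[Dict[str, Any], Any], Any]:
        @functools.wraps(handler)
        def wrapper(event: Dict[str, Any], context: Any) -> Any:
            # The handler body never sees the token exchange; it only
            # receives credentials already scoped to the calling tenant.
            credentials = get_credentials(_bearer_token(event))
            return handler(event, context, credentials)

        return wrapper

    return decorator
```

A function would then be declared as `@with_tenant_credentials(isolation_manager.credentials_for)` above a handler that accepts `(event, context, credentials)`, where `isolation_manager` stands in for whatever shared library vends the credentials. A request without a token fails before any handler code runs.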