Top 5 Tips for hosting internal apps in the Cloud


Welcome to the Agent Experience Team! Here, we build and host software for our internal users in the contact centre and the wider business, so when you phone AO to discuss your order, our AO’ers are using software built in-house to help you with your queries. We are part of the Customer Experience domain here at AO.

Protecting customer data is front and centre in our minds, but we still want to give our AO’ers the advantages of cloud compute platforms by building scalable and robust applications (we’ve got a few AWS-certified people in the team: cloud practitioners, developers and solutions architects). That combination gives this team a few specific considerations.

One thing we’ve noticed as we’ve been building solutions is that most of the documentation and examples you’ll come across online are geared towards public-facing websites, with far fewer examples or blogs about how to create scalable applications for internal consumption.

With that in mind, here are our Top 5 Tips for building scalable applications for internal consumption. We hope you enjoy them, and please feel free to reach out with your comments or your own tips for building on AWS for internal apps.

Let’s get started…

1. Hosting Lambda functions in a VPC

A VPC is a Virtual Private Cloud: in effect, an isolated network within your AWS account. IPv4 addresses in a VPC come from the private CIDR ranges, for example 10.0.0.0/8 or 192.168.0.0/16.

When you create resources in a VPC, they exist in an isolated network. To be able to “talk to” these resources, you need to be on the same network. AWS offers several connectivity options to get onto the VPC’s network and reach your resources, but they all rely on up-front setup and layers of authentication. We’re fortunate at AO to have experts in our Infra and DevOps teams who help with the setup and share good practice, but it’s still really important for devs in the teams to understand how it all works.

We deploy as much as possible into VPCs. Why? Because it puts a network boundary around your resources and gives you one layer of protection. We rely on many layers of protection, so that if one fails, we remain safe and uncompromised. The VPC, with its associated security groups and network ACLs, is one of our most important safety nets.

When it comes to Lambda functions, by default these run in a secure, Lambda-owned VPC with access to AWS services and the Internet, not connected to any other VPCs (including the default one) in your account. However, it’s relatively simple to make a function run in a VPC of your choice (by specifying which subnets of that VPC you’d like to use), meaning your serverless functions can run in an environment where they can access all your other internal resources.

We rely heavily on infrastructure-as-code and tend to use the Serverless framework for deploying our lambda functions, so here’s an example of one running in a VPC, along the lines of the serverless.com docs (the IDs below are placeholders for your own subnets and security groups):
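```yaml
# serverless.yml (minimal sketch): attach a function to subnets in your own
# VPC. The service name, handler and all IDs here are illustrative.
service: agent-experience-example

provider:
  name: aws
  runtime: dotnet6

functions:
  hello:
    handler: CsharpHandlers::AwsDotnetCsharp.Handler::Hello
    vpc:
      securityGroupIds:
        - sg-0a1b2c3d4e5f67890      # placeholder
      subnetIds:
        - subnet-0a1b2c3d4e5f67890  # placeholder
        - subnet-1a2b3c4d5e6f78901  # placeholder
```

You can also set `vpc` at the `provider` level to apply it to every function in the service.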

It’s worth remembering that once you’ve deployed your lambda into a VPC, it will use the route tables within that VPC to make network requests. So if your lambda needs to consume services on the Internet, your VPC will need a way to route that traffic, through a NAT Gateway or some other solution. If your function depends on other AWS services, for example DynamoDB or S3, you can use a VPC Endpoint for connectivity.
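As an illustration, a gateway endpoint for S3 can be declared in the `resources` section of the same serverless.yml; this is a sketch, and the IDs are placeholders again:

```yaml
# Sketch: an S3 gateway endpoint, so in-VPC functions can reach S3 without
# a NAT Gateway. Gateway endpoints exist for S3 and DynamoDB.
resources:
  Resources:
    S3GatewayEndpoint:
      Type: AWS::EC2::VPCEndpoint
      Properties:
        VpcId: vpc-0123456789abcdef0      # placeholder
        ServiceName: com.amazonaws.eu-west-1.s3
        VpcEndpointType: Gateway
        RouteTableIds:
          - rtb-0123456789abcdef0         # placeholder
```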

Pro tip: Make sure that the subnets you’re deploying to have a decent chunk of available IPs within the block allocated to them. AWS reserves five addresses in every subnet, so a really small CIDR block (a /28, say, leaves you only 11 usable addresses) will severely restrict the concurrency of your lambda functions, and you’ll see weird EC2-type error messages in your CloudWatch logs when Lambda tries, unsuccessfully, to scale. Lambda scales on demand, but it still needs to allocate addresses within the VPC to the underlying network interfaces, so make sure you’ve got plenty.

2. Hosting APIs inside your VPC

As mentioned, we like to deploy as much as possible into our VPC. Once we’ve got our lambda functions in there (some are triggered by events such as messages posted to queues), the next step is to host our APIs.

Our preference in the team is C#: .NET Core APIs running in Lambda.
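A minimal sketch of that pattern, assuming the Amazon.Lambda.AspNetCoreServer package (the namespace is illustrative, not our real one):

```csharp
// LambdaEntryPoint.cs: hosts an ASP.NET Core API inside a Lambda function.
// APIGatewayProxyFunction translates API Gateway proxy events into normal
// HTTP requests for the ASP.NET Core pipeline.
using Amazon.Lambda.AspNetCoreServer;
using Microsoft.AspNetCore.Hosting;

namespace AgentExperience.Api   // hypothetical namespace
{
    public class LambdaEntryPoint : APIGatewayProxyFunction
    {
        protected override void Init(IWebHostBuilder builder)
        {
            // Reuse the same Startup class as a regular Kestrel-hosted API.
            builder.UseStartup<Startup>();
        }
    }
}
```

The same codebase can still run locally under Kestrel; only the entry point differs.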

API Gateway supports creating private APIs out of the box, but one particular challenge we had was associating custom domain names. For us, this was important for a couple of reasons. First, the custom domain name is an outward statement of ownership: it includes our team name in the DNS name, so others know who owns it. But more importantly, it allows us to tear down a stack and rebuild it with a predictable, consistent name.

When you create an API within a VPC, you’ll get the standard https://{restapi_id}.execute-api.{region}.amazonaws.com/{stage_name}/ format of URL which, if you resolve it, points at addresses in your VPC’s private CIDR range. This might be enough for you, but bear in mind that if you tear down the stack and rebuild it, you’ll get a fresh restapi_id, and the changed URL could break dependent apps.

Unfortunately, the custom domain names feature of API Gateway does not work out of the box with private APIs, so you need to follow a workaround.

The workaround involves creating an application load balancer within your VPC (we have three: one each for beta, staging and live). One of the great things about working at AO is that we have AWS Enterprise Support. If you’ve never experienced AWS support before, it’s first rate, truly excellent. You can raise live chat, telephone or ticket-based queries about anything, even general guidance questions. We lean on AWS Support to validate our theories when we’re unsure, and it’s great to know you have that true expertise at your fingertips.

So what was the workaround? Our final solution was close to the one in a Medium article posted by George Mao; we’ll let you read that for the details and some architecture diagrams that bring it to life. To echo George, hopefully one day this will be possible natively. But for now it gives us what we need: privately hosted APIs in API Gateway, backed by lambda, with all the associated availability and auto-scaling benefits, plus custom domain name support.
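In outline, and leaving the API Gateway side to George’s article, the load balancer piece looks something like this in CloudFormation. Every identifier, IP and domain name below is a placeholder; the target IPs are the private addresses of your execute-api VPC endpoint’s network interfaces, which you’d look up yourself:

```yaml
# Sketch only: an internal ALB forwarding a custom domain to the private
# IPs of an execute-api VPC endpoint. All values are placeholders.
ApiTargetGroup:
  Type: AWS::ElasticLoadBalancingV2::TargetGroup
  Properties:
    VpcId: vpc-0123456789abcdef0
    TargetType: ip
    Protocol: HTTPS
    Port: 443
    Targets:
      - Id: 10.0.1.10    # VPC endpoint ENI address, AZ a (placeholder)
      - Id: 10.0.2.10    # VPC endpoint ENI address, AZ b (placeholder)

HttpsListener:
  Type: AWS::ElasticLoadBalancingV2::Listener
  Properties:
    LoadBalancerArn: !Ref InternalAlb           # your existing internal ALB
    Protocol: HTTPS
    Port: 443
    Certificates:
      - CertificateArn: !Ref AcmCertificateArn  # cert for the custom domain
    DefaultActions:
      - Type: forward
        TargetGroupArn: !Ref ApiTargetGroup

PortalApiDnsAlias:
  Type: AWS::Route53::RecordSet
  Properties:
    HostedZoneId: !Ref PrivateHostedZoneId
    Name: agent-experience-api.internal.example.com.  # hypothetical domain
    Type: A
    AliasTarget:
      DNSName: !GetAtt InternalAlb.DNSName
      HostedZoneId: !GetAtt InternalAlb.CanonicalHostedZoneID
```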

Pro tip: Take a look at AWS X-Ray, which allows you to analyse and debug distributed applications in production. In the Agent Experience Team, we have created some really interesting X-Ray dashboards which show you where your bottlenecks are.

3. Hosting your React or other static-file-based front-end websites inside your VPC

You might be noticing a pattern here 🙂 Now we’ve got our lambda functions and our APIs running inside our VPC, the next requirement is to host our front-ends inside the VPC too. You might ask why: a portal-type front-end is pretty useless without the dependencies on which it relies, so what’s the risk?

As mentioned earlier, we build and rely upon many layers of security, never just one. Making a front-end publicly available can still give bad actors information that you’d probably rather they didn’t have: the locations of your APIs, and the source code which processes their responses (thus revealing the API structure). Plus, if you’re building software to give AO’s contact centres a competitive advantage in the industry, you don’t necessarily want that leaked.

We tried a few things. Static website hosting via S3 buckets with IP address restrictions didn’t support HTTPS, so we moved on. CloudFront distributions with WAF firewalls in front of them (also with IP address restrictions) were our first favourites: serverless, pay-by-usage, and built for fast and reliable content delivery. Then the pandemic hit us and working from home became the main model; maintaining allow-lists of hundreds of IPs, some of which rotated daily, soon became tiring.

Eventually, we opted for Elastic Beanstalk and have a small cluster of Amazon Linux 2 instances which host our React front-end behind an Application Load Balancer, with a Route 53 alias to the load balancer, giving us an https:// portal, strictly available only to internal users but without having to concern ourselves with IP restrictions.

We use a Bitbucket Pipelines Pipe to trigger the deployment of our application, so we can still get commit-to-deploy times in our CI/CD environment of under a couple of minutes.
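For illustration, the deploy step with the Elastic Beanstalk pipe looks roughly like this; the application and environment names are placeholders, and the pipe version may have moved on since:

```yaml
# bitbucket-pipelines.yml (sketch): zip the React build output and hand it
# to the Elastic Beanstalk deploy pipe. AWS keys come from repo variables.
pipelines:
  branches:
    main:
      - step:
          name: Build and deploy portal
          image: node:18
          script:
            - npm ci && npm run build
            - apt-get update && apt-get install -y zip
            - zip -r portal.zip build
            - pipe: atlassian/aws-elasticbeanstalk-deploy:1.0.2
              variables:
                AWS_ACCESS_KEY_ID: $AWS_ACCESS_KEY_ID
                AWS_SECRET_ACCESS_KEY: $AWS_SECRET_ACCESS_KEY
                AWS_DEFAULT_REGION: eu-west-1
                APPLICATION_NAME: agent-experience-portal  # hypothetical
                ENVIRONMENT_NAME: portal-live              # hypothetical
                ZIP_FILE: portal.zip
```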

We leverage Elastic Beanstalk’s managed platform updates so that we don’t need to worry about server patching, and health checks attached to the auto-scaling groups ensure that if an instance becomes unhealthy it is quickly terminated and replaced. All of this is largely orchestrated for you out of the box by Elastic Beanstalk, which frees our engineers to deliver value.

Finally, we can use small, cheap instance types. Because all they’re doing is serving static content, with the actual processing done by our lambda functions and APIs, we can choose smaller instance types and pay less money.

Pro tip: Check out immutable deployments in Elastic Beanstalk, which replace your instances with brand new ones on every deployment. One of the main advantages is that it forces all of your instance setup to be scripted; logging on to an instance to tweak it is no longer an option!
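If you keep your settings in source control, turning this on is a one-liner in an .ebextensions config file (the filename is arbitrary):

```yaml
# .ebextensions/deployments.config (sketch): provision fresh instances on
# every deployment and swap them in only once they pass health checks.
option_settings:
  aws:elasticbeanstalk:command:
    DeploymentPolicy: Immutable
```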

4. Using JWT authentication from Azure AD to secure your services

Our next tip is another security-related one. There are a couple of principles here: firstly, never roll your own authentication; and secondly, don’t rely on API keys for security.

API keys sometimes seem like an obvious and popular choice. The name itself offers a somewhat false sense of security: it sounds like something suitable for authentication. And API Gateway gives you an out-of-the-box way to require a key to access a service and return a 403 in the absence of one, so it feels like a natural fit for securing your API, right?

In truth, they’re good for logging, throttling or auditing usage of your APIs, but they’re unsuitable for securing them. Why? They’re usually long-lived, visible in plain text in the AWS console, hard to keep track of (who knows them?), easy to commit to source control (violating another principle: never commit secrets to source control) and troublesome to rotate without coordinating deployments around the rotation or using a secrets manager.

In Agent Experience, our users are corporate employees of AO – and our store of corporate users is Active Directory which, handily, we can access via Azure AD.

We register our portal as an app in Azure AD, then use the MSAL library to authenticate users. They use their existing corporate username/password and 2FA tokens, and we get back a JWT which lasts for an hour or two.
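As a rough sketch of that sign-in flow with the MSAL browser library (every ID and the scope name below are placeholders, not our real configuration):

```typescript
// Sketch: acquiring an Azure AD JWT with @azure/msal-browser (v3).
import { PublicClientApplication } from "@azure/msal-browser";

const msal = new PublicClientApplication({
  auth: {
    clientId: "<frontend-app-client-id>",                        // placeholder
    authority: "https://login.microsoftonline.com/<tenant-id>",  // placeholder
  },
});

// Signs the user in with their corporate credentials (plus 2FA if enforced)
// and returns an access token scoped to the backend API.
export async function getApiToken(): Promise<string> {
  await msal.initialize();
  const result = await msal.loginPopup({
    scopes: ["api://<backend-app-id>/access_as_user"], // hypothetical scope
  });
  return result.accessToken;
}
```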

Owen, a software engineer on Agent Experience, gives a little more detail on this solution, for which he was the tech lead:

We have the frontend registered in the Azure Portal, requiring users to log in to authenticate, and we have defined a scope there to allow Graph API calls.

An example of this would be making a Graph API call to retrieve the groups that our user is a member of. We then have our backend also registered within the Azure Portal, with our frontend added as an authorised client application.

We can then fetch separate JWTs from our frontend: one to authenticate access to the Graph API, and one for our custom backend.

One advantage of this is that a different registered application in the Azure Portal which another team owns would not be able to generate valid tokens for our API without explicit configuration.

If there ever was a need for another team to generate a token for our APIs, we could configure this as needed.

We’ve mentioned that we rely on many layers of security and here is another example.

In our backend APIs (already running within a VPC), we expect a bearer token to be provided in the form of this JWT. We call a Microsoft API to validate the token, which ensures it is correctly signed by a trusted issuer (Azure AD) and has the right permissions to invoke our API. If anything is wrong, we return a 403.
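For flavour, here is roughly what that validation looks like in an ASP.NET Core API using the standard JwtBearer middleware; the tenant and audience values are placeholders, and this is a sketch rather than our exact setup:

```csharp
// Startup.ConfigureServices (sketch): validate Azure AD bearer tokens.
// The middleware downloads Azure AD's signing keys from the authority's
// metadata endpoint and rejects requests whose tokens don't check out.
using Microsoft.AspNetCore.Authentication.JwtBearer;
using Microsoft.Extensions.DependencyInjection;

public void ConfigureServices(IServiceCollection services)
{
    services
        .AddAuthentication(JwtBearerDefaults.AuthenticationScheme)
        .AddJwtBearer(options =>
        {
            options.Authority =
                "https://login.microsoftonline.com/<tenant-id>/v2.0"; // placeholder
            options.TokenValidationParameters.ValidAudience =
                "api://<backend-app-id>";                             // placeholder
        });

    services.AddControllers();
}
```

Endpoints then just carry [Authorize] attributes, so a request without a valid token never reaches our handler code.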

Pro tip: When it comes to security, you’re only as strong as your weakest link. If you use MSAL to authenticate access to your frontend, but leave your backend services callable without a token, you leave yourself open.

5. Running API tests in your pipeline

Our final tip for creating internal applications in the cloud concerns how to test your APIs in a pipeline.

If you’ve deployed an API on a public-facing endpoint, this is a simpler task: whether you’re using Bitbucket Pipelines, GitHub Actions, Azure DevOps Pipelines, TeamCity or anything in between, you can probably figure out a task to invoke an API and test its response using a framework of your choice.

However, when all of your services are running in the isolated network of a VPC, you might wonder how you’re able to run these tests.

If you find yourself trying to figure out how to connect your Bitbucket Pipelines image to your VPC, perhaps by making it initialise a VPN connection as part of the image, or how to expose your API through some sort of proxy, then I recommend you stop for a moment: there has to be a better way.

The purpose of deploying your APIs into a VPC was to keep them secure – trying to open holes undermines your security.

Instead, we shifted our pipelines into AWS CodePipeline. It integrates easily with GitHub or Bitbucket if you don’t want to host your code in AWS CodeCommit.

The build and test stages run as CodeBuild projects, defined by yml buildspecs, inside your AWS account, and (guess what) a CodeBuild project can be attached to your VPC. Happy days: your tests can access all your VPC resources because they’re already on the right network.
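As a sketch in CloudFormation (every ID is a placeholder, and the service role is assumed to be defined elsewhere in the template), the crucial part is the VpcConfig block:

```yaml
# Sketch: a CodeBuild project attached to the VPC so integration tests can
# reach private APIs directly. Subnet/SG/VPC IDs are placeholders.
ApiTestProject:
  Type: AWS::CodeBuild::Project
  Properties:
    Name: agent-experience-api-tests   # hypothetical name
    ServiceRole: !GetAtt CodeBuildRole.Arn
    Source:
      Type: CODEPIPELINE
    Artifacts:
      Type: CODEPIPELINE
    Environment:
      Type: LINUX_CONTAINER
      ComputeType: BUILD_GENERAL1_SMALL
      Image: aws/codebuild/amazonlinux2-x86_64-standard:4.0
    VpcConfig:
      VpcId: vpc-0123456789abcdef0
      Subnets:
        - subnet-0123456789abcdef0
      SecurityGroupIds:
        - sg-0123456789abcdef0
```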

Pro tip: Another advantage of using AWS CodePipeline for your full pipelines is that you can make them run under an IAM execution role with the right permissions to create the resources your pipeline needs. No more extracting IAM keys and putting them in environment variables!

Thanks for reading!

We hope this has given you an insight into some ways of building robust, scalable software that leverages the advantages of the cloud in a safe, ringfenced way for your internal users.

Remember to build in multiple layers of security as you go. We treat the safety and security of our customer data as our number one priority, and by building systems and designing architectures with many different layers of security and authentication, you should never be one failure away from compromise.