Deep Dive - Codiply.com

Multi-region multi-account deployments with AWS CDK Pipelines

Panagiotis Katsaroumpas, PhD — Wed, 25 Oct 2023 17:42:29 GMT

This post describes how you can create an AWS CDK project that defines resources in multiple regions and then automatically deploy it in multiple environments (AWS accounts) with CDK pipelines.

Full code

In this post, I am including the most important snippets of code. You can find the complete example project here github.com/codiply/multi-account-multi-region-cdk-pipeline-example. This is where you can find, for example, all the Python dependencies (see the requirements file), or all the imports.

In the code in this post, you will encounter parsed configuration objects. These objects are built from configuration files in YAML format. You can find the actual .yaml files and the configuration code in the GitHub repo in the config/ directory.

Accounts

A very common setup is to have 3 environments: Development (DEV), Preproduction (PRE) and Production (PRO). In addition, a 4th account (tools account) will be used to house the CI/CD pipeline.

In the examples below, I will use the following as AWS account numbers

DEV: 111111111111
PRE: 222222222222
PRO: 333333333333
TOOL: 999999999999

Manual steps

GitHub access token

I am hosting my code on GitHub. For Code Pipelines to be able to access the code, I generate a github access token with scopes repo and admin:repo_hook. This token needs to be stored as a secret in Secrets Manager in the tools account and in the region where the CI/CD pipeline will be deployed.

Bootstrap AWS regions for CDK

Bootstrapping of AWS accounts is described here. For how to install the AWS CDK toolkit see here.

In the account/region where the pipeline is deployed (the tools account), run

cdk bootstrap aws:/// \  --profile  \  --cloudformation-execution-policies arn:aws:iam::aws:policy/AdministratorAccess

In all account/region pairs where resources are deployed by the pipeline, run

cdk bootstrap aws:/// \     --profile  \    --cloudformation-execution-policies arn:aws:iam::aws:policy/AdministratorAccess \     --trust

Composition

Before diving into CDK code, here is a diagram summarising the composition of different classes. I hope this will aid you to understand how the pieces are put together.

CDK Code

Constructs

The smallest building block is a Construct. This is a reusable component containing one or more resources, for example, a VPC. (full code)

class Vpc(Construct):    def __init__(        self,        scope: Construct,        construct_id: str,        props: VpcProps,        **kwargs: typing.Any,    ) -> None:        super().__init__(scope, construct_id, **kwargs)        vpc_name = props.naming.prefix        if props.name_suffix is not None:            vpc_name += f"-{props.name_suffix}"        vpc = ec2.Vpc(            self,            "vpc",            ip_addresses=ec2.IpAddresses.cidr(props.cidr or "10.0.0.0/16"),            max_azs=props.max_availability_zones,            nat_gateways=props.nat_gateways,        )        cdk.Tags.of(vpc).add("Name", vpc_name)

Stacks

One level higher, we create Stacks, for example, a networking stack. (full code)

class NetworkingStack(cdk.Stack):    def __init__(        self,        scope: Construct,        construct_id: str,        props: NetworkingStackProps,        **kwargs: typing.Any,    ) -> None:        super().__init__(scope, construct_id, **kwargs)        Vpc(            self,            "vpc",            VpcProps(                naming=props.naming,                cidr=props.networking_config.vpc_cidr,                max_availability_zones=props.networking_config.max_availability_zones,                nat_gateways=props.networking_config.nat_gateways,            ),        )

Stages

One level higher, we have stages. Stages can span regions. Notice that I create several copies of the networking stack, one in each region. I set the region part of the environment env=cdk.Environment(region=region) within the stage. (full code)

class ArchStage(cdk.Stage):    def __init__(        self,        scope: Construct,        construct_id: str,        props: ArchStageProps,        **kwargs: typing.Any,    ) -> None:        super().__init__(scope, construct_id, **kwargs)        config = load_arch_config(deployment_id=props.deployment_id, environment_id=props.environment_id)        for region in config.stacks.networking.regions:            NetworkingStack(                self,                f"networking-{region}",                NetworkingStackProps(naming=config.naming, networking_config=config.networking),                env=cdk.Environment(region=region),            )

Pipeline Stack

Finally, I create a pipeline stack. (full code) This stack is going to be deployed to the Tools account.

First, I create the pipeline.

pipeline = pipelines.CodePipeline(    self,    "pipeline",    pipeline_name=f"{project_name}",    synth=pipelines.ShellStep(        "Synth",        input=pipelines.CodePipelineSource.git_hub(            repo_string=config.cicd_pipeline.github_repo,            branch=config.cicd_pipeline.git_branch,            authentication=cdk.SecretValue.secrets_manager(config.cicd_pipeline.github_token_secret_key),        ),        commands=[            "npm install -g aws-cdk",            "python -m pip install -r requirements/requirements.txt",            "cdk synth",        ],    ),    cross_account_keys=True,)

This is where the GitHub repo and the main branch are defined, together with the key in Secrets Manager where the GitHub token is stored.

Then I create one stage for each environment. If you have several stages per environment, you can create a Wave and add the stages to the wave. All stages in a wave are deployed in parallel.

for environment_id, account_config in props.accounts_config.accounts.items():    if account_config.is_enabled and not account_config.is_cicd_account:        wave = pipeline.add_wave(f"wave-{environment_id}")        account_config = props.accounts_config.accounts[environment_id]        account_id = account_config.account_id        if account_config.needs_manual_approval:            wave.add_pre(                pipelines.ManualApprovalStep(                    f"approve-{environment_id}", comment=f"Approve deployment to {environment_id}"                )            )        wave.add_stage(            ArchStage(                self,                f"{project_name}-{environment_id}",                props=ArchStageProps(deployment_id=props.deployment_id, environment_id=environment_id),                env=cdk.Environment(account=account_id),            )        )

Notice that for each Stage that I am instantiating, I am setting only the account in the environment env=cdk.Environment(account=account_id). The regions are set within the stage code.

For environments PRE and PRO, I have included a manual approval step.

Deploy the pipeline

In my example, I have configured the pipeline to be deployed in eu-west-1 and I create a networking stack in 2 regions eu-west-1 and eu-west-2.

When I list the stacks, I see the following.

> cdk lsexample-cdk-pipeline-pipelinecross-region-stack-999999999999:eu-west-2example-cdk-pipeline-pipeline/example-cdk-pipeline-dev/networking-eu-west-1example-cdk-pipeline-pipeline/example-cdk-pipeline-dev/networking-eu-west-2example-cdk-pipeline-pipeline/example-cdk-pipeline-pre/networking-eu-west-1example-cdk-pipeline-pipeline/example-cdk-pipeline-pre/networking-eu-west-2example-cdk-pipeline-pipeline/example-cdk-pipeline-prod/networking-eu-west-1example-cdk-pipeline-pipeline/example-cdk-pipeline-prod/networking-eu-west-2

First, you have to make sure that you have pushed your code to the main branch of your GitHub repo. Then, deploy the pipeline stack.

cdk deploy example-cdk-pipeline-pipeline

This will create the Code Pipeline project, and it will trigger it.

From that point forward, you do not need to deploy anything manually, not even when you make changes to the pipeline stack. The Code Pipeline project is self-mutating.

While developing, you can deploy individual stacks from your development environment by passing the full name of the stack. For example:

cdk deploy example-cdk-pipeline-pipeline/example-cdk-pipeline-dev/networking-eu-west-1

When using cdk deploy, you will need to have the right credentials in your environment for the account you are deploying to. Alternatively, if you are using profiles then pass --profile .

Running the pipeline

The pipeline runs automatically when new code is merged into the main branch. You can also trigger it manually by clicking Release Change in the page of the pipeline Management Console.

After deploying to DEV, the pipeline will wait for manual approval before deploying to PRE. The same will happen before deploying to PRO environment.

Cleanup

Delete manually all Cloudformation stacks. Do not forget to do that for all AWS accounts and regions. See the output of cdk ls to see all stacks that need to be deleted.

Alternatively, you can do the following for each stack from the command line

cdk destroy --profile

Wrap up

In this post, I described how you can create a CDK project that creates resources in multiple regions, and then I deployed the project in several environments (AWS accounts) with CDK Pipelines.

Do not forget to check the full code in the GitHub repo: github.com/codiply/multi-account-multi-region-cdk-pipeline-example

String interpolation in YAML with Python

Panagiotis Katsaroumpas, PhD — Fri, 13 Oct 2023 22:00:00 GMT

String interpolation is not a feature of YAML. In this post, I will present a quick way to perform string interpolation in your configuration files written in YAML format. For that, I will use Jinja syntax to define the placeholders in key values and process the .yaml files with Python.

Goal

My goal is to

merge several .yaml configuration files into a single configuration object,
the configuration files are processed in order, and later configs could potentially overwrite the values of keys defined in previously processed configs,
use placeholders in values that reference the values of other keys in the same or different .yaml file.

Example

Let's see an example. If I have 2 .yaml files that are loaded in the following order

project:  name: project  environment: devstorage:  bucket: "{{ project.name }}-{{ project.environment }}-{{ aws.account_id }}"

project:  name: yaml-interpolationaws:  account_id: "123456789"user:  username: "codiply"  user_arn: "arn:aws:iam::{{ aws.account_id }}:user/{{ user.username }}"  storage_path: "s3://{{ storage.bucket }}/{{ user.username }}"

I want the final result to be a Python dictionary (benedict dictionary specifically) that contains the configuration of this YAML file

aws:  account_id: '123456789'project:  environment: dev  name: yaml-interpolationstorage:  bucket: yaml-interpolation-dev-123456789user:  storage_path: s3://yaml-interpolation-dev-123456789/codiply  user_arn: arn:aws:iam::123456789:user/codiply  username: codiply

The code

For the implementation, I am using benedict and Jinja2, specifically the following versions

python-benedict==0.32.1Jinja2==3.1.2

The imports are

import reimport typingfrom benedict import benedictfrom jinja2 import BaseLoader, Environment

I work with two representations:

A list of strings, each string containing the content of a YAML file. The order of this list is important when there are duplicate keys.
A merged nested dictionary with all settings combined. This will serve as the "context" for doing the string interpolation.

For loading the YAML files and merging them into a single dictionary I use benedict which already gives me the functionality for loading and merging dictionaries. The code is

def _merge_configs_to_dict(yaml_texts: typing.List[str]) -> benedict:    merged_config = benedict()    for text in yaml_texts:        config = benedict.from_yaml(text)        merged_config.merge(config, overwrite=True, concat=True)    return merged_config

Notice that contents are processed in the order they are passed in, and due to the setting overwrite=True, duplicate keys are overwritten. The setting concat=True controls the behaviour for key values that are lists. In this case, I am appending elements to the list if they exist in multiple configs, but you can choose to overwrite the whole list with the new list.

Once I have a context object loaded, I can attempt to render each one of the YAML texts with Jinja

def _render_jinja(text: str, context: benedict) -> str:    template = Environment(loader=BaseLoader(), autoescape=False).from_string(text)    return template.render(context)def _render_yaml_texts(yaml_texts: typing.List[str], context: benedict) -> typing.List[str]:    return [_render_jinja(yaml_text, context) for yaml_text in yaml_texts]

To tell if there are more placeholders left in the YAML, it is easier to work with the text representation.

def _exists_string_to_interpolate(yaml_texts: typing.List[str]) -> bool:    for text in yaml_texts:        if "{{" in text:            return True    return False

The idea is to go back and forth between the two representations (YAML text and dictionary/context) making string interpolations until there are no more interpolations to be made. If there are cyclic dependencies, the stopping condition will never be met. For that reason, I set a maximum number of iterations and I stop after the maximum number of passes. I raise an exception if at that point there are still placeholders left.

def _combine_configs_with_string_interpolation(ordered_yaml_texts: typing.List[str], max_passes: int = 8) -> benedict:    yaml_texts = ordered_yaml_texts    pass_number = 1    while pass_number <= max_passes and _exists_string_to_interpolate(yaml_texts):        context = _merge_configs_to_dict(yaml_texts)        yaml_texts = _render_yaml_texts(yaml_texts, context)        pass_number += 1    if _exists_string_to_interpolate(yaml_texts):        remaining_expressions = _find_all_remaining_placeholders(yaml_texts)        raise Exception(            f"Unable to extrapolate all strings after {max_passes} passes. "            "Check for cyclic references. "            f"Remaining expressions are {', '.join(remaining_expressions)}."        )    return _merge_configs_to_dict(yaml_texts)

For better debugging of cyclic dependencies, I find and report all placeholders that have not been replaced with a value. This function is below

def _find_all_remaining_placeholders(yaml_texts: typing.List[str]) -> typing.List[str]:    remaining = set()    for text in yaml_texts:        remaining.update(re.findall("{{.*}}", text))    return list(remaining)

To load the YAML texts, given some paths, the code is

def _load_yaml_texts(ordered_paths: typing.List[str]) -> typing.List[str]:    yaml_texts = []    for path in ordered_paths:        if os.path.isfile(path):            with open(path, "r") as file:                yaml_texts.append(file.read())    return yaml_textsdef load_config(ordered_yaml_paths: typing.List[str]) -> benedict:    yaml_texts = _load_yaml_texts(ordered_yaml_paths)    config = _combine_configs_with_string_interpolation(yaml_texts)    return config

Finally, to create the dictionary for a set of config filenames, I do

config = load_config(["/some/path/a.yaml", "/some/path/b.yaml"])

Most likely this config will be used in your Python code, so a dictionary is a good representation. Or you can pass the dictionary to a constructor of a more typed object.

If you wish to get the rendered config as a single YAML file, you can simply do

config.to_yaml()

and store the result in a file.

Limitations

There are a few limitations

You cannot reference a key that contains a list. This is not exactly a limitation, because the goal is to do string interpolation. If you are here because you need to reference a list, then you should most likely be looking into anchors and aliases that are part of the YAML specification.
Depending on how deep is the graph of references, 8 passes might not be sufficient. You can raise the number of maximum passes to a bigger number.
If you have a cyclic dependency and a high number of maximum passes, the code is going to construct very large strings.

To demonstrate the last point, with the simplest cyclic dependency, this is what happens at each step

# Originalsection:  key1: "{{ section.key2 }}-a"  key2: "{{ section.key1 }}-b"# 1st passsection:  key1: "{{ section.key1 }}-b-a"  key2: "{{ section.key2 }}-a-b"# 2st passsection:  key1: "{{ section.key1 }}-b-a-b-a"  key2: "{{ section.key2 }}-a-b-a-b"# 3rd passsection:  key1: "{{ section.key1 }}-b-a-b-a-b-a-b-a"  key2: "{{ section.key2 }}-a-b-a-b-a-b-a-b"# 4th passsection:  key1: "{{ section.key1 }}-b-a-b-a-b-a-b-a-b-a-b-a-b-a-b-a"  key2: "{{ section.key2 }}-a-b-a-b-a-b-a-b-a-b-a-b-a-b-a-b"

and the strings for these 2 values grow exponentially in size.

Create VPC with IPv6 subnets in AWS CDK

Panagiotis Katsaroumpas, PhD — Fri, 29 Sep 2023 22:00:00 GMT

This post describes how you can create a VPC with subnets that support IPv6 with AWS CDK in Python.

At the moment of writing, the VPC L2 (layer 2) construct in AWS CDK does not support IPv6 subnets, therefore I have created the VPC from scratch using L1 constructs (aka CFN Resources).

The code

First, I define a properties object that holds configuration.

import typingimport aws_cdk as cdkfrom aws_cdk import aws_ec2 as ec2from constructs import Constructfrom pydantic import BaseModelclass VpcIpv6Props(BaseModel):    vpc_name: str    vpc_ipv4_cidr_block: str    number_of_azs: int

I define a VpcIpv6 construct and, within it, I start by creating a VPC using the L1 constrcut.

class VpcIpv6(Construct):    def __init__(        self,        scope: Construct,        construct_id: str,        props: VpcIpv6Props,        **kwargs: typing.Any,    ) -> None:        super().__init__(scope, construct_id, **kwargs)        vpc = ec2.CfnVPC(            self,            "vpc",            cidr_block=props.vpc_ipv4_cidr_block,            enable_dns_support=True,            enable_dns_hostnames=True,            tags=[cdk.CfnTag(key="Name", value=props.vpc_name)],        )        self.vpc = vpc

What follows, is code that is defined within __init__() function within VpcIpv6.

I associate an IPv6 CIDR range to the VPC.

ec2.CfnVPCCidrBlock(self, "ipv6cidr", vpc_id=vpc.attr_vpc_id, amazon_provided_ipv6_cidr_block=True)

I create an Internet Gateway, and I attach it to the VPC.

internet_gateway = ec2.CfnInternetGateway(    self, "igw", tags=[cdk.CfnTag(key="Name", value=f"{props.vpc_name}-igw")])ec2.CfnVPCGatewayAttachment(    self,    "igw-attachment",    vpc_id=vpc.attr_vpc_id,    internet_gateway_id=internet_gateway.attr_internet_gateway_id,)

I create an Egress-Only Internet Gateway and attach it to the VPC. This is the equivalent of a NAT Gateway in IPv4, and the good news is that the Egress-Only Internet Gateway has no fixed hourly cost like NAT Gateway.

egress_only_internet_gateway = ec2.CfnEgressOnlyInternetGateway(self, "egress-only-igw", vpc_id=vpc.attr_vpc_id)

I create a Route Table for the public subnets. I create a default route to the Internet Gateway for IPv4 and IPv6.

public_subnet_route_table = ec2.CfnRouteTable(    self,    "public-subnet-route-table",    vpc_id=vpc.attr_vpc_id,    tags=[cdk.CfnTag(key="Name", value=f"{props.vpc_name}-public")],)ec2.CfnRoute(    self,    "public-subnet-default-route-ipv4",    destination_cidr_block="0.0.0.0/0",    route_table_id=public_subnet_route_table.attr_route_table_id,    gateway_id=internet_gateway.attr_internet_gateway_id,)ec2.CfnRoute(    self,    "public-subnet-default-route-ipv6",    destination_ipv6_cidr_block="::/0",    route_table_id=public_subnet_route_table.attr_route_table_id,    gateway_id=internet_gateway.attr_internet_gateway_id,)

I create a Route Table for private subnets. I create a default route for only IPv6. The private subnet in this example is a IPv6-only subnet.

private_subnet_route_table = ec2.CfnRouteTable(    self,    "private-subnet-route-table",    vpc_id=vpc.attr_vpc_id,    tags=[cdk.CfnTag(key="Name", value=f"{props.vpc_name}-private")],)ec2.CfnRoute(    self,    "private-subnet-default-route-ipv6",    destination_ipv6_cidr_block="::/0",    route_table_id=private_subnet_route_table.attr_route_table_id,    egress_only_internet_gateway_id=egress_only_internet_gateway.attr_id,)

I get the list of availability zones. I slice the VPC CIDR range in smaller ranges to be allocated to subnets. Note that for IPv6 subnets must be allocated a /64 range.

all_available_azs = cdk.Fn.get_azs()vpc_ipv6_cidr_block = cdk.Fn.select(0, vpc.attr_ipv6_cidr_blocks)ipv6_cidr_blocks = cdk.Fn.cidr(vpc_ipv6_cidr_block, 2**8, "64")ipv4_cidr_blocks = cdk.Fn.cidr(props.vpc_ipv4_cidr_block, 2**4, "12")

Finally, I loop through the availability zones and I create a public and a private subnet in each. Public subnets are dual-stack, they are given both IPv4 and IPv6 CIDR blocks. Private subnets are IPv6-only. I associate each subnet with the corresponding route table.

for az_index in range(props.number_of_azs):    az_no = az_index + 1    public_subnet = ec2.CfnSubnet(        self,        f"public-subnet-{az_no}",        vpc_id=vpc.attr_vpc_id,        cidr_block=cdk.Fn.select(2 * az_index, ipv4_cidr_blocks),        ipv6_cidr_block=cdk.Fn.select(2 * az_index, ipv6_cidr_blocks),        availability_zone=cdk.Fn.select(az_index, all_available_azs),        map_public_ip_on_launch=True,        assign_ipv6_address_on_creation=True,        tags=[cdk.CfnTag(key="Name", value=f"{props.vpc_name}-public-{az_no}")]    )    ec2.CfnSubnetRouteTableAssociation(        self,        f"public-subnet-{az_no}-route-table-association",        route_table_id=public_subnet_route_table.attr_route_table_id,        subnet_id=public_subnet.attr_subnet_id,    )    private_subnet = ec2.CfnSubnet(        self,        f"private-subnet-{az_no}",        vpc_id=vpc.attr_vpc_id,        ipv6_cidr_block=cdk.Fn.select(2 * az_index + 1, ipv6_cidr_blocks),        availability_zone=cdk.Fn.select(az_index, all_available_azs),        ipv6_native=True,        tags=[cdk.CfnTag(key="Name", value=f"{props.vpc_name}-private-{az_no}")]    )    ec2.CfnSubnetRouteTableAssociation(        self,        f"private-subnet-{az_no}-route-table-association",        route_table_id=private_subnet_route_table.attr_route_table_id,        subnet_id=private_subnet.attr_subnet_id,    )

This is how you instantiate the construct

VpcIpv6(    self,    "vpc",    VpcIpv6Props(        vpc_name="my-ipv6-vpc",        vpc_ipv4_cidr_block="10.0.0.0/16",        number_of_azs=3,    ),)

Connectivity tests

After you have included the above construct in your CDK project and have deployed it, it is time to test IPv6 connectivity from public and private subnets.

I create 2 EC2 instances of type t3 (you need the latest generation), one in the public subnet (with a public IP address for both IPv4 and IPv6) and one in the private subnet. I am using the same key pair for both instances.

First I add the SSH key to the authentication agent

ssh-add ~/path/to/your/key.pem

Then I remote to the public EC2 instance (make sure the Security Group allows SSH access). I do not have IPv6 connectivity at home, that's why I am using the IPv4 public IP address to access the public instance. The -A is important so that you can jump from the public instance onto the private instance.

ssh -A ec2-user@

As a first test, I check that the public EC2 instance can access Google over IPv6.

wget http://ipv6.google.com

Then I copy the IPv6 address of the private EC2 instance, and on the same terminal, I jump from the public EC2 instance to the private EC2 instance via IPv6.

ssh ec2-user@

I repeat the test

wget http://ipv6.google.com

Congratulations! You have created public and private subnets with IPv6 support.

Cloudformation Resource Creation and Deletion Order

Panagiotis Katsaroumpas, PhD — Wed, 28 Dec 2022 08:20:39 GMT

In this post, I will try to test experimentally and understand the order in which CloudFormation resources are created, updated and deleted. I will explore 3 different approaches, with and without CloudFormation resource dependencies, and finally nested stacks with dependencies.

Scenarios

These are the 3 scenarios:

Without dependencies between CloudFormation resources,
with dependencies between CloudFormation resources using the DependsOn attribute (or the equivalent syntax in CDK), and finally
by placing resources in nested stacks and enforcing dependencies at the nested stack level.

Experimental setup

The complete code used in this experiment can be found in this AWS CDK project on github.

I am using CloudFormation custom resources that allow me to run custom code and record the exact time a resource was created/updated/deleted. I record this information in an Amazon Timestream database.

The project contains a core stack, that defines the Timestream database and the custom resource provider. The lambda function of the custom resource provider handles the create/update/delete events for the custom resources and records the following information in the database

Time of event
The name of the resource
The version of the resource (this is an attribute that I change to trigger an update)
The type of the operation (create, update, delete)
The approach, i.e. one of the 3 scenarios mentioned above

This is a sample of records in the database

In the CDK project, apart from the core stack, I create one stack for every approach. Each stack creates a configurable amount of custom resources, this number is set to 10.

These are all the stacks created by the project. If you deployed my code and you see many nested stacks with long names, flip the switch "View nested" to off to see only non-nested stacks.

The experiment has 3 phases

Deploy all stacks: trigger the creation of resources
Increment the version number and deploy again: trigger the update of resources
Delete the stacks from the CloudFormation console: trigger the deletion of resources

In the last step, I do not delete the core stack yet. The core stack contains the Database with the collected data.

The results

It is time for the data to speak!

Scenario 1: Without dependencies

In this scenario, we do not enforce any dependencies between the resources.

The resources are created in no particular order, they are created in parallel.

I increment the version of the resources, which triggers an update. The updates happen in no particular order.

Finally, I delete the specific stack and the resources are deleted in parallel in no particular order.

Scenario 2: With dependencies

In this scenario, we use the DependsOn attribute

Resources:  Resource2:    ...  Resource1:    ...    DependsOn: Resource2

or in CDK the equivalent is

resource1.node.add_dependency(resource2)

Specifically, in our example, we create a chain of dependencies where resource-(n+1) depends on resource-n .

As we see in the data, the resources are created one by one in order from resource-1 to resource-10.

The update is done in the same order from resource-1 to resource-10

while the deletion is performed in reverse order, i.e. starting with the last resource created resource-10 and finishing with resource-1 .

Scenario 3: Nested stacks with dependencies

In this scenario, I place the resources within nested stacks. Nested stacks would usually contain several resources, but in this example, I only place a single resource per nested stack.

I create a similar chain of dependencies with scenario 2, but this time at the nested stack level. In CloudFormation this is done with the same DependsOn attribute (a nested stack is just another resource in the parent stack).

In CDK, this is done like this

nested_stack1.add_dependency(nested_stack2)

See here for the complete code of stack, including how to create nested stack with CDK.

Nested stacks are useful if your main CloudFormation stack is hitting one of the limits, e.g. size of the template or the number of resources per template.

Similarly to scenario 2, the resources are created in order from resource-1 to resource-10

Update of resources happens in the same order they have been created

and deletion happens in the reverse order, i.e. starting with the last resource created resource-10.

Summary of results

In this post, I tested experimentally the order of creation/update/deletion of resources in CloudFormation.

When no dependencies are defined between resources, operations happen in parallel in no specific order.

Next, I defined dependencies using the DependsOn attribute with 2 different approaches: dependencies at resource-level, or dependencies between nested stacks containing the resources.

When A <- B <- C <- D (B depends on A, C depends on B , ...), then in both approaches

Resources are created in order A, B, C, D
Resources are updated in order A, B, C, D
Resources are deleted in order D, C, B, A

CloudWatch RUM App Monitor with AWS CDK

Panagiotis Katsaroumpas, PhD — Sun, 05 Jun 2022 12:35:01 GMT

CloudWatch RUM allows you to monitor your web application and analyze user sessions in near real-time. In this short post, I will describe how you can automate the creation of a RUM App Monitor with AWS CDK.

CDK Code

I am using CDK V2, and the imports you will need are

import * as cdk from 'aws-cdk-lib';import * as rum from 'aws-cdk-lib/aws-rum';import * as cognito from 'aws-cdk-lib/aws-cognito';import * as iam from 'aws-cdk-lib/aws-iam';

I am only considering the case where an application has just anonymous users. First, I create a Cognito Identity Pool and enable unauthenticated identites.

const applicationName = `example.com`;const cwRumIdentityPool = new cognito.CfnIdentityPool(this, 'cw-rum-identity-pool', {  allowUnauthenticatedIdentities: true,});

I create an IAM role that can be assumed by Cognito for unauthenticated users.

const cwRumUnauthenticatedRole = new iam.Role(this, 'cw-rum-unauthenticated-role', {  assumedBy: new iam.FederatedPrincipal(    'cognito-identity.amazonaws.com',     {      "StringEquals": {        "cognito-identity.amazonaws.com:aud": cwRumIdentityPool.ref      },      "ForAnyValue:StringLike": {        "cognito-identity.amazonaws.com:amr": "unauthenticated"      }    },    "sts:AssumeRoleWithWebIdentity"  )});

I give permission to the role to rum:PutRumEvents for the specific App Monitor that I will create in a later step.

cwRumUnauthenticatedRole.addToPolicy(new iam.PolicyStatement({  effect: iam.Effect.ALLOW,  actions: [    "rum:PutRumEvents"  ],  resources: [    `arn:aws:rum:${cdk.Aws.REGION}:${cdk.Aws.ACCOUNT_ID}:appmonitor/${applicationName}`  ]}));

I attach the role to the Identity Pool for unauthenticated users.

const cwRumIdentityPoolRoleAttachment = new cognito.CfnIdentityPoolRoleAttachment(this,   'cw-rum-identity-pool-role-attachment',   {    identityPoolId: cwRumIdentityPool.ref,    roles: {      "unauthenticated": cwRumUnauthenticatedRole.roleArn    }  });

Finally, I create the App Monitor and pass in the role as the guest role.

const cwRumAppMonitor = new rum.CfnAppMonitor(this, 'cw-rum-app-monitor', {  domain: domainName,  name: applicationName,  appMonitorConfiguration: {    allowCookies: true,    enableXRay: false,    sessionSampleRate: 1,    telemetries: ['errors', 'performance', 'http'],    identityPoolId: cwRumIdentityPool.ref,    guestRoleArn: cwRumUnauthenticatedRole.roleArn  },  cwLogEnabled: true,});

Instal CloudWatch RUM web client

To start collecting data you must install the CloudWatch RUM web client in your application.

Log into the Console, navigate to CloudWatch and then RUM under Application Monitoring. Find your App Monitor and follow the installation instructions under Configuration tab.

Enable DNSSEC signing in Amazon Route 53 using AWS CDK

Panagiotis Katsaroumpas, PhD — Sat, 30 Apr 2022 06:32:29 GMT

DNSSEC (Domain Security Extensions) adds security features to the DNS protocol so that DNS resolvers can verify that the data came from the specific zone and validate that it has not been tampered with in transit. In this post I will explain how to enable DNSSEC signing in Amazon Route 53, not via the Console, but using AWS CDK.

Preparation

Consider the preparation steps mentioned under Step 1 in this developer guide.

Customer Managed Key

First, you will need to create a stack that creates a Customer Managed Key in AWS KMS (Key Management Service). It is important that this stack and the key is created in N.Virginia region (us-east-1).

These are the imports needed (this is for CDK V1, make the necessary changes for CDK V2)

import * as cdk from '@aws-cdk/core';import * as iam from '@aws-cdk/aws-iam';import * as kms from '@aws-cdk/aws-kms';

Create the key with the right cryptographic configuration (ECC_NIST_P256) and the correct cryptographic operations for which the key can be used (SIGN_VERIFY).

const dnssecKeyAlias = 'example-com-dnssec-key';const dnssecKey = new kms.Key(this, 'dnssec-key', {  enableKeyRotation: false,  removalPolicy: cdk.RemovalPolicy.DESTROY,  alias: dnssecKeyAlias,  keySpec: kms.KeySpec.ECC_NIST_P256,  keyUsage: kms.KeyUsage.SIGN_VERIFY,});

Give Route 53 DNSSEC Service the necessary permissions in order to use the key.

dnssecKey.addToResourcePolicy(new iam.PolicyStatement({  sid: "Allow Route 53 DNSSEC Service",  effect: iam.Effect.ALLOW,  principals: [    new iam.ServicePrincipal("dnssec-route53.amazonaws.com")  ],  actions: [    "kms:DescribeKey",    "kms:GetPublicKey",    "kms:Sign"  ],  resources: ["*"],  conditions: {    "StringEquals": {      "aws:SourceAccount": cdk.Aws.ACCOUNT_ID    }  }}));dnssecKey.addToResourcePolicy(new iam.PolicyStatement({  sid: "Allow Route 53 DNSSEC to CreateGrant",  effect: iam.Effect.ALLOW,  principals: [    new iam.ServicePrincipal("dnssec-route53.amazonaws.com")  ],  actions: [    "kms:CreateGrant"  ],  resources: ["*"],  conditions: {    "StringEquals": {      "aws:SourceAccount": cdk.Aws.ACCOUNT_ID    },    "Bool": {      "kms:GrantIsForAWSResource": true    }  }}));

Enable DNSSEC signing

Now we go back to the stack where you have defined your hosted zone. This can be the same or a different stack, and it does not need to be deployed in N.Virginia. The imports needed are

import * as cdk from '@aws-cdk/core';import * as route53 from '@aws-cdk/aws-route53';

In my case, the two stacks are in two different regions, so I have hard-coded the key alias. You can use other ways of sharing the key ARN between stacks, for example, if they are in the same region you can export and then import the ARN.

const dnssecKeyAlias = 'example-com-dnssec-key';const zone = new route53.PublicHostedZone(this, 'zone-example-com', {  zoneName: 'example.com'});

Create a Key Signing Key (KSK)

const keySigningKey = new route53.CfnKeySigningKey(this, 'route-53-key-signing-key', {  hostedZoneId: zone.hostedZoneId,  keyManagementServiceArn: `arn:aws:kms:us-east-1:${cdk.Aws.ACCOUNT_ID}:alias/${dnssecKeyAlias}`,  name: 'ExampleComKeySigningKey',  status: 'ACTIVE',});

and then associate the KSK with the hosted zone.

const dnssec = new route53.CfnDNSSEC(this, 'zone-example-com-dnssec', {  hostedZoneId: zone.hostedZoneId});dnssec.node.addDependency(keySigningKey);

Verify in the console that DNSSEC Signing has been enabled for your hosted zone.

Establish a chain of trust

This is the only manual step and will vary depending on your domain registrar or whether you own the parent domain.

In the console, in the screen shown above, click View Information to create DS record and follow the instructions to Establish a chain of trust. On this page, you can find the public key, the key type (flags field), the signing algorithm and the DS record.

If Route 53 is your registrar, in a new tab go to Registered Domains, open your domain page, and under DNSSEC status click Manage Keys. Select the correct algorithm (shown on the information page) and copy and paste your public key.

If you own the parent domain, then you should create a DS (Delegation Signer) record in the parent zone that you control.

const parentZone = route53.HostedZone.fromLookup(this, 'parent-zone', {  domainName: 'example.com'});new route53.DsRecord(this, 'my-subdomain-delegation-signer', {  zone: parentZone,  recordName: 'my-subdomain',  values: [    ''  ],  ttl: cdk.Duration.seconds(3600)});

Verify that DNSSEC is enabled

Finally, you can verify that DNSSEC is enabled with the command line tool dig.

Find your name servers

dig codiply.com. NS

and then query one of them with +dnssec (prepend @ before the name server address)

dig codiply.com.  +dnssec @ns-477.awsdns-59.com.

If DNSSEC is enabled, in the first few lines of the response you should see ad among the flags listed.

;; flags: qr aa rd ad;

Congratulations, DNSSEC is enabled for your domain!

Zip Archive for key prefix with S3 Object Lambda

Panagiotis Katsaroumpas, PhD — Sat, 23 Oct 2021 22:13:58 GMT

S3 Object Lambda allows you to run code when an object is requested from S3. You can return a transformed version of the actual file stored in the S3 bucket, or you can even return objects that do not exist in S3 and are dynamically created at request time. In this post I show how you can create a zip archive containing all files under a specific key prefix.

The code

This github repository contains a CDK project with an example stack you can deploy into your own account.

Implementation

At a high level, you will need the following

An S3 bucket
A standard S3 Access Point
An execution role for the Lambda function
The Lambda Function
An Object Lambda Access Point that will be using the standard S3 Access Point as the Supporting Access Point

I create the standard access point and name it deepdive-zip-archive-standard-access-point

The role of the Lambda function will need to have the permissions to do s3-object-lambda:WriteGetObjectResponse so that it can write the response. Normally, that's all you need, because the function will receive an event with a field getObjectContext containing an inputS3Url that has embedded credentials to read the underlying object. However, in this case, we want to list the objects with the specific key prefix and read possibly more than one object. For that reason, we give the Lambda function read-only access to the bucket (via the supporting access point).

So the Lambda execution role will need the following policy (plus the AWSLambdaBasicExecutionRole managed policy).

{    "Version": "2012-10-17",    "Statement": [        {            "Action": "s3-object-lambda:WriteGetObjectResponse",            "Resource": "*",            "Effect": "Allow"        },        {            "Action": [                "s3:List*",                "s3:Get*"            ],            "Resource": [                "arn:aws:s3:::accesspoint/deepdive-zip-archive-standard-access-point",                "arn:aws:s3:::accesspoint/deepdive-zip-archive-standard-access-point/object/*"            ],            "Effect": "Allow"        }    ]}

This is the Lambda function code itself:

import osimport boto3import zipfilefrom io import BytesIOfrom urllib.parse import urlparseACCOUNT_ID = os.environ['ACCOUNT_ID']ACCESS_POINT_ALIAS = os.environ['ACCESS_POINT_ALIAS']s3_client = boto3.client('s3')s3_resource = boto3.resource('s3')s3_paginator = s3_client.get_paginator('list_objects')def main(event, context):    object_get_context = event["getObjectContext"]    print(object_get_context)    request_route = object_get_context["outputRoute"]    request_token = object_get_context["outputToken"]    s3_url = object_get_context["inputS3Url"]    prefix = urlparse(s3_url).path[1:]    in_memory_zip = BytesIO()    with zipfile.ZipFile(in_memory_zip, mode='w', compression=zipfile.ZIP_DEFLATED) as zip:        page_iterator = s3_paginator.paginate(Bucket=ACCESS_POINT_ALIAS, Prefix=prefix)        for page in page_iterator:            if 'Contents' in page:                for entry in page['Contents']:                    key = entry['Key']                    body = s3_resource.Object(ACCESS_POINT_ALIAS, key).get()['Body'].read()                    zip.writestr(key, body)    s3_client.write_get_object_response(        Body=in_memory_zip.getvalue(),        RequestRoute=request_route,        RequestToken=request_token)    return {'status_code': 200}

I am extracting the key of the requested object from inputS3Url. This key is used as the key prefix. I list all objects starting with the given prefix, and then read them one by one adding them to a zip archive in memory. At the end, I write the zip file at the output route using the output token (both provided in the event passed into the lambda).

Once I have created the standard access point and the lambda function, I can create the S3 Object Lambda Access Point

Testing it

I have populated the underlying S3 bucket with some files within different prefixes.

Now, it is just a matter of requesting a prefix as if requesting the specific key from S3. I am using the AWS CLI for this, and to make it work, I need to use the full ARN of the Object Lambda Access Point as the bucket parameter.

aws s3api get-object --key 2020 --bucket arn:aws:s3-object-lambda:::accesspoint/deepdive-zip-archive-object-lambda 2020.zip

In this case, I request the key 2020 that does not exist. The lambda function zips all objects with key starting with that prefix and returns the archive.

Limitations

With the specific implementation, because I am creating the zip file in memory, I am limited by the 10GB memory limit of Lambda.

Conclusion

With S3 Object Lambda we can dynamically create S3 objects at request time and this allows us to create, for example, a ZIP archive with all objects under a specific key prefix.

Amazon Neptune Jupyter Notebooks with persistence via EFS

Panagiotis Katsaroumpas, PhD — Sun, 19 Sep 2021 16:53:52 GMT

Neptune Notebooks allow you to easily populate and query your Amazon Neptune graph database in an interactive way using Jupyter Notebooks. This post describes how to set up a Notebook Instance layer and a persistence layer with EFS in AWS CDK. This allows you to delete and recreate the Notebook Instance while preserving your notebooks in EFS. Moreover, you can share the file system across many Notebook Instances.

CDK project

I present here only the relevant constructs. This code is part of a CDK project written in TypeScript that can be found in this github repository. If you found this article because you are implementing this architecture but you are working with CloudFormation, the code should still be relevant, it is quite readable and easy to translate to CloudFormation.

The two constructs below will need to be part of 2 different layers/stacks, as these stacks will have different lifecycles.

The Persistence Layer contains your EFS file system, and will only be deployed once and not deleted
The Notebook Instance Layer can be created for only the time you use the notebook instance and then deleted. The beauty of infrastructure as code is that you can recreate it with a single command. You can also create several instances if you wish, as EFS can be attached to multiple instances and act as a shared storage.

Notebook Instance Persistence Construct

This is the construct that creates the persistence layer. Some important points:

You will need to have a VPC already. The Elastic File System will be launched within that VPC.
Access is controlled via Security Groups, I am creating one for the EFS file system and one that can be used by clients connecting to the file system.
I store the client security group and the file system ID in public properties of the construct so that they can be injected into the notebook instance construct later on. See the github repo for more details of how this is done.

import * as cdk from '@aws-cdk/core';import * as ec2 from '@aws-cdk/aws-ec2';import * as efs from '@aws-cdk/aws-efs';import { DeploymentConfig } from '../config/deployment-config';import { Constants } from '../constants/constants';export interface NeptuneNotebookPersistenceProps {  readonly deployment: DeploymentConfig;  readonly vpc: ec2.Vpc;  readonly encrypted: boolean;  readonly enableAutomaticBackups: boolean;}export class NeptuneNotebookPersistence extends cdk.Construct {  public efsClientSecurityGroup: ec2.SecurityGroup;  public efsFileSystemId: string;  constructor(scope: cdk.Construct, id: string, props: NeptuneNotebookPersistenceProps) {    super(scope, id);    const efsClientSecurityGroup = new ec2.SecurityGroup(this, 'efs-client-sg', {      vpc: props.vpc,      securityGroupName: `${props.deployment.Prefix}-neptune-notebook-efs-client`,      description: `Security group for Neptune Notebook EFS clients for project ${props.deployment.Project} in ${props.deployment.Environment}`,    });    const efsSecurityGroup = new ec2.SecurityGroup(this, 'efs-sg', {      vpc: props.vpc,      securityGroupName: `${props.deployment.Prefix}-neptune-notebook-efs`,      description: `Security group for Neptune Notebook EFS for project ${props.deployment.Project} in ${props.deployment.Environment}`,    });    efsSecurityGroup.addIngressRule(      efsClientSecurityGroup,       ec2.Port.tcp(Constants.EFS_PORT),      'EFS port');    const fileSystem = new efs.FileSystem(this, 'file-system', {      fileSystemName: `${props.deployment.Prefix}-neptune-notebook-efs`,      vpc: props.vpc,      vpcSubnets: props.vpc.selectSubnets({        subnetType: ec2.SubnetType.PRIVATE      }),      securityGroup: efsSecurityGroup,      performanceMode: efs.PerformanceMode.GENERAL_PURPOSE,      encrypted: props.encrypted,      enableAutomaticBackups: props.enableAutomaticBackups,      removalPolicy: cdk.RemovalPolicy.DESTROY    });    this.efsClientSecurityGroup = efsClientSecurityGroup;    this.efsFileSystemId = fileSystem.fileSystemId  }}

I have extracted some constants in the following class

export class Constants {  static get NEPTUNE_PORT(): number {    return 8182;  }  static get EFS_PORT(): number {    return 2049;  }}

while the DeploymentConfig interface contains several properties, but only one is relevant here and this is the project name that I use as the prefix for all resources.

export interface DeploymentConfig{    readonly Project: string;}

This allows me to create several stacks side by side without naming conflicts. This is explained in more detail in this post.

Notebook Instance Construct

This is the construct that creates the Neptune Notebook Instance. Some important points here:

The Database Cluster is created in the same project and is injected in the props of the construct together with other information
A Neptune notebook instance is actually a SageMaker notebook instance, but there are some naming conventions that makes it appear in the Neptune UI.
The name of the notebook instance needs to start with aws-neptune-.
The notebook instances need to be tagged with tags aws-neptune-cluster-id and aws-neptune-resource-id.
I am using a notebook instance lifecycle configuration script to mount the EFS volume (see below for the script if you are looking for just this script).
The notebook instance needs the right permissions to fetch AWS-provided notebooks from S3, connect to your DB cluster and write logs to CloudWatch.

import * as cdk from '@aws-cdk/core';import * as ec2 from '@aws-cdk/aws-ec2';import * as iam from '@aws-cdk/aws-iam';import * as neptune from '@aws-cdk/aws-neptune';import * as sagemaker from '@aws-cdk/aws-sagemaker';import { DeploymentConfig } from '../config/deployment-config';import { Constants } from '../constants/constants';import { ServicePrincipals } from '../constants/service-principals';import { NeptuneNotebookConfig } from '../config/sections/neptune-notebook';export interface NeptuneNotebookProps {  readonly deployment: DeploymentConfig;  readonly neptuneNotebookConfig: NeptuneNotebookConfig;  readonly vpc: ec2.Vpc;  readonly neptuneCluster: neptune.DatabaseCluster;  readonly databaseClientSecurityGroup: ec2.SecurityGroup;  readonly efsClientSecurityGroup: ec2.SecurityGroup;  readonly efsFileSystemId: string;}export class NeptuneNotebook extends cdk.Construct {  private readonly props: NeptuneNotebookProps;  constructor(scope: cdk.Construct, id: string, props: NeptuneNotebookProps) {    super(scope, id);    this.props = props;    const notebookRole = this.defineNotebookRole();    const lifecycleConfigName = `${this.props.deployment.Prefix}-notebook-instance-lifecycle-config`;    this.defineNotebookInstanceLifecycleConfig(lifecycleConfigName);    this.defineNotebookInstance(notebookRole, lifecycleConfigName);  }  private defineNotebookRole(): iam.Role {    const role = new iam.Role(this, 'notebook-role', {      roleName: `${this.props.deployment.Prefix}-neptune-notebook-role`,      assumedBy: new iam.ServicePrincipal(ServicePrincipals.SAGEMAKER)    });    role.addToPolicy(new iam.PolicyStatement({      effect: iam.Effect.ALLOW,      actions: [        's3:GetObject',        's3:ListBucket'      ],      resources: [        'arn:aws:s3:::aws-neptune-notebook',        'arn:aws:s3:::aws-neptune-notebook/*'      ]    }));    role.addToPolicy(new iam.PolicyStatement({      effect: iam.Effect.ALLOW,      actions: ['neptune-db:connect'],      resources: [`arn:aws:neptune-db:${cdk.Aws.REGION}:${cdk.Aws.ACCOUNT_ID}:${this.props.neptuneCluster.clusterResourceIdentifier}/*`]    }));    role.addToPolicy(new iam.PolicyStatement({      effect: iam.Effect.ALLOW,      actions: [        'logs:CreateLogDelivery',        'logs:CreateLogGroup',        'logs:CreateLogStream',        'logs:DeleteLogDelivery',        'logs:Describe*',        'logs:GetLogDelivery',        'logs:GetLogEvents',        'logs:ListLogDeliveries',        'logs:PutLogEvents',        'logs:PutResourcePolicy',        'logs:UpdateLogDelivery'      ],      resources: [`arn:aws:logs:${cdk.Aws.REGION}:${cdk.Aws.ACCOUNT_ID}:log-group:/aws/sagemaker/NotebookInstances:*`]    }));    return role;  }  private defineNotebookInstanceLifecycleConfig(name: string): sagemaker.CfnNotebookInstanceLifecycleConfig {    const persistentPath = `/home/ec2-user/SageMaker/${this.props.neptuneNotebookConfig.PersistentDirectory}`;    const efsDns = `${this.props.efsFileSystemId}.efs.${cdk.Aws.REGION}.amazonaws.com`    const lifecycleConfig = new sagemaker.CfnNotebookInstanceLifecycleConfig(this, 'notebook-instance-lifecycle-config', {      notebookInstanceLifecycleConfigName: `${this.props.deployment.Prefix}-notebook-instance-lifecycle-config`,      onCreate: [{        content: cdk.Fn.base64(`#!/bin/bashset -emkdir ${persistentPath}`)      }],      onStart: [{        content: cdk.Fn.base64(`#!/bin/bashset -esudo -u ec2-user -i <<'EOF'echo "export GRAPH_NOTEBOOK_AUTH_MODE=DEFAULT" >> ~/.bashrcecho "export GRAPH_NOTEBOOK_HOST=${this.props.neptuneCluster.clusterEndpoint.hostname}" >> ~/.bashrcecho "export GRAPH_NOTEBOOK_PORT=${Constants.NEPTUNE_PORT}" >> ~/.bashrcecho "export NEPTUNE_LOAD_FROM_S3_ROLE_ARN=''" >> ~/.bashrcecho "export AWS_REGION=${cdk.Aws.REGION}" >> ~/.bashrcaws s3 cp s3://aws-neptune-notebook/graph_notebook.tar.gz /tmp/graph_notebook.tar.gzrm -rf /tmp/graph_notebooktar -zxvf /tmp/graph_notebook.tar.gz -C /tmp/tmp/graph_notebook/install.shEOFmount -t nfs -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=120,retrans=2 ${efsDns}:/ ${persistentPath}chmod go+rw ${persistentPath}`)      }]    });    return lifecycleConfig  }  private defineNotebookInstance(    role: iam.Role,     lifecycleConfigName: string): sagemaker.CfnNotebookInstance {      const notebookInstance = new sagemaker.CfnNotebookInstance(this, 'notebook-instance', {        // Name has to start with 'aws-neptune-'        notebookInstanceName: `aws-neptune-${this.props.deployment.Prefix}-neptune-notebook-instance`,        instanceType: this.props.neptuneNotebookConfig.InstanceType,        roleArn: role.roleArn,        lifecycleConfigName: lifecycleConfigName,        rootAccess: 'Enabled',        subnetId: this.props.vpc.privateSubnets[0].subnetId,        securityGroupIds:[          this.props.databaseClientSecurityGroup.securityGroupId,          this.props.efsClientSecurityGroup.securityGroupId        ],        tags: [          new cdk.Tag('aws-neptune-cluster-id', this.props.neptuneCluster.clusterIdentifier),          new cdk.Tag('aws-neptune-resource-id', this.props.neptuneCluster.clusterResourceIdentifier)        ]      });      return notebookInstance;  }}

The relevant Service Principal is the following

export class ServicePrincipals {  static get SAGEMAKER(): string {    return 'sagemaker.amazonaws.com';  }}

Conclusion

Practising infrastructure as code and using AWS CDK, we have created an EFS file system and mounted it to a SageMaker Notebook Instance configured in a way that will make it a Neptune Notebook Instance. The EFS file system acts as persistence so that stored notebooks will be preserved when the notebook instance is deleted and recreated. It also allows to create a shared file system for several notebook instances to access the notebooks stored in it.

Spark Job on Serverless Kubernetes Cluster with Fargate

Panagiotis Katsaroumpas, PhD — Wed, 11 Aug 2021 19:56:55 GMT

This post describes how to run Spark applications on a serverless Amazon EKS (Elastic Kubernetes Service) cluster with AWS Fargate.

It might be useful for understanding

Infrastructure as Code and how to create an EKS cluster with the AWS CDK in TypeScript
How to configure a cluster with serverless compute capacity provided by Fargate
How to install and configure the Spark Operator on EKS
How to deploy a Spark job and configure permissions to access an S3 bucket

GitHub Repo

The full project can be found in this GitHub repo. In the code below, you might see details that are specific to how I have structured and configured the project. The full repo might contain the answer to something that doesn't look right.

Creating the Cluster

Creating an EKS cluster with Fargate is as simple as

const cluster = new eks.FargateCluster(this, 'eks-cluster', {  vpc: props.vpc,  version: eks.KubernetesVersion.V1_21,  clusterName: `${props.deployment.Prefix}-cluster`,  endpointAccess: eks.EndpointAccess.PUBLIC_AND_PRIVATE.onlyFrom(...props.deployment.AllowedIpRanges)});props.deployment.AdminUserArns.forEach(userArn => {  const user = iam.User.fromUserArn(this, userArn, userArn);  cluster.awsAuth.addUserMapping(user, { groups: [ 'system:masters' ]});});

In the configuration, I have included

Allowed CIDR ranges to access the endpoint
A set of users that are given master access

Without the second point, you will see the following message in the Management Console

Your current user or role does not have access to Kubernetes objects on this EKS cluster
This may be due to the current user or role not having Kubernetes RBAC permissions to describe cluster resources or not having an entry in the clusters auth config map.

Spark Operator

The cluster comes with a Fargate profile for pods running in the default namespace. The Spark operator will be installed in the spark-operator namespace. I create a Fargate profile for this namespace

const sparkOperatorNamespace = 'spark-operator';const fargateProfile = props.cluster.addFargateProfile('spark-operator-fargate-profile', {  fargateProfileName: 'spark-operator',  selectors: [ { namespace: sparkOperatorNamespace }]});

I install the Helm Chart for the Spark Operator

const sparkOperatorRelease = 'spark-operator-release';const sparkOperatorChart = props.cluster.addHelmChart('spark-operator', {  chart: 'spark-operator',  release: sparkOperatorRelease,  repository: 'https://googlecloudplatform.github.io/spark-on-k8s-operator',  version: props.version,  namespace: sparkOperatorNamespace,  createNamespace: true,  wait: true,  timeout: cdk.Duration.minutes(15)});

Pods will only be scheduled on Fargate if they are annotated with eks.amazonaws.com/compute-type: fargate. For that reason, I patch the deployment so that the Spark Operator controller can run on Fargate.

const sparkOperatorDeploymentPatch = new eks.KubernetesPatch(this, 'spark-operator-patch', {  cluster: props.cluster,  resourceName: `deployment/${sparkOperatorRelease}`,  resourceNamespace: sparkOperatorNamespace,  applyPatch: { spec: { template: { metadata: { annotations: { 'eks.amazonaws.com/compute-type': 'fargate' }} } } },  restorePatch: { }});sparkOperatorDeploymentPatch.node.addDependency(sparkOperatorChart);

Spark Service Account

I create a Service Account named spark to be used by the Spark application

const sparkServiceAccountName = 'spark'const sparkServiceAccount = props.cluster.addServiceAccount('spark-service-account', {  name: sparkServiceAccountName,  namespace: sparkApplicationNamespace});

I make sure that the Service Account has the right permissions so that the driver can launch pods for the executors.

const sparkApplicationNamespace = 'default';const sparkRoleName = 'spark-role';const sparkRole = props.cluster.addManifest('spark-role-manifest', {  apiVersion: 'rbac.authorization.k8s.io/v1',  kind: 'Role',  metadata: {    name: sparkRoleName,    namespace: sparkApplicationNamespace  },  rules: [    {       apiGroups: [""],      resources: ["pods"],      verbs: ["*"]    },    {       apiGroups: [""],      resources: ["services"],      verbs: ["*"]    },    {       apiGroups: [""],      resources: ["configmaps"],      verbs: ["*"]    }  ]});sparkRole.node.addDependency(sparkServiceAccount);const sparkRoleBinding = props.cluster.addManifest('spark-role-binding-manifest', {  apiVersion: 'rbac.authorization.k8s.io/v1',  kind: 'RoleBinding',  metadata: {    name: 'spark',    namespace: sparkApplicationNamespace  },  subjects: [    {       kind: 'ServiceAccount',      name: sparkServiceAccountName,      namespace: sparkApplicationNamespace    }  ],  roleRef: {    kind: 'Role',    name: sparkRoleName,    apiGroup: 'rbac.authorization.k8s.io'  }});sparkRoleBinding.node.addDependency(sparkRole);

If you can see in the Management Console that the Spark Operator has 1 Ready pod, then everything has worked. Select your cluster and check under Workloads.

Kubectl configuration

The deployed CDK stack outputs the command to update your kubectl configuration and connect to the EKS cluster. It will look something like this

aws eks update-kubeconfig --name spark-eks-cluster --region eu-west-1 --role-arn arn:aws:iam::1234567890:role/spark-eks-core-stack-eksclusterMastersRoleCD54321A-RK2GQQ9RCPRO

IAM roles for Service Accounts

This is best described in the documentation but I will show you some of the pieces of the puzzle.

If I describe the service account with kubectl describe sa spark, then it is annotated with an IAM role

This role has a trust policy and it can be assumed by the OIDC provider of the cluster

This provider can be seen in Identity Providers within IAM

All this has been created automagically by the CDK.

Adding permissions to the Service Role

I can easily add permissions to the Spark Service Account like this (this is in a construct I call sparkOperator, see GitHub repo for the details).

sparkOperator.sparkServiceAccount.addToPrincipalPolicy(new iam.PolicyStatement({  effect: iam.Effect.ALLOW,  actions: ["s3:*"],  resources:[    dataLake.bucket.bucketArn,    dataLake.bucket.arnForObjects("*"),  ]}));

Running a Spark Job

As a test, I run a PySpark Job that reads and writes back to S3.

The Dockerfile and the application code can be found here
See also this post for building the base docker image.

I build the image

const image = new DockerImageAsset(this, `docker-image-asset-${props.jobName}`, {  directory: `./assets/docker-images/${props.jobName}`,  buildArgs: {    AWS_SDK_BUNDLE_VERSION: props.sparkConfig.AwsSdkBundleVersion,    HADOOP_VERSION: props.sparkConfig.HadoopVersion,    SPARK_VERSION: props.sparkConfig.Version  }});

and I add the manifest to the cluster. I am running a SparkApplication but this could have also been a ScheduledSparkApplication.

props.cluster.addManifest(`spark-job-${props.jobName}`, {  apiVersion: 'sparkoperator.k8s.io/v1beta2',  kind: 'SparkApplication',  metadata: {    name: props.jobName,    namespace: 'default'  },  spec: {    sparkVersion: props.sparkConfig.Version,    type: 'Python',    pythonVersion: '3',    mode: 'cluster',    image: image.imageUri,    imagePullPolicy: 'Always',    mainApplicationFile: 'local:///opt/spark-job/application.py',    sparkConf: { },    hadoopConf: {      'fs.s3a.impl': 'org.apache.hadoop.fs.s3a.S3AFileSystem',      'fs.s3a.aws.credentials.provider': 'com.amazonaws.auth.WebIdentityTokenCredentialsProvider'    },    driver: {      envVars: props.environment ?? {},      cores: 1,      coreLimit: "1200m",      memory: "512m",      labels: {        version: props.sparkConfig.Version      },      serviceAccount: props.serviceAccount.serviceAccountName    },    executor: {      envVars: props.environment ?? {},      cores: 1,      instances: 2,      memory: "512m",      labels: {        version: props.sparkConfig.Version      }    }  }});

It is important to specify the 2 hadoopConf settings above in order to access S3.

Starting times

The driver stayed 1 minute in the Pending state

then ContainerCreating

and 100 seconds later it was Running

The executors have been Pending for another 60 seconds

Then the first executor started creating

and later the second executor started creating

The first executor started running 3 minutes after the application was scheduled (age of the driver)

while the second executor took 2 minutes to start running (similar to the driver)

So all drivers were up and running almost 4 whole minutes after the job was scheduled.

Fargate pros and cons

The main advantage of Fargate is that you really pay for what you use. There is no need to manage and auto-scale servers. Also there is no need to efficiently pack your pods within container instances in order to minimise waste

The main disadvantage I see using Fargate out-of-the-box (without any optimisation), is that the startup time is up to 2 minutes. This means 2 minutes for the driver, and another 2 minutes for the executors, giving us a total of 3-4 minutes for the application to start. This might be acceptable or not depending on the workflow. For example, it might be acceptable for a batch job running hourly.

Conclusion

The Spark Operator allows us to run SparkApplications or ScheduledSparkApplications on Kubernetes. With Amazon EKS and AWS Fargate we can run Spark applications on a Serverless Kubernetes Cluster. The AWS CDK allows us to easily provision a cluster, install the Spark Operator and schedule Spark Applications in a reusable and repeatable way. Permissions can be set up to access resources in S3 via IAM roles associated to Service Accounts.

Spark Operator container Image for Amazon EKS

Panagiotis Katsaroumpas, PhD — Wed, 11 Aug 2021 16:39:19 GMT

This is how to create the necessary docker images to run Spark on Amazon EKS (Elastic Kubernetes Service) using Spark on k8s Operator. This is because the provided images for Hadoop 3 did not work out of the box with the IAM role associated with the Service Account. This is necessary for example for reading and writing to S3.

Create Amazon ECR repositories

I store the base images in Amazon ECR but you can do the same with a different container registry if you wish.

In the AWS Management Console, navigate to Elastic Container Registry and create 2 repositories.

The repo names are expected to have specific names

/spark
/spark-py

I used spark-operator as the namespace.

I choose to create public repositories, but you can create private repositories as well.

For public repos, you can log in to ECR with

aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws

for private repos, I login with

aws ecr get-login-password --region  | docker login --username AWS --password-stdin .dkr.ecr..amazonaws.com

Build Spark base image

I clone the Spark repository

git clone git@github.com:apache/spark.git

I checkout a specific version

git checkout v3.1.1

I build the project with a specific Hadoop version (3.3.1 in this case). It is important to build it with Kubernetes support by including the corresponding flag.

./build/mvn -Pkubernetes -Dhadoop.version=3.3.1 -DskipTests clean package

The Hadoop version dictates the version of hadoop-aws, which is also 3.3.1 in this case
This in turn dictates the version of aws-java-sdk-bundle, which is 1.11.901.
A recent version of the AWS SDK is needed so that it supports the com.amazonaws.auth.WebIdentityTokenCredentialsProvider.

I build and tag the docker image (including the python profile)

./bin/docker-image-tool.sh -r public.ecr.aws/z2m5w4m3/spark-operator -t v3.1.1-hadoop3.3.1 -p ./resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile build

I push the images

./bin/docker-image-tool.sh -r public.ecr.aws/z2m5w4m3/spark-operator -t v3.1.1-hadoop3.3.1 push

and the image tags appear on ECR

Build Spark application image

Finally, I build on top of the Spark base image a new docker image that additionally includes

the correct version of hadoop-aws library
the correct version of aws-java-sdk-bundle
your application code

For a PySpark application, here is an example Dockerfile, where application.py is stored beside the Dockerfile

ARG SPARK_VERSION=3.1.1ARG HADOOP_VERSION=3.3.1FROM ubuntu:bionic as downloaderARG HADOOP_VERSION=3.3.1ARG AWS_SDK_BUNDLE_VERSION=1.11.901RUN apt-get update && apt-get install -y \  wget \  && rm -rf /var/lib/apt/lists/*RUN wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/${HADOOP_VERSION}/hadoop-aws-${HADOOP_VERSION}.jar -P /tmp/spark-jars/RUN wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/${AWS_SDK_BUNDLE_VERSION}/aws-java-sdk-bundle-${AWS_SDK_BUNDLE_VERSION}.jar -P /tmp/spark-jars/FROM public.ecr.aws/z2m5w4m3/spark-operator/spark-py:v${SPARK_VERSION}-hadoop${HADOOP_VERSION}USER rootCOPY --from=downloader /tmp/spark-jars/* $SPARK_HOME/jars/COPY application.py /opt/spark-job/application.py

Similarly to the base image, you can push this to a private repository in ECR.

Running a Spark Job

I create a manifest file my-spark-app.yaml

apiVersion: "sparkoperator.k8s.io/v1beta2"kind: SparkApplicationmetadata:  name: my-pyspark-app  namespace: defaultspec:  type: Python  pythonVersion: "3"  mode: cluster  image: ""  imagePullPolicy: Always  mainApplicationFile: "local:///opt/spark-job/application.py"  sparkVersion: "3.1.1"  hadoopConf:    fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem    fs.s3a.aws.credentials.provider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider  driver:    cores: 1    coreLimit: "1200m"    memory: "512m"    labels:      version: 3.1.1    serviceAccount: spark  executor:    cores: 1    instances: 1    memory: "512m"    labels:      version: 3.1.1

where the image should be set to the URI of your private ECR repo that holds the image from the previous section. It is important not to forget to include

spec:  hadoopConf:    fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem    fs.s3a.aws.credentials.provider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider

The spark Service Account has an associated IAM role that permits access to S3 or other AWS resources. Creating an EKS cluster and Service Accounts can be easily done with AWS CDK. See this post for more details, or at this github repo.

Finally, I apply the manifest

kubectl apply -f my-spark-app.yaml

Conclusion

It is possible to use IAM roles to write to S3 from a Spark Job running with the Spark Operator. These roles are associated to a Service Account. For this, it is necessary to include a recent version of aws-java-sdk-bundle, which requires to build the Spark docker image from the source, with the necessary version of Hadoop.

AWS Lambda function in Scala with Container Image

Panagiotis Katsaroumpas, PhD — Sun, 04 Jul 2021 10:21:26 GMT

In this post, I will describe how you can write your code in Scala, package it in a Docker Container, and run it serverless with AWS Lambda.

Scala project

First, create a Scala project using sbt as the build tool. I have created a github repository with the final result at the end of this post.

Creating a Fat Jar

AWS Lambda does not offer a runtime for Scala. However, I can create a fat Jar and run it with the Java runtime. This works, because a fat Jar is built with all the project dependencies packaged in one big Jar, including all Scala libraries.

I install the sbt-assembly plugin by adding the following line to project/plugins.sbt

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.0.0")

and in build.sbt I set the filename of the fat Jar to something that does not depend on the project name or Scala version

assembly / assemblyOutputPath := file("target/function.jar")

Now, I can create the fat jar by simply typing

sbt assembly

but there is no need to do this step manually because I will do this within docker.

Creating the handler

For the handler, I adapt the Java code from this example and I get the following.

package com.exampleimport java.util.{ Map => JavaMap }import com.amazonaws.lambda.thirdparty.com.google.gson.GsonBuilderimport com.amazonaws.services.lambda.runtime.{Context, RequestHandler}class LambdaHandler() extends RequestHandler[JavaMap[String, String], String] {  val gson = new GsonBuilder().setPrettyPrinting.create  override def handleRequest(event: JavaMap[String, String], context: Context): String = {    val logger = context.getLogger    logger.log(s"ENVIRONMENT VARIABLES: ${gson.toJson(System.getenv)}\n")    logger.log(s"CONTEXT: ${gson.toJson(context)}\n")    logger.log(s"EVENT: ${gson.toJson(event)}\n")    "Hello from Scala!"  }}

Create the Dockerfile

I add a Dockerfile at the root of the project

FROM mozilla/sbt as builderCOPY . /lambda/src/WORKDIR /lambda/src/RUN sbt assemblyFROM public.ecr.aws/lambda/java:11COPY --from=builder /lambda/src/target/function.jar ${LAMBDA_TASK_ROOT}/lib/CMD ["com.example.LambdaHandler::handleRequest"]

What I am doing here is the following

I am using a multi-stage build to build the fat Jar with sbt
Then I start using the official AWS docker image for lambda in Java
I copy over the fat Jar only
You will need to modify the CMD to the full name of your class and method

Build and publish the image

In what follows, you need to replace and with your selected region and your account number. I am assuming that you have installed the AWS CLI and have configured your credentials.

I create a repository in Amazon ECR called lambda-scala-example

aws ecr create-repository --region  --repository-name lambda-scala-example

I login to ECR with docker

aws ecr get-login-password --region  | docker login --username AWS --password-stdin .dkr.ecr..amazonaws.com

I build and tag the image

docker build . -t .dkr.ecr..amazonaws.com/lambda-scala-example

Finally, I push the image

docker push .dkr.ecr..amazonaws.com/lambda-scala-example

I can now see the image in ECR in the AWS Management Console (in the selected region)

The lambda function

I navigate to AWS Lambda in the same AWS region and I create a new function

I select the option Container image and enter a name. Then, I press Browse images to locate the container image URI

I select the image, leave all optional settings as they are and I create the function

Test the function

I test the function and it runs successfully

Conclusion

If Scala is your preferred language, you can easily use it with Lambda functions to create serverless microservices.

The complete example project can be found in this github repository.

Side-by-side deployments with AWS CDK

Panagiotis Katsaroumpas, PhD — Sun, 04 Apr 2021 08:59:13 GMT

AWS CDK allows you to provision AWS resources in a predictable and repeatable manner. Once you have scripted your infrastructure with CDK, the obvious next step is to deploy multiple copies in different environments. To achieve this, and depending on your definition of environment, you might need to think upfront about how you configure your CDK application and how you name your resources. In this post, I show you how to set up your CDK application in a way that allows you to perform side-by-side deployments in three common scenarios.

The 3 scenarios

These are the 3 different scenarios I want to cater for

deploy multiple copies in different accounts,
deploy multiple copies in different regions in the same account, and
deploy multiple copies side-by-side in the same account and region.

GitHub Repo

All code can be found in this GitHub repo: codiply/cdk-typescript-template-with-config. I will present the main ideas of my approach in this post, and refer you to the GitHub repo for the exact implementation of these ideas.

I have built my CDK template on top of the CDK sample app. I have also borrowed code from this post that describes several ways of configuring your CDK application.

My goal

My goal is to be able to deploy to a specific environment (for example dev) with the following command

cdk deploy --all -c config=dev

In case no value is given for the config, I choose to default to the default config. This is convenient when I develop a new application and I only have a single environment.

Configuration files

I am using YAML configuration files placed in folder config/. Each environment has 2 configuration files, for example for environment dev I have

dev.deployment.yamldev.yaml

The dev.deployment.yaml contains the account ID, the region, and a prefix that allows me to deploy stacks in the same region and account. Later on, I will explain how this prefix is used.

AWSAccountID: "123456789"AWSRegion: "eu-west-1"Prefix: "dev"

I have chosen to ignore this file in .gitignore and instead place a template for other developers to copy, modify and use in their deployments. This is also because I don't want to commit this file with my account ID. Depending on your needs, for example if you are in an organisation where everyone uses the same accounts for each environment, you might choose to put this file under source control.

The account and region information is passed into the CDK stacks as an environment. This gives me the peace of mind that I will not accidentally deploy my stacks to the wrong account by using the wrong AWS profile.

The dev.yaml file contains the configuration for the different stacks. Each stack has its own section.

Sns:  TopicName: 'my-sns-topic'Sqs:  QueueName: 'my-sqs-queue'

Loading the right config

To see how these configuration files are loaded, see bin/index.ts and lib/config/config.ts.

In the code, I end up with a Config that looks like this

export interface Config {    readonly Deployment: DeploymentConfig;    readonly Sns: SnsConfig;    readonly Sqs: SqsConfig;}export interface DeploymentConfig{    readonly AWSAccountID : string;    readonly AWSRegion : string;    readonly Prefix: string;}

Naming resources

In order to be able to do side-by-side deployments, I have to follow a religious naming process when

naming the stacks,
naming resources within a stack, and
naming the outputs that I export from stacks.

Stacks need to have unique names when deployed in the same region. For this reason, I prepend the prefix to all stack names.

new SqsQueueStack(app, `${config.Deployment.Prefix}SqsQueueStack`, config.Deployment, config.Sqs, { env: env});new SnsTopicStack(app, `${config.Deployment.Prefix}SnsTopicStack`, config.Deployment, config.Sns, { env: env });

Resources need to have unique names so that you can create several of them in the same region and account without conflicts. In some cases, names need to be unique across regions or even across accounts (for example S3 buckets).

const topic = new sns.Topic(this, 'SnsTopic', {  topicName: `${deployment.Prefix}-${config.TopicName}`});

For stack exports, and ignoring for a moment the fact that I wouldn't be able to export the same name twice, the names should be specific to the deployment, so that each deployment imports the values from the same deployment. In my example of an SNS topic with an SQS subscription, the goal is to import the SNS topic ARN from the same deployment.

I use the prefix in the export

new cdk.CfnOutput(this, 'export-sns-topic-arn', {   exportName: `${deployment.Prefix}-sns-topic-arn`,  value: topic.topicArn });

and use the same naming convention in the import

const topicArn = cdk.Fn.importValue(`${deployment.Prefix}-sns-topic-arn`);

Deploying side-by-side

In my example, I have split the CDK application into 2 stacks (one for SNS and one for SQS), and I have deployed 2 copies of them (dev and pro) in the same account and region. You can see the CloudFormation stacks side by side

I was able to create side-by-side SNS topics and SQS queues, avoiding any naming conflicts.

Working with an AWS profile

While working with a specific deployment, I prefer not to keep typing

cdk --profile my-aws-profile command>

For this reason, I have added an aws-profile.txt with the current AWS profile I am using, and instead of cdk, I use a cdk.sh script

#!/bin/bashset -euo pipefailcdk --profile $(cat ./config/aws-profile.txt) "${@}"

This way I just type

./ckd.sh command>

The aws-profile.txt file is also ignored, and a template is provided in the repo. This is a file that is specific to a developer and does not need to be under source control.

Try it for yourself

First, clone codiply/cdk-typescript-template-with-config. This template is creating an SNS topic and an SQS queue, so deploying this project will not cost you anything.

In case you haven't already, setup AWS CDK for Typescript.

Next, you will need to set up 2 deployments, one for dev and one for pro:

Make a copy of config/dev.deployment.template.yaml into config/dev.deployment.yaml and enter your AWS Acount ID and your region.
Do the same for config/pro.deployment.template.yaml, using the same account and region.

Then to deploy all the stacks, run

./cdk.sh deploy --all -c config=dev

and repeat for config=pro. If you are using an AWS profile that is not your default, add --profile .

To clean up, you can either

Run the same commands but with destroy instead of deploy, or
simply delete the stacks from CloudFormation in AWS Console.

Conclusion

AWS CDK allows you to script your AWS infrastructure and perform repeatable deployments. By configuring your CDK project and naming all resources wisely, you can deploy multiple copies of a stack side by side even in the same account and region.