Zip Archive for key prefix with S3 Object Lambda

S3 Object Lambda allows you to run code when an object is requested from S3. You can return a transformed version of the actual file stored in the S3 bucket, or you can even return objects that do not exist in S3 and are dynamically created at request time. In this post I show how you can create a zip archive containing all files under a specific key prefix.

The code

This github repository contains a CDK project with an example stack you can deploy into your own account.

Implementation

At a high level, you will need the following

An S3 bucket
A standard S3 Access Point
An execution role for the Lambda function
The Lambda Function
An Object Lambda Access Point that will be using the standard S3 Access Point as the Supporting Access Point

I create the standard access point and name it deepdive-zip-archive-standard-access-point

The role of the Lambda function will need to have the permissions to do s3-object-lambda:WriteGetObjectResponse so that it can write the response. Normally, that's all you need, because the function will receive an event with a field getObjectContext containing an inputS3Url that has embedded credentials to read the underlying object. However, in this case, we want to list the objects with the specific key prefix and read possibly more than one object. For that reason, we give the Lambda function read-only access to the bucket (via the supporting access point).

So the Lambda execution role will need the following policy (plus the AWSLambdaBasicExecutionRole managed policy).

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": "s3-object-lambda:WriteGetObjectResponse",
            "Resource": "*",
            "Effect": "Allow"
        },
        {
            "Action": [
                "s3:List*",
                "s3:Get*"
            ],
            "Resource": [
                "arn:aws:s3:<region>:<account-number>:accesspoint/deepdive-zip-archive-standard-access-point",
                "arn:aws:s3:<region>:<account-number>:accesspoint/deepdive-zip-archive-standard-access-point/object/*"
            ],
            "Effect": "Allow"
        }
    ]
}

This is the Lambda function code itself:

import os
import boto3
import zipfile
from io import BytesIO
from urllib.parse import urlparse

ACCOUNT_ID = os.environ['ACCOUNT_ID']
ACCESS_POINT_ALIAS = os.environ['ACCESS_POINT_ALIAS']

s3_client = boto3.client('s3')
s3_resource = boto3.resource('s3')
s3_paginator = s3_client.get_paginator('list_objects')

def main(event, context):
    object_get_context = event["getObjectContext"]

    print(object_get_context)

    request_route = object_get_context["outputRoute"]
    request_token = object_get_context["outputToken"]
    s3_url = object_get_context["inputS3Url"]

    prefix = urlparse(s3_url).path[1:]

    in_memory_zip = BytesIO()

    with zipfile.ZipFile(in_memory_zip, mode='w', compression=zipfile.ZIP_DEFLATED) as zip:
        page_iterator = s3_paginator.paginate(Bucket=ACCESS_POINT_ALIAS, Prefix=prefix)
        for page in page_iterator:
            if 'Contents' in page:
                for entry in page['Contents']:
                    key = entry['Key']
                    body = s3_resource.Object(ACCESS_POINT_ALIAS, key).get()['Body'].read()
                    zip.writestr(key, body)

    s3_client.write_get_object_response(
        Body=in_memory_zip.getvalue(),
        RequestRoute=request_route,
        RequestToken=request_token)

    return {'status_code': 200}

I am extracting the key of the requested object from inputS3Url. This key is used as the key prefix. I list all objects starting with the given prefix, and then read them one by one adding them to a zip archive in memory. At the end, I write the zip file at the output route using the output token (both provided in the event passed into the lambda).

Once I have created the standard access point and the lambda function, I can create the S3 Object Lambda Access Point

Testing it

I have populated the underlying S3 bucket with some files within different prefixes.

Now, it is just a matter of requesting a prefix as if requesting the specific key from S3. I am using the AWS CLI for this, and to make it work, I need to use the full ARN of the Object Lambda Access Point as the bucket parameter.

aws s3api get-object --key 2020 --bucket arn:aws:s3-object-lambda:<region>:<account number>:accesspoint/deepdive-zip-archive-object-lambda 2020.zip

In this case, I request the key 2020 that does not exist. The lambda function zips all objects with key starting with that prefix and returns the archive.

Limitations

With the specific implementation, because I am creating the zip file in memory, I am limited by the 10GB memory limit of Lambda.

Conclusion

With S3 Object Lambda we can dynamically create S3 objects at request time and this allows us to create, for example, a ZIP archive with all objects under a specific key prefix.

Zip Archive for key prefix with S3 Object Lambda

The code

Implementation

Testing it

Limitations

Conclusion

Comments

More from this blog

AI Agent with controlled access to AWS services in Strands Agents

Bedrock Knowledge Base with S3 Vector Index in AWS CDK

Bedrock Agent with open-ended Action using SQL

Breaking the Restaurant Reservation Agent

Restaurant Reservation Agent with Amazon Bedrock and AWS CDK

Command Palette

The code

Implementation

Testing it

Limitations

Conclusion

Comments

More from this blog