This article shows how to package your R code into a Docker image and deploy it as an AWS Lambda function using lambdr. We provide a self-contained working example that can be used as a test or template for your own project.
It is not an exhaustive or authoritative resource on all things AWS Lambda and Docker. Rather, it is a guide intended to get you up and running.
You will need an AWS account. Remember, you are responsible for the account and any billing costs that are incurred.
You need to be comfortable with R and have at least a vague understanding of Dockerfiles, Docker images, and containers. If you are not familiar with Docker, see the article A Primer on Docker for lambdr.
For AWS, you’ll need a vague understanding of one or more of the AWS Console, AWS CLI, or AWS CDK, depending on which deployment examples you intend to follow later in this article.
In this section you will see everything that goes into making an absolutely minimal deployment-ready R project with lambdr.
Ideally, Lambda functions should be pretty simple and do their business briskly. In reality, they may take several minutes to run and interact with various other AWS resources such as databases and S3 buckets.
The example given here is basic enough to understand, and realistic enough to demonstrate some good project structure habits when making a Lambda. It purposefully doesn't require access to other AWS resources, like S3 buckets or databases, because managing and granting permissions is well outside the scope of lambdr.
The example project is called flags. When given a full or partial country name it queries the REST Countries API and returns information about the country’s flag.
.
├── Dockerfile
└── R
    ├── functions.R
    └── runtime.R
The minimal structure for a project only needs to contain two elements:
- R/, containing the R scripts
- Dockerfile, which packages up the R scripts in a Linux distribution with R, any required R packages, and system dependencies for both R and the R packages
One of the R scripts should always be called runtime.R.
First let’s look at the R code.
The functions in R/functions.R get sourced and used in runtime.R.
Don't worry if you aren't familiar with calling RESTful APIs. In summary, create_request() makes a request object to ask https://restcountries.com about a country's flag. Then perform_request() sends the request to the website and checks the response: if the status isn't 200, the helper function unsuccessful() raises an informative error.
library(httr2)

# Build (but don't send) a request for a country's flag
create_request <- function(country) {
  # Check for a single, non-empty string (nzchar() alone lets NULL through)
  stopifnot(
    "'country' must be a string containing a full or partial country name, e.g. 'ghana'" =
      is.character(country) && length(country) == 1L && nzchar(country)
  )
  country <- utils::URLencode(country)
  base_url <- "https://restcountries.com/v3.1/name/"
  httr2::request(base_url) |>
    httr2::req_user_agent(
      "lambdr example (https://github.com/mdneuzerling/lambdr/)"
    ) |>
    httr2::req_url_path_append(country) |>
    httr2::req_url_query(fields = "flags")
}

# Turn an unsuccessful response into an informative R error
unsuccessful <- function(resp) {
  body <- httr2::resp_body_json(resp)
  msg <- sprintf("\nHTTP %s: %s.\n", body$status, body$message)
  stop(
    msg,
    "Check supplied country name is valid and/or the server status."
  )
}

# Send the request and return the parsed JSON body
perform_request <- function(req) {
  resp <- req |>
    # Stop httr2 raising its own error for HTTP errors; the status is checked below
    httr2::req_error(is_error = \(x) FALSE) |>
    httr2::req_perform()
  if (httr2::resp_status(resp) != 200L) {
    unsuccessful(resp)
  }
  return(httr2::resp_body_json(resp))
}
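If you want to see what create_request() asks for without running any R, the request is roughly equivalent to the curl call below. This is a sketch rather than an exact reproduction: it assumes curl is installed and uses a single-word country name, whereas create_request() also URL-encodes the name (so, for example, spaces become %20). The -w format string and the 10-second --max-time limit are additions for illustration only.

```shell
#!/bin/bash
# Rough curl equivalent of create_request("ghana") |> perform_request().
# Assumes a single-word country name, so no URL encoding is needed.
country="ghana"
url="https://restcountries.com/v3.1/name/${country}?fields=flags"

# -s hides the progress meter, -A sets the same user agent as create_request(),
# -w prints the HTTP status code after the body, and --max-time gives up after 10s.
curl -s --max-time 10 \
  -A "lambdr example (https://github.com/mdneuzerling/lambdr/)" \
  -w '\nHTTP status: %{http_code}\n' \
  "$url"
```

If the name matches no country, the status line should show 404 and the body should be a small JSON object with status and message fields, which is what unsuccessful() turns into an R error.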
R/runtime.R orchestrates the rest of the R code and starts up the lambdr runtime interface client. It is short and simple, with three main sections: loading packages and sourcing R/functions.R, the handler() function definition, and starting lambdr.
Your Lambda function will likely[1] take a JSON payload with some inputs. lambdr will convert that JSON into an R list and pass the items to handler().
For example, a payload for this flags Lambda could be {"country": "ghana"}. lambdr would convert it to list(country = "ghana"), then call handler(country = "ghana") for us.
Note that if lambdr::start_lambda() is called interactively, it will throw an error. This is intentional: lambdr relies on environment variables that are only available in the Lambda execution environment. However, you may want to test your code in interactive sessions during development, so we simply wrap the call in if (!interactive()).
library(lambdr)
library(logger)

source(file.path("R", "functions.R"))

logger::log_threshold(logger::DEBUG)

handler <- function(country) {
  logger::log_info("Event received: ", country)
  req <- create_request(country)
  resp <- perform_request(req)
  return(resp)
}

if (!interactive()) {
  lambdr::start_lambda()
}
The Dockerfile builds on the one in A Primer on Docker for lambdr. That article explains each item in the Dockerfile step-by-step. Here, we emphasise what makes the Dockerfile below production-ready.
Essentially, it’s all about version control.
Once the project is ready for deployment, you should pin the base image to a specific version. To do this, you provide the image's digest, which is a hash value. Here we use @sha256:429..., which is for the amd64 version of rocker/r-ver:4.4 at the time of writing. Using the tag alone is not sufficient, because the image a tag refers to can change over time, whereas the image specified by a digest cannot.
To find the digest of an image you've already pulled, you can use docker images --digests. Alternatively, you can get it from the repository where you found the image.
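As a sketch of both approaches for the base image used here, assuming Docker with the buildx plugin is installed: docker images --digests reports the digest your local copy was pulled as, which for a multi-platform image like rocker/r-ver may be the digest of the multi-platform index rather than the amd64-specific one. docker buildx imagetools inspect reads the registry directly and lists one manifest per platform, so you can pick out the linux/amd64 entry.

```shell
#!/bin/bash
# Look up digests for the base image used in the deployment Dockerfile.
image="docker.io/rocker/r-ver:4.4"

# 1. From a local copy: pull by tag, then list local images with their digests.
docker pull "$image"
docker images --digests rocker/r-ver

# 2. From the registry: shows the index digest plus one digest per platform.
#    The linux/amd64 entry is the amd64-specific digest.
docker buildx imagetools inspect "$image"
```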
The other version control aspect here is using the Posit Public Package Manager to get amd64 Ubuntu binaries from a snapshot of CRAN. This is simple, but not necessary. For example, you could use renv instead. But the point is, version the R packages - somehow!
FROM docker.io/rocker/r-ver:4.4@sha256:429c1a585ab3cd6b120fe870fc9ce4dc83f21793bf04d7fa2657346fffde28d1
# options(warn=2) will make the build error out if package doesn't install
RUN Rscript -e "options(warn = 2); install.packages('pak')"
# Using {pak} to install R packages: it resolves Ubuntu system dependencies AND
# the R dependency tree
RUN Rscript -e "options( \
warn = 2, \
repos = c(CRAN = 'https://p3m.dev/cran/__linux__/jammy/2024-07-06') \
); \
pak::pak( \
c( \
'httr2', \
'lambdr' \
) \
)"
# Lambda setup
RUN mkdir /R
COPY R/ /R
RUN chmod 755 -R /R
ENTRYPOINT Rscript R/runtime.R
CMD ["handler"]
That's it! These three files are all you need:
.
├── Dockerfile
└── R
    ├── functions.R
    └── runtime.R
The following sections expand on considerations such as local testing and how to deploy the project into AWS.
For R you can use any base image you want. So long as you have all the system dependencies required for R and the R packages, you only need to have lambdr installed and used as in the minimal example given above.
We recommend Rocker’s r-ver images.
They offer a tested, versioned R stack. Newer versions can be used with the Posit Public Package Manager to install Ubuntu binaries of packages as at a certain date on CRAN. They also allow use of pak, which makes life very convenient in terms of finding and installing R package system dependencies.
The AWS Lambda ‘provided’ images.
They are minimal OS images. That means the base image is fairly small, which is positive. However, you have to install R and any dependencies yourself.
More importantly, at the time of writing the OS in the newest images is Amazon Linux 2023, which is not based on a single Linux distribution, but rather a blend of several. That makes installing R package binaries and system dependencies more challenging, and slow.
The previous version of the OS (Amazon Linux 2) is drastically different to Amazon Linux 2023 and will no longer receive support from AWS by the end of summer 2025.
Up until this point we have only shown and discussed a Dockerfile for deployment. While you are doing development work, it is a good idea to have a dev Dockerfile.
The dev Dockerfile should mimic the deployment one as much as possible, but will also have any extra features you need to make your life easier as a developer.
The example given below can be added to the flags project:
.
├── Dockerfile
├── Dockerfile.dev
└── R
    ├── functions.R
    └── runtime.R
FROM ghcr.io/rocker-org/devcontainer/r-ver:4.4@sha256:e99cfe63efd5d79f44146d8be8206019fd7a7230116aa6488097ee660d6aa5dc
# Install the Lambda Runtime Interface Emulator, which can be used for locally
# invoking the function.
# See https://github.com/aws/aws-lambda-runtime-interface-emulator for details
RUN apt-get update && apt-get -y install --no-install-recommends curl
RUN curl -Lo /usr/local/bin/aws-lambda-rie https://github.com/aws/aws-lambda-runtime-interface-emulator/releases/latest/download/aws-lambda-rie && \
chmod +x /usr/local/bin/aws-lambda-rie
# options(warn=2) will make the build error out if package doesn't install
RUN Rscript -e "options(warn = 2); install.packages('pak')"
# Using {pak} to install R packages: it resolves Ubuntu system dependencies AND
# the R dependency tree
RUN Rscript -e "options( \
warn = 2, \
repos = c(CRAN = 'https://p3m.dev/cran/__linux__/jammy/2024-07-06') \
); \
pak::pak( \
c( \
'httr2', \
'lambdr' \
) \
)"
# Lambda setup
RUN mkdir /R
# Needs to be set to use aws-lambda-rie. It is the path up to runtime.R
ENV LAMBDA_TASK_ROOT="/R"
# Optional for local testing
# 900s, i.e. 15 min, the max time a lambda can run for.
ENV AWS_LAMBDA_FUNCTION_TIMEOUT=900
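The differences compared to the deployment Dockerfile are the base image, the installation of the AWS Lambda Runtime Interface Emulator, and two environment variables used for local testing. There is also no COPY, ENTRYPOINT or CMD, because the R code is mounted into the running container instead (see the build script below).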
This time we are using a devcontainer base image, which “allows you to use a container as a full-featured development environment”. This is made by Rocker and it’s the same as the r-ver:4.4 image but with some extra tooling.
The most valuable of these is having radian and its dependencies installed, plus some of the other usual setup required to use VS Code as an IDE for R. Using the actual devcontainer VS Code extension is an exercise left for the reader, and with a word of warning: One of the authors of this vignette has seen some M2 MacBook Pros with 16GB of RAM struggling to run the devcontainer extension.
If you prefer RStudio as your IDE, you could use a Rocker RStudio Server image as an alternative.
While you’re developing it’s a good idea to work out of the dev container.
One option for doing so is to have a “build script” that builds the image and runs a container.
.
├── build.sh
├── Dockerfile
├── Dockerfile.dev
└── R
    ├── functions.R
    └── runtime.R
#!/bin/sh
docker stop flags && docker container rm flags
docker build -t flags:latest \
  -f Dockerfile.dev .
# An absolute path for the R mount works on older Docker versions too
docker run \
  -p 9000:8080 \
  -it \
  --rm \
  -v ~/.aws/:/root/.aws \
  -v "$(pwd)/R":/R \
  --name flags \
  flags:latest \
  bash
You might run this script with, e.g., bash build.sh. If a container called flags already exists, it will be stopped and removed. Note that if there isn't one, you'll see the error Error response from daemon: No such container: flags, which is expected.
Thanks to Docker's build cache, the image is only actually rebuilt if Dockerfile.dev has been altered or is being built for the first time.
The options being given to docker run are:
- -p 9000:8080 publishes the container's port 8080 on port 9000 of the host, for local testing with the Lambda RIE (see below).
- -it starts the container with an interactive terminal.
- --rm removes the container when it exits.
- -v mounts a host path into the container, creating a live link between the host and container file systems:
  - ~/.aws makes your AWS credentials available in the container. This can be useful if you need to use {paws}. If you don't need AWS credentials in the container, you shouldn't mount this volume.
  - R, mounted at /R, contains the Lambda code.
In the dev Dockerfile we installed the AWS Lambda Runtime Interface Emulator (RIE).
The emulator gets as close to the environment of a Lambda as is possible without actually pushing up to AWS and invoking.
Two small shell scripts in local-testing, plus the build script from the previous section (required for port forwarding), are all we need to add:
.
├── local-testing
│   ├── event.sh
│   └── start-rie.sh
├── build.sh
├── Dockerfile
├── Dockerfile.dev
└── R
    ├── functions.R
    └── runtime.R
start-rie.sh starts the emulator with our handler.
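A minimal sketch of what local-testing/start-rie.sh might contain; these are assumed contents, not the original file. It assumes the emulator was installed to /usr/local/bin/aws-lambda-rie by Dockerfile.dev and that the R code is mounted at /R by build.sh. It also assumes that the emulator runs the command it is given as the runtime and treats the last argument as the handler name, which is how handler reaches lambdr here.

```shell
#!/bin/bash
# Sketch of local-testing/start-rie.sh (assumed contents).
# Paths match Dockerfile.dev and build.sh.
rie="/usr/local/bin/aws-lambda-rie"
runtime_script="R/runtime.R"
handler="handler"

# runtime.R sources R/functions.R with a relative path, so run from /,
# the directory that contains the mounted R/ folder.
cd /

# Assumption: the emulator starts "Rscript R/runtime.R handler" as the runtime
# and takes the last argument as the handler name. It listens on port 8080,
# which build.sh publishes on port 9000 of the host.
"$rie" Rscript "$runtime_script" "$handler"
```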
Run bash local-testing/start-rie.sh from inside the running container. This starts the emulator, which runs an HTTP endpoint, similar to what happens in the deployed Lambda. The endpoint waits for a payload to pass to the R code, which keeps the container's terminal busy. If you need to stop the process, just press Ctrl + C.
Sending a payload to the emulator is the equivalent of invoking the Lambda. This is done with event.sh:
#!/bin/bash
port="9000"
endpoint="http://localhost:$port/2015-03-31/functions/function/invocations"
curl -XPOST $endpoint -d '{"country":"'"$1"'"}'
Simply run bash local-testing/event.sh someCountry from a terminal on the host, replacing someCountry with a partial or full country name. The result will appear in your terminal. If you look at the container's terminal, all the logs from the execution (equivalent to what would appear in CloudWatch) will be there. The RIE will keep running until you stop it with Ctrl + C.
If your own Lambda doesn't take any parameters, your payload should be an empty JSON object:
#!/bin/bash
port="9000"
endpoint="http://localhost:$port/2015-03-31/functions/function/invocations"
curl -XPOST $endpoint -d '{}'
And finally, one more option if you are using the devcontainer VS Code extension. Because you can spawn multiple terminals from within the container, you can start the RIE in one terminal and send it a payload from another terminal inside the container. In this case the port will be 8080, which is easy to add logic for in event.sh because of the devcontainer environment variable $DEVCONTAINER.
At this point, you will have at minimum a project with a Dockerfile, some R code, and a runtime.R that starts lambdr:
.
├── Dockerfile
└── R
    ├── functions.R
    └── runtime.R
Deployment is the act of turning these files into an AWS Lambda function that can be invoked. Here we provide some rough instructions for two common ways of deploying: via the AWS Console (the website), and the AWS Cloud Development Kit (CDK).
For both options you will need the AWS CLI installed as a prerequisite (see the AWS CLI installation instructions). The CLI is how you interact with AWS from the terminal.
The following instructions use the example project given earlier, flags.
The steps are:
First, in a terminal, cd to the project directory and build the image, tagging it flags:latest.
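The build step looks like this, run from the root of the flags project (the directory containing Dockerfile and R/). The --platform linux/amd64 flag is an optional addition, not part of the original instructions: it makes the build explicit about the amd64 base image pinned in the Dockerfile, which mainly matters when building on an arm64 machine such as an Apple Silicon Mac.

```shell
#!/bin/bash
# Build the deployment image from the flags project directory.
# flags:latest is the tag used by the docker tag command further down.
image_tag="flags:latest"

# Optional: --platform matches the amd64 base image pinned in the Dockerfile.
docker build --platform linux/amd64 -t "$image_tag" .
```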
Then, create a repository in ECR, either by using the CLI or the Console. If you do it in the Console, you pretty much just go to the ECR service, click Create, and call it flags.
Alternatively, create it with the AWS CLI.
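A sketch of the CLI route, assuming your AWS CLI has a default region configured (otherwise add --region). The --query and --output options trim the response down to just the repository URI; without them the CLI prints the full JSON description of the repository.

```shell
#!/bin/bash
# Create an ECR repository called flags in the CLI's default region
# and print only its URI (repository.repositoryUri in the JSON response).
repo_name="flags"
aws ecr create-repository \
  --repository-name "$repo_name" \
  --query 'repository.repositoryUri' \
  --output text
```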
Make a note of the URI, which is the resource identifier of the created repository.
The image can now be pushed to the repository. This part has to be done via the CLI. You can get all the commands ready-made for you via the Console by clicking on the repository name in ECR, then View push commands. Or, you can replace the account ID 123456789123 and the region region-name-1 in the commands below to do the same thing.
Note: You don't need a Docker account for the docker login command.
docker tag flags:latest 123456789123.dkr.ecr.region-name-1.amazonaws.com/flags:latest
aws ecr get-login-password --region region-name-1 | docker login --username AWS --password-stdin 123456789123.dkr.ecr.region-name-1.amazonaws.com
docker push 123456789123.dkr.ecr.region-name-1.amazonaws.com/flags:latest
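If you'd rather not edit the placeholders by hand, the account ID and region can be looked up with the AWS CLI. This sketch assumes your region is set in your AWS CLI configuration (aws configure get region reads it from there) and that the repository is called flags. It runs the same tag, login and push steps as the commands above, logging in to the registry host itself.

```shell
#!/bin/bash
# Look up the account ID and region instead of typing them in.
account_id="$(aws sts get-caller-identity --query Account --output text)"
region="$(aws configure get region)"
registry="${account_id}.dkr.ecr.${region}.amazonaws.com"
ecr_uri="${registry}/flags"

docker tag flags:latest "${ecr_uri}:latest"
# docker login receives a temporary ECR password on stdin.
aws ecr get-login-password --region "$region" \
  | docker login --username AWS --password-stdin "$registry"
docker push "${ecr_uri}:latest"
```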
Now that the image is in ECR, it can be used to make the Lambda function. You could make it from the command line, but this requires an IAM Role to be configured and ready for the function. That is beyond the scope of lambdr. If you have a Role and prefer to use the CLI, see the examples in aws lambda create-function help.
In the Console, go to the Lambda service and click Create a function. Choose the container image option, give it a name (flags is fine), and choose the image from the ECR repository. Everything else can be left as the default.
It will take a minute for the function to be made. Once it is, you can test it by scrolling down and clicking on the Test tab. Edit the Event JSON to be {"country": "namibia"}, then click the orange Test button. It should be successful. To see the response, click Details, beneath the big green tick.
However, there's a good chance of a timeout, because the default timeout for a Lambda function is only 3 seconds. To increase it, click on the Configuration tab, then Edit, and bump it up. 10 seconds is fine in this case.
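The timeout can also be changed from the CLI. A sketch, assuming the function is called flags; --timeout takes a value in seconds.

```shell
#!/bin/bash
# Raise the function's timeout without using the Console.
function_name="flags"
timeout_seconds=10
aws lambda update-function-configuration \
  --function-name "$function_name" \
  --timeout "$timeout_seconds"
```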
Alternatively, the Lambda can be invoked from the CLI:
aws lambda invoke --function-name flags \
--invocation-type RequestResponse --payload '{"country": "namibia"}' \
/tmp/response.json --cli-binary-format raw-in-base64-out
The response is written to /tmp/response.json. See it with cat /tmp/response.json, which should show:
[{"flags":{"png":"https://flagcdn.com/w320/na.png","svg":"https://flagcdn.com/na.svg","alt":"The flag of Namibia features a white-edged red diagonal band that extends from the lower hoist-side corner to the upper fly-side corner of the field. Above and beneath this band are a blue and green triangle respectively. A gold sun with twelve triangular rays is situated on the hoist side of the upper triangle."}}]
The CDK allows you to programmatically create application stacks and all of the associated resources they need, like Lambdas, Step Functions, and so on. It's an alternative to clicking around in the AWS Console and means that your stack can (theoretically) be rebuilt at any time with just a few commands.
First install the CDK CLI, for example with npm install -g aws-cdk (this requires Node.js).
Once the CDK CLI is installed, make a directory somewhere called LambdrExample, cd into it, and initialise a new TypeScript CDK app there.
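The commands for this step might look like the following sketch, assuming the CDK CLI is on your PATH. cdk init needs an empty directory and derives the generated file and class names from the directory name, which is where bin/lambdr_example.ts, lib/lambdr_example-stack.ts and LambdrExampleStack come from.

```shell
#!/bin/bash
# Create the project directory and a TypeScript CDK app inside it.
project_dir="LambdrExample"
mkdir -p "$project_dir"
cd "$project_dir"

# Generates bin/, lib/, package.json, cdk.json and installs dependencies.
cdk init app --language typescript
```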
You will now have a bunch of files and folders containing boilerplate code and libraries. We're only interested in the bin and lib directories. You also need to add a new directory called lambda and put the flags project in it, like below:
.
├── bin
│   └── lambdr_example.ts
├── lambda
│   └── flags
│       ├── Dockerfile
│       └── R
│           ├── functions.R
│           └── runtime.R
└── lib
    └── lambdr_example-stack.ts
Replace the contents of bin/lambdr_example.ts with this:
#!/usr/bin/env node
import "source-map-support/register";
import * as cdk from "aws-cdk-lib";
// No ".ts" extension: TypeScript rejects it in import paths by default
import { LambdrExampleStack } from "../lib/lambdr_example-stack";

const app = new cdk.App();
new LambdrExampleStack(app, "LambdrExampleStack", {
  // Your account number and region go in the env below
  env: { account: "111122223333", region: "my-region-1" },
});
Replace the contents of lib/lambdr_example-stack.ts with this:
import * as cdk from "aws-cdk-lib";
import { Construct } from "constructs";
import * as lambda from "aws-cdk-lib/aws-lambda";

export class LambdrExampleStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);
    this.createFlagsLambda();
  }

  createFlagsLambda(): lambda.IFunction {
    const flagsLambda = new lambda.DockerImageFunction(this, "flags", {
      functionName: "flags",
      code: lambda.DockerImageCode.fromImageAsset("lambda/flags"),
      timeout: cdk.Duration.seconds(15),
    });
    return flagsLambda;
  }
}
You then need to run cdk bootstrap (a one-off step per account and region), followed by cdk deploy.
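Both commands are run from the LambdrExample directory. As a sketch, bootstrap can also be given an explicit target environment in the form aws://ACCOUNT/REGION; the values below are the same placeholders as in bin/lambdr_example.ts, so replace them with your own.

```shell
#!/bin/bash
# Placeholders copied from bin/lambdr_example.ts: use your own values.
account="111122223333"
region="my-region-1"

# One-off per account/region: creates the CDKToolkit stack, including the
# S3 bucket and ECR repository that CDK uses for assets like the flags image.
cdk bootstrap "aws://${account}/${region}"

# Builds the image from lambda/flags, pushes it and deploys the stack.
cdk deploy
```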
Some building will happen, and then you will be prompted as to whether you wish to deploy a couple of changes. If you feel comfortable, enter y and hit return.
Note: If you get a failure because no bucket and/or ECR repository exists, this is probably because you have previously bootstrapped your account and now have "stack drift". To resolve this, delete the CDKToolkit stack from CloudFormation and re-bootstrap. For more information, see the related Stack Overflow discussion.
If all has gone well, you should see a green tick and sparkles, along with the deployment time.
Test by invoking from the CLI:
aws lambda invoke --function-name flags \
--invocation-type RequestResponse --payload '{"country": "namibia"}' \
/tmp/response.json --cli-binary-format raw-in-base64-out
The response is written to /tmp/response.json. See it with cat /tmp/response.json, which should show:
[{"flags":{"png":"https://flagcdn.com/w320/na.png","svg":"https://flagcdn.com/na.svg","alt":"The flag of Namibia features a white-edged red diagonal band that extends from the lower hoist-side corner to the upper fly-side corner of the field. Above and beneath this band are a blue and green triangle respectively. A gold sun with twelve triangular rays is situated on the hoist side of the upper triangle."}}]
If you make changes to the Dockerfile or R code of the Lambda, simply re-deploy with another cdk deploy.
Delete the CDK stack using CloudFormation via the AWS Console, or via the terminal with cdk destroy. You should also delete the ECR repository, otherwise it will sit in your account and you will be charged for storage (a very small amount, but still).
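If you followed the Console route instead, the function and repository can also be removed from the CLI. A sketch, assuming both are called flags as above; it doesn't touch anything else the Console may have created alongside the function, such as its execution role or CloudWatch log group.

```shell
#!/bin/bash
# CLI cleanup for the Console route; both resources are assumed to be named flags.
name="flags"
aws lambda delete-function --function-name "$name"

# --force also deletes the images stored in the repository;
# without it, deleting a non-empty repository fails.
aws ecr delete-repository --repository-name "$name" --force
```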
To clean up the resources made using this guide, use the AWS Console to find and delete items in
[1] Though typically Lambdas will take an input, you can also write them to be invoked with no arguments, e.g. a Lambda that scrapes the same website every day and runs on a schedule.