A Primer on Docker for lambdr

Introduction

This vignette is intended as a primer for the not-beginner¹ R user who has no (or minimal) experience with Docker.

One of the main ways to create AWS Lambda functions is with Docker images. Don’t worry if you don’t know what that is yet. What’s important to know is that for lambdr to work you need to have a valid Docker image.

The sections that follow will explain what a Docker image is and how to make one for use with lambdr.

Images, containers, and Dockerfiles

Terminology

An image is a bit like a simple, switched-off virtual computer. Or, as Docker puts it, “a standardized package that includes all of the files, binaries, libraries, and configurations to run a container.”

And what’s a container? Well, “A container is simply an isolated process with all of the files it needs to run.” That sounds kind of vague, and it is, so for the purposes of lambdr you can think of it as an instance of the virtual computer (image) that’s switched on and can run your R code. It’s independent of your own computing environment, but you can give it access to things like directories, so that there’s a live link between the file system in the container and your local code.

The image doesn’t come from nowhere. It’s created from a Dockerfile, which is “a text-based document” that “provides instructions to the image builder on the commands to run, files to copy, startup command, and more.”

It’s also useful to know which verbs go with which nouns:

You build an image from a Dockerfile
You run a container from the image

This is what the flow looks like, from left to right: Dockerfile → (build) → image → (run) → container.

For lambdr

For your R process, the image needs to contain a Linux distribution, your code, R itself, the R packages your code needs, and any system dependencies for Linux that R and the R packages require.

This is pretty similar to how most of us work locally. We have an operating system, typically macOS, Windows, or Linux. We install R. We install R packages. And we install any system dependencies that we need along the way, e.g. ImageMagick or Postgres.
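If you want to check what an image actually provides, one option is to run a small R script with Rscript inside a container started from it. The helper below, check_lambdr_env(), is hypothetical (it is not part of lambdr). It prints the operating system and R version, then reports which of the listed packages are installed and load without error. A package whose Linux system dependencies are missing will typically fail to load and show up as FALSE.

```r
# check_lambdr_env() is a hypothetical helper, not part of lambdr.
# Run it inside a container (e.g. with Rscript) to see what the image provides.
check_lambdr_env <- function(pkgs = c("httr2", "lambdr")) {
  # The Linux distribution and the version of R in the image
  cat("Operating system:", utils::osVersion, "\n")
  cat("R version:", R.version.string, "\n")

  # TRUE for each package that is installed and loads without error
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  print(installed)
  invisible(installed)
}

check_lambdr_env()
```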

Dockerfile example

Below is an example of a Dockerfile that could be used with lambdr. Its purpose is just to show a minimal example that explains the basic concepts. It is not intended as a Dockerfile for production use.

If you are confident that you understand Dockerfiles, images, and containers, and simply want a production-ready example to use as a reference, please see the article Placing an R Lambda Runtime in a Container.

The Dockerfile

Here is a full Dockerfile, followed by item-by-item explanations:

FROM docker.io/rocker/r-ver:4.4

RUN Rscript -e "options(warn = 2); install.packages('pak')"

RUN Rscript -e "options( \
      warn = 2, \
      repos = c(CRAN = 'https://p3m.dev/cran/__linux__/jammy/2024-07-30') \
    ); \
    pak::pak( \
      c( \
        'httr2', \
        'lambdr' \
      ) \
    )"

# Lambda setup
RUN mkdir /R
COPY R/ /R
RUN chmod 755 -R /R

ENTRYPOINT Rscript R/runtime.R
CMD ["handler"]

Instructions

Dockerfiles contain instructions. In this example there are instructions like FROM and RUN.

Base image and layers

The first instruction is:

FROM docker.io/rocker/r-ver:4.4

To understand FROM you need to know that Docker images are built up of layers. Each layer is added on top of the ones before it, and those earlier layers can come from an image someone else has built and put online. That’s great, because such images can form the base on which you continue to build, which is also why they’re known as base images.

Base images are hosted in registries. In our example, the registry is docker.io (Docker Hub). The image in question is provided by Rocker, who make reliable Docker images for R environments. The image has a name, r-ver, and a tag, 4.4.

The instruction FROM simply tells the builder to download and use the base image as a starting point.

This particular base image has a version of Linux (Ubuntu ‘Jammy’), R, and the system dependencies required for using R. Information about the image was available on the rocker website (at time of writing).

Installing packages

The next instruction used is RUN. Each RUN instruction executes the command it is given and saves the result as a new image layer.

Here, it is used to install R packages - first pak, which is itself an R package manager, and then httr2 and lambdr by using pak:

RUN Rscript -e "options(warn = 2); install.packages('pak')"

RUN Rscript -e "options( \
      warn = 2, \
      repos = c(CRAN = 'https://p3m.dev/cran/__linux__/jammy/2024-07-30') \
    ); \
    pak::pak( \
      c( \
        'httr2', \
        'lambdr' \
      ) \
    )"

This is probably not how you’re used to executing R code. It looks like this for two reasons:

It uses Rscript, which is “A binary front-end to R, for use in scripting applications”. Rscript can be given R files or, as in this case, in-line code to execute with -e. Basically, it’s a non-interactive way of executing R code.
The R code is passed as one string, split across lines with \. A trailing \ tells Docker that the instruction continues on the next line, so the lines are joined before the command is run.
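Written as an ordinary R script, the code inside the second RUN instruction boils down to setting two options and calling pak::pak(). The sketch below spells that out using two hypothetical helpers, lambda_install_options() and install_lambda_packages() (they are not part of lambdr or pak), so that nothing is installed until you call install_lambda_packages().

```r
# Hypothetical helpers (not part of lambdr or pak) that spell out the R code
# inside the second RUN instruction as an ordinary script.

# The options set before installing anything
lambda_install_options <- function(snapshot_date = "2024-07-30") {
  list(
    warn = 2,  # turn warnings, e.g. from a failed install, into errors
    repos = c(CRAN = paste0(
      "https://p3m.dev/cran/__linux__/jammy/", snapshot_date
    ))
  )
}

# Set the options, then install the packages with pak
install_lambda_packages <- function(pkgs = c("httr2", "lambdr")) {
  old <- options(lambda_install_options())
  on.exit(options(old), add = TRUE)  # restore the previous options afterwards
  pak::pak(pkgs)
}
```

Restoring the options with on.exit() only matters if you run this in an interactive session; in the Dockerfile, the Rscript process ends as soon as the installation finishes.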

A warning level and a ‘special’ CRAN repository are also supplied. These make the image build fail if there’s a problem installing packages, and make sure the packages are the latest amd64 binary versions for Ubuntu ‘Jammy’ as at the date supplied. For more detail about why these are set, see Placing an R Lambda Runtime in a Container.
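To see what warn = 2 does, you can try it in any R session. The hypothetical helper outcome_of_warning() below raises a made-up warning of the kind install.packages() gives when an installation fails, and catches only errors. Under the default setting the code carries on; with warn = 2 the same warning arrives as an error.

```r
# Hypothetical helper: raise a warning like the one install.packages() gives
# when an installation fails, and report what happened under a given warn level.
outcome_of_warning <- function(warn_level) {
  old <- options(warn = warn_level)
  on.exit(options(old), add = TRUE)
  tryCatch(
    {
      warning("installation of package 'foo' had non-zero exit status")
      "carried on after a warning"
    },
    # Only errors are caught here; with warn = 2 the warning becomes one
    error = function(e) "stopped with an error"
  )
}

outcome_of_warning(0)  # "carried on after a warning"
outcome_of_warning(2)  # "stopped with an error"
```

In the Dockerfile there is no tryCatch(), so the error ends the Rscript process with a non-zero exit status, which is what makes the RUN instruction, and therefore the build, fail.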

For now, just know that RUN added two layers where R packages get installed.

Files and permissions

In this section the RUN instruction is used again, alongside COPY:

# Lambda setup
RUN mkdir /R
COPY R/ /R
RUN chmod 755 -R /R

A directory called /R is made in the image, then the contents of the local R folder (your R code, taken from the folder you build from) are copied into it. COPY takes its arguments in the same order as cp: source, then destination.

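Then chmod 755 -R is applied to the /R folder. The -R flag makes the change recursive, and 755 lets everyone read and execute the files while only their owner can change them. In short, this makes sure the R code can be read and run in the image.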
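In case the 755 looks cryptic: each digit is the sum of read (4), write (2), and execute (1), and the three digits apply to the file’s owner, its group, and everyone else. So 7 = 4 + 2 + 1 means read, write, and execute, and 5 = 4 + 1 means read and execute. The helper mode_to_rwx() below is hypothetical, written only to make that arithmetic concrete; it expands a mode into the permission string that ls -l would show.

```r
# Hypothetical helper: expand a three-digit octal mode such as "755" into the
# permission string that `ls -l` shows (owner, group, others).
mode_to_rwx <- function(mode) {
  digits <- as.integer(strsplit(mode, "")[[1]])
  bits <- c(r = 4L, w = 2L, x = 1L)
  parts <- vapply(digits, function(d) {
    # Keep the letter for each bit that is set in this digit, "-" otherwise
    paste0(ifelse(bitwAnd(d, bits) > 0L, names(bits), "-"), collapse = "")
  }, character(1))
  paste0(parts, collapse = "")
}

mode_to_rwx("755")  # "rwxr-xr-x"
mode_to_rwx("644")  # "rw-r--r--"
```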

Runtime and handler

The final instructions are ENTRYPOINT and CMD. At this point, all the R code, packages, and dependencies have been installed. All that’s left is to tell Lambda where to find the runtime interface client and handler function - both of which are new concepts explained below.

ENTRYPOINT Rscript R/runtime.R
CMD ["handler"]

The runtime.R file contains two things necessary for Lambda to run:

A function called handler()

The orchestrating function that takes input, does stuff, and returns output
The equivalent of ‘main’ in other programming languages

A call to the lambdr function lambdr::start_lambda()

This starts the R runtime interface client, which then listens for incoming events, passes them to the handler, and sends the handler’s return values back out of the Lambda function

An example runtime.R is given in the article Placing an R Lambda Runtime in a Container. For this introduction all you need to know is that ENTRYPOINT needs to execute Rscript on R/runtime.R so that lambdr::start_lambda() gets called.
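Purely for orientation, here is a minimal sketch of what R/runtime.R could look like. It is not the example from Placing an R Lambda Runtime in a Container: the greeting handler is made up, and the check on AWS_LAMBDA_RUNTIME_API (an environment variable Lambda sets for custom runtimes) is our own assumption, added so the file can be sourced locally without starting the runtime. As we understand lambdr’s defaults, the fields of a JSON event are passed to the handler as named arguments, so an event of {"name": "Ada"} would call handler(name = "Ada"), and the returned list is sent back as JSON.

```r
# R/runtime.R: a minimal sketch, not the example from the lambdr articles.

# The handler takes the event's input, does the work, and returns output.
# Its name must match the value given to CMD in the Dockerfile.
handler <- function(name = "world") {
  list(greeting = paste0("Hello, ", name, "!"))
}

# Assumption (our addition): only start the runtime interface client when the
# AWS_LAMBDA_RUNTIME_API environment variable is set, as it is inside Lambda.
# This lets you source() the file locally to try out the handler.
if (nzchar(Sys.getenv("AWS_LAMBDA_RUNTIME_API"))) {
  lambdr::start_lambda()
}
```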

CMD simply takes the name of the handler function. You can call the handler whatever you like, but as there is only ever one handler per Lambda, by convention we just call it handler. Lambda makes the CMD value available to the runtime, and lambdr::start_lambda() uses it to work out which function is the handler.
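The idea of turning a name into a function can be sketched in a few lines of R. The resolve_handler() function below is hypothetical and is not lambdr’s implementation. It also assumes that Lambda exposes the image’s CMD value in the _HANDLER environment variable, and falls back to "handler" when that variable is not set. The handler here is deliberately called greet, to show that the name only has to match CMD.

```r
# resolve_handler() is a hypothetical sketch of turning the CMD value into a
# function. It illustrates the idea only; it is not lambdr's implementation.
resolve_handler <- function(default = "handler", env = parent.frame()) {
  # Assumption: Lambda exposes the image's CMD value in the _HANDLER
  # environment variable; fall back to `default` when it is not set.
  handler_name <- Sys.getenv("_HANDLER", unset = default)
  get(handler_name, envir = env, mode = "function")
}

# A handler with a different name, as if the Dockerfile said CMD ["greet"]
greet <- function(name = "world") paste0("Hello, ", name, "!")
Sys.setenv(`_HANDLER` = "greet")

resolve_handler()("Ada")  # "Hello, Ada!"
```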

How to use the Dockerfile

You’re probably wondering “How do I use the Dockerfile!?”

We won’t get into that here.

Instead, the article Placing an R Lambda Runtime in a Container has sections about development versus deployment containers, and how to practically figure that stuff out for use with lambdr. We recommend reading the whole article.

Before moving on to the article, you may also want to look into the basics of Docker more generally. You can get a good feel for what using Docker looks like in practice by watching Docker Tutorial for Beginners by James Murphy. Or, if you want a hands-on tutorial there is the official Docker 101 Tutorial.

¹ If you’re no longer a beginner, but aren’t sure if you’re an intermediate or advanced R programmer, perhaps you are a “not-beginner”! For more thoughts on this topic see Meghan Harris’s blog post How I Became A “Not-Beginner” in R.