---
title: "A Primer on Docker for lambdr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{A Primer on Docker for lambdr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## Introduction

This vignette is intended as a primer for the not-beginner^[If you're no longer a beginner, but aren't sure if you're an intermediate or advanced R programmer, perhaps you are a "not-beginner"! For more thoughts on this topic see Meghan Harris's blog post [How I Became A “Not-Beginner” in R](https://thetidytrekker.com/post/how-i-became-a-not-beginner-in-r/not-beginner)] R user who has no (or minimal) experience with Docker.

One of the main ways to create _AWS Lambda_ functions is with Docker images. Don't worry if you don't know what that is yet. What's important to know is that **for lambdr to work you need to have a valid Docker image**. The sections that follow will explain what a Docker image is and how to make one for use with lambdr.

## Images, containers, and Dockerfiles

### Terminology

An **image** is a bit like a simple, switched-off virtual computer. Or, as Docker [puts it](https://docs.docker.com/guides/docker-concepts/the-basics/what-is-an-image/), "a standardized package that includes all of the files, binaries, libraries, and configurations to run a container."

![](docker-image.svg)
And what's a **container**? [Well](https://docs.docker.com/guides/docker-concepts/the-basics/what-is-a-container/), "A container is simply an isolated process with all of the files it needs to run." That sounds kind of vague, and it is, so for the purposes of lambdr you can think of it as an instance of the virtual computer (image) that's switched on and can run your R code. It's independent of your own computing environment, but you can give it access to things like directories, so that there's a live link between the file system in the container and your local code.
![](docker-container.svg)
The image doesn't come from nowhere. It's created from a **Dockerfile**, [which is](https://docs.docker.com/guides/docker-concepts/building-images/writing-a-dockerfile/) "a text-based document" that "provides instructions to the image builder on the commands to run, files to copy, startup command, and more."
![](docker-dockerfile.svg)
It's also very useful to know the following terms:

- You **build** an image from a Dockerfile
- You **run** a container from the image

This is what the flow looks like, from left to right:

![](dockerfile-image-container.svg)

### For lambdr

For your R process, the image needs to contain a Linux distribution, your code, R itself, the R packages your code needs, and any system dependencies for Linux that R and the R packages require.

This is pretty similar to how most of us work locally. We have an operating system - typically macOS, Windows, or Linux. We install R. We install R packages. And we install any system dependencies that we need along the way, e.g. ImageMagick or Postgres.

## Dockerfile example

Below is an example of a Dockerfile that could be used with lambdr. The purpose is just to show a minimal example that explains the basic concepts. It is **not** intended as an example Dockerfile to be used in production.

If you are confident that you understand Dockerfiles, images, and containers, and simply want a production-ready example to use as a reference, please see the article [Placing an R Lambda Runtime in a Container](lambda-runtime-in-container.html).

### The Dockerfile

Here is a full Dockerfile, followed by item-by-item explanations:

```{.dockerfile}
FROM docker.io/rocker/r-ver:4.4

RUN Rscript -e "options(warn = 2); install.packages('pak')"

RUN Rscript -e "options( \
    warn = 2, \
    repos = c(CRAN = 'https://p3m.dev/cran/__linux__/jammy/2024-07-30') \
  ); \
  pak::pak( \
    c( \
      'httr2', \
      'lambdr' \
    ) \
  )"

# Lambda setup
RUN mkdir /R
COPY R/ /R
RUN chmod 755 -R /R

ENTRYPOINT Rscript R/runtime.R
CMD ["handler"]
```

### Instructions

Dockerfiles contain **instructions**. In this example there are instructions like `FROM` and `RUN`.

### Base image and layers

The first instruction is:

```{.dockerfile}
FROM docker.io/rocker/r-ver:4.4
```

To understand `FROM` you need to know that Docker images are built up of layers. Each layer is added to the ones that preceded it. This even includes images someone else has built and put online. That's great, because they can form the base on which you continue to build - which is also why they're known as **base images**.

Base images are hosted in repositories.
In our example, the repository is `docker.io`. The image in question is provided by [rocker](https://rocker-project.org/), who make reliable Docker containers for R environments. The image has a **name**, `r-ver`, and a **tag**, `4.4`.

The instruction `FROM` simply tells the builder to download and use the base image as a starting point. This particular base image has a version of Linux (Ubuntu 'Jammy'), R, and the system dependencies required for using R. Information about the image was available on the rocker website (at time of writing).

### Installing packages

The next instruction used is `RUN`. Every time `RUN` is used it makes a new image layer and executes whatever argument/s it has been given. Here, it is used to install R packages - first [pak](https://pak.r-lib.org/), which is itself an R package manager, and then `httr2` and `lambdr` by using `pak`:

```{.dockerfile}
RUN Rscript -e "options(warn = 2); install.packages('pak')"

RUN Rscript -e "options( \
    warn = 2, \
    repos = c(CRAN = 'https://p3m.dev/cran/__linux__/jammy/2024-07-30') \
  ); \
  pak::pak( \
    c( \
      'httr2', \
      'lambdr' \
    ) \
  )"
```

This is probably not how you're used to executing R code. It looks like this for a few reasons.

1) Using `Rscript`, which is "A binary front-end to R, for use in scripting applications". It can be given R files or, like in this case, in-line scripts to execute. Basically, it's a non-interactive method of executing R code.
2) Stringified R code littered with `\`. The `\` escapes any newlines.

A warning level and a 'special' CRAN repository are also supplied. These ensure that the Docker image will error out if there's a problem installing packages, and that the packages are the latest amd64 binary versions for Ubuntu 'Jammy' as at the date supplied. For more detail about why these are set, see [Placing an R Lambda Runtime in a Container](lambda-runtime-in-container.html).

For now, just know that `RUN` added two layers where R packages get installed.

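
If you're curious how `options(warn = 2)` makes a build "error out", the sketch below may help. It assumes R is installed locally, and it uses `as.integer('not a number')` purely as a stand-in for any R call that raises a warning (`install.packages()` typically reports a failed installation as a warning rather than an error). The variable names are only for illustration.

```shell
# Default warning level: the warning is printed, but Rscript exits with status 0
default_status=0
Rscript -e "as.integer('not a number')" || default_status=$?

# With warn = 2 the warning is converted to an error, so Rscript exits non-zero
strict_status=0
Rscript -e "options(warn = 2); as.integer('not a number')" || strict_status=$?

echo "default exit status: $default_status"
echo "strict exit status:  $strict_status"
```

A `RUN` instruction fails the image build when its command exits with a non-zero status, which is why the stricter warning level stops a broken package install from going unnoticed.
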

### Files and permissions

In this section the `RUN` instruction is used again, alongside `COPY`:

```{.dockerfile}
# Lambda setup
RUN mkdir /R
COPY R/ /R
RUN chmod 755 -R /R
```

A directory called `/R` is made in the image, then the local folder of R code is copied into the image folder. `COPY` uses the same syntax as `cp`, i.e. `source` `destination`.

Then, `chmod 755 -R` gets applied to the `/R` folder. In short, this just makes sure the files in the folder can be executed in the image.

### Runtime and handler

The final instructions are `ENTRYPOINT` and `CMD`. At this point, all the R code, packages, and dependencies have been installed. All that's left is to tell _Lambda_ where to find the runtime interface client and `handler` function - both of which are new concepts explained below.

```{.dockerfile}
ENTRYPOINT Rscript R/runtime.R
CMD ["handler"]
```

The `runtime.R` file contains two things necessary for _Lambda_ to run:

1) A function called `handler()`
    - The orchestrating function that takes input, does stuff, and returns output
    - The equivalent to 'main' in other programming languages
2) The `lambdr` function `lambdr::start_lambda()`
    - This starts the R runtime interface client, which will then listen for any incoming events and pass return values back out of the _Lambda_ function

An example `runtime.R` is given in the article [Placing an R Lambda Runtime in a Container](lambda-runtime-in-container.html). For this introduction all you need to know is that `ENTRYPOINT` needs to execute `Rscript` on `R/runtime.R` so that `lambdr::start_lambda()` gets called.

`CMD` simply takes the name of the handler function. You can call the handler whatever you like, but as there is only ever one handler per _Lambda_, by convention we just call it `handler`. The `CMD` gets passed as an argument to `lambdr::start_lambda()`.

## How to use the Dockerfile

You're probably wondering "How do I use the Dockerfile!?" We won't get into that here.
Instead, the article [Placing an R Lambda Runtime in a Container](lambda-runtime-in-container.html) has sections about development versus deployment containers, and how to practically figure that stuff out for use with **lambdr**. We recommend reading the whole article.

Before moving on to the article, you may also want to look into the basics of Docker more generally. You can get a good feel for what using Docker looks like in practice by watching [Docker Tutorial for Beginners](https://youtu.be/b0HMimUb4f0?si=W1wrIJgZ2XjADWWK) by [James Murphy](https://mcoding.io). Or, if you want a hands-on tutorial there is the official [Docker 101 Tutorial](https://www.docker.com/101-tutorial/).