Using CloudFront with Lambda and S3

The image shows a scenic and cloudy landscape with a waterfall containing a hinted AWS Lambda 
logo, a small house, and a wooden signpost. The signs on the signpost are labeled "/*",
pointing to the house, and "/api/*", pointing to the waterfall.

Another post in the placeruler series: this time, we are going to use CloudFront as a router to deliver a static site in addition to the image-generating Lambda.

This article is part of the "Terraform - placeruler.knappi.org" series. You may also want to read the other articles, especially the older ones.

The series is coming to an end. Just having a Lambda accessible from the web with caching is not enough, at least not if we want anybody to use it. How can you explain what the site does without some static HTML pages?

In today’s post, we will add a homepage. For https://placeruler.knappi.org, I used Astro to generate the site. But you may use any static site generator you like. In this post, we will just set up the infrastructure and assume that our page consists of a few HTML pages.

Dev environment

Let’s first make sure that we can see the website when developing locally:

First, we create a new folder “website” in the project. Second, we add an “index.html” file, something like this:

<h1>Terraform Lambda Example</h1>
<h2>SVG</h2>
<img src="api/100x100.svg" />

Third, we run a server to deliver the website. We could add that to our Lambda dev environment, but maybe it is better to keep them separate. The easiest way that came to my mind was: npm install vite and use a simple vite.config with a proxy configuration:

import { defineConfig } from "vite";
export default defineConfig({
  server: {
    proxy: {
      "/api": "http://localhost:3000",
    },
  },
});

Now we run npm run dev in the lambda directory and npx vite in the website directory. Then we open http://localhost:5173 to see our website. Nice!

Screenshot of the website with header

Now that we have the content, we can deploy our files (i.e., our single index.html). What do we need?

  • Create an S3 bucket for the website files. This is the most common approach to hosting a static site on AWS.
  • Change the CloudFront configuration: requests starting with /api/ should be delivered to the Lambda; everything else goes to the S3 bucket.

A bucket full of files

For the homepage itself, we will use an S3 bucket. That is the most common and battle-tested approach for static sites on AWS. The Terraform for that is simple:

locals {
  # Define the filename in a local variable so that it can be used in
  # the CloudFront configuration as well.
  s3_website_bucket = "terraform-lambda-example-site"
}

resource "aws_s3_bucket" "static-website" {
  bucket = local.s3_website_bucket
}

# Since we access this bucket only through CloudFront, we block all public access.
resource "aws_s3_bucket_public_access_block" "static_site_bucket_public_access" {
  bucket = aws_s3_bucket.static-website.id

  # Prohibit the creation of public ACLs
  block_public_acls       = true
  # Prohibit the creation of public policies
  block_public_policy     = true
  # Ignore ACLs that would allow public access
  ignore_public_acls      = true
  # Only allow authorized users and AWS services to access the bucket
  restrict_public_buckets = true
}

The four settings block_public_acls, block_public_policy, ignore_public_acls and restrict_public_buckets are a way to make sure that the bucket is not publicly accessible (see the docs for details). We only want to allow read-only access through CloudFront; nobody but ourselves should be able to write to the bucket.

The docs recommend…

The documentation recommends setting the property block_public_policy at the account level. This ensures that users can’t exploit potential flaws in the ACL or policy configuration to remove the settings from the bucket and then set a public policy. It looks like this Terraform resource is more suitable for that setting.
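A sketch of how that could look, using the aws_s3_account_public_access_block resource (the resource name is arbitrary, and I haven’t applied this in the example project):

```hcl
# Sketch: apply the public-access block to the whole AWS account,
# so that no bucket-level change can re-enable public access.
resource "aws_s3_account_public_access_block" "account_wide" {
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```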

Now, let’s upload our file(s) using the AWS cli tool:

aws s3 cp ./website/index.html s3://terraform-lambda-example-site --cache-control max-age=0

Note that you can set the caching headers with this tool. This is helpful if you have static files with hashed names. In this example, we only have one file to upload. However, in the repository for “placeruler.knappi.org”, I use Astro to build the site. I then upload the files using this script:

#!/usr/bin/env bash

set -e

# For the generated files with content-hashed names, we use a very long caching period, because
# the files should never be overwritten.
aws s3 cp ./dist/_astro s3://placeruler-website/_astro --cache-control max-age=31536000 --recursive
# Once all dependency files are uploaded, we add the index.html file without caching.
# The new version should be delivered instantly after an update.
aws s3 cp ./dist/index.html s3://placeruler-website --cache-control max-age=0

We can tweak the headers a little, depending on the use case. For example, it might not be super important that the index.html is delivered instantly, so we could allow some caching here as well.

There is one more thing that we should take care of, but I haven’t found a good solution yet: at some point, old assets (i.e., JavaScript files, images, and CSS files) should be removed again to save space. If we delete them instantly and a user has kept a browser tab open for a long time, they might try to load an already-deleted asset. This is very unlikely, but it can happen in theory. The ideal way would be to mark the old files for deletion before adding the new ones, and to actually delete them after a fixed period of time. Maybe this is possible using Object tagging, but I haven’t tried it yet.
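One possible direction, which I haven’t verified, is an S3 lifecycle rule that expires objects carrying a certain tag. The deployment script would tag superseded assets instead of deleting them, and S3 would remove them after a grace period. A sketch (the tag key/value and the 30-day period are made up):

```hcl
# Sketch only: expire objects tagged "deprecated=true" after 30 days.
# The deployment script would tag old assets instead of deleting them.
resource "aws_s3_bucket_lifecycle_configuration" "expire_old_assets" {
  bucket = aws_s3_bucket.static-website.id

  rule {
    id     = "expire-deprecated-assets"
    status = "Enabled"

    # Only objects with this tag are affected.
    filter {
      tag {
        key   = "deprecated"
        value = "true"
      }
    }

    expiration {
      days = 30
    }
  }
}
```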

Accessing S3 through CloudFront

When we created our CloudFront distribution in a previous post, it looked like this in the Terraform code:

resource "aws_cloudfront_distribution" "main" {
  enabled = true

  aliases = [
    "lambda-example.knappi.org",
  ]

  origin {
    origin_id   = local.cloudfront_lambda_origin_id
    domain_name = trimsuffix(trimprefix(aws_lambda_function_url.main_lambda_url.function_url, "https://"), "/")

    custom_origin_config {
      http_port                = 80
      https_port               = 443
      origin_protocol_policy   = "https-only"
      origin_ssl_protocols = [
        "TLSv1.2",
      ]
    }
  }

  default_cache_behavior {
    allowed_methods = ["DELETE", "GET", "HEAD", "OPTIONS", "PATCH", "POST", "PUT"]
    cached_methods = ["GET", "HEAD"]
    target_origin_id       = local.cloudfront_lambda_origin_id
    cache_policy_id        = aws_cloudfront_cache_policy.test_lambda_cache_policy.id
    viewer_protocol_policy = "redirect-to-https"
  }

  # [...]
}

Now we want to access the Lambda only through the /api/ path. All other requests should be served from the S3 bucket we just created. So we need to add another origin to the distribution, and the default_cache_behavior should use the S3 bucket as its origin.

resource "aws_cloudfront_distribution" "main" {
  # [...]

  origin {
    origin_id = local.cloudfront_s3_origin_id
    domain_name = aws_s3_bucket.static-website.bucket_regional_domain_name
  }

  default_cache_behavior {
    allowed_methods = ["GET", "HEAD"]
    cached_methods = ["GET", "HEAD"]
    target_origin_id       = local.cloudfront_s3_origin_id
    cache_policy_id        = aws_cloudfront_cache_policy.s3_cache_policy.id
    viewer_protocol_policy = "redirect-to-https"
  }
}
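The snippets use local.cloudfront_s3_origin_id (and the Lambda counterpart from the earlier posts) without showing a definition. The origin IDs are just arbitrary strings that link cache behaviors to their origins; a minimal sketch with placeholder values:

```hcl
locals {
  # Arbitrary identifiers connecting cache behaviors to origins.
  cloudfront_lambda_origin_id = "lambda-origin"
  cloudfront_s3_origin_id     = "static-website-s3-origin"
}
```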

The Lambda origin is now accessed through an ordered_cache_behavior, configured with the matching path pattern:

# Send requests starting with "/api/" to the Lambda
ordered_cache_behavior {
  path_pattern           = "/api/*"
  # We need to set at least GET and HEAD, or an error will occur during deployment
  allowed_methods = ["GET", "HEAD"]
  cached_methods = ["GET", "HEAD"]
  target_origin_id       = local.cloudfront_lambda_origin_id
  cache_policy_id        = aws_cloudfront_cache_policy.test_lambda_cache_policy.id
  viewer_protocol_policy = "redirect-to-https"
}

We also need a cache policy for the S3 origin that specifies which cookies, query parameters, and headers should be forwarded.

resource "aws_cloudfront_cache_policy" "s3_cache_policy" {
  name = "example_s3_cache_policy"
  parameters_in_cache_key_and_forwarded_to_origin {
    cookies_config {
      cookie_behavior = "none"
    }
    query_strings_config {
      query_string_behavior = "none"
    }
    headers_config {
      header_behavior = "none"
    }
  }
}
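As an optional tweak, the cache policy can also opt in to compressed cache variants. These two flags go inside the parameters_in_cache_key_and_forwarded_to_origin block shown above (a sketch; for CloudFront to actually compress responses, the corresponding cache behavior additionally needs compress = true):

```hcl
    # Optional: allow CloudFront to cache and serve
    # gzip- and brotli-encoded variants of the objects.
    enable_accept_encoding_gzip   = true
    enable_accept_encoding_brotli = true
```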

Why isn’t this working?

The code above is enough to make the deployment work, but when we try to access the website, an error is returned:

<Error>
  <Code>AccessDenied</Code>
  <Message>Access Denied</Message>
  <RequestId>PD5DVR1TCBGWVA6W</RequestId>
  <HostId>M/HjptWBIbOR09rEtwyCjpWxnQfNQTc3aDQWQoAyQ6b9Gjmj+3C4ERrLqMaijacM/+sKIyIbNYU=</HostId>
</Error>

In fact, we need to do multiple things to make it work:

Deploy the S3 files to the correct bucket.

I was, of course, trying all this while writing this post, and I was a bit confused that neither https://lambda-example.knappi.org/ nor https://lambda-example.knappi.org/index.html worked. One of the things I did wrong was deploying the files to the wrong bucket. I copied and pasted the deployment script from the “placeruler” repository and deployed my index.html file to the placeruler bucket. A missing index.html file also results in an “Access Denied” error.

The reason why this happened was that the bucket name was specified in two places: the Terraform files and the deployment script.

What if we could have a single source of truth here? I found one way to do that: we can access the values of Terraform outputs from the CLI. We can add

output "s3_website_bucket" {
  value = local.s3_website_bucket
}

to the S3 bucket terraform file and then after running terraform apply, use the value of that output via

TARGET_BUCKET="$(terraform output -raw s3_website_bucket)"
aws s3 cp ./website/index.html "s3://${TARGET_BUCKET}" --cache-control max-age=0

This helped to upload the file to the correct bucket. It didn’t solve the “Access Denied” error, though.

Origin Access Control

CloudFront needs to send authenticated requests to the S3 bucket. The recommended way to do this is Origin Access Control (OAC), so we need to specify an aws_cloudfront_origin_access_control resource:

resource "aws_cloudfront_origin_access_control" "cf-s3-oac" {
  name                              = "Terraform Example CloudFront S3 OAC"
  description                       = "CloudFront S3 OAC"
  origin_access_control_origin_type = "s3"
  signing_behavior                  = "always"
  signing_protocol                  = "sigv4"
}

We also need to configure the S3 origin to use this OAC resource:

origin {
  origin_id                = local.cloudfront_s3_origin_id
  origin_access_control_id = aws_cloudfront_origin_access_control.cf-s3-oac.id
  domain_name              = aws_s3_bucket.static-website.bucket_regional_domain_name
}

Now CloudFront signs its requests, and the bucket can verify where they come from. But we still get “Access Denied” errors.

S3 Bucket Policy

We also need an S3 bucket policy that allows our CloudFront distribution to access the objects in the bucket:

data "aws_iam_policy_document" "s3_bucket_policy" {
  statement {
    # Allow retrieving objects...
    actions   = ["s3:GetObject"]
    # ...from the bucket we have added above...
    resources = ["${aws_s3_bucket.static-website.arn}/*"]
    # ...by the CloudFront service...
    principals {
      type        = "Service"
      identifiers = ["cloudfront.amazonaws.com"]
    }
    # ...but only if the ARN equals the ARN of our distribution.
    condition {
      test     = "StringEquals"
      variable = "AWS:SourceArn"
      values   = [aws_cloudfront_distribution.main.arn]
    }
  }
}

# Attach this policy document to our bucket.
resource "aws_s3_bucket_policy" "static_site_bucket_policy" {
  bucket = aws_s3_bucket.static-website.id
  policy = data.aws_iam_policy_document.s3_bucket_policy.json
}

This helps a little: we still get an “Access Denied” error when calling https://lambda-example.knappi.org/, but https://lambda-example.knappi.org/index.html works now.

Default Root Object

Even though we’ve deployed the index.html file to the bucket, S3 doesn’t automatically serve it when / is requested. This can be configured through an aws_s3_bucket_website_configuration, or in CloudFront. Since we don’t want to access the bucket directly as a website, we use the default_root_object of the CloudFront distribution. Note that this setting only applies to the root URL of the distribution, not to subdirectories:

resource "aws_cloudfront_distribution" "main" {
  enabled             = true
  default_root_object = "index.html"
  # [...]
}

Another try, and hooray! We finally see the homepage, including the embedded placeholder images, at https://lambda-example.knappi.org/.

Conclusion

This post was way longer than I expected it to be. Thanks for reading to the end. We have covered a couple of topics here:

  • CloudFront configuration with multiple origins.
  • Access control between CloudFront and S3.
  • Retrieving outputs from Terraform via the CLI.
  • Setting up and deploying files to an S3 bucket.

The branch 0034-cloudfront-s3-multi-origin of the example repository contains the state of the project up to and including the changes of this post. If you want to try it yourself, you can always use that project as a reference.

I hope you enjoyed this post and will be there for the next one as well.