GitHub self-hosted runners in AWS - part 1 - Fargate

2020-05-20

"GitHub Actions makes it easy to automate all your software workflows, now with world-class CI/CD."

That is how GitHub describes its built-in CI/CD tooling. I must say that I really like it, and it is something GitHub was missing before. Many of the Git as a Service providers, such as GitLab, Bitbucket, and Azure DevOps, have had bundled CI/CD tooling. With GitHub you always needed to use an outside tool.

Your pipelines consist of two major components: Workflows and Actions. Workflows are the actual coordination: what event to trigger on, where to run, and so on. Actions are the core reusable bits that do the actual work. The actions are executed on a runner, which is what we concentrate on here.
You can choose between using a runner hosted by GitHub and hosting your own.

In this blog series I will focus on how to create and set up your own self-hosted runner in AWS.

Why a self-hosted runner?

Why should I bother with a self-hosted runner if GitHub can manage them for me?

For me the answer is that I don't want to create a build user, assign that user long-lived AWS secrets, and add those to my workflows. Yes, they will be added as secrets and they will be encrypted, and no, they will not be hard-coded in the actual workflow scripts. But I don't want to have to create this build user in my AWS account and manage long-lived credentials. I want to use roles that the runner can assume to get access to build and deploy.
I also really like this pattern, where the actual pipeline doesn't have access to the environment but an agent inside the environment does the deployments, à la GitOps.

There is one thing we should remember: GitHub does not recommend using self-hosted runners for public repos, and neither do I. With a public repo anyone could trigger your runner, for example by opening a pull request from a fork, and thereby run code on your machines.

Well buckle up and let's get our hands dirty.

Create a VPC

First of all we need to create a VPC where our runners can be hosted. I will not go into the nitty-gritty details of how to do that. You can always get the CloudFormation templates from my GitHub repos.
Let us create a very simple VPC with just one public subnet. In a production setup you would probably want to have the runners in private subnets for additional security. But for the sake of simplicity we just run in one public subnet.

  VPC:
    Type: AWS::EC2::VPC
    Properties:
      EnableDnsSupport: true
      EnableDnsHostnames: true
      CidrBlock: !FindInMap ["SubnetConfig", "VPC", "CIDR"]
  PublicSubnetOne:
    Type: AWS::EC2::Subnet
    Properties:
      AvailabilityZone:
        Fn::Select:
          - 0
          - Fn::GetAZs: { Ref: "AWS::Region" }
      VpcId: !Ref "VPC"
      CidrBlock: !FindInMap ["SubnetConfig", "PublicOne", "CIDR"]
      MapPublicIpOnLaunch: true
  InternetGateway:
    Type: AWS::EC2::InternetGateway
    Properties:
  GatewayAttachement:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref "VPC"
      InternetGatewayId: !Ref "InternetGateway"
  NatGatewayIp:
    Type: AWS::EC2::EIP
    Properties:
      Domain: "vpc"
  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref "VPC"
  PublicRoute:
    Type: AWS::EC2::Route
    DependsOn: GatewayAttachement
    Properties:
      RouteTableId: !Ref "PublicRouteTable"
      DestinationCidrBlock: "0.0.0.0/0"
      GatewayId: !Ref "InternetGateway"
  PublicSubnetOneRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnetOne
      RouteTableId: !Ref PublicRouteTable

ECS cluster

In this part we focus on running the GitHub runners in Fargate. To do that we need to create an ECS cluster. Once again I will not go into the nitty-gritty details of how to do that. You can always get the CloudFormation templates from my GitHub repos.

  ECSCluster:
    Type: AWS::ECS::Cluster
    Properties:
      ClusterName: !Ref "ApplicationName"
  ECSServiceRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service: [ecs.amazonaws.com]
            Action: ["sts:AssumeRole"]
      Path: /
      Policies:
        - PolicyName: ecs-service
          PolicyDocument:
            Statement:
              - Effect: Allow
                Action:
                  - "elasticloadbalancing:DeregisterInstancesFromLoadBalancer"
                  - "elasticloadbalancing:DeregisterTargets"
                  - "elasticloadbalancing:Describe*"
                  - "elasticloadbalancing:RegisterInstancesWithLoadBalancer"
                  - "elasticloadbalancing:RegisterTargets"
                  - "ec2:Describe*"
                  - "ec2:AuthorizeSecurityGroupIngress"
                Resource: "*"

I also created a repository in ECR where I could store the Docker images used by Fargate to run my builds.

  ECRRepository:
    Type: AWS::ECR::Repository
    Properties:
      RepositoryName: !Ref RepositoryName

Docker image and Fargate Task

With all that in place we can start creating the Docker image and the Fargate task for the actual runner.

FROM ubuntu:16.04

ENV DEBIAN_FRONTEND=noninteractive
RUN echo "APT::Get::Assume-Yes \"true\";" > /etc/apt/apt.conf.d/90assumeyes

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        ca-certificates \
        curl \
        jq \
        git \
        iputils-ping \
        libcurl3 \
        libicu55 \
        libunwind8 \
        netcat


COPY ./start.sh .
RUN chmod +x start.sh
RUN ls -l

CMD ["./start.sh"]

And the start.sh file:

#!/bin/bash
set -e

# Download and unpack the runner package
mkdir actions-runner && cd actions-runner
curl -O -L https://github.com/actions/runner/releases/download/v2.169.1/actions-runner-linux-x64-2.169.1.tar.gz
tar xzf ./actions-runner-linux-x64-2.169.1.tar.gz

# One-time registration token copied from the GitHub web UI
token="<super secret GitHub token>"

# Register the runner with the repository, then start polling for jobs
./config.sh --url "<GitHub repo Url>" --token "$token" --name "aws-runner-$(hostname)" --work _work
./run.sh

When trying to install and start the runner package, it quickly exited with an error because it was running as root, which the runner does not allow by default. I corrected that by adding this line to the Dockerfile.

ENV RUNNER_ALLOW_RUNASROOT=true

The token used during the configuration step I got from the GitHub web UI when adding a new runner. Locally this ran super smoothly, the runner registered, and it was time to try it out in Fargate.
I started by creating the Fargate task.

  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Cpu: !Ref Cpu
      ExecutionRoleArn: !GetAtt ExecutionRole.Arn
      Family: !Ref FamilyName
      Memory: !Ref Ram
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - FARGATE
      TaskRoleArn: !GetAtt TaskRole.Arn
      ContainerDefinitions:
        - Image: !Ref ContainerImage
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-create-group: true
              awslogs-group: "/ecs/github-builders"
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: "ecs"
          Memory: 512
          MemoryReservation: 512
          Name: "github-builders"
          PortMappings:
            - ContainerPort: !Ref ContainerPort
  ECSService:
    Type: AWS::ECS::Service
    Properties:
      Cluster:
        Fn::ImportValue: !Sub "${InfrastructureStack}:ClusterArn"
      DeploymentConfiguration:
        MaximumPercent: 200
        MinimumHealthyPercent: 75
      DesiredCount: 1
      LaunchType: "FARGATE"
      NetworkConfiguration:
        AwsvpcConfiguration:
          Subnets:
            - Fn::ImportValue: !Sub "${InfrastructureStack}:PublicSubnetOne"
          AssignPublicIp: ENABLED
          SecurityGroups:
            - !Ref TaskSecurityGroup
      ServiceName: !Ref FamilyName
      TaskDefinition: !Ref TaskDefinition

However, I quickly figured out that this would not scale. Having to add a one-time token manually every time a new runner registered would not work. I needed a new game plan...

Automatically registering the runner

The way forward to register the runner automatically was to get a token using the GitHub API. Calls to the API must be authenticated, and I used a PAT (Personal Access Token) for that. The PAT was stored in Parameter Store and injected into the container using Secrets. So the Secrets part was added to the TaskDefinition.

  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      ....
      ContainerDefinitions:
        - Image: !Ref ContainerImage
          Secrets:
            - Name: PAT
              ValueFrom: !Sub arn:aws:ssm:${AWS::Region}:${AWS::AccountId}:parameter/github_builders_pat
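
The parameter referenced in ValueFrom has to exist before the Task starts. A minimal sketch of how it could be created with the AWS CLI follows. The function name store_pat and the GITHUB_PAT variable are hypothetical and not part of the original setup; only the parameter name github_builders_pat comes from the template above. For the injection to work, the execution role also needs permission to read the parameter (ssm:GetParameters).

```shell
#!/bin/bash
# Sketch only: store the PAT as an encrypted SecureString parameter.
# store_pat is a hypothetical helper name; github_builders_pat matches
# the parameter name used in the TaskDefinition.
store_pat() {
  local name="$1" pat="$2"
  aws ssm put-parameter \
    --name "$name" \
    --type SecureString \
    --value "$pat" \
    --overwrite
}

# Usage (GITHUB_PAT is an assumed environment variable holding the token):
#   store_pat github_builders_pat "$GITHUB_PAT"
```

Reading the token from an environment variable instead of typing it inline keeps it out of the shell history.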

I also needed to modify the startup script to fetch the token from the GitHub API.

#!/bin/bash

# Get a registration token from the GitHub API, authenticated with the PAT
token=$(curl -s -X POST \
  -H "authorization: token ${PAT}" \
  "https://api.github.com/repos/<GitHub User>/<GitHub Repo>/actions/runners/registration-token" |
  jq -r .token)

Now every time a new Task is started it will register with GitHub and be ready to start running builds.
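
If the PAT is wrong or expired, the API call returns an error and jq prints the literal string null, which would then be passed on as a token. A slightly more defensive version of the token step could look like the sketch below. GITHUB_OWNER and GITHUB_REPO are hypothetical variables standing in for the placeholders in the script, and the function names are only illustrative.

```shell
#!/bin/bash
# Sketch only. GITHUB_OWNER and GITHUB_REPO are assumed variables,
# not part of the original script.

# Print the token, or fail if it is empty or the literal "null"
# (jq -r prints "null" when the .token key is missing).
require_token() {
  if [ -z "$1" ] || [ "$1" = "null" ]; then
    echo "Could not get a registration token from GitHub" >&2
    return 1
  fi
  printf '%s\n' "$1"
}

# Request a registration token; curl -f turns HTTP errors into a
# non-zero exit code instead of printing the error body.
fetch_registration_token() {
  local response
  response=$(curl -s -f -X POST \
    -H "authorization: token ${PAT}" \
    "https://api.github.com/repos/${GITHUB_OWNER}/${GITHUB_REPO}/actions/runners/registration-token") || return 1
  require_token "$(printf '%s' "$response" | jq -r .token)"
}

# Usage in start.sh:
#   token=$(fetch_registration_token) || exit 1
```

Failing early like this makes the Task stop with a readable log line instead of a confusing error from config.sh.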

Running and polling for jobs

After registration the runner started polling for jobs from GitHub and ran build after build. Additional Tasks registered automatically when I added them, and everything worked great. When I started removing Fargate Tasks, the problem was that the runners did not deregister. Instead GitHub marked them as offline and I had to force remove them in the GitHub UI. Maybe not a major problem, but it will not scale.
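
One possible way to approach the deregistration problem is to catch the SIGTERM that ECS sends when it stops a Task and unregister the runner before the container exits. The sketch below is an assumption about how that could be done and has not been verified in Fargate. GITHUB_OWNER and GITHUB_REPO are hypothetical variables, and deregister_runner and run_with_cleanup are illustrative names. The remove-token endpoint and ./config.sh remove --token are the same mechanism GitHub shows when you remove a runner by hand.

```shell
#!/bin/bash
# Sketch only, not verified in Fargate. GITHUB_OWNER and GITHUB_REPO
# are assumed variables; the function names are illustrative.

# Ask the GitHub API for a remove token and unregister this runner.
deregister_runner() {
  local remove_token
  remove_token=$(curl -s -X POST \
    -H "authorization: token ${PAT}" \
    "https://api.github.com/repos/${GITHUB_OWNER}/${GITHUB_REPO}/actions/runners/remove-token" |
    jq -r .token)
  ./config.sh remove --token "$remove_token"
}

# Run a command in the background, forward SIGTERM/SIGINT to it,
# and call the cleanup function once the command has exited.
run_with_cleanup() {
  local cleanup_fn="$1" child status
  shift
  "$@" &
  child=$!
  trap "kill -TERM $child 2>/dev/null" TERM INT
  wait "$child"
  status=$?
  # If wait was interrupted by a signal, wait again for the child to finish.
  wait "$child" 2>/dev/null || true
  trap - TERM INT
  "$cleanup_fn"
  return "$status"
}

# Usage in start.sh, replacing the plain ./run.sh call:
#   run_with_cleanup deregister_runner ./run.sh
```

ECS sends SIGTERM and follows up with SIGKILL after the stop timeout (30 seconds by default), so the cleanup has to finish within that window. The script also has to run as the container's main process to receive the signal, which is the case with the exec-form CMD used in the Dockerfile.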

Things to think about

The GitHub runner will update itself if a new version is available. This is good, but not when running in a Fargate Task. After the runner had updated it would restart, and this caused the Task to shut down.
The ECS service would then bring up a new Task, and when that new Task received a job it would also auto-update, since the Docker image still contained the old version. Suddenly we are in an endless restart loop without any build being completed.
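
One way to reduce the risk of this loop, since start.sh already downloads the runner at container start, would be to look up the latest release instead of pinning v2.169.1. The sketch below assumes the release assets keep the naming pattern used in the download URL earlier in this post; the function names are illustrative.

```shell
#!/bin/bash
# Sketch only; function names are illustrative.

# Build the download URL for a release tag such as v2.169.1,
# following the asset naming used in start.sh.
runner_download_url() {
  local tag="$1"
  local version="${tag#v}"   # strip the leading "v"
  printf 'https://github.com/actions/runner/releases/download/%s/actions-runner-linux-x64-%s.tar.gz\n' "$tag" "$version"
}

# Look up the tag of the latest runner release via the GitHub API.
latest_runner_tag() {
  curl -s https://api.github.com/repos/actions/runner/releases/latest | jq -r .tag_name
}

# Usage in start.sh:
#   curl -O -L "$(runner_download_url "$(latest_runner_tag)")"
```

This only helps for Tasks that start after a release. A runner that is already running when a new version comes out will still try to update itself, and unauthenticated calls to the GitHub API are rate limited.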

GitHub Actions supports Docker-based actions, meaning that you need to run Docker inside a Docker container. This is not a good approach; there are ways to do it, but it's still a mess.

The final problem is that runners are not deregistered when the Task shuts down. This becomes a problem since I would like to auto-scale the number of runners.

Conclusion

Running the GitHub self-hosted runner in a serverless way in Fargate was really nice on paper. However, the self-updating and the need to support running Docker make this approach less attractive than I thought. GitHub doesn't officially support running the self-hosted runner in Docker, and I am starting to understand why.

I don't see this as a failure, instead it was a great learning experience.

The next move is to run it on small EC2 instances that can come and go. You don't want to miss the coming part in this series of blogs!

Stay tuned!