Serverless self-service IoT certificate management - Part 2.

2024-12-22

This is the second part in the series about a Certificate Self Service setup for IoT projects. In this part we'll extend the API that we started in the first part. We'll add the possibility to create multiple intermediate CAs, which lets us issue server certificates from one CA and client/device certificates from a different one.

Using multiple intermediate CAs is a good practice from a security and management perspective. If an intermediate CA becomes compromised, you only need to revoke and rotate the certificates signed by that CA. If the same intermediate CA is used to sign all device certificates, a breach means that every device certificate has to be rotated. With several intermediate CAs we can split device certificates into "cells" and handle each cell independently.

As a reminder!!

Get the source code

As the source code for this project is fairly large, not all code is available in this post. To get the full source code and deploy it yourself, visit Serverless-Handbook Self Service IoT Certificate management

Why Build a Self-Service API?

Once again I'd like to revisit why we want to create this self-service system. Why not just use a private CA or IoT Core from AWS, or a SaaS solution like DigiCert IoT Trust Manager?

With several of the SaaS solutions that exist you pay per certificate. In many of the IoT teams I have worked with, we have been issuing many certificates per day for testing: several certs per device, per tenant, and so on. Automated tests have generated certs over and over again. We discovered that using some form of self-signed certificates, with a self-service API, made us more cost efficient. It also enabled us to test different scenarios with several intermediate CAs.

From a learning perspective, new engineers who didn't have much experience with IoT and certificates could test and learn in a safe way, without breaking the bank.

So for my teams a self-service API for certificate management allowed:

  • Automation: Devices and servers can request and renew certificates programmatically.
  • Scalability: As our IoT environment grows, the API can handle the increasing demand for certificates.
  • Learning and Testing: Before adopting a managed service, building your own certificate system helps you understand how PKI works.

Architecture overview

Let's start by going back to the architecture and looking at the overview for this setup. There are some new parts introduced this time.

Image showing the architecture overview

First, a certificate inventory is introduced. Information about certificates is stored in a DynamoDB table, which allows us to query for certificates based on the signing parent. To populate the inventory, the Lambda functions responsible for creating certificates post an event onto an Amazon EventBridge event-bus, which invokes a StepFunction that populates the inventory. This StepFunction uses the newly released JSONata support.

The creation flow looks like this, depending on whether a CA or a leaf (server / client) certificate is being created. Everything below the dotted line runs asynchronously.

Image showing the architecture overview

Last, a new Lambda function is created that is responsible for listing and fetching certificates.

Update REST API

Now, let's look at the updated API. We'll add three new endpoints: one for creating new device certificates and two for querying and fetching certificates.

Endpoint                      Method  Description
/certificates/root            POST    Create a new Root CA.
/certificates/intermediate    POST    Create a new Intermediate CA.
/certificates/server          POST    Create a new server certificate.
/certificates/device          POST    Create a new device / client certificate.
/certificates                 GET     List and search certificates
/certificates/{certificate}   GET     Get a single certificate

I decided to use separate paths (e.g., /certificates/root, /certificates/intermediate, /certificates/server, /certificates/device) rather than a single endpoint with a type parameter (e.g., /certificates with type as input) as I feel this aligns better with REST principles and improves the API’s readability and usability.

The /certificates/device endpoint is very similar to the other endpoints that create certificates. The difference is that it generates a random UUID that becomes the device ID.

To query for certificates, the /certificates endpoint accepts two query parameters: parent, which is the base64-encoded FQDN of the parent (e.g. bbq.example.com before encoding), and limit, which restricts how many certificates are returned. A further extension would be to also include a lastevaluatedkey parameter to support pagination.
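To make the query parameters concrete, here is a minimal client-side sketch of how a caller could build the list URL. The base URL, the default limit, and the helper names are my own assumptions for illustration, and I assume standard (not URL-safe) base64 since the post only says "base64 encoded".

```python
import base64
from urllib.parse import urlencode


def encode_parent(parent_fqdn: str) -> str:
    """Base64-encode the parent FQDN (standard alphabet is an assumption)."""
    return base64.b64encode(parent_fqdn.encode("utf-8")).decode("ascii")


def decode_parent(encoded_parent: str) -> str:
    """Reverse of encode_parent, as the API side would need to do."""
    return base64.b64decode(encoded_parent.encode("ascii")).decode("utf-8")


def build_list_url(base_url: str, parent_fqdn: str, limit: int = 25) -> str:
    """Build the GET /certificates URL with the parent and limit parameters."""
    # urlencode percent-encodes '+', '/' and '=' if they show up in the base64 value.
    query = urlencode({"parent": encode_parent(parent_fqdn), "limit": limit})
    return f"{base_url.rstrip('/')}/certificates?{query}"


if __name__ == "__main__":
    # https://api.example.com is a placeholder, not the real API address.
    print(build_list_url("https://api.example.com", "bbq.example.com", limit=10))
```

Encoding the FQDN keeps dots and other characters out of the raw query string; the Lambda function then decodes the value before using it as the ParentFQDN.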

Add certificate inventory

To add the certificate inventory we start by extending the service with a DynamoDB table and an index that can be used to query for certificates, plus EventBridge and StepFunctions for updating the inventory.

DynamoDB table

We'll use the FQDN as partition key and the ParentFQDN as the sort key. In the index we'll use the ParentFQDN as the partition key. This gives us the possibility to query for a single certificate and all certificates signed by a specific CA.

  InventoryTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: !Sub ${ApplicationName}-certificate-inventory
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: FQDN
          AttributeType: S
        - AttributeName: ParentFQDN
          AttributeType: S
      KeySchema:
        - AttributeName: FQDN
          KeyType: HASH
        - AttributeName: ParentFQDN
          KeyType: RANGE
      GlobalSecondaryIndexes:
        - IndexName: parent-index
          KeySchema:
            - AttributeName: ParentFQDN
              KeyType: HASH
            - AttributeName: FQDN
              KeyType: RANGE
          Projection:
            ProjectionType: ALL
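As a rough sketch of the two access patterns this key design enables, the helpers below build the arguments for a DynamoDB Query call. The function names and the default limit are assumptions of mine, not code from the repository. Note that because the table has a composite key, looking up a certificate by FQDN alone is a Query on the partition key rather than a GetItem.

```python
from typing import Any, Dict


def query_by_fqdn(table_name: str, fqdn: str) -> Dict[str, Any]:
    """Query arguments for a single certificate, using the table's partition key."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "FQDN = :fqdn",
        "ExpressionAttributeValues": {":fqdn": {"S": fqdn}},
    }


def query_by_parent(table_name: str, parent_fqdn: str, limit: int = 25) -> Dict[str, Any]:
    """Query arguments for all certificates signed by one CA, using parent-index."""
    return {
        "TableName": table_name,
        "IndexName": "parent-index",
        "KeyConditionExpression": "ParentFQDN = :parent",
        "ExpressionAttributeValues": {":parent": {"S": parent_fqdn}},
        "Limit": limit,
    }


if __name__ == "__main__":
    # The table name is a placeholder following the ${ApplicationName}-certificate-inventory pattern.
    print(query_by_parent("my-app-certificate-inventory", "bbq.example.com", limit=10))
```

With boto3 installed, these dictionaries can be passed straight to the low-level client, e.g. boto3.client("dynamodb").query(**query_by_parent(...)).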

EventBridge + StepFunctions

We prepared for this already in the first part. The event-bus is created by the template with common infrastructure.

To update the inventory we'll use an event-driven approach where the Lambda functions post an event as soon as a certificate is created. This does create an eventually consistent solution, where a read directly after a write might not return the latest result. As long as we are aware of this it should not cause us any problems, and the benefits of an event-driven approach outweigh that drawback. An event-driven architecture also lets us extend and decouple logic in the future.

The functions will post an event with this structure:

{
  "Source": "certificates",
  "DetailType": "created",
  "Detail": {
    "FQDN": "domain name",
    "Type": "Root/Intermediate/Server/Client",
    "ParentFQDN": "Parent",
    "ValidUntil": "Valid to date"
  }
}
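Here is a minimal sketch of how a Lambda function could build an entry with the structure above. The function name, the example values, and the date format are assumptions for illustration. One detail worth noting is that the EventBridge PutEvents API expects Detail as a JSON string, not a nested object.

```python
import json
from typing import Any, Dict


def build_created_entry(
    event_bus_name: str,
    fqdn: str,
    cert_type: str,
    parent_fqdn: str,
    valid_until: str,
) -> Dict[str, Any]:
    """Build one PutEvents entry announcing that a certificate was created."""
    detail = {
        "FQDN": fqdn,
        "Type": cert_type,
        "ParentFQDN": parent_fqdn,
        "ValidUntil": valid_until,
    }
    return {
        "EventBusName": event_bus_name,
        "Source": "certificates",
        "DetailType": "created",
        # PutEvents requires the detail to be serialized as a JSON string.
        "Detail": json.dumps(detail),
    }


if __name__ == "__main__":
    # Bus name, FQDNs and date format are placeholders.
    entry = build_created_entry(
        "my-event-bus", "server.bbq.example.com", "Server", "bbq.example.com", "2025-12-22"
    )
    print(entry)
```

With boto3 installed, the entry can be sent with boto3.client("events").put_events(Entries=[entry]).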

To create the StepFunction, we append it to the CloudFormation template. We'll add an event rule matching our structure, so it will be invoked every time a certificate is created.

  CertificateCreatedExpress:
    Type: AWS::Serverless::StateMachine
    Properties:
      DefinitionUri: certificate-created-statemachine/statemachine.asl.yaml
      Tracing:
        Enabled: true
      Logging:
        Destinations:
          - CloudWatchLogsLogGroup:
              LogGroupArn: !GetAtt CertificateCreatedStateMachineLogGroup.Arn
        IncludeExecutionData: true
        Level: ALL
      DefinitionSubstitutions:
        EventBridgeBusName:
          Fn::ImportValue: !Sub ${CommonInfraStackName}:event-bus-name
        InventoryTable: !Ref InventoryTable
        ApplicationName: !Ref ApplicationName
      Policies:
        - Statement:
            - Effect: Allow
              Action:
                - logs:*
              Resource: "*"
        - DynamoDBCrudPolicy:
            TableName: !Ref InventoryTable
      Events:
        CertificateCreatedEvent:
          Type: EventBridgeRule
          Properties:
            EventBusName:
              Fn::ImportValue: !Sub ${CommonInfraStackName}:event-bus-name
            Pattern:
              source:
                - certificates
              detail-type:
                - created
      Type: EXPRESS

As of now our StateMachine definition is not that big, but it leaves room for us to extend on it later.

Comment: Certificate service - Store Certificate Info
QueryLanguage: JSONata
StartAt: Debug
States:
  Debug:
    Type: Pass
    Next: Store Certificate Info
    Assign:
      FQDN: "{% $states.input.detail.FQDN %}"
      ParentFQDN: "{% $states.input.detail.ParentFQDN %}"
      Type: "{% $states.input.detail.Type %}"
      ValidUntil: "{% $states.input.detail.ValidUntil %}"
  Store Certificate Info:
    Type: Task
    Resource: arn:aws:states:::dynamodb:putItem
    Arguments:
      TableName: ${InventoryTable}
      Item:
        FQDN:
          S: "{% $FQDN %}"
        ParentFQDN:
          S: "{% $ParentFQDN %}"
        Type:
          S: "{% $Type %}"
        ValidUntil:
          S: "{% $ValidUntil %}"
    End: true
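To reason about what the state machine does without deploying it, the same mapping can be mirrored in plain Python. This is only an illustrative equivalent that I find handy for unit tests, not part of the deployed solution; the function name and sample values are assumptions. It reads the fields from the lowercase detail key, which is where EventBridge delivers the event payload, and produces the Item that putItem would store.

```python
from typing import Any, Dict

# The four attributes the state machine assigns as variables and then stores.
INVENTORY_FIELDS = ("FQDN", "ParentFQDN", "Type", "ValidUntil")


def to_inventory_item(event: Dict[str, Any]) -> Dict[str, Dict[str, str]]:
    """Map a delivered EventBridge event to the DynamoDB Item written by putItem."""
    detail = event["detail"]
    # Every attribute is stored as a DynamoDB string ("S"), as in the definition above.
    return {field: {"S": str(detail[field])} for field in INVENTORY_FIELDS}


if __name__ == "__main__":
    sample_event = {
        "source": "certificates",
        "detail-type": "created",
        "detail": {
            "FQDN": "server.bbq.example.com",
            "Type": "Server",
            "ParentFQDN": "bbq.example.com",
            "ValidUntil": "2025-12-22",  # placeholder date format
        },
    }
    print(to_inventory_item(sample_event))
```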

As you might see, we use two of the features recently released for StepFunctions: variables and JSONata.

Variables

With variables we can use Assign to create variables that are available in the later states of the StepFunction. So now we can assign data in an early state and use it throughout, with no need to recreate the information in every state. This is a very welcome addition. To demonstrate this, I create variables in the very first state and then use them in the next one. With JSONata, variables are accessed as {% $variableName %}; with JSONPath, as $variableName.

JSONata

JSONata is also a new addition. You set QueryLanguage to either JSONPath (the default) or JSONata. You have to select one of them; JSONPath and JSONata expressions cannot be mixed within the same state. In JSONata, expressions are wrapped in {% %} instead of using the traditional JSONPath $ syntax.

Add client cert creation

Creating a device (client) certificate is almost the same as creating a server certificate. The difference is that I don't want to specify the full domain. Instead I only specify the FQDN of the signing intermediate certificate, and the logic generates a new UUID that is used as the host part, resulting in e.g. uuid.clients.bbq.example.com.
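The naming logic can be sketched in a few lines. The helper name is hypothetical, and I assume a version 4 (random) UUID since the post describes the ID as random.

```python
import uuid
from typing import Tuple


def new_device_fqdn(intermediate_fqdn: str) -> Tuple[str, str]:
    """Generate a device ID and the FQDN of the device certificate.

    intermediate_fqdn is the FQDN of the signing intermediate CA,
    e.g. clients.bbq.example.com.
    """
    device_id = str(uuid.uuid4())  # random UUID used as the device ID
    return device_id, f"{device_id}.{intermediate_fqdn}"


if __name__ == "__main__":
    device_id, fqdn = new_device_fqdn("clients.bbq.example.com")
    print(device_id, fqdn)
```

Returning the device ID together with the FQDN makes it easy to hand the ID back to the caller in the API response.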

We add a new Lambda function and add it to our API.

  LambdaGenerateDeviceCertificate:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: Lambda/API/GenerateDeviceCert
      Handler: handler.handler
      Layers:
        - !Ref UtilsLayer
      Policies:
        - S3FullAccessPolicy:
            BucketName:
              Fn::ImportValue: !Sub "${CommonInfraStackName}:certificate-bucket-name"
        - EventBridgePutEventsPolicy:
            EventBusName:
              Fn::ImportValue: !Sub "${CommonInfraStackName}:event-bus-name"
      Events:
        CreateDeviceCertApi:
          Type: Api
          Properties:
            Path: /certificates/device
            Method: post
            RestApiId: !Ref GenerateCertificatesApi

Introduce Lambda Layer

As we now share some common utility functions between five different Lambda functions, I decided to put these in a Lambda Layer. I normally don't use Layers, but this time I felt it would be a good approach, and it made the code structure a bit easier. To create and use the layer we need to create the Layer version and set the Lambda functions to use it.

We add the creation of the Layer to the template and update our Lambda functions.

  UtilsLayer:
    Type: AWS::Serverless::LayerVersion
    Properties:
      LayerName: UtilsLayer
      Description: "Utils code for Lambda functions"
      ContentUri: Lambda/Layer
      CompatibleRuntimes:
        - python3.12
    Metadata:
      BuildMethod: python3.12

  LambdaGenerateRootCA:
    Type: AWS::Serverless::Function
    Properties:
      ....
      Layers:
        - !Ref UtilsLayer
      .....

List / Get certificates

Finally we add the possibility to list and get certificates.

  LambdaListGetCertificates:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: Lambda/API/ListGetCertificates
      Layers:
        - !Ref UtilsLayer
      Handler: handler.handler
      Environment:
        Variables:
          DYNAMODB_TABLE: !Ref InventoryTable
          DYNAMODB_INDEX: parent-index
      Policies:
        - S3FullAccessPolicy:
            BucketName:
              Fn::ImportValue: !Sub "${CommonInfraStackName}:certificate-bucket-name"
        - EventBridgePutEventsPolicy:
            EventBusName:
              Fn::ImportValue: !Sub "${CommonInfraStackName}:event-bus-name"
        - DynamoDBCrudPolicy:
            TableName: !Ref InventoryTable
      Events:
        ListCertificatesApi:
          Type: Api
          Properties:
            Path: /certificates
            Method: get
            RestApiId: !Ref GenerateCertificatesApi
        GetCertificatesApi:
          Type: Api
          Properties:
            Path: /certificates/{certificate}
            Method: get
            RestApiId: !Ref GenerateCertificatesApi

One major difference between this function and the other four is that it does not have a single responsibility. It handles all the GET methods, both listing certificates and fetching a single certificate. This is one of several design approaches you can use when building an API with Lambda functions: single purpose, Lambdalith, or read/write separation. I decided to use single-purpose functions for the write operations; even if there are similarities between them, I felt they were different enough to be kept apart. For the read functionality I decided to put the logic in one function, as the functionality is very similar. Getting one certificate or several is just a matter of how long the list is.
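A minimal sketch of how one handler can serve both GET routes is shown below. It assumes the API Gateway proxy event format, where the {certificate} path parameter arrives under pathParameters. The two lookup functions are hypothetical stand-ins that return stub data, not the real inventory queries, and the default limit is my own choice.

```python
import json
from typing import Any, Dict, List, Optional


def get_certificate(fqdn: str) -> Dict[str, Any]:
    """Hypothetical stand-in for the real single-certificate lookup."""
    return {"FQDN": fqdn}


def list_certificates(parent: Optional[str], limit: int) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for the real query against parent-index."""
    return []


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Route GET /certificates and GET /certificates/{certificate}."""
    # API Gateway sends null (None) when there are no parameters at all.
    path_params = event.get("pathParameters") or {}
    query_params = event.get("queryStringParameters") or {}

    if "certificate" in path_params:
        body: Any = get_certificate(path_params["certificate"])
    else:
        limit = int(query_params.get("limit", 25))
        body = {"certificates": list_certificates(query_params.get("parent"), limit)}

    return {"statusCode": 200, "body": json.dumps(body)}


if __name__ == "__main__":
    print(handler({"pathParameters": {"certificate": "server.bbq.example.com"}}, None))
    print(handler({"queryStringParameters": {"limit": "5"}}, None))
```

The routing is a single branch on the path parameter, which is why I felt one function was enough for the read side.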

This approach also gives us a nice form of Command Query Responsibility Segregation (CQRS), with writes and reads handled by separate functions.

Conclusion

In this second part we extended our self-service API with functionality to create device certificates, and we introduced an inventory and the possibility to list and get certificates. Stay tuned for the next part, where we will increase the security of the API and the certificate storage. We will also extend the functionality for listing and fetching certificates.

As a reminder!!

This is built for learning. Use managed services for production.
While this API is a great learning tool, services like AWS Private CA or Let’s Encrypt are better suited for production.

To get the full source code and deploy it yourself, visit Serverless-Handbook Self Service IoT Certificate management

Final Words

Don't forget to follow me on LinkedIn and X for more content, and read the rest of my blogs.

As Werner says! Now Go Build!


Post Quiz

Test what you just learned by doing this five question quiz - https://kvist.ai/334369.