Letting an AI Agent Pick My Related Posts

I keep extending my blog pipeline with small automations. I proofread my posts with Amazon Nova and translate them automatically, and every step runs serverless and event-driven. This time I went after something that I have been thinking about for a long time.

I write a post and someone finds it, reads it all the way to the end, and then leaves. That last part is what I have been thinking about. A reader who just finished a post about Lambda is probably interested in reading my other serverless posts. I did nothing with that moment, no "read this next". Nothing. The reading ended at exactly the point where it should have continued.

So I built a pipeline that fixes exactly that: an AI agent that reads every new post, searches all my blog posts by meaning rather than keywords, picks three related posts, and explains each choice in one sentence. The setup is all serverless and event-driven, what did you expect? I combined Lambda Durable Functions, Strands Agents, Bedrock AgentCore Gateway, and MongoDB Atlas Vector Search into one pipeline.

All code for this post can be found on serverless handbook, as usual.

Why not just use tags?

My blog is static HTML, and an easy route would be to use tags, like so many other static sites. I ran a test, and it was just not good enough. I use broad tags like serverless, and a few narrower ones like lambda. But just because two posts share the serverless or lambda tag doesn't mean they have much else in common. Tags answer "what words did I label this with?", not "what is this post actually about?"

The next step would be to use content similarity. Here I can create vector embeddings for every post and return the three nearest neighbors, done! Well, yes, this is the backbone of the pipeline, but a raw top three of the closest neighbors often yields near-duplicate posts covering the exact same topic. Pure nearest-neighbor matching can tell me that two posts are similar, but it doesn't tell me whether a reader who finished one would get anything out of the other.

So I extended the design. The core idea is that vector search retrieves the candidates, and an AI agent decides which of them relate best. I start by narrowing down the number of posts using vector search, then an AI agent, playing the role of my editor, reads the candidates and picks the three it thinks readers would enjoy the most. For each pick it writes a one-sentence rationale explaining why it chose that post. That way I can review and refine the AI agent over time.

Architecture overview

When a post is published, the new blog pipeline posts a PostPublished event to an EventBridge bus.

{
  "Source": "....",
  "DetailType": "PostPublished",
  "Detail": {
    "slug": "lambda-durable-functions-101",
    "language": "en",
    "branch": "main",
    "commit_sha": "f8a2c1d3..."
  }
}

This event invokes a Lambda Durable Function, which coordinates the entire flow. As the coordination is mostly code steps, I decided to try Durable Functions instead of Step Functions, which has been my normal go-to in the past. With Durable Functions I get yet another tool in my toolbox.
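
As a minimal sketch of the publishing side, this is roughly how such an event could be built for EventBridge's PutEvents API. The function name and the `source` and `bus_name` defaults are placeholders I made up for illustration, since the real source value is elided above. The one detail worth noticing is that PutEvents expects `Detail` as a JSON string, not a nested object.

```python
import json


def build_post_published_entry(
    slug: str,
    language: str,
    branch: str,
    commit_sha: str,
    source: str = "blog.pipeline",   # placeholder, the real source is elided above
    bus_name: str = "blog-bus",      # placeholder bus name
) -> dict:
    """Build one PutEvents entry for a PostPublished event."""
    detail = {
        "slug": slug,
        "language": language,
        "branch": branch,
        "commit_sha": commit_sha,
    }
    return {
        "Source": source,
        "DetailType": "PostPublished",
        "Detail": json.dumps(detail),  # PutEvents requires a JSON string here
        "EventBusName": bus_name,
    }
```

The resulting entry could then be sent with `boto3.client("events").put_events(Entries=[entry])`.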

Image showing the architecture overview, one durable Lambda orchestrating Bedrock, AgentCore Gateway, MongoDB Atlas, DSQL, and GitHub

The pipeline looks like this: it fetches the post markdown from GitHub, calls Bedrock to create an embedding using Amazon Titan Text Embeddings V2, and upserts it into a MongoDB Atlas collection. Next, the AI agent is called to pick the three related posts, starting with a vector search in MongoDB Atlas through an MCP tool hosted on AgentCore Gateway. Then the related post references are written to DSQL, which is part of my home-built CMS, and finally the pipeline updates the frontmatter of the post in GitHub and opens a PR.

I also needed to make sure that embeddings were created for all current posts; otherwise, it would not work for new posts. So I created a small one-time backfill setup.

Image showing the architecture overview for backfilling

That's a lot of moving parts for one function. I'll come back to why that works.

From Markdown to embeddings

The entire pipeline relies on one transformation: turning the blog post into an embedding so we can find similar posts programmatically.

An embedding model maps text to a fixed-length vector of floats. In my case, 1024 floats from Bedrock's Titan Text Embeddings v2. The model was trained so that texts about the same concepts land close together in that 1024-dimensional space, even when they share no vocabulary. My Durable Functions post and my Step Functions post end up as neighbors because both are about orchestrating workflows on AWS.

Image showing blog posts as points in a 2D semantic map with topic clusters and the three picked posts highlighted

What goes into the embedding matters just as much as the model. I don't embed the entire raw markdown. I first run a small normalization step that strips the YAML frontmatter, removes every code example, and drops images, and then I build a new structure that can be fed to the model.

TITLE: Lambda Durable Functions 101
TAGS: aws, lambda, serverless
SUMMARY: A practical walkthrough of building a durable workflow on AWS Lambda.
BODY: The cleaned up body

So why do I strip out all code? Well, code blocks are just "noise". Two posts on the same topic can have completely different code examples, so if I included the code these posts would now end up farther away from each other in the vector space. Code says nothing about similarity, so I include only the actual blog content.
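
To make the normalization step concrete, here is a minimal sketch of what it could look like. The function name `build_embed_input` and the three regular expressions are my own assumptions for illustration, not the pipeline's actual code, and the patterns only cover plain fenced code blocks and standard markdown image syntax.

```python
import re

# Leading YAML frontmatter delimited by --- lines.
_FRONTMATTER = re.compile(r"\A---[ \t]*\n.*?\n---[ \t]*\n", re.DOTALL)
# Fenced code blocks: a line starting with three backticks up to the closing fence.
_FENCED_CODE = re.compile(r"^`{3}.*?^`{3}[ \t]*$", re.DOTALL | re.MULTILINE)
# Markdown images: ![alt](url)
_IMAGE = re.compile(r"!\[[^\]]*\]\([^)]*\)")


def build_embed_input(markdown: str, title: str, tags: list[str], summary: str) -> str:
    """Strip frontmatter, code, and images, then build the labeled structure."""
    body = _FRONTMATTER.sub("", markdown, count=1)
    body = _FENCED_CODE.sub("", body)
    body = _IMAGE.sub("", body)
    # Collapse the blank-line runs left behind by the removals.
    body = re.sub(r"\n{3,}", "\n\n", body).strip()
    return (
        f"TITLE: {title}\n"
        f"TAGS: {', '.join(tags)}\n"
        f"SUMMARY: {summary}\n"
        f"BODY: {body}"
    )
```

Keeping the title, tags, and summary as labeled lines ahead of the body means the most condensed description of the post is always inside the embedded text, even when a long body is truncated.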

The call to Bedrock to create the embedding is fairly small.

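import json

_MODEL_ID = "amazon.titan-embed-text-v2:0"

def embed_text(text: str, dimensions: int = 1024) -> list[float]:
    # _bedrock_client() is a module-level helper (not shown) that returns a
    # boto3 "bedrock-runtime" client.
    response = _bedrock_client().invoke_model(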
        modelId=_MODEL_ID,
        contentType="application/json",
        accept="application/json",
        body=json.dumps({
            "inputText": text,
            "dimensions": dimensions,
            "normalize": True,
        }),
    )
    payload = json.loads(response["body"].read())
    return [float(x) for x in payload["embedding"]]

One flag that deserves a closer look is normalize: True. Think of each embedding as an arrow pointing somewhere in space. Posts about the same topic point in roughly the same direction. When comparing two posts, the direction is what matters, not how long the arrows are. The standard recipe for this is cosine similarity: take the dot product of the two vectors (multiply them element by element and sum the results), then divide by the product of the two arrow lengths. That division is only there to cancel out the lengths. With normalize: True, Titan scales every vector to a length of exactly 1 before returning it, and dividing by 1 changes nothing. So the comparison shrinks to a plain dot product. Same result, less math, and that adds up when every new post is compared against the entire back catalog.
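
The equivalence is easy to check with a few lines of plain Python. This is only an illustration with two made-up three-dimensional vectors, not Titan output: once both vectors are scaled to length 1, the dot product alone gives the same number as the full cosine formula.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    """Multiply element by element and sum the results."""
    return sum(x * y for x, y in zip(a, b))


def norm(a: list[float]) -> float:
    """Length of the arrow."""
    return math.sqrt(dot(a, a))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of both lengths."""
    return dot(a, b) / (norm(a) * norm(b))


def normalize(a: list[float]) -> list[float]:
    """Scale a vector to length 1 without changing its direction."""
    length = norm(a)
    return [x / length for x in a]


a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]
unit_a, unit_b = normalize(a), normalize(b)

# Full cosine on the raw vectors vs. a plain dot product on the unit vectors.
print(cosine_similarity(a, b), dot(unit_a, unit_b))
```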

The embed input is capped at 30,000 characters before this call, so very long posts get truncated instead of failing (Titan v2 accepts around 8K tokens).

At the same time, a SHA-256 content hash is created and stored with the vector. With this I can check whether an older post has changed and needs a new embedding.

new_hash = compute_hash(markdown)  # sha256 over the normalized body
existing = context.step(
    lambda _: find_by_id(slug, language),
    name="find_by_id",
)
if not force_recompute and existing and existing.get("content_hash") == new_hash:
    return {"skipped": True, "reason": "content_hash_unchanged"}
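
The hashing helper itself is not shown above. A minimal sketch of how such a function might look, assuming a light whitespace normalization before hashing so that trailing spaces alone don't trigger a re-embed (the name `content_hash_of` and that normalization rule are my assumptions):

```python
import hashlib


def content_hash_of(text: str) -> str:
    """Return a SHA-256 hex digest over a whitespace-normalized copy of the text."""
    # Drop leading/trailing blank space and trailing spaces on each line, so
    # purely cosmetic edits produce the same hash.
    normalized = "\n".join(line.rstrip() for line in text.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Any real change to the text yields a different digest, which is what lets the orchestrator skip unchanged posts cheaply.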

Storing the Vectors in MongoDB Atlas

The created vectors need to be stored somewhere they can be queried properly. I explored a few options, like OpenSearch Serverless and Aurora with pgvector. In the end I decided to run this on MongoDB Atlas, which felt like the best choice for me. And since I can run an M0 cluster under the MongoDB free tier, that was an extra bonus.

MongoDB Atlas Vector Search runs HNSW under the hood, the same approximate nearest neighbor algorithm behind most production vector databases. Querying is a $vectorSearch aggregation stage, which means the vector index and the documents live in one system. One query returns neighbors with their titles, summaries, and tags. Exactly the data that the agent needs, no second lookup, no separate vector database to keep in sync.

pipeline = [
    {"$vectorSearch": {
        "index": "posts_vector_idx",
        "path": "embedding",
        "queryVector": embedding,           # 1024 dims, from Titan
        "numCandidates": max(100, k * 10),
        "limit": k,
        "filter": {"language": language},   # evaluated DURING traversal
    }},
    {"$match": {"slug": {"$nin": exclude_slugs}}},
    {"$project": {
        "_id": 0,
        "slug": 1, "language": 1, "title": 1,
        "summary": 1, "tags": 1, "category": 1,
        "score": {"$meta": "vectorSearchScore"},
    }},
]
return list(_collection().aggregate(pipeline))

That filter line is very important for my blog. My posts exist in multiple languages, and without a filter the nearest neighbor of any post is reliably its own translation. Atlas lets me declare language as a filter field inside the vector index, so the HNSW traversal only ever considers same-language vectors.
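
For reference, this is roughly what a vector index definition with a filter field looks like in Atlas. It is a sketch, not my deployed definition: the helper name is made up, and `dotProduct` as the similarity function is an assumption that only makes sense because the stored vectors are normalized.

```python
def vector_index_definition(dimensions: int = 1024) -> dict:
    """Build an Atlas Vector Search index definition with a language filter."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "numDimensions": dimensions,
                # Assumption: dot product, valid because vectors have length 1.
                "similarity": "dotProduct",
            },
            # Declaring language as a filter field is what allows the
            # "filter" key inside the $vectorSearch stage.
            {"type": "filter", "path": "language"},
        ]
    }
```

With pymongo, a definition like this can be wrapped in `SearchIndexModel(definition=..., name="posts_vector_idx", type="vectorSearch")` and passed to `collection.create_search_index(...)`.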

For authentication I use AWS IAM, no passwords anywhere. MongoDB Atlas accepts AWS IAM principals as database users via pymongo's MONGODB-AWS mechanism. My Lambda functions assume one dedicated IAM role through STS, a role that was federated with Atlas exactly once, and the temporary credentials flow straight into the MongoDB connection.

def _assume_atlas_role() -> dict:
    sts = boto3.client("sts")
    response = sts.assume_role(
        RoleArn=os.environ["ATLAS_ROLE_ARN"],
        RoleSessionName="related-posts-atlas",
        DurationSeconds=3600,
    )
    return response["Credentials"]


def _client() -> MongoClient:
    creds = _assume_atlas_role()
    return MongoClient(
        _connection_config()["srvUri"],   # ...authMechanism=MONGODB-AWS
        username=creds["AccessKeyId"],
        password=creds["SecretAccessKey"],
        authMechanismProperties={"AWS_SESSION_TOKEN": creds["SessionToken"]},
    )

Why do I use a dedicated role instead of the Lambda's own execution role? For two reasons. First, I want to handle the Atlas setup in one flow. I don't want to set up part of Atlas, deploy the Lambda functions, pull their role ARNs, and then go back and update Atlas. Second, with a separate role I authorize one role towards Atlas, several Lambda functions can assume it, and I control that access with IAM permissions.

If you want to go deeper on passwordless authentication towards Atlas, I wrote about Outbound Identity Federation with MongoDB recently, which is a different technique than I use here.

Content editor AI with tools

Here's where the AI part comes into focus. The agent is not "call an LLM with a prompt". It's a loop. Claude Sonnet 4.6 on Bedrock is called with a system prompt that tells it to find related posts and with two tools, and it decides what to call, what to read, and when it's done.

Image showing the agent tool loop between Strands, AgentCore Gateway, the tool Lambda functions, and Atlas

The two tools are plain Lambda functions handled as MCP tools in AgentCore Gateway. vector_search returns the top-k candidates with titles, summaries, tags, and similarity scores. read_post_excerpt returns the first 1000 characters of a post's body, and the agent calls it only when two candidates look interchangeable from their summaries and it wants to break the tie by actually reading.
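
The second tool is small enough to sketch. This is an illustration, not the deployed code: `make_excerpt_handler` and the injected `load_body` callable are hypothetical names, used here so the example runs without a database behind it.

```python
from typing import Callable

EXCERPT_CHARS = 1000  # first 1000 characters of the body, as described above


def make_excerpt_handler(load_body: Callable[[str, str], str]):
    """Create a Lambda-style handler around a body loader (slug, language) -> text."""

    def handler(event: dict, context=None) -> dict:
        slug = event["slug"]
        language = event.get("language") or "en"
        body = load_body(slug, language)
        # Return only a short prefix so the agent's context stays small.
        return {"slug": slug, "language": language, "excerpt": body[:EXCERPT_CHARS]}

    return handler
```

Capping the excerpt keeps a tie-breaking read cheap, since the agent only needs enough text to tell two similar candidates apart.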

One deliberate design detail in vector_search is that the agent passes the post's slug, never an embedding. The tool Lambda resolves the stored vector from Atlas itself, so the model never gets the chance to hallucinate a malformed 1024-float array and send it to the index.

As mentioned, tools are fronted by Bedrock AgentCore Gateway. The Gateway turns my Lambda functions into a tool catalog any MCP-aware agent can discover and call, handles the protocol so the Lambdas stay protocol free, and authenticates inbound calls with plain IAM SigV4, no OAuth server to run. Could I have skipped the Gateway and passed the tools as inline Python functions? Sure. But the next agent I build (and there will be one) gets to reuse the same tool catalog without me copying code around.

The Gateway and its targets are plain CloudFormation. Each target maps a Lambda to a tool name plus an input schema, and that schema is what the model sees when it decides how to call the tool.

RelatedPostsGateway:
  Type: AWS::BedrockAgentCore::Gateway
  Properties:
    Name: !Sub ${Application}-related-posts-gw
    ProtocolType: MCP
    RoleArn: !GetAtt GatewayExecutionRole.Arn
    AuthorizerType: AWS_IAM

VectorSearchTarget:
  Type: AWS::BedrockAgentCore::GatewayTarget
  Properties:
    GatewayIdentifier: !Ref RelatedPostsGateway
    Name: vector-search-target
    CredentialProviderConfigurations:
      - CredentialProviderType: GATEWAY_IAM_ROLE
    TargetConfiguration:
      Mcp:
        Lambda:
          LambdaArn: !GetAtt VectorSearchToolFunction.Arn
          ToolSchema:
            InlinePayload:
              - Name: vector_search
                Description: >
                  Vector search over the blog posts corpus. Resolves the
                  source post's stored embedding from Atlas by source_slug,
                  then returns up to k candidates by semantic similarity.
                InputSchema:
                  Type: object
                  Properties:
                    source_slug:
                      Type: string
                    language:
                      Type: string
                    k:
                      Type: integer
                    exclude_slugs:
                      Type: array
                      Items:
                        Type: string
                  Required:
                    - source_slug

And the tool Lambda knows nothing about MCP. The Gateway passes the tool input as the event, and the return value becomes the tool output.

def handler(event: dict, context) -> dict:
    source_slug = event["source_slug"]
    language = event.get("language") or "en"

    embedding = find_embedding(source_slug, language)
    candidates = vector_search_neighbors(
        embedding=embedding,
        k=int(event.get("k", 20)),
        language=language,
        exclude_slugs=event.get("exclude_slugs") or [],  # $nin needs a list, never None
    )
    return {"candidates": candidates, "count": len(candidates)}

Strands Agents, AWS's open-source agent framework, drives the agentic loop so I don't have to implement it on my own.

from mcp_proxy_for_aws.client import aws_iam_streamablehttp_client
from strands import Agent
from strands.models import BedrockModel
from strands.tools.mcp.mcp_client import MCPClient

mcp_client = MCPClient(
    lambda: aws_iam_streamablehttp_client(
        endpoint=os.environ["AGENTCORE_GATEWAY_URL"],
        aws_service="bedrock-agentcore",
    )
)
model = BedrockModel(
    model_id="eu.anthropic.claude-sonnet-4-6",  # EU cross-region inference profile
    temperature=0.0,
    streaming=False,
)

with mcp_client:
    tools = mcp_client.list_tools_sync()
    agent = Agent(
        model=model,
        tools=tools,
        system_prompt=_SYSTEM_PROMPT,
        callback_handler=_BoundedToolCallHandler(max_calls=8),
    )
    result = agent(
        f"source_slug: {source_slug}\n"
        f"source_title: {source_title}\n"
        f"source_summary: {source_summary}\n"
        f"language: {language}\n\n"
        "Pick 3 related posts."
    )
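
The `_BoundedToolCallHandler` passed to the agent above is not shown. Here is a sketch of how a handler like that might be written. It rests on two assumptions you should verify against the Strands version you run: that the callback handler is invoked with keyword arguments, with tool events arriving under `current_tool_use` and carrying a `toolUseId`, and that an exception raised in the callback stops the agent run. The class and exception names are mine.

```python
class ToolCallLimitExceeded(RuntimeError):
    """Raised when the agent tries to use more tools than allowed."""


class BoundedToolCallHandler:
    """Callback handler that counts distinct tool calls and enforces a cap."""

    def __init__(self, max_calls: int = 8):
        self.max_calls = max_calls
        self._seen_ids: set[str] = set()

    def __call__(self, **kwargs) -> None:
        # Assumption: tool events carry a dict under "current_tool_use".
        tool_use = kwargs.get("current_tool_use") or {}
        tool_use_id = tool_use.get("toolUseId")
        # The same tool use can be reported several times; count each id once.
        if not tool_use_id or tool_use_id in self._seen_ids:
            return
        self._seen_ids.add(tool_use_id)
        if len(self._seen_ids) > self.max_calls:
            raise ToolCallLimitExceeded(
                f"Agent exceeded {self.max_calls} tool calls"
            )
```

A hard cap like this is a cheap safety net: a confused model that keeps calling tools in a loop fails fast instead of burning tokens until the Lambda times out.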

The model's behavior is shaped almost entirely by the system prompt. Persona, anti-pattern guidance, and a strict contract.

You are the editor for jimmydqv.com, a technical blog about AWS, serverless, and AI.
Your job is to pick exactly 3 related posts a reader would value most after
finishing the current post.
Prefer thematic depth over surface-level keyword overlap.
Each pick needs a one-sentence rationale that names the specific connection.

Workflow:
1. Call vector_search with source_slug=, language=, k=20,
   exclude_slugs=[]. The tool will resolve the embedding from Atlas
   internally, do NOT attempt to construct an embedding yourself.
2. If two candidates feel interchangeable, optionally call read_post_excerpt on one
   to break the tie.
3. Return STRICTLY this JSON shape and nothing else:
   {"picks": [{"slug": "...", "rationale": "..."}, ... exactly 3 items ...]}

parsed = json.loads(_strip_fences(raw))
picks_raw = parsed.get("picks", [])
if len(picks_raw) != 3:
    raise ValueError(f"Agent must return exactly 3 picks, got {len(picks_raw)}")
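
The `_strip_fences` helper used above is not shown either. A minimal sketch of the idea, under the assumption that the model sometimes wraps its JSON in a markdown code fence despite the instructions (`strip_fences` and `parse_picks` are illustrative names, not the pipeline's real functions):

```python
import json
import re

# Optional markdown fence (three backticks plus an optional language tag)
# wrapped around the whole reply.
_FENCE_RE = re.compile(r"\A\s*`{3}[a-zA-Z]*\s*\n(.*?)\n\s*`{3}\s*\Z", re.DOTALL)


def strip_fences(raw: str) -> str:
    """Return the reply without a surrounding code fence, if there is one."""
    match = _FENCE_RE.match(raw)
    return match.group(1) if match else raw.strip()


def parse_picks(raw: str) -> list[dict]:
    """Parse the agent reply and enforce the exactly-3-picks contract."""
    parsed = json.loads(strip_fences(raw))
    picks = parsed.get("picks", [])
    if len(picks) != 3:
        raise ValueError(f"Agent must return exactly 3 picks, got {len(picks)}")
    for pick in picks:
        if not pick.get("slug") or not pick.get("rationale"):
            raise ValueError("Each pick needs a slug and a rationale")
    return picks
```

Validating the shape in code, instead of trusting the prompt, means a malformed reply surfaces as an ordinary exception that the orchestrator can retry.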

So, why did I opt for the somewhat more expensive Sonnet model and not the slimmer Haiku or even Nova 2? I tested several different models. Haiku and Nova 2 Lite are both faster and cheaper, but they didn't reliably return three great picks. Sometimes I got two, sometimes four, and sometimes really bad picks. Sonnet produced the most consistent picks, and a call takes two to three seconds.

A typical run: one vector_search, occasionally one excerpt read, then three picks with rationales like "Both walk through the agent-tool loop on Bedrock; this post focuses on the durable orchestration around it."

Orchestrating with Lambda Durable Functions

The pipeline takes 30 to 90 seconds to run, makes around six LLM calls, and talks to four external systems. The classic answer is Step Functions, and I've used Step Functions for orchestration many times. This time I went with Lambda Durable Functions instead, and for an agent workload I'd make the same call again.

The model is simple. The whole workflow is ordinary Python in one handler, and every side effect is wrapped in context.step(...). Each step's result is checkpointed. If the invocation fails for any reason, whether it crashes, Bedrock throttles, or something times out, a fresh invocation re-runs the handler from the top, but completed steps replay from their checkpoints instead of re-executing.
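
The replay idea can be shown with a toy model in plain Python. To be clear, this is not the Durable Functions SDK, just a stand-in with a dictionary as the checkpoint store, and every name in it is invented for the illustration.

```python
import uuid


class ToyDurableContext:
    """Toy stand-in for a durable context: checkpoints survive across invocations."""

    def __init__(self, checkpoints: dict):
        self.checkpoints = checkpoints   # shared store, outlives one invocation
        self.executed: list[str] = []    # steps that actually ran this time

    def step(self, fn, name: str):
        if name in self.checkpoints:
            return self.checkpoints[name]        # replay: no re-execution
        result = fn(None)
        self.checkpoints[name] = result          # checkpoint on success only
        self.executed.append(name)
        return result


def workflow(ctx: ToyDurableContext, fail_on_agent: bool):
    run_id = ctx.step(lambda _: str(uuid.uuid4()), name="generate_run_id")
    ctx.step(lambda _: "markdown", name="fetch_source")

    def agent(_):
        if fail_on_agent:
            raise RuntimeError("throttled")
        return ["a", "b", "c"]

    picks = ctx.step(agent, name="agent_pick")
    return run_id, picks


store: dict = {}
first = ToyDurableContext(store)
try:
    workflow(first, fail_on_agent=True)      # first invocation dies in the agent step
except RuntimeError:
    pass

second = ToyDurableContext(store)
run_id, picks = workflow(second, fail_on_agent=False)
print(second.executed)
```

In the second invocation only the failed step executes. The run id and the fetched source come straight from the store, which is also why non-deterministic values have to be produced inside a step.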

Image showing two invocation timelines where the second invocation replays checkpointed steps for free and resumes at the failed step

Turning a regular Lambda into a durable function is a SAM property. The orchestrator is declared like this.

RelatedPostsOrchestrator:
  Type: AWS::Serverless::Function
  Properties:
    CodeUri: lambda/orchestrator
    Handler: handler.handler
    Runtime: python3.13
    Timeout: 900
    MemorySize: 1024
    AutoPublishAlias: live
    DurableConfig:
      ExecutionTimeout: 3600
      RetentionPeriodInDays: 7

The handler itself reads like a list of steps.

@durable_execution
def handler(event: dict, context: DurableContext) -> dict:
    slug = event["slug"]
    language = event.get("language", "en")

    # run_id is non-deterministic; compute it inside a step
    # so replays reuse the same value.
    run_id = context.step(lambda _: str(uuid.uuid4()), name="generate_run_id")

    file_path = context.step(
        lambda _: lookup_post_file_path(slug, language),
        name="lookup_file_path",
    )
    fetched = context.step(
        lambda _: invoke_fetch_source(file_path, branch, git_ref),
        name="fetch_source",
    )
    # ... content-hash short-circuit ...
    embedding = context.step(
        lambda _: embed_text(embed_input),
        name="embed_post",
    )
    context.step(lambda _: upsert_post(doc), name="upsert_post")

    primary_picks = context.run_in_child_context(
        lambda child_ctx: child_ctx.step(
            lambda _: _picks_for_one_post(slug, doc["title"], doc["summary"], language),
            name="agent_pick_primary",
        ),
        name="primary_agent_picks",
    )
    if len(primary_picks) != 3:
        raise ExecutionError(
            f"Agent returned {len(primary_picks)} picks, need exactly 3"
        )
    # ... fan-out, persist, PR ...

Notice the run_id. Even a uuid.uuid4() has to live inside a step, because a replay would otherwise generate a different id and diverge from the checkpoint trajectory. The agent call runs in a child context since it orchestrates several MCP tool invocations behind the scenes, and the child context keeps the step tree tidy.

What the replay model brings here is concrete: when an agent call gets throttled, the retry does not re-fetch from GitHub, re-embed, or re-upsert. The expensive earlier work replays for free. And the agent logic itself, branching, JSON parsing, "re-run for five neighbors", is exactly the kind of code that fights you in a JSON state machine and reads naturally in Python. Fan-out is a few lines.

batch = context.parallel(
    [_make_backlink_fn(n) for n in backlink_targets],
    name="backlink_fan_out",
    config=ParallelConfig(max_concurrency=5),
)
# A single failed leg should not fail the pipeline; keep what succeeded.
for item in batch.succeeded():
    result = item.result
    if result and len(result.get("picks") or []) == 3:
        backlink_results.append(result)

Linking related posts backwards

This is an interesting part of the solution and the pipeline.

Finding the related posts for a new post is the easy, straightforward part. The harder part is this: what if I add a new post that is now a better related pick for an old post? That means I have to update older posts as well when a new one is added, linking it backwards. If I don't, a post I wrote a year ago stays "frozen" for all time.

Image showing the new post in the center with five neighbor posts being re-evaluated in parallel, each with its own outcome

So after embedding the new post, the orchestrator asks Atlas for its ten nearest neighbors and re-runs the agent for the top five, each as an independent, parallel durable step. Each neighbor's editor run asks the full question from scratch: given the blog as it exists now, what are the best three? Sometimes the new post displaces an old pick. Sometimes nothing changes.
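
As a plain-Python analogue of that fan-out, here is a sketch that uses a thread pool instead of the durable `context.parallel`. The function and parameter names are made up, and `rerun_agent` stands in for one full editor run on a neighbor.

```python
from concurrent.futures import ThreadPoolExecutor


def refresh_backlinks(neighbors, rerun_agent, top_n=5, max_concurrency=5):
    """Re-run the agent for the closest neighbors and keep only valid results."""
    # Ten neighbors come back from the search; only the closest few are refreshed.
    targets = sorted(neighbors, key=lambda n: n["score"], reverse=True)[:top_n]

    def safe_run(neighbor):
        try:
            return rerun_agent(neighbor["slug"])
        except Exception:
            return None  # one failed leg should not fail the whole batch

    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        results = list(pool.map(safe_run, targets))

    # Keep only legs that succeeded and returned exactly three picks.
    return [r for r in results if r and len(r.get("picks") or []) == 3]
```

The important property is the same as in the durable version: each neighbor is evaluated independently, so one bad run costs a single refresh and leaves the rest of the batch intact.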

No nightly batch job, no cron. The work happens exactly when the blog post changes, scoped to the neighborhood that changed.

Landing the picks

The agent's output has to end up in two places, because the blog is a static 11ty site.

Amazon DSQL is the source of truth. One row per pick, written as DELETE-then-INSERT in a single transaction, so re-runs are idempotent and a failed write leaves the old picks intact. I use the same DSQL setup as in my AI Bartender series, IAM auth and all.

CREATE TABLE cms_content.related_posts (
    source_slug      TEXT NOT NULL,
    language         TEXT NOT NULL,
    position         SMALLINT NOT NULL,
    related_slug     TEXT NOT NULL,
    rationale        TEXT NOT NULL,
    similarity_score DOUBLE PRECISION,
    run_id           UUID NOT NULL,
    generated_at     TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    PRIMARY KEY (source_slug, language, position),
    CHECK (position BETWEEN 1 AND 3)
);

The write runs on a single connection, and the DELETE rolls back together with a failed INSERT, so the table is never half-written.

conn = connector.get_connection(token, user)
try:
    with conn.cursor() as cur:
        cur.execute(
            "DELETE FROM cms_content.related_posts "
            "WHERE source_slug = %(s)s AND language = %(l)s",
            {"s": source_slug, "l": language},
        )
        for position, pick in enumerate(picks, start=1):
            cur.execute(
                "INSERT INTO cms_content.related_posts "
                "(source_slug, language, position, related_slug, "
                " rationale, similarity_score, run_id) "
                "VALUES (%(src)s, %(lang)s, %(pos)s, %(rel)s, "
                "        %(rat)s, %(sim)s, %(run)s)",
                {...},
            )
    conn.commit()
except Exception:
    conn.rollback()
    raise
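
The rollback behavior is easy to try locally. The sketch below uses Python's built-in sqlite3 as a stand-in for DSQL, with a trimmed-down table, so it only demonstrates the DELETE-then-INSERT pattern and not DSQL itself. A fourth pick violates the position check, the transaction rolls back, and the old picks are still there.

```python
import sqlite3


def replace_picks(conn, source_slug, language, picks):
    """Swap all picks for a post in one transaction: DELETE, then INSERT."""
    try:
        cur = conn.cursor()
        cur.execute(
            "DELETE FROM related_posts WHERE source_slug = ? AND language = ?",
            (source_slug, language),
        )
        for position, pick in enumerate(picks, start=1):
            cur.execute(
                "INSERT INTO related_posts "
                "(source_slug, language, position, related_slug, rationale) "
                "VALUES (?, ?, ?, ?, ?)",
                (source_slug, language, position, pick["slug"], pick["rationale"]),
            )
        conn.commit()
    except Exception:
        conn.rollback()  # the DELETE is undone together with the failed INSERT
        raise


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE related_posts (
        source_slug TEXT NOT NULL,
        language TEXT NOT NULL,
        position INTEGER NOT NULL CHECK (position BETWEEN 1 AND 3),
        related_slug TEXT NOT NULL,
        rationale TEXT NOT NULL,
        PRIMARY KEY (source_slug, language, position)
    )"""
)

good = [{"slug": s, "rationale": "why"} for s in ("a", "b", "c")]
replace_picks(conn, "post", "en", good)

too_many = [{"slug": s, "rationale": "why"} for s in ("w", "x", "y", "z")]
try:
    replace_picks(conn, "post", "en", too_many)  # position 4 violates the CHECK
except sqlite3.IntegrityError:
    pass

rows = conn.execute(
    "SELECT related_slug FROM related_posts ORDER BY position"
).fetchall()
print(rows)
```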

My CMS dashboard reads this table and shows the picks, with rationales, next to each post. The orchestrator's final step emits a completed event that flows through EventBridge to AppSync Events, so the dashboard updates in real time when a run finishes. No polling anywhere.

A GitHub PR makes it visible on the blog. The pipeline injects a related_posts: block into the frontmatter of the posts, up to six files in one PR (the new post plus the refreshed neighbors). The next site build renders the section.

related_posts:
  - slug: "step-functions-vs-durable-functions"
    rationale: "Direct comparison of the orchestration approach used here against the SFN alternative."
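
A sketch of how that frontmatter injection could be done with plain string handling. This is my illustration, not the pipeline's code: it assumes frontmatter delimited by `---` lines and a `related_posts:` block whose items are indented, and it leans on `json.dumps` for quoting because a JSON string is also a valid YAML double-quoted scalar.

```python
import json
import re

_FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
# An existing related_posts: key followed by its indented lines.
_EXISTING_BLOCK_RE = re.compile(r"^related_posts:\n(?:[ \t]+.*\n?)*", re.MULTILINE)


def inject_related_posts(markdown: str, picks: list[dict]) -> str:
    """Add or replace the related_posts block in a post's YAML frontmatter."""
    match = _FRONTMATTER_RE.match(markdown)
    if not match:
        raise ValueError("Post has no YAML frontmatter")
    # Drop any previous block so re-runs replace instead of append.
    frontmatter = _EXISTING_BLOCK_RE.sub("", match.group(1)).rstrip("\n")
    lines = ["related_posts:"]
    for pick in picks:
        lines.append(f"  - slug: {json.dumps(pick['slug'], ensure_ascii=False)}")
        lines.append(f"    rationale: {json.dumps(pick['rationale'], ensure_ascii=False)}")
    block = "\n".join(lines)
    return f"---\n{frontmatter}\n{block}\n---\n{markdown[match.end():]}"
```

Removing the old block before writing the new one keeps the operation idempotent, so a refreshed neighbor produces a clean diff in the PR.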

Why a PR instead of committing straight to main? This is how all my enhancement pipelines work. It gives me, as a human, a final check before anything goes live.

End result

In the end, the pipeline will add the related posts in the frontmatter, and the 11ty build process will pick them up, look up the cover image and description, and inject a section into the actual built post.

Image showing the new related section in a post

Final Words

I set out to keep readers reading, and ended up with something I find very interesting: a pattern. Vector search does the recall, cheap, fast, mathematically honest about similarity. An agent does the judgment, slower, but able to read two candidates and decide which one a human would actually want next, and say why. Durable execution wraps the whole thing, so a workflow with six LLM calls can be treated as casually as a single function. And MongoDB Atlas quietly does the thing every AI pipeline needs: it keeps the vectors, the metadata, and the filtering in one queryable place, for free, with IAM identity instead of passwords.

Check out my other posts on jimmydqv.com and follow me on LinkedIn and X for more serverless content.


Now Go Build!

If this saved you an afternoon, you can buy me a coffee.