
Inside mercio-runtime: How Mercy Executes Serverless Functions

June 1, 2026 · 18 min read · Mercy Engineering

serverless
workerd
v8
bullmq
architecture

Introduction

When you upload a JavaScript function to Mercy and hit its invoke URL, something has to actually run your code. That something is mercio-runtime: a lean, Bun-based service that manages a warm pool of workerd processes (one V8 isolate per function), routes HTTP invocations through BullMQ, and caches compiled bundles on local disk. As a result, each function pays the cold-start cost once and then stays warm until it is evicted.

This post is a deep technical walk-through of its internals. We will cover the process lifecycle, the worker pool algorithm, how Cap'n Proto configuration is generated on the fly, why we chose BullMQ for the invocation layer, and how all of this fits together inside Docker. By the end you will understand exactly what happens — at every layer — between a caller sending an HTTP request and receiving a response.

Prerequisites: familiarity with Node.js/Bun event loops, Docker networking, and a high-level understanding of Redis job queues. Knowledge of Cap'n Proto or workerd is not required.

System Architecture

Mercy's Mercio product spans four major services. Understanding where mercio-runtime sits in that landscape is the right starting point.

Figure 1 — Full system overview. The runtime consumes jobs from the BullMQ queue and proxies them to warm workerd processes.

The four services and their responsibilities:

apps/api — Receives the incoming HTTP request, enqueues an invocation job, and waits for the result with job.waitUntilFinished, which listens for completion through a QueueEvents instance.
Redis / BullMQ — Acts as the invocation broker. The mercio-invocations queue decouples the API from the runtime and provides back-pressure, retry semantics, and observability.
apps/mercio-runtime — This service. Consumes invocation jobs, maintains the workerd pool, and forwards HTTP traffic to the correct isolate.
apps/worker — A separate build pipeline that takes user-uploaded zip files, runs esbuild, and uploads the resulting worker.mjs bundle to Cloudflare R2. The runtime never touches raw user code.

The Request Lifecycle

Let us trace a single invocation end-to-end, from the moment the caller's HTTP request hits the API to the moment the response arrives back.

Figure 2 — Sequence diagram of a complete function invocation, showing both warm (cache hit) and cold start paths.

Step 1 — API enqueues the job

The API handler at POST /mercio/:id serialises the full HTTP context (method, path, query string, headers, and body) into a BullMQ job payload and adds it to the mercio-invocations queue. It then calls job.waitUntilFinished(queueEvents, 30_000). This call listens for the job's events through a QueueEvents instance, which reads the queue's Redis event stream. The request handler is suspended until the job emits a completed event or the 30-second timeout fires. If the job fails, the promise rejects.
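The serialisation in Step 1 can be sketched as a pure function. This is a minimal sketch that assumes the handler has already read the raw request target, headers, and body as strings. `toInvocationJob` is a hypothetical name, and the fields follow the `InvocationJobData` type described later in this post. The post does not say whether the /mercio/:id prefix is stripped from the path, so the sketch stores whatever target it is given.

```typescript
// Field names follow the InvocationJobData type described in this post.
type InvocationJobData = {
  id: string
  method: string
  path: string
  query: string
  headers: Record<string, string>
  body: string
  runId?: string
  timeoutMs?: number
}

// Hypothetical helper: flatten an incoming request into a JSON-safe job payload.
function toInvocationJob(
  id: string,
  method: string,
  rawUrl: string, // request target as received, e.g. "/users?limit=10"
  headers: Record<string, string>,
  body: string,
): InvocationJobData {
  // Request targets are usually relative, so parse them against a dummy base.
  const url = new URL(rawUrl, "http://placeholder.invalid")

  // Header names are case-insensitive; lower-case them so lookups are predictable.
  const normalisedHeaders: Record<string, string> = {}
  for (const [name, value] of Object.entries(headers)) {
    normalisedHeaders[name.toLowerCase()] = value
  }

  return {
    id,
    method: method.toUpperCase(),
    path: url.pathname,
    query: url.search.slice(1), // drop the leading "?" ("" when there is no query)
    headers: normalisedHeaders,
    body,
  }
}
```

The handler would then call `queue.add(name, data)` and wait on the returned `Job` with `job.waitUntilFinished(queueEvents, 30_000)`. Both are standard BullMQ calls, and the job name is up to the caller. Keeping the payload to plain strings and objects matters because BullMQ stores job data as JSON in Redis.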

Step 2 — Runtime dequeues and routes

The BullMQ Worker inside mercio-runtime pulls the job off the queue (up to 8 concurrent jobs at once) and calls ensure(functionId) on the worker pool. If the function is already warm, the call returns immediately with a port number. If it is cold, the pool starts a spawn sequence (described in detail below).

Step 3 — HTTP proxy to workerd

Once a port is known, the runtime issues a fetch to http://127.0.0.1:{port}{path}, forwarding the method, query string, headers, and request body. workerd executes the user's handler and returns an ordinary HTTP response. The runtime reads the response status, headers, and body and returns them from the BullMQ job processor, which makes them the job's return value.
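Here is a hedged sketch of the proxy hop in Step 3, using the standard fetch API available in Bun. `proxyToWorkerd`, `targetUrl`, and `forwardableHeaders` are hypothetical names. The set of dropped hop-by-hop headers and the 30-second default timeout are assumptions, not details from the post.

```typescript
type InvocationResult = {
  status: number
  headers: Record<string, string>
  body: string
}

// Hop-by-hop and framing headers describe the caller's connection, not the
// request itself; fetch recomputes what it needs, so they are not forwarded.
const SKIPPED_HEADERS = new Set(["host", "connection", "keep-alive", "content-length", "transfer-encoding", "upgrade"])

function forwardableHeaders(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {}
  for (const [name, value] of Object.entries(headers)) {
    if (!SKIPPED_HEADERS.has(name.toLowerCase())) out[name] = value
  }
  return out
}

function targetUrl(port: number, path: string, query: string): string {
  return `http://127.0.0.1:${port}${path}${query ? `?${query}` : ""}`
}

async function proxyToWorkerd(
  port: number,
  req: { method: string; path: string; query: string; headers: Record<string, string>; body: string },
  timeoutMs = 30_000, // hypothetical default, matching the API's 30 s wait
): Promise<InvocationResult> {
  // fetch throws if a GET or HEAD request carries a body, even an empty string.
  const hasBody = req.method !== "GET" && req.method !== "HEAD"
  const res = await fetch(targetUrl(port, req.path, req.query), {
    method: req.method,
    headers: forwardableHeaders(req.headers),
    body: hasBody ? req.body : undefined,
    signal: AbortSignal.timeout(timeoutMs), // abort if workerd hangs
  })
  const headers: Record<string, string> = {}
  res.headers.forEach((value, name) => {
    headers[name] = value
  })
  // Returning this object from the processor makes it the job's return value.
  return { status: res.status, headers, body: await res.text() }
}
```

One limitation of the `Record<string, string>` shape: a header that appears several times, such as `set-cookie`, collapses to a single value. A production version might store arrays instead.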

Step 4 — API unblocks and responds

The waitUntilFinished call in the API resolves with the job return value, which is forwarded directly as the HTTP response to the original caller. The entire round trip — excluding cold start — is typically under 10 ms of overhead on top of the user function's own execution time.

workerd: Cloudflare's V8 Sandbox

workerd is Cloudflare's open-source serverless runtime, built from the same code that powers Cloudflare Workers in production. Mercy runs each function in its own workerd process, so every function gets a separate V8 isolate in a separate OS process and no memory is shared between functions. Key properties that made it the right choice for Mercy:

Process-level isolation — each function runs in its own OS process. A crash or OOM in one function cannot affect another.
Node.js compatibility flag — the nodejs_compat compatibility flag enables Node.js built-in shims (node:fs, node:crypto, node:path, etc.) inside the V8 context, so most npm packages work without extra polyfills.
HTTP socket binding — workerd opens a TCP socket and serves HTTP natively, making the proxy model trivial to implement.
npm distribution — the workerd binary ships as an npm package, so bun install in the Dockerfile automatically downloads the correct Linux binary for the target architecture.

Cap'n Proto Worker Configuration

workerd is configured via Cap'n Proto schema files (.capnp), not command-line flags or environment variables. The runtime generates one config file per function, stored at /var/mercio/{functionId}/config.capnp. Here is exactly what that file looks like:

config.capnp

using Workerd = import "/workerd/workerd.capnp";

const config :Workerd.Config = (
  services = [
    (name = "main", worker = .mercioWorker),
  ],
  sockets = [
    (
      name    = "http",
      address = "127.0.0.1:PORT",
      http    = (),
      service = "main",
    ),
  ],
);

const mercioWorker :Workerd.Worker = (
  modules = [
    (name = "worker", esModule = embed "worker.mjs"),
  ],
  compatibilityDate  = "2025-01-01",
  compatibilityFlags = ["nodejs_compat"],
);

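The embed "worker.mjs" directive makes workerd read the bundle from disk when it loads the config. The path is resolved relative to the config file, so it points at /var/mercio/{functionId}/worker.mjs. The port is chosen at runtime by asking the OS for a free ephemeral port. Before writing the config, the runtime calls a helper that binds a TCP socket to port 0, reads the port the OS assigned, and immediately closes the socket. That number replaces the PORT placeholder above.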
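The port helper and config generation described above could look roughly like this. The sketch uses `node:net`, which Bun supports. `getFreePort` and `renderConfig` are hypothetical names, and the template reproduces the config.capnp shown earlier with the port filled in.

```typescript
import { createServer, type AddressInfo } from "node:net"

// Ask the OS for a free ephemeral port: bind to port 0, read the assigned port, close.
function getFreePort(host = "127.0.0.1"): Promise<number> {
  return new Promise((resolve, reject) => {
    const server = createServer()
    server.once("error", reject)
    server.listen(0, host, () => {
      const { port } = server.address() as AddressInfo
      server.close(() => resolve(port)) // release the port before workerd binds it
    })
  })
}

// Render the per-function config with the chosen port substituted for PORT.
// `embed "worker.mjs"` resolves relative to the directory of the written file.
function renderConfig(port: number): string {
  return `using Workerd = import "/workerd/workerd.capnp";

const config :Workerd.Config = (
  services = [ (name = "main", worker = .mercioWorker) ],
  sockets = [ (name = "http", address = "127.0.0.1:${port}", http = (), service = "main") ],
);

const mercioWorker :Workerd.Worker = (
  modules = [ (name = "worker", esModule = embed "worker.mjs") ],
  compatibilityDate = "2025-01-01",
  compatibilityFlags = ["nodejs_compat"],
);
`
}
```

Writing the result to /var/mercio/{functionId}/config.capnp then takes a single `writeFile` call. This approach has a small built-in race. The port is free when the helper closes its socket, but another process could claim it before workerd binds it. Inside a dedicated container that window is rarely hit, and a spawn that fails can be retried with a fresh port.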

The compatibilityDate pins behaviour to a specific date-based version of the workerd runtime, and newer behaviour becomes available by advancing the date. The Node.js shims themselves come from the nodejs_compat flag. The date chosen (2025-01-01) determines which of them are active, and it enables a broad set of stable Node.js compatibility shims.

Worker Pool: Warm vs Cold Starts

The key insight behind mercio-runtime's performance is that workerd processes are kept alive between requests. A cold start happens at most once per function per pool eviction cycle. Every subsequent request to the same function hits a warm process.

Figure 3 — State machine for a single workerd process managed by the pool.

Cold start path

When ensure(functionId) is called and no entry exists in the pool map, the runtime:

1. Checks the disk cache at /var/mercio/{functionId}/worker.mjs.
2. If absent, downloads the bundle from Cloudflare R2 and writes it to disk.
3. Generates the config.capnp file with a fresh ephemeral port.
4. Spawns a workerd child process via Bun.spawn.
5. Polls http://127.0.0.1:{port}/ in a tight loop until a non-error response arrives (waitUntilReady).
6. Stores {port, process, lastUsed} in the pool map.

This whole sequence takes roughly 150–400 ms depending on whether the disk cache is warm and how quickly workerd starts up. Every subsequent invocation skips all of this.
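The six cold-start steps map onto a short async function. In this sketch every side effect is injected through a hypothetical `Deps` object: the disk check, R2 download, config write, process spawn, and readiness probe. That keeps the control flow self-contained. The names, the 5-second readiness timeout, and the 25 ms poll interval are assumptions, not values from mercio-runtime.

```typescript
type Killable = { kill(signal?: string): void }

// Hypothetical injected steps, one per item in the cold-start list.
type Deps = {
  readCachedBundle(functionId: string): Promise<boolean> // true if worker.mjs is on disk
  downloadBundle(functionId: string): Promise<void> // fetch from R2 and write to disk
  writeConfig(functionId: string): Promise<number> // write config.capnp, return its port
  spawnWorkerd(functionId: string): Killable // start the workerd child
  isReady(port: number): Promise<boolean> // one probe of http://127.0.0.1:{port}/
}

type PoolEntry = { port: number; process: Killable; lastUsed: number }

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Probe repeatedly until workerd answers or the (hypothetical) deadline passes.
async function waitUntilReady(deps: Deps, port: number, timeoutMs = 5_000, intervalMs = 25): Promise<void> {
  const deadline = Date.now() + timeoutMs
  while (Date.now() < deadline) {
    if (await deps.isReady(port)) return
    await sleep(intervalMs)
  }
  throw new Error(`workerd on port ${port} not ready after ${timeoutMs} ms`)
}

async function ensure(pool: Map<string, PoolEntry>, deps: Deps, functionId: string): Promise<number> {
  const warm = pool.get(functionId)
  if (warm) {
    warm.lastUsed = Date.now() // warm path: no I/O at all
    return warm.port
  }
  if (!(await deps.readCachedBundle(functionId))) {
    await deps.downloadBundle(functionId) // steps 1-2: disk cache, then R2 on a miss
  }
  const port = await deps.writeConfig(functionId) // step 3
  const proc = deps.spawnWorkerd(functionId) // step 4
  try {
    await waitUntilReady(deps, port) // step 5
  } catch (err) {
    proc.kill("SIGTERM") // don't leak a half-started process the pool never tracked
    throw err
  }
  pool.set(functionId, { port, process: proc, lastUsed: Date.now() }) // step 6
  return port
}
```

In a Bun implementation, `spawnWorkerd` would wrap something like `Bun.spawn(["workerd", "serve", configPath])`. `isReady` could be a `fetch` to `http://127.0.0.1:{port}/` that returns true on any completed HTTP response and false when the connection is refused.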

LRU eviction

The pool is bounded by MERCIO_POOL_MAX (default 20). When a new cold start would exceed this limit, the pool finds the entry with the oldest lastUsed timestamp, sends SIGTERM to its process, and removes it from the map. This eviction is synchronous, with no async waiting, so the new spawn can proceed immediately.
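The LRU rule can be expressed as a synchronous function. This sketch assumes pool entries shaped like the `{port, process, lastUsed}` records from the cold-start list, with `process` exposing a `kill(signal)` method as Bun's subprocess handle does. `evictIfFull` is a hypothetical name.

```typescript
type Killable = { kill(signal?: string): void }
type PoolEntry = { port: number; process: Killable; lastUsed: number }

// MERCIO_POOL_MAX is from the post; fall back to 20 when unset or not a number.
const POOL_MAX = Number(process.env.MERCIO_POOL_MAX) || 20

// If adding one more entry would exceed maxSize, evict the least-recently-used one.
// Everything is synchronous: a linear scan, one signal, one map delete.
function evictIfFull(pool: Map<string, PoolEntry>, maxSize: number = POOL_MAX): string | undefined {
  if (pool.size < maxSize) return undefined
  let oldestId: string | undefined
  let oldestTime = Infinity
  for (const [id, entry] of pool) {
    if (entry.lastUsed < oldestTime) {
      oldestTime = entry.lastUsed
      oldestId = id
    }
  }
  if (oldestId !== undefined) {
    pool.get(oldestId)!.process.kill("SIGTERM") // ask workerd to exit; don't wait for it
    pool.delete(oldestId)
  }
  return oldestId
}
```

A linear scan is adequate for a pool capped at 20 entries. For much larger pools, a common alternative relies on `Map` insertion order: delete and re-insert an entry on every use, so the first key is always the least recently used.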

Idle timeout

A cleanup task runs every 60 seconds. It iterates the pool map and kills any process whose lastUsed timestamp is older than MERCIO_IDLE_TTL_MS (default 5 minutes). This reclaims memory for functions that haven't been invoked recently without waiting for the pool to fill up.
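The idle sweep is a periodic filter over the same map. Below is a minimal sketch that assumes the entry shape from the cold-start list. `sweepIdle` and `startIdleSweeper` are hypothetical names, and `now` is passed in explicitly so the logic does not depend on the wall clock.

```typescript
type PoolEntry = { port: number; process: { kill(signal?: string): void }; lastUsed: number }

// MERCIO_IDLE_TTL_MS is from the post; default 5 minutes. The sweep runs every 60 s.
const IDLE_TTL_MS = Number(process.env.MERCIO_IDLE_TTL_MS) || 5 * 60_000
const SWEEP_INTERVAL_MS = 60_000

// Kill and forget every process idle for longer than ttlMs; return the evicted IDs.
function sweepIdle(pool: Map<string, PoolEntry>, ttlMs: number, now: number): string[] {
  const evicted: string[] = []
  for (const [id, entry] of pool) {
    if (now - entry.lastUsed > ttlMs) {
      entry.process.kill("SIGTERM")
      pool.delete(id) // deleting the current key during Map iteration is well defined
      evicted.push(id)
    }
  }
  return evicted
}

function startIdleSweeper(pool: Map<string, PoolEntry>): ReturnType<typeof setInterval> {
  return setInterval(() => sweepIdle(pool, IDLE_TTL_MS, Date.now()), SWEEP_INTERVAL_MS)
}
```

The timer returned by `startIdleSweeper` can be passed to `clearInterval` during shutdown so it does not keep the process alive.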

R2 Storage and Local Disk Cache

User function bundles live in Cloudflare R2 — an S3-compatible object store. The runtime uses the AWS SDK's S3Client with the R2 endpoint to download them. But downloading a bundle on every cold start would be wasteful; instead, bundles are cached on a Docker volume at /var/mercio.

Figure 4 — Bundle resolution: disk cache checked first, R2 fetched only on a miss.

The Docker volume (mercio_cache:/var/mercio) persists across container restarts. In practice this means that after the first cold start for a given function, subsequent cold starts — even after a runtime container restart — skip the R2 download entirely.

Cache invalidation happens at the application layer. When a user redeploys a function, the build pipeline uploads a new worker.mjs under the same key in R2. The cached file on disk is then removed, so the next cold start fetches the fresh bundle. Because workerd reads the bundle only when it starts, any warm process for that function also has to be stopped before the new code is served. The function ID embedded in the path acts as the cache key.
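Cache-first resolution and invalidation can be sketched with `node:fs/promises`. The R2 download is injected as a hypothetical `fetchBundle` function. In the real service it would wrap the AWS SDK's `S3Client` pointed at the R2 endpoint, presumably with a `GetObjectCommand`. Writing to a temporary file and renaming it is an addition of this sketch, not something the post describes. Because `rename` is atomic within one filesystem, a crash mid-download cannot leave a truncated worker.mjs that later looks like a cache hit.

```typescript
import { randomUUID } from "node:crypto"
import { access, mkdir, rename, rm, writeFile } from "node:fs/promises"
import { join } from "node:path"

// Hypothetical signature for the R2 download; returns the bundle source.
type FetchBundle = (functionId: string) => Promise<string>

async function resolveBundle(cacheRoot: string, functionId: string, fetchBundle: FetchBundle): Promise<string> {
  const dir = join(cacheRoot, functionId)
  const bundlePath = join(dir, "worker.mjs")
  try {
    await access(bundlePath) // cache hit: the bundle is already on the volume
    return bundlePath
  } catch {
    // cache miss (or unreadable file): fall through and fetch from R2
  }
  const source = await fetchBundle(functionId)
  await mkdir(dir, { recursive: true })
  const tmpPath = `${bundlePath}.${randomUUID()}.tmp`
  await writeFile(tmpPath, source)
  await rename(tmpPath, bundlePath) // atomic replace on the same filesystem
  return bundlePath
}

async function invalidateBundle(cacheRoot: string, functionId: string): Promise<void> {
  // force: true makes this a no-op when nothing is cached
  await rm(join(cacheRoot, functionId, "worker.mjs"), { force: true })
}
```

Invalidation clears only the disk copy. A warm process for the function keeps running the bundle it loaded at startup until it is stopped.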

BullMQ Job Schema and Concurrency Model

The invocation job payload that flows through the queue carries everything the runtime needs to execute and log the request:

typescript

type InvocationJobData = {
  id: string          // function (mercio) ID
  method: string      // HTTP method
  path: string        // request path
  query: string       // raw query string
  headers: Record<string, string>
  body: string        // serialised request body
  runId?: string      // present when invoked via a Mercob scheduled job
  timeoutMs?: number  // per-invocation timeout override
}

The BullMQ Worker is configured with concurrency: 8, meaning up to eight jobs can be processed simultaneously within a single runtime instance. Each job runs as an independent async task. Because JavaScript runs on a single-threaded event loop, individual reads and writes of the shared pool map never interleave. A cold start, however, spans several awaits, so the check-then-spawn sequence in ensure is not atomic on its own.
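Because a cold start spans several awaits (disk, R2, readiness polling), two jobs for the same cold function can both find the pool entry missing and both spawn a process. One common way to close that gap is to remember the in-flight promise per function. The sketch below is an assumption, not a description of how mercio-runtime does it, and `dedupeColdStarts` is a hypothetical name.

```typescript
// Share one in-flight cold start per function, so concurrent jobs for the same
// cold function wait on a single spawn instead of each starting workerd.
function dedupeColdStarts<T>(start: (functionId: string) => Promise<T>): (functionId: string) => Promise<T> {
  const inFlight = new Map<string, Promise<T>>()
  return (functionId: string): Promise<T> => {
    const pending = inFlight.get(functionId)
    if (pending) return pending // another job is already starting this function
    // Forget the entry once it settles, so a failed start can be retried later.
    const p = start(functionId).finally(() => inFlight.delete(functionId))
    inFlight.set(functionId, p) // set synchronously: no await between get and set
    return p
  }
}
```

The wrapped function would still check the pool first, so the in-flight map only matters during the cold-start window. Clearing the entry in `finally` means a failed start is not cached, and the next job simply tries again.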

Figure 5 — Multiple concurrent jobs sharing warm workerd processes. Two jobs for fn-A route to the same process; workerd queues them internally.

Within one workerd process, a function's JavaScript runs on a single thread. Requests for the same function can overlap while a handler is awaiting I/O, but CPU-bound work in one request delays every other request waiting on that process. If you need true parallel execution for a single function, you would need to run multiple runtime instances (horizontal scaling).

Why BullMQ over direct HTTP?

The indirection through a job queue gives us several properties that a direct HTTP call from the API to the runtime would not:

Back-pressure — if all 8 worker slots are busy, new jobs queue in Redis rather than piling up as in-flight HTTP connections.
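Retries — transient failures (a workerd crash, a network blip) can be retried by BullMQ without the caller noticing, as long as jobs are added with attempts greater than 1. BullMQ's default is a single attempt with no retries.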
Observability — the queue's waiting/active/completed/failed counts are available in real time, making it easy to spot bottlenecks.
Horizontal scaling — adding more runtime instances is a matter of deploying another container; they all consume from the same queue without any co-ordination.
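Retries happen only if jobs are added with retry options. The sketch below shows BullMQ's `attempts` and `backoff` job options with hypothetical values and works through the timing arithmetic. It assumes BullMQ's built-in exponential strategy, which waits round(2^(attemptsMade - 1) × delay) ms before each retry. `retryDelays` and `worstCaseMs` are hypothetical helpers.

```typescript
// Relevant BullMQ job options (attempts/backoff are real JobsOptions fields;
// the numbers are hypothetical, not Mercy's configuration).
const invocationJobOptions = {
  attempts: 3, // one try plus up to two retries
  backoff: { type: "exponential" as const, delay: 500 },
}

// Delay before each retry under the exponential formula round(2^(attemptsMade - 1) * delay).
function retryDelays(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = []
  for (let attemptsMade = 1; attemptsMade < attempts; attemptsMade++) {
    delays.push(Math.round(2 ** (attemptsMade - 1) * baseDelayMs))
  }
  return delays
}

// Worst case: every attempt runs until its own timeout, plus the backoff gaps
// (time spent waiting in the queue is ignored). This must fit inside the API's
// 30 s waitUntilFinished window, or the caller gives up while retries continue.
function worstCaseMs(attempts: number, baseDelayMs: number, perAttemptTimeoutMs: number): number {
  const gaps = retryDelays(attempts, baseDelayMs).reduce((sum, d) => sum + d, 0)
  return attempts * perAttemptTimeoutMs + gaps
}
```

With these numbers, a 9-second per-attempt timeout gives 3 × 9,000 + 500 + 1,000 = 28,500 ms in the worst case, just inside the API's 30-second wait. A 10-second timeout (31,500 ms) would not fit. Retries also re-run the user's function, so handlers with side effects should tolerate being invoked more than once.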

Job Logging and the runId Flow

When a Mercio function is invoked by a Mercob scheduled job, the invocation payload includes a runId. After the workerd execution completes, the runtime POSTs the execution result (status, headers, truncated body, and duration) back to the API at /internal/job-logs with a Bearer token (WORKER_SECRET) for authentication.
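Here is a sketch of how the log request might be assembled. Field names, the 8 KiB truncation limit, and `buildJobLogRequest`/`truncateUtf8` are all hypothetical. The post says only that the body is truncated and that the request carries a Bearer token built from WORKER_SECRET.

```typescript
// Hypothetical payload for POST /internal/job-logs.
type JobLog = {
  runId: string
  status: number
  headers: Record<string, string>
  body: string
  durationMs: number
}

const MAX_LOGGED_BODY_BYTES = 8 * 1024 // hypothetical limit

// Truncate to at most maxBytes of UTF-8 without splitting a character.
function truncateUtf8(text: string, maxBytes: number): string {
  const bytes = new TextEncoder().encode(text)
  if (bytes.length <= maxBytes) return text
  let cut = maxBytes
  // bytes[cut] is the first excluded byte; if it is a continuation byte
  // (10xxxxxx), step back so the partial character is dropped entirely.
  while (cut > 0 && (bytes[cut] & 0xc0) === 0x80) cut--
  return new TextDecoder().decode(bytes.subarray(0, cut))
}

function buildJobLogRequest(apiBase: string, secret: string, log: JobLog) {
  return {
    url: new URL("/internal/job-logs", apiBase).toString(),
    init: {
      method: "POST",
      headers: { "content-type": "application/json", authorization: `Bearer ${secret}` },
      body: JSON.stringify({ ...log, body: truncateUtf8(log.body, MAX_LOGGED_BODY_BYTES) }),
    },
  }
}
```

The runtime would then call `fetch(url, init)`, with the API base URL read from configuration (the post does not name that variable). Truncating by UTF-8 bytes rather than string length keeps the cap meaningful for non-ASCII bodies. A failed log POST is probably best caught and logged rather than allowed to fail the invocation itself.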

This is what populates the per-run history visible in the Mercy dashboard. Direct invocations (those without a runId) skip this step. Their result is returned only through the BullMQ job return value and is never written to the run history.

Graceful Shutdown

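Docker sends SIGTERM when stopping a container; SIGINT is what Ctrl-C delivers during local development. The runtime's entry point registers a handler for both:

index.ts

let shuttingDown = false
async function shutdown(signal: string) {
  if (shuttingDown) return  // ignore repeated signals while draining
  shuttingDown = true
  logger.info(`mercio-runtime shutting down (${signal})`)
  await worker.close()  // stop fetching new jobs, wait for active ones to finish
  process.exit(0)
}
process.on('SIGTERM', () => void shutdown('SIGTERM'))  // sent by docker stop
process.on('SIGINT', () => void shutdown('SIGINT'))    // Ctrl-C in local development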

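worker.close() stops the worker from fetching new jobs and waits for the jobs it is already processing to finish before resolving. Nothing forwards the signal to the workerd children automatically, because a child process is not signalled when its parent exits. If the runtime is the container's PID 1, the kernel kills every remaining process in the container with SIGKILL once the runtime exits. The drain also has to fit inside Docker's stop grace period (10 seconds by default), after which Docker sends SIGKILL itself. Finally, the docker-compose restart policy is unless-stopped, which restarts the container after any exit, zero or non-zero, unless it was stopped explicitly. A crashed runtime therefore comes back on its own, while a deliberate docker stop leaves it down.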
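To terminate the workerd children deliberately rather than relying on container teardown, the handler could signal them itself after worker.close() resolves. This sketch assumes children expose `kill(signal)` and an `exited` promise, as Bun's `Subprocess` does. `stopChildren` and the 5-second grace period are hypothetical.

```typescript
// Minimal view of a spawned child, matching the shape of Bun's Subprocess.
type Child = { kill(signal?: number | string): void; exited: Promise<number> }

// SIGTERM every child, then SIGKILL any that are still running after graceMs.
async function stopChildren(children: Iterable<Child>, graceMs = 5_000): Promise<void> {
  const all = [...children]
  for (const child of all) child.kill("SIGTERM")
  await Promise.all(
    all.map(async (child) => {
      let timer: ReturnType<typeof setTimeout> | undefined
      const timedOut = new Promise<"timeout">((resolve) => {
        timer = setTimeout(() => resolve("timeout"), graceMs)
      })
      const outcome = await Promise.race([child.exited.then(() => "exited" as const), timedOut])
      clearTimeout(timer) // don't leave the grace timer pending after a clean exit
      if (outcome === "timeout") child.kill("SIGKILL") // escalate only for stragglers
    }),
  )
}
```

Called with the pool's process handles before `process.exit(0)`, this gives each workerd a chance to exit cleanly. The BullMQ drain and this grace period together need to fit inside Docker's stop timeout.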

Conclusion

mercio-runtime is a small service with a focused job: keep user functions warm and get invocations through with minimal latency. The design centres on a few deliberate choices — one workerd process per function rather than shared isolates, BullMQ for back-pressure and retries instead of direct HTTP, and a persistent disk cache so R2 is hit as rarely as possible.

Together these choices let the runtime stay simple (~300 lines of TypeScript across four files) while delivering sub-10ms overhead on warm invocations and full V8-level isolation between functions. As Mercy scales, the horizontal scaling story is equally simple: add more runtime containers and they all share the same Redis queue with zero configuration.

If you want to dig further into the source code, explore the Mercio documentation or browse the example functions in apps/mercio-runtime/example/ to see the handler contract in action.
