Inside mercio-runtime: How Mercy Executes Serverless Functions
- serverless
- workerd
- v8
- bullmq
- architecture
Introduction
When you upload a JavaScript function to Mercy and hit its invoke URL, something has to actually run your code. That something is mercio-runtime, a lean, Bun-based service that manages a warm pool of workerd V8 isolates, routes HTTP invocations through BullMQ, and caches compiled bundles on local disk so that a cold start rarely has to download anything.
This post is a deep technical walk-through of its internals. We will cover the process lifecycle, the worker pool algorithm, how Cap'n Proto configuration is generated on the fly, why we chose BullMQ for the invocation layer, and how all of this fits together inside Docker. By the end you will understand exactly what happens — at every layer — between a caller sending an HTTP request and receiving a response.
System Architecture
Mercy's Mercio product spans four major services. Understanding where mercio-runtime sits in that landscape is the right starting point.
The four services and their responsibilities:
- apps/api: Receives the incoming HTTP request, enqueues an invocation job, and blocks waiting for the result via QueueEvents.waitUntilFinished.
- Redis / BullMQ: Acts as the invocation broker. The mercio-invocations queue decouples the API from the runtime and provides back-pressure, retry semantics, and observability.
- apps/mercio-runtime: This service. Consumes invocation jobs, maintains the workerd pool, and forwards HTTP traffic to the correct isolate.
- apps/worker: A separate build pipeline that takes user-uploaded zip files, runs esbuild, and uploads the resulting worker.mjs bundle to Cloudflare R2. The runtime never touches raw user code.
The Request Lifecycle
Let us trace a single invocation end-to-end, from the moment the caller's HTTP request hits the API to the moment the response arrives back.
Step 1 — API enqueues the job
The API handler at POST /mercio/:id serialises the full HTTP context (method, path, query string, headers, and body) into a BullMQ job payload and adds it to the mercio-invocations queue. It then calls QueueEvents.waitUntilFinished(jobId, 30_000), which listens on the queue's Redis event stream and keeps the request handler waiting until the job emits a completed event or the 30-second timeout fires.
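As a rough illustration of the serialisation step, the sketch below flattens a request into the payload fields listed in the job schema later in this post. The IncomingRequest shape and the toInvocationJobData helper are hypothetical names introduced for the example, and the real API handler may well normalise headers or paths differently.

```typescript
// Payload fields follow the InvocationJobData schema shown later in the post.
type InvocationJobData = {
  id: string;
  method: string;
  path: string;
  query: string;
  headers: Record<string, string>;
  body: string;
};

// Hypothetical, framework-neutral view of the caller's request.
type IncomingRequest = {
  method: string;
  url: string; // path plus optional query, e.g. "/hello?name=a"
  headers: Record<string, string | string[] | undefined>;
  body: string;
};

function toInvocationJobData(functionId: string, req: IncomingRequest): InvocationJobData {
  // Split once at the first "?" so the raw query string is passed through untouched.
  const q = req.url.indexOf("?");
  const path = q === -1 ? req.url : req.url.slice(0, q);
  const query = q === -1 ? "" : req.url.slice(q + 1);

  // Job data is stored as JSON, so multi-valued headers are joined into one string.
  const headers: Record<string, string> = {};
  for (const [name, value] of Object.entries(req.headers)) {
    if (value === undefined) continue;
    headers[name.toLowerCase()] = Array.isArray(value) ? value.join(", ") : value;
  }

  return { id: functionId, method: req.method.toUpperCase(), path, query, headers, body: req.body };
}
```

Everything in the payload is a plain string or a flat object, which keeps the job JSON-serialisable for Redis.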
Step 2 — Runtime dequeues and routes
The BullMQ Worker inside mercio-runtime pulls the job off the queue (up to 8 concurrent jobs at once) and calls ensure(functionId) on the worker pool. If the function is already warm, the call returns immediately with a port number. If it is cold, the pool starts a spawn sequence (described in detail below).
Step 3 — HTTP proxy to workerd
Once a port is known, the runtime issues a standard fetch to http://127.0.0.1:{port}{path}, forwarding all headers and the request body. workerd receives this, executes the user's handler, and returns a standard HTTP response. The runtime reads the response status, headers, and body, then completes the BullMQ job with those values as the return value.
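The proxy step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the runtime's actual code: proxyToWorkerd, targetUrl, and the FetchLike type are names made up for the example, and the fetch implementation is passed in as a parameter so the logic can be exercised without a live workerd process.

```typescript
// Minimal structural types so the sketch does not depend on a particular fetch typing.
type ProxyResponse = {
  status: number;
  headers: { forEach(cb: (value: string, key: string) => void): void };
  text(): Promise<string>;
};
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string | undefined },
) => Promise<ProxyResponse>;

type JobRequest = { method: string; path: string; query: string; headers: Record<string, string>; body: string };
type JobResult = { status: number; headers: Record<string, string>; body: string };

function targetUrl(port: number, path: string, query: string): string {
  return `http://127.0.0.1:${port}${path}${query ? `?${query}` : ""}`;
}

async function proxyToWorkerd(port: number, job: JobRequest, fetchImpl: FetchLike): Promise<JobResult> {
  // fetch rejects GET and HEAD requests that carry a body, so drop it for those methods.
  const hasBody = job.method !== "GET" && job.method !== "HEAD";
  const res = await fetchImpl(targetUrl(port, job.path, job.query), {
    method: job.method,
    headers: job.headers,
    body: hasBody ? job.body : undefined,
  });

  const headers: Record<string, string> = {};
  res.headers.forEach((value, key) => {
    headers[key] = value;
  });

  // This plain object is what would become the BullMQ job's return value.
  return { status: res.status, headers, body: await res.text() };
}
```

In a real worker the third argument would simply be the global fetch, for example proxyToWorkerd(port, job.data, fetch).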
Step 4 — API unblocks and responds
The waitUntilFinished call in the API resolves with the job return value, which is forwarded directly as the HTTP response to the original caller. The entire round trip — excluding cold start — is typically under 10 ms of overhead on top of the user function's own execution time.
workerd: Cloudflare's V8 Sandbox
workerd is Cloudflare's open-source serverless runtime, built on the same code that powers Cloudflare Workers in production. It runs each function in a V8 isolate, and mercio-runtime goes one step further by starting a separate workerd process per function, so every function's isolate lives in its own OS process and no memory is shared between functions. Key properties that made it the right choice for Mercy:
- Process-level isolation: each function runs in its own OS process. A crash or OOM in one function cannot affect another.
- Node.js compatibility flag: the nodejs_compat compatibility flag enables Node.js built-in shims (node:fs, node:crypto, node:path, etc.) inside the V8 context, so most npm packages work without extra polyfills.
- HTTP socket binding: workerd opens a TCP socket and serves HTTP natively, making the proxy model trivial to implement.
- npm distribution: the workerd binary ships as an npm package, so bun install in the Dockerfile automatically downloads the correct Linux binary for the target architecture.
Cap'n Proto Worker Configuration
workerd is configured via Cap'n Proto schema files (.capnp), not command-line flags or environment variables. The runtime generates one config file per function, stored at /var/mercio/{functionId}/config.capnp. Here is exactly what that file looks like:
    using Workerd = import "/workerd/workerd.capnp";

    const config :Workerd.Config = (
      services = [
        (name = "main", worker = .mercioWorker),
      ],
      sockets = [
        (
          name = "http",
          address = "127.0.0.1:PORT",
          http = (),
          service = "main",
        ),
      ],
    );

    const mercioWorker :Workerd.Worker = (
      modules = [
        (name = "worker", esModule = embed "worker.mjs"),
      ],
      compatibilityDate = "2025-01-01",
      compatibilityFlags = ["nodejs_compat"],
    );

The embed "worker.mjs" directive instructs workerd to read the bundle from the filesystem at startup. The port is determined at runtime by asking the OS for a free ephemeral port: the runtime calls a helper that binds and then immediately closes a TCP socket to obtain an available port before writing the config.
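To make the two moving parts concrete, here is one way the port helper and the config template could look. The function names getFreePort and renderConfig are assumptions made for this sketch; the post only tells us that the helper binds and closes a socket and that the port is written into the config.

```typescript
import { createServer } from "node:net";

// Ask the OS for a free port by binding to port 0, then release it immediately.
// Another process could claim the port before workerd binds it; this sketch ignores that race.
function getFreePort(): Promise<number> {
  return new Promise((resolve, reject) => {
    const server = createServer();
    server.once("error", reject);
    server.listen(0, "127.0.0.1", () => {
      const address = server.address();
      if (address === null || typeof address === "string") {
        server.close(() => reject(new Error("unexpected socket address")));
        return;
      }
      const { port } = address;
      server.close(() => resolve(port));
    });
  });
}

// Render the per-function config; only the port varies between functions.
function renderConfig(port: number): string {
  return `using Workerd = import "/workerd/workerd.capnp";

const config :Workerd.Config = (
  services = [
    (name = "main", worker = .mercioWorker),
  ],
  sockets = [
    (
      name = "http",
      address = "127.0.0.1:${port}",
      http = (),
      service = "main",
    ),
  ],
);

const mercioWorker :Workerd.Worker = (
  modules = [
    (name = "worker", esModule = embed "worker.mjs"),
  ],
  compatibilityDate = "2025-01-01",
  compatibilityFlags = ["nodejs_compat"],
);
`;
}
```

Because embed paths are resolved relative to the config file, writing config.capnp next to worker.mjs in the function's cache directory keeps the template free of absolute paths.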
compatibilityDate pins behaviour to a specific date-based version of the workerd runtime. Newer features become available by advancing this date; the date chosen (2025-01-01) activates a broad set of stable Node.js compatibility shims.
Worker Pool: Warm vs Cold Starts
The key insight behind mercio-runtime's performance is that workerd processes are kept alive between requests. A cold start happens at most once per function per pool eviction cycle. Every subsequent request to the same function hits a warm process.
Cold start path
When ensure(functionId) is called and no entry exists in the pool map, the runtime:
- Checks the disk cache at /var/mercio/{functionId}/worker.mjs.
- If absent, downloads the bundle from Cloudflare R2 and writes it to disk.
- Generates the config.capnp file with a fresh ephemeral port.
- Spawns a workerd child process via Bun.spawn.
- Polls http://127.0.0.1:{port}/ in a tight loop until a non-error response arrives (waitUntilReady).
- Stores {port, process, lastUsed} in the pool map.
This whole sequence takes roughly 150–400 ms depending on whether the disk cache is warm and how quickly workerd starts up. Every subsequent invocation skips all of this.
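The control flow of the cold start path can be sketched as follows. To keep the example runnable on its own, every side-effecting step sits behind a hypothetical ColdStartSteps interface, the pool entry omits the process handle, and the polling timeout and interval are assumed values rather than the runtime's real settings.

```typescript
type PoolEntry = { port: number; lastUsed: number };

// Hypothetical seams for the side-effecting steps, so the control flow is visible on its own.
type ColdStartSteps = {
  hasCachedBundle(functionId: string): Promise<boolean>;
  downloadBundle(functionId: string): Promise<void>;
  writeConfig(functionId: string): Promise<number>; // resolves to the chosen port
  spawnWorkerd(functionId: string): void;
  isReady(port: number): Promise<boolean>; // one probe of http://127.0.0.1:{port}/
};

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until a probe succeeds or the deadline passes (timeout and interval are assumptions).
async function waitUntilReady(steps: ColdStartSteps, port: number, timeoutMs = 5_000, intervalMs = 10): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await steps.isReady(port)) return;
    await sleep(intervalMs);
  }
  throw new Error(`workerd on port ${port} did not become ready within ${timeoutMs} ms`);
}

async function ensure(pool: Map<string, PoolEntry>, steps: ColdStartSteps, functionId: string): Promise<number> {
  const warm = pool.get(functionId);
  if (warm) {
    warm.lastUsed = Date.now(); // warm path: refresh the timestamp and return
    return warm.port;
  }
  if (!(await steps.hasCachedBundle(functionId))) {
    await steps.downloadBundle(functionId);
  }
  const port = await steps.writeConfig(functionId);
  steps.spawnWorkerd(functionId);
  await waitUntilReady(steps, port);
  pool.set(functionId, { port, lastUsed: Date.now() });
  return port;
}
```

The second call for the same function never touches the steps at all, which is the whole point of keeping the process warm.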
LRU eviction
The pool is bounded by MERCIO_POOL_MAX (default 20). When a new cold start would exceed this limit, the pool finds the entry with the oldest lastUsed timestamp, sends SIGTERM to its process, and removes it from the map. The eviction is synchronous, with no async waiting, so the new spawn can proceed immediately.
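A linear scan is enough for a pool of this size. The sketch below is one possible shape for the eviction step; makeRoom is a hypothetical name, and the kill callback stands in for sending SIGTERM to the real child process.

```typescript
type PoolEntry = { port: number; lastUsed: number; kill: () => void };

// Evict least-recently-used entries until there is room for one more.
// Returns the evicted function IDs, oldest first.
function makeRoom(pool: Map<string, PoolEntry>, max: number): string[] {
  const evicted: string[] = [];
  while (pool.size >= max) {
    let oldestId: string | undefined;
    let oldestTs = Infinity;
    for (const [id, entry] of pool) {
      if (entry.lastUsed < oldestTs) {
        oldestTs = entry.lastUsed;
        oldestId = id;
      }
    }
    if (oldestId === undefined) break; // empty pool, nothing left to evict
    pool.get(oldestId)!.kill(); // fire and forget, e.g. SIGTERM; nothing is awaited
    pool.delete(oldestId);
    evicted.push(oldestId);
  }
  return evicted;
}
```

With a default limit of 20 entries, the O(n) scan costs far less than the spawn it makes room for, so a dedicated LRU data structure would add little.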
Idle timeout
A cleanup task runs every 60 seconds. It iterates the pool map and kills any process whose lastUsed timestamp is older than MERCIO_IDLE_TTL_MS (default 5 minutes). This reclaims memory for functions that haven't been invoked recently without waiting for the pool to fill up.
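The sweep itself is a single pass over the map. In this sketch the function name sweepIdle and the kill callback are assumptions; the two constants mirror the defaults quoted above.

```typescript
type PoolEntry = { port: number; lastUsed: number; kill: () => void };

const IDLE_TTL_MS = 5 * 60 * 1000; // default for MERCIO_IDLE_TTL_MS
const SWEEP_INTERVAL_MS = 60 * 1000; // the cleanup task's period

// Kill and remove every entry that has been idle for longer than ttlMs.
// Returns the IDs that were removed.
function sweepIdle(pool: Map<string, PoolEntry>, now: number, ttlMs: number = IDLE_TTL_MS): string[] {
  const removed: string[] = [];
  for (const [id, entry] of pool) {
    if (now - entry.lastUsed > ttlMs) {
      entry.kill();
      pool.delete(id); // deleting the current key while iterating a Map is safe
      removed.push(id);
    }
  }
  return removed;
}
```

Wiring it up would look something like setInterval(() => sweepIdle(pool, Date.now()), SWEEP_INTERVAL_MS). Passing the current time in as an argument keeps the function deterministic and easy to test.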
R2 Storage and Local Disk Cache
User function bundles live in Cloudflare R2 — an S3-compatible object store. The runtime uses the AWS SDK's S3Client with the R2 endpoint to download them. But downloading a bundle on every cold start would be wasteful; instead, bundles are cached on a Docker volume at /var/mercio.
The Docker volume (mercio_cache:/var/mercio) persists across container restarts. In practice this means that after the first cold start for a given function, subsequent cold starts — even after a runtime container restart — skip the R2 download entirely.
Cache invalidation happens at the application layer: when a user redeploys a function, the build pipeline uploads a new worker.mjs under the same key in R2 and the cached disk file is removed so the next cold start fetches the fresh bundle. The function ID embedded in the path acts as the cache key.
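The cache logic reduces to "does the file exist" plus a delete on redeploy. The sketch below assumes hypothetical helpers ensureBundle and invalidateBundle, and takes the R2 download as an injected function so that no S3 client is needed to follow the flow.

```typescript
import { existsSync, mkdirSync, rmSync, writeFileSync } from "node:fs";
import { dirname, join } from "node:path";

// The function ID in the path acts as the cache key, e.g. /var/mercio/{functionId}/worker.mjs.
const bundlePath = (cacheRoot: string, functionId: string): string =>
  join(cacheRoot, functionId, "worker.mjs");

// Returns true when the bundle had to be downloaded, false on a cache hit.
async function ensureBundle(
  cacheRoot: string,
  functionId: string,
  download: (functionId: string) => Promise<Uint8Array | string>, // stand-in for the R2 fetch
): Promise<boolean> {
  const file = bundlePath(cacheRoot, functionId);
  if (existsSync(file)) return false;
  const bytes = await download(functionId);
  mkdirSync(dirname(file), { recursive: true });
  writeFileSync(file, bytes);
  return true;
}

// Called on redeploy so that the next cold start fetches the fresh bundle.
function invalidateBundle(cacheRoot: string, functionId: string): void {
  rmSync(bundlePath(cacheRoot, functionId), { force: true }); // force: a missing file is not an error
}
```

Note that removing the file only affects the next cold start; a process that is already warm keeps serving whatever bundle it embedded at startup.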
BullMQ Job Schema and Concurrency Model
The invocation job payload that flows through the queue carries everything the runtime needs to execute and log the request:
    type InvocationJobData = {
      id: string          // function (mercio) ID
      method: string      // HTTP method
      path: string        // request path
      query: string       // raw query string
      headers: Record<string, string>
      body: string        // serialised request body
      runId?: string      // present when invoked via a Mercob scheduled job
      timeoutMs?: number  // per-invocation timeout override
    }

The BullMQ Worker is configured with concurrency: 8, meaning up to eight jobs can be processed simultaneously within a single runtime instance. Each job runs as an independent async task. The only state the tasks share is the pool map, and because each read or write to it is synchronous, those accesses never interleave on JavaScript's single-threaded event loop.
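To see what a concurrency limit means in practice, here is a small in-memory stand-in. It is not BullMQ and does not involve Redis; runWithConcurrency is a hypothetical helper that only mimics the "at most N handlers in flight" behaviour, and it also shows why a synchronous claim on shared state is safe.

```typescript
// Runs `handler` over `jobs` with at most `limit` invocations in flight.
// A plain in-memory illustration of a worker-level concurrency setting, not BullMQ itself.
async function runWithConcurrency<T, R>(
  jobs: T[],
  limit: number,
  handler: (job: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(jobs.length);
  let next = 0;

  async function lane(): Promise<void> {
    while (next < jobs.length) {
      // Synchronous claim: no other lane can run between the check and the increment.
      const index = next++;
      results[index] = await handler(jobs[index] as T);
    }
  }

  // Start up to `limit` lanes; each lane picks up a new job as soon as its last one settles.
  const lanes = Array.from({ length: Math.min(limit, jobs.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}
```

Anything beyond the limit simply waits its turn, which in the real system means the job stays queued in Redis until a slot frees up.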
Why BullMQ over direct HTTP?
The indirection through a job queue gives us several properties that a direct HTTP call from the API to the runtime would not:
- Back-pressure — if all 8 worker slots are busy, new jobs queue in Redis rather than piling up as in-flight HTTP connections.
- Retries — transient failures (workerd crash, network blip) are automatically retried by BullMQ without the caller noticing.
- Observability — the queue's waiting/active/completed/failed counts are available in real time, making it easy to spot bottlenecks.
- Horizontal scaling — adding more runtime instances is a matter of deploying another container; they all consume from the same queue without any co-ordination.
Job Logging and the runId Flow
When a Mercio function is invoked by a Mercob scheduled job, the invocation payload includes a runId. After the workerd execution completes, the runtime POSTs the execution result (status, headers, truncated body, and duration) back to the API at /internal/job-logs with a Bearer token (WORKER_SECRET) for authentication.
This is what populates the per-run history visible in the Mercy dashboard. Direct invocations (those without a runId) skip this step — the result is returned entirely through the BullMQ job return value and never persisted.
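The branch between scheduled and direct invocations is small enough to sketch. The endpoint path and the Bearer scheme come from the description above; the JSON field names, the 4096-character cap, and the buildJobLogRequest helper are assumptions made for illustration.

```typescript
type ExecutionResult = {
  status: number;
  headers: Record<string, string>;
  body: string;
  durationMs: number;
};

const MAX_LOGGED_BODY = 4096; // assumed cap; the post only says the body is truncated

// Returns the request to send to the API, or null for direct invocations (no runId).
function buildJobLogRequest(
  apiBaseUrl: string,
  workerSecret: string, // value of WORKER_SECRET
  runId: string | undefined,
  result: ExecutionResult,
) {
  if (runId === undefined) return null; // direct invocation: nothing is persisted

  return {
    url: `${apiBaseUrl.replace(/\/+$/, "")}/internal/job-logs`,
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        authorization: `Bearer ${workerSecret}`,
      },
      body: JSON.stringify({
        runId,
        status: result.status,
        headers: result.headers,
        body: result.body.slice(0, MAX_LOGGED_BODY),
        durationMs: result.durationMs,
      }),
    },
  };
}
```

A caller would then do something like: const req = buildJobLogRequest(...); if (req) await fetch(req.url, req.init). Building the request separately from sending it keeps the truncation and auth logic testable without a running API.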
Graceful Shutdown
Docker sends SIGTERM when stopping a container. The runtime's entry point registers a handler for it, and for SIGINT so that Ctrl-C behaves the same way during local runs:

    const shutdown = async () => {
      logger.info('mercio-runtime shutting down')
      await worker.close() // drain BullMQ, no new jobs accepted
      process.exit(0)
    }
    process.on('SIGTERM', shutdown) // sent by docker stop
    process.on('SIGINT', shutdown)  // Ctrl-C in a terminal

worker.close() tells BullMQ to finish any currently executing jobs before resolving. The workerd child processes are not signalled explicitly; they go away with the container when the parent Bun process exits. Because the docker-compose restart policy is unless-stopped, the runtime restarts automatically after a crash or any other exit that was not an explicit stop, and the graceful drain ensures in-flight invocations complete before a deliberate shutdown.
Conclusion
mercio-runtime is a small service with a focused job: keep user functions warm and get invocations through with minimal latency. The design centres on a few deliberate choices — one workerd process per function rather than shared isolates, BullMQ for back-pressure and retries instead of direct HTTP, and a persistent disk cache so R2 is hit as rarely as possible.
Together these choices let the runtime stay simple (~300 lines of TypeScript across four files) while delivering sub-10ms overhead on warm invocations and full V8-level isolation between functions. As Mercy scales, the horizontal scaling story is equally simple: add more runtime containers and they all share the same Redis queue with zero configuration.
If you want to dig further into the source code, explore the Mercio documentation or browse the example functions in apps/mercio-runtime/example/ to see the handler contract in action.