Overview
The benchmark compares three FHIR server implementations under identical workload, hardware, and database conditions. Each test run boots the full stack via Docker Compose, runs k6 scenarios sequentially against each server, and ships metrics to Prometheus.
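The run sequence described above can be sketched as a small command planner. This is an illustration under assumptions, not the benchmark's actual runner: the service URLs, the `server` tag, the `BASE_URL` script variable, and the scenario file name are hypothetical, and the k6 output name assumes k6's experimental Prometheus remote-write output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    name: str      # value used to tag k6 metrics for this server
    base_url: str  # hypothetical in-stack URL, not taken from the benchmark


# Hypothetical service URLs; the real compose file may use other hosts and ports.
TARGETS = [
    Target("aidbox", "http://aidbox:8080/fhir"),
    Target("hapi", "http://hapi:8080/fhir"),
    Target("medplum", "http://medplum:8103/fhir/R4"),
]


def plan_run(scenario: str) -> list[list[str]]:
    """Return the commands for one run: boot the stack, then one k6 run per server."""
    commands = [["docker", "compose", "up", "-d"]]
    for target in TARGETS:
        commands.append([
            "k6", "run",
            "--out", "experimental-prometheus-rw",  # ship metrics via remote-write
            "--tag", f"server={target.name}",       # hypothetical tag name
            "-e", f"BASE_URL={target.base_url}",    # hypothetical script variable
            scenario,
        ])
    return commands
```

Passing each command in order to `subprocess.run(cmd, check=True)` would keep the scenarios sequential, so only one server is under load at a time.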
FHIR servers under test
- Aidbox (healthsamurai/aidboxone:edge): JVM, basic auth, 8 vCPU / 20 GB RAM
- HAPI FHIR (hapiproject/hapi:latest): JVM, basic auth, 8 vCPU / 20 GB RAM
- Medplum (medplum/medplum-server): Node.js, OAuth2, 8 replicas × 2 vCPU / 3 GB
Architecture diagram
[Interactive diagram: each block represents one service in the stack, with its docker-compose snippet and config files.]
Monitoring stack
- Prometheus — receives k6 metrics via remote-write and scrapes exporters.
- Grafana — provisioned dashboards with anonymous Admin access.
- cAdvisor — per-container CPU / memory / IO metrics.
- postgres-exporter — PostgreSQL internals.
- OTel Collector — receives OpenTelemetry metrics from Medplum (which has no native Prometheus endpoint) and re-exposes them via a Prometheus exporter that Prometheus scrapes.
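As an illustration of how the collected metrics can be read back, the sketch below builds a request URL for Prometheus's instant-query HTTP endpoint, using cAdvisor's `container_cpu_usage_seconds_total` counter. The Prometheus address and the container name `aidbox` are assumptions; the real stack may name its containers differently.

```python
from urllib.parse import urlencode


def container_cpu_query(container: str, window: str = "1m") -> str:
    """PromQL for one container's CPU usage in cores, from cAdvisor's CPU counter."""
    return f'rate(container_cpu_usage_seconds_total{{name="{container}"}}[{window}])'


def instant_query_url(prometheus_url: str, promql: str) -> str:
    """Build a URL for Prometheus's instant-query endpoint (/api/v1/query)."""
    query_string = urlencode({"query": promql})
    return prometheus_url.rstrip("/") + "/api/v1/query?" + query_string


# Hypothetical address and container name, for illustration only.
url = instant_query_url("http://prometheus:9090", container_cpu_query("aidbox"))
```

Fetching that URL (for example with `urllib.request.urlopen`) would return a JSON document whose `data.result` array holds the matching series.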
CI execution
Drone CI runs the full suite on every commit, on a dedicated bare-metal node. Docker-in-Docker hosts the stack, and persistent host volumes cache image layers between builds. Search-test datasets are stored on ZFS so they can be reset quickly from snapshots.
Each run is tagged with runid, commit, and branch labels, then snapshotted into a JSON report and uploaded to gs://samurai-public/fhir-server-performance-benchmark/. The dashboard reads those snapshots directly from GCS.
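A minimal sketch of what such a snapshot could look like follows. The report schema, the `metrics` payload, and the one-file-per-run object name are assumptions made for illustration; only the three labels and the bucket prefix come from the description above, and the upload step itself is left out.

```python
import json

BUCKET_PREFIX = "gs://samurai-public/fhir-server-performance-benchmark/"


def build_snapshot(runid: str, commit: str, branch: str, metrics: dict) -> str:
    """Serialize one run's labels and metrics as a JSON report (hypothetical schema)."""
    report = {
        "labels": {"runid": runid, "commit": commit, "branch": branch},
        "metrics": metrics,
    }
    return json.dumps(report, sort_keys=True, indent=2)


def snapshot_uri(runid: str) -> str:
    """Hypothetical object name: one JSON file per run under the bucket prefix."""
    return f"{BUCKET_PREFIX}{runid}.json"
```

Keying the object name on runid would let a dashboard fetch a single run without listing the whole bucket, though the actual layout may differ.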