
Infrastructure

How the benchmark suite is wired together

Overview

The benchmark compares three FHIR server implementations under identical workload, hardware, and database conditions. Each test run boots the full stack via Docker Compose, runs k6 scenarios sequentially against each server, and ships metrics to Prometheus.
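The metrics path can be sketched as a Prometheus configuration. This is an illustrative fragment, not the project's actual config: job names, target hostnames, and the scrape interval are assumptions; the ports are the exporters' defaults. k6's remote-write output additionally requires Prometheus to be started with the `--web.enable-remote-write-receiver` flag.

```yaml
# prometheus.yml (sketch; job names and targets are hypothetical)
# Prometheus itself must run with --web.enable-remote-write-receiver
# so the k6 remote-write output can push metrics into it.
global:
  scrape_interval: 5s

scrape_configs:
  - job_name: cadvisor
    static_configs:
      - targets: ["cadvisor:8080"]
  - job_name: postgres
    static_configs:
      - targets: ["postgres-exporter:9187"]
```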

FHIR servers under test

  • Aidbox (healthsamurai/aidboxone:edge): JVM, basic auth, 8 vCPU / 20 GB RAM
  • HAPI FHIR (hapiproject/hapi:latest): JVM, basic auth, 8 vCPU / 20 GB RAM
  • Medplum (medplum/medplum-server): Node.js, OAuth2, 8 replicas × 2 vCPU / 3 GB
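The resource allocations above can be expressed as Compose limits. The fragment below is a sketch under assumptions: service names, and the choice of `cpus`/`mem_limit` versus `deploy.resources`, are illustrative rather than taken from the real compose files.

```yaml
# docker-compose fragment (sketch; service names are hypothetical,
# env vars, ports, and dependencies omitted)
services:
  aidbox:
    image: healthsamurai/aidboxone:edge
    cpus: 8
    mem_limit: 20g
  hapi:
    image: hapiproject/hapi:latest
    cpus: 8
    mem_limit: 20g
  medplum:
    image: medplum/medplum-server
    deploy:
      replicas: 8          # 8 Node.js replicas per the table above
      resources:
        limits:
          cpus: "2"
          memory: 3g
```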

Architecture diagram

[Interactive diagram: blocks linked by "fetch dataset", "FHIR HTTP", "SQL", "Cache", and "Metrics scrape / remote-write" edges, with a MONITORING group.]


Monitoring stack

  • Prometheus — receives k6 metrics via remote-write and scrapes exporters.
  • Grafana — provisioned dashboards with anonymous Admin access.
  • cAdvisor — per-container CPU / memory / IO metrics.
  • postgres-exporter — PostgreSQL internals.
  • OTel Collector — receives OpenTelemetry metrics from Medplum (which has no native Prometheus endpoint) and re-exposes them via a Prometheus exporter that Prometheus scrapes.
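The Medplum bridge described in the last bullet maps onto a small OTel Collector pipeline: an OTLP receiver takes the server's metrics, and a Prometheus exporter re-exposes them for scraping. This is a minimal sketch using the Collector's default OTLP ports; the exporter port and pipeline layout are assumptions, not the project's actual config.

```yaml
# otel-collector config (sketch; exporter port is assumed)
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

exporters:
  prometheus:
    endpoint: 0.0.0.0:8889   # Prometheus scrapes this port

service:
  pipelines:
    metrics:
      receivers: [otlp]      # Medplum pushes OTLP metrics here
      exporters: [prometheus]
```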

CI execution

Drone CI runs the full suite on every commit on a dedicated bare-metal node. Docker-in-Docker hosts the stack; persistent host volumes cache image layers between builds. Search-test datasets are stored on ZFS for fast snapshot-based resets.
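The DinD setup described above typically looks like the following Drone pipeline. This is a sketch only: step names, the entry-point script, and volume wiring are hypothetical, not the repository's actual `.drone.yml`.

```yaml
# .drone.yml (sketch; names and commands are hypothetical)
kind: pipeline
type: docker
name: fhir-benchmark

services:
  - name: docker
    image: docker:dind      # Docker-in-Docker hosts the benchmark stack
    privileged: true

steps:
  - name: run-suite
    image: docker:cli
    environment:
      DOCKER_HOST: tcp://docker:2375
    commands:
      - docker compose up -d
      - ./run-benchmarks.sh   # hypothetical entry point for the k6 scenarios
```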

Each run is tagged with runid, commit, and branch labels, then snapshotted into a JSON report and uploaded to gs://samurai-public/fhir-server-performance-benchmark/. The dashboard reads those snapshots directly from GCS.