
5,089,364 events per second

Single node. Consumer hardware. Zero errors.

5,089,364 events/second¹
16ms p99 latency¹
0.00% error rate¹
1 node, consumer hardware²

Industry comparison

All competitor figures are from publicly documented benchmarks and official documentation. See source links for each.

| Platform | Throughput | Hardware | Source |
|---|---|---|---|
| aacyn | 5,089,364 evt/sec | 1× mini PC (8C/16T) | This benchmark |
| Datadog Agent | ~200,000 metrics/sec | Cloud VM (4 vCPU) | Datadog docs |
| Datadog Obs. Pipelines | ~85,000 evt/sec | Multi-pod K8s | Datadog docs |
| ClickHouse (logs) | ~130,000 rows/sec | Single node (8C, 16GB) | GreptimeDB benchmark, 2024 |
| Vector | ~76 MiB/sec | File-to-TCP pipeline | vector.dev |

Apples-to-apples caveat: These are directionally valid but not identical workloads. Datadog measures DogStatsD metric intake (smaller payloads). ClickHouse measures row insertion via native protocol with full indexing. Vector measures byte throughput for data routing. aacyn measures FlatBuffer binary event ingestion into an in-memory columnar store via HTTP. Full comparison notes →

Binary vs JSON ingestion

Path B (FlatBuffers) bypasses JSON parsing entirely. The payload crosses the FFI boundary as a raw pointer — zero deserialization, zero V8 GC pressure.
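As a rough illustration of what "raw pointer across the FFI boundary" means on the native side, here is a minimal sketch of an ingest entry point that reads event fields in place and scatters them into columnar arrays. The function name (`aacyn_ingest`), the fixed-width 8-byte event record, and the column names are illustrative assumptions — real FlatBuffers carry vtable indirection rather than a flat fixed-width layout.

```c
/* Sketch: native ingest entry point receiving the payload as a raw
 * pointer + length and appending fields into columnar arrays.
 * Fixed-width record layout is a simplifying assumption. */
#include <stdint.h>
#include <stddef.h>

#define MAX_EVENTS (1u << 24)

/* Columnar store: one contiguous array per field. */
static float   durations[MAX_EVENTS];
static uint8_t is_errors[MAX_EVENTS];
static size_t  row_count;

/* Assumed layout: 8-byte record (float duration, uint8 flag, padding). */
typedef struct { float duration; uint8_t is_error; uint8_t pad[3]; } event_t;

int32_t aacyn_ingest(const uint8_t *buf, int32_t len) {
    int32_t n = len / (int32_t)sizeof(event_t);
    const event_t *ev = (const event_t *)buf;
    for (int32_t i = 0; i < n; i++) {   /* scatter rows into columns */
        durations[row_count] = ev[i].duration;
        is_errors[row_count] = ev[i].is_error;
        row_count++;
    }
    return n;   /* events ingested */
}
```

The key property is that nothing is deserialized into an intermediate object graph: the buffer is read once, field by field, directly into the column arrays.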

| Metric | JSON | Binary | Gain |
|---|---|---|---|
| Events/sec | 314K | 5.09M | 16.2× |
| p95 Latency | 218.79ms | 12.73ms | 17.2× |
| p99 Latency | | 16.12ms | |
| Avg Latency | 88.70ms | 7.77ms | 11.4× |
| Error Rate | 0.00% | 0.00% | |

AVX-512 scan performance

5 million events queried in under half a millisecond. AVX-512 processes 16 floats per CPU cycle over page-aligned columnar memory — no indexes, no hash lookups, no pointer chasing.

| Scan Operation | Median | p99 | Effective Rate |
|---|---|---|---|
| scan_duration_max | 286μs | 402μs | 17.5B events/sec |
| scan_error_count | 35μs | 60μs | 141.6B events/sec |
| scan_duration_filter | 298μs | 415μs | 16.8B events/sec |

All scans complete in < 415μs p99. The error count scan is particularly fast because `is_errors` is a `uint8_t[]` column that fits entirely in L2 cache.
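The scan style described above can be sketched as a vectorized max-reduction over a float column. This is an illustrative version, not aacyn's actual code: the function name mirrors the `scan_duration_max` row in the table, and a scalar fallback is included so it compiles on non-AVX-512 hardware.

```c
/* Sketch: AVX-512 max-scan over a float column, with a scalar
 * fallback. Real scans run over page-aligned columnar memory. */
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

float scan_duration_max(const float *col, size_t n) {
    if (n == 0) return 0.0f;
    float m = col[0];
    size_t i = 0;
#if defined(__AVX512F__)
    if (n >= 16) {
        __m512 vmax = _mm512_loadu_ps(col);   /* 16 floats per vector */
        for (i = 16; i + 16 <= n; i += 16)
            vmax = _mm512_max_ps(vmax, _mm512_loadu_ps(col + i));
        float lanes[16];
        _mm512_storeu_ps(lanes, vmax);        /* horizontal reduce */
        for (int j = 0; j < 16; j++)
            if (lanes[j] > m) m = lanes[j];
    }
#endif
    for (; i < n; i++)                        /* scalar tail */
        if (col[i] > m) m = col[i];
    return m;
}
```

Note there is no index or hash structure anywhere: the entire query is a single linear pass over contiguous memory, which is exactly what keeps the effective rate in the billions of events per second.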

Latency distribution

At 5 million events/second, 99% of requests complete in under 17 milliseconds.

| Statistic | Latency |
|---|---|
| avg | 7.77ms |
| p90 | 11.42ms |
| p95 | 12.73ms |
| p99 | 16.12ms |
| max | 33.19ms |

Hardware

No cloud cluster. No Kafka. One consumer-grade mini PC.

| Component | Spec |
|---|---|
| Machine | Minisforum UM890 Pro |
| CPU | AMD Ryzen 9 8945HS (8C/16T, Zen 4) |
| RAM | 32GB DDR5-5600 |
| Storage | 1TB NVMe |
| OS | Ubuntu Server 24.04 (no GUI) |

Methodology

We believe benchmark transparency is a prerequisite for trust. Here is exactly what we measured, how we measured it, and what we did not measure.

What was measured

End-to-end binary event ingestion through the full production stack. Each request sends a 1,656-byte pre-compiled FlatBuffer containing 100 events. The payload traverses HTTP parsing → Bun/Elysia routing → bun:ffi boundary → native C columnar store. Both k6 and the server run on the same machine — all traffic is localhost loopback, no network I/O is involved.
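For a back-of-the-envelope sanity check on those figures: 100 events per 1,656-byte request at 5,089,364 events/sec implies roughly 50,894 HTTP requests/sec and about 84 MB/sec of loopback payload traffic. The helper names below are ours, not part of any benchmark harness.

```c
/* Derived rates from the measured figures: requests/sec and wire
 * bytes/sec implied by events/sec, events per request, and payload
 * size. Purely arithmetic; helper names are illustrative. */
static double requests_per_sec(double events_per_sec, double events_per_req) {
    return events_per_sec / events_per_req;
}

static double payload_bytes_per_sec(double reqs_per_sec, double bytes_per_req) {
    return reqs_per_sec * bytes_per_req;
}

/* requests_per_sec(5089364.0, 100.0)            -> ~50,894 req/s
 * payload_bytes_per_sec(50893.64, 1656.0)       -> ~84.3 MB/s  */
```

The ~84 MB/s of payload is well within loopback bandwidth, which supports the claim that the bottleneck being measured is the ingestion path itself rather than transport.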

What was NOT measured

  • Disk I/O — The mmap store is memory-resident. Write-back is async.
  • Query latency — Only ingestion is benchmarked here.
  • Multi-node — Single node, no replication or sharding.
  • TLS — Plaintext localhost. Production TLS adds latency.
  • Network transfer — Load generator and server are co-located.
  • eBPF probe load — Probes attached but not generating traffic.

Load generator

  • Tool: k6 v0.55
  • VUs: 500 virtual users
  • Duration: 50s (10s ramp, 30s sustain, 10s drain)
  • Payload: Pre-compiled, sent from memory
  • Transport: HTTP/1.1 localhost loopback

Kernel tuning

net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_tw_reuse = 1
vm.max_map_count = 262144
CPU governor: performance