Engineering Notes

Deep dives into interesting problems I've solved. Each section covers the motivation and execution.

01

Trips: Camera to Browser

Motivation
I wanted an easy way to share trips with friends and family. A GoPro on the dash captures photos automatically, and my homelab turns them into a live feed they can follow along with, or replay later. It works for anything with GPS-tagged photos.
flowchart LR
    subgraph Capture
        GoPro[GoPro Hero]
        Queue[(SQLite)]
    end

    subgraph Process
        EXIF[EXIF + GPS]
        S3[(SeaweedFS)]
        NATS[(NATS JetStream)]
    end

    subgraph Deliver
        Proxy[imgproxy]
        CDN[Cloudflare CDN]
    end

    subgraph Display
        UI[trips.jomcgi.dev]
    end

    GoPro -->|27MP| Queue
    Queue --> EXIF
    EXIF -->|Images| S3
    EXIF -->|Events| NATS
    S3 --> Proxy
    Proxy --> CDN
    CDN --> UI
    NATS -->|WebSocket| UI

    style GoPro fill:#ff6b6b,stroke:#ff6b6b,color:#fff
    style Queue fill:#ffa502,stroke:#ffa502,color:#fff
    style EXIF fill:#ffd93d,stroke:#ffd93d,color:#000
    style S3 fill:#6bcb77,stroke:#6bcb77,color:#fff
    style NATS fill:#4d96ff,stroke:#4d96ff,color:#fff
    style Proxy fill:#9b59b6,stroke:#9b59b6,color:#fff
    style CDN fill:#e056fd,stroke:#e056fd,color:#fff
    style UI fill:#ff6b6b,stroke:#ff6b6b,color:#fff
View Live at trips.jomcgi.dev →
A. Capture

The GoPro shoots on interval while driving. A Python controller manages the camera over WiFi, handling connection drops and queueing downloads for later.

Async Camera
Python asyncio controller. GPS-triggered capture at configurable intervals. 27MP RAW with JPG fallback when storage is tight.
SQLite Queue
Persistent download queue survives restarts. Exponential backoff on WiFi drops. Resume exactly where we left off.
EXIF Extraction
Camera optics preserved: ISO, aperture, shutter speed, focal length. GPS coordinates embedded. Deterministic UUIDs from content hash.
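The SQLite-backed queue above can be sketched in a few lines. This is an illustration, not the controller's actual schema: table and function names here are hypothetical, and the base delay and cap are assumed values.

```python
import sqlite3

# Sketch of a persistent download queue: pending files survive restarts
# because state lives in SQLite, and the retry delay grows exponentially
# with each failed attempt (capped so we keep polling).

def backoff_seconds(attempts: int, base: float = 2.0, cap: float = 300.0) -> float:
    """Exponential backoff: 2s, 4s, 8s, ... capped at 5 minutes (assumed values)."""
    return min(base * (2 ** attempts), cap)

def make_queue(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS queue ("
        "  filename TEXT PRIMARY KEY,"
        "  attempts INTEGER NOT NULL DEFAULT 0,"
        "  next_try REAL NOT NULL DEFAULT 0)"
    )
    return db

def enqueue(db: sqlite3.Connection, filename: str) -> None:
    db.execute("INSERT OR IGNORE INTO queue (filename) VALUES (?)", (filename,))

def due(db: sqlite3.Connection, now: float) -> list:
    """Files whose retry window has elapsed, oldest first."""
    rows = db.execute(
        "SELECT filename FROM queue WHERE next_try <= ? ORDER BY next_try", (now,)
    )
    return [r[0] for r in rows]

def record_failure(db: sqlite3.Connection, filename: str, now: float) -> None:
    """Bump the attempt counter and push the next try into the future."""
    (attempts,) = db.execute(
        "SELECT attempts FROM queue WHERE filename = ?", (filename,)
    ).fetchone()
    db.execute(
        "UPDATE queue SET attempts = ?, next_try = ? WHERE filename = ?",
        (attempts + 1, now + backoff_seconds(attempts), filename),
    )
```

Because the queue and attempt counters are rows rather than in-memory state, a restart resumes exactly where the controller left off.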
B. Event Store

Trip points are events in NATS JetStream. The API replays the stream on startup to rebuild state. No database needed—just an append-only log.

Stream Replay
On startup, ephemeral consumer replays entire stream. Rebuilds in-memory cache from event history. ~200ms for 10k events.
Live Subscribe
After replay, durable consumer subscribes to new events. Cache stays current. Multiple API pods can subscribe without conflicts.
Tombstones
Deletions are events too. Tombstone message marks a point as deleted. Replay respects tombstones. No orphaned data.
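The replay-with-tombstones pattern reduces to a small fold over the log. The event shape below is an assumption for illustration; the real messages are NATS JetStream payloads.

```python
# Sketch of the replay step, assuming events of the form
# {"type": "point" | "tombstone", "id": ..., "data": ...}. Replaying the
# full log rebuilds the in-memory cache; tombstones delete as they are
# applied, so deleted points never survive a restart.

def replay(events):
    cache = {}
    for event in events:
        if event["type"] == "point":
            cache[event["id"]] = event["data"]
        elif event["type"] == "tombstone":
            cache.pop(event["id"], None)  # deletion is just another event
    return cache
```

After the fold completes, a durable consumer takes over and applies new events to the same cache, so live state and replayed state go through identical code.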
C. Delivery

GoPro images are 27MB each. imgproxy generates thumbnails and display sizes on-the-fly. Cloudflare CDN caches everything at the edge—most requests never hit my homelab.

Deterministic Keys
UUID v5 from namespace + content hash. Same image always gets same key. Idempotent uploads. No duplicates ever.
imgproxy
On-the-fly resizing from SeaweedFS. /thumb/* → 300px, /display/* → 1920x1080. WebP/AVIF based on Accept header.
Cloudflare CDN
Immutable cache-control headers. Content-addressed keys mean cache invalidation is never needed. Edge-cached globally.
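The deterministic key scheme can be sketched as a UUIDv5 over a content hash. The namespace choice below is an assumption; any fixed namespace gives the same property.

```python
import hashlib
import uuid

def storage_key(image_bytes: bytes) -> str:
    """Same content -> same key, regardless of filename or upload time."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    # Namespace and name prefix are illustrative; any fixed UUID namespace works.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, "trips/" + digest))
```

Since the key is a pure function of the bytes, re-uploading the same image is an idempotent no-op, and CDN cache invalidation is never needed: new content gets a new key.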
D. Display

The web interface at trips.jomcgi.dev shows the route on a map, photos by day, elevation profiles, and trip statistics. WebSocket connection for live updates during active trips.

MapLibre
Vector tiles with terrain hillshade. Route colored by day with offset calculations for overlapping paths. Smooth animations between points.
Live Updates
WebSocket broadcasts new points as they arrive. Viewer count tracking. Follow along in real-time during active trips.
Day-by-Day
Distance sparklines, elevation profiles, photo galleries per day. Rainbow route coloring shows progression through the journey.
02

Ships: Real-Time AIS Vessel Tracking

Motivation
Living near the coast, I wanted to see what ships are passing by in real-time. AIS (Automatic Identification System) data is publicly broadcast by vessels, but there's no simple way to visualize it locally. I built a pipeline that streams AIS data through my cluster to a live map.
flowchart LR
    subgraph Ingest
        AIS[AISStream.io]
        Svc[ais-ingest]
    end

    subgraph Store
        NATS[(NATS JetStream)]
    end

    subgraph Serve
        API[ships-api]
        WS[WebSocket]
    end

    subgraph Display
        UI[ships.jomcgi.dev]
        Map[MapLibre]
    end

    AIS -->|WebSocket| Svc
    Svc -->|Publish| NATS
    NATS -->|Subscribe| API
    API --> WS
    WS --> UI
    UI --> Map

    style AIS fill:#ff6b6b,stroke:#ff6b6b,color:#fff
    style Svc fill:#ffa502,stroke:#ffa502,color:#fff
    style NATS fill:#ffd93d,stroke:#ffd93d,color:#000
    style API fill:#6bcb77,stroke:#6bcb77,color:#fff
    style WS fill:#4d96ff,stroke:#4d96ff,color:#fff
    style UI fill:#9b59b6,stroke:#9b59b6,color:#fff
    style Map fill:#e056fd,stroke:#e056fd,color:#fff
View Live at ships.jomcgi.dev →
Execution
AIS Ingest
Python service connects to AISStream.io WebSocket. Filters for Pacific Northwest bounding box. Publishes position reports to NATS JetStream.
Event Sourcing
Same pattern as Trips: NATS JetStream as the source of truth. API replays stream on startup to rebuild vessel state. No database needed.
Ships API
Python service with REST endpoints and WebSocket streaming. SQLite for position history. Position deduplication reduces noise from stationary vessels.
MapLibre UI
React frontend with MapLibre GL. Vessels rendered as directional arrows based on heading. Click for details: ship type, speed, course, destination.
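The position-deduplication step can be sketched with a distance threshold. The ~25 m cutoff here is a hypothetical value, not the service's actual setting.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in metres."""
    r = 6371000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def dedupe(reports, min_move_m=25.0):
    """Keep a report only if the vessel moved far enough since its last
    kept position. reports: iterable of (mmsi, lat, lon)."""
    kept = {}
    out = []
    for mmsi, lat, lon in reports:
        last = kept.get(mmsi)
        if last is None or haversine_m(last[0], last[1], lat, lon) >= min_move_m:
            kept[mmsi] = (lat, lon)
            out.append((mmsi, lat, lon))
    return out
```

Anchored ships broadcast position reports every few minutes with GPS jitter; filtering sub-threshold movement keeps history storage proportional to actual motion.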
03

Sextant: Type-Safe Operator State Machines

Motivation
Kubernetes operators are state machines, but we write them as imperative reconciliation loops. Every operator I wrote had the same bugs: invalid state transitions, forgotten error handling, missing metrics. I wanted to define the state machine declaratively and generate the boilerplate.
flowchart LR
    YAML[YAML Schema]
    Parse[Parse & Validate]
    Gen[Code Generator]
    Types[types.go]
    Trans[transitions.go]
    Metrics[metrics.go]
    Status[status.go]

    YAML --> Parse --> Gen
    Gen --> Types
    Gen --> Trans
    Gen --> Metrics
    Gen --> Status

    style YAML fill:#ff6b6b,stroke:#ff6b6b,color:#fff
    style Parse fill:#ffa502,stroke:#ffa502,color:#fff
    style Gen fill:#ffd93d,stroke:#ffd93d,color:#000
    style Types fill:#6bcb77,stroke:#6bcb77,color:#fff
    style Trans fill:#4d96ff,stroke:#4d96ff,color:#fff
    style Metrics fill:#9b59b6,stroke:#9b59b6,color:#fff
    style Status fill:#e056fd,stroke:#e056fd,color:#fff
Execution
Compile-Time Safety
Each state is a Go struct. Transitions return the next state type. Try to go from Pending to Ready without passing through Creating? Compiler error.
Forced Idempotency
Transition methods require request IDs. You can't transition without the ID, which forces you to call the external API first.
Guard Conditions
Go expressions embedded in YAML, evaluated at transition time. Invalid expressions fail at compile time, not runtime.
Generated Metrics
Prometheus counters, histograms, gauges. state_duration_seconds for SLOs. Automatic cleanup on resource deletion.
states:
  - name: Pending
    initial: true
  - name: Creating
    fields:
      requestID: string
  - name: Ready
    terminal: true

transitions:
  - from: Pending
    to: Creating
    action: StartCreation
    params:
      - requestID: string
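The parse-and-validate step can be sketched in Python for brevity (the real generator emits Go). The checks below mirror the schema above: exactly one initial state, transitions only between declared states, and no transitions out of a terminal state; whether Sextant enforces exactly these rules is an assumption.

```python
def validate(schema):
    """Return a list of schema errors; empty list means valid."""
    names = {s["name"] for s in schema["states"]}
    initials = [s["name"] for s in schema["states"] if s.get("initial")]
    terminals = {s["name"] for s in schema["states"] if s.get("terminal")}
    errors = []
    if len(initials) != 1:
        errors.append("exactly one initial state required")
    for t in schema["transitions"]:
        if t["from"] not in names or t["to"] not in names:
            errors.append(f"unknown state in {t['from']} -> {t['to']}")
        if t["from"] in terminals:
            errors.append(f"transition out of terminal state {t['from']}")
    return errors

# The YAML above, as parsed data:
schema = {
    "states": [
        {"name": "Pending", "initial": True},
        {"name": "Creating", "fields": {"requestID": "string"}},
        {"name": "Ready", "terminal": True},
    ],
    "transitions": [
        {"from": "Pending", "to": "Creating", "action": "StartCreation"},
    ],
}
```

Catching these errors at generation time is what makes the generated Go transitions safe to trust: an invalid machine never compiles into an operator.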
04

Cloudflare Operator

Motivation
Every new service meant clicking through the Cloudflare dashboard: create DNS record, create Zero Trust application, update tunnel config. I wanted to annotate a Deployment and have everything provisioned automatically. Zero Trust ingress without the toil.
flowchart LR
    Deploy[Deployment]
    Watch[Operator Watch]
    DNS[DNS Record]
    ZT[Zero Trust App]
    Config[Tunnel Config]
    CF[(Cloudflare)]

    Deploy -->|Annotations| Watch
    Watch --> DNS
    Watch --> ZT
    Watch --> Config
    DNS --> CF
    ZT --> CF
    Config --> CF

    style Deploy fill:#ff6b6b,stroke:#ff6b6b,color:#fff
    style Watch fill:#ffa502,stroke:#ffa502,color:#fff
    style DNS fill:#ffd93d,stroke:#ffd93d,color:#000
    style ZT fill:#6bcb77,stroke:#6bcb77,color:#fff
    style Config fill:#4d96ff,stroke:#4d96ff,color:#fff
    style CF fill:#9b59b6,stroke:#9b59b6,color:#fff
Execution
Annotation-Driven
cloudflare.ingress.hostname and cloudflare.zero-trust.policy annotations trigger reconciliation. No CRDs to manage.
State Machine
Built with Sextant. States: Pending → CreatingDNS → CreatingZTApp → UpdatingConfig → Ready. Each step idempotent.
Finalizers
Cleanup on Deployment deletion. DNS records, ZT apps, and tunnel routes removed. No orphaned Cloudflare resources.
Drift Detection
Periodic reconciliation detects manual Cloudflare changes. Operator is source of truth. Dashboard edits get reverted.
metadata:
  annotations:
    cloudflare.ingress.hostname: myapp.jomcgi.dev
    cloudflare.zero-trust.policy: joe-only
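The annotation-driven trigger amounts to extracting desired state from Deployment metadata. A minimal sketch, assuming the annotation keys shown above and that the policy annotation is optional:

```python
HOSTNAME_KEY = "cloudflare.ingress.hostname"
POLICY_KEY = "cloudflare.zero-trust.policy"

def desired_state(deployment: dict):
    """Extract the desired Cloudflare config from Deployment metadata.
    Returns None for Deployments the operator should ignore."""
    annotations = deployment.get("metadata", {}).get("annotations", {})
    hostname = annotations.get(HOSTNAME_KEY)
    if hostname is None:
        return None  # no hostname annotation: not managed by the operator
    return {"hostname": hostname, "policy": annotations.get(POLICY_KEY)}
```

Reconciliation then diffs this desired state against what Cloudflare actually has, which is also how drift detection reverts dashboard edits.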
05

Stargazer: Dark Sky Location Finder

Motivation
Finding good stargazing spots requires combining multiple data sources: light pollution maps, road access, elevation for horizon clearance, and weather forecasts. I built a pipeline that scores locations based on all these factors and updates continuously.
flowchart TB
    subgraph Acquire
        LP[Light Pollution Atlas]
        Roads[OSM Road Network]
        Elev[SRTM Elevation]
        Weather[MET Norway API]
    end

    subgraph Process
        Dark[Dark Region Extract]
        Buffer[Road Buffering]
        Zones[Zone Classification]
    end

    subgraph Score
        Cloud[Cloud Cover]
        Humid[Humidity]
        Wind[Wind Speed]
        Final[Final Score]
    end

    LP --> Dark
    Roads --> Buffer
    Elev --> Zones
    Dark --> Final
    Buffer --> Final
    Zones --> Final
    Weather --> Cloud --> Final
    Weather --> Humid --> Final
    Weather --> Wind --> Final

    style LP fill:#ff6b6b,stroke:#ff6b6b,color:#fff
    style Roads fill:#ffa502,stroke:#ffa502,color:#fff
    style Elev fill:#ffd93d,stroke:#ffd93d,color:#000
    style Weather fill:#6bcb77,stroke:#6bcb77,color:#fff
    style Dark fill:#4d96ff,stroke:#4d96ff,color:#fff
    style Buffer fill:#9b59b6,stroke:#9b59b6,color:#fff
    style Zones fill:#e056fd,stroke:#e056fd,color:#fff
    style Cloud fill:#ff6b6b,stroke:#ff6b6b,color:#fff
    style Humid fill:#ffa502,stroke:#ffa502,color:#fff
    style Wind fill:#ffd93d,stroke:#ffd93d,color:#000
    style Final fill:#6bcb77,stroke:#6bcb77,color:#fff
Execution
16-Task DAG
Parallel acquisition of light pollution atlas, OSM roads, SRTM elevation, and MET Norway weather. Tasks scheduled by dependency.
Spatial Analysis
Dark region extraction from light pollution raster. Road network buffering for accessibility. Zone classification by sky quality.
Weather Scoring
Cloud cover, humidity, fog probability, wind speed, dew point. Configurable weights. Updated hourly from forecast API.
Final Score
Composite score 0-100. Factors: darkness (40%), accessibility (20%), horizon (15%), weather (25%). Filterable by threshold.
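The composite score is a weighted sum of sub-scores. A sketch using the weights from the text (darkness 40%, accessibility 20%, horizon 15%, weather 25%), assuming each factor is itself scored 0-100:

```python
WEIGHTS = {"darkness": 0.40, "accessibility": 0.20, "horizon": 0.15, "weather": 0.25}

def final_score(factors: dict) -> float:
    """factors maps each factor name to a 0-100 sub-score; result is 0-100."""
    assert set(factors) == set(WEIGHTS), "every factor must be scored"
    return sum(WEIGHTS[k] * factors[k] for k in WEIGHTS)
```

A perfectly dark, accessible site under cloud cover still scores at most 75, so the weather weight is what makes the rankings change hour to hour.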
06

Bazel: One Way to Build Everything

Motivation
I got tired of using different build commands for every project. I wanted one system that works the same everywhere—laptop, CI, Claude Code in the cluster. Everything's vendored, so there's nothing to install beyond Bazel itself.
flowchart LR
    subgraph Anywhere
        Laptop[Laptop]
        CI[CI]
        Claude[Claude Code]
    end

    Fmt[format]

    subgraph Build
        Code[Formatters]
        Helm[Manifests]
        OCI[Images]
    end

    Cache[(BuildBuddy)]

    subgraph Output
        Git[Git]
        Reg[Registry]
    end

    Laptop --> Fmt
    CI --> Fmt
    Claude --> Fmt
    Fmt --> Code
    Fmt --> Helm
    Fmt --> OCI
    Code --> Cache
    Helm --> Cache
    OCI --> Cache
    Cache --> Git
    Cache --> Reg

    style Laptop fill:#ff6b6b,stroke:#ff6b6b,color:#fff
    style CI fill:#ffa502,stroke:#ffa502,color:#fff
    style Claude fill:#ffd93d,stroke:#ffd93d,color:#000
    style Fmt fill:#6bcb77,stroke:#6bcb77,color:#fff
    style Code fill:#4d96ff,stroke:#4d96ff,color:#fff
    style Helm fill:#4d96ff,stroke:#4d96ff,color:#fff
    style OCI fill:#4d96ff,stroke:#4d96ff,color:#fff
    style Cache fill:#9b59b6,stroke:#9b59b6,color:#fff
    style Git fill:#e056fd,stroke:#e056fd,color:#fff
    style Reg fill:#e056fd,stroke:#e056fd,color:#fff
Execution
format
One command for formatters, manifests, and lock files. Everything runs in parallel; it finishes in seconds if nothing changed.
Custom Rules
Starlark rules for Go, Python, and APKO images. A few lines per service in; multi-platform containers out.
Vendored Tools
Helm, crane, ruff, shellcheck—pinned in one lock file. Nothing to install, works anywhere Bazel runs.
BuildBuddy
Remote cache so unchanged code doesn't rebuild. 80 cores on the free tier keep CI under a minute.

Custom Starlark Rulesets

Three rulesets that extend Bazel into domains it doesn't cover out of the box. Each includes a Gazelle extension for automatic BUILD file generation.

rules_helm
Lint, template, package, and OCI-push Helm charts. Includes an ArgoCD application macro that wires up manifests, image updater, and Semgrep scanning per overlay.
rules_semgrep
Hermetic Semgrep Pro scanning. Vendors the OCaml engine as OCI artifacts, bypasses the Python wrapper. Three rule types: source files, Helm manifests, transitive targets via aspect.
rules_wrangler
Cloudflare Pages deployment via Wrangler. Builds static sites and pushes to Cloudflare in one target.

GitOps Manifests

Helm charts render through Bazel. Output goes to the source tree so Git tracks it. PR diffs show exactly what's changing in the cluster.

Cached
Each chart is a genrule. Unchanged charts skip. 20+ services render in parallel.
In Git
Manifests committed, not generated at deploy. I review what's going to the cluster before it goes.

Multi-Platform Images

arm64 on my laptop, amd64 in CI. Same rules build both and push a multi-platform index.

APKO
Alpine images from YAML. Lock files pin versions. Small, fast.
One Target
Define once, Bazel handles platform transitions and index creation.
07

rules_semgrep: Hermetic Static & Supply Chain Analysis

Motivation
Semgrep on managed CI took 2+ minutes per diff scan, 5+ minutes for full scans. Rule registry fetches made results non-deterministic. I needed scans that run in seconds, produce identical results from identical inputs, and only re-run when something actually changes. Bazel's content-addressed cache gives all three — but Semgrep had no Bazel integration.
flowchart TD
    subgraph Daily["Daily Update"]
        PyPI[PyPI Wheels]
        API[Semgrep API]
        Extract[Extract semgrep-core]
        GHCR[GHCR]
    end

    subgraph Bazel["Bazel Analysis"]
        OCI[oci_archive]
        Engine[Engine Binary]
    end

    subgraph Test["Bazel Test"]
        Core[semgrep-core]
        Srcs[Source Files]
        Rules[Rule YAML]
        Lockfiles[Lockfiles]
        Result[Pass / Fail]
    end

    PyPI --> Extract --> GHCR
    API --> GHCR
    GHCR -->|digest| OCI --> Engine
    Engine --> Core
    Srcs --> Core
    Rules --> Core
    Lockfiles -->|SCA| Core
    Core -->|cached| Result

    style PyPI fill:#ff6b6b,stroke:#ff6b6b,color:#fff
    style API fill:#ff6b6b,stroke:#ff6b6b,color:#fff
    style Extract fill:#ffa502,stroke:#ffa502,color:#fff
    style GHCR fill:#ffd93d,stroke:#ffd93d,color:#000
    style OCI fill:#6bcb77,stroke:#6bcb77,color:#fff
    style Engine fill:#6bcb77,stroke:#6bcb77,color:#fff
    style Core fill:#4d96ff,stroke:#4d96ff,color:#fff
    style Srcs fill:#4d96ff,stroke:#4d96ff,color:#fff
    style Rules fill:#4d96ff,stroke:#4d96ff,color:#fff
    style Result fill:#9b59b6,stroke:#9b59b6,color:#fff
How It Works
No Python
Extracts the semgrep-core OCaml binary from PyPI wheels and vendors it as an OCI artifact on GHCR. Bypasses the Python wrapper entirely — no pip install, no 2-4s startup tax.
Digest-Pinned
Engine binaries and Pro rule packs are pinned to sha256 digests. A daily GitHub Action updates digests and opens a PR. Same inputs, same results, every time.
Three Rules
semgrep_test for source files, semgrep_manifest_test for Helm-rendered YAML, semgrep_target_test for transitive deps via aspect. All support optional SCA lockfile scanning. Gazelle auto-generates all of them.
Pro Analysis
Cross-file taint tracking with --pro. Degrades gracefully — missing credentials mean SKIP, not FAIL. Local dev works even without GHCR access.
Supply Chain
SCA lockfile scanning detects CVEs in third-party dependencies. With Pro reachability, traces whether vulnerable code paths are actually invoked. Gazelle auto-detects lockfiles from @pip// and @npm// dep prefixes — zero config.
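The "same inputs, same results" property is what Bazel's content-addressed cache exploits. A toy illustration (not Bazel's actual internals): the cache key hashes every input, including the engine and rule-pack digests, so a scan re-runs only when some input truly changes.

```python
import hashlib

def cache_key(engine_digest: str, rule_digests: list, source_files: dict) -> str:
    """Content-addressed key over all scan inputs.
    source_files maps path -> file bytes."""
    h = hashlib.sha256()
    h.update(engine_digest.encode())
    for d in sorted(rule_digests):       # order-independent
        h.update(d.encode())
    for path in sorted(source_files):
        h.update(path.encode())
        h.update(source_files[path])
    return h.hexdigest()
```

Registry fetches at scan time would make the rule set an unhashable input; pinning digests is what turns Semgrep runs into cacheable pure functions.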

Results

Cached Diff
30 seconds. Down from 2+ minutes on managed infrastructure.
New Rules
50 seconds. Cache invalidated only for affected targets.
Cold Cache
4 minutes for all tests, all images, all scans on a fresh runner.
08

Self-Hosted AI Stack

Motivation
Running local inference means zero API costs for routine tasks and no data leaving the cluster. A 4090 runs Hermes 4.3-36B comfortably, and a knowledge graph turns RSS feeds into semantic search that Claude Code can query via MCP.
flowchart LR
    subgraph Inference
        LLM[llama-cpp]
        GPU[4090 GPU]
    end

    subgraph Knowledge
        KG[Knowledge Graph]
        RSS[RSS Feeds]
        S3[(SeaweedFS)]
        Qdrant[(Qdrant)]
    end

    subgraph Consumers
        Claude[Claude Code]
        MCP[MCP Server]
    end

    GPU --> LLM
    RSS --> KG
    KG --> S3
    KG --> Qdrant
    KG --> LLM
    MCP --> KG
    Claude --> MCP
    Claude --> LLM

    style LLM fill:#ff6b6b,stroke:#ff6b6b,color:#fff
    style GPU fill:#ffa502,stroke:#ffa502,color:#fff
    style KG fill:#e056fd,stroke:#e056fd,color:#fff
    style RSS fill:#ffd93d,stroke:#ffd93d,color:#000
    style S3 fill:#ffd93d,stroke:#ffd93d,color:#000
    style Qdrant fill:#6bcb77,stroke:#6bcb77,color:#fff
    style Claude fill:#4d96ff,stroke:#4d96ff,color:#fff
    style MCP fill:#9b59b6,stroke:#9b59b6,color:#fff
Execution
llama-cpp
Hermes 4.3-36B IQ4_XS on a 4090. OpenAI-compatible API. 32k context, flash attention, quantized KV cache. Shared inference backend for all local AI services.
Knowledge Graph
RSS feeds stored in SeaweedFS. Embeddings generated into Qdrant vectors. MCP server exposes semantic search to Claude Code.
Network Isolation
Provider-based NetworkPolicies. Knowledge graph: internal only, no internet. llama-cpp: cluster-only, no external egress.
Hardening
Non-root UID 1000, read-only filesystem, dropped capabilities, seccomp profiles. Kyverno auto-injects OTEL sidecars for observability.
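The semantic search the MCP server exposes boils down to nearest-neighbour ranking by cosine similarity. A miniature sketch with toy 2-D vectors standing in for real embeddings (Qdrant does this at scale with indexed vectors):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(index, query_vec, top_k=3):
    """index: {doc_id: vector}. Returns the top_k doc ids by similarity."""
    ranked = sorted(index, key=lambda d: cosine(index[d], query_vec), reverse=True)
    return ranked[:top_k]
```

In the real pipeline the query vector comes from the same embedding model used at ingest time, so query and documents live in one vector space.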
09

OCI Model Cache Operator

Motivation
HuggingFace models are huge and slow to download. I wanted to reference a model in a pod spec the same way you reference a container image—and have it just work. The operator caches models in an OCI registry and streams them to pods without touching disk.
flowchart LR
    subgraph Webhook
        Pod[Pod Create]
        Mutator[PodMutator]
        Rewrite[Rewrite Ref]
        Gate[Scheduling Gate]
    end

    subgraph Operator
        CR[ModelCache CR]
        Resolve[Resolve]
        Sync[Sync Job]
    end

    subgraph External
        HF[HuggingFace]
        OCI[(OCI Registry)]
    end

    Pod --> Mutator
    Mutator -->|HF API| Rewrite
    Rewrite --> Gate
    Mutator --> CR
    CR --> Resolve
    Resolve -->|hf2oci| Sync
    Sync -->|Stream| HF
    Sync -->|Push| OCI
    Sync -->|Ready| Gate

    style Pod fill:#ff6b6b,stroke:#ff6b6b,color:#fff
    style Mutator fill:#ffa502,stroke:#ffa502,color:#fff
    style Rewrite fill:#ffa502,stroke:#ffa502,color:#fff
    style Gate fill:#ffd93d,stroke:#ffd93d,color:#000
    style CR fill:#6bcb77,stroke:#6bcb77,color:#fff
    style Resolve fill:#4d96ff,stroke:#4d96ff,color:#fff
    style Sync fill:#9b59b6,stroke:#9b59b6,color:#fff
    style HF fill:#e056fd,stroke:#e056fd,color:#fff
    style OCI fill:#ff6b6b,stroke:#ff6b6b,color:#fff
Execution
PodMutator
Webhook intercepts pods with hf.co/ volume references. Calls cached HF API to resolve the OCI ref at admission time (pod spec is immutable after this point). Creates ModelCache CR and adds scheduling gate if model not yet synced.
State Machine
Built with Sextant. Pending → Resolving → Syncing → Ready, with Failed state. Guards distinguish permanent errors from transient failures for automatic retry.
hf2oci
Streams HuggingFace models to OCI layers. HTTP response → tar → io.Pipe → registry push. Zero disk I/O. Supports Safetensors and GGUF formats.
Smart Naming
HuggingFace baseModels API resolves derivative models to their base. Derivatives share the base repo path for OCI layer deduplication. TTL cache for API responses.
Admission-Time Resolution
Pod spec is immutable after admission. Webhook calls the cached HF API to compute the GHCR ref synchronously, ensuring the correct OCI path is baked into the pod before it's created.
Pod Ungating
Ready state triggers scheduling gate removal. Volume ref was already rewritten at admission time — pod schedules normally with the cached model available.
volumes:
  - name: model
    image:
      reference: hf.co/NousResearch/Hermes-3-Llama-3.1-8B-GGUF
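The admission-time rewrite of a volume reference like the one above can be sketched as a path mapping. The registry prefix below is hypothetical; the actual GHCR path scheme is the operator's choice.

```python
REGISTRY = "ghcr.io/example/models"  # hypothetical cache registry path

def rewrite_ref(reference: str, revision: str = "latest") -> str:
    """Map an hf.co volume reference to the cached OCI ref; leave
    anything else untouched. OCI repository names must be lowercase."""
    if not reference.startswith("hf.co/"):
        return reference
    repo = reference.removeprefix("hf.co/").lower()
    return f"{REGISTRY}/{repo}:{revision}"
```

Because the pod spec is immutable after admission, this mapping has to happen synchronously in the webhook; the scheduling gate then holds the pod until the sync job has actually populated that ref.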