Engineering Notes

Deep dives into interesting problems I've solved. Each section covers the motivation and execution.

01

Trips: Camera to Browser

Motivation
I wanted an easy way to share trips with friends and family. A GoPro on the dash captures photos automatically, and my homelab turns them into a live feed they can follow along with, or replay later. It works for anything with GPS-tagged photos.
flowchart LR
    subgraph Capture
        GoPro[GoPro Hero]
        Queue[(SQLite)]
    end

    subgraph Process
        EXIF[EXIF + GPS]
        S3[(SeaweedFS)]
        NATS[(NATS JetStream)]
    end

    subgraph Deliver
        Proxy[imgproxy]
        CDN[Cloudflare CDN]
    end

    subgraph Display
        UI[trips.jomcgi.dev]
    end

    GoPro -->|27MP| Queue
    Queue --> EXIF
    EXIF -->|Images| S3
    EXIF -->|Events| NATS
    S3 --> Proxy
    Proxy --> CDN
    CDN --> UI
    NATS -->|WebSocket| UI

    style GoPro fill:#ff6b6b,stroke:#ff6b6b,color:#fff
    style Queue fill:#ffa502,stroke:#ffa502,color:#fff
    style EXIF fill:#ffd93d,stroke:#ffd93d,color:#000
    style S3 fill:#6bcb77,stroke:#6bcb77,color:#fff
    style NATS fill:#4d96ff,stroke:#4d96ff,color:#fff
    style Proxy fill:#9b59b6,stroke:#9b59b6,color:#fff
    style CDN fill:#e056fd,stroke:#e056fd,color:#fff
    style UI fill:#ff6b6b,stroke:#ff6b6b,color:#fff
View Live at trips.jomcgi.dev →
A. Capture

The GoPro shoots on interval while driving. A Python controller manages the camera over WiFi, handling connection drops and queueing downloads for later.

Async Camera
Python asyncio controller. GPS-triggered capture at configurable intervals. 27MP RAW with JPG fallback when storage is tight.
SQLite Queue
Persistent download queue survives restarts. Exponential backoff on WiFi drops. Resume exactly where we left off.
EXIF Extraction
Camera optics preserved: ISO, aperture, shutter speed, focal length. GPS coordinates embedded. Deterministic UUIDs from content hash.
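The SQLite-backed queue above can be sketched in a few lines. This is an illustration, not the controller's actual schema: table and function names here are hypothetical, and the base delay and cap are assumed values.

```python
import sqlite3

# Sketch of a persistent download queue: pending files survive restarts
# because state lives in SQLite, and the retry delay grows exponentially
# with each failed attempt (capped so we keep polling).

def backoff_seconds(attempts: int, base: float = 2.0, cap: float = 300.0) -> float:
    """Exponential backoff: 2s, 4s, 8s, ... capped at 5 minutes (assumed values)."""
    return min(base * (2 ** attempts), cap)

def make_queue(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS queue ("
        "  filename TEXT PRIMARY KEY,"
        "  attempts INTEGER NOT NULL DEFAULT 0,"
        "  next_try REAL NOT NULL DEFAULT 0)"
    )
    return db

def enqueue(db: sqlite3.Connection, filename: str) -> None:
    db.execute("INSERT OR IGNORE INTO queue (filename) VALUES (?)", (filename,))

def due(db: sqlite3.Connection, now: float) -> list:
    """Files whose retry window has elapsed, oldest first."""
    rows = db.execute(
        "SELECT filename FROM queue WHERE next_try <= ? ORDER BY next_try", (now,)
    )
    return [r[0] for r in rows]

def record_failure(db: sqlite3.Connection, filename: str, now: float) -> None:
    """Bump the attempt counter and push the next try into the future."""
    (attempts,) = db.execute(
        "SELECT attempts FROM queue WHERE filename = ?", (filename,)
    ).fetchone()
    db.execute(
        "UPDATE queue SET attempts = ?, next_try = ? WHERE filename = ?",
        (attempts + 1, now + backoff_seconds(attempts), filename),
    )
```

Because the queue and attempt counters are rows rather than in-memory state, a restart resumes exactly where the controller left off.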
B. Event Store

Trip points are events in NATS JetStream. The API replays the stream on startup to rebuild state. No database needed—just an append-only log.

Stream Replay
On startup, ephemeral consumer replays entire stream. Rebuilds in-memory cache from event history. ~200ms for 10k events.
Live Subscribe
After replay, durable consumer subscribes to new events. Cache stays current. Multiple API pods can subscribe without conflicts.
Tombstones
Deletions are events too. Tombstone message marks a point as deleted. Replay respects tombstones. No orphaned data.
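The replay-with-tombstones pattern reduces to a small fold over the log. The event shape below is an assumption for illustration; the real messages are NATS JetStream payloads.

```python
# Sketch of the replay step, assuming events of the form
# {"type": "point" | "tombstone", "id": ..., "data": ...}. Replaying the
# full log rebuilds the in-memory cache; tombstones delete as they are
# applied, so deleted points never survive a restart.

def replay(events):
    cache = {}
    for event in events:
        if event["type"] == "point":
            cache[event["id"]] = event["data"]
        elif event["type"] == "tombstone":
            cache.pop(event["id"], None)  # deletion is just another event
    return cache
```

After the fold completes, a durable consumer takes over and applies new events to the same cache, so live state and replayed state go through identical code.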
C. Delivery

GoPro images are 27MB each. imgproxy generates thumbnails and display sizes on-the-fly. Cloudflare CDN caches everything at the edge—most requests never hit my homelab.

Deterministic Keys
UUID v5 from namespace + content hash. Same image always gets same key. Idempotent uploads. No duplicates ever.
imgproxy
On-the-fly resizing from SeaweedFS. /thumb/* → 300px, /display/* → 1920x1080. WebP/AVIF based on Accept header.
Cloudflare CDN
Immutable cache-control headers. Content-addressed keys mean cache invalidation is never needed. Edge-cached globally.
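The deterministic key scheme can be sketched as a UUIDv5 over a content hash. The namespace choice below is an assumption; any fixed namespace gives the same property.

```python
import hashlib
import uuid

def storage_key(image_bytes: bytes) -> str:
    """Same content -> same key, regardless of filename or upload time."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    # Namespace and name prefix are illustrative; any fixed UUID namespace works.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, "trips/" + digest))
```

Since the key is a pure function of the bytes, re-uploading the same image is an idempotent no-op, and CDN cache invalidation is never needed: new content gets a new key.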
D. Display

The web interface at trips.jomcgi.dev shows the route on a map, photos by day, elevation profiles, and trip statistics. WebSocket connection for live updates during active trips.

MapLibre
Vector tiles with terrain hillshade. Route colored by day with offset calculations for overlapping paths. Smooth animations between points.
Live Updates
WebSocket broadcasts new points as they arrive. Viewer count tracking. Follow along in real-time during active trips.
Day-by-Day
Distance sparklines, elevation profiles, photo galleries per day. Rainbow route coloring shows progression through the journey.
02

Ships: Real-Time AIS Vessel Tracking

Motivation
Living near the coast, I wanted to see what ships are passing by in real-time. AIS (Automatic Identification System) data is publicly broadcast by vessels, but there's no simple way to visualize it locally. I built a pipeline that streams AIS data through my cluster to a live map.
flowchart LR
    subgraph Ingest
        AIS[AISStream.io]
        Svc[ais-ingest]
    end

    subgraph Store
        NATS[(NATS JetStream)]
    end

    subgraph Serve
        API[ships-api]
        WS[WebSocket]
    end

    subgraph Display
        UI[ships.jomcgi.dev]
        Map[MapLibre]
    end

    AIS -->|WebSocket| Svc
    Svc -->|Publish| NATS
    NATS -->|Subscribe| API
    API --> WS
    WS --> UI
    UI --> Map

    style AIS fill:#ff6b6b,stroke:#ff6b6b,color:#fff
    style Svc fill:#ffa502,stroke:#ffa502,color:#fff
    style NATS fill:#ffd93d,stroke:#ffd93d,color:#000
    style API fill:#6bcb77,stroke:#6bcb77,color:#fff
    style WS fill:#4d96ff,stroke:#4d96ff,color:#fff
    style UI fill:#9b59b6,stroke:#9b59b6,color:#fff
    style Map fill:#e056fd,stroke:#e056fd,color:#fff
View Live at ships.jomcgi.dev →
Execution
AIS Ingest
Python service connects to AISStream.io WebSocket. Filters for Pacific Northwest bounding box. Publishes position reports to NATS JetStream.
Event Sourcing
Same pattern as Trips: NATS JetStream as the source of truth. API replays stream on startup to rebuild vessel state. No database needed.
Ships API
Python service with REST endpoints and WebSocket streaming. SQLite for position history. Position deduplication reduces noise from stationary vessels.
MapLibre UI
React frontend with MapLibre GL. Vessels rendered as directional arrows based on heading. Click for details: ship type, speed, course, destination.
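The position-deduplication step can be sketched with a distance threshold. The ~25 m cutoff here is a hypothetical value, not the service's actual setting.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in metres."""
    r = 6371000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def dedupe(reports, min_move_m=25.0):
    """Keep a report only if the vessel moved far enough since its last
    kept position. reports: iterable of (mmsi, lat, lon)."""
    kept = {}
    out = []
    for mmsi, lat, lon in reports:
        last = kept.get(mmsi)
        if last is None or haversine_m(last[0], last[1], lat, lon) >= min_move_m:
            kept[mmsi] = (lat, lon)
            out.append((mmsi, lat, lon))
    return out
```

Anchored ships broadcast position reports every few minutes with GPS jitter; filtering sub-threshold movement keeps history storage proportional to actual motion.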
03

Sextant: Type-Safe Operator State Machines

Motivation
Kubernetes operators are state machines, but we write them as imperative reconciliation loops. Every operator I wrote had the same bugs: invalid state transitions, forgotten error handling, missing metrics. I wanted to define the state machine declaratively and generate the boilerplate.
flowchart LR
    YAML[YAML Schema]
    Parse[Parse & Validate]
    Gen[Code Generator]
    Types[types.go]
    Trans[transitions.go]
    Metrics[metrics.go]
    Status[status.go]

    YAML --> Parse --> Gen
    Gen --> Types
    Gen --> Trans
    Gen --> Metrics
    Gen --> Status

    style YAML fill:#ff6b6b,stroke:#ff6b6b,color:#fff
    style Parse fill:#ffa502,stroke:#ffa502,color:#fff
    style Gen fill:#ffd93d,stroke:#ffd93d,color:#000
    style Types fill:#6bcb77,stroke:#6bcb77,color:#fff
    style Trans fill:#4d96ff,stroke:#4d96ff,color:#fff
    style Metrics fill:#9b59b6,stroke:#9b59b6,color:#fff
    style Status fill:#e056fd,stroke:#e056fd,color:#fff
Execution
Compile-Time Safety
Each state is a Go struct. Transitions return the next state type. Try to go from Pending to Ready without passing through Creating? Compiler error.
Forced Idempotency
Transition methods require request IDs. You can't transition without the ID, which forces you to call the external API first.
Guard Conditions
Go expressions embedded in YAML, evaluated at transition time. Invalid expressions fail at compile time, not runtime.
Generated Metrics
Prometheus counters, histograms, gauges. state_duration_seconds for SLOs. Automatic cleanup on resource deletion.
states:
  - name: Pending
    initial: true
  - name: Creating
    fields:
      requestID: string
  - name: Ready
    terminal: true

transitions:
  - from: Pending
    to: Creating
    action: StartCreation
    params:
      - requestID: string
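The parse-and-validate step can be sketched in Python for brevity (the real generator emits Go). The checks below mirror the schema above: exactly one initial state, transitions only between declared states, and no transitions out of a terminal state; whether Sextant enforces exactly these rules is an assumption.

```python
def validate(schema):
    """Return a list of schema errors; empty list means valid."""
    names = {s["name"] for s in schema["states"]}
    initials = [s["name"] for s in schema["states"] if s.get("initial")]
    terminals = {s["name"] for s in schema["states"] if s.get("terminal")}
    errors = []
    if len(initials) != 1:
        errors.append("exactly one initial state required")
    for t in schema["transitions"]:
        if t["from"] not in names or t["to"] not in names:
            errors.append(f"unknown state in {t['from']} -> {t['to']}")
        if t["from"] in terminals:
            errors.append(f"transition out of terminal state {t['from']}")
    return errors

# The YAML above, as parsed data:
schema = {
    "states": [
        {"name": "Pending", "initial": True},
        {"name": "Creating", "fields": {"requestID": "string"}},
        {"name": "Ready", "terminal": True},
    ],
    "transitions": [
        {"from": "Pending", "to": "Creating", "action": "StartCreation"},
    ],
}
```

Catching these errors at generation time is what makes the generated Go transitions safe to trust: an invalid machine never compiles into an operator.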
04

Cloudflare Operator

Motivation
Every new service meant clicking through the Cloudflare dashboard: create DNS record, create Zero Trust application, update tunnel config. I wanted to annotate a Deployment and have everything provisioned automatically. Zero Trust ingress without the toil.
flowchart LR
    Deploy[Deployment]
    Watch[Operator Watch]
    DNS[DNS Record]
    ZT[Zero Trust App]
    Config[Tunnel Config]
    CF[(Cloudflare)]

    Deploy -->|Annotations| Watch
    Watch --> DNS
    Watch --> ZT
    Watch --> Config
    DNS --> CF
    ZT --> CF
    Config --> CF

    style Deploy fill:#ff6b6b,stroke:#ff6b6b,color:#fff
    style Watch fill:#ffa502,stroke:#ffa502,color:#fff
    style DNS fill:#ffd93d,stroke:#ffd93d,color:#000
    style ZT fill:#6bcb77,stroke:#6bcb77,color:#fff
    style Config fill:#4d96ff,stroke:#4d96ff,color:#fff
    style CF fill:#9b59b6,stroke:#9b59b6,color:#fff
Execution
Annotation-Driven
cloudflare.ingress.hostname and cloudflare.zero-trust.policy annotations trigger reconciliation. No CRDs to manage.
State Machine
Built with Sextant. States: Pending → CreatingDNS → CreatingZTApp → UpdatingConfig → Ready. Each step idempotent.
Finalizers
Cleanup on Deployment deletion. DNS records, ZT apps, and tunnel routes removed. No orphaned Cloudflare resources.
Drift Detection
Periodic reconciliation detects manual Cloudflare changes. Operator is source of truth. Dashboard edits get reverted.
metadata:
  annotations:
    cloudflare.ingress.hostname: myapp.jomcgi.dev
    cloudflare.zero-trust.policy: joe-only
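The annotation-driven trigger amounts to extracting desired state from Deployment metadata. A minimal sketch, assuming the annotation keys shown above and that the policy annotation is optional:

```python
HOSTNAME_KEY = "cloudflare.ingress.hostname"
POLICY_KEY = "cloudflare.zero-trust.policy"

def desired_state(deployment: dict):
    """Extract the desired Cloudflare config from Deployment metadata.
    Returns None for Deployments the operator should ignore."""
    annotations = deployment.get("metadata", {}).get("annotations", {})
    hostname = annotations.get(HOSTNAME_KEY)
    if hostname is None:
        return None  # no hostname annotation: not managed by the operator
    return {"hostname": hostname, "policy": annotations.get(POLICY_KEY)}
```

Reconciliation then diffs this desired state against what Cloudflare actually has, which is also how drift detection reverts dashboard edits.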
05

Stargazer: Dark Sky Location Finder

Motivation
Finding good stargazing spots requires combining multiple data sources: light pollution maps, road access, elevation for horizon clearance, and weather forecasts. I built a pipeline that scores locations based on all these factors and updates continuously.
flowchart TB
    subgraph Acquire
        LP[Light Pollution Atlas]
        Roads[OSM Road Network]
        Elev[SRTM Elevation]
        Weather[MET Norway API]
    end

    subgraph Process
        Dark[Dark Region Extract]
        Buffer[Road Buffering]
        Zones[Zone Classification]
    end

    subgraph Score
        Cloud[Cloud Cover]
        Humid[Humidity]
        Wind[Wind Speed]
        Final[Final Score]
    end

    LP --> Dark
    Roads --> Buffer
    Elev --> Zones
    Dark --> Final
    Buffer --> Final
    Zones --> Final
    Weather --> Cloud --> Final
    Weather --> Humid --> Final
    Weather --> Wind --> Final

    style LP fill:#ff6b6b,stroke:#ff6b6b,color:#fff
    style Roads fill:#ffa502,stroke:#ffa502,color:#fff
    style Elev fill:#ffd93d,stroke:#ffd93d,color:#000
    style Weather fill:#6bcb77,stroke:#6bcb77,color:#fff
    style Dark fill:#4d96ff,stroke:#4d96ff,color:#fff
    style Buffer fill:#9b59b6,stroke:#9b59b6,color:#fff
    style Zones fill:#e056fd,stroke:#e056fd,color:#fff
    style Cloud fill:#ff6b6b,stroke:#ff6b6b,color:#fff
    style Humid fill:#ffa502,stroke:#ffa502,color:#fff
    style Wind fill:#ffd93d,stroke:#ffd93d,color:#000
    style Final fill:#6bcb77,stroke:#6bcb77,color:#fff
Execution
16-Task DAG
Parallel acquisition of light pollution atlas, OSM roads, SRTM elevation, and MET Norway weather. Tasks scheduled by dependency.
Spatial Analysis
Dark region extraction from light pollution raster. Road network buffering for accessibility. Zone classification by sky quality.
Weather Scoring
Cloud cover, humidity, fog probability, wind speed, dew point. Configurable weights. Updated hourly from forecast API.
Final Score
Composite score 0-100. Factors: darkness (40%), accessibility (20%), horizon (15%), weather (25%). Filterable by threshold.
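The composite score is a weighted sum of sub-scores. A sketch using the weights from the text (darkness 40%, accessibility 20%, horizon 15%, weather 25%), assuming each factor is itself scored 0-100:

```python
WEIGHTS = {"darkness": 0.40, "accessibility": 0.20, "horizon": 0.15, "weather": 0.25}

def final_score(factors: dict) -> float:
    """factors maps each factor name to a 0-100 sub-score; result is 0-100."""
    assert set(factors) == set(WEIGHTS), "every factor must be scored"
    return sum(WEIGHTS[k] * factors[k] for k in WEIGHTS)
```

A perfectly dark, accessible site under cloud cover still scores at most 75, so the weather weight is what makes the rankings change hour to hour.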
06

Bazel: One Way to Build Everything

Motivation
I got tired of using different build commands for every project. I wanted one system that works the same everywhere—laptop, CI, Claude Code in the cluster. Everything's vendored, so there's nothing to install beyond Bazel itself.
flowchart LR
    subgraph Anywhere
        Laptop[Laptop]
        CI[CI]
        Claude[Claude Code]
    end

    Fmt[format]

    subgraph Build
        Code[Formatters]
        Helm[Manifests]
        OCI[Images]
    end

    Cache[(BuildBuddy)]

    subgraph Output
        Git[Git]
        Reg[Registry]
    end

    Laptop --> Fmt
    CI --> Fmt
    Claude --> Fmt
    Fmt --> Code
    Fmt --> Helm
    Fmt --> OCI
    Code --> Cache
    Helm --> Cache
    OCI --> Cache
    Cache --> Git
    Cache --> Reg

    style Laptop fill:#ff6b6b,stroke:#ff6b6b,color:#fff
    style CI fill:#ffa502,stroke:#ffa502,color:#fff
    style Claude fill:#ffd93d,stroke:#ffd93d,color:#000
    style Fmt fill:#6bcb77,stroke:#6bcb77,color:#fff
    style Code fill:#4d96ff,stroke:#4d96ff,color:#fff
    style Helm fill:#4d96ff,stroke:#4d96ff,color:#fff
    style OCI fill:#4d96ff,stroke:#4d96ff,color:#fff
    style Cache fill:#9b59b6,stroke:#9b59b6,color:#fff
    style Git fill:#e056fd,stroke:#e056fd,color:#fff
    style Reg fill:#e056fd,stroke:#e056fd,color:#fff
Execution
format
One command for formatters, manifests, and lock files. Everything runs in parallel; it finishes in seconds if nothing changed.
Custom Rules
Starlark rules for Go, Python, and APKO images. A few lines per service in; multi-platform containers out.
Vendored Tools
Helm, crane, ruff, shellcheck—pinned in one lock file. Nothing to install, works anywhere Bazel runs.
BuildBuddy
Remote cache so unchanged code doesn't rebuild. 80 cores on the free tier keep CI under a minute.

Custom Starlark Rulesets

Three rulesets that extend Bazel into domains it doesn't cover out of the box. Each includes a Gazelle extension for automatic BUILD file generation.

rules_helm
Lint, template, package, and OCI-push Helm charts. Includes an ArgoCD application macro that wires up manifests, image updater, and Semgrep scanning per overlay.
rules_semgrep
Hermetic Semgrep Pro scanning. Vendors the OCaml engine as OCI artifacts, bypasses the Python wrapper. Three rule types: source files, Helm manifests, transitive targets via aspect.
rules_wrangler
Cloudflare Pages deployment via Wrangler. Builds static sites and pushes to Cloudflare in one target.

GitOps Manifests

Helm charts render through Bazel. Output goes to the source tree so Git tracks it. PR diffs show exactly what's changing in the cluster.

Cached
Each chart is a genrule. Unchanged charts skip. 20+ services render in parallel.
In Git
Manifests committed, not generated at deploy. I review what's going to the cluster before it goes.

Multi-Platform Images

arm64 on my laptop, amd64 in CI. Same rules build both and push a multi-platform index.

APKO
Alpine images from YAML. Lock files pin versions. Small, fast.
One Target
Define once, Bazel handles platform transitions and index creation.
07

rules_semgrep: Hermetic Static & Supply Chain Analysis

Motivation
Semgrep on managed CI took 2+ minutes per diff scan, 5+ minutes for full scans. Rule registry fetches made results non-deterministic. I needed scans that run in seconds, produce identical results from identical inputs, and only re-run when something actually changes. Bazel's content-addressed cache gives all three — but Semgrep had no Bazel integration.
flowchart TD
    subgraph Daily["Daily Update"]
        PyPI[PyPI Wheels]
        API[Semgrep API]
        Extract[Extract semgrep-core]
        GHCR[GHCR]
    end

    subgraph Bazel["Bazel Analysis"]
        OCI[oci_archive]
        Engine[Engine Binary]
    end

    subgraph Test["Bazel Test"]
        Core[semgrep-core]
        Srcs[Source Files]
        Rules[Rule YAML]
        Lockfiles[Lockfiles]
        Result[Pass / Fail]
    end

    PyPI --> Extract --> GHCR
    API --> GHCR
    GHCR -->|digest| OCI --> Engine
    Engine --> Core
    Srcs --> Core
    Rules --> Core
    Lockfiles -->|SCA| Core
    Core -->|cached| Result

    style PyPI fill:#ff6b6b,stroke:#ff6b6b,color:#fff
    style API fill:#ff6b6b,stroke:#ff6b6b,color:#fff
    style Extract fill:#ffa502,stroke:#ffa502,color:#fff
    style GHCR fill:#ffd93d,stroke:#ffd93d,color:#000
    style OCI fill:#6bcb77,stroke:#6bcb77,color:#fff
    style Engine fill:#6bcb77,stroke:#6bcb77,color:#fff
    style Core fill:#4d96ff,stroke:#4d96ff,color:#fff
    style Srcs fill:#4d96ff,stroke:#4d96ff,color:#fff
    style Rules fill:#4d96ff,stroke:#4d96ff,color:#fff
    style Result fill:#9b59b6,stroke:#9b59b6,color:#fff
How It Works
No Python
Extracts the semgrep-core OCaml binary from PyPI wheels and vendors it as an OCI artifact on GHCR. Bypasses the Python wrapper entirely — no pip install, no 2-4s startup tax.
Digest-Pinned
Engine binaries and Pro rule packs are pinned to sha256 digests. A daily GitHub Action updates digests and opens a PR. Same inputs, same results, every time.
Three Rules
semgrep_test for source files, semgrep_manifest_test for Helm-rendered YAML, semgrep_target_test for transitive deps via aspect. All support optional SCA lockfile scanning. Gazelle auto-generates all of them.
Pro Analysis
Cross-file taint tracking with --pro. Degrades gracefully — missing credentials mean SKIP, not FAIL. Local dev works even without GHCR access.
Supply Chain
SCA lockfile scanning detects CVEs in third-party dependencies. With Pro reachability, traces whether vulnerable code paths are actually invoked. Gazelle auto-detects lockfiles from @pip// and @npm// dep prefixes — zero config.
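The "same inputs, same results" property is what Bazel's content-addressed cache exploits. A toy illustration (not Bazel's actual internals): the cache key hashes every input, including the engine and rule-pack digests, so a scan re-runs only when some input truly changes.

```python
import hashlib

def cache_key(engine_digest: str, rule_digests: list, source_files: dict) -> str:
    """Content-addressed key over all scan inputs.
    source_files maps path -> file bytes."""
    h = hashlib.sha256()
    h.update(engine_digest.encode())
    for d in sorted(rule_digests):       # order-independent
        h.update(d.encode())
    for path in sorted(source_files):
        h.update(path.encode())
        h.update(source_files[path])
    return h.hexdigest()
```

Registry fetches at scan time would make the rule set an unhashable input; pinning digests is what turns Semgrep runs into cacheable pure functions.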

Results

Cached Diff
30 seconds. Down from 2+ minutes on managed infrastructure.
New Rules
50 seconds. Cache invalidated only for affected targets.
Cold Cache
4 minutes for all tests, all images, all scans on a fresh runner.
08

Self-Hosted AI Stack

Motivation
Running local inference means zero API costs for routine tasks and no data leaving the cluster. A 4090 runs Hermes 4.3-36B comfortably, and a knowledge graph turns RSS feeds into semantic search that Claude Code can query via MCP.
flowchart LR
    subgraph Inference
        LLM[llama-cpp]
        GPU[4090 GPU]
    end

    subgraph Knowledge
        KG[Knowledge Graph]
        RSS[RSS Feeds]
        S3[(SeaweedFS)]
        Qdrant[(Qdrant)]
    end

    subgraph Consumers
        Claude[Claude Code]
        MCP[MCP Server]
    end

    GPU --> LLM
    RSS --> KG
    KG --> S3
    KG --> Qdrant
    KG --> LLM
    MCP --> KG
    Claude --> MCP
    Claude --> LLM

    style LLM fill:#ff6b6b,stroke:#ff6b6b,color:#fff
    style GPU fill:#ffa502,stroke:#ffa502,color:#fff
    style KG fill:#e056fd,stroke:#e056fd,color:#fff
    style RSS fill:#ffd93d,stroke:#ffd93d,color:#000
    style S3 fill:#ffd93d,stroke:#ffd93d,color:#000
    style Qdrant fill:#6bcb77,stroke:#6bcb77,color:#fff
    style Claude fill:#4d96ff,stroke:#4d96ff,color:#fff
    style MCP fill:#9b59b6,stroke:#9b59b6,color:#fff
Execution
llama-cpp
Hermes 4.3-36B IQ4_XS on a 4090. OpenAI-compatible API. 32k context, flash attention, quantized KV cache. Shared inference backend for all local AI services.
Knowledge Graph
RSS feeds stored in SeaweedFS. Embeddings generated into Qdrant vectors. MCP server exposes semantic search to Claude Code.
Network Isolation
Provider-based NetworkPolicies. Knowledge graph: internal only, no internet. llama-cpp: cluster-only, no external egress.
Hardening
Non-root UID 1000, read-only filesystem, dropped capabilities, seccomp profiles. Kyverno auto-injects OTEL sidecars for observability.
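The semantic search the MCP server exposes boils down to nearest-neighbour ranking by cosine similarity. A miniature sketch with toy 2-D vectors standing in for real embeddings (Qdrant does this at scale with indexed vectors):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(index, query_vec, top_k=3):
    """index: {doc_id: vector}. Returns the top_k doc ids by similarity."""
    ranked = sorted(index, key=lambda d: cosine(index[d], query_vec), reverse=True)
    return ranked[:top_k]
```

In the real pipeline the query vector comes from the same embedding model used at ingest time, so query and documents live in one vector space.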
09

OCI Model Cache Operator

Motivation
HuggingFace models are huge and slow to download. I wanted to reference a model in a pod spec the same way you reference a container image—and have it just work. The operator caches models in an OCI registry and streams them to pods without touching disk.
flowchart LR
    subgraph Webhook
        Pod[Pod Create]
        Mutator[PodMutator]
        Rewrite[Rewrite Ref]
        Gate[Scheduling Gate]
    end

    subgraph Operator
        CR[ModelCache CR]
        Resolve[Resolve]
        Sync[Sync Job]
    end

    subgraph External
        HF[HuggingFace]
        OCI[(OCI Registry)]
    end

    Pod --> Mutator
    Mutator -->|HF API| Rewrite
    Rewrite --> Gate
    Mutator --> CR
    CR --> Resolve
    Resolve -->|hf2oci| Sync
    Sync -->|Stream| HF
    Sync -->|Push| OCI
    Sync -->|Ready| Gate

    style Pod fill:#ff6b6b,stroke:#ff6b6b,color:#fff
    style Mutator fill:#ffa502,stroke:#ffa502,color:#fff
    style Rewrite fill:#ffa502,stroke:#ffa502,color:#fff
    style Gate fill:#ffd93d,stroke:#ffd93d,color:#000
    style CR fill:#6bcb77,stroke:#6bcb77,color:#fff
    style Resolve fill:#4d96ff,stroke:#4d96ff,color:#fff
    style Sync fill:#9b59b6,stroke:#9b59b6,color:#fff
    style HF fill:#e056fd,stroke:#e056fd,color:#fff
    style OCI fill:#ff6b6b,stroke:#ff6b6b,color:#fff
Execution
PodMutator
Webhook intercepts pods with hf.co/ volume references. Calls cached HF API to resolve the OCI ref at admission time (pod spec is immutable after this point). Creates ModelCache CR and adds scheduling gate if model not yet synced.
State Machine
Built with Sextant. Pending → Resolving → Syncing → Ready, with Failed state. Guards distinguish permanent errors from transient failures for automatic retry.
hf2oci
Streams HuggingFace models to OCI layers. HTTP response → tar → io.Pipe → registry push. Zero disk I/O. Supports Safetensors and GGUF formats.
Smart Naming
HuggingFace baseModels API resolves derivative models to their base. Derivatives share the base repo path for OCI layer deduplication. TTL cache for API responses.
Admission-Time Resolution
Pod spec is immutable after admission. Webhook calls the cached HF API to compute the GHCR ref synchronously, ensuring the correct OCI path is baked into the pod before it's created.
Pod Ungating
Ready state triggers scheduling gate removal. Volume ref was already rewritten at admission time — pod schedules normally with the cached model available.
volumes:
  - name: model
    image:
      reference: hf.co/NousResearch/Hermes-3-Llama-3.1-8B-GGUF
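The admission-time rewrite of a volume reference like the one above can be sketched as a path mapping. The registry prefix below is hypothetical; the actual GHCR path scheme is the operator's choice.

```python
REGISTRY = "ghcr.io/example/models"  # hypothetical cache registry path

def rewrite_ref(reference: str, revision: str = "latest") -> str:
    """Map an hf.co volume reference to the cached OCI ref; leave
    anything else untouched. OCI repository names must be lowercase."""
    if not reference.startswith("hf.co/"):
        return reference
    repo = reference.removeprefix("hf.co/").lower()
    return f"{REGISTRY}/{repo}:{revision}"
```

Because the pod spec is immutable after admission, this mapping has to happen synchronously in the webhook; the scheduling gate then holds the pod until the sync job has actually populated that ref.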