An Overview of Loki, My Favorite Personal Project

Stephen Castle, 2 months ago

I want to take a break from the tutorials today and talk about a personal project I've been working on for a few years now called Loki. Loki is the internal code name for a codebase that produces two related products. The first is Lowkey Media Viewer, a free, minimalist Electron desktop app for viewing and curating images, video, audio, and comic book archives. The second is Lowkey Media Server, a companion Go HTTP server with a job queue and a web UI that does long-running things like auto-tagging with ONNX models, Whisper transcription, FFmpeg conversion, and Ollama-powered image descriptions.

It started as just the viewer. I wanted something that could open a directory full of media files and let me flip through them without a bunch of cloud sync, library imports, or other ceremony getting in the way. Over time it grew tags, categories, a command palette, an Elo-rating "battle mode" for ranking pairs of files, transcript-based video navigation, and a bunch of other curation features. Eventually I wanted to run some of the heavier processing on a different machine than the one I was browsing on, and the media server was born. They live in the same repo, and that turned out to be one of the most interesting design decisions in the project.

Let me walk through some of the more interesting technical choices and how the architecture is set up. If you want to follow along, the source is on GitHub and the binaries are at lowkeyviewer.com.

Two Products, One Renderer

The repo has roughly this shape.

loki/
├── src/                # Electron app
│   ├── main/           # Main process (Node)
│   └── renderer/       # React + XState SPA
├── media-server/       # Go module: HTTP API, job queue, web UI
│   ├── tasks/          # Self-registering task implementations
│   ├── jobqueue/       # SQLite-backed job + workflow DAG engine
│   ├── storage/        # Local + S3-compatible storage registry
│   ├── runners/        # Whisper, ONNX, Ollama, FFmpeg wrappers
│   └── loki-static/    # Built renderer embedded at compile time
└── docs/

The interesting part is src/renderer/. That React SPA is the renderer process for the Electron desktop app, but it's also what the Go server serves at /. Same code, two totally different runtime environments. The desktop app talks to native APIs through Electron IPC. The web app talks to the Go server over HTTP. Everything else — the components, the state machine, the routing — is identical.

The trick that makes this work is a thin abstraction layer called platform.ts. It detects which environment it's running in, and then exposes a single set of functions that the rest of the codebase can call without caring about the difference.

// platform.ts (excerpt)
export const isElectron =
  typeof window !== 'undefined' &&
  typeof (window as any).electron !== 'undefined';

export const capabilities = {
  fileSystemAccess: true,
  clipboard: isElectron,
  windowControls: isElectron,
  autoUpdate: isElectron,
  shutdown: isElectron,
};

Anywhere in the renderer that wants to call out to the host platform goes through this file. Calls like invoke('load-file-metadata', path) get routed either to Electron's IPC channel or to a corresponding HTTP endpoint via a small mapping table. Capability flags handle the parts that just don't exist on the web. There's no way to control a browser window's chrome from a web page the way you can from an Electron renderer, so in the browser capabilities.windowControls is false, and the components that render minimize/maximize buttons check that flag before showing anything.

The reason this works so well is that the renderer never had to be rewritten. The Electron app was already organized around IPC calls, which feel a lot like HTTP requests once you squint. Adding the web target was mostly a matter of writing the channel-to-endpoint map and a small adapter for response shapes. Everything else just worked.

The Frontend State Machine

The renderer is a fairly complex SPA — there's a library view, a viewer view, sidebars, the command palette, the battle mode, modal dialogs, drag-and-drop, the works. To keep all of that coherent without descending into a tangle of useState calls, almost everything goes through a single XState machine in src/renderer/state.tsx. I picked XState early because the viewer has a lot of states that need to be strictly mutually exclusive — you can't be both initializing a library and rendering one at the same time, and the transitions between those states need to handle a long list of edge cases like "what if the user picked a directory but then the OS file picker errored," or "what if the database returned no results but we have a stored session to restore from."

A finite state machine handles those questions explicitly instead of letting you fall into impossible states by accident. The downside is that the machine is now a 3,000-line file. That's a real tradeoff. A file that size isn't always easy to keep up with, but it's incredibly readable when you do go in there because every transition is named and every action is explicit. The size also tells you something honest about how much logic the app actually has, even if it doesn't look that complicated from the outside.

The rest of the frontend is pretty boring on purpose. React Query for server state. react-virtual for the big grids. fuse.js for fuzzy search in the command palette. react-dnd for drag and drop. None of it is groundbreaking, and that's exactly what I wanted — the interesting choices live in the state machine and the platform layer, not in the off-the-shelf pieces.

A Go Server for the Heavy Stuff

I picked Go for the media server for a few specific reasons. First, the work it does — running Whisper, running ONNX taggers, shelling out to FFmpeg, downloading from yt-dlp — is intrinsically about coordinating long-running subprocesses, and Go's goroutines and contexts are an extremely natural fit for that. Second, Go cross-compiles to a single binary that I can drop on a server or a friend's machine without making them install Node, set up dependencies, or care about my version pinning. Third, I just like writing Go. That's a real reason. This is a personal project.

The server has its own SQLite database, its own HTTP handlers, and three different main files, one per operating system, gated by Go build tags. Here's the top of the Windows one.

//go:build windows
// +build windows

package main

There's also main_darwin.go and main_linux.go. They share most of their code through packages like tasks, jobqueue, and storage, but the per-OS files handle things like system tray integration and the platform-specific paths where binaries get installed. The build tag approach keeps OS-specific code at the top level where it's easy to find, instead of scattered behind runtime checks like if runtime.GOOS == "windows".

A SQLite-Backed Job Queue with Workflow DAGs

The most interesting piece of the server is probably the jobqueue package. Every long-running task — transcoding a video, tagging a folder of images, transcribing an audio file — is modeled as a Job. Jobs have a state (pending, in_progress, completed, cancelled, error), a command, arguments, input, a list of dependency job IDs, and stdout that gets streamed back to the client. The whole thing is persisted to SQLite so that jobs survive server restarts.

type Job struct {
  ID            string             `json:"id"`
  Command       string             `json:"command"`
  Arguments     []string           `json:"arguments"`
  Input         string             `json:"input"`
  Dependencies  []string           `json:"dependencies"`
  State         JobState           `json:"state"`
  Ctx           context.Context    `json:"-"`
  Cancel        context.CancelFunc `json:"-"`
  CreatedAt     time.Time          `json:"created_at"`
  ClaimedAt     time.Time          `json:"claimed_at"`
  CompletedAt   time.Time          `json:"completed_at"`
  // ...
}

The Dependencies field is what turns the queue into a DAG. When you submit a job, you can list other job IDs that have to finish first. A worker won't pick up a job until all of its dependencies are in the completed state, and if any of them errors out, the queue cancels all of the still-pending dependents instead of running them with bad input. This is how workflows like "ingest these YouTube videos, then transcribe them, then auto-tag them, then move the files to long-term storage" get built up from small reusable pieces.
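
The gating rule in that paragraph can be sketched in a few lines. This is a minimal in-memory illustration, not the actual jobqueue code: the job struct, the ready and cancelDependents helpers, and the state string values are all hypothetical stand-ins, and cancelling transitively (dependents of dependents) is an assumption about how the real queue behaves.

```go
package main

import "fmt"

// JobState mirrors the states named in the post; the string values are assumed.
type JobState string

const (
	Pending    JobState = "pending"
	InProgress JobState = "in_progress"
	Completed  JobState = "completed"
	Cancelled  JobState = "cancelled"
	Errored    JobState = "error"
)

// job is a trimmed-down, hypothetical stand-in for jobqueue.Job.
type job struct {
	ID           string
	Dependencies []string
	State        JobState
}

// ready reports whether j is pending and every dependency has completed.
// An unknown dependency ID counts as "not ready".
func ready(jobs map[string]*job, j *job) bool {
	if j.State != Pending {
		return false
	}
	for _, id := range j.Dependencies {
		dep, ok := jobs[id]
		if !ok || dep.State != Completed {
			return false
		}
	}
	return true
}

// cancelDependents cancels every pending job that depends on failedID,
// then recurses so that jobs further down the chain are cancelled too.
func cancelDependents(jobs map[string]*job, failedID string) {
	for _, j := range jobs {
		if j.State != Pending {
			continue
		}
		for _, id := range j.Dependencies {
			if id == failedID {
				j.State = Cancelled
				cancelDependents(jobs, j.ID)
				break
			}
		}
	}
}

func main() {
	jobs := map[string]*job{
		"ingest":     {ID: "ingest", State: Completed},
		"transcribe": {ID: "transcribe", Dependencies: []string{"ingest"}, State: Pending},
		"autotag":    {ID: "autotag", Dependencies: []string{"transcribe"}, State: Pending},
	}
	fmt.Println(ready(jobs, jobs["transcribe"]), ready(jobs, jobs["autotag"])) // true false

	// Simulate the transcription failing: everything downstream is cancelled.
	jobs["transcribe"].State = Errored
	cancelDependents(jobs, "transcribe")
	fmt.Println(jobs["autotag"].State) // cancelled
}
```

In a SQLite-backed queue the same check would more likely be a query than a loop over a map, but the rule is the same: a job is claimable only when nothing upstream is unfinished.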

Workflows themselves are persisted in their own table so you can save them by name and replay them later. The shape of a workflow is just a list of nodes with edges between them — pretty much the simplest possible DAG you can imagine.
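
To make "a list of nodes with edges between them" concrete, here is one way such a workflow could be represented and turned into a run order. The Node, Edge, and Workflow types and the Order method are hypothetical (the post doesn't show the real schema), the "transcribe" and "move" task IDs are made up for the example, and the ordering uses Kahn's algorithm, which is a standard choice rather than something the source confirms.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is one step in a workflow. Task is the ID of a registered task.
type Node struct {
	ID   string
	Task string
}

// Edge says that To may only run after From has completed.
type Edge struct {
	From, To string
}

// Workflow is a named DAG that can be saved and replayed.
type Workflow struct {
	Name  string
	Nodes []Node
	Edges []Edge
}

// Order returns node IDs in an order that respects every edge
// (Kahn's algorithm). It fails on unknown node IDs and on cycles.
func (w Workflow) Order() ([]string, error) {
	indegree := make(map[string]int, len(w.Nodes))
	next := make(map[string][]string)
	for _, n := range w.Nodes {
		indegree[n.ID] = 0
	}
	for _, e := range w.Edges {
		if _, ok := indegree[e.From]; !ok {
			return nil, fmt.Errorf("edge references unknown node %q", e.From)
		}
		if _, ok := indegree[e.To]; !ok {
			return nil, fmt.Errorf("edge references unknown node %q", e.To)
		}
		indegree[e.To]++
		next[e.From] = append(next[e.From], e.To)
	}

	// Seed the queue from the node slice (not the map) so output is deterministic.
	var queue, order []string
	for _, n := range w.Nodes {
		if indegree[n.ID] == 0 {
			queue = append(queue, n.ID)
		}
	}
	for len(queue) > 0 {
		id := queue[0]
		queue = queue[1:]
		order = append(order, id)
		for _, to := range next[id] {
			indegree[to]--
			if indegree[to] == 0 {
				queue = append(queue, to)
			}
		}
	}
	if len(order) != len(w.Nodes) {
		return nil, errors.New("workflow contains a cycle")
	}
	return order, nil
}

func main() {
	w := Workflow{
		Name: "youtube-archive",
		Nodes: []Node{
			{ID: "n1", Task: "ingest"},
			{ID: "n2", Task: "transcribe"},
			{ID: "n3", Task: "autotag"},
			{ID: "n4", Task: "move"},
		},
		Edges: []Edge{{"n1", "n2"}, {"n2", "n3"}, {"n3", "n4"}},
	}
	order, err := w.Order()
	fmt.Println(order, err) // [n1 n2 n3 n4] <nil>
}
```

Submitting a workflow then amounts to creating one job per node and filling each job's dependency list from the incoming edges.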

Self-Registering Tasks

The work that a job actually does is defined in the tasks package. Every task has the same simple signature.

type TaskFn func(j *jobqueue.Job, q *jobqueue.Queue, r *sync.Mutex) error

It takes the job, the queue (for pushing stdout back to clients and signaling completion), and a shared mutex. It returns an error. That's it. Here's the smallest one in the codebase, used for testing flow control and just sitting there for five seconds.

func waitFn(j *jobqueue.Job, q *jobqueue.Queue, mu *sync.Mutex) error {
  ctx := j.Ctx
  for i := 0; i < 5; i++ {
    select {
    case <-ctx.Done():
      q.PushJobStdout(j.ID, "Task was canceled")
      _ = q.CancelJob(j.ID)
      return ctx.Err()
    case <-time.After(1 * time.Second):
      q.PushJobStdout(j.ID, "Waiting in task...")
    }
  }
  q.CompleteJob(j.ID)
  return nil
}

Notice how the function listens on j.Ctx.Done(). Every task gets a context that the queue can cancel from the outside. If a user clicks "cancel" in the web UI, the context is cancelled, the task notices, cleans up, and returns. This makes cancellation pleasant instead of treacherous.

Tasks register themselves at startup in a single init() function in registry.go.

func init() {
  RegisterTask("wait", "Wait", nil, waitFn)
  RegisterTask("remove", "Remove Media", nil, removeFromDB)
  RegisterTask("autotag", "Auto Tag (ONNX)", nil, autotagTask)
  RegisterTask("metadata", "Generate Metadata", metadataOptions, metadataTask)
  RegisterTask("hls", "HLS Transcode", hlsOptions, hlsTask)
  RegisterTask("ingest", "Ingest Media Files", ingestOptions, ingestTask)
  RegisterTask("ffmpeg-thumbnail", "FFmpeg Thumbnail", ffmpegThumbnailOptions, ffmpegThumbnailTask)
  // ...a few dozen more
}

Adding a new task means implementing the function, then adding one line to that init block. The HTTP handler and the web UI both pick it up automatically because they iterate over the registry. That kind of low-friction extension point is the thing I'm proudest of in the server. When I get an idea for a new processing step, I can have it running in a workflow in about ten minutes.
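
For readers who haven't built a registry like this, a stripped-down version might look like the following. Only the four-argument shape of RegisterTask comes from the excerpt above. The Job, Option, and Task types, the ListTasks and Run helpers, the simplified TaskFn signature, and the "echo" task are all assumptions made to keep the sketch self-contained.

```go
package main

import (
	"fmt"
	"sort"
)

// Job and TaskFn are simplified stand-ins for the real jobqueue types.
type Job struct{ ID, Input string }
type TaskFn func(j *Job) error

// Option describes one user-configurable setting of a task (hypothetical shape).
type Option struct{ Name, Default string }

// Task is what the registry stores for each registered ID.
type Task struct {
	ID, Name string
	Options  []Option
	Fn       TaskFn
}

var registry = map[string]Task{}

// RegisterTask adds a task under a unique ID. A duplicate ID is a
// programming error, so it panics at startup instead of failing later.
func RegisterTask(id, name string, options []Option, fn TaskFn) {
	if _, dup := registry[id]; dup {
		panic("duplicate task id: " + id)
	}
	registry[id] = Task{ID: id, Name: name, Options: options, Fn: fn}
}

// ListTasks returns every task sorted by ID. This is the kind of call an
// HTTP handler or UI can use to discover tasks without hardcoding them.
func ListTasks() []Task {
	out := make([]Task, 0, len(registry))
	for _, t := range registry {
		out = append(out, t)
	}
	sort.Slice(out, func(i, k int) bool { return out[i].ID < out[k].ID })
	return out
}

// Run looks up a task by ID and executes it against a job.
func Run(id string, j *Job) error {
	t, ok := registry[id]
	if !ok {
		return fmt.Errorf("unknown task %q", id)
	}
	return t.Fn(j)
}

func init() {
	RegisterTask("wait", "Wait", nil, func(j *Job) error { return nil })
	RegisterTask("echo", "Echo Input", []Option{{Name: "prefix", Default: ">"}},
		func(j *Job) error { fmt.Println(j.Input); return nil })
}

func main() {
	for _, t := range ListTasks() {
		fmt.Printf("%s (%s), %d option(s)\n", t.ID, t.Name, len(t.Options))
	}
	fmt.Println(Run("nope", &Job{ID: "1"}))
}
```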

A Storage Registry, Not Hardcoded Paths

The other small piece I want to mention is the storage registry. Out of the box the server reads its media from one or more local directory roots that you point at with LOWKEY_ROOT_<N> environment variables. But it can also read from and write to S3-compatible buckets configured via a JSON array in LOWKEY_ROOTS. All of this is hidden behind a small storage.Registry type that tasks use to resolve paths. Nothing in the task code hardcodes filesystem paths or makes assumptions about being on local disk. That made it surprisingly easy to add the S3 support after the fact, and it'll keep being easy to bolt on new backends later.
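
As an illustration of the idea, and not the real storage package, a registry like this can be as small as an interface plus a lookup by root name. Everything here is assumed for the sketch: the Backend interface, the localRoot and s3Root types, the convention that the first path segment names the root, and the example bucket and directory names.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Backend is a hypothetical interface that each storage root implements.
type Backend interface {
	Kind() string
	Locate(rel string) string // full location of a root-relative path
}

type localRoot struct{ dir string }

func (l localRoot) Kind() string             { return "local" }
func (l localRoot) Locate(rel string) string { return path.Join(l.dir, rel) }

type s3Root struct{ bucket, prefix string }

func (s s3Root) Kind() string { return "s3" }
func (s s3Root) Locate(rel string) string {
	return "s3://" + path.Join(s.bucket, s.prefix, rel)
}

// Registry maps root names to backends so tasks never see raw locations.
type Registry struct{ roots map[string]Backend }

func NewRegistry() *Registry { return &Registry{roots: map[string]Backend{}} }

func (r *Registry) Add(name string, b Backend) { r.roots[name] = b }

// Resolve splits "root/relative/path" into a backend and a full location.
func (r *Registry) Resolve(p string) (Backend, string, error) {
	name, rel, _ := strings.Cut(strings.TrimPrefix(p, "/"), "/")
	b, ok := r.roots[name]
	if !ok {
		return nil, "", fmt.Errorf("no storage root named %q", name)
	}
	// Cleaning against "/" collapses any ".." so a path cannot escape its root.
	rel = path.Clean("/" + rel)
	return b, b.Locate(rel), nil
}

func main() {
	reg := NewRegistry()
	reg.Add("media", localRoot{dir: "/srv/media"})
	reg.Add("archive", s3Root{bucket: "my-bucket", prefix: "loki"})

	for _, p := range []string{"media/2024/cat.jpg", "archive/videos/clip.mp4", "nope/x"} {
		b, loc, err := reg.Resolve(p)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(b.Kind(), loc)
	}
}
```

The sketch uses the slash-only path package so its output is the same on every OS. Code that touches a real local disk would use path/filepath for the local backend.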

Embedding the Renderer at Compile Time

One last detail. The Go binary embeds the entire built React SPA at compile time using Go's //go:embed directive.

//go:embed loki-static/**
var embeddedSPA embed.FS

The build script (npm run build:server) builds the renderer with webpack, copies it into media-server/loki-static/, then runs go build. The result is a single executable with everything inside it. No "make sure your public/ directory is in the right place" deployment story, no Docker volume mounts for static files. Just one binary. This is the part of writing Go that I miss the most when I'm working in any other ecosystem.

Where It's Going

The pieces I'm most excited about right now are workflow templates that users can share, better automatic library organization using LLMs (right now Ollama already powers per-image descriptions, but I want to push that further into structured metadata), and a smarter video transcript viewer that lets you jump around inside a long video by searching the spoken word. Behind the scenes a lot of cleanup work is happening on the storage layer to make multi-machine setups more pleasant — running the server on a fileserver while the viewer connects from a laptop is already possible, but it has some sharp edges I want to file down.

If any of that sounds interesting, the project is open source and contributions are welcome. And if you've used Lowkey Media Viewer for something cool, I'd genuinely love to hear about it. There's a contact link on lowkeyviewer.com.